MolecularDiffusion.runmodes.analyze.featurize

Molecular featurization backends for 3D XYZ structures.

Backends

soap

SOAP descriptor via dscribe (CPU, no external model needed).

uma

UMA backbone embeddings (requires vendored fairchem/src and a checkpoint).

Attributes

DEFAULT_SOAP_SPECIES

Classes

BaseFeaturizer

Abstract base class for featurization backends.

SOAPFeaturizer

SOAP descriptor featurizer built on dscribe (the soap backend).

SSL3DFeaturizer

Featurizer backed by a model checkpoint (SSL3D).

UMAFeaturizer

UMA backbone embedding featurizer (the uma backend).

Functions

run_featurize(→ numpy.ndarray)

Discover XYZ files, featurize, save outputs, return (N, D) array.

Module Contents

class MolecularDiffusion.runmodes.analyze.featurize.BaseFeaturizer

Bases: abc.ABC

Abstract base class for featurization backends. Subclasses implement featurize().

abstractmethod featurize(atoms_list: list[ase.Atoms]) → numpy.ndarray

Return a float32 array of shape (N, D), with one row per structure in atoms_list.
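The contract above (a list of structures in, a float32 array of shape (N, D) out) can be illustrated with a small custom backend. This is a minimal sketch, not part of the package: the local BaseFeaturizer is a stand-in that mirrors the documented interface so the snippet runs on its own, and ElementCountFeaturizer is a hypothetical class invented for illustration. It relies only on the get_chemical_symbols() method that ase.Atoms provides.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseFeaturizer(ABC):
    """Local stand-in mirroring the documented interface (assumption)."""

    @abstractmethod
    def featurize(self, atoms_list):
        """Return float32 array of shape (N, D)."""


class ElementCountFeaturizer(BaseFeaturizer):
    """Hypothetical backend: one column per species, values are atom counts."""

    def __init__(self, species):
        self.species = list(species)
        # Map each element symbol to its column in the output array.
        self._column = {symbol: i for i, symbol in enumerate(self.species)}

    def featurize(self, atoms_list):
        features = np.zeros((len(atoms_list), len(self.species)), dtype=np.float32)
        for row, atoms in enumerate(atoms_list):
            # Any object exposing get_chemical_symbols() works, e.g. ase.Atoms.
            for symbol in atoms.get_chemical_symbols():
                column = self._column.get(symbol)
                if column is not None:  # symbols outside `species` are ignored
                    features[row, column] += 1.0
        return features
```

In the real package the subclass would presumably inherit from MolecularDiffusion.runmodes.analyze.featurize.BaseFeaturizer instead of the stand-in; whether run_featurize can dispatch to user-defined backends is not stated here.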

class MolecularDiffusion.runmodes.analyze.featurize.SOAPFeaturizer(species: list[str], r_cut: float = 6.0, n_max: int = 8, l_max: int = 6, sigma: float = 0.1, pooling: str = 'mean', n_jobs: int = 1)

Bases: BaseFeaturizer

SOAP descriptor featurizer built on dscribe (the soap backend).

featurize(atoms_list: list[ase.Atoms]) → numpy.ndarray

Return a float32 array of shape (N, D), with one row per structure in atoms_list.
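The pooling parameter suggests that per-atom descriptors are reduced to one vector per structure, which is how molecules with different atom counts end up as rows of a single (N, D) array. The sketch below shows that reduction with plain NumPy under stated assumptions: the function names are hypothetical, and only the 'mean' mode appears in the documented defaults ('sum' is included purely for illustration). It does not reproduce the package's implementation.

```python
import numpy as np


def pool_atomic_features(per_atom, pooling="mean"):
    """Reduce an (n_atoms, D) block of per-atom features to a single (D,) vector."""
    per_atom = np.asarray(per_atom, dtype=np.float32)
    if per_atom.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_atoms, D)")
    if pooling == "mean":
        return per_atom.mean(axis=0)
    if pooling == "sum":  # illustrative only; not a documented option
        return per_atom.sum(axis=0)
    raise ValueError(f"unsupported pooling: {pooling!r}")


def stack_pooled(per_atom_blocks, pooling="mean"):
    """Pool each structure separately, then stack into an (N, D) float32 array."""
    pooled = [pool_atomic_features(block, pooling) for block in per_atom_blocks]
    return np.stack(pooled).astype(np.float32)
```

Because each structure is pooled before stacking, the atom count may differ between structures while D stays fixed.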

desc
n_jobs = 1
pooling = 'mean'

class MolecularDiffusion.runmodes.analyze.featurize.SSL3DFeaturizer(checkpoint: str | pathlib.Path, device: str | None = None, batch_size: int = 16, pooling: str = 'mean', edge_radius: float = 5.0, atom_vocab: list[str] | None = None)

Bases: BaseFeaturizer

Featurizer backed by a model checkpoint (SSL3D).

featurize(atoms_list: list[ase.Atoms]) → numpy.ndarray

Return a float32 array of shape (N, D), with one row per structure in atoms_list.

atom_vocab
batch_size = 16
device = 'cuda'
edge_radius = 5.0
pooling = 'mean'

class MolecularDiffusion.runmodes.analyze.featurize.UMAFeaturizer(checkpoint: str | pathlib.Path, task_name: str = 'omol', device: str | None = None, batch_size: int = 8, pooling: str = 'mean', scalar_only: bool = True, charge: int = 0, spin: int = 1)

Bases: BaseFeaturizer

UMA backbone embedding featurizer (the uma backend). Requires a checkpoint.

featurize(atoms_list: list[ase.Atoms]) → numpy.ndarray

Return a float32 array of shape (N, D), with one row per structure in atoms_list.

kwargs

MolecularDiffusion.runmodes.analyze.featurize.run_featurize(input_dir: str | pathlib.Path, backend: str = 'soap', output_path: str | pathlib.Path | None = None, extensions: collections.abc.Sequence[str] = DEFAULT_STRUCTURE_EXTENSIONS, recursive: bool = False, species: list[str] | None = None, autodetect_species: bool = False, r_cut: float = 6.0, n_max: int = 8, l_max: int = 6, sigma: float = 0.1, pooling: str = 'mean', n_jobs: int = 1, checkpoint: str | pathlib.Path = 'training_outputs/uma-s-1p2.pt', task_name: str = 'omol', device: str | None = None, batch_size: int = 8, scalar_only: bool = True, charge: int = 0, spin: int = 1, ssl3d_checkpoint: str | pathlib.Path | None = None, ssl3d_edge_radius: float = 5.0, ssl3d_atom_vocab: list[str] | None = None) → numpy.ndarray

Discover XYZ files under input_dir, featurize them with the selected backend, save the outputs, and return the resulting (N, D) array.
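A call to run_featurize with the soap backend might look like the following sketch. Only keyword names from the signature above are used; the wrapper function featurize_directory_with_soap, the species list, and the n_jobs value are illustrative assumptions. The format and location of the saved outputs are not described in this reference, so the sketch relies only on the returned array.

```python
from pathlib import Path


def featurize_directory_with_soap(input_dir, output_path=None):
    """Hypothetical wrapper: featurize every structure under `input_dir` with SOAP."""
    input_dir = Path(input_dir)
    if not input_dir.is_dir():
        raise NotADirectoryError(str(input_dir))

    # Imported lazily so the wrapper can be defined without the package installed.
    from MolecularDiffusion.runmodes.analyze.featurize import run_featurize

    return run_featurize(
        input_dir=input_dir,
        backend="soap",
        output_path=output_path,
        species=["H", "C", "N", "O"],  # illustrative; choose to match your data
        r_cut=6.0,   # the remaining SOAP settings repeat the documented defaults
        n_max=8,
        l_max=6,
        sigma=0.1,
        pooling="mean",
        n_jobs=4,    # illustrative value; the documented default is 1
    )
```

The result should be an (N, D) array as documented, for example `features = featurize_directory_with_soap("structures/")`, where "structures/" is a placeholder path.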

MolecularDiffusion.runmodes.analyze.featurize.DEFAULT_SOAP_SPECIES = ['H', 'B', 'C', 'N', 'O', 'F', 'Al', 'Si', 'P', 'S', 'Cl', 'As', 'Se', 'Br', 'I', 'Hg', 'Bi']
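
A fixed species list only helps if it covers the elements that actually occur in the input. The sketch below is a rough standalone check, not the package's autodetect_species logic (which is not documented here): detect_species and missing_from are hypothetical helpers that read the first frame of plain XYZ files and report symbols absent from the default list, which is copied from the value shown above.

```python
from pathlib import Path

# Copied from the documented module attribute.
DEFAULT_SOAP_SPECIES = ['H', 'B', 'C', 'N', 'O', 'F', 'Al', 'Si', 'P', 'S',
                        'Cl', 'As', 'Se', 'Br', 'I', 'Hg', 'Bi']


def detect_species(xyz_paths):
    """Collect element symbols from the first frame of each XYZ file."""
    found = set()
    for path in xyz_paths:
        lines = Path(path).read_text().splitlines()
        n_atoms = int(lines[0].split()[0])   # line 1: atom count
        # Line 2 is a comment; atom records start on line 3.
        for line in lines[2:2 + n_atoms]:
            found.add(line.split()[0])
    return sorted(found)


def missing_from(reference, detected):
    """Return detected symbols that the reference list does not contain."""
    known = set(reference)
    return [symbol for symbol in detected if symbol not in known]
```

Any symbol returned by `missing_from(DEFAULT_SOAP_SPECIES, detect_species(paths))` would need to be supplied through the species argument, or handled by whatever autodetect_species does in the package.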