MolecularDiffusion.runmodes.data.featurization

Featurization module for MolCraft. Generates vector representations (Morgan fingerprints, SOAP descriptors) from 3D molecular data.

Attributes

Chem

logger

save_safetensors

Functions

featurize(input_source, output_dir[, method, format, ...])

Main featurization driver.

generate_morgan(→ Tuple[numpy.ndarray, List[str]])

Generates Morgan fingerprints for a list of entries.

generate_soap(→ Tuple[numpy.ndarray, List[str]])

Generates SOAP descriptors.

save_features(features, ids, output_dir[, format])

Saves features and indices.

Module Contents

MolecularDiffusion.runmodes.data.featurization.featurize(input_source: str, output_dir: str, method: str = 'morgan', format: str = 'npy', readout: str = 'mean', smilify_method: str = 'hybrid', radius: int = 2, nbits: int = 2048, rcut: float = 5.0, nmax: int = 8, lmax: int = 6)

Main featurization driver.
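
The signature combines options for both descriptor families. Judging from the signatures of generate_morgan and generate_soap on this page, radius, nbits and smilify_method belong to Morgan fingerprints, while rcut, nmax, lmax and readout belong to SOAP. The sketch below groups the documented defaults that way. The helper method_kwargs, the method string 'soap', and the paths in the commented call are assumptions made for illustration and are not part of the module.

```python
# Documented defaults of featurize(), grouped by the generator whose
# signature carries the same parameter. The grouping is inferred from the
# signatures on this page, not from the implementation.
MORGAN_DEFAULTS = {"radius": 2, "nbits": 2048, "smilify_method": "hybrid"}
SOAP_DEFAULTS = {"rcut": 5.0, "nmax": 8, "lmax": 6, "readout": "mean"}


def method_kwargs(method, **overrides):
    """Hypothetical helper: keyword arguments relevant to one method."""
    # "soap" as a method name is an assumption; only "morgan" is documented.
    table = {"morgan": MORGAN_DEFAULTS, "soap": SOAP_DEFAULTS}
    if method not in table:
        raise ValueError(f"unknown method: {method!r}")
    unknown = set(overrides) - set(table[method])
    if unknown:
        raise ValueError(f"{sorted(unknown)} not used by {method!r}")
    return {**table[method], **overrides}


kwargs = method_kwargs("morgan", radius=3)

# With the package installed, the call might look like this
# (both paths are placeholders):
# from MolecularDiffusion.runmodes.data.featurization import featurize
# featurize("molecules/", "features/", method="morgan", format="npy", **kwargs)
```

Rejecting overrides that do not belong to the chosen method is a design choice of this sketch. Whether featurize itself ignores or rejects such arguments is not stated here.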

MolecularDiffusion.runmodes.data.featurization.generate_morgan(entries: List[Dict], radius: int = 2, nbits: int = 2048, smilify_method: str = 'hybrid', n_jobs: int = 1) → Tuple[numpy.ndarray, List[str]]

Generates Morgan fingerprints for a list of entries. Each entry is a dict with an ‘id’ key and either a ‘file’ key (path) or a ‘data’ key (atoms/coords).
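
The file-based entry format can be illustrated with a small helper that builds the list from a directory. This is a sketch under assumptions: build_entries is a hypothetical name, and the .xyz extension and the use of the file stem as the ‘id’ are illustrative choices. The layout of the ‘data’ variant is not documented on this page, so it is left out.

```python
from pathlib import Path
import tempfile


def build_entries(directory, pattern="*.xyz"):
    """Hypothetical helper: one {'file', 'id'} dict per matching file."""
    entries = []
    for path in sorted(Path(directory).glob(pattern)):
        # 'file' holds the path, 'id' a label; using the stem is an assumption.
        entries.append({"file": str(path), "id": path.stem})
    return entries


# Demonstrate on a throwaway directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("water.xyz", "ethanol.xyz"):
        (Path(tmp) / name).touch()
    entries = build_entries(tmp)

ids = [e["id"] for e in entries]
```

A list built this way would then be passed on, for example as generate_morgan(entries, radius=2, nbits=2048).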

MolecularDiffusion.runmodes.data.featurization.generate_soap(entries: List[Dict], rcut: float = 5.0, nmax: int = 8, lmax: int = 6, readout: str = 'mean', species: List[str] = None) → Tuple[numpy.ndarray, List[str]]

Generates SOAP descriptors.
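
SOAP describes local atomic environments, so it yields one vector per atom, and a per-molecule feature needs a pooling step. The readout parameter presumably selects that step. The sketch below shows what a 'mean' readout would do under that assumption, using plain Python lists with made-up numbers in place of real SOAP output. mean_readout is a hypothetical name, not a function of this module.

```python
from statistics import fmean


def mean_readout(per_atom):
    """Average per-atom descriptor rows into one per-molecule vector."""
    if not per_atom:
        raise ValueError("need at least one atom")
    # zip(*rows) iterates over columns, i.e. one descriptor component at a time.
    return [fmean(column) for column in zip(*per_atom)]


# Three atoms, four descriptor components each (illustrative values only).
per_atom = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
molecule_vector = mean_readout(per_atom)
```

The pooled vector has one value per descriptor component regardless of the atom count, which is what allows molecules of different sizes to be stacked into a single feature array.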

MolecularDiffusion.runmodes.data.featurization.save_features(features: numpy.ndarray, ids: List[str], output_dir: pathlib.Path, format: str = 'npy')

Saves features and indices.

MolecularDiffusion.runmodes.data.featurization.Chem = None
MolecularDiffusion.runmodes.data.featurization.logger
MolecularDiffusion.runmodes.data.featurization.save_safetensors = None