MolecularDiffusion.runmodes.data.preparation

Data preparation module for MolCraft. Handles converting molecular data into ASE databases, annotating those databases, and generating Mol Blocks.

Attributes

Chem

Molecule

logger

xyz2mol

Functions

align_atoms(ase_atoms, rdkit_mol)

Aligns the atom order of an RDKit molecule to that of an ASE Atoms object by matching coordinates.

annotate_db(db_path, tag, value[, verbose])

Annotates all rows in an ASE database with one or multiple tags.

calculate_props(mol)

calculate_sa_score(m)

Calculates the Synthetic Accessibility Score (SA Score).

compile_to_asedb(input_source, db_path[, input_type, ...])

Compiles molecular data into an ASE database.

convert_props_to_float(data)

Tries to convert the values of a data dict to float where possible, and normalizes its keys.

generate_mol_blocks_and_sdf(input_source[, ...])

Generates Mol Blocks, SMILES, and properties (SA and SC scores).

get_asset_path(filename) → pathlib.Path

Resolves the path to an asset file.

get_scscore_model()

Initializes and returns the SCScorer model if available.

iter_files(root, pattern, recursive)

read_fragment_scores([name])

smilify_cell2mol(filename[, total_charge, timeout])

smilify_openbabel(filename[, total_charge, timeout])

smilify_structure(filename[, total_charge, timeout, ...])

Module Contents

MolecularDiffusion.runmodes.data.preparation.align_atoms(ase_atoms, rdkit_mol)

Aligns the atom order of an RDKit molecule to that of an ASE Atoms object by matching coordinates. If the atom counts differ, it first tries adding hydrogens.
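Coordinate-based matching can be illustrated with plain NumPy. This is a minimal sketch, not the module's implementation: it assumes both structures already contain the same atoms in the same Cartesian frame (no hydrogens to add), and the helper name `coordinate_permutation` is hypothetical. Each ASE (reference) position is matched to its nearest RDKit (query) position, and ambiguous or distant matches are rejected.

```python
import numpy as np


def coordinate_permutation(ref_positions, query_positions, tol=1e-3):
    """Hypothetical helper: return `perm` such that query_positions[perm] lines up with ref_positions.

    Both inputs are (N, 3) arrays in the same frame. Each reference atom is
    matched to its nearest query atom; a ValueError is raised if a match is
    farther than `tol` or if two reference atoms claim the same query atom.
    """
    ref = np.asarray(ref_positions, dtype=float)
    query = np.asarray(query_positions, dtype=float)
    if ref.shape != query.shape:
        raise ValueError("atom counts differ; hydrogens may need to be added first")
    # Pairwise distance matrix: dist[i, j] = |ref_i - query_j|
    dist = np.linalg.norm(ref[:, None, :] - query[None, :, :], axis=-1)
    perm = dist.argmin(axis=1)
    if np.any(dist[np.arange(len(ref)), perm] > tol):
        raise ValueError("no query atom within tolerance for some reference atom")
    if len(set(perm.tolist())) != len(perm):
        raise ValueError("ambiguous match: two reference atoms map to the same query atom")
    return perm
```

With RDKit installed, `Chem.RenumberAtoms(rdkit_mol, perm.tolist())` would return a copy of the molecule in ASE order, because its `newOrder` argument lists, for each new position, the old atom index.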

MolecularDiffusion.runmodes.data.preparation.annotate_db(db_path: pathlib.Path, tag: str | List[str], value: Any | List[Any], verbose: bool = False)

Annotates all rows in an ASE database with one or multiple tags.
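The signature accepts either a single tag with a single value, or a list of tags with a list of values. A minimal sketch of how those arguments could be paired into key-value pairs, assuming list inputs are matched position by position (the helper `build_annotations` is hypothetical). Each row would then be updated through ASE's `db.update(row.id, **pairs)`, as outlined in the trailing comment.

```python
from typing import Any, Dict, List, Union


def build_annotations(tag: Union[str, List[str]], value: Union[Any, List[Any]]) -> Dict[str, Any]:
    """Hypothetical helper: turn annotate_db-style arguments into a key-value dict."""
    if isinstance(tag, str):
        # Single tag: pair it with the value unchanged.
        return {tag: value}
    if not isinstance(value, list) or len(value) != len(tag):
        raise ValueError("a list of tags needs a list of values of the same length")
    # Multiple tags: assumed to pair with values by position.
    return dict(zip(tag, value))


# Sketch of the update loop (requires ASE; not executed here):
# from ase.db import connect
# db = connect(str(db_path))
# pairs = build_annotations(tag, value)
# for row in db.select():
#     db.update(row.id, **pairs)
```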

MolecularDiffusion.runmodes.data.preparation.calculate_props(mol)

MolecularDiffusion.runmodes.data.preparation.calculate_sa_score(m)

Calculates the Synthetic Accessibility Score (SA Score).
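The companion `read_fragment_scores(name='fpscores')` suggests this follows the Ertl and Schuffenhauer SA Score as shipped in RDKit's Contrib `sascorer.py`; that is an assumption, not something the docs state. In that script, the raw score (fragment contributions combined with complexity penalties) is rescaled to a 1 (easy) to 10 (hard) range roughly as sketched below. The function name `scale_sa_score` is hypothetical.

```python
import math


def scale_sa_score(raw: float) -> float:
    """Map a raw SA value onto the 1 (easy) to 10 (hard) scale.

    Constants follow the RDKit Contrib sascorer.py script (assumed to be the
    basis of calculate_sa_score).
    """
    lo, hi = -4.0, 2.5
    score = 11.0 - (raw - lo + 1.0) / (hi - lo) * 9.0
    # Compress the hard end of the scale logarithmically.
    if score > 8.0:
        score = 8.0 + math.log(score + 1.0 - 9.0)
    # Clamp to [1, 10].
    return min(10.0, max(1.0, score))
```

Higher raw values (more common fragments, fewer penalties) map to lower, easier scores.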

MolecularDiffusion.runmodes.data.preparation.compile_to_asedb(input_source: str, db_path: pathlib.Path, input_type: str = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, sdf_path: pathlib.Path = None, fraction: float = 1.0, seed: int = None, relative_to: pathlib.Path = None, pattern: str = '*.xyz', recursive: bool = False)

Compiles molecular data into an ASE database.
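The `pattern`, `recursive`, `fraction`, and `seed` parameters suggest that directory inputs are discovered by glob and optionally subsampled. A minimal sketch of that step, assuming `fraction` selects a seeded random subset of the matched files; the helper `select_input_files` is hypothetical, and the real function may subsample differently.

```python
import random
from pathlib import Path
from typing import List, Optional


def select_input_files(root: Path, pattern: str = "*.xyz", recursive: bool = False,
                       fraction: float = 1.0, seed: Optional[int] = None) -> List[Path]:
    """Hypothetical helper: list matching files, then keep a reproducible random fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # rglob searches subdirectories; glob only looks at the top level.
    files = sorted(root.rglob(pattern) if recursive else root.glob(pattern))
    if fraction < 1.0:
        k = max(1, round(fraction * len(files))) if files else 0
        # A dedicated Random instance keeps the global RNG untouched.
        files = sorted(random.Random(seed).sample(files, k))
    return files
```

Sorting both before and after sampling keeps the result independent of filesystem listing order, so the same `seed` yields the same subset.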

MolecularDiffusion.runmodes.data.preparation.convert_props_to_float(data)

Tries to convert the values of a data dict to float where possible, and normalizes its keys.
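The key-normalization rule is not documented. The sketch below assumes keys are lower-cased with runs of non-word characters collapsed to underscores, and that values which cannot be parsed as floats are kept unchanged. `convert_props_sketch` is a hypothetical name, not the module's function.

```python
import re
from typing import Any, Dict


def convert_props_sketch(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: float-convert values where possible and normalize keys."""
    out: Dict[str, Any] = {}
    for key, value in data.items():
        # "Energy (eV)" -> "energy_ev": collapse non-word runs to "_", trim, lower-case.
        norm_key = re.sub(r"\W+", "_", str(key).strip()).strip("_").lower()
        if isinstance(value, bool):
            # bool is a subclass of int; keep flags as flags rather than 1.0/0.0.
            out[norm_key] = value
            continue
        try:
            out[norm_key] = float(value)
        except (TypeError, ValueError):
            # Non-numeric values such as names or None are kept unchanged.
            out[norm_key] = value
    return out
```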

MolecularDiffusion.runmodes.data.preparation.generate_mol_blocks_and_sdf(input_source: str, output_sdf: pathlib.Path = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, fraction: float = 1.0, timeout: int = 60, smilify_method: str = 'hybrid', n_jobs: int = 1, indices: str = None)

Generates Mol Blocks, SMILES, and properties (SA and SC scores).

- If the input is an ASE database: updates the database in place.
- If the input is XYZ/NPY: generates an SDF file.
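The two branches can be sketched as a suffix-based dispatch. This is an assumption: the docstring only says database input is updated in place while XYZ/NPY input produces an SDF file, and the helper `output_mode` and its exact suffix checks are hypothetical.

```python
from pathlib import Path


def output_mode(input_source: str) -> str:
    """Hypothetical helper: decide how generate_mol_blocks_and_sdf-style input is handled."""
    suffix = Path(input_source).suffix.lower()
    if suffix == ".db":
        # ASE database: Mol Blocks, SMILES, and scores are written back to each row.
        return "update_db_in_place"
    if suffix in {".xyz", ".npy"}:
        # Raw coordinates: results go to a new SDF file (output_sdf).
        return "write_sdf"
    raise ValueError(f"unsupported input: {input_source!r}")
```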

MolecularDiffusion.runmodes.data.preparation.get_asset_path(filename: str) → pathlib.Path

Resolves the path to an asset file. Prefers the package-relative path, but falls back to the local source path if the file is not found there.
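The prefer-then-fall-back lookup amounts to checking candidate directories in priority order. A minimal sketch, assuming the caller supplies the package directory first and the source directory second; `resolve_asset` is hypothetical, and raising when no candidate exists is an assumption (the documented function may simply return the fallback path).

```python
from pathlib import Path
from typing import Iterable


def resolve_asset(filename: str, search_dirs: Iterable[Path]) -> Path:
    """Hypothetical helper: return the first existing candidate, in priority order."""
    candidates = [Path(d) / filename for d in search_dirs]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{filename} not found in: {[str(c) for c in candidates]}")
```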

MolecularDiffusion.runmodes.data.preparation.get_scscore_model()

Initializes and returns the SCScorer model if available.

MolecularDiffusion.runmodes.data.preparation.iter_files(root: pathlib.Path, pattern: str, recursive: bool)

MolecularDiffusion.runmodes.data.preparation.read_fragment_scores(name='fpscores')

MolecularDiffusion.runmodes.data.preparation.smilify_cell2mol(filename, total_charge=0, timeout=30)

MolecularDiffusion.runmodes.data.preparation.smilify_openbabel(filename, total_charge=0, timeout=30)

MolecularDiffusion.runmodes.data.preparation.smilify_structure(filename, total_charge=0, timeout=30, method='hybrid')
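`smilify_structure` exposes `method='hybrid'` alongside the dedicated cell2mol and Open Babel converters. One plausible reading, assumed here and not confirmed by the docs, is a fallback chain that tries each backend in turn and returns the first SMILES produced. The sketch uses stand-in converter callables; the backend order, the `None`-on-failure convention, and `smilify_with_fallback` are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# A backend takes (filename, total_charge, timeout) and returns a SMILES string or None.
Converter = Callable[[str, int, int], Optional[str]]


def smilify_with_fallback(filename: str, converters: Sequence[Converter],
                          total_charge: int = 0, timeout: int = 30) -> Optional[str]:
    """Hypothetical sketch: return the first non-empty SMILES from a list of backends."""
    for convert in converters:
        try:
            # Each backend is expected to enforce `timeout` itself.
            smiles = convert(filename, total_charge, timeout)
        except Exception:
            # A failing backend (parse error, timeout, ...) should not stop the chain.
            continue
        if smiles:
            return smiles
    return None
```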

MolecularDiffusion.runmodes.data.preparation.Chem = None

MolecularDiffusion.runmodes.data.preparation.Molecule = None

MolecularDiffusion.runmodes.data.preparation.logger

MolecularDiffusion.runmodes.data.preparation.xyz2mol = None
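`Chem`, `Molecule`, and `xyz2mol` are documented with the value `None`, which matches the usual pattern for optional dependencies: the import is attempted at module load and the name stays `None` when the package is absent, so callers can check before use. A minimal sketch of that pattern, assuming this is why these attributes default to `None`; `optional_import` and `require` are hypothetical helpers.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Module-level usage, e.g. for RDKit's Chem submodule (None without RDKit):
Chem = optional_import("rdkit.Chem")


def require(module: Optional[ModuleType], package: str) -> ModuleType:
    """Raise a clear error at call time instead of an AttributeError on None."""
    if module is None:
        raise ImportError(f"{package} is required for this operation")
    return module
```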