MolecularDiffusion.runmodes.data.preparation

Data preparation module for MolCraft. Handles conversion of molecular data to ASE databases, annotation, and mol block generation.

Functions

`align_atoms`(ase_atoms, rdkit_mol)	Aligns RDKit molecule atoms to match ASE atoms order based on coordinates.
`annotate_db`(db_path, tag, value[, verbose])	Annotates all rows in an ASE database with one or multiple tags.
`calculate_props`(mol)
`calculate_sa_score`(m)	Calculates the Synthetic Accessibility Score (SA Score).
`compile_to_asedb`(input_source, db_path[, input_type, ...])	Compiles molecular data into an ASE database.
`convert_props_to_float`(data)	Tries to convert the values in a data dict to float where possible and normalizes the keys.
`generate_mol_blocks_and_sdf`(input_source[, ...])	Generates Mol Blocks, SMILES, and properties (SA, SC).
`get_asset_path`(filename) → pathlib.Path	Resolves the path to an asset file.
`get_scscore_model`()	Initializes and returns the SCScorer model if available.
`iter_files`(root, pattern, recursive)
`read_fragment_scores`([name])
`smilify_openbabel`(filename[, total_charge, timeout])
`smilify_structure`(filename[, total_charge, timeout, ...])
`smilify_xyz2mol`(filename[, total_charge, timeout])

MolecularDiffusion.runmodes.data.preparation.align_atoms(ase_atoms, rdkit_mol): Aligns RDKit molecule atoms to match ASE atoms order based on coordinates. Tries to add hydrogens if the atom counts differ.
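The coordinate-based matching described above can be illustrated with a minimal sketch. It is not the module's implementation: the helper name `match_by_coordinates`, the tolerance `tol`, and the greedy nearest-neighbour strategy are all assumptions made for illustration, and plain coordinate tuples stand in for the ASE and RDKit objects.

```python
import math

def match_by_coordinates(reference, candidate, tol=1e-3):
    """Return `order` such that candidate[order[i]] sits on reference[i].

    Hypothetical helper: raises ValueError if the atom counts differ or a
    reference position has no unused candidate atom within `tol`.
    """
    if len(reference) != len(candidate):
        raise ValueError("atom counts differ")
    unused = set(range(len(candidate)))
    order = []
    for ref_pos in reference:
        # Pick the closest candidate atom that has not been assigned yet.
        best = min(unused, key=lambda j: math.dist(ref_pos, candidate[j]))
        if math.dist(ref_pos, candidate[best]) > tol:
            raise ValueError("no candidate atom within tolerance")
        unused.remove(best)
        order.append(best)
    return order

# Same three positions, listed in a different order.
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.1), (1.0, 0.0, 0.0)]
candidate = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.1)]
order = match_by_coordinates(reference, candidate)
```

An index list of this shape is what RDKit's `Chem.RenumberAtoms(mol, newOrder)` expects, where entry `i` names the old index of the atom that ends up at position `i`. Whether `align_atoms` uses that call is not stated here.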

MolecularDiffusion.runmodes.data.preparation.annotate_db(db_path: pathlib.Path, tag: str | List[str], value: Any | List[Any], verbose: bool = False): Annotates all rows in an ASE database with one or multiple tags.
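Because `tag` and `value` each accept either a single item or a list, a caller-side way to think about the "one or multiple tags" behaviour is to normalize both into name/value pairs first. The sketch below shows that idea only; `pair_tags_with_values` is a hypothetical helper, and the rule that list lengths must match is an assumption rather than documented behaviour.

```python
from typing import Any, List, Tuple, Union

def pair_tags_with_values(
    tag: Union[str, List[str]], value: Union[Any, List[Any]]
) -> List[Tuple[str, Any]]:
    """Normalize a single tag/value or parallel lists into (tag, value) pairs."""
    if isinstance(tag, str):
        # A single tag takes the value as-is, even if the value is a list.
        return [(tag, value)]
    if not isinstance(value, (list, tuple)) or len(tag) != len(value):
        raise ValueError("need exactly one value per tag")
    return list(zip(tag, value))

single = pair_tags_with_values("split", "train")
several = pair_tags_with_values(["split", "source"], ["train", "qm9"])
```

With ASE, each pair could then be written to every row by iterating `db.select()` and calling `db.update(row.id, **{name: val})`. That is one possible approach, not a description of what `annotate_db` does internally.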

MolecularDiffusion.runmodes.data.preparation.calculate_sa_score(m): Calculates the Synthetic Accessibility Score (SA Score).

MolecularDiffusion.runmodes.data.preparation.compile_to_asedb(input_source: str, db_path: pathlib.Path, input_type: str = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, sdf_path: pathlib.Path = None, fraction: float = 1.0, seed: int = None, relative_to: pathlib.Path = None, pattern: str = '*.xyz', recursive: bool = False): Compiles molecular data into an ASE database.
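The `fraction` and `seed` parameters suggest reproducible subsampling of the input. The reference does not say how the subset is drawn, so the following is only a sketch under assumptions: the helper name `subsample`, the rounding rule, and the use of `random.Random` are illustrative choices.

```python
import random

def subsample(items, fraction=1.0, seed=None):
    """Keep roughly `fraction` of `items`, reproducibly for a fixed `seed`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    items = list(items)
    if fraction == 1.0:
        return items
    # Keep at least one item when the input is non-empty.
    k = min(len(items), max(1, round(len(items) * fraction)))
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return sorted(rng.sample(items, k))

files = [f"mol_{i:03d}.xyz" for i in range(100)]
first = subsample(files, fraction=0.1, seed=0)
second = subsample(files, fraction=0.1, seed=0)
```

Using a local `random.Random(seed)` instead of seeding the global generator keeps the selection repeatable without affecting other code that draws random numbers.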

MolecularDiffusion.runmodes.data.preparation.convert_props_to_float(data): Tries to convert the values in a data dict to float where possible and normalizes the keys.
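As a rough illustration of this kind of cleanup, the sketch below converts values that parse as numbers and leaves the rest unchanged. The key normalization shown (strip, lowercase, spaces to underscores) is an assumption, since the reference does not specify the rule, and `to_float_props` is a hypothetical name.

```python
def to_float_props(data):
    """Return a copy of `data` with normalized keys and float values where possible."""
    cleaned = {}
    for key, value in data.items():
        # Assumed normalization: trim, lowercase, replace spaces with underscores.
        norm_key = str(key).strip().lower().replace(" ", "_")
        try:
            cleaned[norm_key] = float(value)
        except (TypeError, ValueError):
            # Non-numeric entries (names, None) are kept unchanged.
            cleaned[norm_key] = value
    return cleaned

row = {" HOMO ": "-5.2", "Dipole Moment": "1.85", "name": "benzene", "gap": None}
props = to_float_props(row)
```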

MolecularDiffusion.runmodes.data.preparation.generate_mol_blocks_and_sdf(input_source: str, output_sdf: pathlib.Path = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, fraction: float = 1.0, timeout: int = 60, smilify_method: str = 'hybrid', n_jobs: int = 1, indices: str = None): Generates Mol Blocks, SMILES, and properties (SA, SC).
- If input is DB: Updates DB in-place.
- If input is XYZ/NPY: Generates SDF file.

MolecularDiffusion.runmodes.data.preparation.get_asset_path(filename: str) → pathlib.Path: Resolves the path to an asset file. Prefers the package-relative path and falls back to the local source path if the file is not found there.
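The fallback order described above can be sketched as a search over candidate directories. The helper `resolve_asset`, its two directory arguments, and the file name `example_asset.bin` are hypothetical; the actual locations used by `get_asset_path` are not given in this reference.

```python
from pathlib import Path
import tempfile

def resolve_asset(filename, package_dir, fallback_dir):
    """Return the first existing path, checking package_dir before fallback_dir."""
    for base in (Path(package_dir), Path(fallback_dir)):
        candidate = base / filename
        if candidate.exists():
            return candidate
    raise FileNotFoundError(filename)

# Demonstration with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "package_assets"
    local_dir = Path(tmp) / "local_assets"
    package_dir.mkdir()
    local_dir.mkdir()
    # The asset exists only in the fallback location.
    (local_dir / "example_asset.bin").write_bytes(b"")
    found = resolve_asset("example_asset.bin", package_dir, local_dir)
    found_in_local = found.parent == local_dir
```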

MolecularDiffusion.runmodes.data.preparation.get_scscore_model(): Initializes and returns the SCScorer model if available.
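"If available" usually implies an optional dependency. A common pattern for that, shown here as a generic sketch and not as this function's code, is to attempt the import and return `None` on failure so callers can skip the score. The helper name `load_optional` and the module names in the demonstration are placeholders.

```python
import importlib

def load_optional(module_name):
    """Import and return a module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well, which subclasses ImportError.
        return None

present = load_optional("json")
missing = load_optional("surely_not_an_installed_module_name")
```

What `get_scscore_model` returns when the model is unavailable is not documented here, so callers may want to check the result before use.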

MolecularDiffusion.runmodes.data.preparation.iter_files(root: pathlib.Path, pattern: str, recursive: bool)
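This function has no description, but its signature matches the usual `pathlib` idiom of switching between `glob` and `rglob`. The sketch below shows that idiom under the assumption that `recursive` controls descent into subdirectories; `iter_matching_files` is a hypothetical stand-in, and the sorting is an illustrative choice.

```python
from pathlib import Path
import tempfile

def iter_matching_files(root, pattern, recursive):
    """Yield files under `root` matching `pattern`, optionally descending into subfolders."""
    root = Path(root)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    # Sort for a stable order and skip directories that happen to match.
    yield from sorted(p for p in matches if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("a.xyz", "sub/b.xyz", "notes.txt"):
        (root / rel).write_text("")
    flat = [p.name for p in iter_matching_files(root, "*.xyz", recursive=False)]
    nested = [p.name for p in iter_matching_files(root, "*.xyz", recursive=True)]
```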

MolecularDiffusion.runmodes.data.preparation.read_fragment_scores(name='fpscores')

MolecularDiffusion.runmodes.data.preparation.smilify_openbabel(filename, total_charge=0, timeout=30)

MolecularDiffusion.runmodes.data.preparation.smilify_structure(filename, total_charge=0, timeout=30, method='hybrid')

MolecularDiffusion.runmodes.data.preparation.smilify_xyz2mol(filename, total_charge=0, timeout=30)
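The three `smilify_*` functions are undocumented. One plausible reading of `method='hybrid'` is that one backend is tried first and another is used if it fails, but this is an assumption that the reference does not confirm. The sketch below shows only that generic fallback pattern; the helper and both backends are hypothetical stand-ins that do not call xyz2mol or Open Babel.

```python
def first_successful(filename, backends):
    """Try each (name, backend) in order; return (name, result) of the first success."""
    errors = {}
    for name, backend in backends:
        try:
            result = backend(filename)
        except Exception as exc:  # record the failure and move on
            errors[name] = exc
            continue
        if result:
            return name, result
    raise RuntimeError(f"all backends failed for {filename}: {errors}")

def strict_backend(filename):
    # Stand-in for a converter that can fail on difficult structures.
    raise ValueError("could not assign bond orders")

def lenient_backend(filename):
    # Stand-in for a more permissive converter; returns a fixed SMILES string.
    return "C"

used, smiles = first_successful(
    "methane.xyz", [("strict", strict_backend), ("lenient", lenient_backend)]
)
```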