MolecularDiffusion.runmodes.data.preparation¶
Data preparation module for MolCraft. Handles conversion of molecular data to ASE databases, annotation, and mol block generation.
Attributes¶
Chem |
Molecule |
logger |
xyz2mol |
Functions¶
align_atoms | Aligns RDKit molecule atoms to match ASE atoms order based on coordinates.
annotate_db | Annotates all rows in an ASE database with one or multiple tags.
calculate_props |
calculate_sa_score | Calculates the Synthetic Accessibility Score (SA Score).
compile_to_asedb | Compiles molecular data into an ASE database.
convert_props_to_float | Tries to convert values in data dict to float appropriately and normalizes keys.
generate_mol_blocks_and_sdf | Generates Mol Blocks, SMILES, and properties (SA, SC).
get_asset_path | Resolves the path to an asset file.
get_scscore_model | Initializes and returns the SCScorer model if available.
iter_files |
read_fragment_scores |
smilify_cell2mol |
smilify_openbabel |
smilify_structure |
Module Contents¶
- MolecularDiffusion.runmodes.data.preparation.align_atoms(ase_atoms, rdkit_mol)¶
Aligns RDKit molecule atoms to match ASE atoms order based on coordinates. Tries to add Hs if atom counts differ.
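To illustrate coordinate-based alignment, the sketch below matches two sets of 3D positions by nearest neighbour and returns the resulting reordering. It is not the package's implementation: the helper name `match_by_coordinates`, the tolerance `tol`, and the plain-tuple inputs are assumptions made here so the example runs without ASE or RDKit, and the hydrogen-adding step is left out.

```python
import math


def match_by_coordinates(ref_coords, other_coords, tol=0.1):
    """Return `order` such that other_coords[order[i]] sits on ref_coords[i].

    Hypothetical helper: greedy nearest-neighbour matching within `tol`.
    """
    if len(ref_coords) != len(other_coords):
        raise ValueError("atom counts differ")
    unused = set(range(len(other_coords)))
    order = []
    for ref in ref_coords:
        # Closest atom in the other structure that has not been assigned yet.
        best = min(unused, key=lambda j: math.dist(ref, other_coords[j]))
        if math.dist(ref, other_coords[best]) > tol:
            raise ValueError("no atom within tolerance")
        unused.remove(best)
        order.append(best)
    return order


# Same three positions, listed in a different order.
ase_like = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rdkit_like = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
order = match_by_coordinates(ase_like, rdkit_like)
```

With RDKit available, a permutation of this form could be applied with `Chem.RenumberAtoms(mol, order)`, which places old atom `order[i]` at new index `i`.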
- MolecularDiffusion.runmodes.data.preparation.annotate_db(db_path: pathlib.Path, tag: str | List[str], value: Any | List[Any], verbose: bool = False)¶
Annotates all rows in an ASE database with one or multiple tags.
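The signature accepts either a single tag/value pair or parallel lists. One way such arguments might be normalized into key-value pairs is sketched below. The helper `normalize_tags` and the example tag names are hypothetical and only illustrate the "one or multiple tags" calling convention.

```python
from typing import Any, Dict, List, Union


def normalize_tags(
    tag: Union[str, List[str]], value: Union[Any, List[Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: turn (tag, value) arguments into {tag: value}."""
    if isinstance(tag, str):
        # Single tag: the value is used as given.
        return {tag: value}
    if not isinstance(value, list) or len(value) != len(tag):
        raise ValueError("need exactly one value per tag")
    return dict(zip(tag, value))


single = normalize_tags("source", "set_a")
multiple = normalize_tags(["source", "split"], ["set_a", "train"])
```

With ASE, a mapping like this could then be written to every row, for example by iterating `db.select()` and calling `db.update(row.id, **pairs)`. Whether `annotate_db` does exactly this is not stated in the reference.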
- MolecularDiffusion.runmodes.data.preparation.calculate_props(mol)¶
- MolecularDiffusion.runmodes.data.preparation.calculate_sa_score(m)¶
Calculates the Synthetic Accessibility Score (SA Score).
- MolecularDiffusion.runmodes.data.preparation.compile_to_asedb(input_source: str, db_path: pathlib.Path, input_type: str = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, sdf_path: pathlib.Path = None, fraction: float = 1.0, seed: int = None, relative_to: pathlib.Path = None, pattern: str = '*.xyz', recursive: bool = False)¶
Compiles molecular data into an ASE database.
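The `fraction` and `seed` parameters suggest reproducible subsampling of the input. A minimal sketch of that idea follows. The helper `sample_fraction` is hypothetical, and how `compile_to_asedb` actually selects entries is not documented here.

```python
import random


def sample_fraction(items, fraction=1.0, seed=None):
    """Hypothetical helper: keep a reproducible random subset of `items`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    items = list(items)
    if fraction == 1.0:
        return items
    # A dedicated generator keeps the global random state untouched.
    rng = random.Random(seed)
    k = min(len(items), max(1, int(len(items) * fraction)))
    return rng.sample(items, k)


files = [f"mol_{i}.xyz" for i in range(10)]
subset_a = sample_fraction(files, fraction=0.3, seed=42)
subset_b = sample_fraction(files, fraction=0.3, seed=42)
```

Using the same seed twice yields the same subset, which is what makes a sampled database reproducible.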
- MolecularDiffusion.runmodes.data.preparation.convert_props_to_float(data)¶
Tries to convert values in data dict to float appropriately and normalizes keys.
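A possible shape for this kind of conversion is sketched below. The exact key normalization used by `convert_props_to_float` is not documented, so the rules here (strip, lower-case, spaces to underscores) and the name `to_float_props` are assumptions for illustration only.

```python
def to_float_props(data):
    """Hypothetical sketch: normalize keys and cast numeric-looking values."""
    out = {}
    for key, value in data.items():
        # Assumed normalization: trim, lower-case, spaces become underscores.
        norm_key = str(key).strip().lower().replace(" ", "_")
        try:
            out[norm_key] = float(value)
        except (TypeError, ValueError):
            # Non-numeric values are kept unchanged.
            out[norm_key] = value
    return out


props = to_float_props({" HOMO ": "-5.2", "Name": "benzene", "n atoms": 12})
```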
- MolecularDiffusion.runmodes.data.preparation.generate_mol_blocks_and_sdf(input_source: str, output_sdf: pathlib.Path = None, natoms_file: pathlib.Path = None, csv_file: pathlib.Path = None, fraction: float = 1.0, timeout: int = 60, smilify_method: str = 'hybrid', n_jobs: int = 1, indices: str = None)¶
Generates Mol Blocks, SMILES, and properties (SA, SC).
  - If the input is a DB: updates the DB in place.
  - If the input is XYZ/NPY: generates an SDF file.
- MolecularDiffusion.runmodes.data.preparation.get_asset_path(filename: str) → pathlib.Path¶
Resolves the path to an asset file. Favors package-relative path, but falls back to local source path if not found.
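The "prefer one location, fall back to another" lookup described above can be sketched with `pathlib`. The function `resolve_asset` and its two directory arguments are hypothetical; the real function takes only `filename` and determines both locations itself.

```python
import tempfile
from pathlib import Path


def resolve_asset(filename, package_dir, source_dir):
    """Hypothetical sketch: prefer package_dir, fall back to source_dir."""
    primary = Path(package_dir) / filename
    if primary.exists():
        return primary
    return Path(source_dir) / filename


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg_assets"
    src = Path(tmp) / "src_assets"
    pkg.mkdir()
    src.mkdir()
    (src / "only_in_src.dat").write_text("x")
    (pkg / "in_both.dat").write_text("x")
    (src / "in_both.dat").write_text("x")
    # Record which directory each lookup resolved to.
    from_fallback = resolve_asset("only_in_src.dat", pkg, src).parent.name
    from_package = resolve_asset("in_both.dat", pkg, src).parent.name
```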
- MolecularDiffusion.runmodes.data.preparation.get_scscore_model()¶
Initializes and returns the SCScorer model if available.
- MolecularDiffusion.runmodes.data.preparation.iter_files(root: pathlib.Path, pattern: str, recursive: bool)¶
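`iter_files` has no docstring. Judging only from its signature, it plausibly walks `root` for files matching a glob `pattern`, descending into subdirectories when `recursive` is true. The sketch below shows that behaviour with `pathlib`. The name `iter_matching` and the sorted output order are assumptions.

```python
import tempfile
from pathlib import Path


def iter_matching(root, pattern, recursive):
    """Hypothetical sketch: yield files under `root` matching a glob pattern."""
    globber = root.rglob if recursive else root.glob
    for path in sorted(globber(pattern)):
        if path.is_file():
            yield path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.xyz").write_text("")
    (root / "sub").mkdir()
    (root / "sub" / "b.xyz").write_text("")
    (root / "notes.txt").write_text("")
    flat = [p.name for p in iter_matching(root, "*.xyz", recursive=False)]
    deep = [p.name for p in iter_matching(root, "*.xyz", recursive=True)]
```

The default `pattern='*.xyz'` of `compile_to_asedb` would fit this kind of helper, with `recursive` switching between `glob` and `rglob`.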
- MolecularDiffusion.runmodes.data.preparation.read_fragment_scores(name='fpscores')¶
- MolecularDiffusion.runmodes.data.preparation.smilify_cell2mol(filename, total_charge=0, timeout=30)¶
- MolecularDiffusion.runmodes.data.preparation.smilify_openbabel(filename, total_charge=0, timeout=30)¶
- MolecularDiffusion.runmodes.data.preparation.smilify_structure(filename, total_charge=0, timeout=30, method='hybrid')¶
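The reference does not say what `method='hybrid'` does. One plausible reading, shown here purely as an assumption, is that several conversion backends are tried in turn until one succeeds. The dispatcher `convert_with_fallback` and the stand-in backends are hypothetical, and timeout handling is omitted.

```python
def convert_with_fallback(filename, backends, method="hybrid"):
    """Hypothetical dispatcher: `backends` maps a method name to a callable."""
    if method != "hybrid":
        # A specific method was requested: use only that backend.
        return backends[method](filename)
    for backend in backends.values():
        try:
            result = backend(filename)
        except Exception:
            continue  # assumed behaviour: a failing backend is skipped
        if result is not None:
            return result
    return None


# Stand-in backends; real converters would need external toolkits.
def failing_backend(filename):
    raise RuntimeError("conversion failed")


def working_backend(filename):
    return "C"  # placeholder SMILES string


backends = {"first": failing_backend, "second": working_backend}
smiles = convert_with_fallback("mol.xyz", backends)
direct = convert_with_fallback("mol.xyz", backends, method="second")
```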
- MolecularDiffusion.runmodes.data.preparation.Chem = None¶
- MolecularDiffusion.runmodes.data.preparation.Molecule = None¶
- MolecularDiffusion.runmodes.data.preparation.logger¶
- MolecularDiffusion.runmodes.data.preparation.xyz2mol = None¶
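Module attributes documented as `= None` (`Chem`, `Molecule`, `xyz2mol`) are commonly placeholders for optional dependencies that could not be imported when the page was built. That reading is an inference, not something the reference states. A generic version of the pattern, using hypothetical helpers `optional_import` and `require`, looks like this:

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, name):
    """Raise a clear error when an optional dependency is missing."""
    if module is None:
        raise RuntimeError(f"{name} is required for this feature")
    return module


json_mod = optional_import("json")  # standard library, always importable
missing = optional_import("not_a_real_package_xyz")  # deliberately absent
```

Code that depends on such an attribute would typically check for `None` before use, so that the rest of the module still works without the optional package.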