MolecularDiffusion.runmodes.data.ase_ops

ASE database operations module. Handles merging, inspecting, splitting, and sampling ASE databases, renaming data attributes, and verifying entries against RDKit mol blocks.

Attributes

Chem

logger

Functions

inspect_db(db_path[, output_dir, keys_to_plot, ...])

Inspects an ASE DB, printing stats and optionally plotting distributions.

is_clean(row)

Verifies that the ASE atoms and the RDKit mol built from mol_block list their atoms in the same order.

merge_dbs(input_dir, output_db[, recursive, pattern])

Merges multiple ASE databases into one.

rename_db_attribute(db_path, old_name, new_name)

Renames a data attribute for all rows in an ASE database.

sample_db(input_db, output[, output_type, fraction, ...])

Samples a random fraction or number of entries from an ASE database.

split_db(db_path, output_dir[, n_splits])

Splits a DB into n_splits smaller DBs.

verify_datapoint(atoms, mol_block)

Verifies that the ASE Atoms match the RDKit Mol block.

Module Contents

MolecularDiffusion.runmodes.data.ase_ops.inspect_db(db_path: pathlib.Path, output_dir: pathlib.Path = None, keys_to_plot: List[str] = None, sample_size: int = 5000, limit_print: int = 10)

Inspects an ASE DB, printing stats and optionally plotting distributions.

MolecularDiffusion.runmodes.data.ase_ops.is_clean(row)

Verifies that the ASE atoms and the RDKit mol built from mol_block list their atoms in the same order.

MolecularDiffusion.runmodes.data.ase_ops.merge_dbs(input_dir: pathlib.Path, output_db: pathlib.Path, recursive: bool = False, pattern: str = '*.db')

Merges multiple ASE databases into one.
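How recursive and pattern select the input files is not spelled out here. The following is a minimal sketch, assuming pattern is a glob applied to input_dir and that recursive=True also descends into subdirectories. find_db_files is a hypothetical helper written for illustration, not a function of this module.

```python
from pathlib import Path
from typing import List


def find_db_files(input_dir: Path, pattern: str = "*.db", recursive: bool = False) -> List[Path]:
    """Hypothetical helper: list candidate database files under input_dir.

    Assumption: recursive=True maps to Path.rglob, otherwise Path.glob.
    """
    matches = input_dir.rglob(pattern) if recursive else input_dir.glob(pattern)
    # Keep regular files only and sort them for a deterministic order.
    return sorted(p for p in matches if p.is_file())
```

Going by the signature above, the corresponding call would look like merge_dbs(Path("runs"), Path("merged.db"), recursive=True), where both paths are placeholders.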

MolecularDiffusion.runmodes.data.ase_ops.rename_db_attribute(db_path: pathlib.Path, old_name: str, new_name: str)

Renames a data attribute for all rows in an ASE database.

MolecularDiffusion.runmodes.data.ase_ops.sample_db(input_db: pathlib.Path, output: pathlib.Path, output_type: str = 'db', fraction: float = None, number: int = None, seed: int = None, verify_clean: bool = False)

Samples a random fraction or number of entries from an ASE database.

output_type:

'db' – write to an ASE SQLite database (default)

'xyz' – write one XYZ file per molecule into the output directory

'npy' – write positions.npy (M,N,3), numbers.npy (M,N), and natoms.npy (M,) arrays into the output directory, where M is the number of sampled entries and N is the maximum atom count in the sample; entries with fewer atoms are padded to N.
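The padded array layout described for the npy output can be illustrated with a short NumPy sketch. It assumes padding slots are filled with zeros, which this page does not state, and pad_sample and unpad are hypothetical helpers rather than part of the module.

```python
import numpy as np


def pad_sample(numbers_list, positions_list):
    """Pack variable-size molecules into padded arrays.

    Assumption: padding slots are filled with zeros.
    """
    m = len(numbers_list)
    natoms = np.array([len(z) for z in numbers_list], dtype=np.int64)
    n = int(natoms.max())  # N: largest atom count in the sample
    numbers = np.zeros((m, n), dtype=np.int64)
    positions = np.zeros((m, n, 3), dtype=np.float64)
    for i, (z, xyz) in enumerate(zip(numbers_list, positions_list)):
        numbers[i, : len(z)] = z
        positions[i, : len(z)] = xyz
    return positions, numbers, natoms


def unpad(positions, numbers, natoms, i):
    """Recover entry i by slicing off the padding using natoms."""
    k = int(natoms[i])
    return numbers[i, :k], positions[i, :k]
```

When reading the files back with np.load, natoms is what tells real atoms apart from padding, so it should be used to slice each row before any further processing.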

MolecularDiffusion.runmodes.data.ase_ops.split_db(db_path: pathlib.Path, output_dir: pathlib.Path, n_splits: int = 2)

Splits a DB into n_splits smaller DBs (default 2).
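How rows are distributed across the output DBs is not documented here. One plausible scheme, shown as a sketch only, is contiguous chunks of near-equal size; split_ids is a hypothetical helper and the chunking strategy is an assumption.

```python
from typing import List


def split_ids(ids: List[int], n_splits: int = 2) -> List[List[int]]:
    """Hypothetical helper: partition ids into n_splits contiguous, near-equal chunks."""
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    base, extra = divmod(len(ids), n_splits)
    chunks, start = [], 0
    for i in range(n_splits):
        # The first `extra` chunks take one additional id each.
        size = base + (1 if i < extra else 0)
        chunks.append(ids[start:start + size])
        start += size
    return chunks
```

With this scheme, chunk sizes differ by at most one and every id lands in exactly one chunk.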

MolecularDiffusion.runmodes.data.ase_ops.verify_datapoint(atoms, mol_block)

Verifies that the ASE Atoms match the RDKit Mol block.
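What counts as a match is not detailed here. The sketch below shows one possible check, an element-by-element comparison of atom order, using only the standard library so it runs without ASE or RDKit. It assumes a V2000 mol block, and molblock_symbols and same_atom_order are hypothetical helpers, not the module's implementation.

```python
from typing import List


def molblock_symbols(mol_block: str) -> List[str]:
    """Read element symbols, in file order, from a V2000 mol block."""
    lines = mol_block.splitlines()
    # Counts line (4th line): the first three columns hold the atom count.
    n_atoms = int(lines[3][:3])
    # Atom lines use fixed columns: x, y, z (10 chars each), a space, then the symbol.
    return [line[31:34].strip() for line in lines[4:4 + n_atoms]]


def same_atom_order(symbols: List[str], mol_block: str) -> bool:
    """True if symbols matches the mol block's atoms element by element."""
    return symbols == molblock_symbols(mol_block)
```

With ASE available, the symbols argument could come from atoms.get_chemical_symbols().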

MolecularDiffusion.runmodes.data.ase_ops.Chem = None
MolecularDiffusion.runmodes.data.ase_ops.logger