MolecularDiffusion.cli.analyze
Analyze CLI subcommands for 3D molecule analysis.
Provides subcommands for:
- optimize: XTB geometry optimization
- metrics: Validity/connectivity metrics
- compare: RMSD, energy, and optional bond analysis
- xyz2mol: XYZ to SMILES conversion + fingerprints
Attributes
- CONTEXT_SETTINGS
Functions
- analyze(): Analyze 3D molecular structures.
- compare(...): Compare XYZ files with their optimized counterparts.
- featurize(...): Featurize 3D XYZ molecules into fixed-size feature vectors.
- metrics(...): Compute validity and connectivity metrics for XYZ files or ASE DB rows.
- optimize(...): Optimize molecular geometries (XYZ or ASE DB).
- xtb_electronic(...): Compute XTB electronic properties for XYZ files or ASE DB rows.
- xyz2mol(...): Convert XYZ files to SMILES and extract fingerprints/scaffolds.
Module Contents
- MolecularDiffusion.cli.analyze.analyze()
Analyze 3D molecular structures.
Subcommands:
    optimize   XTB geometry optimization
    metrics    Validity/connectivity metrics
    compare    RMSD, energy, and bond analysis
    xyz2mol    Convert XYZ to SMILES + fingerprints
- MolecularDiffusion.cli.analyze.compare(directory, mol_converter, n_subsets, csv_path, charge, level, timeout)
Compare XYZ files with their optimized counterparts.
Computes the RMSD, the xTB energy difference, and bond geometry metrics, and enforces strict connectivity checks.
Requires an 'optimized_xyz' subdirectory containing *_opt.xyz files.
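The directory layout that compare expects can be checked before running it. The sketch below is only an illustration: the helper name `pair_with_optimized` is hypothetical, and the mapping from `<name>.xyz` to `optimized_xyz/<name>_opt.xyz` is an assumption inferred from the `*_opt.xyz` pattern above, not a documented rule.

```python
from pathlib import Path


def pair_with_optimized(directory):
    """Pair each XYZ file with its assumed optimized counterpart.

    Assumption: <name>.xyz corresponds to optimized_xyz/<name>_opt.xyz.
    Returns (pairs, missing), where `missing` lists inputs without a match.
    """
    directory = Path(directory)
    opt_dir = directory / "optimized_xyz"
    pairs, missing = [], []
    # Only top-level *.xyz files are considered; the subdirectory is not scanned.
    for xyz in sorted(directory.glob("*.xyz")):
        candidate = opt_dir / f"{xyz.stem}_opt.xyz"
        if candidate.is_file():
            pairs.append((xyz, candidate))
        else:
            missing.append(xyz)
    return pairs, missing
```

Calling `pair_with_optimized("gen_xyz/")` and inspecting the second return value shows which structures still lack an optimized file.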
- MolecularDiffusion.cli.analyze.featurize(input_dir, backend, output, recursive, r_cut, n_max, l_max, sigma, autodetect_species, species, pooling, soap_jobs, checkpoint, task_name, device, batch_size, all_components, charge, spin, ssl3d_checkpoint, edge_radius)
Featurize 3D XYZ molecules into fixed-size feature vectors.
Backends:
    soap    SOAP descriptor via dscribe; no GPU required
    uma     UMA backbone embeddings; requires vendored fairchem/src + checkpoint
    ssl3d   SSL3D backbone embeddings; requires a trained SSL3D .ckpt or .pkl
Examples:
    MolCraftDiff analyze featurize gen_xyz/
    MolCraftDiff analyze featurize gen_xyz/ --autodetect
    MolCraftDiff analyze featurize gen_xyz/ --species C --species H --species N --species O
    MolCraftDiff analyze featurize gen_xyz/ --backend soap --n-max 12 --l-max 9
    MolCraftDiff analyze featurize gen_xyz/ --backend uma --device cuda
    MolCraftDiff analyze featurize gen_xyz/ --backend ssl3d --ssl3d-checkpoint runs/last.ckpt
    MolCraftDiff analyze featurize gen_xyz/ --backend ssl3d --ssl3d-checkpoint runs/last.ckpt --device cuda
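Per-atom descriptors vary in count from molecule to molecule, so producing a fixed-size vector generally involves pooling over atoms. The sketch below shows the idea in plain Python. It is not the library's code: the function name `pool_atom_features` is hypothetical, and the "mean" and "sum" modes are assumptions, since the values accepted by the pooling option are not listed here.

```python
def pool_atom_features(atom_features, mode="mean"):
    """Collapse an (n_atoms x n_features) list of lists into one vector.

    Assumption: "mean" and "sum" are illustrative modes only; the real
    pooling option may accept different values.
    """
    if not atom_features:
        raise ValueError("need at least one atom")
    width = len(atom_features[0])
    if any(len(row) != width for row in atom_features):
        raise ValueError("all atoms must have the same number of features")
    # Sum each feature column across atoms.
    sums = [sum(column) for column in zip(*atom_features)]
    if mode == "sum":
        return sums
    if mode == "mean":
        return [s / len(atom_features) for s in sums]
    raise ValueError(f"unknown pooling mode: {mode!r}")
```

Whatever the atom count, the result has `n_features` entries, which is what lets molecules of different sizes share one feature matrix.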
- MolecularDiffusion.cli.analyze.metrics(input_path, output, filter_column, filtered_output, metrics_type, recheck_topo, check_strain, portion, mol_converter, skip_atoms, split, timeout, reference_mol, mol_idx)
Compute validity and connectivity metrics for XYZ files or ASE DB rows.
Metrics types:
    all            Run all metrics (core + posebuster + geom_revised + shepherd)
    core           Basic validity checks (connectivity, atom stability)
    posebuster     PoseBusters checks (bond lengths, angles, clashes)
    geom_revised   Aromatic-aware stability metrics
    shepherd       Drug-likeness and conditional similarity metrics
Examples:
    MolCraftDiff analyze metrics gen_xyz/
    MolCraftDiff analyze metrics molecules.db --metrics core --filter valid_connected
    MolCraftDiff analyze metrics gen_xyz/ --metrics posebuster
    MolCraftDiff analyze metrics gen_xyz/ --metrics geom_revised --mol-converter openbabel
    MolCraftDiff analyze metrics gen_xyz/ --metrics shepherd
    MolCraftDiff analyze metrics gen_xyz/ --split 4
    MolCraftDiff analyze metrics gen_xyz/ --metrics shepherd -r data/shepherd_data/gdb/molblock_charges_9_test100.pkl --mol-idx 0
- MolecularDiffusion.cli.analyze.optimize(input_path, output_path, charge, level, timeout, scale_factor, csv_path, filter_column, inherit_attributes)
Optimize molecular geometries (XYZ or ASE DB).
If input is a directory, it processes all XYZ files. If input is a .db file, it processes all rows in the ASE database.
Examples:
    MolCraftDiff analyze optimize gen_xyz/
    MolCraftDiff analyze optimize gen_xyz/ --o optimized/ --level gfn2
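The directory-versus-database rule described for optimize can be pictured as a small dispatch step. This is a sketch under stated assumptions, not the command's actual code: `classify_input` is a hypothetical name, and the non-recursive `*.xyz` glob is a guess about how a directory is scanned.

```python
from pathlib import Path


def classify_input(input_path):
    """Return ("xyz_dir", [xyz files]) or ("ase_db", [db path]).

    Assumptions: directories are scanned non-recursively for *.xyz, and
    an ASE database is recognized purely by its .db suffix.
    """
    path = Path(input_path)
    if path.is_dir():
        return "xyz_dir", sorted(path.glob("*.xyz"))
    if path.suffix == ".db":
        return "ase_db", [path]
    raise ValueError(f"expected a directory or a .db file, got: {path}")
```

Rejecting anything else up front gives a clear error before any geometry optimization is attempted.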
- MolecularDiffusion.cli.analyze.xtb_electronic(input_path, output, method, charge, n_unpaired, auto_charge, solvent, properties, corrected, timeout, n_jobs, output_format, annotate_db, verbose)
Compute XTB electronic properties for XYZ files or ASE DB rows.
Uses morfeus to calculate quantum-chemical descriptors at the GFN-xTB level.
Property groups (molecular-level):
    energy       Total energy, HOMO, LUMO, gap, Fermi level
    dipole       Dipole moment and vector
    reactivity   IP, EA, electronegativity, hardness, softness
    global       Electrophilicity, nucleophilicity, fugalities
    solvation    Solvation energy, H-bond correction (requires --solvent)
Property groups (atomic-level):
    charges       Atomic charges (Mulliken)
    fukui         Fukui indices (f+, f-, f, dual)
    bond_orders   Bond orders between atom pairs
Output formats:
    csv    Molecular-level properties only (one row per molecule)
    json   Full data including atomic-level properties
    ase    ASE database with properties in atoms.info/arrays
    all    Generate all three formats
Examples:
    MolCraftDiff analyze xtb-electronic gen_xyz/
    MolCraftDiff analyze xtb-electronic molecules.db -p all
    MolCraftDiff analyze xtb-electronic molecules.db -p all --annotate-db
    MolCraftDiff analyze xtb-electronic gen_xyz/ -p energy -p reactivity
    MolCraftDiff analyze xtb-electronic gen_xyz/ -s water -p solvation
    MolCraftDiff analyze xtb-electronic gen_xyz/ --method ptb --auto-charge
    MolCraftDiff analyze xtb-electronic gen_xyz/ -p all -f ase -o results.db
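To make the reactivity and global property groups concrete, the sketch below derives a few of the listed descriptors from IP and EA using textbook conceptual-DFT definitions. It is an illustration only. Conventions differ between codes (hardness is sometimes defined with an extra factor of 1/2, for example), so these formulas may not match the values the command reports, and `global_reactivity` is a hypothetical helper name.

```python
def global_reactivity(ip, ea):
    """Textbook conceptual-DFT descriptors from IP and EA (same energy unit).

    Assumed conventions (may differ from the tool's output):
      electronegativity = (IP + EA) / 2
      hardness          = IP - EA
      softness          = 1 / hardness
      electrophilicity  = electronegativity**2 / (2 * hardness)
    """
    if ip <= ea:
        raise ValueError("expected IP > EA so that hardness is positive")
    electronegativity = (ip + ea) / 2.0
    hardness = ip - ea
    return {
        "electronegativity": electronegativity,
        "hardness": hardness,
        "softness": 1.0 / hardness,
        "electrophilicity": electronegativity ** 2 / (2.0 * hardness),
    }
```

For instance, IP = 10 and EA = 2 (in eV) give an electronegativity of 6, a hardness of 8, and an electrophilicity of 2.25 under these conventions.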
- MolecularDiffusion.cli.analyze.xyz2mol(xyz_dir, input_csv, label, timeout, bits, verbose)
Convert XYZ files to SMILES and extract fingerprints/scaffolds.
Outputs are saved to xyz_dir/2d_reprs/:
- smiles_processed.csv
- fingerprints.npy
- scaffolds.txt
- substructures.json
Examples:
    MolCraftDiff analyze xyz2mol gen_xyz/
    MolCraftDiff analyze xyz2mol gen_xyz/ --bits 1024 -v
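After a run, it can be handy to confirm that the four output files listed above were actually written. The following is a minimal sketch using only the documented file names and the 2d_reprs/ location; the helper names `expected_outputs` and `missing_outputs` are hypothetical, and nothing is assumed about the contents of the files.

```python
from pathlib import Path

# File names as documented for xyz2mol.
OUTPUT_FILES = (
    "smiles_processed.csv",
    "fingerprints.npy",
    "scaffolds.txt",
    "substructures.json",
)


def expected_outputs(xyz_dir):
    """Map each documented output name to its path under xyz_dir/2d_reprs/."""
    out_dir = Path(xyz_dir) / "2d_reprs"
    return {name: out_dir / name for name in OUTPUT_FILES}


def missing_outputs(xyz_dir):
    """List the documented outputs that do not exist on disk yet."""
    return [
        name
        for name, path in expected_outputs(xyz_dir).items()
        if not path.is_file()
    ]
```

An empty list from `missing_outputs("gen_xyz/")` means every documented file is present.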
- MolecularDiffusion.cli.analyze.CONTEXT_SETTINGS