MolecularDiffusion.cli.analyze

Analyze CLI subcommands for 3D molecule analysis.

Provides subcommands for:

- optimize: XTB geometry optimization
- metrics: Validity/connectivity metrics
- compare: RMSD, energy, and optional bond analysis
- xyz2mol: XYZ to SMILES conversion + fingerprints

Attributes

Functions

analyze()

Analyze 3D molecular structures.

compare(directory, mol_converter, n_subsets, csv_path, ...)

Compare XYZ files with their optimized counterparts.

featurize(input_dir, backend, output, recursive, ...)

Featurize 3D XYZ molecules into fixed-size feature vectors.

metrics(input_path, output, filter_column, ...)

Compute validity and connectivity metrics for XYZ files or ASE DB rows.

optimize(input_path, output_path, charge, level, ...)

Optimize molecular geometries (XYZ or ASE DB).

xtb_electronic(input_path, output, method, charge, ...)

Compute XTB electronic properties for XYZ files or ASE DB rows.

xyz2mol(xyz_dir, input_csv, label, timeout, bits, verbose)

Convert XYZ files to SMILES and extract fingerprints/scaffolds.

Module Contents

MolecularDiffusion.cli.analyze.analyze()

Analyze 3D molecular structures.

 Subcommands:

  • optimize: XTB geometry optimization
  • metrics: Validity/connectivity metrics
  • compare: RMSD, energy, and bond analysis
  • xyz2mol: Convert XYZ to SMILES + fingerprints

MolecularDiffusion.cli.analyze.compare(directory, mol_converter, n_subsets, csv_path, charge, level, timeout)

Compare XYZ files with their optimized counterparts.

Computes the RMSD, the xTB energy difference, and bond geometry metrics, and enforces strict connectivity checks.

Requires an ‘optimized_xyz’ subdirectory containing *_opt.xyz files.
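As an illustration of the RMSD part of this comparison, here is a minimal sketch. It is not the package's implementation: the pairing rule (`foo.xyz` maps to `optimized_xyz/foo_opt.xyz`) is an assumption inferred from the `*_opt.xyz` naming above, the helper names are hypothetical, and the RMSD is computed without any rotational or translational alignment, which the real command may well perform.

```python
import math
from pathlib import Path


def read_xyz(text):
    """Parse XYZ-format text into a list of (symbol, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    # Line 0 holds the atom count, line 1 the comment; coordinates follow.
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def plain_rmsd(atoms_a, atoms_b):
    """RMSD over atoms in file order, with no structural alignment applied."""
    if [a[0] for a in atoms_a] != [b[0] for b in atoms_b]:
        raise ValueError("atom order or element types differ")
    squared = sum(
        (a[i] - b[i]) ** 2
        for a, b in zip(atoms_a, atoms_b)
        for i in (1, 2, 3)
    )
    return math.sqrt(squared / len(atoms_a))


def optimized_counterpart(xyz_path):
    """Hypothetical pairing rule: foo.xyz -> optimized_xyz/foo_opt.xyz."""
    p = Path(xyz_path)
    return p.parent / "optimized_xyz" / f"{p.stem}_opt.xyz"
```

Because no alignment is applied, this value is only meaningful when both structures share a frame of reference; an alignment step (for example the Kabsch algorithm) would normally precede it.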

MolecularDiffusion.cli.analyze.featurize(input_dir, backend, output, recursive, r_cut, n_max, l_max, sigma, autodetect_species, species, pooling, soap_jobs, checkpoint, task_name, device, batch_size, all_components, charge, spin, ssl3d_checkpoint, edge_radius)

Featurize 3D XYZ molecules into fixed-size feature vectors.

 Backends:

  • soap: SOAP descriptor via dscribe (no GPU required)
  • uma: UMA backbone embeddings (requires vendored fairchem/src + checkpoint)
  • ssl3d: SSL3D backbone embeddings (requires a trained SSL3D .ckpt or .pkl)

 .. rubric:: Examples

    MolCraftDiff analyze featurize gen_xyz/
    MolCraftDiff analyze featurize gen_xyz/ --autodetect
    MolCraftDiff analyze featurize gen_xyz/ --species C --species H --species N --species O
    MolCraftDiff analyze featurize gen_xyz/ --backend soap --n-max 12 --l-max 9
    MolCraftDiff analyze featurize gen_xyz/ --backend uma --device cuda
    MolCraftDiff analyze featurize gen_xyz/ --backend ssl3d --ssl3d-checkpoint runs/last.ckpt
    MolCraftDiff analyze featurize gen_xyz/ --backend ssl3d --ssl3d-checkpoint runs/last.ckpt --device cuda
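The phrase "fixed-size feature vectors" implies that per-atom descriptors are pooled across atoms, so that molecules of different sizes yield vectors of the same length. The sketch below shows that idea in plain Python. The function name and the mode names (`mean`, `sum`, `max`) are illustrative assumptions, not necessarily the values accepted by the `pooling` parameter.

```python
def pool_atom_features(atom_features, mode="mean"):
    """Collapse a list of equal-length per-atom vectors into one vector."""
    if not atom_features:
        raise ValueError("need at least one atom")
    # Transpose so that each tuple holds one feature across all atoms.
    columns = list(zip(*atom_features))
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "sum":
        return [sum(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Molecules with 3 and 2 atoms both map to a vector of length 2.
three_atoms = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
two_atoms = [[0.5, 1.5], [2.5, 3.5]]
pooled_a = pool_atom_features(three_atoms)
pooled_b = pool_atom_features(two_atoms)
```

Mean pooling keeps the vector scale independent of molecule size, whereas sum pooling preserves size information; which is preferable depends on the downstream task.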

MolecularDiffusion.cli.analyze.metrics(input_path, output, filter_column, filtered_output, metrics_type, recheck_topo, check_strain, portion, mol_converter, skip_atoms, split, timeout, reference_mol, mol_idx)

Compute validity and connectivity metrics for XYZ files or ASE DB rows.

 Metrics types:

  • all: Run all metrics (core + posebuster + geom_revised + shepherd)
  • core: Basic validity checks (connectivity, atom stability)
  • posebuster: PoseBusters checks (bond lengths, angles, clashes)
  • geom_revised: Aromatic-aware stability metrics
  • shepherd: Drug-likeness and conditional similarity metrics

 .. rubric:: Examples

    MolCraftDiff analyze metrics gen_xyz/
    MolCraftDiff analyze metrics molecules.db --metrics core --filter valid_connected
    MolCraftDiff analyze metrics gen_xyz/ --metrics posebuster
    MolCraftDiff analyze metrics gen_xyz/ --metrics geom_revised --mol-converter openbabel
    MolCraftDiff analyze metrics gen_xyz/ --metrics shepherd
    MolCraftDiff analyze metrics gen_xyz/ --split 4
    MolCraftDiff analyze metrics gen_xyz/ --metrics shepherd -r data/shepherd_data/gdb/molblock_charges_9_test100.pkl --mol-idx 0

MolecularDiffusion.cli.analyze.optimize(input_path, output_path, charge, level, timeout, scale_factor, csv_path, filter_column, inherit_attributes)

Optimize molecular geometries (XYZ or ASE DB).

If the input is a directory, all XYZ files in it are processed. If the input is a .db file, all rows in the ASE database are processed.

 .. rubric:: Examples

    MolCraftDiff analyze optimize gen_xyz/
    MolCraftDiff analyze optimize gen_xyz/ --o optimized/ --level gfn2
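The directory-versus-database rule described above can be sketched as a small dispatch helper. This is an assumption-laden illustration rather than the command's actual logic: `classify_input` is a hypothetical name, and the non-recursive `*.xyz` glob is a guess at how files are collected.

```python
from pathlib import Path


def classify_input(input_path):
    """Return ("xyz_dir", [files]) for a directory or ("ase_db", path) for a .db file."""
    p = Path(input_path)
    if p.is_dir():
        # Assumption: only top-level *.xyz files are collected, in sorted order.
        return "xyz_dir", sorted(p.glob("*.xyz"))
    if p.suffix == ".db":
        return "ase_db", p
    raise ValueError(f"expected a directory or a .db file, got: {p}")
```

Calling `classify_input("molecules.db")` returns the `"ase_db"` tag together with the path; reading the rows themselves would then be left to ASE.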

MolecularDiffusion.cli.analyze.xtb_electronic(input_path, output, method, charge, n_unpaired, auto_charge, solvent, properties, corrected, timeout, n_jobs, output_format, annotate_db, verbose)

Compute XTB electronic properties for XYZ files or ASE DB rows.

Uses morfeus to calculate quantum-chemical descriptors at the GFN-xTB level.

 Property groups (molecular-level):

  • energy: Total energy, HOMO, LUMO, gap, Fermi level
  • dipole: Dipole moment and vector
  • reactivity: IP, EA, electronegativity, hardness, softness
  • global: Electrophilicity, nucleophilicity, fugalities
  • solvation: Solvation energy, H-bond correction (requires --solvent)

 Property groups (atomic-level):

  • charges: Atomic charges (Mulliken)
  • fukui: Fukui indices (f+, f-, f, dual)
  • bond_orders: Bond orders between atom pairs

 Output formats:

  • csv: Molecular-level properties only (one row per molecule)
  • json: Full data including atomic-level properties
  • ase: ASE database with properties in atoms.info/arrays
  • all: Generate all three formats

 .. rubric:: Examples

    MolCraftDiff analyze xtb-electronic gen_xyz/
    MolCraftDiff analyze xtb-electronic molecules.db -p all
    MolCraftDiff analyze xtb-electronic molecules.db -p all --annotate-db
    MolCraftDiff analyze xtb-electronic gen_xyz/ -p energy -p reactivity
    MolCraftDiff analyze xtb-electronic gen_xyz/ -s water -p solvation
    MolCraftDiff analyze xtb-electronic gen_xyz/ --method ptb --auto-charge
    MolCraftDiff analyze xtb-electronic gen_xyz/ -p all -f ase -o results.db
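Several quantities in the reactivity and global groups are conventionally derived from the ionization potential (IP) and electron affinity (EA). The sketch below uses common finite-difference conceptual-DFT definitions to show how they relate. The exact conventions used by this command and by morfeus may differ (hardness in particular is sometimes defined with an extra factor of 1/2), so the formulas and the function name here are assumptions, not a description of the tool's output.

```python
def reactivity_descriptors(ip, ea):
    """Finite-difference descriptors from IP and EA, both in the same energy unit."""
    chi = (ip + ea) / 2.0  # Mulliken electronegativity
    eta = ip - ea          # hardness, in the convention without the 1/2 factor
    if eta <= 0:
        raise ValueError("IP must exceed EA for a positive hardness")
    return {
        "electronegativity": chi,
        "hardness": eta,
        "softness": 1.0 / eta,
        "electrophilicity": chi ** 2 / (2.0 * eta),
    }


# Illustrative input values only (not computed results).
example = reactivity_descriptors(ip=10.0, ea=2.0)
```

With these definitions a small IP-EA difference gives a soft, more reactive species, which is why softness is reported as the reciprocal of hardness.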

MolecularDiffusion.cli.analyze.xyz2mol(xyz_dir, input_csv, label, timeout, bits, verbose)

Convert XYZ files to SMILES and extract fingerprints/scaffolds.

Outputs are saved to xyz_dir/2d_reprs/:
  • smiles_processed.csv

  • fingerprints.npy

  • scaffolds.txt

  • substructures.json

 .. rubric:: Examples

    MolCraftDiff analyze xyz2mol gen_xyz/
    MolCraftDiff analyze xyz2mol gen_xyz/ --bits 1024 -v
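After a run, it can be handy to confirm that the four output files listed above were written. The following sketch checks for them under `xyz_dir/2d_reprs/`; the helper name `missing_outputs` is hypothetical, and the file names are taken directly from the list above.

```python
from pathlib import Path

# File names as listed in the xyz2mol documentation above.
EXPECTED_OUTPUTS = (
    "smiles_processed.csv",
    "fingerprints.npy",
    "scaffolds.txt",
    "substructures.json",
)


def missing_outputs(xyz_dir):
    """Return the expected xyz2mol output files that are absent from xyz_dir/2d_reprs/."""
    out_dir = Path(xyz_dir) / "2d_reprs"
    return [name for name in EXPECTED_OUTPUTS if not (out_dir / name).is_file()]
```

An empty return value means every expected file is present; the contents (for example the array shape in `fingerprints.npy`) would still need to be inspected separately.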

MolecularDiffusion.cli.analyze.CONTEXT_SETTINGS