MolecularDiffusion.modules.models.tabasco.utils.metrics¶

Attributes¶

log

Classes¶

`AtomFractionMetric`	Proportion of atoms matching a given element symbol (e.g. 'C' or 'c').
`AtomTypeDistribution`	Similarity of atom-type histograms between generated and training sets.
`MolecularConnectivity`	Share of molecules whose graph is a single connected component.
`MolecularDiversity`	Mean pair-wise fingerprint distance (higher ⇒ more diverse).
`MolecularLipinski`	Use Lipinski's rules to compute the Lipinski score of a list of molecules.
`MolecularLogP`	Average logP (hydrophobicity) via Crippen.MolLogP.
`MolecularNovelty`	Fraction of generated SMILES absent from the training set.
`MolecularQEDValue`	Average QED score (0-1) over valid molecules.
`MolecularUniqueness`	Ratio of unique canonical SMILES among all valid samples.
`MolecularValidity`	Fraction of valid RDKit molecules among generated samples.
`PoseBustersValidity`	Fraction of molecules passing PoseBusters checks.
`PoseCheckStrainEnergy`	Average or median strain energy computed by PoseCheck.

Module Contents¶

class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomFractionMetric(atom_symbol, **kwargs)¶

Bases: torchmetrics.Metric

Proportion of atoms matching a given element symbol (e.g. ‘C’ or ‘c’).

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶: Update the fraction of atoms matching atom_symbol with a list of molecules.

atom_symbol¶
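The proportion described above can be sketched in plain Python. This is only an illustration, not the class's implementation: molecules are represented as lists of element symbols instead of rdkit.Chem.Mol objects, the helper name atom_fraction is hypothetical, and both the case-insensitive comparison and the pooling of atoms over all molecules are assumptions drawn from the "e.g. 'C' or 'c'" wording.

```python
from typing import List


def atom_fraction(molecules: List[List[str]], atom_symbol: str) -> float:
    """Fraction of atoms, pooled over all molecules, that match atom_symbol.

    Hypothetical helper: each molecule is a list of element symbols.
    """
    target = atom_symbol.lower()  # assumption: 'C' and 'c' are treated alike
    matched = 0
    total = 0
    for symbols in molecules:
        total += len(symbols)
        matched += sum(1 for s in symbols if s.lower() == target)
    # Guard against an empty batch instead of dividing by zero.
    return matched / total if total else 0.0


# Heavy atoms of two small molecules: (C, C, O) and (C, Cl).
mols = [["C", "C", "O"], ["C", "Cl"]]
carbon_fraction = atom_fraction(mols, "c")  # 3 of 5 atoms are carbon
```

Note that 'Cl' does not match 'c' here because whole symbols are compared, not prefixes.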

class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomTypeDistribution(original_smiles: List[str], atom_names: List[str] | None = ATOM_NAMES, **kwargs)¶

Bases: torchmetrics.Metric

Similarity of atom-type histograms between generated and training sets.

compute()¶: Compute the atom type distribution.

distribution_similarity(histo1, histo2)¶: Compute the similarity between two histograms.

name_to_idx(atom_type: str) → int¶: Convert an atom type to an index.

update(molecules: List[rdkit.Chem.Mol])¶: Update the atom type distribution with a list of molecules.

atom_names = ['C', 'N', 'O', 'F', 'S', 'Cl', 'Br', 'I', '*']¶

atom_type_dict¶
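The source does not state which similarity measure distribution_similarity uses. As a rough sketch under that caveat, the example below normalizes two atom-type histograms and scores them as one minus the total variation distance. Both the choice of measure and the use of '*' as a catch-all bucket for unlisted elements are assumptions; the functions are stand-alone stand-ins that operate on element symbols, not on rdkit.Chem.Mol objects.

```python
from collections import Counter
from typing import Dict, List

ATOM_NAMES = ["C", "N", "O", "F", "S", "Cl", "Br", "I", "*"]
ATOM_TO_IDX: Dict[str, int] = {name: i for i, name in enumerate(ATOM_NAMES)}


def name_to_idx(atom_type: str) -> int:
    # Assumption: elements missing from ATOM_NAMES fall into the '*' bucket.
    return ATOM_TO_IDX.get(atom_type, ATOM_TO_IDX["*"])


def histogram(symbols: List[str]) -> List[float]:
    """Normalized atom-type histogram over ATOM_NAMES."""
    counts = Counter(name_to_idx(s) for s in symbols)
    total = sum(counts.values())
    return [counts.get(i, 0) / total if total else 0.0
            for i in range(len(ATOM_NAMES))]


def distribution_similarity(histo1: List[float], histo2: List[float]) -> float:
    # Assumed measure: 1 - total variation distance, so 1.0 means identical.
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(histo1, histo2))


generated = histogram(["C", "C", "N", "P"])  # 'P' lands in the '*' bucket
training = histogram(["C", "C", "O", "O"])
similarity = distribution_similarity(generated, training)
```

With these inputs the two histograms share only the carbon mass (0.5 each), so the score is 0.5.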

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularConnectivity(*args, **kwargs)¶

Bases: torchmetrics.Metric

Share of molecules whose graph is a single connected component.

Initialize the connectivity metric.

The connectivity metric measures whether generated molecules are fully connected graphs. A molecule is considered connected if all atoms belong to a single component, with no isolated fragments.

This metric is important for molecular generation since valid molecules should not have disconnected components floating in space.

compute()¶: Compute the connectivity metric.

update(molecules: List[rdkit.Chem.Mol])¶

Update the connectivity metric with a list of molecules.

Counts the number of fully-connected molecules.

higher_is_better = True¶
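The single-component criterion can be illustrated without RDKit. The sketch below is not the class's code: it treats a molecule as an atom count plus a list of bonded index pairs (a hypothetical representation) and uses union-find to test whether every atom ends up in one component. With RDKit molecules, a comparable check would count the fragments returned by rdkit.Chem.GetMolFrags.

```python
from typing import List, Tuple

Graph = Tuple[int, List[Tuple[int, int]]]  # (number of atoms, bonds)


def is_connected(num_atoms: int, bonds: List[Tuple[int, int]]) -> bool:
    """True if all atoms belong to one connected component."""
    if num_atoms == 0:
        return False  # assumption: an empty molecule does not count
    parent = list(range(num_atoms))

    def find(i: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)  # merge the two components
    return len({find(i) for i in range(num_atoms)}) == 1


def connectivity(graphs: List[Graph]) -> float:
    """Share of graphs that are fully connected."""
    if not graphs:
        return 0.0
    return sum(is_connected(n, b) for n, b in graphs) / len(graphs)


chain = (3, [(0, 1), (1, 2)])  # three atoms in a row: connected
fragmented = (3, [(0, 1)])     # atom 2 floats free: disconnected
share = connectivity([chain, fragmented])
```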

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularDiversity(fp_size: int = 2048, fp_type: str = 'ecfp', **kwargs)¶

Bases: torchmetrics.Metric

Mean pair-wise fingerprint distance (higher ⇒ more diverse).

Uses datamol.pdist with fingerprints such as ECFP; see datamol.list_supported_fingerprints() for available types.

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶: Update the diversity metric with a list of molecules.

fp_size = 2048¶

fp_type = 'ecfp'¶

higher_is_better = True¶
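To make "mean pair-wise fingerprint distance" concrete, the sketch below averages the Tanimoto (Jaccard) distance over all unordered pairs of fingerprints. It is a stdlib-only illustration: fingerprints are modeled as sets of on-bit indices, the helper names are hypothetical, and the use of Tanimoto distance is an assumption, since the metric actually used by datamol.pdist is not given in the source.

```python
from itertools import combinations
from typing import FrozenSet, List


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def mean_pairwise_distance(fingerprints: List[FrozenSet[int]]) -> float:
    """Average distance over all unordered pairs (0.0 for fewer than two)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 2, 3})]
# Pair distances are 0.5, 0.0 and 0.5, so the mean is 1/3.
diversity = mean_pairwise_distance(fps)
```

A batch of identical molecules scores 0.0, which is why higher values indicate a more diverse sample.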

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLipinski(**kwargs)¶

Bases: torchmetrics.Metric

Use Lipinski’s rules to compute the Lipinski score of a list of molecules.

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶

higher_is_better = True¶
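The source does not define how the Lipinski score is derived. One common reading, assumed here, is the number of rule-of-five criteria a molecule satisfies (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), averaged over the batch. The sketch takes precomputed descriptor values, so lipinski_score and its arguments are hypothetical and the numbers are illustrative only.

```python
from typing import List, Tuple


def lipinski_score(mol_weight: float, logp: float,
                   h_donors: int, h_acceptors: int) -> int:
    """Number of rule-of-five criteria satisfied (0 to 4).

    Assumption: the score is a simple count of satisfied rules.
    """
    rules = [
        mol_weight <= 500.0,
        logp <= 5.0,
        h_donors <= 5,
        h_acceptors <= 10,
    ]
    return sum(rules)


# (molecular weight, logP, H-bond donors, H-bond acceptors); made-up values.
descriptors: List[Tuple[float, float, int, int]] = [
    (180.2, 1.3, 1, 4),  # small and polar: passes all four rules
    (650.0, 6.2, 2, 8),  # too heavy and too lipophilic: passes two
]
scores = [lipinski_score(*d) for d in descriptors]
mean_score = sum(scores) / len(scores)
```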

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLogP(*args, **kwargs)¶

Bases: torchmetrics.Metric

Average logP (hydrophobicity) via Crippen.MolLogP.

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularNovelty(original_smiles: List[str], **kwargs)¶

Bases: torchmetrics.Metric

Fraction of generated SMILES absent from the training set.

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶: Compute smiles of generated molecules and compare to original smiles.

higher_is_better = True¶

original_smiles¶
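Novelty reduces to a set-membership test once SMILES strings are canonical. The sketch below assumes canonicalization has already happened and that the denominator is the full list of generated SMILES; whether the class deduplicates generated molecules first is not stated in the source. The function name novelty is a hypothetical stand-in.

```python
from typing import Iterable, List


def novelty(generated: List[str], training: Iterable[str]) -> float:
    """Fraction of generated SMILES that never occur in the training set."""
    seen = set(training)  # set lookup keeps each membership test O(1)
    if not generated:
        return 0.0
    return sum(1 for s in generated if s not in seen) / len(generated)


training_smiles = ["CCO", "CC"]
generated_smiles = ["CCO", "CCN", "c1ccccc1"]
novel_fraction = novelty(generated_smiles, training_smiles)  # 2 of 3 are new
```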

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularQEDValue(*args, **kwargs)¶

Bases: torchmetrics.Metric

Average QED score (0-1) over valid molecules.

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶

higher_is_better = True¶

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularUniqueness(*args, sync_on_compute=True, **kwargs)¶

Bases: torchmetrics.Metric

Ratio of unique canonical SMILES among all valid samples. SMILES strings are hashed so that the metric remains compatible with distributed training.

compute()¶: Calculate uniqueness ratio.

reset()¶: Reset the metric state.

update(molecules: List[rdkit.Chem.Mol])¶: Update the uniqueness metric with a list of molecules.

higher_is_better = True¶

sync_on_compute = True¶
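A plausible reason for hashing is that integer state is easier to gather across processes than variable-length strings. The sketch below illustrates that idea with the standard library only and is not the class's implementation. It uses hashlib instead of Python's built-in hash(), because string hashes from hash() are randomized per process and would not agree across ranks. The 63-bit truncation (so values fit a signed 64-bit integer) and the helper names are assumptions.

```python
import hashlib
from typing import List


def smiles_hash(smiles: str) -> int:
    """Process-independent 63-bit hash of a SMILES string."""
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()
    # Keep 8 bytes and drop one bit so the value fits a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big") >> 1


def uniqueness(per_rank_smiles: List[List[str]]) -> float:
    """Unique hashes divided by total hashes, pooled over all ranks."""
    hashes = [smiles_hash(s) for rank in per_rank_smiles for s in rank]
    if not hashes:
        return 0.0
    return len(set(hashes)) / len(hashes)


# Two simulated ranks; 'CCO' is generated on both.
ranks = [["CCO", "CCN"], ["CCO", "CC"]]
unique_ratio = uniqueness(ranks)  # 3 distinct molecules out of 4 samples
```

Pooling before deduplication matters: computing uniqueness per rank and averaging would report 1.0 here and miss the duplicate shared between ranks.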

class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularValidity(*args, **kwargs)¶

Bases: torchmetrics.Metric

Fraction of valid RDKit molecules among generated samples.

This and the other metrics are mostly based on the code from the diffusion-hopping repo: https://github.com/jostorge/diffusion-hopping/blob/main/diffusion_hopping/analysis/metrics.py

compute()¶: Compute the validity metric.

update(molecules: List[rdkit.Chem.Mol])¶: Update the validity metric with a list of molecules.

higher_is_better = True¶
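All classes in this module follow the torchmetrics update/compute pattern: update accumulates counts batch by batch, and compute turns them into the final value. The plain-Python stand-in below mimics that pattern for a validity fraction. It does not subclass torchmetrics.Metric, the class name ValidityCounter is hypothetical, and representing an invalid molecule as None is an assumption. In a real Metric subclass, the two counters would typically be registered with add_state so they can be summed across processes.

```python
from typing import List, Optional


class ValidityCounter:
    """Stand-in for the update/compute/reset cycle of a metric."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.num_valid = 0
        self.num_total = 0

    def update(self, molecules: List[Optional[object]]) -> None:
        # Assumption: a failed generation or parse is represented by None.
        self.num_total += len(molecules)
        self.num_valid += sum(1 for m in molecules if m is not None)

    def compute(self) -> float:
        return self.num_valid / self.num_total if self.num_total else 0.0


metric = ValidityCounter()
metric.update(["mol_a", None, "mol_b"])  # first batch: 2 of 3 valid
metric.update([None])                    # second batch: 0 of 1 valid
validity = metric.compute()              # 2 valid out of 4 in total
```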

class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseBustersValidity(**kwargs)¶

Bases: torchmetrics.Metric

Fraction of molecules passing PoseBusters checks.

Parameters: **kwargs – Forwarded to Metric; may include cfg_file to override the default PoseBusters YAML.

Note

Strain-energy evaluation is very slow; omit it during training unless strictly required (see the YAML in utils/posebusters_no_strain.yaml).

compute()¶

update(molecules: List[rdkit.Chem.Mol])¶: Update the PoseBusters validity metric with a list of molecules. Bad molecules cause the whole list to fail.

higher_is_better = True¶

class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseCheckStrainEnergy(mode='median', num_confs=50, **kwargs)¶

Bases: torchmetrics.Metric

Average or median strain energy computed by PoseCheck.

Initialize the PoseCheck strain energy metric.

Parameters:

mode – Either “mean” or “median” to determine how the strain energy is aggregated
num_confs – Number of conformations to use for strain energy calculation
**kwargs – Additional arguments to pass to the Metric constructor

compute()¶: Compute the strain energy according to the specified mode.

update(molecules: List[rdkit.Chem.Mol])¶: Update the PoseCheck strain energy metric with a list of valid molecules.

higher_is_better = False¶

mode = 'median'¶

num_confs = 50¶
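The effect of the mode parameter can be sketched as follows. This covers only the aggregation step over per-molecule strain energies that are assumed to be already computed; it does not call PoseCheck. The function name aggregate_strain and the NaN return for an empty input are assumptions.

```python
import math
import statistics
from typing import List


def aggregate_strain(energies: List[float], mode: str = "median") -> float:
    """Aggregate per-molecule strain energies by mean or median."""
    if mode not in ("mean", "median"):
        raise ValueError(f"mode must be 'mean' or 'median', got {mode!r}")
    if not energies:
        return math.nan  # assumption: nothing to aggregate yields NaN
    if mode == "mean":
        return statistics.mean(energies)
    return statistics.median(energies)


# One badly strained outlier among otherwise relaxed poses (made-up values).
energies = [1.0, 2.0, 3.0, 100.0]
median_energy = aggregate_strain(energies, "median")
mean_energy = aggregate_strain(energies, "mean")
```

Here the median is 2.5 while the mean is 26.5, which shows how a single outlier dominates the mean and why a median aggregate can be the more robust summary.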

MolecularDiffusion.modules.models.tabasco.utils.metrics.log¶