MolecularDiffusion.modules.models.tabasco.utils.metrics

Attributes

log

Classes

AtomFractionMetric

Proportion of atoms matching a given element symbol (e.g. 'C' or 'c').

AtomTypeDistribution

Similarity of atom-type histograms between generated and training sets.

MolecularConnectivity

Share of molecules whose graph is a single connected component.

MolecularDiversity

Mean pair-wise fingerprint distance (higher ⇒ more diverse).

MolecularLipinski

Use Lipinski's rules to compute the Lipinski score of a list of molecules.

MolecularLogP

Average logP (hydrophobicity) via Crippen.MolLogP.

MolecularNovelty

Fraction of generated SMILES absent from the training set.

MolecularQEDValue

Average QED score (0-1) over valid molecules.

MolecularUniqueness

Ratio of unique canonical SMILES among all valid samples.

MolecularValidity

Fraction of valid RDKit molecules among generated samples.

PoseBustersValidity

Fraction of molecules passing PoseBusters checks.

PoseCheckStrainEnergy

Average or median strain energy computed by PoseCheck.

Module Contents

class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomFractionMetric(atom_symbol, **kwargs)

Bases: torchmetrics.Metric

Proportion of atoms matching a given element symbol (e.g. ‘C’ or ‘c’).

compute()
update(molecules: List[rdkit.Chem.Mol])

Update the fraction of atoms matching atom_symbol with a list of molecules.

atom_symbol
class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomTypeDistribution(original_smiles: List[str], atom_names: List[str] | None = ATOM_NAMES, **kwargs)

Bases: torchmetrics.Metric

Similarity of atom-type histograms between generated and training sets.

compute()

Compute the atom type distribution.

distribution_similarity(histo1, histo2)

Compute the similarity between two histograms.

name_to_idx(atom_type: str) → int

Convert an atom type to an index.

update(molecules: List[rdkit.Chem.Mol])

Update the atom type distribution with a list of molecules.

atom_names = ['C', 'N', 'O', 'F', 'S', 'Cl', 'Br', 'I', '*']
atom_type_dict
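
The histogram comparison can be illustrated without RDKit. The following is a minimal sketch, not the class's implementation: it assumes that element symbols missing from atom_names are counted under '*', and it uses histogram intersection as the similarity measure. Neither choice is confirmed by this reference, and the helper names atom_histogram and histogram_intersection are hypothetical.

```python
from typing import Dict, List

# Same symbols as the atom_names attribute listed above.
ATOM_NAMES = ["C", "N", "O", "F", "S", "Cl", "Br", "I", "*"]


def atom_histogram(symbols: List[str], atom_names: List[str] = ATOM_NAMES) -> List[float]:
    """Normalized histogram of element symbols over atom_names.

    Assumption: symbols that are not in atom_names fall into the '*' bin.
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(atom_names)}
    counts = [0] * len(atom_names)
    for symbol in symbols:
        counts[index.get(symbol, index["*"])] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(atom_names)


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Overlap of two normalized histograms (assumed similarity measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


train = atom_histogram(["C", "C", "O", "N"])
generated = atom_histogram(["C", "C", "O", "P"])  # 'P' is counted under '*'
similarity = histogram_intersection(train, generated)  # 0.5 (C) + 0.25 (O)
```

Identical histograms give 1.0 under this measure and histograms with no shared atom types give 0.0.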
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularConnectivity(*args, **kwargs)

Bases: torchmetrics.Metric

Share of molecules whose graph is a single connected component.

Initialize the connectivity metric.

The connectivity metric measures whether generated molecules are fully connected graphs. A molecule is considered connected if all atoms belong to a single component, with no isolated fragments.

This metric is important for molecular generation since valid molecules should not have disconnected components floating in space.
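
The single-component check described above can be sketched on a plain bond list. This is an illustration under stated assumptions rather than the class's code: molecules are represented here as (num_atoms, bonds) tuples, an empty molecule is treated as not connected, and the helper names are hypothetical. With RDKit molecules the same test is commonly written as len(Chem.GetMolFrags(mol)) == 1; whether this class does so is not stated here.

```python
from typing import List, Tuple

Graph = Tuple[int, List[Tuple[int, int]]]  # (number of atoms, bonds as index pairs)


def is_connected(num_atoms: int, bonds: List[Tuple[int, int]]) -> bool:
    """Return True if every atom is reachable from atom 0 along bonds."""
    if num_atoms == 0:
        return False  # assumption: an empty molecule does not count as connected
    neighbors: List[List[int]] = [[] for _ in range(num_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen = {0}
    stack = [0]
    while stack:  # depth-first traversal from atom 0
        atom = stack.pop()
        for nxt in neighbors[atom]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == num_atoms


def connectivity(graphs: List[Graph]) -> float:
    """Share of graphs that form a single connected component."""
    flags = [is_connected(n, bonds) for n, bonds in graphs]
    return sum(flags) / len(flags) if flags else 0.0


chain = (3, [(0, 1), (1, 2)])  # three atoms in a chain: one component
fragments = (2, [])            # two atoms without a bond: two components
score = connectivity([chain, fragments])  # 0.5
```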

compute()

Compute the connectivity metric.

update(molecules: List[rdkit.Chem.Mol])

Update the connectivity metric with a list of molecules.

Counts the number of fully-connected molecules.

higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularDiversity(fp_size: int = 2048, fp_type: str = 'ecfp', **kwargs)

Bases: torchmetrics.Metric

Mean pair-wise fingerprint distance (higher ⇒ more diverse).

Uses datamol.pdist with fingerprints such as ECFP; see datamol.list_supported_fingerprints() for available types.
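
To make "mean pair-wise fingerprint distance" concrete, here is a small sketch that stands in for the datamol call. It assumes the distance is Tanimoto (Jaccard) distance on bit fingerprints and represents each fingerprint as the set of its on-bit indices. The function names are hypothetical and the distance choice is an assumption, not something this reference confirms.

```python
from itertools import combinations
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def mean_pairwise_distance(fps: List[Set[int]]) -> float:
    """Average distance over all unordered pairs of fingerprints."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0  # fewer than two molecules: no pair to compare
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Two identical fingerprints and one that shares no bits with them.
fps = [{1, 2, 3}, {1, 2, 3}, {7, 8, 9}]
diversity = mean_pairwise_distance(fps)  # distances 0, 1, 1 -> 2/3
```

A set of near-duplicate molecules drives the value toward 0, which is why higher values are read as more diverse.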

compute()
update(molecules: List[rdkit.Chem.Mol])

Update the diversity metric with a list of molecules.

fp_size = 2048
fp_type = 'ecfp'
higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLipinski(**kwargs)

Bases: torchmetrics.Metric

Use Lipinski’s rules to compute the Lipinski score of a list of molecules.
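
As a rough illustration of a Lipinski score, the sketch below counts how many of the four rule-of-five criteria a molecule satisfies. The scoring scheme (a count from 0 to 4), the Descriptors container, and the example values are all assumptions made for this example; the reference does not say how the class defines or averages the score. With RDKit, the inputs would typically come from Descriptors.MolWt, Crippen.MolLogP, Lipinski.NumHDonors and Lipinski.NumHAcceptors.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_score(d: Descriptors) -> int:
    """Number of rule-of-five criteria satisfied (assumed scoring scheme)."""
    rules = [
        d.mol_weight <= 500,
        d.logp <= 5,
        d.h_donors <= 5,
        d.h_acceptors <= 10,
    ]
    return sum(rules)


# Made-up descriptor values, for illustration only.
small_polar = Descriptors(mol_weight=180.0, logp=1.2, h_donors=1, h_acceptors=4)
large_greasy = Descriptors(mol_weight=720.0, logp=7.5, h_donors=2, h_acceptors=6)

scores = [lipinski_score(small_polar), lipinski_score(large_greasy)]  # [4, 2]
mean_score = sum(scores) / len(scores)  # 3.0
```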

compute()
update(molecules: List[rdkit.Chem.Mol])
higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLogP(*args, **kwargs)

Bases: torchmetrics.Metric

Average logP (hydrophobicity) via Crippen.MolLogP.

compute()
update(molecules: List[rdkit.Chem.Mol])
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularNovelty(original_smiles: List[str], **kwargs)

Bases: torchmetrics.Metric

Fraction of generated SMILES absent from the training set.
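
The novelty ratio reduces to a set-membership test once SMILES strings are available. The sketch below assumes both lists are already canonicalized the same way and that repeated generated SMILES each count toward the ratio. Both points are assumptions, and the function name is hypothetical.

```python
from typing import Iterable, List


def novelty(generated_smiles: List[str], training_smiles: Iterable[str]) -> float:
    """Fraction of generated SMILES that do not occur in the training set."""
    training = set(training_smiles)  # set lookup keeps the test O(1) per molecule
    if not generated_smiles:
        return 0.0
    novel = [s for s in generated_smiles if s not in training]
    return len(novel) / len(generated_smiles)


training = ["CCO", "c1ccccc1"]
generated = ["CCO", "CCN", "CC(=O)O", "CCN"]
score = novelty(generated, training)  # 3 of 4 are absent from training -> 0.75
```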

compute()
update(molecules: List[rdkit.Chem.Mol])

Compute smiles of generated molecules and compare to original smiles.

higher_is_better = True
original_smiles
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularQEDValue(*args, **kwargs)

Bases: torchmetrics.Metric

Average QED score (0-1) over valid molecules.

compute()
update(molecules: List[rdkit.Chem.Mol])
higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularUniqueness(*args, sync_on_compute=True, **kwargs)

Bases: torchmetrics.Metric

Ratio of unique canonical SMILES among all valid samples. SMILES are hashed so that the metric state is compatible with distributed training.
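
One way hashing helps in a distributed setting: strings cannot be stored in tensor state, but fixed-width integers can, so each process can contribute integer hashes that are pooled before counting. The sketch below illustrates that idea only. The hash function, the 64-bit width, and the helper name smiles_hash are assumptions, and the list concatenation merely stands in for a cross-process gather. Python's built-in hash() is avoided because string hashes are randomized per process by default.

```python
import hashlib
from typing import List


def smiles_hash(smiles: str) -> int:
    """Deterministic signed 64-bit hash of a SMILES string."""
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()
    # The first 8 bytes, read as a signed integer, fit an int64 value.
    return int.from_bytes(digest[:8], "big", signed=True)


# SMILES seen by two hypothetical processes.
rank0: List[str] = ["CCO", "CCN", "CCO"]
rank1: List[str] = ["CCO", "c1ccccc1"]

# Concatenation stands in for gathering the per-process hash lists.
hashes = [smiles_hash(s) for s in rank0 + rank1]
uniqueness = len(set(hashes)) / len(hashes)  # 3 distinct out of 5 samples
```

Duplicates that occur on different processes are only detected after pooling, which is presumably why synchronization on compute matters for this metric.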

compute()

Calculate uniqueness ratio.

reset()

Reset the metric state.

update(molecules: List[rdkit.Chem.Mol])

Update the uniqueness metric with a list of molecules.

higher_is_better = True
sync_on_compute = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularValidity(*args, **kwargs)

Bases: torchmetrics.Metric

Fraction of valid RDKit molecules among generated samples.

This and the other metrics are mostly based on the code from the diffusion-hopping repo: https://github.com/jostorge/diffusion-hopping/blob/main/diffusion_hopping/analysis/metrics.py

compute()

Compute the validity metric.

update(molecules: List[rdkit.Chem.Mol])

Update the validity metric with a list of molecules.

higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseBustersValidity(**kwargs)

Bases: torchmetrics.Metric

Fraction of molecules passing PoseBusters checks.

Parameters:

**kwargs – Forwarded to Metric; may include cfg_file to override the default PoseBusters YAML.

Note

Strain-energy evaluation is very slow; omit it during training unless strictly required (see the YAML in utils/posebusters_no_strain.yaml).

compute()
update(molecules: List[rdkit.Chem.Mol])

Update the PoseBusters validity metric with a list of molecules. Bad molecules cause the list to fail.

higher_is_better = True
class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseCheckStrainEnergy(mode='median', num_confs=50, **kwargs)

Bases: torchmetrics.Metric

Average or median strain energy computed by PoseCheck.

Initialize the PoseCheck strain energy metric.

Parameters:
  • mode – Either “mean” or “median” to determine how the strain energy is aggregated

  • num_confs – Number of conformations to use for strain energy calculation

  • **kwargs – Additional arguments to pass to the Metric constructor
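
The effect of the mode parameter can be seen on a handful of numbers. This sketch only illustrates mean versus median aggregation; the energy values are made up, the function name is hypothetical, and how the class handles invalid modes or empty inputs is not stated here.

```python
import statistics
from typing import List


def aggregate_strain(energies: List[float], mode: str = "median") -> float:
    """Aggregate per-molecule strain energies by mean or median."""
    if mode == "mean":
        return statistics.mean(energies)
    if mode == "median":
        return statistics.median(energies)
    raise ValueError(f"mode must be 'mean' or 'median', got {mode!r}")


# Made-up values with one heavily strained outlier.
energies = [2.0, 3.0, 4.0, 91.0]
mean_energy = aggregate_strain(energies, mode="mean")      # 25.0
median_energy = aggregate_strain(energies, mode="median")  # 3.5
```

The median is far less sensitive to a single badly strained molecule, which may be why it is the default.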

compute()

Compute the strain energy according to the specified mode.

update(molecules: List[rdkit.Chem.Mol])

Update the PoseCheck strain energy metric with a list of valid molecules.

higher_is_better = False
mode = 'median'
num_confs = 50
MolecularDiffusion.modules.models.tabasco.utils.metrics.log