MolecularDiffusion.modules.models.tabasco.utils.metrics
Attributes
Classes
AtomFractionMetric | Proportion of atoms matching a given element symbol (e.g. 'C' or 'c').
AtomTypeDistribution | Similarity of atom-type histograms between generated and training sets.
MolecularConnectivity | Share of molecules whose graph is a single connected component.
MolecularDiversity | Mean pair-wise fingerprint distance (higher ⇒ more diverse).
MolecularLipinski | Use Lipinski's rules to compute the Lipinski score of a list of molecules.
MolecularLogP | Average logP (hydrophobicity) via Crippen.MolLogP.
MolecularNovelty | Fraction of generated SMILES absent from the training set.
MolecularQEDValue | Average QED score (0-1) over valid molecules.
MolecularUniqueness | Ratio of unique canonical SMILES among all valid samples.
MolecularValidity | Fraction of valid RDKit molecules among generated samples.
PoseBustersValidity | Fraction of molecules passing PoseBusters checks.
PoseCheckStrainEnergy | Average or median strain energy computed by PoseCheck.
Module Contents
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomFractionMetric(atom_symbol, **kwargs)
Bases: torchmetrics.Metric
Proportion of atoms matching a given element symbol (e.g. 'C' or 'c').
- compute()
- update(molecules: List[rdkit.Chem.Mol])
Update the fraction of atoms matching atom_symbol with a list of molecules.
- atom_symbol
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.AtomTypeDistribution(original_smiles: List[str], atom_names: List[str] | None = ATOM_NAMES, **kwargs)
Bases: torchmetrics.Metric
Similarity of atom-type histograms between generated and training sets.
- compute()
Compute the atom type distribution.
- distribution_similarity(histo1, histo2)
Compute the similarity between two histograms.
- update(molecules: List[rdkit.Chem.Mol])
Update the atom type distribution with a list of molecules.
- atom_names = ['C', 'N', 'O', 'F', 'S', 'Cl', 'Br', 'I', '*']
- atom_type_dict
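The page does not say which similarity distribution_similarity computes. As a rough illustration only, the sketch below compares two normalized atom-type histograms with histogram intersection. The helper names, the choice of histogram intersection, and the treatment of '*' as a catch-all bucket for unlisted elements are all assumptions, not the library's documented behavior.

```python
from collections import Counter
from typing import Dict, List

# Same symbols as the documented atom_names default.
ATOM_NAMES = ["C", "N", "O", "F", "S", "Cl", "Br", "I", "*"]


def atom_histogram(symbols: List[str], atom_names: List[str] = ATOM_NAMES) -> Dict[str, float]:
    """Normalized atom-type histogram over a flat list of element symbols.

    Assumption: symbols not in atom_names are counted under '*'.
    """
    counts = Counter(s if s in atom_names else "*" for s in symbols)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in atom_names}


def histogram_intersection(h1: Dict[str, float], h2: Dict[str, float]) -> float:
    """Hypothetical similarity: sum of bin-wise minima, 1.0 for identical histograms."""
    return sum(min(h1[k], h2[k]) for k in h1)


generated = atom_histogram(["C", "C", "N", "O"])
training = atom_histogram(["C", "C", "C", "O"])
similarity = histogram_intersection(generated, training)
```

With normalized histograms this score lies between 0 (no shared atom types) and 1 (identical distributions).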
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularConnectivity(*args, **kwargs)
Bases: torchmetrics.Metric
Share of molecules whose graph is a single connected component.
Initialize the connectivity metric.
The connectivity metric measures whether generated molecules are fully connected graphs. A molecule is considered connected if all atoms belong to a single component, with no isolated fragments.
This metric is important for molecular generation since valid molecules should not have disconnected components floating in space.
- compute()
Compute the connectivity metric.
- update(molecules: List[rdkit.Chem.Mol])
Update the connectivity metric with a list of molecules.
Counts the number of fully-connected molecules.
- higher_is_better = True
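The connectivity test described above can be sketched without RDKit by treating a molecule as an atom count plus a list of bonded index pairs. This is a minimal illustration using union-find; the function names and the (num_atoms, bonds) representation are hypothetical, and with RDKit one would more likely inspect the fragments of a Mol (for example via Chem.GetMolFrags).

```python
from typing import List, Tuple

Graph = Tuple[int, List[Tuple[int, int]]]  # (number of atoms, bonded index pairs)


def is_connected(num_atoms: int, bonds: List[Tuple[int, int]]) -> bool:
    """True if every atom belongs to one connected component."""
    if num_atoms == 0:
        return False  # assumption: an empty molecule does not count as connected
    parent = list(range(num_atoms))

    def find(i: int) -> int:
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)  # merge the two components
    return len({find(i) for i in range(num_atoms)}) == 1


def connectivity_ratio(graphs: List[Graph]) -> float:
    """Share of molecules whose graph is a single connected component."""
    if not graphs:
        return 0.0
    return sum(is_connected(n, b) for n, b in graphs) / len(graphs)


ethanol = (3, [(0, 1), (1, 2)])   # C-C-O heavy atoms, one component
ion_pair = (2, [])                # two unbonded atoms, two components
ratio = connectivity_ratio([ethanol, ion_pair])
```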
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularDiversity(fp_size: int = 2048, fp_type: str = 'ecfp', **kwargs)
Bases: torchmetrics.Metric
Mean pair-wise fingerprint distance (higher ⇒ more diverse).
Uses datamol.pdist with fingerprints such as ECFP; see datamol.list_supported_fingerprints() for available types.
- compute()
- update(molecules: List[rdkit.Chem.Mol])
Update the diversity metric with a list of molecules.
- fp_size = 2048
- fp_type = 'ecfp'
- higher_is_better = True
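To make "mean pair-wise fingerprint distance" concrete, the sketch below averages a Tanimoto (Jaccard) distance over all unordered pairs of fingerprints, each represented as a set of on-bit indices. It does not call datamol; the helper names are hypothetical, and the assumption that the distance is Tanimoto is mine rather than something this page states.

```python
from itertools import combinations
from typing import FrozenSet, List


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def mean_pairwise_distance(fingerprints: List[FrozenSet[int]]) -> float:
    """Average distance over all unordered pairs; 0.0 if fewer than two inputs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
diversity = mean_pairwise_distance(fps)
```

In this toy set the first two fingerprints share half of their combined bits (distance 0.5) and the third shares none with either (distance 1.0), so the mean is 2.5 / 3.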
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLipinski(**kwargs)
Bases: torchmetrics.Metric
Use Lipinski's rules to compute the Lipinski score of a list of molecules.
- compute()
- update(molecules: List[rdkit.Chem.Mol])
- higher_is_better = True
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularLogP(*args, **kwargs)
Bases: torchmetrics.Metric
Average logP (hydrophobicity) via Crippen.MolLogP.
- compute()
- update(molecules: List[rdkit.Chem.Mol])
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularNovelty(original_smiles: List[str], **kwargs)
Bases: torchmetrics.Metric
Fraction of generated SMILES absent from the training set.
- compute()
- update(molecules: List[rdkit.Chem.Mol])
Compute the SMILES of the generated molecules and compare them to the original SMILES.
- higher_is_better = True
- original_smiles
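As a rough sketch of the comparison described above, novelty can be computed as the share of generated SMILES strings that do not appear in the training set. The function name is hypothetical, the inputs are assumed to be canonicalized already, and whether the library divides by all generated samples or only by unique ones is not stated on this page.

```python
from typing import Iterable, List


def novelty(generated_smiles: List[str], original_smiles: Iterable[str]) -> float:
    """Fraction of generated SMILES absent from the training set.

    Assumption: both inputs are already canonical SMILES, so string
    equality is a meaningful identity test.
    """
    if not generated_smiles:
        return 0.0
    training = set(original_smiles)  # set lookup keeps each membership test O(1)
    novel = sum(1 for s in generated_smiles if s not in training)
    return novel / len(generated_smiles)


score = novelty(["CCO", "CCN", "c1ccccc1"], ["CCO", "CC"])
```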
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularQEDValue(*args, **kwargs)
Bases: torchmetrics.Metric
Average QED score (0-1) over valid molecules.
- compute()
- update(molecules: List[rdkit.Chem.Mol])
- higher_is_better = True
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularUniqueness(*args, sync_on_compute=True, **kwargs)
Bases: torchmetrics.Metric
Ratio of unique canonical SMILES among all valid samples. Uses hashing to stay compatible with distributed training.
- compute()
Calculate uniqueness ratio.
- reset()
Reset the metric state.
- update(molecules: List[rdkit.Chem.Mol])
Update the uniqueness metric with a list of molecules.
- higher_is_better = True
- sync_on_compute = True
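One plausible reading of the hashing remark is that fixed-size integers are easier to gather across processes than variable-length strings. The sketch below illustrates that idea only: each canonical SMILES is reduced to a 64-bit integer, the per-process lists are concatenated, and uniqueness is the number of distinct hashes over the total. The hash function, the helper names, and the per-rank list layout are assumptions, not the library's implementation.

```python
import hashlib
from typing import List


def smiles_hash(smiles: str) -> int:
    """Deterministic signed 64-bit hash of a SMILES string.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to get the same value on every rank.
    """
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def uniqueness(per_rank_hashes: List[List[int]]) -> float:
    """Distinct hashes divided by total hashes, pooled over all ranks."""
    all_hashes = [h for rank in per_rank_hashes for h in rank]
    if not all_hashes:
        return 0.0
    return len(set(all_hashes)) / len(all_hashes)


rank0 = [smiles_hash(s) for s in ["CCO", "CCN"]]
rank1 = [smiles_hash(s) for s in ["CCO", "CCC"]]
ratio = uniqueness([rank0, rank1])
```

Here "CCO" appears on both hypothetical ranks, so three of the four pooled samples are distinct.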
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.MolecularValidity(*args, **kwargs)
Bases: torchmetrics.Metric
Fraction of valid RDKit molecules among generated samples.
This and the other metrics are mostly based on the code from the diffusion-hopping repo: https://github.com/jostorge/diffusion-hopping/blob/main/diffusion_hopping/analysis/metrics.py
- compute()
Compute the validity metric.
- update(molecules: List[rdkit.Chem.Mol])
Update the validity metric with a list of molecules.
- higher_is_better = True
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseBustersValidity(**kwargs)
Bases: torchmetrics.Metric
Fraction of molecules passing PoseBusters checks.
- Parameters:
**kwargs – Forwarded to Metric; may include cfg_file to override the default PoseBusters YAML.
Note
Strain-energy evaluation is very slow; omit it during training unless strictly required (see the YAML in utils/posebusters_no_strain.yaml).
- compute()
- update(molecules: List[rdkit.Chem.Mol])
Update the PoseBusters validity metric with a list of molecules. Bad molecules cause the list to fail.
- higher_is_better = True
- class MolecularDiffusion.modules.models.tabasco.utils.metrics.PoseCheckStrainEnergy(mode='median', num_confs=50, **kwargs)
Bases: torchmetrics.Metric
Average or median strain energy computed by PoseCheck.
Initialize the PoseCheck strain energy metric.
- Parameters:
mode – Either "mean" or "median" to determine how the strain energy is aggregated
num_confs – Number of conformations to use for strain energy calculation
**kwargs – Additional arguments to pass to the Metric constructor
- compute()
Compute the strain energy according to the specified mode.
- update(molecules: List[rdkit.Chem.Mol])
Update the PoseCheck strain energy metric with a list of valid molecules.
- higher_is_better = False
- mode = 'median'
- num_confs = 50
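The effect of the mode parameter can be illustrated with a small aggregation helper over per-molecule strain energies. This is a sketch under assumptions: the function name, the NaN return for empty input, and the example energies are hypothetical, and the strain values themselves would come from PoseCheck rather than from this code.

```python
import statistics
from typing import List


def aggregate_strain(energies: List[float], mode: str = "median") -> float:
    """Aggregate per-molecule strain energies as a mean or a median."""
    if mode not in ("mean", "median"):
        raise ValueError(f"mode must be 'mean' or 'median', got {mode!r}")
    if not energies:
        return float("nan")  # assumption: no valid molecules yields NaN
    if mode == "mean":
        return statistics.mean(energies)
    return statistics.median(energies)


energies = [1.0, 2.0, 10.0]  # made-up values with one highly strained outlier
median_strain = aggregate_strain(energies, mode="median")
mean_strain = aggregate_strain(energies, mode="mean")
```

The median is less sensitive than the mean to a few very strained molecules, which this example shows: the single outlier pulls the mean well above the median.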
- MolecularDiffusion.modules.models.tabasco.utils.metrics.log