MolecularDiffusion.data.component.feature

Attributes

ECFP

Classes

NodeFeaturizer

Unified node featurization for both GraphDataset and PointCloudDataset.

Functions

ExtendedConnectivityFingerprint(mol[, radius, length])

Extended Connectivity Fingerprint molecule feature.

atom_center_identification(atom)

Reaction center identification atom feature.

atom_default(atom)

Default atom feature.

atom_explicit_property_prediction(atom)

Explicit property prediction atom feature.

atom_geom(z, coords)

atom_geom_compact(z, coords[, scale_factor])

atom_geom_opt(z, coords[, scale_factor])

atom_geom_v2(z, coords)

atom_pretrain(atom)

Atom feature for pretraining.

atom_property_prediction(atom)

Property prediction atom feature.

atom_symbol(atom)

Symbol atom feature.

atom_synthon_completion(atom)

Synthon completion atom feature.

atom_topological(z, coords)

bond_default(bond)

Default bond feature.

bond_length(bond)

Bond length in the molecular conformation.

bond_pretrain(bond)

Bond feature for pretraining.

bond_property_prediction(bond)

Property prediction bond feature.

molecule_default(mol)

Default molecule feature.

onehot(x, vocab[, allow_unknown])

Module Contents

class MolecularDiffusion.data.component.feature.NodeFeaturizer(atom_vocab: list, use_ohe: bool = True, geom_feature: str = None, allow_unknown: bool = False)

Unified node featurization for both GraphDataset and PointCloudDataset.

Supports:
  • OHE (one-hot encoding) of atom types
  • Geom-based features from atomic numbers and coordinates

Usage:

featurizer = NodeFeaturizer(atom_vocab, use_ohe=True, geom_feature="geometric_fast")
features = featurizer.featurize_all(atom_symbols, charges, coords)

Parameters:
  • atom_vocab – List of atom symbols for OHE encoding

  • use_ohe – Whether to include one-hot encoding of atom type

  • geom_feature – Name of geom feature type (None for OHE only)

  • allow_unknown – Whether to allow unknown atoms in OHE

classmethod available_features() → list

Return list of available feature names.

compute_geom(charges: torch.Tensor, coords: torch.Tensor) → torch.Tensor

Compute geom features from atomic numbers and coordinates.

Parameters:
  • charges – Atomic numbers tensor (N,)

  • coords – Coordinates tensor (N, 3)

Returns:

Geom features tensor (N, F)

compute_ohe(atom_symbols: list) → torch.Tensor

Compute one-hot encoding for atom symbols.

Parameters:

atom_symbols – List of atom symbols (e.g., ['C', 'H', 'O'])

Returns:

OHE tensor (N, len(vocab)) or (N, len(vocab)+1) if allow_unknown
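
The shape rule above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper name onehot_sketch is hypothetical, and mapping unknown symbols to one extra trailing slot is an assumption made to match the documented len(vocab)+1 width.

```python
def onehot_sketch(x, vocab, allow_unknown=False):
    """Illustrative one-hot encoder (hypothetical, not the library code)."""
    # One slot per vocabulary entry, plus one extra slot when unknowns are allowed.
    size = len(vocab) + (1 if allow_unknown else 0)
    vec = [0] * size
    if x in vocab:
        vec[vocab.index(x)] = 1
    elif allow_unknown:
        # Assumption: unknown values are mapped to the final slot.
        vec[-1] = 1
    else:
        raise ValueError(f"Unknown value {x!r} is not in the vocabulary")
    return vec


vocab = ["C", "H", "O"]
# Three atoms, so the result has N=3 rows of width len(vocab)+1 = 4.
rows = [onehot_sketch(s, vocab, allow_unknown=True) for s in ["C", "H", "Si"]]
```

With allow_unknown=False the same call on "Si" would raise in this sketch; how the actual class handles that case is not stated here.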

featurize(charges: torch.Tensor, coords: torch.Tensor) → torch.Tensor

Compute geom features only (legacy interface).

featurize_all(atom_symbols: list, charges: torch.Tensor, coords: torch.Tensor) → torch.Tensor

Compute all features: OHE + geom (if specified).

Parameters:
  • atom_symbols – List of atom symbols

  • charges – Atomic numbers tensor (N,)

  • coords – Coordinates tensor (N, 3)

Returns:

Combined features tensor (N, total_dim) or None if no features
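
As a rough illustration of the (N, total_dim) result, the sketch below joins per-atom one-hot rows and geom rows column-wise and returns None when neither block is present. It uses plain Python lists instead of tensors; the function name combine_features_sketch and the placeholder geom values are assumptions, not part of the library.

```python
def combine_features_sketch(ohe_rows=None, geom_rows=None):
    """Concatenate per-atom feature rows; return None when nothing is enabled."""
    blocks = [b for b in (ohe_rows, geom_rows) if b is not None]
    if not blocks:
        return None
    n_atoms = len(blocks[0])
    if any(len(b) != n_atoms for b in blocks):
        raise ValueError("All feature blocks must describe the same N atoms")
    # Row i of the result is row i of each block joined end to end.
    return [sum((list(b[i]) for b in blocks), []) for i in range(n_atoms)]


ohe = [[1, 0], [0, 1]]                      # N=2 atoms, 2 one-hot columns
geom = [[0.5, 1.0, 0.0], [0.2, 0.0, 1.0]]   # N=2 atoms, F=3 placeholder columns
combined = combine_features_sketch(ohe, geom)  # 2 rows, total_dim = 2 + 3
```

In this sketch total_dim is simply the sum of the widths of the enabled blocks.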

GEOM_FEATURES = None
allow_unknown = False
atom_vocab
geom_feature = None
use_ohe = True

MolecularDiffusion.data.component.feature.ExtendedConnectivityFingerprint(mol, radius=2, length=1024)

Extended Connectivity Fingerprint molecule feature.

Features:

GetMorganFingerprintAsBitVect(): a Morgan fingerprint for a molecule as a bit vector

MolecularDiffusion.data.component.feature.atom_center_identification(atom)

Reaction center identification atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetTotalNumHs(): one-hot embedding for the total number of Hs (explicit and implicit) on the atom

GetTotalDegree(): one-hot embedding for the degree of the atom in the molecule including Hs

GetTotalValence(): one-hot embedding for the total valence (explicit + implicit) of the atom

GetIsAromatic(): whether the atom is aromatic

IsInRing(): whether the atom is in a ring

MolecularDiffusion.data.component.feature.atom_default(atom)

Default atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetChiralTag(): one-hot embedding for atomic chiral tag

GetTotalDegree(): one-hot embedding for the degree of the atom in the molecule including Hs

GetFormalCharge(): one-hot embedding for the formal charge of the atom

GetTotalNumHs(): one-hot embedding for the total number of Hs (explicit and implicit) on the atom

GetNumRadicalElectrons(): one-hot embedding for the number of radical electrons on the atom

GetHybridization(): one-hot embedding for the atom’s hybridization

GetIsAromatic(): whether the atom is aromatic

IsInRing(): whether the atom is in a ring

MolecularDiffusion.data.component.feature.atom_explicit_property_prediction(atom)

Explicit property prediction atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetDegree(): one-hot embedding for the degree of the atom in the molecule

GetTotalValence(): one-hot embedding for the total valence (explicit + implicit) of the atom

GetFormalCharge(): one-hot embedding for the formal charge of the atom

GetIsAromatic(): whether the atom is aromatic

MolecularDiffusion.data.component.feature.atom_geom(z, coords)

MolecularDiffusion.data.component.feature.atom_geom_compact(z, coords, scale_factor=1.3)

MolecularDiffusion.data.component.feature.atom_geom_opt(z, coords, scale_factor=1.3)

MolecularDiffusion.data.component.feature.atom_geom_v2(z, coords)

MolecularDiffusion.data.component.feature.atom_pretrain(atom)

Atom feature for pretraining.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetChiralTag(): one-hot embedding for atomic chiral tag

MolecularDiffusion.data.component.feature.atom_property_prediction(atom)

Property prediction atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetDegree(): one-hot embedding for the degree of the atom in the molecule

GetTotalNumHs(): one-hot embedding for the total number of Hs (explicit and implicit) on the atom

GetTotalValence(): one-hot embedding for the total valence (explicit + implicit) of the atom

GetFormalCharge(): one-hot embedding for the formal charge of the atom

GetIsAromatic(): whether the atom is aromatic

MolecularDiffusion.data.component.feature.atom_symbol(atom)

Symbol atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

MolecularDiffusion.data.component.feature.atom_synthon_completion(atom)

Synthon completion atom feature.

Features:

GetSymbol(): one-hot embedding for the atomic symbol

GetTotalNumHs(): one-hot embedding for the total number of Hs (explicit and implicit) on the atom

GetTotalDegree(): one-hot embedding for the degree of the atom in the molecule including Hs

IsInRing(): whether the atom is in a ring

IsInRingSize(3, 4, 5, 6): whether the atom is in a ring of a particular size

IsInRing() and not IsInRingSize(3, 4, 5, 6): whether the atom is in a ring but not in any ring of size 3, 4, 5, or 6
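
The three ring-related entries above can be read as six boolean flags. The sketch below is one interpretation, not the library's code: it takes a hypothetical list of the sizes of all rings containing the atom (the real function receives an RDKit atom) and derives the flags from that list.

```python
def ring_flags_sketch(ring_sizes):
    """ring_sizes: sizes of all rings that contain the atom (empty if acyclic)."""
    sizes = set(ring_sizes)
    in_ring = bool(sizes)
    # One flag per ring size named in the feature list.
    in_size = [n in sizes for n in (3, 4, 5, 6)]
    # In a ring, yet in no ring of size 3 to 6 (for example, only a macrocycle).
    in_other = in_ring and not any(in_size)
    return [in_ring, *in_size, in_other]


benzene_carbon = ring_flags_sketch([6])
macrocycle_atom = ring_flags_sketch([12])
chain_atom = ring_flags_sketch([])
```

Under this reading, the last flag is set only for ring atoms that none of the four size checks cover.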

MolecularDiffusion.data.component.feature.atom_topological(z, coords)

MolecularDiffusion.data.component.feature.bond_default(bond)

Default bond feature.

Features:

GetBondType(): one-hot embedding for the type of the bond

GetBondDir(): one-hot embedding for the direction of the bond

GetStereo(): one-hot embedding for the stereo configuration of the bond

GetIsConjugated(): whether the bond is considered to be conjugated

MolecularDiffusion.data.component.feature.bond_length(bond)

Bond length in the molecular conformation.

Note that computing the conformation can take a long time for large molecules.
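
Once a 3D conformation is available, a bond length is the Euclidean distance between the positions of the two bonded atoms. The sketch below shows only that final step, with coordinates passed in directly; the helper name bond_length_sketch is hypothetical, and how the library generates or retrieves the conformation is not covered here.

```python
import math


def bond_length_sketch(begin_xyz, end_xyz):
    """Euclidean distance between two 3D atom positions."""
    return math.dist(begin_xyz, end_xyz)


# Two atoms placed 1.2 units apart along the x axis (illustrative values).
length = bond_length_sketch((0.0, 0.0, 0.0), (1.2, 0.0, 0.0))
```

The distance itself is cheap to compute; the cost mentioned in the note comes from obtaining the conformation.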

MolecularDiffusion.data.component.feature.bond_pretrain(bond)

Bond feature for pretraining.

Features:

GetBondType(): one-hot embedding for the type of the bond

GetBondDir(): one-hot embedding for the direction of the bond

MolecularDiffusion.data.component.feature.bond_property_prediction(bond)

Property prediction bond feature.

Features:

GetBondType(): one-hot embedding for the type of the bond

GetIsConjugated(): whether the bond is considered to be conjugated

IsInRing(): whether the bond is in a ring

MolecularDiffusion.data.component.feature.molecule_default(mol)

Default molecule feature.

MolecularDiffusion.data.component.feature.onehot(x, vocab, allow_unknown=False)

MolecularDiffusion.data.component.feature.ECFP