MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional

Attributes

logger

Classes

UnconditionalLMDBDataset

Adapted from Charlie Harris' code, originally based on a PyG dataset of protein-ligand complexes.

Module Contents

class MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional.UnconditionalLMDBDataset(data_dir: str, split: str, add_random_rotation: bool = True, add_random_permutation: bool = True, reorder_to_smiles_order: bool = False, remove_hydrogens: bool = True, single_sample: bool = False, limit_samples: int = None, lmdb_dir: str = None)

Bases: MolecularDiffusion.modules.models.tabasco.data.components.lmdb_base.BaseLMDBDataset

Unconditional ligand dataset backed by an LMDB file. Adapted from Charlie Harris’ code, which was originally based on a PyG dataset of protein-ligand complexes for experimental structures found in the CrossDocked2020 dataset.

Initialize the dataset.

Parameters:
  • data_dir – Path to the serialized list of (Optional[Protein], rdkit.Chem.Mol) tuples.

  • split – Name of the dataset split.

  • add_random_rotation – Randomly rotate ligand coordinates on access.

  • add_random_permutation – Randomly permute atom order on access.

  • reorder_to_smiles_order – Sort atoms to match canonical SMILES before tensor conversion.

  • remove_hydrogens – Strip explicit hydrogens prior to processing.

  • single_sample – When True, always return the first entry (useful for debugging).

  • limit_samples – Optional cap on the number of molecules processed.

  • lmdb_dir – Directory in which the LMDB and stats YAML will be stored.
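
The add_random_rotation and add_random_permutation flags describe augmentations applied on access. The sketch below illustrates the general idea in plain Python and is not the class's actual implementation: the helper names random_rotation_matrix and augment, and the list-based representation of coordinates and atom types, are assumptions made for illustration only.

```python
import math
import random


def random_rotation_matrix(rng):
    """Sample a 3x3 rotation matrix from a random unit quaternion."""
    # Normalizing four Gaussian samples gives a uniform point on the 3-sphere.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, atom_types, rng, rotate=True, permute=True):
    """Return coords and atom_types after optional rotation and permutation."""
    if rotate:
        rot = random_rotation_matrix(rng)
        # Multiply every position by the same rotation matrix.
        coords = [
            [sum(rot[i][k] * p[k] for k in range(3)) for i in range(3)]
            for p in coords
        ]
    if permute:
        order = list(range(len(coords)))
        rng.shuffle(order)
        # Apply one shared order so each atom keeps its own position.
        coords = [coords[i] for i in order]
        atom_types = [atom_types[i] for i in order]
    return coords, atom_types


# Hypothetical three-atom fragment used only to exercise the helpers.
rng = random.Random(0)
coords = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.8, 1.1, 0.0]]
atom_types = ["C", "N", "O"]
new_coords, new_types = augment(coords, atom_types, rng)
```

Both operations leave interatomic distances and the set of atom types unchanged, which is why they are commonly used as augmentations for 3D molecular data.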

compute_stats()

Populate summary statistics and write to disk.
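
The page does not say which statistics are computed. As a rough sketch only, the example below assumes they are derived from the per-molecule atom counts and SMILES strings (compare the mol_num_atoms_list and all_smiles attributes). The function names summarize and write_stats and every key in the result are hypothetical, and JSON stands in for the YAML file mentioned under lmdb_dir to keep the example dependency-free.

```python
import json
import statistics
import tempfile
from collections import Counter
from pathlib import Path


def summarize(mol_num_atoms_list, all_smiles):
    """Build a small summary dict; all key names here are hypothetical."""
    counts = Counter(mol_num_atoms_list)
    return {
        "num_molecules": len(mol_num_atoms_list),
        "num_unique_smiles": len(set(all_smiles)),
        "max_num_atoms": max(mol_num_atoms_list),
        "mean_num_atoms": statistics.fmean(mol_num_atoms_list),
        # String keys so the histogram survives serialization unchanged.
        "num_atoms_histogram": {str(k): v for k, v in sorted(counts.items())},
    }


def write_stats(stats, path):
    """Write the summary to disk (JSON here; the documented class uses YAML)."""
    Path(path).write_text(json.dumps(stats, indent=2))


# Toy data: heavy-atom counts matching the SMILES strings.
smiles = ["CCO", "CCCCO", "CCCCO", "CCCCCCCCO"]
num_atoms = [3, 5, 5, 9]
stats = summarize(num_atoms, smiles)

with tempfile.TemporaryDirectory() as tmp:
    stats_path = Path(tmp) / "stats.json"  # hypothetical file name
    write_stats(stats, stats_path)
    reloaded = json.loads(stats_path.read_text())
```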

get_data_dict(index: int) → tensordict.TensorDict

Return the raw LMDB entry (no tensor conversion).

get_stats()

Get the dataset statistics.
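
A minimal usage sketch, assuming the package is installed and the constructor behaves as documented in the signature above. The file paths and the split name "train" are hypothetical placeholders, and build_dataset and debug_kwargs are illustrative helpers that are not part of the module.

```python
# Keyword arguments mirror the documented constructor signature.
DATASET_KWARGS = {
    "data_dir": "data/train_molecules.pt",  # hypothetical path
    "split": "train",  # assumed split name
    "add_random_rotation": True,
    "add_random_permutation": True,
    "reorder_to_smiles_order": False,
    "remove_hydrogens": True,
    "single_sample": False,
    "limit_samples": None,
    "lmdb_dir": "data/lmdb",  # hypothetical directory
}


def debug_kwargs():
    """Settings for a quick smoke test: one fixed molecule, no augmentation."""
    return {
        **DATASET_KWARGS,
        "single_sample": True,
        "add_random_rotation": False,
        "add_random_permutation": False,
    }


def build_dataset(**overrides):
    """Construct the dataset; the import is deferred until the call."""
    from MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional import (
        UnconditionalLMDBDataset,
    )

    return UnconditionalLMDBDataset(**{**DATASET_KWARGS, **overrides})
```

With real data in place, build_dataset() returns an instance on which get_data_dict(0) and get_stats() can be called as listed above; build_dataset(**debug_kwargs()) gives a deterministic single-molecule variant for debugging.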

add_random_permutation = True
add_random_rotation = True
all_smiles = []
data_dir
db = None
keys = None
mol_converter
mol_num_atoms_list = []
remove_hydrogens = True
reorder_to_smiles_order = False
split
MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional.logger