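MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional
Attributes
- logger
Classes
- UnconditionalLMDBDataset: Adapted from Charlie Harris' code, originally based on a PyG dataset of protein-ligand complexes for experimental structures found in the CrossDocked2020 dataset.
Module Contents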
- class MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional.UnconditionalLMDBDataset(data_dir: str, split: str, add_random_rotation: bool = True, add_random_permutation: bool = True, reorder_to_smiles_order: bool = False, remove_hydrogens: bool = True, single_sample: bool = False, limit_samples: int = None, lmdb_dir: str = None)
Bases: MolecularDiffusion.modules.models.tabasco.data.components.lmdb_base.BaseLMDBDataset
Unconditional ligand dataset backed by an LMDB file. Adapted from Charlie Harris' code, which was originally based on a PyG dataset of protein-ligand complexes for experimental structures found in the CrossDocked2020 dataset.
Initialize the dataset.
- Parameters:
data_dir – Path to the serialized list of (Optional[Protein], rdkit.Chem.Mol) tuples.
split – Name of the dataset split.
add_random_rotation – Randomly rotate ligand coordinates on access.
add_random_permutation – Randomly permute atom order on access.
reorder_to_smiles_order – Sort atoms to match canonical SMILES before tensor conversion.
remove_hydrogens – Strip explicit hydrogens prior to processing.
single_sample – When True, always return the first entry (useful for debugging).
limit_samples – Optional cap on the number of molecules processed.
lmdb_dir – Directory in which the LMDB and stats YAML will be stored.
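The two augmentation flags, add_random_rotation and add_random_permutation, can be illustrated with a small standalone sketch. This is not the class's own implementation: the helper names random_rotation_matrix and augment are hypothetical, the rotation is assumed to be about the origin, and plain Python lists stand in for the tensors the dataset actually returns. It only shows the general idea that a rigid rotation and an atom reordering leave the molecule's geometry and composition unchanged.

```python
import math
import random


def random_rotation_matrix(rng):
    """Hypothetical helper: build a 3x3 rotation matrix from a random unit quaternion."""
    # Normalizing four independent Gaussian draws gives a uniformly
    # distributed unit quaternion, hence a uniformly random rotation.
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def augment(coords, atom_types, rng, rotate=True, permute=True):
    """Hypothetical helper: apply a random rotation and/or atom permutation.

    coords is a list of [x, y, z] positions and atom_types a parallel list
    of labels. New lists are returned; the inputs are left untouched.
    """
    if rotate:
        rot = random_rotation_matrix(rng)
        # Rotate every position about the origin (an assumption of this sketch).
        coords = [
            [sum(rot[i][j] * p[j] for j in range(3)) for i in range(3)]
            for p in coords
        ]
    if permute:
        # Apply the same shuffled order to positions and labels so that
        # each atom keeps its own coordinates.
        order = list(range(len(coords)))
        rng.shuffle(order)
        coords = [coords[k] for k in order]
        atom_types = [atom_types[k] for k in order]
    return coords, atom_types
```

A seeded random.Random instance passed as rng makes the augmentation reproducible, which can help when debugging alongside single_sample=True.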
- compute_stats()
Compute the dataset's summary statistics and write them to disk.
- get_stats()
Return the dataset statistics.
- add_random_permutation = True
- add_random_rotation = True
- all_smiles = []
- data_dir
- db = None
- keys = None
- mol_converter
- mol_num_atoms_list = []
- remove_hydrogens = True
- reorder_to_smiles_order = False
- split
- MolecularDiffusion.modules.models.tabasco.data.components.lmdb_unconditional.logger