MolecularDiffusion.runmodes.train.data

Classes

DataModule

DataModule that loads a dataset, optionally saving it to or loading it from a pickle, and splits it for diffusion or predictive tasks.

Module Contents

class MolecularDiffusion.runmodes.train.data.DataModule(root: str, filename: str, task_type: str, atom_vocab: list, with_hydrogen: bool, node_feature_choice=None, max_atom: int = 200, target_fields: list = None, xyz_dir: str = None, coord_file: str = None, natoms_file: str = None, ase_db_path: str = None, forbidden_atom: list = None, data_efficient_collator: bool = False, train_ratio: float = 0.8, load_pkl: str = None, save_pkl: str = None, data_type: str = 'pointcloud', allow_unknown: bool = False, batch_size: int = 32, num_workers: int = 0, dataset_name: str = 'suisei', edge_type: str = 'fully_connected', radius: float = 4.0, n_neigh: int = 5, use_ohe_feature: bool = True)

DataModule that loads a dataset, optionally saving it to or loading it from a pickle, and splits it into train/valid/test sets for diffusion or predictive tasks.

Usage:

```python
from MolecularDiffusion.runmodes.train.data import DataModule

module = DataModule(
    root="data/",  # required (no default in the signature); placeholder path
    filename="data.pkl",
    task_type="diffusion",
    atom_vocab=atom_vocab_list,  # list of allowed atom types, defined beforehand
    with_hydrogen=True,
    node_feature_choice="geom",
    max_atom=50,
    xyz_dir="xyz/",
    coord_file="coords.csv",
    natoms_file="natoms.csv",
    ase_db_path=None,
    forbidden_atom=None,
    data_efficient_collator=False,
    train_ratio=0.8,
    load_pkl=None,  # path to load dataset pickle
    save_pkl="cached_dataset.pkl",  # path to save dataset pickle
)
module.load()
train_ds, valid_ds, test_ds = module.train_set, module.valid_set, module.test_set
```
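The usage example passes atom_vocab, with_hydrogen, max_atom and forbidden_atom, and the signature also exposes allow_unknown and use_ohe_feature. This page does not say how these options are applied, so the following is only a minimal sketch of one plausible per-molecule filter plus one-hot atom encoding, assuming atoms are given as element symbols. The helpers filter_molecule and one_hot are hypothetical and are not part of MolecularDiffusion.

```python
from typing import List, Optional, Sequence


def filter_molecule(
    symbols: Sequence[str],
    atom_vocab: List[str],
    with_hydrogen: bool,
    max_atom: int = 200,
    forbidden_atom: Optional[List[str]] = None,
    allow_unknown: bool = False,
) -> Optional[List[str]]:
    """Hypothetical filter: return the kept atom symbols, or None to reject the molecule."""
    # Drop explicit hydrogens when the dataset is built without them.
    kept = [s for s in symbols if with_hydrogen or s != "H"]
    # Reject molecules above the size cap.
    if len(kept) > max_atom:
        return None
    # Reject molecules containing any explicitly forbidden element.
    if forbidden_atom and any(s in forbidden_atom for s in kept):
        return None
    # Reject elements outside the vocabulary unless unknown atoms are allowed.
    if not allow_unknown and any(s not in atom_vocab for s in kept):
        return None
    return kept


def one_hot(symbols: Sequence[str], atom_vocab: List[str]) -> List[List[int]]:
    """Hypothetical one-hot encoding: one row per atom, one column per vocabulary entry."""
    index = {s: i for i, s in enumerate(atom_vocab)}
    return [[1 if index[s] == j else 0 for j in range(len(atom_vocab))] for s in symbols]
```

For example, with atom_vocab = ["H", "C", "N", "O"] and with_hydrogen=False, methane (["C", "H", "H", "H", "H"]) is reduced to ["C"], whereas a molecule containing "Cl" is rejected unless allow_unknown=True.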

load()

Load the dataset (from the load_pkl pickle, if one is provided), optionally save it to save_pkl, then split it into train/valid/test sets, stored in train_set, valid_set and test_set.
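The load() flow described above (cache lookup, optional save, split) can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the actual implementation: it assumes the dataset fits in memory as a sequence, that a seeded shuffle precedes the split, and that the fraction not assigned by train_ratio is divided evenly between validation and test. The names load_and_split, build_dataset and seed are hypothetical.

```python
import os
import pickle
import random
from typing import Callable, Optional, Sequence, Tuple


def load_and_split(
    build_dataset: Callable[[], Sequence],  # hypothetical: builds the dataset from raw files
    load_pkl: Optional[str] = None,
    save_pkl: Optional[str] = None,
    train_ratio: float = 0.8,
    seed: int = 0,  # hypothetical: fixes the shuffle for reproducible splits
) -> Tuple[list, list, list]:
    # Reuse a cached pickle when a path is given and the file exists.
    if load_pkl and os.path.exists(load_pkl):
        with open(load_pkl, "rb") as f:
            data = list(pickle.load(f))
    else:
        data = list(build_dataset())
    # Optionally cache the dataset so later runs can skip parsing.
    if save_pkl:
        with open(save_pkl, "wb") as f:
            pickle.dump(data, f)
    # Shuffle indices reproducibly, then cut into train / valid / test.
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_train = int(train_ratio * len(data))
    n_valid = (len(data) - n_train) // 2  # assumption: remainder split evenly
    train = [data[i] for i in idx[:n_train]]
    valid = [data[i] for i in idx[n_train:n_train + n_valid]]
    test = [data[i] for i in idx[n_train + n_valid:]]
    return train, valid, test
```

Because the shuffle is seeded, reloading the cached pickle reproduces the same split, which keeps validation and test sets stable across runs.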

allow_unknown = False
ase_db_path = None
atom_vocab
batch_size = 32
coord_file = None
data_efficient_collator = False
data_type = ''
dataset_name = 'suisei'
edge_type = 'fully_connected'
filename
forbidden_atom = None
load_pkl = None
max_atom = 200
n_neigh = 5
natoms_file = None
node_feature_choice = None
num_workers = 0
radius = 4.0
root
root_path
save_pkl = None
target_fields = None
task_type
test_set = None
train_ratio = 0.8
train_set = None
use_ohe_feature = True
valid_set = None
with_hydrogen
xyz_dir = None
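The edge_type, radius and n_neigh attributes suggest that molecular graphs can be built either fully connected or from a distance-based neighbourhood. Only "fully_connected" is confirmed by this page; the "radius" and "knn" option names below are hypothetical, as is the helper build_edges. The sketch shows one common way such edges are derived from 3D coordinates, excluding self-loops.

```python
import math
from typing import List, Sequence, Tuple


def build_edges(
    pos: Sequence[Sequence[float]],
    edge_type: str = "fully_connected",
    radius: float = 4.0,
    n_neigh: int = 5,
) -> List[Tuple[int, int]]:
    """Hypothetical edge builder: return directed (source, target) atom index pairs."""
    n = len(pos)
    edges: List[Tuple[int, int]] = []
    for i in range(n):
        # Distances from atom i to every other atom (self-loops excluded).
        others = [(math.dist(pos[i], pos[j]), j) for j in range(n) if j != i]
        if edge_type == "fully_connected":
            chosen = [j for _, j in others]
        elif edge_type == "radius":  # hypothetical name: neighbours within `radius`
            chosen = [j for d, j in others if d <= radius]
        elif edge_type == "knn":  # hypothetical name: `n_neigh` nearest neighbours
            chosen = [j for _, j in sorted(others)[:n_neigh]]
        else:
            raise ValueError(f"unknown edge_type: {edge_type!r}")
        edges.extend((i, j) for j in chosen)
    return edges
```

A fully connected graph grows quadratically with atom count (n(n - 1) directed edges), which is why a radius or neighbour cutoff is the usual choice for larger molecules.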