MolecularDiffusion.runmodes.train.data
Classes
DataModule: Loads a dataset, optionally restoring it from or caching it to a pickle file, and splits it for diffusion or predictive tasks.
Module Contents
- class MolecularDiffusion.runmodes.train.data.DataModule(root: str, filename: str, task_type: str, atom_vocab: list, with_hydrogen: bool, node_feature_choice=None, max_atom: int = 200, target_fields: list = None, xyz_dir: str = None, coord_file: str = None, natoms_file: str = None, ase_db_path: str = None, forbidden_atom: list = None, data_efficient_collator: bool = False, train_ratio: float = 0.8, load_pkl: str = None, save_pkl: str = None, data_type: str = 'pointcloud', allow_unknown: bool = False, batch_size: int = 32, num_workers: int = 0, dataset_name: str = 'suisei', edge_type: str = 'fully_connected', radius: float = 4.0, n_neigh: int = 5, use_ohe_feature: bool = True)
Loads a dataset, optionally restoring it from or caching it to a pickle file, and splits it for diffusion or predictive tasks.
- Usage:
    # atom_vocab_list is a list defined by the caller.
    module = DataModule(
        root="data/",  # illustrative path; root is required by the signature above
        filename="data.pkl",
        task_type="diffusion",
        atom_vocab=atom_vocab_list,
        with_hydrogen=True,
        node_feature_choice="geom",
        max_atom=50,
        xyz_dir="xyz/",
        coord_file="coords.csv",
        natoms_file="natoms.csv",
        ase_db_path=None,
        forbidden_atom=None,
        data_efficient_collator=False,
        train_ratio=0.8,
        load_pkl=None,  # path to load dataset pickle
        save_pkl="cached_dataset.pkl",  # path to save dataset pickle
    )
    module.load()
    train_ds, valid_ds, test_ds = module.train_set, module.valid_set, module.test_set
- load()
Load the dataset (from a pickle if one is available), optionally save it as a pickle, then split it into train/valid/test sets.
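The load, cache, and split flow described for load() can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `load_or_build`, `split_dataset`, the `seed` argument, and the even validation/test division of the remainder are all assumptions, since the source only states that the data is split into train/valid/test and that `train_ratio` defaults to 0.8.

```python
import os
import pickle
import random
import tempfile


def load_or_build(build_fn, load_pkl=None, save_pkl=None):
    """Hypothetical helper: read the dataset from load_pkl if that file
    exists, otherwise build it and optionally write it to save_pkl."""
    if load_pkl is not None and os.path.exists(load_pkl):
        with open(load_pkl, "rb") as fh:
            return pickle.load(fh)
    dataset = build_fn()
    if save_pkl is not None:
        with open(save_pkl, "wb") as fh:
            pickle.dump(dataset, fh)
    return dataset


def split_dataset(dataset, train_ratio=0.8, seed=0):
    """Hypothetical helper: shuffle indices, then cut into three parts.

    Assumption: whatever is left after the training share is divided
    evenly between validation and test.
    """
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    n_train = int(train_ratio * len(indices))
    n_valid = (len(indices) - n_train) // 2
    train = [dataset[i] for i in indices[:n_train]]
    valid = [dataset[i] for i in indices[n_train:n_train + n_valid]]
    test = [dataset[i] for i in indices[n_train + n_valid:]]
    return train, valid, test


# Stand-in "dataset": 100 integers instead of molecules.
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "cached_dataset.pkl")
    records = load_or_build(lambda: list(range(100)), save_pkl=cache)
    # Second call finds the pickle, so the builder is never invoked.
    cached = load_or_build(lambda: [], load_pkl=cache)

train_set, valid_set, test_set = split_dataset(cached, train_ratio=0.8)
print(len(train_set), len(valid_set), len(test_set))
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when a cached pickle is reused between experiments.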
- allow_unknown = False
- ase_db_path = None
- atom_vocab
- batch_size = 32
- coord_file = None
- data_efficient_collator = False
- data_type = ''
- dataset_name = 'suisei'
- edge_type = 'fully_connected'
- filename
- forbidden_atom = None
- load_pkl = None
- max_atom = 200
- n_neigh = 5
- natoms_file = None
- node_feature_choice = None
- num_workers = 0
- radius = 4.0
- root
- root_path
- save_pkl = None
- target_fields = None
- task_type
- test_set = None
- train_ratio = 0.8
- train_set = None
- use_ohe_feature = True
- valid_set = None
- with_hydrogen
- xyz_dir = None
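The source does not document how `edge_type`, `radius`, and `n_neigh` are interpreted. As a rough guide, the sketch below shows three common ways of building edges over a point cloud that parameters with these names usually select between: all ordered pairs, pairs within a distance cutoff, and each atom's nearest neighbours. The function names are hypothetical, and only 'fully_connected' is a value confirmed by the signature above.

```python
import math
from itertools import permutations


def fully_connected_edges(n_atoms):
    """Every ordered pair of distinct atoms."""
    return list(permutations(range(n_atoms), 2))


def radius_edges(coords, radius=4.0):
    """Ordered pairs whose Euclidean distance is at most `radius`."""
    return [
        (i, j)
        for i, j in permutations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= radius
    ]


def knn_edges(coords, n_neigh=5):
    """Edges from each atom to its `n_neigh` closest other atoms."""
    edges = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        edges.extend((i, j) for j in others[:n_neigh])
    return edges


# Three atoms: two close together, one far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
full = fully_connected_edges(len(coords))
near = radius_edges(coords, radius=4.0)
knn = knn_edges(coords, n_neigh=1)
print(len(full), near, knn)
```

A fully connected graph grows quadratically with the atom count, so with `max_atom = 200` a cutoff or neighbour limit can reduce the edge count considerably.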