MolecularDiffusion.runmodes.data.augmentation
Data augmentation module for MolCraft. Handles random charge, distortion, and size augmentation.
Attributes
| CSV_FIELDS |
| CUTOFF |
| SCALE_FACTOR |
| logger |
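Functions
| add_h_unit(atoms, coords) | Adds a single hydrogen to a non-halogen heteroatom. |
| augment_charge(input_source, output_target[, ...]) | Main function for random charge augmentation via H addition/removal + XTB re-optimization. |
| augment_distortion(input_source, output_target[, ...]) | Applies random distortion to atomic coordinates. |
| augment_size(input_db, output_db, s_start, t_start, ...) | Augments DB by adding rotated/inverted versions to balance molecule sizes. |
| correct_edges(data[, scale_factor]) | Corrects edges based on covalent radii. |
| create_pyg_graph(cartesian_coordinates_tensor, ...) | Creates a PyTorch Geometric graph. |
| get_electronegativity(element_symbol) | |
| get_num_graphs(data) | Calculates number of connected components. |
| get_target_count(s, s_start, t_start, s_end, t_end[, ...]) | |
| read_xyz(filename) | |
| remove_h_unit(atoms, coords) | Removes a single hydrogen. |
| write_xyz(filename, atoms, coords) | |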
Module Contents
- MolecularDiffusion.runmodes.data.augmentation.add_h_unit(atoms: List[str], coords: numpy.ndarray) → Tuple[List[str], numpy.ndarray, int] | None
Adds a single hydrogen to a non-halogen heteroatom.
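The selection rule in that description can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the name `add_h_sketch`, the 1.0 Å bond length, the placement of the new hydrogen away from the molecular centroid, and the choice to return the host atom index as the third tuple element (the meaning of the `int` in the real return type is not documented on this page).

```python
import math
import random

HALOGENS = {"F", "Cl", "Br", "I", "At"}
EXCLUDED = {"C", "H"} | HALOGENS  # heteroatom = not C/H; halogens are skipped


def add_h_sketch(atoms, coords, bond_length=1.0, rng=None):
    """Attach one H to a randomly chosen non-halogen heteroatom.

    Hypothetical helper, not the library function. Returns
    (new_atoms, new_coords, host_index), or None if there is no candidate.
    """
    rng = rng or random.Random()
    candidates = [i for i, sym in enumerate(atoms) if sym not in EXCLUDED]
    if not candidates:
        return None
    host = rng.choice(candidates)

    # Assumption: place the new H on the line from the centroid through the host.
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    direction = [coords[host][k] - centroid[k] for k in range(3)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm < 1e-8:  # host sits on the centroid: fall back to +z
        direction, norm = [0.0, 0.0, 1.0], 1.0

    new_h = tuple(coords[host][k] + bond_length * direction[k] / norm for k in range(3))
    return list(atoms) + ["H"], [tuple(c) for c in coords] + [new_h], host
```

A real implementation would likely also avoid steric clashes and respect valence; the sketch only shows candidate filtering and the `None` fallback.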
- MolecularDiffusion.runmodes.data.augmentation.augment_charge(input_source: str, output_target: str, max_h_change: int = 1, fraction: float = 1.0, seed: int = None, is_db_mode: bool = False)
Main function for random charge augmentation via H addition/removal + XTB re-optimization.
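As a rough illustration of the bookkeeping such a workflow needs, the sketch below picks a random non-zero hydrogen change within `max_h_change` and builds an xtb geometry-optimization command for the resulting charge. The helper names are hypothetical, and the rule "each added H counts as +1, each removed H as -1" is an assumption (it treats the hydrogens as protons); the page does not state how the library computes the new charge or how it invokes xtb. The `--opt` and `--chrg` flags are standard xtb command-line options.

```python
import random


def plan_h_change(max_h_change=1, seed=None):
    """Pick a non-zero number of hydrogens to add (>0) or remove (<0)."""
    if max_h_change < 1:
        raise ValueError("max_h_change must be at least 1")
    rng = random.Random(seed)
    choices = [d for d in range(-max_h_change, max_h_change + 1) if d != 0]
    return rng.choice(choices)


def xtb_opt_command(xyz_path, total_charge):
    """Build (but do not run) an xtb optimization command line."""
    return ["xtb", str(xyz_path), "--opt", "--chrg", str(total_charge)]


# Assumption: hydrogens are added/removed as protons, so charge shifts by delta.
base_charge = 0
delta = plan_h_change(max_h_change=1, seed=7)
command = xtb_opt_command("mol_aug.xyz", base_charge + delta)
```

The command list could be passed to `subprocess.run` once xtb is installed; it is left unexecuted here so the sketch runs anywhere.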
- MolecularDiffusion.runmodes.data.augmentation.augment_distortion(input_source: pathlib.Path, output_target: pathlib.Path, sigma: float = 0.1, fraction: float = 1.0, is_db_mode: bool = None, freeze_indices: List[int] = None)
Applies random distortion to atomic coordinates. Also calculates graph connectivity.
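One plausible reading of `sigma` and `freeze_indices` is isotropic Gaussian noise added to every coordinate except those of the frozen atoms. The sketch below shows that reading in plain Python; the function name `distort` and the `seed` argument are illustrative assumptions, and the library may draw or scale its noise differently.

```python
import random


def distort(coords, sigma=0.1, freeze_indices=None, seed=None):
    """Add N(0, sigma^2) noise to each coordinate of every non-frozen atom."""
    rng = random.Random(seed)
    frozen = set(freeze_indices or [])
    distorted = []
    for i, (x, y, z) in enumerate(coords):
        if i in frozen:
            # Frozen atoms keep their original position.
            distorted.append((x, y, z))
        else:
            distorted.append((
                x + rng.gauss(0.0, sigma),
                y + rng.gauss(0.0, sigma),
                z + rng.gauss(0.0, sigma),
            ))
    return distorted
```

With `sigma` in ångström, the default of 0.1 would correspond to displacements that are small relative to typical bond lengths, assuming the coordinates are in ångström as in XYZ files.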
- MolecularDiffusion.runmodes.data.augmentation.augment_size(input_db: pathlib.Path, output_db: pathlib.Path, s_start: int, t_start: int, s_end: int, t_end: int, strength: float = 1.0, decay_power: float = 1.0, invert_prob: float = 0.5, plot_prefix: pathlib.Path = None)
Augments DB by adding rotated/inverted versions to balance molecule sizes.
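The geometric part of this augmentation can be sketched independently of the database logic: apply a random rotation and, with probability `invert_prob`, a point inversion through the origin. Both operations preserve all interatomic distances; inversion additionally maps a chiral molecule to its mirror image. The helper names and the quaternion-based sampling below are assumptions for illustration, not the library's code.

```python
import math
import random


def random_rotation_matrix(rng):
    """Uniform random 3x3 rotation from a normalized Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / n for v in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate_invert(coords, invert_prob=0.5, rng=None):
    """Rotate all atoms by one random rotation; optionally invert (r -> -r)."""
    rng = rng or random.Random()
    rot = random_rotation_matrix(rng)
    sign = -1.0 if rng.random() < invert_prob else 1.0
    return [
        tuple(sign * sum(rot[r][k] * c[k] for k in range(3)) for r in range(3))
        for c in coords
    ]
```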
- MolecularDiffusion.runmodes.data.augmentation.correct_edges(data, scale_factor=1.3)
Corrects edges based on covalent radii.
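A common covalent-radius criterion treats two atoms as bonded when their distance is at most `scale_factor` times the sum of their covalent radii. The sketch below applies that criterion to plain lists; it is an assumption that `correct_edges` uses this exact rule, and the small radii table (values in ångström from the commonly used Cordero et al. set) covers only four elements for brevity.

```python
import math

# Covalent radii in angstrom; a deliberately tiny table for illustration.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bonded_pairs(atoms, coords, scale_factor=1.3):
    """Return (i, j) pairs with i < j whose distance passes the radius test."""
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            cutoff = scale_factor * (COVALENT_RADII[atoms[i]] + COVALENT_RADII[atoms[j]])
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Water: both O-H contacts pass the test, the H-H contact does not.
water_atoms = ["O", "H", "H"]
water_coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
water_bonds = bonded_pairs(water_atoms, water_coords)
```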
- MolecularDiffusion.runmodes.data.augmentation.create_pyg_graph(cartesian_coordinates_tensor, atomic_numbers_tensor, r=2.5)
Creates a PyTorch Geometric graph.
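The `r` parameter suggests a radius graph: atoms closer than a cutoff are connected. The following sketch builds such an edge list in the COO layout that PyTorch Geometric uses for `edge_index` (two rows, source and target, with each undirected edge stored in both directions). It uses plain lists instead of tensors so it runs without PyTorch; the function name and the inclusive `<=` comparison are assumptions.

```python
import math


def radius_edge_index(coords, r=2.5):
    """Return [sources, targets] for all ordered pairs within distance r."""
    sources, targets = [], []
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= r:
                sources.append(i)
                targets.append(j)
    return [sources, targets]


# Three atoms on a line at x = 0, 2 and 5: only the first pair is within 2.5.
line_edges = radius_edge_index([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)])
```

In a PyG setting the two lists would be wrapped in a `torch.long` tensor of shape `[2, num_edges]`.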
- MolecularDiffusion.runmodes.data.augmentation.get_electronegativity(element_symbol)
- MolecularDiffusion.runmodes.data.augmentation.get_num_graphs(data)
Calculates number of connected components.
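Counting connected components tells you whether a structure is a single molecule or has fragmented, for example after distortion. A minimal union-find sketch over a COO-style edge list is shown below; the function name and the list-based input are assumptions, since the library function takes a graph `data` object whose exact type is not documented here.

```python
def count_components(num_nodes, edge_index):
    """Count connected components given [sources, targets] edge lists."""
    parent = list(range(num_nodes))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in zip(edge_index[0], edge_index[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each remaining root represents one component.
    return sum(1 for i in range(num_nodes) if find(i) == i)
```

For four atoms with a single bond between atoms 0 and 1, the count is three: the bonded pair plus two isolated atoms.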
- MolecularDiffusion.runmodes.data.augmentation.get_target_count(s, s_start, t_start, s_end, t_end, decay_power=1.0)
- MolecularDiffusion.runmodes.data.augmentation.read_xyz(filename) → Tuple[List[str], numpy.ndarray]
- MolecularDiffusion.runmodes.data.augmentation.remove_h_unit(atoms: List[str], coords: numpy.ndarray) → Tuple[List[str], numpy.ndarray, int] | None
Removes a single hydrogen.
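A dependency-free sketch of this operation is given below. It picks one hydrogen at random and drops it from both the symbol list and the coordinate list. The name `remove_h_sketch`, the uniform random choice, and the use of the removed index as the third tuple element are all assumptions; the page does not say which hydrogen the library removes or what the returned `int` means.

```python
import random


def remove_h_sketch(atoms, coords, rng=None):
    """Drop one randomly chosen H; return None if the molecule has none."""
    rng = rng or random.Random()
    hydrogen_indices = [i for i, sym in enumerate(atoms) if sym == "H"]
    if not hydrogen_indices:
        return None
    drop = rng.choice(hydrogen_indices)
    new_atoms = [sym for i, sym in enumerate(atoms) if i != drop]
    new_coords = [tuple(c) for i, c in enumerate(coords) if i != drop]
    return new_atoms, new_coords, drop
```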
- MolecularDiffusion.runmodes.data.augmentation.write_xyz(filename, atoms, coords)
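Neither `read_xyz` nor `write_xyz` carries a docstring on this page. For orientation, the standard XYZ layout is: the atom count on line 1, a free-form comment on line 2, then one `symbol x y z` line per atom. The round-trip sketch below follows that layout with hypothetical helpers; unlike `read_xyz`, which is annotated as returning a `numpy.ndarray`, it returns a list of tuples to stay dependency-free.

```python
import os
import tempfile


def write_xyz_sketch(filename, atoms, coords, comment=""):
    """Write atoms and coordinates in the standard XYZ layout."""
    with open(filename, "w") as handle:
        handle.write(f"{len(atoms)}\n{comment}\n")
        for symbol, (x, y, z) in zip(atoms, coords):
            handle.write(f"{symbol} {x:.6f} {y:.6f} {z:.6f}\n")


def read_xyz_sketch(filename):
    """Read an XYZ file; returns (symbols, list of (x, y, z) tuples)."""
    with open(filename) as handle:
        lines = handle.read().splitlines()
    count = int(lines[0])
    atoms, coords = [], []
    for line in lines[2:2 + count]:  # skip the count and comment lines
        symbol, x, y, z = line.split()[:4]
        atoms.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return atoms, coords


def roundtrip(atoms, coords):
    """Write to a temporary file and read the result back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mol.xyz")
        write_xyz_sketch(path, atoms, coords)
        return read_xyz_sketch(path)
```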
- MolecularDiffusion.runmodes.data.augmentation.CSV_FIELDS = ['filename', 'total_charge', 'num_atoms', 'num_graph']
- MolecularDiffusion.runmodes.data.augmentation.CUTOFF = 3
- MolecularDiffusion.runmodes.data.augmentation.SCALE_FACTOR = 1.2
- MolecularDiffusion.runmodes.data.augmentation.logger
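The `CSV_FIELDS` list above reads like the header of a per-structure metadata file. As a hedged sketch of how such a file could be produced with the standard library, the snippet below writes rows keyed by those four field names. The helper `rows_to_csv` and the sample row are illustrative assumptions; the page does not show how or where the library writes this CSV.

```python
import csv
import io

# Copied from the module attribute listed above.
CSV_FIELDS = ["filename", "total_charge", "num_atoms", "num_graph"]


def rows_to_csv(rows):
    """Serialize dict rows to CSV text with CSV_FIELDS as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record for one augmented structure.
sample = rows_to_csv([
    {"filename": "mol_0.xyz", "total_charge": 1, "num_atoms": 12, "num_graph": 1},
])
```

A `num_graph` value above 1 would flag a structure whose bond graph has split into several fragments, assuming the field stores the connected-component count.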