MolecularDiffusion.runmodes.data.augmentation

Data augmentation module for MolCraft. Handles random charge, distortion, and size augmentation.

Attributes

CSV_FIELDS

CUTOFF

SCALE_FACTOR

logger

Functions

add_h_unit(→ Optional[Tuple[List[str], numpy.ndarray, ...)

Adds a single hydrogen to a non-halogen heteroatom.

augment_charge(input_source, output_target[, ...])

Main function for random charge augmentation via H addition/removal + XTB re-optimization.

augment_distortion(input_source, output_target[, ...])

Applies random distortion to atomic coordinates.

augment_size(input_db, output_db, s_start, t_start, ...)

Augments DB by adding rotated/inverted versions to balance molecule sizes.

correct_edges(data[, scale_factor])

Corrects edges based on covalent radii.

create_pyg_graph(cartesian_coordinates_tensor, ...[, r])

Creates a PyTorch Geometric graph.

get_electronegativity(element_symbol)

get_num_graphs(data)

Calculates number of connected components.

get_target_count(s, s_start, t_start, s_end, t_end[, ...])

read_xyz(→ Tuple[List[str], numpy.ndarray])

remove_h_unit(→ Optional[Tuple[List[str], ...)

Removes a single hydrogen.

write_xyz(filename, atoms, coords)

Module Contents

MolecularDiffusion.runmodes.data.augmentation.add_h_unit(atoms: List[str], coords: numpy.ndarray) → Tuple[List[str], numpy.ndarray, int] | None

Adds a single hydrogen to a non-halogen heteroatom.

MolecularDiffusion.runmodes.data.augmentation.augment_charge(input_source: str, output_target: str, max_h_change: int = 1, fraction: float = 1.0, seed: int = None, is_db_mode: bool = False)

Main function for random charge augmentation via H addition/removal + XTB re-optimization.

MolecularDiffusion.runmodes.data.augmentation.augment_distortion(input_source: pathlib.Path, output_target: pathlib.Path, sigma: float = 0.1, fraction: float = 1.0, is_db_mode: bool = None, freeze_indices: List[int] = None)

Applies random distortion to atomic coordinates. Also calculates graph connectivity.
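As an illustration of the distortion step, here is a minimal sketch. It assumes zero-mean Gaussian noise with standard deviation `sigma` is added to every coordinate and that atoms listed in `freeze_indices` keep their original positions; the source does not state the noise distribution. The helper name `distort_coords` and its `seed` argument are hypothetical, and plain lists stand in for the NumPy arrays the module works with.

```python
import random
from typing import List, Optional, Sequence


def distort_coords(
    coords: Sequence[Sequence[float]],
    sigma: float = 0.1,
    freeze_indices: Optional[Sequence[int]] = None,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: add Gaussian noise to each non-frozen atom."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    frozen = set(freeze_indices or [])
    distorted = []
    for i, xyz in enumerate(coords):
        if i in frozen:
            # Frozen atoms are copied through unchanged.
            distorted.append([float(v) for v in xyz])
        else:
            # Assumption: independent noise per Cartesian component.
            distorted.append([float(v) + rng.gauss(0.0, sigma) for v in xyz])
    return distorted


water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
noisy = distort_coords(water, sigma=0.1, freeze_indices=[0], seed=42)
```

With `freeze_indices=[0]` the oxygen atom stays at the origin while both hydrogens are displaced.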

MolecularDiffusion.runmodes.data.augmentation.augment_size(input_db: pathlib.Path, output_db: pathlib.Path, s_start: int, t_start: int, s_end: int, t_end: int, strength: float = 1.0, decay_power: float = 1.0, invert_prob: float = 0.5, plot_prefix: pathlib.Path = None)

Augments DB by adding rotated/inverted versions to balance molecule sizes.
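The "rotated/inverted versions" mentioned above can be sketched as a rigid rotation followed by an optional point inversion through the origin. This is an assumption-laden illustration only: `rotate_invert` and `random_rotation_matrix` are hypothetical names, and the sketch does not show how `strength`, `decay_power`, or the `s_*`/`t_*` arguments decide how many copies each molecule size receives.

```python
import math
import random
from typing import List, Optional, Sequence


def random_rotation_matrix(rng: random.Random) -> List[List[float]]:
    """Rotation by a random angle about a random unit axis (Rodrigues formula).

    Note: this is not uniformly distributed over all rotations; it is
    adequate for a sketch.
    """
    while True:
        axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm > 1e-12:
            break
    x, y, z = (a / norm for a in axis)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [t * x * x + c, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def rotate_invert(
    coords: Sequence[Sequence[float]],
    invert_prob: float = 0.5,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: rotate all atoms, then invert with probability invert_prob."""
    rng = random.Random(seed)
    rot = random_rotation_matrix(rng)
    sign = -1.0 if rng.random() < invert_prob else 1.0  # -1 mirrors through the origin
    return [
        [sign * sum(rot[r][k] * p[k] for k in range(3)) for r in range(3)]
        for p in coords
    ]


mol = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.3]]
augmented = rotate_invert(mol, invert_prob=0.5, seed=0)
```

Both operations preserve every interatomic distance, so the augmented copy has the same size and connectivity as the original; inversion additionally flips chirality.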

MolecularDiffusion.runmodes.data.augmentation.correct_edges(data, scale_factor=1.3)

Corrects edges based on covalent radii.
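One plausible reading of this correction, offered as a sketch rather than the module's actual logic: keep an edge between atoms i and j only if their distance is at most `scale_factor` times the sum of their covalent radii. The helper `filter_edges_by_covalent_radii` is hypothetical, it operates on plain lists instead of a graph object, and the small radii table holds approximate literature values in ångström for four elements only.

```python
import math
from typing import List, Sequence, Tuple

# Approximate covalent radii in angstrom (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def filter_edges_by_covalent_radii(
    atoms: Sequence[str],
    coords: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    scale_factor: float = 1.3,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: drop edges longer than scale_factor * (r_i + r_j)."""
    kept = []
    for i, j in edges:
        limit = scale_factor * (COVALENT_RADII[atoms[i]] + COVALENT_RADII[atoms[j]])
        if math.dist(coords[i], coords[j]) <= limit:
            kept.append((i, j))
    return kept


atoms = ["O", "H", "H"]
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
candidate_edges = [(0, 1), (0, 2), (1, 2)]
bonds = filter_edges_by_covalent_radii(atoms, coords, candidate_edges)
```

For this water geometry the two O-H edges survive, while the H-H edge (about 1.52 Å against a limit of about 0.81 Å) is removed.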

MolecularDiffusion.runmodes.data.augmentation.create_pyg_graph(cartesian_coordinates_tensor, atomic_numbers_tensor, r=2.5)

Creates a PyTorch Geometric graph.
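The parameter `r` suggests a radius graph, in which two atoms are connected when they lie within a cutoff distance of each other. The sketch below assumes that interpretation and builds only the edge list, in the `[2, num_edges]` layout PyTorch Geometric uses for `edge_index`. It is written in plain Python with a hypothetical helper name, `radius_edge_index`, so it does not depend on `torch` or `torch_geometric`.

```python
import math
from typing import List, Sequence


def radius_edge_index(coords: Sequence[Sequence[float]], r: float = 2.5) -> List[List[int]]:
    """Hypothetical helper: directed edges between all atom pairs within distance r."""
    sources: List[int] = []
    targets: List[int] = []
    for i in range(len(coords)):
        for j in range(len(coords)):
            # Skip self-loops; each undirected edge appears in both directions.
            if i != j and math.dist(coords[i], coords[j]) <= r:
                sources.append(i)
                targets.append(j)
    return [sources, targets]


points = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [5.0, 0.0, 0.0]]
edge_index = radius_edge_index(points, r=2.5)
```

Only the first two points are within 2.5 of each other, so the third point ends up as an isolated node.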

MolecularDiffusion.runmodes.data.augmentation.get_electronegativity(element_symbol)

MolecularDiffusion.runmodes.data.augmentation.get_num_graphs(data)

Calculates number of connected components.
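Counting connected components can be sketched with a union-find structure over the edge list. A count above one means the structure consists of disconnected fragments. The function name `count_components` and its list-based inputs are illustrative assumptions; the documented function takes a graph `data` object instead.

```python
from typing import Iterable, Tuple


def count_components(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Hypothetical helper: number of connected components of an undirected graph."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components

    return len({find(n) for n in range(num_nodes)})


n_fragments = count_components(5, [(0, 1), (1, 2), (3, 4)])
```

Here nodes 0, 1, 2 form one fragment and nodes 3, 4 form another, so `n_fragments` is 2.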

MolecularDiffusion.runmodes.data.augmentation.get_target_count(s, s_start, t_start, s_end, t_end, decay_power=1.0)

MolecularDiffusion.runmodes.data.augmentation.read_xyz(filename) → Tuple[List[str], numpy.ndarray]

MolecularDiffusion.runmodes.data.augmentation.remove_h_unit(atoms: List[str], coords: numpy.ndarray) → Tuple[List[str], numpy.ndarray, int] | None

Removes a single hydrogen.

MolecularDiffusion.runmodes.data.augmentation.write_xyz(filename, atoms, coords)
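
`read_xyz` and `write_xyz` carry no description, so the following is only a sketch of what a round trip through the standard XYZ format looks like: an atom count on the first line, a comment on the second, then one `symbol x y z` line per atom. The `_sketch` function names and the `comment` argument are hypothetical, and coordinates are returned as nested lists rather than a `numpy.ndarray`.

```python
import os
import tempfile
from typing import List, Sequence, Tuple


def write_xyz_sketch(filename: str, atoms: Sequence[str],
                     coords: Sequence[Sequence[float]], comment: str = "") -> None:
    """Hypothetical helper: write atoms and coordinates in XYZ format."""
    with open(filename, "w") as fh:
        fh.write(f"{len(atoms)}\n{comment}\n")
        for symbol, (x, y, z) in zip(atoms, coords):
            fh.write(f"{symbol} {x:.6f} {y:.6f} {z:.6f}\n")


def read_xyz_sketch(filename: str) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: read element symbols and coordinates from an XYZ file."""
    with open(filename) as fh:
        lines = fh.read().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms: List[str] = []
    coords: List[List[float]] = []
    # Line 0 is the atom count and line 1 the comment; atoms start at line 2.
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        atoms.append(parts[0])
        coords.append([float(v) for v in parts[1:4]])
    return atoms, coords


path = os.path.join(tempfile.mkdtemp(), "water.xyz")
write_xyz_sketch(path, ["O", "H", "H"],
                 [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
atoms_back, coords_back = read_xyz_sketch(path)
```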

MolecularDiffusion.runmodes.data.augmentation.CSV_FIELDS = ['filename', 'total_charge', 'num_atoms', 'num_graph']

MolecularDiffusion.runmodes.data.augmentation.CUTOFF = 3

MolecularDiffusion.runmodes.data.augmentation.SCALE_FACTOR = 1.2

MolecularDiffusion.runmodes.data.augmentation.logger