MolecularDiffusion.modules.models.shepherd_arch.inference.sampler

This module contains the inference sampler for the ShEPhERD model.

Functions

generate(model_pl, batch_size, N_x1, N_x4, unconditional, ...)

Runs inference of ShEPhERD to sample batch_size number of molecules.

generate_from_intermediate_time(model_pl, batch_size, start_time, ...)

Runs inpainting-based inference of ShEPhERD starting from an intermediate time step.

Module Contents

MolecularDiffusion.modules.models.shepherd_arch.inference.sampler.generate(model_pl, batch_size: int, N_x1: int, N_x4: int, unconditional: bool, prior_noise_scale: float = 1.0, denoising_noise_scale: float = 1.0, inject_noise_at_ts: list[int] = [], inject_noise_scales: list[int] = [], harmonize: bool = False, harmonize_ts: list[int] = [], harmonize_jumps: list[int] = [], inpaint_x1_pos: bool = False, inpaint_x1_x: bool = False, inpaint_x1_bonds: bool = False, inpaint_x2_pos: bool = False, inpaint_x3_pos: bool = False, inpaint_x3_x: bool = False, inpaint_x4_pos: bool = False, inpaint_x4_direction: bool = False, inpaint_x4_type: bool = False, stop_inpainting_at_time_x1_pos: float = 0.0, stop_inpainting_at_time_x1_x: float = 0.0, stop_inpainting_at_time_x1_bonds: float = 0.0, stop_inpainting_at_time_x2: float = 0.0, add_noise_to_inpainted_x2_pos: float = 0.0, stop_inpainting_at_time_x3: float = 0.0, add_noise_to_inpainted_x3_pos: float = 0.0, add_noise_to_inpainted_x3_x: float = 0.0, stop_inpainting_at_time_x4: float = 0.0, add_noise_to_inpainted_x4_pos: float = 0.0, add_noise_to_inpainted_x4_direction: float = 0.0, add_noise_to_inpainted_x4_type: float = 0.0, mol: rdkit.Chem.Mol | None = None, atom_inds_to_inpaint: list[int] = [], atom_types: list[int] | None = None, atom_pos: numpy.ndarray | None = None, exit_vector_atom_inds: list[int] = [], center_of_mass: numpy.ndarray = np.zeros(3), surface: numpy.ndarray = np.zeros((75, 3)), electrostatics: numpy.ndarray = np.zeros(75), pharm_types: numpy.ndarray | None = None, pharm_pos: numpy.ndarray | None = None, pharm_direction: numpy.ndarray | None = None, num_steps: int = 400, verbose: bool = True, store_trajectories: bool = False, store_trajectories_x0: bool = False, interruption_callback: callable | None = None) → list[dict]

Runs inference of ShEPhERD to sample batch_size number of molecules.

Sampling runs the reverse (denoising) process from time step t = 400 down to t = 0. When represented as a float, this is 1.0 down to 0.0.
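The correspondence between integer time steps and float times can be sketched as follows. This is a minimal illustration that assumes a simple linear mapping with 400 steps; the helper names step_to_time and time_to_step are hypothetical and are not part of the sampler module, which may index its schedule differently.

```python
def step_to_time(step: int, num_steps: int = 400) -> float:
    """Map an integer time step in [0, num_steps] to a float in [0.0, 1.0].

    Assumption: the mapping is linear (step / num_steps).
    """
    if not 0 <= step <= num_steps:
        raise ValueError(f"step must be in [0, {num_steps}], got {step}")
    return step / num_steps


def time_to_step(time: float, num_steps: int = 400) -> int:
    """Map a float time in [0.0, 1.0] to the nearest integer time step."""
    if not 0.0 <= time <= 1.0:
        raise ValueError(f"time must be in [0.0, 1.0], got {time}")
    return round(time * num_steps)
```

Under this assumption, step 400 corresponds to 1.0 (the prior) and step 0 corresponds to 0.0 (the data distribution).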

Parameters:
  • model_pl – PyTorch Lightning module.

  • batch_size (int) – Number of molecules to sample in a single batch.

  • N_x1 (int) – Number of atoms to diffuse.

  • N_x4 (int) – Number of pharmacophores to diffuse. If inpainting, can be greater than len(pharm_types) for partial pharmacophore conditioning.

  • unconditional (bool) – Toggle unconditional generation.

  • prior_noise_scale (float (default = 1.0)) – Noise scale of the prior distribution.

  • denoising_noise_scale (float (default = 1.0)) – Noise scale for each denoising step.

  • inject_noise_at_ts (list[int] (default = [])) – Time steps at which to inject extra noise.

  • inject_noise_scales (list[int] (default = [])) – Scale of noise to inject at the above time steps.

  • harmonize (bool (default = False)) – Whether to use harmonization.

  • harmonize_ts (list[int] (default = [])) – Time steps at which to apply harmonization.

  • harmonize_jumps (list[int] (default = [])) – Length of time to harmonize (in time steps).

  All the options below are only relevant if unconditional is False.

  • inpaint_x1_pos (bool (default=False))

  • inpaint_x1_x (bool (default=False))

  • inpaint_x1_bonds (bool (default=False))

  • inpaint_x2_pos (bool (default=False)) – Toggle inpainting. Note that x2 is implicitly modeled via x3.

  • inpaint_x3_pos (bool (default=False))

  • inpaint_x3_x (bool (default=False))

  • inpaint_x4_pos (bool (default=False))

  • inpaint_x4_direction (bool (default=False))

  • inpaint_x4_type (bool (default=False))

  • stop_inpainting_at_time_x1_pos (float (default = 0.0)) – Time step to stop inpainting atom positions.

  • stop_inpainting_at_time_x1_x (float (default = 0.0)) – Time step to stop inpainting atom types.

  • stop_inpainting_at_time_x1_bonds (float (default = 0.0)) – Time step to stop inpainting bond types.

  • stop_inpainting_at_time_x2 (float (default = 0.0))

  • add_noise_to_inpainted_x2_pos (float (default = 0.0)) – Scale of noise to add to inpainted values.

  • stop_inpainting_at_time_x3 (float (default = 0.0))

  • add_noise_to_inpainted_x3_pos (float (default = 0.0))

  • add_noise_to_inpainted_x3_x (float (default = 0.0))

  • stop_inpainting_at_time_x4 (float (default = 0.0))

  • add_noise_to_inpainted_x4_pos (float (default = 0.0))

  • add_noise_to_inpainted_x4_direction (float (default = 0.0))

  • add_noise_to_inpainted_x4_type (float (default = 0.0))

  The parameters below are the inpainting targets.

  • mol (Optional[Chem.Mol] (default = None)) – Target molecule specifically for atom-inpainting. If provided, atom_inds_to_inpaint must also be provided. This is required for bond-inpainting. If atom_pos and atom_types are also provided, they will override the atom types and positions extracted from mol.

  • atom_inds_to_inpaint (list[int] (default = [])) – Indices of atoms to inpaint. This is required for bond-inpainting.

  • atom_types (Optional[list[int]] (default = None)) – Atom elements expected as a list of atomic numbers. If provided alongside mol, atom_inds_to_inpaint, it will override the atom types extracted from mol.

  • atom_pos (Optional[np.ndarray] (default = None)) – Atom positions as coordinates. If provided alongside mol and atom_inds_to_inpaint, it will override the atom positions extracted from mol.

  • exit_vector_atom_inds (list[int] (default = [])) – Indices of atoms to use as "exit vectors". If empty and mol and atom_inds_to_inpaint are provided, then the bonds between each of these atoms will be inpainted, but other edges are ignored. If provided, the inpainted atoms will be inpainted with bonds to all other atoms that are diffused (i.e., None bond-type for atoms that should not be bonded to the inpainted atoms). However, the exit vector atom(s) will not inpaint any "None" bond-types to non-inpainted atoms. NOTE: Must be a subset of atom_inds_to_inpaint.

  • center_of_mass (np.ndarray (3,) (default = np.zeros(3))) – Must be supplied if the target molecule is not already centered.

  • surface (np.ndarray (75, 3) (default = np.zeros((75, 3)))) – Surface point coordinates.

  • electrostatics (np.ndarray (75,) (default = np.zeros(75))) – Electrostatics at each surface point.

  • pharm_types (Optional[np.ndarray] (<=N_x4,) (default = None)) – Pharmacophore types.

  • pharm_pos (Optional[np.ndarray] (<=N_x4, 3) (default = None)) – Pharmacophore positions as coordinates.

  • pharm_direction (Optional[np.ndarray] (<=N_x4, 3) (default = None)) – Pharmacophore directions as unit vectors.

  • num_steps (int (default = 400)) – Number of time steps to run the sampler for.

  • save_intermediate (bool (default=False)) – whether to save intermediates –> not implemented

  • start_t_ind (int (default = 0)) – Index of the time step to start from (default is from pure noise). Requires xi_initial_dict to be provided.

  • xi_initial_dict (Optional[dict] (default = None)) – Dictionary containing the initial states of x1, x2, x3, and x4. If None, the states are initialized randomly.

  • noise_dict (Optional[dict] (default = None)) – Dictionary containing the noise parameters for the first inference step. After the first inference step, the noise will be sampled randomly. If None, the noises will be sampled randomly.

  • do_property_cfg (bool (default = False)) – Whether to use property conditioning.

  • cfg_weight (float (default = 3.0)) – Weight of property conditioning.

  • sa_score (float (default = 1.0)) – SA score of the target molecule. Range is 0-10: 10 is difficult to synthesize.

  • verbose (bool (default = True)) – Whether to print a progress bar.

  • store_trajectories (bool (default = False)) – Whether to store the trajectories.

  • store_trajectories_x0 (bool (default = False)) – Whether to store the trajectories of the x0 predictions.
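Several of the parameters above constrain one another. The sketch below collects those constraints into a pre-flight check that can be run before calling generate. The function check_generate_args is hypothetical and not part of the module. The first two checks restate constraints documented above; the requirement that inject_noise_at_ts and inject_noise_scales have equal length is an inference from their descriptions, not a documented rule.

```python
def check_generate_args(
    N_x4: int,
    n_pharm_types: int,
    atom_inds_to_inpaint: list[int],
    exit_vector_atom_inds: list[int],
    inject_noise_at_ts: list[int],
    inject_noise_scales: list[float],
) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Documented: pharm_types has shape (<=N_x4,), so N_x4 may exceed it but not fall below it.
    if n_pharm_types > N_x4:
        problems.append(
            f"N_x4={N_x4} is smaller than the number of pharmacophore types ({n_pharm_types})"
        )

    # Documented: exit_vector_atom_inds must be a subset of atom_inds_to_inpaint.
    extra = sorted(set(exit_vector_atom_inds) - set(atom_inds_to_inpaint))
    if extra:
        problems.append(f"exit vector indices {extra} are not in atom_inds_to_inpaint")

    # Assumption (inferred): one noise scale per injection time step.
    if len(inject_noise_at_ts) != len(inject_noise_scales):
        problems.append("inject_noise_at_ts and inject_noise_scales differ in length")

    return problems
```

Returning a list of messages rather than raising on the first failure makes it easier to report every inconsistency at once.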

Returns:

generated_structures – Each output dictionary is structured as:

{
  'x1': {
    'atoms': np.ndarray (N_x1,) of ints for atomic numbers.
    'bonds': np.ndarray of bond types between every atom pair.
    'positions': np.ndarray (N_x1, 3) Coordinates of atoms.
  },
  'x2': {
    'positions': np.ndarray (75, 3) Coordinates of surface points.
  },
  'x3': {
    'charges': np.ndarray (75,) ESP at surface points.
    'positions': np.ndarray (75, 3) Coordinates of surface points.
  },
  'x4': {
    'types': np.ndarray (N_x4,) of ints for pharmacophore types.
    'positions': np.ndarray (N_x4, 3) Coordinates of pharmacophores.
    'directions': np.ndarray (N_x4, 3) Unit vectors of pharmacophores.
  },
}

Return type:

list[dict]
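As an illustration of consuming the returned structures, the sketch below converts the 'x1' entry of one output dictionary into an XYZ-format string. The helper x1_to_xyz and the small SYMBOLS table are hypothetical and not part of the module. The code relies only on the 'atoms' and 'positions' keys documented above and accepts either NumPy arrays or plain nested lists.

```python
# Minimal atomic-number-to-symbol table for illustration; extend as needed.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 9: "F", 15: "P", 16: "S", 17: "Cl", 35: "Br", 53: "I"}


def x1_to_xyz(structure: dict, comment: str = "generated structure") -> str:
    """Format the atoms of one generated structure as an XYZ block."""
    x1 = structure["x1"]
    atomic_numbers = [int(z) for z in x1["atoms"]]
    positions = [[float(c) for c in row] for row in x1["positions"]]
    if len(atomic_numbers) != len(positions):
        raise ValueError("'atoms' and 'positions' have different lengths")

    lines = [str(len(atomic_numbers)), comment]
    for number, (x, y, z) in zip(atomic_numbers, positions):
        symbol = SYMBOLS.get(number, "X")  # "X" marks elements missing from the table
        lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)
```

Applied to every element of the returned list, this yields one XYZ block per sampled molecule. Bond information in 'bonds' is not used here.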

MolecularDiffusion.modules.models.shepherd_arch.inference.sampler.generate_from_intermediate_time(model_pl, batch_size: int, start_time: float, N_x1: int, N_x4: int, mol: rdkit.Chem.Mol, atom_inds_to_inpaint: numpy.ndarray | list[int], exit_vector_atom_inds: list[int] = [], new_atom_placement_region: numpy.ndarray | None = None, new_atom_placement_radius: float = 1.5, new_atom_types: list[str] = ['C', 'O', 'N', 'F', 'H'], new_atom_type_weights: list[float] = [0.3, 0.2, 0.2, 0.1, 0.2], center_of_mass: numpy.ndarray = np.zeros(3), surface: numpy.ndarray = np.zeros((75, 3)), electrostatics: numpy.ndarray = np.zeros(75), pharm_types: numpy.ndarray = np.zeros(5, dtype=int), pharm_pos: numpy.ndarray = np.zeros((5, 3)), pharm_direction: numpy.ndarray = np.zeros((5, 3)), denoising_noise_scale: float = 1.0, inject_noise_at_ts: list[int] = [], inject_noise_scales: list[int] = [], inpaint_x1_bonds: bool = False, stop_inpainting_at_time_x1_pos: float = 0.0, stop_inpainting_at_time_x1_x: float = 0.0, stop_inpainting_at_time_x1_bonds: float = 0.0, verbose: bool = True, store_trajectories: bool = False, store_trajectories_x0: bool = False, interruption_callback: callable | None = None) → list[dict]

Runs inpainting-based inference of ShEPhERD starting from an intermediate time step. Requires a conformer that is assumed to be partially inpainted. This function assumes that modalities x1, x3, and x4 are being inpainted.

The key features are:

  • Starts diffusion from a specified intermediate time start_time.

  • Uses a full molecule for generating inpainting trajectories.

  • Allows for partial inpainting of atoms via atom_inds_to_inpaint.

  • The number of atoms to generate (N_x1) can be different from the number of atoms in the provided molecule, allowing for addition of atoms (N_x1 > N_atoms).

Parameters:
  • model_pl

  • batch_size (int) – Number of molecules to sample.

  • start_time (float [0, 1]) – The time to start diffusion from. 0 is t=0 (data distribution) and 1 is t=T (prior).

  • N_x1 (int) – The number of atoms to diffuse. Must be >= the number of atoms in the mol.

  • N_x4 (int) – The number of pharmacophores to diffuse.

  • mol (Chem.Mol) – The molecule to inpaint. Requires a conformer.

  • atom_inds_to_inpaint (np.ndarray | list[int]) – The indices of the atoms to inpaint.

  • new_atom_placement_region (Optional[np.ndarray] = None) – Shape (3,). Position around which to place new atoms. Imagine it as a cluster center. If None, new atoms are sampled from a Gaussian distribution.

  • new_atom_placement_radius (float [default: 1.5]) – The radius of the sphere to place new atoms in. Only used if new_atom_placement_region is not None.

  • new_atom_types (list[str] [default: ['C', 'O', 'N', 'F', 'H']]) – The types of new atoms to forward noise. Does not guarantee that these atoms will be added.

  • new_atom_type_weights (list[float] [default: [0.3, 0.2, 0.2, 0.1, 0.2]]) – The weights of the new atom types.

  • center_of_mass (np.ndarray (3,) [default: np.zeros(3)]) – The center of mass of the molecule.

  • surface (np.ndarray (75, 3) [default: np.zeros((75, 3))]) – The surface of the molecule.

  • electrostatics (np.ndarray (75,) [default: np.zeros(75)]) – The electrostatics of the molecule.

  • pharm_types (np.ndarray (5,) [default: np.zeros(5, dtype = int)]) – The types of pharmacophores.

  • pharm_pos (np.ndarray (5, 3) [default: np.zeros((5, 3))]) – The positions of the pharmacophores.

  • pharm_direction (np.ndarray (5, 3) [default: np.zeros((5, 3))]) – The directions of the pharmacophores.

  • denoising_noise_scale (float [default: 1.0]) – The scale of the denoising noise.

  • inject_noise_at_ts (list[int] [default: []]) – The times to inject noise.

  • inject_noise_scales (list[int] [default: []]) – The scales of the noise to inject.

  • inpaint_x1_bonds (bool [default: False]) – Whether to inpaint bonds between atoms specified in atom_inds_to_inpaint.

  • stop_inpainting_at_time_x1_pos (float [default: 0.0]) – Time step to stop inpainting atom positions.

  • stop_inpainting_at_time_x1_x (float [default: 0.0]) – Time step to stop inpainting atom types.

  • stop_inpainting_at_time_x1_bonds (float [default: 0.0]) – Time step to stop inpainting bond types.

  • verbose (bool [default: True]) – Whether to print progress.

  • store_trajectories (bool [default: False]) – Whether to store the trajectories.

  • store_trajectories_x0 (bool [default: False]) – Whether to store the trajectories of the x0 predictions.

Returns:

generated_structures – Each output dictionary is structured as:

{
  'x1': {
    'atoms': np.ndarray (N_x1,) of ints for atomic numbers.
    'bonds': np.ndarray of bond types between every atom pair.
    'positions': np.ndarray (N_x1, 3) Coordinates of atoms.
  },
  'x2': {
    'positions': np.ndarray (75, 3) Coordinates of surface points.
  },
  'x3': {
    'charges': np.ndarray (75,) ESP at surface points.
    'positions': np.ndarray (75, 3) Coordinates of surface points.
  },
  'x4': {
    'types': np.ndarray (N_x4,) of ints for pharmacophore types.
    'positions': np.ndarray (N_x4, 3) Coordinates of pharmacophores.
    'directions': np.ndarray (N_x4, 3) Unit vectors of pharmacophores.
  },
}

Return type:

list[dict]