MolecularDiffusion.modules.tasks.diffusion

Attributes

Classes

GeomMolecularGenerative

Class for load/save configuration.

GuidanceModelPrediction

Class for load/save configuration.

Functions

Module Contents

class MolecularDiffusion.modules.tasks.diffusion.GeomMolecularGenerative(diffusion_model, node_dist_model=None, prop_dist_model=None, n_node_dist: Dict = {}, augment_noise: float = 0, data_augmentation: bool = False, condition: List = [], normalize_condition: str = None, sp_regularizer: MolecularDiffusion.callbacks.SP_regularizer = None, reference_indices: List = None)

Bases: MolecularDiffusion.modules.tasks.task.Task, MolecularDiffusion.core.Configurable

Class for load/save configuration. It will automatically record every argument passed to the __init__ function.

This class is inspired by state_dict() in PyTorch, but designed for hyperparameters.

Inherit this class to construct a configurable class.

>>> class MyClass(nn.Module, core.Configurable):
...     pass

Note: Configurable only applies to the current class, not to any derived class. For example, the following definition only records the arguments of MyClass.

>>> class DerivedClass(MyClass):
...     pass

To record the arguments of DerivedClass as well, explicitly specify the inheritance.

>>> class DerivedClass(MyClass, core.Configurable):
...     pass

To get the configuration of an instance, use config_dict(), which returns a dict of argument names and values. If an argument is also an instance of Configurable, it will be recursively expanded in the dict. The configuration dict can be passed to load_config_dict() to create a copy of the instance.

For classes already registered in Registry, they can be directly created from the Configurable class. This is convenient for building models from configuration files.

>>> config = models.GCN(128, [128]).config_dict()
>>> gcn = Configurable.load_config_dict(config)
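The argument-recording behaviour described above can be illustrated with a small standalone sketch. This is not the library's implementation: `RecordsInitArgs`, `Encoder`, and `ToyTask` are hypothetical names, and the `"class"` key used to store the class name is an assumption made for this example only.

```python
import functools
import inspect


class RecordsInitArgs:
    """Hypothetical stand-in for a Configurable-style base class."""

    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        RecordsInitArgs._registry[cls.__name__] = cls
        original_init = cls.__init__
        signature = inspect.signature(original_init)

        @functools.wraps(original_init)
        def recording_init(self, *args, **kw):
            original_init(self, *args, **kw)
            bound = signature.bind(self, *args, **kw)
            bound.apply_defaults()
            # Store every __init__ argument except `self`.
            self._config = {k: v for k, v in bound.arguments.items() if k != "self"}

        cls.__init__ = recording_init

    def config_dict(self):
        """Return argument names and values, expanding nested instances."""
        config = {"class": type(self).__name__}
        for name, value in self._config.items():
            nested = isinstance(value, RecordsInitArgs)
            config[name] = value.config_dict() if nested else value
        return config

    @classmethod
    def load_config_dict(cls, config):
        """Rebuild an instance (and nested instances) from a config dict."""
        kwargs = {}
        for name, value in config.items():
            if name == "class":
                continue
            is_nested = isinstance(value, dict) and "class" in value
            kwargs[name] = cls.load_config_dict(value) if is_nested else value
        return cls._registry[config["class"]](**kwargs)


class Encoder(RecordsInitArgs):
    def __init__(self, input_dim, hidden_dims):
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims


class ToyTask(RecordsInitArgs):
    def __init__(self, model, augment_noise=0.0):
        self.model = model
        self.augment_noise = augment_noise


task = ToyTask(Encoder(128, [128]))
config = task.config_dict()
clone = RecordsInitArgs.load_config_dict(config)
```

Recording the arguments after the original `__init__` has run means that, when a subclass calls `super().__init__`, the outermost class's arguments are the ones kept.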

Generative diffusion model for molecular structures.

Parameters:
  • diffusion_model: The dynamic functional model for diffusion.
  • node_dist_model (Optional[NodeDistributionModel]): The model for the distribution of the number of nodes. Default is None.
  • prop_dist_model (Optional[PropertyDistributionModel]): The model for the property distribution. Default is None.
  • n_node_dist (Dict): The distribution of the number of nodes. Default is {}.
  • augment_noise (float): The amount of noise to add to the coordinates for data augmentation. Default is 0.
  • data_augmentation (bool): Whether to apply data augmentation by symmetry operations. Default is False.
  • condition (List): The list of conditions for the model. Default is [].
  • normalize_condition (str): The normalization method for the condition, one of None, “maxmin”, or “mad”. Default is None.
  • sp_regularizer (SP_regularizer): The self-paced learning regularizer for the model. Default is None.
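The page does not define the two normalize_condition options. The sketch below shows one common reading, offered as an assumption rather than the library's behaviour: “maxmin” rescales by the training range, and “mad” shifts by the mean and divides by the mean absolute deviation. The helper names are hypothetical.

```python
from statistics import fmean


def fit_condition_stats(values, method):
    """Return a (shift, scale) pair computed from training-set values."""
    if method is None:
        return 0.0, 1.0
    if method == "maxmin":
        # Assumed meaning: map the training range onto [0, 1].
        lowest, highest = min(values), max(values)
        return lowest, highest - lowest
    if method == "mad":
        # Assumed meaning: centre on the mean, scale by mean absolute deviation.
        mean = fmean(values)
        return mean, fmean(abs(v - mean) for v in values)
    raise ValueError(f"unknown normalization method: {method!r}")


def apply_condition_stats(values, stats):
    """Normalize values with statistics fitted on the training set."""
    shift, scale = stats
    return [(v - shift) / scale for v in values]


train_values = [1.0, 2.0, 3.0, 6.0]
maxmin_stats = fit_condition_stats(train_values, "maxmin")
mad_stats = fit_condition_stats(train_values, "mad")
maxmin_values = apply_condition_stats(train_values, maxmin_stats)
mad_values = apply_condition_stats(train_values, mad_stats)
```

Whichever definition applies, the same fitted statistics have to be reused for any condition values supplied at sampling time.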

density_estimation(batch)
evaluate(all_loss, dummy_tensor)
forward(batch)
predict_and_target(batch)
preprocess(train_set=None)
sample(nodesxsample=torch.tensor([10]), context=None, condition_tensor=None, condition_mode=None, fix_noise=False, n_frames=0, n_retrys=0, t_retry=180, mode='ddpm', **kwargs)

Sample molecular structures.

Parameters:
  • nodesxsample (Tensor): Number of nodes per sample.
  • context (Optional[Tensor]): Context tensor for sampling. Default is None.
  • condition_tensor (Optional[Tensor]): Condition tensor for sampling, of size [batch size, n_atom, n_features]. It must be normalized the same way as the training set. Default is None.
  • condition_mode (Optional[str]): Mode for conditioning. Format: [condition_name]_[component_alg], where the component name can be x, h, or xh, and component_alg can be SSGD, … Default is None.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.
  • n_frames (int): Number of frames to keep. Default is 0.
  • n_retrys (int): Number of retry attempts in the event of bad molecules. Default is 0.
  • t_retry (int): Timestep to start retrying. Default is 180.
  • mode (str): Sampling mode, either “ddpm” or “ddim”. Default is “ddpm”.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.
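To illustrate how nodesxsample relates to the returned node mask, here is a minimal sketch that assumes samples are padded to the largest requested size and that the mask marks real atoms with 1. It uses plain Python lists instead of tensors, and `build_node_mask` is a hypothetical helper, not part of the module.

```python
def build_node_mask(nodes_per_sample):
    """Return one row per sample: 1 for real nodes, 0 for padding."""
    max_nodes = max(nodes_per_sample)
    return [
        [1 if index < n_nodes else 0 for index in range(max_nodes)]
        for n_nodes in nodes_per_sample
    ]


# Two samples, with 2 and 4 atoms, padded to a common length of 4.
node_mask = build_node_mask([2, 4])
```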

sample_around_xh_target(nodesxsample=torch.tensor([10]), xh_target=None, context=None, fix_noise=False)

Sample molecular structures.

Parameters:
  • nodesxsample (Tensor): Number of nodes per sample.
  • xh_target (Tensor): Target xh, of size [batch size, n_atom, n_features].
  • context (Optional[Tensor]): Context tensor for sampling. Default is None.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.

sample_chain(n_nodes: int, n_tries: int, keep_frames: int = 100)

Sample a molecule for visualizing the diffusion process.

Parameters:
  • n_nodes (int): Number of nodes in the molecular graph.
  • n_tries (int): Number of attempts to find a stable molecule.
  • keep_frames (int): Number of frames to keep. Default is 100.

Returns: Tuple[Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, and positions.

sample_chain_guide(n_nodes: int, n_tries: int, target_function, scale: float = 1, max_norm=10, std: float = 1.0, scheduler=None, keep_frames: int = 100)

Sample a molecule for visualizing the diffusion process.

Parameters:
  • n_nodes (int): Number of nodes in the molecular graph.
  • n_tries (int): Number of attempts to find a stable molecule.
  • target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
  • scale (float): Scale factor for guidance. Default is 1.0.
  • max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
  • std (float): Standard deviation of the noise. Default is 1.0.
  • scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
  • keep_frames (int): Number of frames to keep. Default is 100.

Returns: Tuple[Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, and positions.
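The scheduler contract mentioned in the parameters (a step method that receives the energy and the current scale) could be satisfied by an object like the one sketched below. Everything beyond that contract is an assumption: the class name `DecayOnPlateau` is hypothetical, the method is assumed to return the updated scale, and the energy is treated as a quantity to maximize, in line with "higher is better" for the target function.

```python
class DecayOnPlateau:
    """Hypothetical rate scheduler: shrink the scale when progress stalls."""

    def __init__(self, factor=0.5, patience=2):
        self.factor = factor
        self.patience = patience
        self.best_energy = float("-inf")
        self.stalled_steps = 0

    def step(self, energy, scale):
        """Return the scale to use for the next guidance step."""
        if energy > self.best_energy:
            self.best_energy = energy
            self.stalled_steps = 0
            return scale
        self.stalled_steps += 1
        if self.stalled_steps >= self.patience:
            # No improvement for `patience` steps: damp the guidance.
            self.stalled_steps = 0
            return scale * self.factor
        return scale


scheduler = DecayOnPlateau(factor=0.5, patience=2)
scale = 1.0
scale_history = []
for energy in [1.0, 2.0, 1.5, 1.2, 3.0]:
    scale = scheduler.step(energy, scale)
    scale_history.append(scale)
```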

sample_conditonal(nodesxsample=torch.tensor([10]), target_value=[0], fix_noise=False, mode='ddpm', n_frames=0)

Sample molecular structures conditioned on a property value. Only works if the model is trained with a property distribution.

The interval should be wider than the bin width of the property distribution. If the interval is too narrow, the model might just get the same molecule.

Parameters:
  • nodesxsample (Tensor): Number of nodes per sample.
  • target_value (List[float]): Target values for conditional sampling.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.
  • mode (str): Sampling mode, either “ddpm” or “ddim”. Default is “ddpm”.
  • n_frames (int): Number of frames to keep. Default is 0.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.

sample_guidance(target_function, nodesxsample=torch.tensor([10]), scale=1, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=0, guidance_stop=1, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, n_frames=0, debug=False)

Sample molecular structures with guidance from target function.

Parameters:
  • target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
  • nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
  • scale (float): Scale factor for gradient guidance. Default is 1.0.
  • max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
  • std (float): Standard deviation of the noise. Default is 1.0.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.
  • scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
  • guidance_at (int): The timestep at which to start applying guidance, in [0, 1]; 0 means from the beginning. Default is 0.
  • guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 means until the end. Default is 1.
  • guidance_ver (int): The version of the guidance, one of 0, 1, 2, cfg, or cfg_gg. Default is 1.
  • n_backwards (int): Number of backward steps. Default is 0.
  • h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
  • x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
  • context (Optional[Tensor]): Context tensor for sampling. Default is None.
  • condition_tensor (Optional[Tensor]): Condition tensor for sampling. Default is None.
  • n_frames (int): Number of frames to keep. Default is 0.
  • debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
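The roles of scale and max_norm, and the "clipping coefficients" saved in debug mode, can be pictured with a generic gradient-guidance step. This is a sketch under assumptions, not the method's actual update rule: it works on flat Python lists, clips by the global L2 norm, and moves the sample up the gradient because higher target values are better. Both function names are hypothetical.

```python
import math


def clip_by_norm(grad, max_norm, eps=1e-12):
    """Rescale grad so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    coefficient = min(1.0, max_norm / (norm + eps))
    return [g * coefficient for g in grad], coefficient


def guided_update(x, grad, scale, max_norm):
    """Take one ascent step along the clipped gradient of the target."""
    clipped, _ = clip_by_norm(grad, max_norm)
    return [xi + scale * gi for xi, gi in zip(x, clipped)]


small_grad, small_coef = clip_by_norm([3.0, 4.0], max_norm=10.0)   # norm 5, untouched
large_grad, large_coef = clip_by_norm([30.0, 40.0], max_norm=10.0)  # norm 50, clipped
updated = guided_update([0.0, 0.0], [30.0, 40.0], scale=0.5, max_norm=10.0)
```

A clipping coefficient below 1 indicates that the raw gradient exceeded max_norm at that step, which is why logging it is useful when tuning scale.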

sample_guidance_conitional(target_function, target_value=[0], negative_target_value=[], nodesxsample=torch.tensor([10]), gg_scale=1, cfg_scale=1, cfg_scale_schedule=None, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=1, guidance_stop=0, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, n_frames=0, debug=False)

Sample molecular structures with guidance from target function and conditional property.

Parameters:
  • target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
  • target_value (List[float]): Target values for conditional sampling.
  • nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
  • gg_scale (float): Scale factor for gradient guidance. Default is 1.0.
  • cfg_scale (float): Scale factor for classifier-free guidance. Default is 1.0.
  • cfg_scale_schedule (str, optional): Scheduler for the cfg scale, one of linear, exponential, or cosine. Default is None.
  • max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
  • std (float): Standard deviation of the noise. Default is 1.0.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.
  • scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
  • guidance_at (int): The timestep at which to start applying guidance, in [0, 1]; 0 means from the beginning. Default is 1.
  • guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 means until the end. Default is 0.
  • guidance_ver (int): The version of the guidance, one of 0, 1, 2, cfg, or cfg_gg. Default is 1.
  • n_backwards (int): Number of backward steps. Default is 0.
  • h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
  • x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
  • n_frames (int): Number of frames to keep. Default is 0.
  • debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
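For readers unfamiliar with cfg_scale, the following sketch shows one widely used form of classifier-free guidance, in which the unconditional prediction is extrapolated towards the conditional one. Whether this method uses exactly this convention is an assumption; `cfg_combine` is a hypothetical helper operating on plain lists.

```python
def cfg_combine(eps_uncond, eps_cond, cfg_scale):
    """Blend unconditional and conditional noise predictions.

    cfg_scale = 0 returns the unconditional prediction, 1 returns the
    conditional one, and values above 1 push further towards the condition.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
unconditional = cfg_combine(eps_uncond, eps_cond, cfg_scale=0.0)
conditional = cfg_combine(eps_uncond, eps_cond, cfg_scale=1.0)
amplified = cfg_combine(eps_uncond, eps_cond, cfg_scale=2.0)
```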

sample_hybrid_guidance(target_function, target_value=[0], negative_target_value=[], nodesxsample=torch.tensor([10]), gg_scale=1, cfg_scale=1, cfg_scale_schedule=None, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=1, guidance_stop=0, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, condition_tensor=None, condition_mode=None, inpaint_cfgs={}, outpaint_cfgs={}, n_frames=0, debug=False)

Sample molecular structures with guidance from target function and conditional property.

Parameters:
  • target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
  • target_value (List[float]): Target values for conditional sampling.
  • nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
  • gg_scale (float): Scale factor for gradient guidance. Default is 1.0.
  • cfg_scale (float): Scale factor for classifier-free guidance. Default is 1.0.
  • cfg_scale_schedule (str, optional): Scheduler for the cfg scale, one of linear, exponential, or cosine. Default is None.
  • max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
  • std (float): Standard deviation of the noise. Default is 1.0.
  • fix_noise (bool): Fix noise for visualization purposes. Default is False.
  • scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
  • guidance_at (int): The timestep at which to start applying guidance, in [0, 1]; 0 means from the beginning. Default is 1.
  • guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 means until the end. Default is 0.
  • guidance_ver (int): The version of the guidance, one of 0, 1, 2, cfg, or cfg_gg. Default is 1.
  • n_backwards (int): Number of backward steps. Default is 0.
  • h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
  • x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
  • debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.
  • condition_tensor (torch.Tensor, optional): Tensor for conditional guidance. Defaults to None.
  • condition_mode (str, optional): Mode for conditional guidance. Defaults to None.
  • inpaint_cfgs (dict, optional): Configuration for inpainting. The dictionary must contain:
    • mask_node_index (torch.Tensor, optional): Indices of nodes to be inpainted. Defaults to an empty tensor.
    • denoising_strength (float, optional): Strength of denoising for inpainting.
    • noise_initial_mask (bool, optional): Whether to noise the initial masked region. Defaults to False.
  • outpaint_cfgs (dict, optional): Configuration for outpainting. The dictionary must contain:
    • t_start (float, optional): Timestep to start the generation. Defaults to 1.0.
    • t_critical (float, optional): Timestep threshold for applying reference tensor constraints. Defaults to None.
    • connector_index (torch.Tensor, optional): Indices of connector nodes for outpainting. Defaults to an empty tensor.
    • seed_dist (float, optional): Distance of the seed from the connector atom (used if n_bq_atom == 0).
    • min_dist (float, optional): Minimum distance from any existing atom in xh_cond (except the connector itself). Defaults to 1.
    • spread (float, optional): Spread of the initiating nodes. Defaults to 1 angstrom.
    • n_bq_atom (int, optional): Number of dummy atoms. Defaults to 0.
  • n_frames (int, optional): Number of frames to keep. Defaults to 0.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
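As a rough picture of what mask_node_index selects during inpainting, the sketch below keeps reference rows fixed and takes newly generated rows only at the listed indices. This is a conceptual illustration on plain lists, assuming the usual meaning of inpainting; it does not reproduce how the method mixes reference and generated states across timesteps, and `blend_inpainted` is a hypothetical helper.

```python
def blend_inpainted(reference, generated, mask_node_index):
    """Keep reference rows, except at the indices selected for inpainting."""
    to_inpaint = set(mask_node_index)
    return [
        list(generated[i]) if i in to_inpaint else list(reference[i])
        for i in range(len(reference))
    ]


# Three atoms with 3D coordinates; only the last atom is regenerated.
reference = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]]
generated = [[0.1, 0.2, 0.0], [1.4, 0.3, 0.1], [2.9, -0.2, 0.4]]
blended = blend_inpainted(reference, generated, mask_node_index=[2])
```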

augment_noise = 0
condition = []
data_augmentation = False
model
n_atom_types
n_dim_data
n_node_dist
node_dist_model = None
normalize_condition = None
prop_dist_model = None
reference_indices = None
sp_regularizer = None
class MolecularDiffusion.modules.tasks.diffusion.GuidanceModelPrediction(model, noisemodel, task=(), include_charge=True, metric=('mae', 'rmse'), num_mlp_layer=1, normalization=True, num_class=None, mlp_batch_norm=True, readout='mean', mlp_dropout=0, std_mean=None, load_mlps_layer=0, nextra_nf=0, norm_values=(1.0, 1.0, 1.0), extra_norm_values=(), norm_biases=(None, 0.0, 0.0), weight_classes=None, t_max=1, verbose=0)

Bases: MolecularDiffusion.modules.tasks.task.Task, MolecularDiffusion.core.Configurable

Class for load/save configuration. It will automatically record every argument passed to the __init__ function.

This class is inspired by state_dict() in PyTorch, but designed for hyperparameters.

Inherit this class to construct a configurable class.

>>> class MyClass(nn.Module, core.Configurable):
...     pass

Note: Configurable only applies to the current class, not to any derived class. For example, the following definition only records the arguments of MyClass.

>>> class DerivedClass(MyClass):
...     pass

To record the arguments of DerivedClass as well, explicitly specify the inheritance.

>>> class DerivedClass(MyClass, core.Configurable):
...     pass

To get the configuration of an instance, use config_dict(), which returns a dict of argument names and values. If an argument is also an instance of Configurable, it will be recursively expanded in the dict. The configuration dict can be passed to load_config_dict() to create a copy of the instance.

For classes already registered in Registry, they can be directly created from the Configurable class. This is convenient for building models from configuration files.

>>> config = models.GCN(128, [128]).config_dict()
>>> gcn = Configurable.load_config_dict(config)
evaluate(pred, target)
forward(batch)
get_adj_matrix(_edges_dict, n_nodes, batch_size)
normalize(x, h, node_mask)
pad_data(array, batch, dim)

Parameters:
  • array: torch.Tensor of shape (n_atoms, n_features)
  • batch: pytorch_geometric.data.Batch

predict(batch, all_loss=None, metric=None, evaluate=False)

If evaluate is True, the data must be normalized beforehand.

preprocess(train_set, valid_set=None, test_set=None)

Compute the mean and standard deviation for each task on the training set.

readout_f(embeddings: torch.Tensor) -> torch.Tensor

Perform readout operation over nodes in each molecule.

Parameters: - embeddings (torch.Tensor): Tensor of size (x, y, z) where x is the batch size, y is the number of nodes, and z is the feature size.

Returns: torch.Tensor: Aggregated tensor of size (x, z).
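The shape change from (x, y, z) to (x, z) can be made concrete with a mean readout written over nested lists. This is an illustrative sketch only: it assumes the default readout='mean', ignores any padding mask the real method may apply, and `readout_mean` is a hypothetical name.

```python
def readout_mean(embeddings):
    """Average node embeddings within each molecule.

    embeddings: nested list of shape (batch, nodes, features).
    Returns a nested list of shape (batch, features).
    """
    pooled = []
    for nodes in embeddings:
        n_nodes = len(nodes)
        # zip(*nodes) iterates over feature columns across all nodes.
        pooled.append([sum(column) / n_nodes for column in zip(*nodes)])
    return pooled


# One molecule with two nodes and two features per node.
pooled = readout_mean([[[1.0, 2.0], [3.0, 4.0]]])
```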

sample_combined_position_feature_noise(n_samples, n_nodes, node_mask, std=1.0)

Samples mean-centered normal noise for z_x, and standard normal noise for z_h.
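Mean-centered noise for the positions can be sketched as follows for a single sample: draw Gaussian noise for the real nodes, then subtract the per-coordinate mean so that the noise has zero centre of mass. The sketch uses the standard library instead of torch, handles only the position part z_x, and `sample_centered_noise` is a hypothetical helper.

```python
import random


def sample_centered_noise(node_mask, n_dims=3, std=1.0, rng=None):
    """Sample position noise whose mean over real nodes is zero."""
    rng = rng or random.Random()
    n_real = sum(node_mask)
    # Padded nodes (mask 0) receive zero noise.
    noise = [[rng.gauss(0.0, std) * m for _ in range(n_dims)] for m in node_mask]
    means = [sum(column) / n_real for column in zip(*noise)]
    return [
        [(value - mean) * m for value, mean in zip(row, means)]
        for row, m in zip(noise, node_mask)
    ]


# Three real atoms followed by one padded slot.
z_x = sample_centered_noise([1, 1, 1, 0], rng=random.Random(0))
column_sums = [sum(column) for column in zip(*z_x)]
```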

subspace_dimensionality(node_mask)

Compute the dimensionality on translation-invariant linear subspace where distributions on x are defined.
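Removing the centre of mass takes away one degree of freedom per spatial dimension, so positions of N real nodes in n_dims dimensions live in a subspace of dimension (N - 1) * n_dims. The sketch below computes that quantity for one sample from its node mask; it is a standalone illustration assuming this standard formula, and the function name is hypothetical.

```python
def translation_invariant_dim(node_mask, n_dims=3):
    """Dimensionality of the zero-centre-of-mass subspace for one sample."""
    n_real = sum(node_mask)
    return (n_real - 1) * n_dims


# Three real atoms and one padded slot in 3D: (3 - 1) * 3 = 6.
dim = translation_invariant_dim([1, 1, 1, 0])
```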

target(batch)
T
criterion
eps = 1e-10
extra_norm_values = ()
gamma
in_node_nf
include_charge = True
load_mlps_layer = 0
metric = ('mae', 'rmse')
mlp = None
mlp_batch_norm = True
mlp_dropout = 0
mlp_final = None
model
n_dims = 3
ndim_extra = 0
norm_biases = (None, 0.0, 0.0)
norm_values = (1.0, 1.0, 1.0)
normalization = True
num_class
num_mlp_layer = 1
num_targets
readout = 'mean'
std_mean = None
t_max = 1
task = ()
verbose = 0
MolecularDiffusion.modules.tasks.diffusion.reverse_tensor(x)
MolecularDiffusion.modules.tasks.diffusion.logger