MolecularDiffusion.modules.tasks.diffusion¶

Attributes¶

logger

Classes¶

`GeomMolecularGenerative`	Class for load/save configuration.
`GuidanceModelPrediction`	Class for load/save configuration.
`GuidanceModelPredictionPointCloud`	PointCloud-optimized subclass of GuidanceModelPrediction for EGCL/EGNN.

Functions¶

reverse_tensor(x)

Module Contents¶

class MolecularDiffusion.modules.tasks.diffusion.GeomMolecularGenerative(diffusion_model, node_dist_model=None, prop_dist_model=None, n_node_dist: Dict = {}, augment_noise: float = 0, data_augmentation: bool = False, condition: List = [], normalize_condition: str = None, sp_regularizer: MolecularDiffusion.callbacks.SP_regularizer = None, reference_indices: List = None, reference_freeze_mode: str = 'all')¶

Bases: MolecularDiffusion.modules.tasks.task.Task, MolecularDiffusion.core.Configurable

Class for load/save configuration. It will automatically record every argument passed to the __init__ function.

This class is inspired by state_dict() in PyTorch, but designed for hyperparameters.

Inherit this class to construct a configurable class.

>>> class MyClass(nn.Module, core.Configurable):

Note Configurable only applies to the current class rather than any derived class. For example, the following definition only records the arguments of MyClass.

>>> class DerivedClass(MyClass):

In order to record the arguments of DerivedClass, explicitly specify the inheritance.

>>> class DerivedClass(MyClass, core.Configurable):

To get the configuration of an instance, use config_dict(), which returns a dict of argument names and values. If an argument is also an instance of Configurable, it will be recursively expanded in the dict. The configuration dict can be passed to load_config_dict() to create a copy of the instance.

For classes already registered in Registry, they can be directly created from the Configurable class. This is convenient for building models from configuration files.

>>> config = models.GCN(128, [128]).config_dict()
>>> gcn = Configurable.load_config_dict(config)
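The recording behaviour described above can be illustrated with a small stand-alone sketch. It is not the library's implementation: `ConfigurableSketch`, its `_registry`, and the `Layer` class are hypothetical names, and nested configurable arguments are not expanded recursively here.

```python
import inspect


class ConfigurableSketch:
    """Hypothetical stand-in for core.Configurable (illustration only)."""

    _registry = {}  # assumed: maps class name -> class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only classes that list the mixin as a direct base are recorded,
        # mirroring the note about derived classes above.
        if ConfigurableSketch not in cls.__bases__:
            return
        ConfigurableSketch._registry[cls.__name__] = cls
        original_init = cls.__init__
        signature = inspect.signature(original_init)

        def recording_init(self, *args, **kw):
            bound = signature.bind(self, *args, **kw)
            bound.apply_defaults()
            config = dict(bound.arguments)
            config.pop("self")
            original_init(self, *args, **kw)
            # Stored after the original __init__ so that the outermost
            # recorded class wins when constructors are chained.
            self._config = config

        cls.__init__ = recording_init

    def config_dict(self):
        return {"class": type(self).__name__, **self._config}

    @classmethod
    def load_config_dict(cls, config):
        config = dict(config)
        target = cls._registry[config.pop("class")]
        return target(**config)


class Layer(ConfigurableSketch):
    def __init__(self, input_dim, hidden_dims, dropout=0.0):
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims
        self.dropout = dropout


config = Layer(128, [128]).config_dict()
layer_copy = ConfigurableSketch.load_config_dict(config)
```

Default values are captured as well, so the copy is rebuilt with the same hyperparameters even when they were not passed explicitly.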

Generative diffusion model for molecular structures.

Parameters:

- diffusion_model: The dynamic functional model for diffusion.
- node_dist_model (Optional[NodeDistributionModel]): The model for the distribution of the number of nodes. Default is None.
- prop_dist_model (Optional[PropertyDistributionModel]): The model for the property distribution. Default is None.
- n_node_dist (Dict): The distribution of the number of nodes. Default is {}.
- augment_noise (float): The amount of noise to add to the coordinates for data augmentation. Default is 0.
- data_augmentation (bool): Whether to apply data augmentation by symmetry operations. Default is False.
- condition (List): The list of conditions for the model. Default is [].
- normalize_condition (str): The normalization method for the condition. One of None, “maxmin”, or “mad”. Default is None.
- sp_regularizer (SP_regularizer): The self-paced learning regularizer for the model. Default is None.
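As a rough illustration of the two normalize_condition options, the sketch below assumes the common conventions: “maxmin” rescales values to [0, 1], and “mad” centers on the mean and divides by the mean absolute deviation. These formulas and the function names are assumptions; the library's exact definitions are not stated in this reference.

```python
def maxmin_normalize(values):
    """Assumed "maxmin" convention: map the smallest value to 0, the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mad_normalize(values):
    """Assumed "mad" convention: subtract the mean, divide by the mean absolute deviation."""
    mean = sum(values) / len(values)
    mad = sum(abs(v - mean) for v in values) / len(values)
    return [(v - mean) / mad for v in values]


scaled = maxmin_normalize([1.0, 2.0, 3.0])
standardized = mad_normalize([0.0, 4.0])
```

Whichever convention applies, the same statistics have to be reused at sampling time so that a condition is expressed on the scale the model was trained on.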

density_estimation(batch)¶

evaluate(all_loss, dummy_tensor)¶

forward(batch)¶

predict_and_target(batch)¶

preprocess(train_set=None)¶

sample(nodesxsample=torch.tensor([10]), context=None, condition_tensor=None, condition_mode=None, fix_noise=False, n_frames=0, n_retrys=0, t_retry=180, mode='ddpm', use_noised_conditioning=False, **kwargs)¶

Sample molecular structures.

Parameters:

- nodesxsample (Tensor): Number of nodes per sample.
- context (Optional[Tensor]): Context tensor for sampling. Default is None.
- condition_tensor (Optional[Tensor]): Condition tensor for sampling, of size [batch size, n_atom, n_features]. It must be normalized the same way as the training set. Default is None.
- condition_mode (Optional[str]): Mode for conditioning, in the format [condition_name]_[component_alg]. The component name can be x, h, or xh; component_alg is the algorithm (SSGD, …). Default is None.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.
- n_frames (int): Number of frames to keep. Default is 0.
- n_retrys (int): Number of retry attempts in the event of bad molecules. Default is 0.
- t_retry (int): Timestep at which to start retrying. Default is 180.
- mode (str): Sampling mode, either “ddpm” or “ddim”. Default is “ddpm”.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.
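To make the relationship between nodesxsample and the returned node mask concrete, here is a minimal sketch. It assumes samples are padded to the largest molecule in the batch and that the mask marks real atoms with 1; `build_node_mask` is a hypothetical helper written with plain lists rather than tensors.

```python
def build_node_mask(nodes_per_sample):
    """Return one row per sample: 1 for real atoms, 0 for padding (assumed layout)."""
    max_nodes = max(nodes_per_sample)
    return [
        [1 if i < n_nodes else 0 for i in range(max_nodes)]
        for n_nodes in nodes_per_sample
    ]


mask = build_node_mask([3, 5])
```

Under this assumption, a batch requesting 3 and 5 atoms yields two rows of length 5, with the last two entries of the first row masked out.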

sample_around_xh_target(nodesxsample=torch.tensor([10]), xh_target=None, context=None, fix_noise=False)¶

Sample molecular structures.

Parameters:

- nodesxsample (Tensor): Number of nodes per sample.
- xh_target (Tensor): Target xh, of size [batch size, n_atom, n_features].
- context (Optional[Tensor]): Context tensor for sampling. Default is None.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.

sample_chain(n_nodes: int, n_tries: int, keep_frames: int = 100)¶

Sample a molecule for visualizing the diffusion process.

Parameters:

- n_nodes (int): Number of nodes in the molecular graph.
- n_tries (int): Number of attempts to find a stable molecule.
- keep_frames (int): Number of frames to keep. Default is 100.

Returns: Tuple[Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, and positions.

sample_chain_guide(n_nodes: int, n_tries: int, target_function, scale: float = 1, max_norm=10, std: float = 1.0, scheduler=None, keep_frames: int = 100)¶

Sample a molecule for visualizing the diffusion process.

Parameters:

- n_nodes (int): Number of nodes in the molecular graph.
- n_tries (int): Number of attempts to find a stable molecule.
- target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
- scale (float): Scale factor for guidance. Default is 1.0.
- max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
- std (float): Standard deviation of the noise. Default is 1.0.
- scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
- keep_frames (int): Number of frames to keep. Default is 100.

Returns: Tuple[Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, and positions.

sample_conditonal(nodesxsample=torch.tensor([10]), target_value=[0], fix_noise=False, mode='ddpm', n_frames=0)¶

Sample molecular structures conditioned on a property value. Only works if the model is trained with a property distribution.

The interval should be wider than the bin width of the property distribution. If the interval is too narrow, the model might just get the same molecule.

Parameters:

- nodesxsample (Tensor): Number of nodes per sample.
- target_value (List[float]): Target values for conditional sampling.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.
- mode (str): Sampling mode, either “ddpm” or “ddim”. Default is “ddpm”.
- n_frames (int): Number of frames to keep. Default is 0.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: One-hot encoding of atoms, charges, positions, and node mask.

sample_guidance(target_function, nodesxsample=torch.tensor([10]), scale=1, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=0, guidance_stop=1, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, n_frames=0, debug=False)¶

Sample molecular structures with guidance from target function.

Parameters:

- target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
- nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
- scale (float): Scale factor for gradient guidance. Default is 1.0.
- max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
- std (float): Standard deviation of the noise. Default is 1.0.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.
- scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
- guidance_at (int): The timestep at which to start applying guidance, in [0, 1]; 0 = from the beginning. Default is 0.
- guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 = until the end. Default is 1.
- guidance_ver (int): The version of the guidance. One of 0, 1, 2, cfg, cfg_gg. Default is 1.
- n_backwards (int): Number of backward steps. Default is 0.
- h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
- x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
- context (Optional[Tensor]): Context tensor for sampling. Default is None.
- condition_tensor (Optional[Tensor]): Condition tensor for sampling. Default is None.
- n_frames (int): Number of frames to keep. Default is 0.
- debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
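The scheduler argument only needs a step method that receives the energy and the current scale. The sketch below shows one object with that shape. `PlateauScaleScheduler` is hypothetical, and two things are assumed rather than documented: that lower energy is better, and that step returns the scale to use next.

```python
class PlateauScaleScheduler:
    """Hypothetical scheduler: shrink the guidance scale when the energy stalls."""

    def __init__(self, factor=0.5, patience=2, min_scale=1e-3):
        self.factor = factor
        self.patience = patience
        self.min_scale = min_scale
        self.best_energy = None
        self.stalled_steps = 0

    def step(self, energy, scale):
        # Assumption: lower energy means a better sample.
        if self.best_energy is None or energy < self.best_energy:
            self.best_energy = energy
            self.stalled_steps = 0
            return scale
        self.stalled_steps += 1
        if self.stalled_steps >= self.patience:
            self.stalled_steps = 0
            return max(scale * self.factor, self.min_scale)
        return scale


scheduler = PlateauScaleScheduler(factor=0.5, patience=2)
scale = 1.0
for energy in (1.0, 1.0, 1.2):
    scale = scheduler.step(energy, scale)
```

After one improving step and two stalled steps, the loop above halves the scale once.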

sample_guidance_conitional(target_function, target_value=[0], negative_target_value=[], nodesxsample=torch.tensor([10]), gg_scale=1, cfg_scale=1, cfg_scale_schedule=None, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=1, guidance_stop=0, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, n_frames=0, debug=False)¶

Sample molecular structures with guidance from target function and conditional property.

Parameters:

- target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
- target_value (List[float]): Target values for conditional sampling.
- nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
- gg_scale (float): Scale factor for gradient guidance. Default is 1.0.
- cfg_scale (float): Scale factor for classifier-free guidance. Default is 1.0.
- cfg_scale_schedule (str, optional): Scheduler for the cfg scale. One of linear, exponential, cosine. Default is None.
- max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
- std (float): Standard deviation of the noise. Default is 1.0.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.
- scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
- guidance_at (int): The timestep at which to apply guidance, in [0, 1]; 0 = from the beginning. Default is 1.
- guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 = until the end. Default is 0.
- guidance_ver (int): The version of the guidance. One of 0, 1, 2, cfg, cfg_gg. Default is 1.
- n_backwards (int): Number of backward steps. Default is 0.
- h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
- x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
- n_frames (int): Number of frames to keep. Default is 0.
- debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
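For readers unfamiliar with the cfg_scale parameter, classifier-free guidance is commonly written as a blend of a conditional and an unconditional noise prediction. The sketch below uses the convention `eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)`. Whether this method uses exactly that convention, and how negative_target_value enters, is not stated here, so treat it as an assumption.

```python
def classifier_free_guidance(eps_cond, eps_uncond, cfg_scale):
    """Blend two noise predictions (flat lists of floats) with an assumed CFG rule."""
    return [
        uncond + cfg_scale * (cond - uncond)
        for cond, uncond in zip(eps_cond, eps_uncond)
    ]


eps_cond = [1.0, -2.0]
eps_uncond = [0.5, 0.0]
guided = classifier_free_guidance(eps_cond, eps_uncond, cfg_scale=2.0)
```

With this convention a scale of 1 reproduces the conditional prediction, a scale of 0 the unconditional one, and larger values extrapolate past the conditional prediction.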

sample_hybrid_guidance(target_function, target_value=[0], negative_target_value=[], nodesxsample=torch.tensor([10]), gg_scale=1, cfg_scale=1, cfg_scale_schedule=None, max_norm=10, std=1.0, fix_noise=False, scheduler=None, guidance_at=1, guidance_stop=0, guidance_ver=1, n_backwards=0, h_weight=1, x_weight=1, condition_tensor=None, condition_mode=None, inpaint_cfgs={}, outpaint_cfgs={}, use_noised_conditioning=False, n_frames=0, debug=False)¶

Sample molecular structures with guidance from target function and conditional property.

Parameters:

- target_function (Callable[[Tensor], Tensor]): Target function for guidance. Higher values are better.
- target_value (List[float]): Target values for conditional sampling.
- nodesxsample (Tensor): Number of nodes per sample. Default is torch.tensor([10]).
- gg_scale (float): Scale factor for gradient guidance. Default is 1.0.
- cfg_scale (float): Scale factor for classifier-free guidance. Default is 1.0.
- cfg_scale_schedule (str, optional): Scheduler for the cfg scale. One of linear, exponential, cosine. Default is None.
- max_norm (float): Initial maximum norm for the gradients. Default is 10.0.
- std (float): Standard deviation of the noise. Default is 1.0.
- fix_noise (bool): Fix noise for visualization purposes. Default is False.
- scheduler (RateScheduler): Rate scheduler. The scheduler should have a step method that takes the energy and the current scale as input. Default is None.
- guidance_at (int): The timestep at which to apply guidance, in [0, 1]; 0 = from the beginning. Default is 1.
- guidance_stop (int): The timestep at which to stop applying guidance, in [0, 1]; 1 = until the end. Default is 0.
- guidance_ver (int): The version of the guidance. One of 0, 1, 2, cfg, cfg_gg. Default is 1.
- n_backwards (int): Number of backward steps. Default is 0.
- h_weight (float): Weight for the gradient of the atom features. Default is 1.0.
- x_weight (float): Weight for the gradient of the Cartesian coordinates. Default is 1.0.
- debug (bool): Debug mode. When enabled, gradient norms, max gradients, clipping coefficients, and energies are saved to files. Default is False.
- condition_tensor (torch.Tensor, optional): Tensor for conditional guidance. Defaults to None.
- condition_mode (str, optional): Mode for conditional guidance. Defaults to None.
- inpaint_cfgs (dict, optional): Configuration for inpainting. The dictionary contains:
  - mask_node_index (torch.Tensor, optional): Indices of nodes to be inpainted. Defaults to an empty tensor.
  - denoising_strength (float, optional): Strength of denoising for inpainting.
  - noise_initial_mask (bool, optional): Whether to noise the initial masked region. Defaults to False.
- outpaint_cfgs (dict, optional): Configuration for outpainting. The dictionary contains:
  - t_start (float, optional): Timestep at which to start the generation. Defaults to 1.0.
  - t_critical (float, optional): Timestep threshold for applying reference tensor constraints. Defaults to None.
  - connector_index (torch.Tensor, optional): Indices of connector nodes for outpainting. Defaults to an empty tensor.
  - seed_dist (float, optional): Distance of the seed from the connector atom (used if n_bq_atom == 0).
  - min_dist (float, optional): Minimum distance from any existing atom in xh_cond (except the connector itself). Defaults to 1.
  - spread (float, optional): Spread of the initiating nodes. Defaults to 1 angstrom.
  - n_bq_atom (int, optional): Number of dummy atoms. Defaults to 0.
- n_frames (int, optional): Number of frames to keep. Defaults to 0.

Returns: Tuple[Tensor, Tensor, Tensor, Tensor]: Positions, one-hot encoding of atoms, node mask, and edge mask.
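The cfg_scale_schedule option names three schedule families without giving their formulas. The sketch below shows what such schedules could look like as functions of a normalized timestep. The shapes, the decay constant, and the convention that t = 1 is the noisiest step are all illustrative assumptions, and `cfg_scale_at` is a hypothetical helper.

```python
import math


def cfg_scale_at(t, base_scale, schedule=None, decay=5.0):
    """Return a guidance scale for normalized timestep t in [0, 1].

    Assumed convention: t = 1 is pure noise, t = 0 is the final sample,
    and every schedule reaches base_scale at t = 0.
    """
    if schedule is None:
        return base_scale
    if schedule == "linear":
        return base_scale * (1.0 - t)
    if schedule == "exponential":
        return base_scale * math.exp(-decay * t)
    if schedule == "cosine":
        return base_scale * 0.5 * (1.0 + math.cos(math.pi * t))
    raise ValueError(f"unknown schedule: {schedule!r}")


scales = [cfg_scale_at(t / 4, 2.0, "cosine") for t in range(5)]
```

A schedule of this kind keeps guidance weak while the sample is mostly noise and strengthens it as structure emerges; the opposite direction is equally plausible and would only flip the role of t.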

augment_noise = 0¶

condition = []¶

data_augmentation = False¶

model¶

n_atom_types¶

n_dim_data¶

n_node_dist¶

node_dist_model = None¶

normalize_condition = None¶

prop_dist_model = None¶

reference_feature_stats = None¶

reference_freeze_mode = 'all'¶

reference_indices = None¶

reference_scaffold = None¶

sp_regularizer = None¶

class MolecularDiffusion.modules.tasks.diffusion.GuidanceModelPrediction(model, noisemodel, task=(), include_charge=True, metric=('mae', 'rmse'), num_mlp_layer=1, normalization=True, num_class=None, mlp_batch_norm=None, readout='mean', mlp_dropout=0, std_mean=None, load_mlps_layer=0, nextra_nf=0, norm_values=(1.0, 1.0, 1.0), extra_norm_values=(), norm_biases=(None, 0.0, 0.0), weight_classes=None, t_max=1, verbose=0, prediction_mlp_type='pernode', prediction_activation='relu', loss_weighting='none', **kwargs)¶

Bases: MolecularDiffusion.modules.tasks.task.Task, MolecularDiffusion.core.Configurable

Class for load/save configuration. It will automatically record every argument passed to the __init__ function.

This class is inspired by state_dict() in PyTorch, but designed for hyperparameters.

Inherit this class to construct a configurable class.

>>> class MyClass(nn.Module, core.Configurable):

Note Configurable only applies to the current class rather than any derived class. For example, the following definition only records the arguments of MyClass.

>>> class DerivedClass(MyClass):

In order to record the arguments of DerivedClass, explicitly specify the inheritance.

>>> class DerivedClass(MyClass, core.Configurable):

For classes already registered in Registry, they can be directly created from the Configurable class. This is convenient for building models from configuration files.

>>> config = models.GCN(128, [128]).config_dict()
>>> gcn = Configurable.load_config_dict(config)

evaluate(pred, target)¶

forward(batch)¶

get_adj_matrix(_edges_dict, n_nodes, batch_size)¶

get_loss_weight(t)¶

Compute importance weights for the loss based on the timestep t.

Parameters:

- t (Tensor): Normalized timesteps in [0, 1], of shape (B, 1) or (B,).

Returns: Tensor: Importance weights w.
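The weighting formula itself is not documented here. One plausible reading of the SNR_CLAMP_MAX and eps attributes listed for this class is a clamped signal-to-noise weighting, sketched below for a single scalar timestep. The noise schedule and the function name are assumptions made purely for illustration.

```python
SNR_CLAMP_MAX = 5.0  # same value as the class attribute listed in this reference
EPS = 1e-10          # same value as the eps attribute listed in this reference


def clamped_snr_weight(t):
    """Weight for a normalized timestep t in [0, 1] (hypothetical formula)."""
    alpha_sq = (1.0 - t ** 2) ** 2  # assumed polynomial noise schedule
    sigma_sq = 1.0 - alpha_sq
    snr = alpha_sq / (sigma_sq + EPS)
    return min(snr, SNR_CLAMP_MAX)


weights = [clamped_snr_weight(t / 10) for t in range(11)]
```

Under these assumptions, nearly clean inputs (t close to 0) receive the capped weight, and the weight decays to zero as the input approaches pure noise.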

normalize(x, h, node_mask)¶

pad_data(array, batch, dim)¶

Parameters:

- array (torch.Tensor): Tensor of shape (n_atoms, n_features).
- batch: A pytorch_geometric.data.Batch.

predict(batch, all_loss=None, metric=None, evaluate=False)¶

If evaluate is True, the data must be normalized beforehand.

preprocess(train_set, valid_set=None, test_set=None)¶

Compute the mean and standard deviation for each task on the training set.

readout_f(embeddings: torch.Tensor) → torch.Tensor¶

Perform readout operation over nodes in each molecule.

Parameters: - embeddings (torch.Tensor): Tensor of size (x, y, z) where x is the batch size, y is the number of nodes, and z is the feature size.

Returns: torch.Tensor: Aggregated tensor of size (x, z).
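The shape contract of a mean readout can be sketched with plain nested lists, as below. `mean_readout` is a hypothetical helper that assumes the default readout='mean' and ignores any node masking; on PyTorch tensors the same reduction is `embeddings.mean(dim=1)`.

```python
def mean_readout(embeddings):
    """Average node embeddings per molecule: (batch, nodes, features) -> (batch, features)."""
    pooled = []
    for molecule in embeddings:
        n_nodes = len(molecule)
        n_features = len(molecule[0])
        pooled.append(
            [sum(node[f] for node in molecule) / n_nodes for f in range(n_features)]
        )
    return pooled


pooled = mean_readout([[[1.0, 2.0], [3.0, 4.0]]])
```

One molecule with two nodes and two features collapses to a single two-feature vector.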

sample_combined_position_feature_noise(n_samples, n_nodes, node_mask, std=1.0)¶

Samples mean-centered normal noise for z_x, and standard normal noise for z_h.
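Mean-centered noise for the coordinates can be sketched for a single molecule as follows. The helper name, the list-based layout, and the choice to zero out padded nodes are assumptions made for illustration.

```python
import random


def sample_centered_noise(node_mask, n_dims=3, std=1.0, rng=None):
    """Gaussian noise per valid node, shifted so the valid nodes have zero mean."""
    rng = rng or random.Random()
    noise = [[rng.gauss(0.0, std) * m for _ in range(n_dims)] for m in node_mask]
    n_valid = sum(node_mask)
    for d in range(n_dims):
        mean = sum(row[d] for row in noise) / n_valid
        for row, m in zip(noise, node_mask):
            # Padded nodes (mask 0) stay at zero after centering.
            row[d] = (row[d] - mean) * m
    return noise


z_x = sample_centered_noise([1, 1, 1, 0], rng=random.Random(0))
```

Removing the mean keeps the noise inside the translation-invariant subspace, so the center of gravity of the valid atoms stays at the origin.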

subspace_dimensionality(node_mask)¶

Compute the dimensionality of the translation-invariant linear subspace on which distributions on x are defined.
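Fixing the center of gravity removes one point's worth of freedom, so a molecule with N atoms in n_dims dimensions lives in a subspace of dimension (N - 1) * n_dims. The sketch below applies that count per molecule. It is a stand-alone function with list-based masks and is assumed, not confirmed, to match the method's behaviour.

```python
def subspace_dimensionality(node_mask, n_dims=3):
    """Per-molecule dimensionality: (number of valid nodes - 1) * n_dims."""
    return [(sum(mask) - 1) * n_dims for mask in node_mask]


dims = subspace_dimensionality([[1, 1, 1, 0], [1, 1, 0, 0]])
```

Here the first molecule has three valid atoms and the second has two, giving dimensionalities of 6 and 3.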

target(batch)¶

SNR_CLAMP_MAX = 5.0¶

T¶

criterion¶

property device¶

eps = 1e-10¶

extra_norm_values = ()¶

gamma¶

include_charge = True¶

load_mlps_layer = 0¶

loss_weighting = 'none'¶

metric = ('mae', 'rmse')¶

mlp = None¶

mlp_batch_norm = None¶

mlp_dropout = 0¶

mlp_final = None¶

model¶

n_dims = 3¶

ndim_extra = 0¶

norm_biases = (None, 0.0, 0.0)¶

norm_values = (1.0, 1.0, 1.0)¶

normalization = True¶

num_class¶

num_mlp_layer = 1¶

num_targets¶

prediction_activation = 'relu'¶

prediction_mlp_type = 'pernode'¶

readout = 'mean'¶

std_mean = None¶

t_max = 1¶

task = ()¶

verbose = 0¶

class MolecularDiffusion.modules.tasks.diffusion.GuidanceModelPredictionPointCloud(*args, **kwargs)¶

Bases: GuidanceModelPrediction

PointCloud-optimized subclass of GuidanceModelPrediction for EGCL/EGNN.

Accepts dense tensor inputs (B, N, D) directly without requiring PyG Data objects. Generates fully-connected edge indices on-the-fly for the EGNN backbone.

Works with both pointcloud batch format (training) and raw tensors (inference).
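Generating fully connected edge indices for a dense batch can be sketched as below. The layout is an assumption: nodes of sample b are numbered b * n_nodes upward, self-loops are included, and removing unwanted pairs is left to a separate edge mask. `fully_connected_edges` is a hypothetical name.

```python
def fully_connected_edges(n_nodes, batch_size):
    """Return (rows, cols) index lists connecting every node pair within each sample."""
    rows, cols = [], []
    for b in range(batch_size):
        offset = b * n_nodes  # nodes of sample b occupy a contiguous index block
        for i in range(n_nodes):
            for j in range(n_nodes):
                rows.append(offset + i)
                cols.append(offset + j)
    return rows, cols


rows, cols = fully_connected_edges(n_nodes=2, batch_size=2)
```

Each sample contributes n_nodes squared edges, and no edge crosses between samples.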

predict(batch, all_loss=None, metric=None, evaluate=False)¶

Prediction for pointcloud batch format.

Batch format: {coords, node_feature, charges, node_mask, edge_mask, natoms, …}

predict_dense(x, h, node_mask, t)¶

Dense tensor prediction for gradient guidance.

Parameters:

x – Coordinates (B, N, 3)
h – Node features (B, N, F) - should include atom types, charges, etc.
node_mask – Valid node mask (B, N, 1) or (B, N)
t – Timestep (B, 1) or scalar - normalized float in [0, 1]

Returns:

pred – Model predictions (B, num_targets)

target(batch)¶

Override target extraction for pointcloud batch format.

MolecularDiffusion.modules.tasks.diffusion.reverse_tensor(x)¶

MolecularDiffusion.modules.tasks.diffusion.logger¶