MolecularDiffusion.runmodes.train.eval

Attributes

Functions

analyze_and_save(→ Dict[str, Any])

Samples molecules from a generative model, saves them as XYZ files, and computes structural validity statistics.

evaluate(task, solver[, epoch, current_best_metric, ...])

Evaluates the performance of a trained model based on the specified task.

get_versioned_output_path(→ str)

Get next available version folder (engine_logs/version_X).

Module Contents

MolecularDiffusion.runmodes.train.eval.analyze_and_save(model, epoch: int, n_samples: int = 1000, batch_size: int = 100, logger: Literal['wandb', 'logging'] = 'logging', path_save: str = 'samples', use_posebuster: bool = False, postbuster_timeout: int = 60) → Dict[str, Any]

Samples molecules from a generative model, saves them as XYZ files, and computes structural validity statistics.

Parameters:
  • model – The generative model used for sampling.

  • epoch (int) – The current training epoch (for logging purposes).

  • n_samples (int) – Total number of molecules to sample.

  • batch_size (int) – Number of molecules sampled per batch.

  • logger (Literal["wandb", "logging"]) – Logging backend, either “wandb” or “logging”. Defaults to “logging”.

  • path_save (str) – Directory to save the sampled XYZ files and CSV.

Returns:

Dictionary summarizing validity and connectivity statistics.

Return type:

Dict[str, Any]
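The relationship between n_samples and batch_size can be illustrated with a small sketch. It assumes sampling proceeds in consecutive batches with a smaller final batch when n_samples is not a multiple of batch_size; the source does not state how a remainder is handled, and plan_batches is a hypothetical helper, not part of MolecularDiffusion.

```python
from typing import List


def plan_batches(n_samples: int, batch_size: int) -> List[int]:
    """Split n_samples into per-batch counts (hypothetical helper).

    Assumption: full batches first, then one smaller batch for any remainder.
    """
    if n_samples < 0 or batch_size <= 0:
        raise ValueError("n_samples must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes


# With the documented defaults (n_samples=1000, batch_size=100)
# this yields ten batches of 100 molecules each.
sizes = plan_batches(1000, 100)
print(len(sizes), sum(sizes))
```

With the defaults, the sampling loop would therefore run ten times under this assumption.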

MolecularDiffusion.runmodes.train.eval.evaluate(task: str, solver: MolecularDiffusion.core.Engine, epoch: int = 0, current_best_metric: float = torch.inf, best_checkpoints: list = None, logger: Literal['wandb', 'logging'] = 'logging', output_path: str = None, use_amp: bool = False, precision: str = 'bf16', **kwargs)

Evaluates the performance of a trained model based on the specified task.

For ‘diffusion’ tasks, it evaluates generative performance by sampling molecules, saving them, and analyzing their structural validity and connectivity. For ‘property’ or ‘guidance’ tasks, it evaluates predictive performance by calculating Mean Absolute Error (MAE).

Parameters:
  • task (str) – The type of task being evaluated (“diffusion”, “property”, or “guidance”).

  • solver (Engine) – The training engine containing the model and evaluation methods.

  • epoch (int, optional) – The current training epoch, used for naming generated files. Defaults to 0.

  • best_checkpoints (list, optional) – A list of tuples containing the metric and path of the best checkpoints.

  • logger (Literal["wandb", "logging"], optional) – The logging backend to use. Defaults to “logging”.

  • **kwargs – Additional keyword arguments specific to the task, such as:

      - output_generated_dir (str): Directory to save generated molecules (for diffusion).
      - generative_analysis (bool): Whether to perform generative analysis (for diffusion).
      - n_samples (int): Number of samples to generate (for diffusion).
      - metric (str): The metric to return from generative analysis (for diffusion).

Returns:

A tuple containing the best performance metric and the list of best checkpoints.

Return type:

Tuple[float, list]
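The (best metric, best checkpoints) return value suggests a bookkeeping pattern that can be sketched as follows. This is an illustration only, not the library's implementation: it assumes a lower metric is better (consistent with MAE and the torch.inf default), and both update_best and the keep limit are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Checkpoint = Tuple[float, str]  # (metric, checkpoint path)


def update_best(
    metric: float,
    checkpoint_path: str,
    current_best_metric: float = math.inf,
    best_checkpoints: Optional[List[Checkpoint]] = None,
    keep: int = 3,  # hypothetical limit on how many checkpoints to retain
) -> Tuple[float, List[Checkpoint]]:
    """Record a new result and return the updated best metric and list."""
    # A None default avoids sharing one mutable list between calls.
    checkpoints = list(best_checkpoints) if best_checkpoints is not None else []
    checkpoints.append((metric, checkpoint_path))
    # Assumption: lower is better, so sort ascending and keep the first entries.
    checkpoints.sort(key=lambda item: item[0])
    checkpoints = checkpoints[:keep]
    return min(current_best_metric, metric), checkpoints


best, ckpts = update_best(0.42, "epoch_0.pt")
best, ckpts = update_best(0.37, "epoch_1.pt", best, ckpts)
print(best, ckpts)
```

Passing the previous return values back in on the next call mirrors how current_best_metric and best_checkpoints appear as both parameters and return values of evaluate.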

MolecularDiffusion.runmodes.train.eval.get_versioned_output_path(base_output_path: str) → str

Get next available version folder (engine_logs/version_X).

This mimics Lightning’s behavior of creating version_0, version_1, etc.

Parameters:

base_output_path (str) – The base output directory (e.g., training_outputs/my_model)

Returns:

Path to the versioned checkpoint folder (e.g., training_outputs/my_model/engine_logs/version_0)

Return type:

str
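The version-folder scheme described above can be sketched as below. This is not the library's code: next_version_dir is a hypothetical stand-in, and it assumes the next index is the highest existing version_N plus one (as Lightning's loggers do) and that the folder is not created by the lookup itself.

```python
import re
from pathlib import Path


def next_version_dir(base_output_path: str) -> str:
    """Return the next free engine_logs/version_X path (hypothetical helper)."""
    logs_dir = Path(base_output_path) / "engine_logs"
    pattern = re.compile(r"^version_(\d+)$")
    existing = []
    if logs_dir.is_dir():
        for entry in logs_dir.iterdir():
            match = pattern.match(entry.name)
            # Only count directories named exactly version_<number>.
            if entry.is_dir() and match:
                existing.append(int(match.group(1)))
    # Assumption: start at 0, otherwise continue after the highest index.
    next_version = max(existing) + 1 if existing else 0
    return str(logs_dir / f"version_{next_version}")


print(next_version_dir("training_outputs/my_model"))
```

When no version folders exist yet, the result ends in engine_logs/version_0, matching the example in the Returns description.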

MolecularDiffusion.runmodes.train.eval.ANGLE_RELAX = 20
MolecularDiffusion.runmodes.train.eval.DISTRIBUTED_DEFAULT_TIMEOUT_SEC = 1800
MolecularDiffusion.runmodes.train.eval.DIST_RELAX_BOND = 0.25
MolecularDiffusion.runmodes.train.eval.DIST_THRESHOLD = 3
MolecularDiffusion.runmodes.train.eval.SCALE_FACTOR = 1.2