Tutorial 2: Training a Regressor Model
This tutorial explains how to train a model that predicts specific molecular properties (e.g., energy, solubility). The resulting regressor can be used as a standalone predictor or, more powerfully, as a guidance model that steers molecule generation towards desired property values (as we will see in Tutorial 7).
Configuration
We use the exact same override-only configuration workflow introduced in Tutorial 1: Training a Diffusion Model, but we load the regression templates via the defaults list.
# Inside my_regressor_run.yaml
defaults:
  - data: mol_dataset
  - tasks: regression      # Use the regression task configuration
  - logger: wandb
  - trainer: regression    # Use the regression-specific trainer settings
  - _self_
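To make the override-only idea concrete, here is a minimal Python sketch of how a template and a run file could combine. It assumes the defaults list follows Hydra's conventions, where placing `_self_` last lets the values in your own file take precedence over the loaded templates. The `merge_overrides` helper and the template values are hypothetical and only illustrate the principle; this is not the tool's actual merging code.

```python
from copy import deepcopy


def merge_overrides(base, overrides):
    """Return a new dict in which `overrides` replaces matching keys of `base`.

    Nested dicts are merged recursively; any other value is replaced whole.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical template values, for illustration only.
template = {
    "tasks": {"hidden_size": 256, "num_layers": 1, "num_sublayers": 2},
    "data": {"batch_size": 64},
}

# The run file lists only the keys you want to change.
run_file = {"tasks": {"hidden_size": 512}, "data": {"batch_size": 128}}

config = merge_overrides(template, run_file)
print(config)
```

Keys that the run file does not mention keep their template values, which is why a run file can stay short.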
Key Parameters for Regression
Below are the key parameters and recommended settings to override when training a regression model.
| Parameter | Example Override | Description |
|---|---|---|
|  |  | CRITICAL: Where your trained regressor model is saved. |
|  |  | Path to your compiled ASE database containing molecular properties. |
(Note: Ensure you have prepared your database as described in Tutorial 0: Data Preparation & Management.)
Data Settings
| Parameter | Example Override | Notes / Recommendations |
|---|---|---|
|  |  | A larger batch size can often be used for this task. |
|  |  | CRITICAL: For regression and guidance tasks, the data type must be set to |
Regression Task Hyperparameters
| Parameter | Example Override | Notes / Recommendations |
|---|---|---|
|  |  | CRITICAL: Tell the model which property from your dataset to predict. |
|  |  | Regressors often benefit from being wider than diffusion models. |
|  |  |  |
|  |  | For property prediction, it is preferred to have just one block of EGCL. |
|  |  | Inside the single EGCL block, use multiple sublayers for a deeper model. |
Trainer Settings for Regression
| Parameter | Example Override | Notes / Recommendations |
|---|---|---|
|  |  |  |
|  |  | Regression can often be trained with a slightly higher learning rate than diffusion models. |
|  |  |  |
|  |  | Important: Exponential Moving Average (EMA) is typically disabled for regressor training by setting the decay to |
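For intuition about what the EMA decay controls, here is a minimal Python sketch of a standard EMA update. It assumes the common convention `shadow = decay * shadow + (1 - decay) * weights`; the tool's own implementation, parameter name, and convention may differ, and `ema_update` and `run` are hypothetical names used only for this illustration.

```python
def ema_update(shadow, weights, decay):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


# Toy "model weights" after three consecutive training steps.
weights_per_step = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]


def run(decay):
    """Track the EMA of the toy weights across all steps."""
    shadow = list(weights_per_step[0])  # initialise from the first weights
    for weights in weights_per_step[1:]:
        shadow = ema_update(shadow, weights, decay)
    return shadow


smoothed = run(decay=0.9)    # lags behind the latest weights
unsmoothed = run(decay=0.0)  # identical to the latest weights

print("decay=0.9:", smoothed)
print("decay=0.0:", unsmoothed)
```

Under this convention, a decay of zero makes the averaged weights equal to the most recent weights, so no smoothing takes place.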
Experiment Logging
| Parameter | Example Override | Description |
|---|---|---|
|  |  | (W&B only) The name of the project on your W&B dashboard. |
|  |  | The top-level |
Putting It All Together
Here is a complete my_regressor_run.yaml example:
defaults:
  - data: mol_dataset
  - tasks: regression
  - logger: wandb
  - trainer: regression
  - _self_
name: "my_s1_t1_regressor"
seed: 42
trainer:
  output_path: "training_outputs/my_s1_t1_regressor"
  num_epochs: 100
logger:
  project_wandb: "My_Regressor_Project"
data:
  data_type: "pyg"
  batch_size: 128
tasks:
  task_learn: ["S1_exc", "T1_exc"]
  hidden_size: 512
  num_layers: 1
  num_sublayers: 4
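Before launching, it can help to check the settings this tutorial marks as critical. The following Python sketch does that on a plain dictionary that mirrors the example above. `check_regressor_config` is a hypothetical helper written for this illustration, not part of the tool, and the rule that `data_type` must be `"pyg"` is taken from the example values in this tutorial.

```python
def check_regressor_config(cfg):
    """Return a list of human-readable problems found in a regression run config."""
    problems = []
    trainer = cfg.get("trainer") or {}
    data = cfg.get("data") or {}
    tasks = cfg.get("tasks") or {}

    if not trainer.get("output_path"):
        problems.append("trainer.output_path is not set")
    if data.get("data_type") != "pyg":
        problems.append('data.data_type should be "pyg"')
    targets = tasks.get("task_learn")
    if not isinstance(targets, list) or not targets:
        problems.append("tasks.task_learn must list at least one property")
    return problems


# Dictionary mirroring the example run file.
cfg = {
    "name": "my_s1_t1_regressor",
    "seed": 42,
    "trainer": {
        "output_path": "training_outputs/my_s1_t1_regressor",
        "num_epochs": 100,
    },
    "logger": {"project_wandb": "My_Regressor_Project"},
    "data": {"data_type": "pyg", "batch_size": 128},
    "tasks": {
        "task_learn": ["S1_exc", "T1_exc"],
        "hidden_size": 512,
        "num_layers": 1,
        "num_sublayers": 4,
    },
}

problems = check_regressor_config(cfg)
print(problems if problems else "config looks consistent")
```

An empty list means the critical settings are present; each string in a non-empty list names one setting to fix.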
Launch the training as usual:
MolCraftDiff train my_regressor_run
Next Steps: Property Prediction and Guidance
Once your regression model is trained, you can use it in two main ways:
- As a Standalone Predictor: Use the `predict` module to predict properties for batches of existing 3D molecules.
- As a Guidance Model: Use the predictions to steer the creation of new molecules.
Learn how to use your regressor to guide diffusion generation in Tutorial 7: Property-Directed Generation (CFG/GG).