Tutorial: Scoring
The scoring run mode evaluates an existing SMILES list against a scoring function and writes the results to a CSV. No model or training is involved.
Primary use case: validate and iterate on your scoring function configuration before committing to a full RL run.
Configuration
```toml
run_type = "scoring"
[parameters]
smiles_file = "molecules.smi" # one SMILES per line
output_csv = "scores.csv"
[scoring]
type = "geometric_mean" # aggregation across components
parallel = 4 # number of parallel CPU workers
[[scoring.component]]
...
```
Scoring Components
Each component computes one property per SMILES. Components are declared as [[scoring.component]] blocks. The component name (e.g. QED, MolecularWeight) must match the name registered in REINVENT4.
Structure of a component block
```toml
[[scoring.component]]
[scoring.component.MolecularWeight]
[[scoring.component.MolecularWeight.endpoint]]
name = "MW" # label used in the output CSV column
weight = 1.0 # relative weight in aggregation
transform.type = "double_sigmoid"
transform.low = 200.0
transform.high = 500.0
transform.coef_div = 500.0
transform.coef_si = 20.0
transform.coef_se = 20.0
```
Some components take additional settings as params.* keys in the endpoint block, for example a reference SMILES, a SMARTS pattern, or a file path. The custom_alerts endpoint in the example below uses params.smarts.
Available components
| Component | Description |
|---|---|
| QED | Drug-likeness score (0–1, higher is better) |
| SlogP | Crippen LogP |
| MolecularWeight | Molecular weight in Da |
| | Topological polar surface area |
| | Number of H-bond acceptors |
| | Number of H-bond donors |
| | Number of rotatable bonds |
| | Total number of rings |
| | Number of aromatic rings |
| | Number of aliphatic rings |
| | Fraction of sp3 carbons |
| | Number of heavy atoms |
| | Synthetic accessibility score |
| | Tanimoto similarity to a reference SMILES |
| | Count of a SMARTS substructure (filter) |
| MatchingSubstructure | Penalty if a SMARTS substructure is present (multiplied against total score) |
| custom_alerts | Zero the total score if any SMARTS alert matches (global filter) |
| | Principal moment of inertia, a 3D shape descriptor (… |
External components (DockStream, Maize, ChemProp, REST) require additional setup and are not covered here.
Transforms
Transforms map the raw component value to [0, 1] before aggregation. All transforms are optional. Without one, the raw value is passed through unchanged, which is only appropriate if it already lies in [0, 1] (as QED does).
sigmoid
Scores rise from 0 to 1 as the value increases through the [low, high] range. Use when higher is better (e.g. QED, similarity).
```toml
transform.type = "sigmoid"
transform.low = 0.3
transform.high = 0.7
transform.k = 0.5 # steepness; larger = sharper transition
```
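Here is a minimal sketch of the rising curve. It assumes a base-10 logistic centred on the midpoint of [low, high], with the slope scaled by k relative to the window width. REINVENT4's exact expression may differ. The point is the shape: 0.5 at the midpoint, and with k = 0.5 close to 0 at low and close to 1 at high.

```python
def sigmoid(x: float, low: float, high: float, k: float) -> float:
    """Rising sigmoid: 0.5 at the midpoint of [low, high], approaching 1 above it."""
    midpoint = (low + high) / 2
    # Base-10 logistic; the slope grows with k and shrinks as the window widens.
    exponent = 10 * k * (midpoint - x) / (high - low)
    exponent = min(exponent, 300.0)  # avoid OverflowError for values far below low
    return 1 / (1 + 10 ** exponent)


# Settings from the sigmoid block above (e.g. a similarity score).
for value in (0.2, 0.3, 0.5, 0.7, 0.8):
    print(f"{value:.1f} -> {sigmoid(value, low=0.3, high=0.7, k=0.5):.3f}")
```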
reverse_sigmoid
Scores fall from 1 to 0 as the value increases. Use when lower is better (e.g. LogP, rotatable bonds).
```toml
transform.type = "reverse_sigmoid"
transform.low = 1.0
transform.high = 3.0
transform.k = 0.5
```
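The falling curve is the mirror image of the rising one. The sketch below assumes the same base-10 logistic form, which may not match REINVENT4's exact expression, and flips the sign of the distance from the midpoint. With the LogP settings above, values near 1.0 score close to 1 and values near 3.0 score close to 0.

```python
def reverse_sigmoid(x: float, low: float, high: float, k: float) -> float:
    """Falling sigmoid: 0.5 at the midpoint of [low, high], approaching 0 above it."""
    midpoint = (low + high) / 2
    # Same logistic as the rising curve, with the sign of (x - midpoint) flipped.
    exponent = 10 * k * (x - midpoint) / (high - low)
    exponent = min(exponent, 300.0)  # avoid OverflowError for values far above high
    return 1 / (1 + 10 ** exponent)


# LogP settings from the block above: lower LogP is preferred.
for logp in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(logp, round(reverse_sigmoid(logp, low=1.0, high=3.0, k=0.5), 3))
```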
double_sigmoid
Scores peak at 1 within the [low, high] window and fall to 0 outside it. Use for properties with a preferred range (e.g. MW 200–500 Da, TPSA 0–140 Ų).
```toml
transform.type = "double_sigmoid"
transform.low = 200.0
transform.high = 500.0
transform.coef_div = 500.0 # normalisation divisor, typically set to high
transform.coef_si = 20.0 # steepness of the left (rising) edge
transform.coef_se = 20.0 # steepness of the right (falling) edge
```
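To show the shape, the sketch below uses a simplified stand-in, not REINVENT4's formula. It subtracts a logistic edge centred at high from one centred at low, which produces a plateau between them. Its parameters scale, left_steepness and right_steepness play the roles the comments above give to coef_div, coef_si and coef_se. How REINVENT4 combines those coefficients internally may differ.

```python
def _logistic10(z: float) -> float:
    """1 / (1 + 10**(-z)), with z clamped so 10**(-z) cannot overflow."""
    z = max(min(z, 300.0), -300.0)
    return 1 / (1 + 10 ** (-z))


def double_sigmoid(x: float, low: float, high: float, scale: float,
                   left_steepness: float, right_steepness: float) -> float:
    """Simplified window transform: about 1 inside [low, high], about 0 well outside."""
    rising = _logistic10(left_steepness * (x - low) / scale)     # goes 0 -> 1 around low
    falling = _logistic10(right_steepness * (x - high) / scale)  # goes 0 -> 1 around high
    # Subtracting the second edge turns the step up into a plateau; clamp tiny negatives.
    return max(0.0, rising - falling)


# MW settings from the block above: window 200-500 Da, scale 500, steepness 20 on both edges.
for mw in (100.0, 200.0, 350.0, 500.0, 600.0):
    print(mw, round(double_sigmoid(mw, 200.0, 500.0, 500.0, 20.0, 20.0), 3))
```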
step
Returns 1.0 if the value is within [low, high], 0.0 otherwise. Hard cutoff, no gradient.
```toml
transform.type = "step"
transform.low = 0
transform.high = 3
```
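Here is a sketch of the step transform. It assumes both bounds are inclusive, as the [low, high] notation suggests. Because the output jumps straight from 0.0 to 1.0, a value just outside the window gets no partial credit.

```python
def step(x: float, low: float, high: float) -> float:
    """1.0 inside the closed window [low, high], 0.0 outside (no partial credit)."""
    return 1.0 if low <= x <= high else 0.0


# Window from the step block above, e.g. a count that should stay between 0 and 3.
for count in (0, 2, 3, 4):
    print(count, step(count, low=0, high=3))
```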
Aggregation
The [scoring] type controls how component scores are combined into a total score:
- geometric_mean (default and recommended): sensitive to low-scoring components, since a single zero pulls the total to zero. Encourages balanced optimisation.
- arithmetic_mean: averages the scores; a high score on one component can compensate for a low score on another.
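To see the difference numerically, the sketch below applies the textbook weighted geometric and arithmetic means to three component scores. It illustrates the general idea only. REINVENT4's exact treatment of weights may differ, and both function names are hypothetical.

```python
import math


def weighted_geometric_mean(scores: list[float], weights: list[float]) -> float:
    """(prod s_i ** w_i) ** (1 / sum w_i): any zero score with positive weight gives 0."""
    product = math.prod(s ** w for s, w in zip(scores, weights))
    return product ** (1 / sum(weights))


def weighted_arithmetic_mean(scores: list[float], weights: list[float]) -> float:
    """sum(w_i * s_i) / sum(w_i): strong scores can offset weak ones."""
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


scores = [0.9, 0.9, 0.1]  # two strong components and one weak one (illustrative values)
weights = [1.0, 1.0, 1.0]
print(round(weighted_geometric_mean(scores, weights), 3))
print(round(weighted_arithmetic_mean(scores, weights), 3))
```

With scores of 0.9, 0.9 and 0.1, the geometric mean is about 0.43 and the arithmetic mean about 0.63. The arithmetic mean lets the two good scores mask the poor one, while the geometric mean keeps the weak component visible.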
Filters vs. Components
Two special components operate differently from standard components:
- custom_alerts: a global filter. If any SMARTS pattern matches, the total score is set to 0 regardless of all other components. No weight is needed.
- MatchingSubstructure: a penalty multiplier. The total score is multiplied by the component score. Use it to penalise molecules containing an unwanted substructure.
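The sketch below shows how the two kinds of special component modify the total, following the description above. Everything in it is hypothetical:

- aggregated is the mean over the standard components.
- alert_matched says whether any custom_alerts pattern hit.
- penalty_factors holds MatchingSubstructure-style multipliers.

The 0.5 factor in the usage lines is an illustrative value, not a REINVENT4 default.

```python
import math


def total_score(aggregated: float, alert_matched: bool, penalty_factors: list[float]) -> float:
    """Apply the global filter first, then the penalty multipliers (hypothetical helper)."""
    if alert_matched:
        # custom_alerts: any matching SMARTS zeroes the score, whatever the other components say.
        return 0.0
    # Penalty components multiply the aggregated score; an empty list leaves it unchanged.
    return aggregated * math.prod(penalty_factors)


print(total_score(0.8, alert_matched=False, penalty_factors=[]))     # no filter hit, no penalty
print(total_score(0.8, alert_matched=True, penalty_factors=[1.0]))   # filter hit
print(total_score(0.8, alert_matched=False, penalty_factors=[0.5]))  # penalised
```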
Example: Drug-likeness Filter
```toml
run_type = "scoring"
[parameters]
smiles_file = "molecules.smi"
output_csv = "scores.csv"
[scoring]
type = "geometric_mean"
[[scoring.component]]
[scoring.component.custom_alerts]
[[scoring.component.custom_alerts.endpoint]]
name = "Alerts"
params.smarts = [
"[*;r{8-17}]", # macrocycles
"[#8][#8]", # peroxide
"[#6;+]", # charged carbon
"[#16][#16]" # disulfide
]
[[scoring.component]]
[scoring.component.QED]
[[scoring.component.QED.endpoint]]
name = "QED"
weight = 1.0
[[scoring.component]]
[scoring.component.MolecularWeight]
[[scoring.component.MolecularWeight.endpoint]]
name = "MW"
weight = 1.0
transform.type = "double_sigmoid"
transform.low = 200.0
transform.high = 500.0
transform.coef_div = 500.0
transform.coef_si = 20.0
transform.coef_se = 20.0
[[scoring.component]]
[scoring.component.SlogP]
[[scoring.component.SlogP.endpoint]]
name = "LogP"
weight = 1.0
transform.type = "reverse_sigmoid"
transform.low = 1.0
transform.high = 3.0
transform.k = 0.5
```
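You can trace this example by hand for a single molecule. The sketch below makes several assumptions, so its numbers are indicative only:

- reverse_sigmoid uses a base-10 logistic form.
- double_sigmoid is a simplified difference-of-logistics stand-in, not REINVENT4's formula.
- The raw QED, MW and LogP values are made up.
- drug_likeness_score is a hypothetical helper, not part of REINVENT4.

```python
import math


def _logistic10(z: float) -> float:
    """1 / (1 + 10**(-z)), with z clamped to avoid OverflowError."""
    z = max(min(z, 300.0), -300.0)
    return 1 / (1 + 10 ** (-z))


def reverse_sigmoid(x: float, low: float, high: float, k: float) -> float:
    # Assumed base-10 logistic form: falls from about 1 to about 0 across [low, high].
    return _logistic10(10 * k * ((low + high) / 2 - x) / (high - low))


def double_sigmoid(x: float, low: float, high: float, scale: float,
                   left: float, right: float) -> float:
    # Simplified stand-in: rising edge at low minus rising edge at high gives a plateau.
    return max(0.0, _logistic10(left * (x - low) / scale)
               - _logistic10(right * (x - high) / scale))


def drug_likeness_score(qed: float, mw: float, logp: float, alert_matched: bool) -> float:
    """Hypothetical re-creation of the example config for one molecule's raw values."""
    if alert_matched:  # custom_alerts acts as a global filter
        return 0.0
    transformed = [
        qed,                                                  # QED: no transform
        double_sigmoid(mw, 200.0, 500.0, 500.0, 20.0, 20.0),  # MW window 200-500 Da
        reverse_sigmoid(logp, 1.0, 3.0, 0.5),                 # lower LogP preferred
    ]
    weights = [1.0, 1.0, 1.0]
    # Weighted geometric mean, matching type = "geometric_mean".
    product = math.prod(s ** w for s, w in zip(transformed, weights))
    return product ** (1 / sum(weights))


# Made-up raw values for a single molecule, purely for illustration.
print(round(drug_likeness_score(qed=0.7, mw=350.0, logp=2.0, alert_matched=False), 3))
print(round(drug_likeness_score(qed=0.7, mw=350.0, logp=5.0, alert_matched=False), 3))
print(drug_likeness_score(qed=0.7, mw=350.0, logp=2.0, alert_matched=True))
```

In this sketch, a LogP of 5.0 drives the LogP component close to zero. Because the aggregation is a geometric mean, that collapses the total even though QED and MW are acceptable.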
Running
```shell
reinvent scoring.toml
```
Output
The output CSV contains one row per input SMILES with columns for the total score and each component score (raw and transformed).
| Column | Description |
|---|---|
| | Canonicalized input SMILES |
| | Aggregated score across all components |
| | Raw value from the component |
| | Transformed value (0–1) used in aggregation |
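To pick out the best-scoring molecules, you can read the CSV with Python's standard csv module. The exact column headers are not listed above, so the sketch takes them as parameters. "SMILES" and "Score" are assumed defaults; check them against the header row of your scores.csv. The sample text is made up purely to show the call, and top_molecules is a hypothetical helper.

```python
import csv
import io


def top_molecules(csv_text: str, score_column: str = "Score",
                  smiles_column: str = "SMILES", n: int = 5) -> list[tuple[str, float]]:
    """Return the n highest-scoring (SMILES, score) pairs from the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Sort all rows by the total-score column, highest first.
    rows = sorted(reader, key=lambda row: float(row[score_column]), reverse=True)
    return [(row[smiles_column], float(row[score_column])) for row in rows[:n]]


# Made-up sample; for a real run use e.g. pathlib.Path("scores.csv").read_text().
sample = "SMILES,Score\nCCO,0.41\nc1ccccc1O,0.63\nCC(=O)O,0.10\n"
for smiles, score in top_molecules(sample, n=2):
    print(smiles, score)
```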