Installation

Prerequisites

  • Python 3.11

  • A CUDA-capable GPU is recommended for training

Step-by-step

# 1. Create and activate a new environment
conda create -n molcraft python=3.11 -y
conda activate molcraft

# 2. Install MolCraftDiffusion with a compute backend
# GPU/CUDA:
pip install 'molcraftdiffusion[gpu]' \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cu124.html

# CPU-only:
pip install 'molcraftdiffusion[cpu]' \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cpu.html
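
After either install command, it can help to confirm that the core packages are visible in the active environment. The sketch below uses only the standard library and does not import the packages themselves. The module names `torch` and `torch_geometric` are assumptions inferred from the wheel indexes above, not a confirmed dependency list.

```python
from importlib.util import find_spec


def installed_modules(names):
    """Map each top-level module name to True if it can be located in this environment."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; adjust to whatever your install actually pulls in.
    for name, found in installed_modules(["torch", "torch_geometric"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated molcraft environment; a MISSING line suggests the install command did not complete or targeted a different environment.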

The base package does not install every data-processing or analysis dependency. Add the feature groups you need:

# Data preparation, augmentation, and featurization commands
pip install 'molcraftdiffusion[data]'

# Analysis and post-processing commands (metrics, compare, xyz2mol, xtb-electronic, featurize SOAP)
pip install 'molcraftdiffusion[analyze]'

# xTB is used by optimize, compare, and xtb-electronic — best installed from conda-forge:
conda install -c conda-forge xtb==6.7.1 -y

If an optional command is called without its dependencies, MolCraftDiffusion exits with a warning and an install hint such as pip install 'molcraftdiffusion[analyze]'.
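
The behavior described above can be pictured with a small guard function. This is a hypothetical sketch, not MolCraftDiffusion's actual code: the helper name `require_extra` and the message wording are invented for illustration.

```python
import sys
from importlib.util import find_spec


def require_extra(module, extra):
    """Exit with an install hint if `module` cannot be located (hypothetical helper)."""
    if find_spec(module) is None:
        hint = f"pip install 'molcraftdiffusion[{extra}]'"
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"'{module}' is required for this command. Install it with: {hint}")
```

A command entry point would call something like require_extra("rdkit", "analyze") before doing any work, so a missing dependency surfaces as a short hint rather than a traceback.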

Development / editable install

git clone https://github.com/pregHosh/MolCraftDiffusion
cd MolCraftDiffusion
pip install -e '.[gpu]' \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cu124.html

# Add optional groups for editable development when needed:
pip install -e '.[data]'
pip install -e '.[analyze]'

Optional dependencies

# Data utilities (includes dscribe for SOAP featurization)
pip install 'molcraftdiffusion[data]'

# Analyze utilities (PoseBusters/RDKit/OpenBabel Python bindings)
pip install 'molcraftdiffusion[analyze]'

# Optional: needed for geometric-shape metrics in
# `MolCraftDiff analyze metrics --metrics {core,geom_revised,all}`
pip install cosymlib

# xTB executable for xTB-backed analysis
conda install -c conda-forge xtb==6.7.1 -y

UMA featurization backend

The featurize --backend uma command uses a pretrained UMA model from fairchem. fairchem is not installed as a pip package; instead, its source tree is expected in the repository root and is loaded from there at runtime.

Clone it into the repo root before using the UMA backend:

# from the MolCraftDiffusion repo root
git clone https://github.com/pregHosh/fairchem fairchem

A pretrained UMA checkpoint is also required. Download uma-s-1p2.pt from Hugging Face and place it at:

training_outputs/uma-s-1p2.pt

or pass a custom path with --checkpoint /path/to/checkpoint.pt.

If the fairchem source tree is not found at runtime, MolCraftDiffusion will print an explicit error with the clone instruction above. You can also set:

export MOLCRAFT_REPO_ROOT=/path/to/MolCraftDiffusion

to point to the repo root when running from a different working directory.
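
Before running the UMA backend, a quick pre-flight check of the expected layout may save a failed run. The sketch below is not part of MolCraftDiffusion. It assumes that the repo root falls back to the current directory when MOLCRAFT_REPO_ROOT is unset, and that the default checkpoint path is resolved relative to the repo root; the function name `uma_prerequisites` is hypothetical.

```python
import os
from pathlib import Path


def uma_prerequisites(checkpoint=None):
    """Report whether the files the UMA backend expects are present."""
    # Assumed fallback: use the current directory when the variable is unset.
    root = Path(os.environ.get("MOLCRAFT_REPO_ROOT", os.getcwd()))
    # Assumed resolution: the default checkpoint lives under the repo root.
    ckpt = Path(checkpoint) if checkpoint else root / "training_outputs" / "uma-s-1p2.pt"
    return {
        "fairchem_source": (root / "fairchem").is_dir(),
        "checkpoint": ckpt.is_file(),
    }


if __name__ == "__main__":
    for item, present in uma_prerequisites().items():
        print(f"{item}: {'ok' if present else 'not found'}")
```

Pass a path to uma_prerequisites to mirror a custom --checkpoint value.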

Verifying the installation

MolCraftDiff --help

You should see a list of all available commands: train, generate, predict, eval-predict, analyze, data.
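
The same check can be scripted, for example in a CI job or a setup script. This minimal sketch assumes that the MolCraftDiff executable is on PATH after installation and that --help exits with status 0, as is conventional for command-line tools; the function name `cli_help_ok` is hypothetical.

```python
import shutil
import subprocess


def cli_help_ok(executable="MolCraftDiff"):
    """Return True if `executable` is on PATH and `--help` exits with status 0."""
    path = shutil.which(executable)
    if path is None:
        return False
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("MolCraftDiff available:", cli_help_ok())
```

The same function can be pointed at other executables, such as xtb, if xtb accepts --help in the same way.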

Pre-trained models

Pre-trained checkpoints are available on Hugging Face. We recommend starting from these for any downstream application.