MolecularDiffusion.modules.layers.tabasco.transformer¶
Classes¶
| Transformer | A standard Transformer model with multiple layers. |
| TransformerBlock | A transformer block with layer normalization and residual connections. |
Module Contents¶
- class MolecularDiffusion.modules.layers.tabasco.transformer.Transformer(dim: int, depth: int, num_heads: int, mlp_dim: int | None = None, dropout: float = 0.0, activation_type: str = 'gelu', norm_eps: float = 1e-05)¶
Bases: torch.nn.Module
A standard Transformer model with multiple layers.
This implements a sequence of transformer blocks, each containing self-attention and feed-forward components with residual connections.
Initialize the Transformer module.
- Parameters:
dim – Model dimension
depth – Number of transformer blocks
num_heads – Number of attention heads
mlp_dim – Hidden dimension for feed-forward networks (defaults to 4x dim)
dropout – Dropout probability
activation_type – Type of activation to use in feed-forward networks
norm_eps – Epsilon value for layer normalization
- forward(x: torch.Tensor, padding_mask: torch.Tensor | None = None, attn_mask: torch.Tensor | None = None) torch.Tensor¶
Forward pass through the transformer.
- Parameters:
x – Input tensor of shape [batch_size, seq_len, dim]
padding_mask – Boolean mask for padding tokens (True means ignore). Shape: [batch_size, seq_len]
attn_mask – Mask to prevent attention to certain positions. Shape: [seq_len, seq_len] or [batch_size, seq_len, seq_len]
- Returns:
Output tensor of shape [batch_size, seq_len, dim]
- layers¶
- norm¶
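The `layers` attribute holds the stack of `depth` transformer blocks and `norm` the final layer normalization applied after them. A minimal, self-contained sketch of this structure, using `torch.nn.TransformerEncoderLayer` purely as a stand-in for the actual `TransformerBlock` (names `TransformerSketch` and the norm placement are illustrative assumptions, not the library's implementation):

```python
import torch
import torch.nn as nn

class TransformerSketch(nn.Module):
    """Illustrative stand-in: `depth` blocks followed by a final LayerNorm."""

    def __init__(self, dim, depth, num_heads, mlp_dim=None,
                 dropout=0.0, norm_eps=1e-5):
        super().__init__()
        mlp_dim = mlp_dim or 4 * dim  # documented default: 4x dim
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                dim, num_heads, mlp_dim, dropout,
                activation="gelu", layer_norm_eps=norm_eps,
                batch_first=True, norm_first=True)
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(dim, eps=norm_eps)

    def forward(self, x, padding_mask=None, attn_mask=None):
        # padding_mask follows the documented convention: True = ignore,
        # which matches torch's key_padding_mask semantics.
        for layer in self.layers:
            x = layer(x, src_mask=attn_mask, src_key_padding_mask=padding_mask)
        return self.norm(x)

model = TransformerSketch(dim=32, depth=2, num_heads=4)
x = torch.randn(3, 7, 32)                    # [batch_size, seq_len, dim]
pad = torch.zeros(3, 7, dtype=torch.bool)
pad[:, 5:] = True                            # last two positions are padding
out = model(x, padding_mask=pad)
print(out.shape)                             # torch.Size([3, 7, 32])
```

As documented, the output shape matches the input shape `[batch_size, seq_len, dim]`; the masks only change which positions attend to which, not the tensor layout.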
- class MolecularDiffusion.modules.layers.tabasco.transformer.TransformerBlock(dim: int, num_heads: int, mlp_dim: int | None = None, dropout: float = 0.0, activation_type: str = 'swiglu', norm_eps: float = 1e-05)¶
Bases: torch.nn.Module
A transformer block with layer normalization and residual connections.
This implements a standard transformer block with self-attention followed by a feed-forward network, with layer normalization and residual connections.
Initialize the TransformerBlock module.
- Parameters:
dim – Input and output dimension
num_heads – Number of attention heads
mlp_dim – Hidden dimension for the feed-forward network (defaults to 4x input dim)
dropout – Dropout probability
activation_type – Type of activation to use in the feed-forward network
norm_eps – Epsilon value for layer normalization
- forward(x: torch.Tensor, padding_mask: torch.Tensor | None = None, attn_mask: torch.Tensor | None = None) torch.Tensor¶
Forward pass through the transformer block.
- Parameters:
x – Input tensor of shape [batch_size, seq_len, dim]
padding_mask – Boolean mask for padding tokens (True means ignore). Shape: [batch_size, seq_len]
attn_mask – Mask to prevent attention to certain positions. Shape: [seq_len, seq_len] or [batch_size, seq_len, seq_len]
- Returns:
Output tensor of shape [batch_size, seq_len, dim]
- attn_block¶
- ff_block¶
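The `attn_block` and `ff_block` attributes correspond to the two residual sub-layers described above. A minimal sketch of one such block, built from standard `torch.nn` components (a pre-norm layout with a GELU feed-forward is assumed here for simplicity; the actual block defaults to a swiglu activation and may place its norms differently):

```python
import torch
import torch.nn as nn

class BlockSketch(nn.Module):
    """Illustrative pre-norm block: self-attention + feed-forward,
    each wrapped in a residual connection."""

    def __init__(self, dim, num_heads, mlp_dim=None,
                 dropout=0.0, norm_eps=1e-5):
        super().__init__()
        mlp_dim = mlp_dim or 4 * dim  # documented default: 4x input dim
        self.norm1 = nn.LayerNorm(dim, eps=norm_eps)
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout,
                                          batch_first=True)
        self.norm2 = nn.LayerNorm(dim, eps=norm_eps)
        self.ff = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(mlp_dim, dim))

    def forward(self, x, padding_mask=None, attn_mask=None):
        # Self-attention sub-layer with residual connection.
        # key_padding_mask uses True = ignore, matching the docs.
        h = self.norm1(x)
        h, _ = self.attn(h, h, h, key_padding_mask=padding_mask,
                         attn_mask=attn_mask, need_weights=False)
        x = x + h
        # Feed-forward sub-layer with residual connection.
        x = x + self.ff(self.norm2(x))
        return x

block = BlockSketch(dim=64, num_heads=4)
x = torch.randn(2, 10, 64)                   # [batch_size, seq_len, dim]
pad = torch.zeros(2, 10, dtype=torch.bool)
pad[:, 8:] = True                            # mask out the padded tail
out = block(x, padding_mask=pad)
print(out.shape)                             # torch.Size([2, 10, 64])
```

Because both sub-layers are residual, the block preserves the input shape, which is what lets the parent Transformer stack `depth` of them without any projection between blocks.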