Optimizers¶
This guide covers the three optimizers in the Alberta Framework.
LMS (Least Mean Squares)¶
The simplest optimizer with a fixed step-size.
Algorithm¶
The weight update is

\[
w_{t+1} = w_t + \alpha \, \delta_t \, x_t
\]

Where:

- \(\alpha\) is the fixed step-size
- \(\delta_t = y_t - \hat{y}_t\) is the prediction error
- \(x_t\) is the feature vector
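For concreteness, the update can be written out in plain NumPy. This is a minimal sketch of one LMS step, independent of the framework's own optimizer classes:

```python
import numpy as np

def lms_step(w, x, y, alpha=0.01):
    """One LMS update: move the weights along the error-scaled feature vector."""
    delta = y - float(w @ x)   # prediction error for the current example
    w = w + alpha * delta * x  # fixed step-size update
    return w, delta
```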
Usage¶
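A minimal construction sketch, assuming the optimizer is exported as `LMS` and takes a `step_size` argument (both names are assumptions here; check the API reference for the exact signature):

```python
# Assumed class and parameter names; verify against the API reference.
from alberta_framework import LMS

optimizer = LMS(
    step_size=0.01,  # Fixed step-size (alpha)
)
```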
When to Use¶
LMS serves as a baseline. Use it to:
- Establish performance benchmarks
- Compare against adaptive methods
- Validate problem setups
The main limitation is that the optimal step-size is problem-dependent and may change as conditions shift.
IDBD (Incremental Delta-Bar-Delta)¶
IDBD maintains per-weight adaptive step-sizes that increase when gradients consistently agree and decrease when they conflict.
Algorithm¶
- Compute per-weight step-sizes: \(\alpha_i = \exp(\log \alpha_i)\)
- Update weights: \(w_i \leftarrow w_i + \alpha_i \cdot \delta \cdot x_i\)
- Update log step-sizes: \(\log \alpha_i \leftarrow \log \alpha_i + \beta \cdot \delta \cdot x_i \cdot h_i\)
- Update traces: \(h_i \leftarrow h_i \cdot \max(0, 1 - \alpha_i \cdot x_i^2) + \alpha_i \cdot \delta \cdot x_i\)
Where \(h_i\) is a trace that tracks gradient correlation over time.
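The per-step arithmetic is easier to follow in code. Below is a minimal sketch of one IDBD update in plain NumPy, written directly from the steps above; it is an illustration, not the framework's implementation:

```python
import numpy as np

def idbd_step(w, log_alpha, h, x, y, meta_step_size=0.01):
    """One IDBD update, following the steps listed above."""
    delta = y - float(w @ x)   # prediction error
    alpha = np.exp(log_alpha)  # per-weight step-sizes
    w = w + alpha * delta * x  # weight update
    log_alpha = log_alpha + meta_step_size * delta * x * h           # adapt log step-sizes
    h = h * np.maximum(0.0, 1.0 - alpha * x**2) + alpha * delta * x  # update traces
    return w, log_alpha, h
```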
Usage¶
```python
from alberta_framework import IDBD

optimizer = IDBD(
    initial_step_size=0.01,  # Starting step-size
    meta_step_size=0.01,     # How fast step-sizes adapt
)
```
Parameters¶
| Parameter | Description | Typical Range |
|---|---|---|
| `initial_step_size` | Starting value for per-weight step-sizes | 0.001 - 0.1 |
| `meta_step_size` | Learning rate for step-size adaptation (\(\beta\)) | 0.001 - 0.1 |
Reference¶
Sutton, R.S. (1992). "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta"
Autostep¶
Autostep combines adaptive step-sizes with gradient normalization, making it robust to different feature scales without manual tuning.
Algorithm¶
- Compute gradient: \(g_i = \delta \cdot x_i\)
- Normalize: \(g'_i = g_i / \max(|g_i|, v_i)\)
- Update weights: \(w_i \leftarrow w_i + \alpha_i \cdot g'_i\)
- Adapt step-sizes: \(\alpha_i \leftarrow \alpha_i \cdot \exp(\mu \cdot g'_i \cdot h_i)\)
- Update traces: \(h_i \leftarrow h_i \cdot (1 - \alpha_i) + \alpha_i \cdot g'_i\)
- Update normalizers: \(v_i \leftarrow \max(|g_i|, v_i \cdot \tau)\)
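As with IDBD, the update is short enough to write out. A minimal NumPy sketch of one Autostep-style step, following the equations above (illustration only; the small `eps` guarding the division when both \(|g_i|\) and \(v_i\) are zero is an addition, not part of the listed equations):

```python
import numpy as np

def autostep_step(w, alpha, h, v, x, y, mu=0.01, tau=0.99, eps=1e-8):
    """One Autostep-style update, following the steps listed above."""
    delta = y - float(w @ x)                                # prediction error
    g = delta * x                                           # per-weight gradient
    g_norm = g / np.maximum(np.maximum(np.abs(g), v), eps)  # normalized gradient
    w = w + alpha * g_norm                                  # weight update
    alpha = alpha * np.exp(mu * g_norm * h)                 # adapt step-sizes
    h = h * (1.0 - alpha) + alpha * g_norm                  # update traces
    v = np.maximum(np.abs(g), v * tau)                      # update normalizers
    return w, alpha, h, v
```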
Usage¶
```python
from alberta_framework import Autostep

optimizer = Autostep(
    initial_step_size=0.01,
    meta_step_size=0.01,
    normalizer_decay=0.99,
)
```
Parameters¶
| Parameter | Description | Typical Range |
|---|---|---|
| `initial_step_size` | Starting value for per-weight step-sizes | 0.001 - 0.1 |
| `meta_step_size` | Learning rate for adaptation (\(\mu\)) | 0.001 - 0.1 |
| `normalizer_decay` | Decay for gradient normalizers (\(\tau\)) | 0.9 - 0.999 |
Reference¶
Mahmood, A.R., Sutton, R.S., Degris, T., & Pilarski, P.M. (2012). "Tuning-free step-size adaptation"
Comparison¶
| Feature | LMS | IDBD | Autostep |
|---|---|---|---|
| Per-weight step-sizes | No | Yes | Yes |
| Gradient normalization | No | No | Yes |
| Tuning required | High | Medium | Low |
| Computational cost | Lowest | Medium | Highest |
Choosing an Optimizer¶
- Start with IDBD for most non-stationary problems
- Use Autostep when feature scales vary significantly
- Use LMS as a baseline or when you have a well-tuned step-size
Metrics¶
All optimizers report metrics during training: