horde
Horde learner: GVF demons sharing a trunk (Sutton et al. 2011).
Wraps MultiHeadMLPLearner to add:
- Per-demon gamma/lambda via HordeSpec
- TD target computation for temporal demons (gamma > 0)
- GVF metadata and typed update results
Architecture decision: the trunk has no temporal trace decay (gamma=0).
Per-demon gamma/lambda applies only to heads. This avoids the
trace-error coupling problem: MultiHeadMLPLearner's VJP backward
pass folds per-head errors into the trunk cotangent before trace
accumulation, so trunk traces accumulate error-weighted gradients.
With trunk gamma=0, traces reset each step and this is correct.
If trunk gamma * lambda > 0, traces would carry biased error-gradient
products across steps, violating forward-view equivalence (Sutton &
Barto Ch. 12). This also avoids O(n_heads x trunk_params) memory
for per-demon trunk traces.
Reference: Sutton et al. 2011, "Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction"
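The trace-coupling argument above can be checked with a toy example. The following is a minimal stand-alone sketch (plain NumPy, not the library's code) of an accumulating eligibility trace z <- gamma * lambda * z + grad: with gamma = 0 the trace resets to the current gradient each step, so no error-weighted history crosses step boundaries.

```python
import numpy as np

def trace_step(z, grad, gamma, lam):
    """One accumulating-trace update: z <- gamma * lam * z + grad."""
    return gamma * lam * z + grad

g1 = np.array([1.0, 0.0, 2.0])  # gradient at step 1
g2 = np.array([0.5, 1.0, 0.0])  # gradient at step 2

# Trunk case: gamma = 0, so the trace is exactly the current gradient.
z = np.zeros(3)
z = trace_step(z, g1, gamma=0.0, lam=0.9)
z = trace_step(z, g2, gamma=0.0, lam=0.9)
# z == g2: nothing carried over from step 1

# Head case: gamma * lam = 0.81 carries decayed history across steps.
z_head = np.zeros(3)
z_head = trace_step(z_head, g1, gamma=0.9, lam=0.9)
z_head = trace_step(z_head, g2, gamma=0.9, lam=0.9)
# z_head == 0.81 * g1 + g2
```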
HordeUpdateResult
Result of a single Horde update step.
Attributes:
state: Updated multi-head MLP learner state
predictions: Predictions from all demons, shape (n_demons,)
td_errors: TD errors (target - prediction), shape (n_demons,).
NaN for inactive demons.
td_targets: Computed TD targets r + gamma * V(s'),
shape (n_demons,). NaN for inactive demons.
per_demon_metrics: Per-demon metrics, shape (n_demons, 3).
Columns: [squared_error, raw_error, mean_step_size].
NaN for inactive demons.
trunk_bounding_metric: Scalar trunk bounding metric
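Given the column layout above, per-demon summaries can skip inactive demons with NaN-aware reductions. A small illustrative sketch (not library code):

```python
import numpy as np

# columns: [squared_error, raw_error, mean_step_size]
per_demon_metrics = np.array([
    [0.25, -0.5, 0.01],         # active demon
    [np.nan, np.nan, np.nan],   # inactive demon (all-NaN row)
    [1.00,  1.0, 0.01],         # active demon
])

# nanmean averages over active demons only
mean_squared_error = np.nanmean(per_demon_metrics[:, 0])  # (0.25 + 1.0) / 2
```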
HordeLearningResult
Result from a Horde scan-based learning loop.
Attributes:
state: Final multi-head MLP learner state
per_demon_metrics: Per-demon metrics over time,
shape (num_steps, n_demons, 3)
td_errors: TD errors over time, shape (num_steps, n_demons)
BatchedHordeResult
Result from batched Horde learning loop.
Attributes:
states: Batched multi-head MLP learner states
per_demon_metrics: Per-demon metrics,
shape (n_seeds, num_steps, n_demons, 3)
td_errors: TD errors, shape (n_seeds, num_steps, n_demons)
HordeLearner(horde_spec, hidden_sizes=(128, 128), optimizer=None, step_size=1.0, bounder=None, normalizer=None, sparsity=0.9, leaky_relu_slope=0.01, use_layer_norm=True, head_optimizer=None)
Horde: GVF demons sharing a trunk (Sutton et al. 2011).
Wraps MultiHeadMLPLearner. Adds:
- Per-demon gamma/lambda from HordeSpec
- TD target computation for temporal demons (gamma > 0)
- GVF metadata
The trunk uses gamma=0, lamda=0 (no temporal trace decay on shared
features). Each head uses its own gamma * lambda product for
trace decay, set via per_head_gamma_lamda on the inner learner.
For all-gamma=0 Hordes (e.g. rlsecd's 5 prediction heads), this
produces identical results to MultiHeadMLPLearner since the
TD target reduces to just the cumulant.
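The all-gamma=0 reduction follows directly from the target formula. A minimal NumPy sketch (illustrative only, not the library's implementation):

```python
import numpy as np

gammas = np.array([0.0, 0.9])     # a myopic demon and a temporal demon
cumulants = np.array([1.0, 1.0])  # per-demon pseudo-rewards
v_next = np.array([5.0, 5.0])     # bootstrap predictions V(s')

# TD target: r + gamma * V(s')
targets = cumulants + gammas * v_next
# gamma = 0   -> target is just the cumulant (1.0)
# gamma = 0.9 -> target bootstraps: 1.0 + 0.9 * 5.0 = 5.5
```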
Single-Step (Daemon) Usage
Both predict() and update() work with single unbatched
observations (1D arrays). JIT-compiled automatically.
Attributes:
    horde_spec: The HordeSpec defining all demons
    n_demons: Number of demons (heads)
Args:
    horde_spec: Specification of all GVF demons
    hidden_sizes: Tuple of hidden layer sizes (default: two layers of 128)
    optimizer: Optimizer for weight updates. Defaults to LMS(step_size).
    step_size: Base learning rate (used only when optimizer is None)
    bounder: Optional update bounder (e.g. ObGDBounding)
    normalizer: Optional feature normalizer
    sparsity: Fraction of weights zeroed out per neuron (default: 0.9)
    leaky_relu_slope: Negative slope for LeakyReLU (default: 0.01)
    use_layer_norm: Whether to apply parameterless layer normalization
    head_optimizer: Optional separate optimizer for heads
Source code in src/alberta_framework/core/horde.py
horde_spec
property
The HordeSpec defining all demons.
n_demons
property
Number of demons (heads).
learner
property
The underlying MultiHeadMLPLearner.
to_config()
Serialize learner configuration to dict.
Returns: Dict with horde_spec and all MultiHeadMLPLearner constructor args.
from_config(config)
classmethod
Reconstruct from config dict.
Args:
config: Dict as produced by to_config()
Returns: Reconstructed HordeLearner
init(feature_dim, key)
Initialize Horde learner state.
Args:
    feature_dim: Dimension of the input feature vector
    key: JAX random key for weight initialization
Returns: Initial MultiHeadMLPState
predict(state, observation)
Compute predictions from all demons.
Args:
    state: Current learner state
    observation: Input feature vector
Returns:
Array of shape (n_demons,) with one prediction per demon
update(state, observation, cumulants, next_observation)
Update Horde given observation, cumulants, and next observation.
Computes TD targets r + gamma * V(s') for each demon, then
delegates to MultiHeadMLPLearner.update(). For gamma=0 demons,
the target equals the cumulant.
Args:
state: Current state
observation: Input feature vector, shape (feature_dim,)
cumulants: Per-demon pseudo-rewards, shape (n_demons,).
NaN = inactive demon.
next_observation: Next feature vector, shape (feature_dim,).
Used for V(s') bootstrapping. For all-gamma=0 Hordes,
this is required but doesn't affect results.
Returns:
    HordeUpdateResult with updated state, predictions, TD errors,
    TD targets, and per-demon metrics
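The NaN convention for inactive demons propagates naturally through the target arithmetic. A hypothetical helper (not the library's code) showing the behaviour documented above:

```python
import numpy as np

def td_errors(predictions, cumulants, gammas, v_next):
    """Per-demon TD errors; NaN cumulants yield NaN errors."""
    targets = cumulants + gammas * v_next   # NaN cumulant -> NaN target
    return targets - predictions            # NaN propagates per demon

preds = np.array([0.5, 0.5, 0.5])
cums = np.array([1.0, np.nan, 2.0])         # middle demon inactive
gammas = np.array([0.0, 0.9, 0.9])
v_next = np.array([3.0, 3.0, 3.0])

deltas = td_errors(preds, cums, gammas, v_next)
# active demons get real errors; the inactive one stays NaN
```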
run_horde_learning_loop(horde, state, observations, cumulants, next_observations)
Run Horde learning loop using jax.lax.scan.
Scans over (obs, cumulants, next_obs) triples.
Args:
horde: Horde learner
state: Initial learner state
observations: Input observations, shape (num_steps, feature_dim)
cumulants: Per-demon cumulants, shape (num_steps, n_demons).
NaN = inactive demon for that step.
next_observations: Next observations, shape (num_steps, feature_dim)
Returns: HordeLearningResult with final state, per-demon metrics, and TD errors
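The loop's shape can be illustrated with jax.lax.scan directly. This sketch uses a toy stand-in update (the real step delegates to HordeLearner.update); what matters is the scan pattern: the state threads through (obs, cumulants, next_obs) triples, and per-step outputs stack along a leading (num_steps,) axis as in HordeLearningResult.td_errors.

```python
import jax
import jax.numpy as jnp

def step(state, inputs):
    """Toy scan body: state is a stand-in for learner weights."""
    obs, cumulants, next_obs = inputs
    new_state = state + cumulants        # stand-in weight update
    td = cumulants - state               # stand-in per-demon TD errors
    return new_state, td                 # (carry, per-step output)

num_steps, feature_dim, n_demons = 4, 3, 2
obs = jnp.zeros((num_steps, feature_dim))
cums = jnp.ones((num_steps, n_demons))
next_obs = jnp.zeros((num_steps, feature_dim))

final_state, td = jax.lax.scan(step, jnp.zeros(n_demons),
                               (obs, cums, next_obs))
# td has shape (num_steps, n_demons); final_state is the last carry
```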
run_horde_learning_loop_batched(horde, observations, cumulants, next_observations, keys)
Run Horde learning loop across seeds using jax.vmap.
Each seed produces an independently initialized state. All seeds share the same observations, cumulants, and next observations.
Args:
horde: Horde learner
observations: Shared observations, shape (num_steps, feature_dim)
cumulants: Shared cumulants, shape (num_steps, n_demons)
next_observations: Shared next observations,
shape (num_steps, feature_dim)
keys: JAX random keys, shape (n_seeds,) or (n_seeds, 2)
Returns: BatchedHordeResult with batched states, per-demon metrics, and TD errors
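The batching pattern can be sketched with jax.vmap: map over the key axis only (in_axes=(0, None)) so each seed gets its own initialization while the data arrays are broadcast. Here run_one_seed is a hypothetical stand-in for the per-seed learning loop.

```python
import jax
import jax.numpy as jnp

def run_one_seed(key, observations):
    """Toy per-seed run: independent init, shared observations."""
    init = jax.random.normal(key, (observations.shape[1],))  # per-seed init
    return observations @ init                               # toy predictions

n_seeds, num_steps, feature_dim = 3, 5, 4
keys = jax.random.split(jax.random.PRNGKey(0), n_seeds)
obs = jnp.ones((num_steps, feature_dim))

# in_axes=(0, None): vectorize over seeds, broadcast shared data
batched = jax.vmap(run_one_seed, in_axes=(0, None))(keys, obs)
# batched gains a leading (n_seeds,) axis, like BatchedHordeResult fields
```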