optimizers
¶
Optimizers for continual learning.
Implements LMS (fixed step-size baseline), IDBD (meta-learned step-sizes), Autostep (tuning-free step-size adaptation), and ObGD (observation-bounded) for the Alberta Plan.
Also provides the Bounder ABC for decoupled update bounding (e.g. ObGDBounding).
References:
- Sutton, R.S. (1992). "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta"
- Mahmood, A.R. et al. (2012). "Tuning-free step-size adaptation"
- Elsayed et al. (2024). "Streaming Deep Reinforcement Learning Finally Works"
Bounder
¶
Bases: ABC
Base class for update bounding strategies.
A bounder takes the proposed per-parameter step arrays from an optimizer and optionally scales them down to prevent overshooting.
to_config()
abstractmethod
¶
bound(steps, error, params)
abstractmethod
¶
Bound proposed update steps.
Args:
- steps: Per-parameter step arrays from the optimizer
- error: Prediction error scalar
- params: Current parameter values (needed by some bounders like AGC)
Returns:
(bounded_steps, metric) where metric is a scalar for reporting
(e.g., scale factor for ObGD, mean clip ratio for AGC)
Source code in src/alberta_framework/core/optimizers.py
ObGDBounding(kappa=2.0)
¶
Bases: Bounder
ObGD-style global update bounding (Elsayed et al. 2024).
Computes a global bounding factor from the L1 norm of all proposed steps and the error magnitude, then uniformly scales all steps down if the combined update would be too large.
For LMS with a single scalar step-size alpha, total_step = alpha * z_sum, giving M = alpha * kappa * max(|error|, 1) * z_sum, identical to the original Elsayed et al. 2024 formula.
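As a sketch, the global bound reduces to a few lines of NumPy. The function name and the list-of-arrays interface below are illustrative assumptions, not the library's actual signature:

```python
import numpy as np

def obgd_bound(steps, error, kappa=2.0):
    """Illustrative ObGD-style global bounding (names are ours, not the library's)."""
    # Global bound: kappa * max(|error|, 1) * L1 norm of all proposed steps
    l1 = sum(np.abs(s).sum() for s in steps)
    m = kappa * max(abs(error), 1.0) * l1
    # Uniformly shrink all steps only when the bound exceeds 1
    scale = min(1.0, 1.0 / m) if m > 0 else 1.0
    return [s * scale for s in steps], scale

steps, scale = obgd_bound([np.array([0.5, -0.5])], error=2.0)
# l1 = 1.0, m = 2 * 2 * 1 = 4, so every step is scaled by 0.25
```

Because the scale is shared across all parameter arrays, the direction of the combined update is preserved; only its magnitude shrinks.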
Attributes: kappa: Bounding sensitivity parameter (higher = more conservative)
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
bound(steps, error, params)
¶
Bound proposed steps using ObGD formula.
Args:
- steps: Per-parameter step arrays
- error: Prediction error scalar
- params: Current parameter values (unused by ObGD)
Returns:
(bounded_steps, scale) where scale is the bounding factor
Source code in src/alberta_framework/core/optimizers.py
AGCBounding(clip_factor=0.01, eps=0.001)
¶
Bases: Bounder
Adaptive Gradient Clipping (Brock et al. 2021).
Clips per output unit, based on the ratio of gradient norm to weight norm: units where ||grad|| / max(||weight||, eps) > clip_factor are scaled down to respect the constraint.
Unlike ObGDBounding which applies a single global scale factor, AGC applies fine-grained, per-unit clipping that adapts to each layer's weight magnitude.
The metric returned is the fraction of units that were clipped (0.0 = no clipping, 1.0 = all units clipped).
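A minimal sketch of this per-unit rule, assuming each row of a weight matrix is one output unit (the helper name is ours, not the library's):

```python
import numpy as np

def agc_bound(step, error, param, clip_factor=0.01, eps=1e-3):
    """Illustrative per-unit adaptive gradient clipping (hypothetical helper)."""
    # Per-unit norms: the proposed update is |error| * step per unit (row)
    g_norm = np.abs(error) * np.linalg.norm(step, axis=-1, keepdims=True)
    w_norm = np.maximum(np.linalg.norm(param, axis=-1, keepdims=True), eps)
    max_norm = clip_factor * w_norm
    clip = g_norm > max_norm
    # Scale down only the units exceeding the allowed gradient-to-weight ratio
    scale = np.where(clip, max_norm / np.maximum(g_norm, 1e-12), 1.0)
    return step * scale, float(clip.mean())

param = np.ones((2, 4))                       # row norms = 2.0
step = np.array([[0.005, 0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0, 0.0]])
clipped, frac = agc_bound(step, error=1.0, param=param)
# allowed per-row norm is 0.01 * 2.0 = 0.02: only the second row is clipped
```

Note how the first row passes through unchanged while the second is rescaled, which is exactly the fine-grained behavior that distinguishes AGC from a single global factor.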
Reference: Brock, A., De, S., Smith, S.L., & Simonyan, K. (2021). "High-Performance Large-Scale Image Recognition Without Normalization" (arXiv: 2102.06171)
Attributes:
- clip_factor: Maximum allowed gradient-to-weight ratio (lambda). Default 0.01.
- eps: Floor for weight norm to avoid division by zero. Default 1e-3.
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
bound(steps, error, params)
¶
Bound proposed steps using per-unit adaptive gradient clipping.
For each parameter/step pair, computes unit-wise norms and clips
units where |error| * ||step|| > clip_factor * max(||param||, eps).
Args:
- steps: Per-parameter step arrays from the optimizer
- error: Prediction error scalar
- params: Current parameter values (used for weight norms)
Returns:
(clipped_steps, frac_clipped) where frac_clipped is the
fraction of units that were clipped
Source code in src/alberta_framework/core/optimizers.py
OptimizerUpdate
¶
Result of an optimizer update step.
Attributes:
- weight_delta: Change to apply to weights
- bias_delta: Change to apply to bias
- new_state: Updated optimizer state
- metrics: Dictionary of metrics for logging (values are JAX arrays for scan compatibility)
Optimizer
¶
Bases: ABC
Base class for optimizers.
to_config()
abstractmethod
¶
init(feature_dim)
abstractmethod
¶
Initialize optimizer state.
Args: feature_dim: Dimension of weight vector
Returns: Initial optimizer state
update(state, error, observation)
abstractmethod
¶
Compute weight updates given prediction error.
Args:
- state: Current optimizer state
- error: Prediction error (target - prediction)
- observation: Current observation/feature vector
Returns: OptimizerUpdate with deltas and new state
Source code in src/alberta_framework/core/optimizers.py
init_for_shape(shape)
¶
Initialize optimizer state for parameters of arbitrary shape.
Used by MLP learners where parameters are matrices/vectors of varying shapes. Not all optimizers support this.
The return type varies by subclass (e.g. LMSState for LMS,
AutostepParamState for Autostep) so the base signature uses
Any.
Args: shape: Shape of the parameter array
Returns: Initial optimizer state with arrays matching the given shape
Raises: NotImplementedError: If the optimizer does not support this
Source code in src/alberta_framework/core/optimizers.py
update_from_gradient(state, gradient, error=None)
¶
Compute step delta from pre-computed gradient.
The returned delta does NOT include the error -- the caller is
responsible for multiplying error * delta before applying.
The state type varies by subclass (e.g. LMSState for LMS,
AutostepParamState for Autostep) so the base signature uses
Any.
Args:
- state: Current optimizer state
- gradient: Pre-computed gradient (e.g. eligibility trace)
- error: Optional prediction error scalar. Optimizers with meta-learning (e.g. Autostep) use this for meta-gradient computation. LMS ignores it.
Returns:
(step, new_state) where step has the same shape as gradient
Raises: NotImplementedError: If the optimizer does not support this
Source code in src/alberta_framework/core/optimizers.py
LMS(step_size=0.01)
¶
Least Mean Square optimizer with fixed step-size.
The simplest gradient-based optimizer: w_{t+1} = w_t + alpha * delta * x_t
This serves as a baseline. The challenge is that the optimal step-size depends on the problem and changes as the task becomes non-stationary.
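As a hedged illustration of this baseline rule (the helper below is ours, not the framework's API):

```python
import numpy as np

def lms_update(w, b, x, target, alpha=0.01):
    """Minimal LMS sketch: delta_w = alpha * error * x, delta_b = alpha * error."""
    error = target - (w @ x + b)
    return w + alpha * error * x, b + alpha * error, error

w, b = np.zeros(3), 0.0
x = np.array([1.0, 0.0, 2.0])
w, b, err = lms_update(w, b, x, target=1.0, alpha=0.1)
# error = 1.0, so w becomes [0.1, 0.0, 0.2] and b becomes 0.1
```

The fixed alpha here is precisely what the adaptive optimizers below replace with per-weight, meta-learned step-sizes.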
Attributes: step_size: Fixed learning rate alpha
Args: step_size: Fixed learning rate
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
init(feature_dim)
¶
Initialize LMS state.
Args: feature_dim: Dimension of weight vector (unused for LMS)
Returns: LMS state containing the step-size
Source code in src/alberta_framework/core/optimizers.py
init_for_shape(shape)
¶
Initialize LMS state for arbitrary-shape parameters.
LMS state is shape-independent (single scalar), so this returns the same state regardless of shape.
Source code in src/alberta_framework/core/optimizers.py
update_from_gradient(state, gradient, error=None)
¶
Compute step from gradient: step = alpha * gradient.
Args:
- state: Current LMS state
- gradient: Pre-computed gradient (any shape)
- error: Unused by LMS (accepted for interface compatibility)
Returns:
(step, state) -- state is unchanged for LMS
Source code in src/alberta_framework/core/optimizers.py
update(state, error, observation)
¶
Compute LMS weight update.
Update rule: delta_w = alpha * error * x
Args:
- state: Current LMS state
- error: Prediction error (scalar)
- observation: Feature vector
Returns: OptimizerUpdate with weight and bias deltas
Source code in src/alberta_framework/core/optimizers.py
IDBD(initial_step_size=0.01, meta_step_size=0.01, h_decay_mode='prediction_grads')
¶
Incremental Delta-Bar-Delta optimizer.
IDBD maintains per-weight adaptive step-sizes that are meta-learned based on gradient correlation. When successive gradients agree in sign, the step-size for that weight increases. When they disagree, it decreases.
This implements Sutton's 1992 algorithm for adapting step-sizes online without requiring manual tuning.
Reference: Sutton, R.S. (1992). "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta"
Attributes:
- initial_step_size: Initial per-weight step-size
- meta_step_size: Meta learning rate beta for adapting step-sizes
Args:
initial_step_size: Initial value for per-weight step-sizes
meta_step_size: Meta learning rate beta for adapting step-sizes
h_decay_mode: Mode for computing the h-decay term in the MLP path.
- "prediction_grads": h_decay = z^2 (squared prediction gradients). This is the principled generalization: for linear models, z = x so z^2 = x^2, recovering Sutton 1992.
- "loss_grads": h_decay = (error * z)^2 (Fisher approximation of the Hessian diagonal).
Only affects the MLP path (update_from_gradient); the linear update() method always uses x^2.
Raises:
ValueError: If h_decay_mode is not one of the valid modes
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
Serialize configuration to dict.
Source code in src/alberta_framework/core/optimizers.py
init(feature_dim)
¶
Initialize IDBD state.
Args: feature_dim: Dimension of weight vector
Returns: IDBD state with per-weight step-sizes and traces
Source code in src/alberta_framework/core/optimizers.py
init_for_shape(shape)
¶
Initialize IDBD state for arbitrary-shape parameters.
Args: shape: Shape of the parameter array
Returns: IDBDParamState with arrays matching the given shape
Source code in src/alberta_framework/core/optimizers.py
update_from_gradient(state, gradient, error=None)
¶
Compute IDBD update from pre-computed gradient (MLP path).
Implements Meyer's adaptation of IDBD to nonlinear models. The key
insight: replace x^2 in the h-decay term with (dy/dw)^2
(squared prediction gradients), which generalizes IDBD to arbitrary
architectures.
This follows Meyer's implementation, which differs from the linear IDBD (Sutton 1992) in two ways to better handle deep networks:
- The meta-update uses z * h (prediction gradient times trace) without the current error, rather than error * z * h.
- The h-trace accumulates loss gradients (-error * z) rather than error-scaled prediction gradients (error * z).

These changes address problems with IDBD in deep networks, where the step-size being factored into both the h and beta updates causes compounding effects.
Reference: Meyer, https://github.com/ejmejm/phd_research
Operation order (meta-update first, then new alpha for trace):
- Compute h_decay based on mode: z^2 or (error * z)^2
- Meta-update with OLD traces: log_alpha += beta * z * h
- Clip log step-sizes to [-10.0, 2.0]
- New step-sizes: alpha = exp(log_alpha)
- Step: alpha * z (error applied externally by caller)
- Trace update: h = h * max(0, 1 - alpha * h_decay) + alpha * g where g = -error * z (loss gradient direction)
When error is None (trunk path in multi-head), the gradient
is already in loss gradient direction (accumulated cotangents),
so the trace uses alpha * z directly.
Args:
- state: Current IDBD param state
- gradient: Pre-computed prediction gradient / eligibility trace (same shape as state arrays)
- error: Optional prediction error scalar. When provided, used for h_decay (loss_grads mode) and h-trace sign.
Returns:
(step, new_state) where step has the same shape as gradient
Source code in src/alberta_framework/core/optimizers.py
update(state, error, observation)
¶
Compute IDBD weight update with adaptive step-sizes.
Following Sutton 1992, Figure 2, the operation ordering is:
- Meta-update: log_alpha_i += beta * error * x_i * h_i (using OLD traces)
- Compute NEW step-sizes: alpha_i = exp(log_alpha_i)
- Update weights: w_i += alpha_i * error * x_i (using NEW alpha)
- Update traces: h_i = h_i * max(0, 1 - alpha_i * x_i^2) + alpha_i * error * x_i (using NEW alpha)
The trace h_i tracks the correlation between current and past gradients. When gradients consistently point the same direction, h_i grows, leading to larger step-sizes.
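The ordering above can be sketched in plain NumPy, keeping meta-update first (an illustrative helper under our reading of Sutton 1992, not the library's implementation):

```python
import numpy as np

def idbd_step(w, log_alpha, h, x, target, beta=0.01):
    """Illustrative linear IDBD step (Sutton 1992, Fig. 2 ordering)."""
    error = target - w @ x
    # Meta-update with OLD traces, then exponentiate for the NEW step-sizes
    log_alpha = log_alpha + beta * error * x * h
    alpha = np.exp(log_alpha)
    # Weight and trace updates both use the NEW step-sizes
    w = w + alpha * error * x
    h = h * np.maximum(0.0, 1.0 - alpha * x**2) + alpha * error * x
    return w, log_alpha, h

d = 2
w, log_alpha, h = np.zeros(d), np.full(d, np.log(0.05)), np.zeros(d)
x = np.array([1.0, 0.0])
for _ in range(200):
    w, log_alpha, h = idbd_step(w, log_alpha, h, x, target=2.0)
# on this constant problem w[0] converges toward the target 2.0,
# while the dead feature x[1] = 0 leaves w[1] untouched
```

Note that only the relevant weight adapts; irrelevant features (here x[1]) accumulate no trace and get no step-size growth, which is the point of per-weight meta-learning.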
Args:
- state: Current IDBD state
- error: Prediction error (scalar)
- observation: Feature vector
Returns: OptimizerUpdate with weight deltas and updated state
Source code in src/alberta_framework/core/optimizers.py
Autostep(initial_step_size=0.01, meta_step_size=0.01, tau=10000.0)
¶
Bases: Optimizer[AutostepState]
Autostep optimizer with tuning-free step-size adaptation.
Implements the exact algorithm from Mahmood et al. 2012, Table 1.
The algorithm maintains per-weight step-sizes that adapt based on
meta-gradient correlation. The key innovations are:
- Self-regulated normalizers (v_i) that track meta-gradient magnitude
|delta * x_i * h_i| for stable meta-updates
- Overshoot prevention via effective step-size normalization
M = max(sum(alpha_i * x_i^2), 1)
Per-sample update (Table 1):
- v_i = max(|delta*x_i*h_i|, v_i + (1/tau)*alpha_i*x_i^2*(|delta*x_i*h_i| - v_i))
- alpha_i *= exp(mu * delta*x_i*h_i / v_i) where v_i > 0
- M = max(sum(alpha_i * x_i^2), 1); alpha_i /= M
- w_i += alpha_i * delta * x_i (weight update with NEW alpha)
- h_i = h_i * (1 - alpha_i * x_i^2) + alpha_i * delta * x_i (trace update)
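A hedged NumPy sketch of one Table 1 step; the function name and state layout are our assumptions, not the library's API:

```python
import numpy as np

def autostep_step(w, alpha, h, v, x, target, mu=0.01, tau=1e4):
    """Illustrative Autostep step (after Mahmood et al. 2012, Table 1)."""
    delta = target - w @ x
    g = delta * x * h                         # meta-gradient per weight
    # Eq. 4: self-regulating normalizer tracks the meta-gradient magnitude
    v = np.maximum(np.abs(g), v + (1.0 / tau) * alpha * x**2 * (np.abs(g) - v))
    # Eq. 5: multiplicative step-size update, guarded where v == 0
    alpha = np.where(v > 0, alpha * np.exp(mu * g / np.where(v > 0, v, 1.0)), alpha)
    # Eq. 6-7: effective step-size normalization prevents overshooting
    m = max(np.sum(alpha * x**2), 1.0)
    alpha = alpha / m
    # Weight and trace updates with the NEW step-sizes
    w = w + alpha * delta * x
    h = h * (1.0 - alpha * x**2) + alpha * delta * x
    return w, alpha, h, v

w, alpha, h, v = np.zeros(2), np.full(2, 0.05), np.zeros(2), np.zeros(2)
x = np.array([1.0, 0.0])
for _ in range(200):
    w, alpha, h, v = autostep_step(w, alpha, h, v, x, target=1.0)
# w[0] converges toward the target 1.0 on this constant problem
```

The normalizer v keeps the exponent mu * g / v bounded near ±mu, which is what makes the method relatively insensitive to the choice of mu.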
Reference: Mahmood, A.R., Sutton, R.S., Degris, T., & Pilarski, P.M. (2012). "Tuning-free step-size adaptation"
Attributes:
- initial_step_size: Initial per-weight step-size
- meta_step_size: Meta learning rate mu for adapting step-sizes
- tau: Time constant for normalizer adaptation (default: 10000)
Args:
- initial_step_size: Initial value for per-weight step-sizes
- meta_step_size: Meta learning rate for adapting step-sizes
- tau: Time constant for normalizer adaptation (default: 10000). Higher values mean slower normalizer decay.
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
Serialize configuration to dict.
init(feature_dim)
¶
Initialize Autostep state.
Normalizers (v_i) and traces (h_i) are initialized to 0 per the paper.
Args: feature_dim: Dimension of weight vector
Returns: Autostep state with per-weight step-sizes, traces, and normalizers
Source code in src/alberta_framework/core/optimizers.py
init_for_shape(shape)
¶
Initialize Autostep state for arbitrary-shape parameters.
Args: shape: Shape of the parameter array
Returns: AutostepParamState with arrays matching the given shape
Source code in src/alberta_framework/core/optimizers.py
update_from_gradient(state, gradient, error=None)
¶
Compute Autostep update from pre-computed gradient (MLP path).
Implements the Table 1 algorithm generalized for arbitrary-shape
parameters, where gradient plays the role of the eligibility
trace z (prediction gradient).
When error is provided, the full paper algorithm is used: the meta-gradient is error * z * h. When error is None, it falls back to an error-free approximation (z * h).
The returned step does NOT include the error -- the caller applies
param += error * step after optional bounding.
Args:
- state: Current Autostep param state
- gradient: Pre-computed gradient / eligibility trace (same shape as state arrays)
- error: Optional prediction error scalar. When provided, enables the full paper algorithm with error-scaled meta-gradients.
Returns:
(step, new_state) where step has the same shape as gradient
Source code in src/alberta_framework/core/optimizers.py
update(state, error, observation)
¶
Compute Autostep weight update following Mahmood et al. 2012, Table 1.
The algorithm per sample:
- Eq. 4: v_i = max(|δ*x_i*h_i|, v_i + (1/τ)*α_i*x_i²*(|δ*x_i*h_i| - v_i))
- Eq. 5: α_i *= exp(μ * δ*x_i*h_i / v_i) where v_i > 0
- Eq. 6-7: M = max(Σ α_i*x_i² + α_bias, 1); α_i /= M, α_bias /= M
- Weight update: w_i += α_i * δ * x_i (with NEW alpha)
- Trace update: h_i = h_i*(1 - α_i*x_i²) + α_i*δ*x_i
Args:
- state: Current Autostep state
- error: Prediction error (scalar)
- observation: Feature vector
Returns: OptimizerUpdate with weight deltas and updated state
Source code in src/alberta_framework/core/optimizers.py
ObGD(step_size=1.0, kappa=2.0, gamma=0.0, lamda=0.0)
¶
Observation-bounded Gradient Descent optimizer.
ObGD prevents overshooting by dynamically bounding the effective step-size based on the magnitude of the prediction error and eligibility traces. When the combined update magnitude would be too large, the step-size is scaled down to prevent the prediction from overshooting the target.
This is the deep-network generalization of Autostep's overshooting prevention, designed for streaming reinforcement learning.
For supervised learning (gamma=0, lamda=0), traces equal the current observation each step, making ObGD equivalent to LMS with dynamic step-size bounding.
The ObGD algorithm:
- Update traces: z = gamma * lamda * z + observation
- Compute bound: M = alpha * kappa * max(|error|, 1) * (||z_w||_1 + |z_b|)
- Effective step: alpha_eff = min(alpha, alpha / M) (i.e. alpha / max(M, 1))
- Weight delta: delta_w = alpha_eff * error * z_w
- Bias delta: delta_b = alpha_eff * error * z_b
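The supervised case (gamma = lamda = 0) can be sketched as follows; the helper is illustrative, not the framework's API:

```python
import numpy as np

def obgd_step(w, b, z_w, z_b, x, target, alpha=1.0, kappa=2.0, gamma=0.0, lamda=0.0):
    """Illustrative ObGD step (after Elsayed et al. 2024), supervised setting."""
    error = target - (w @ x + b)
    # Trace update; with gamma = lamda = 0 the trace is just the observation
    z_w = gamma * lamda * z_w + x
    z_b = gamma * lamda * z_b + 1.0
    # Global bound on the effective step-size from error and trace magnitudes
    m = alpha * kappa * max(abs(error), 1.0) * (np.abs(z_w).sum() + abs(z_b))
    alpha_eff = alpha / max(m, 1.0)
    return w + alpha_eff * error * z_w, b + alpha_eff * error * z_b, z_w, z_b

w, b = np.zeros(2), 0.0
x = np.array([1.0, 1.0])
w, b, z_w, z_b = obgd_step(w, b, np.zeros(2), 0.0, x, target=10.0)
# error = 10 gives M = 1 * 2 * 10 * 3 = 60, so alpha_eff = 1/60:
# the large error is bounded and the prediction cannot overshoot the target
```

After this single step the prediction moves to 0.5, well short of the target 10, illustrating how the bound trades speed for stability on large errors.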
Reference: Elsayed et al. 2024, "Streaming Deep Reinforcement Learning Finally Works"
Attributes:
- step_size: Base learning rate alpha
- kappa: Bounding sensitivity parameter (higher = more conservative)
- gamma: Discount factor for trace decay (0 for supervised learning)
- lamda: Eligibility trace decay parameter (0 for supervised learning)

Args:
- step_size: Base learning rate (default: 1.0)
- kappa: Bounding sensitivity parameter (default: 2.0)
- gamma: Discount factor for trace decay (default: 0.0 for supervised)
- lamda: Eligibility trace decay parameter (default: 0.0 for supervised)
Source code in src/alberta_framework/core/optimizers.py
to_config()
¶
Serialize configuration to dict.
Source code in src/alberta_framework/core/optimizers.py
init(feature_dim)
¶
Initialize ObGD state.
Args: feature_dim: Dimension of weight vector
Returns: ObGD state with eligibility traces
Source code in src/alberta_framework/core/optimizers.py
update(state, error, observation)
¶
Compute ObGD weight update with overshooting prevention.
The bounding mechanism scales down the step-size when the combined effect of error magnitude, trace norm, and step-size would cause the prediction to overshoot the target.
Args:
- state: Current ObGD state
- error: Prediction error (target - prediction)
- observation: Current observation/feature vector
Returns: OptimizerUpdate with bounded weight deltas and updated state
Source code in src/alberta_framework/core/optimizers.py
init_for_shape(shape)
¶
Initialize optimizer state for parameters of arbitrary shape.
Used by MLP learners where parameters are matrices/vectors of varying shapes. Not all optimizers support this.
The return type varies by subclass (e.g. LMSState for LMS,
AutostepParamState for Autostep) so the base signature uses
Any.
Args: shape: Shape of the parameter array
Returns: Initial optimizer state with arrays matching the given shape
Raises: NotImplementedError: If the optimizer does not support this
Source code in src/alberta_framework/core/optimizers.py
update_from_gradient(state, gradient, error=None)
¶
Compute step delta from pre-computed gradient.
The returned delta does NOT include the error -- the caller is
responsible for multiplying error * delta before applying.
The state type varies by subclass (e.g. LMSState for LMS,
AutostepParamState for Autostep) so the base signature uses
Any.
Args:
- state: Current optimizer state
- gradient: Pre-computed gradient (e.g. eligibility trace)
- error: Optional prediction error scalar. Optimizers with meta-learning (e.g. Autostep) use this for meta-gradient computation. LMS ignores it.
Returns:
(step, new_state) where step has the same shape as gradient
Raises: NotImplementedError: If the optimizer does not support this
Source code in src/alberta_framework/core/optimizers.py
TDOptimizerUpdate
¶
Result of a TD optimizer update step.
Attributes:
- weight_delta: Change to apply to weights
- bias_delta: Change to apply to bias
- new_state: Updated optimizer state
- metrics: Dictionary of metrics for logging
TDOptimizer
¶
Bases: ABC
Base class for TD optimizers.
TD optimizers handle temporal-difference learning with eligibility traces. They take TD error and both current and next observations as input.
init(feature_dim)
abstractmethod
¶
Initialize optimizer state.
Args: feature_dim: Dimension of weight vector
Returns: Initial optimizer state
update(state, td_error, observation, next_observation, gamma)
abstractmethod
¶
Compute weight updates given TD error.
Args:
- state: Current optimizer state
- td_error: TD error delta = R + gamma*V(s') - V(s)
- observation: Current observation phi(s)
- next_observation: Next observation phi(s')
- gamma: Discount factor gamma (0 at terminal)
Returns: TDOptimizerUpdate with deltas and new state
Source code in src/alberta_framework/core/optimizers.py
TDIDBD(initial_step_size=0.01, meta_step_size=0.01, trace_decay=0.0, use_semi_gradient=True)
¶
Bases: TDOptimizer[TDIDBDState]
TD-IDBD optimizer for temporal-difference learning.
Extends IDBD to TD learning with eligibility traces. Maintains per-weight adaptive step-sizes that are meta-learned based on gradient correlation in the TD setting.
Two variants are supported:
- Semi-gradient (default): uses only phi(s) in the meta-update; more stable
- Ordinary gradient: uses both phi(s) and phi(s'); more accurate but sensitive
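As a rough sketch of the semi-gradient variant under our reading of Kearney et al. 2019 (the helper and exact operation order are our assumptions; the library may differ):

```python
import numpy as np

def tidbd_step(w, beta, h, z, x, x_next, r, gamma=0.0, lam=0.0, theta=0.01):
    """Hedged sketch of semi-gradient TIDBD(lambda)."""
    delta = r + gamma * (w @ x_next) - w @ x
    # Semi-gradient meta-update: only phi(s) appears, using OLD traces
    beta = beta + theta * delta * x * h
    alpha = np.exp(beta)
    # Accumulating eligibility trace, then weight and h-trace updates
    z = gamma * lam * z + x
    w = w + alpha * delta * z
    h = h * np.maximum(0.0, 1.0 - alpha * x * z) + alpha * delta * z
    return w, beta, h, z

w, beta, h, z = np.zeros(2), np.full(2, np.log(0.05)), np.zeros(2), np.zeros(2)
x = np.array([1.0, 0.0])
for _ in range(200):
    w, beta, h, z = tidbd_step(w, beta, h, z, x, x, r=2.0)
# with gamma = 0 and lam = 0 this reduces to supervised IDBD,
# so w[0] converges toward r = 2.0
```

With gamma > 0 the only structural change is that delta bootstraps off phi(s') and the trace z decays rather than being reset to the observation.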
Reference: Kearney et al. 2019, "Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning"
Attributes:
- initial_step_size: Initial per-weight step-size
- meta_step_size: Meta learning rate theta
- trace_decay: Eligibility trace decay lambda
- use_semi_gradient: If True, use semi-gradient variant (default)

Args:
- initial_step_size: Initial value for per-weight step-sizes
- meta_step_size: Meta learning rate theta for adapting step-sizes
- trace_decay: Eligibility trace decay lambda (0 = TD(0))
- use_semi_gradient: If True, use semi-gradient variant (recommended)
Source code in src/alberta_framework/core/optimizers.py
init(feature_dim)
¶
Initialize TD-IDBD state.
Args: feature_dim: Dimension of weight vector
Returns: TD-IDBD state with per-weight step-sizes, traces, and h traces
Source code in src/alberta_framework/core/optimizers.py
update(state, td_error, observation, next_observation, gamma)
¶
Compute TD-IDBD weight update with adaptive step-sizes.
Implements Algorithm 3 (semi-gradient) or Algorithm 4 (ordinary gradient) from Kearney et al. 2019.
Args:
- state: Current TD-IDBD state
- td_error: TD error delta = R + gamma*V(s') - V(s)
- observation: Current observation phi(s)
- next_observation: Next observation phi(s')
- gamma: Discount factor gamma (0 at terminal)
Returns: TDOptimizerUpdate with weight deltas and updated state
Source code in src/alberta_framework/core/optimizers.py
AutoTDIDBD(initial_step_size=0.01, meta_step_size=0.01, trace_decay=0.0, normalizer_decay=10000.0)
¶
Bases: TDOptimizer[AutoTDIDBDState]
AutoStep-style normalized TD-IDBD optimizer.
Adds AutoStep-style normalization to TDIDBD for improved stability and reduced sensitivity to the meta step-size theta.
Reference: Kearney et al. 2019, Algorithm 6 "AutoStep Style Normalized TIDBD(lambda)"
Attributes:
- initial_step_size: Initial per-weight step-size
- meta_step_size: Meta learning rate theta
- trace_decay: Eligibility trace decay lambda
- normalizer_decay: Decay parameter tau for normalizers

Args:
- initial_step_size: Initial value for per-weight step-sizes
- meta_step_size: Meta learning rate theta for adapting step-sizes
- trace_decay: Eligibility trace decay lambda (0 = TD(0))
- normalizer_decay: Decay parameter tau for normalizers (default: 10000)
Source code in src/alberta_framework/core/optimizers.py
init(feature_dim)
¶
Initialize AutoTDIDBD state.
Args: feature_dim: Dimension of weight vector
Returns: AutoTDIDBD state with per-weight step-sizes, traces, h traces, and normalizers
Source code in src/alberta_framework/core/optimizers.py
update(state, td_error, observation, next_observation, gamma)
¶
Compute AutoTDIDBD weight update with normalized adaptive step-sizes.
Implements Algorithm 6 from Kearney et al. 2019.
Args:
- state: Current AutoTDIDBD state
- td_error: TD error delta = R + gamma*V(s') - V(s)
- observation: Current observation phi(s)
- next_observation: Next observation phi(s')
- gamma: Discount factor gamma (0 at terminal)
Returns: TDOptimizerUpdate with weight deltas and updated state
Source code in src/alberta_framework/core/optimizers.py
optimizer_from_config(config)
¶
Reconstruct an optimizer from a config dict.
Args:
config: Dict with "type" key and constructor kwargs
Returns: Reconstructed optimizer instance
Raises: ValueError: If the optimizer type is unknown
Source code in src/alberta_framework/core/optimizers.py
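The config round-trip presumably looks something like the following sketch; the stand-in LMS class and registry here are ours, and only the "type"-keyed dict convention comes from this page:

```python
class LMS:
    """Minimal stand-in following the to_config() convention described above."""
    def __init__(self, step_size=0.01):
        self.step_size = step_size

    def to_config(self):
        return {"type": "LMS", "step_size": self.step_size}

# Hypothetical registry mapping "type" strings to constructors
_REGISTRY = {"LMS": LMS}

def optimizer_from_config(config):
    """Illustrative dispatcher: look up the class by its "type" key."""
    if config["type"] not in _REGISTRY:
        raise ValueError(f"unknown optimizer type: {config['type']!r}")
    # Remaining keys are passed through as constructor kwargs
    kwargs = {k: v for k, v in config.items() if k != "type"}
    return _REGISTRY[config["type"]](**kwargs)

opt = optimizer_from_config({"type": "LMS", "step_size": 0.5})
# round-trips: opt.to_config() == {"type": "LMS", "step_size": 0.5}
```

This pattern keeps serialized experiment configs self-describing, so a saved run can reconstruct its optimizer without pickling code objects.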
bounder_from_config(config)
¶
Reconstruct a bounder from a config dict.
Args:
config: Dict with "type" key and constructor kwargs
Returns: Reconstructed bounder instance
Raises: ValueError: If the bounder type is unknown