experiments
Multi-seed experiment runner for publication-quality analysis.
Provides infrastructure for running experiments across multiple seeds with optional parallelization and aggregation of results.
ExperimentConfig
Bases: NamedTuple
Configuration for a single experiment.
Attributes:
    name: Human-readable name for this configuration
    learner_factory: Callable that returns a fresh learner instance
    stream_factory: Callable that returns a fresh stream instance
    num_steps: Number of learning steps to run
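A minimal construction sketch. MyLearner and MyStream below are hypothetical placeholders for whatever learner and stream classes your project provides; the factories should build a fresh instance on every call so each seed starts from an untouched learner and stream.

```python
from alberta_framework.utils.experiments import ExperimentConfig

# MyLearner and MyStream are placeholders for your own classes;
# each factory must return a *fresh* instance on every call.
config = ExperimentConfig(
    name="lms-step-0.01",
    learner_factory=lambda: MyLearner(step_size=0.01),
    stream_factory=lambda: MyStream(),
    num_steps=10_000,
)
```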
SingleRunResult
Bases: NamedTuple
Result from a single experiment run.
Attributes:
    config_name: Name of the configuration that was run
    seed: Random seed used for this run
    metrics_history: List of metric dictionaries from each step
    final_state: Final learner state after training
MetricSummary
Bases: NamedTuple
Summary statistics for a single metric.
Attributes:
    mean: Mean across seeds
    std: Standard deviation across seeds
    min: Minimum value across seeds
    max: Maximum value across seeds
    n_seeds: Number of seeds
    values: Raw values per seed
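A MetricSummary can be turned into a standard error or a confidence half-width directly from its fields. A small sketch, assuming s is a MetricSummary for some metric:

```python
import math

# s is a MetricSummary; the fields follow the attribute list above.
stderr = s.std / math.sqrt(s.n_seeds)  # standard error of the mean
ci95 = 1.96 * stderr                   # normal-approximation 95% half-width
print(f"{s.mean:.4f} ± {ci95:.4f} (n={s.n_seeds}, range [{s.min:.4f}, {s.max:.4f}])")
```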
AggregatedResults
Bases: NamedTuple
Aggregated results across multiple seeds.
Attributes:
    config_name: Name of the configuration
    seeds: List of seeds used
    metric_arrays: Dict mapping metric name to (n_seeds, n_steps) array
    summary: Dict mapping metric name to MetricSummary (final values)
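A sketch of how the pieces fit together, assuming agg is one AggregatedResults value taken from the dictionary returned by run_multi_seed_experiment (documented below) and that "squared_error" is among the recorded metrics:

```python
errors = agg.metric_arrays["squared_error"]  # array of shape (n_seeds, n_steps)
summary = agg.summary["squared_error"]       # MetricSummary of final values
print(f"{agg.config_name}: {summary.mean:.4f} ± {summary.std:.4f} over {summary.n_seeds} seeds")
```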
run_single_experiment(config, seed)
Run a single experiment with a given seed.
Args:
    config: Experiment configuration
    seed: Random seed for the stream

Returns:
    SingleRunResult with metrics and final state
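A usage sketch, reusing the config built in the ExperimentConfig example above:

```python
from alberta_framework.utils.experiments import run_single_experiment

result = run_single_experiment(config, seed=0)
print(result.config_name, result.seed)
print(result.metrics_history[-1])  # metric dictionary recorded at the last step
```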
aggregate_metrics(results)
Aggregate results from multiple seeds into summary statistics.
Args:
    results: List of SingleRunResult from multiple seeds

Returns:
    AggregatedResults with aggregated metrics
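A sketch that aggregates a handful of sequential runs by hand; "squared_error" is assumed to be one of the recorded metrics, and config is the ExperimentConfig from above:

```python
from alberta_framework.utils.experiments import aggregate_metrics, run_single_experiment

results = [run_single_experiment(config, seed=s) for s in range(10)]
agg = aggregate_metrics(results)
print(agg.seeds)                     # the seeds that were aggregated
print(agg.summary["squared_error"])  # MetricSummary of final values across seeds
```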
run_multi_seed_experiment(configs, seeds=30, parallel=True, n_jobs=-1, show_progress=True)
Run experiments across multiple seeds with optional parallelization.
Args:
    configs: List of experiment configurations to run
    seeds: Number of seeds (generates 0..n-1) or explicit list of seeds
    parallel: Whether to use parallel execution (requires joblib)
    n_jobs: Number of parallel jobs (-1 for all CPUs)
    show_progress: Whether to show progress bar (requires tqdm)

Returns:
    Dictionary mapping config name to AggregatedResults
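A usage sketch; config_a and config_b stand in for ExperimentConfig instances like the one above, and the returned dictionary is keyed by each config's name:

```python
from alberta_framework.utils.experiments import run_multi_seed_experiment

results = run_multi_seed_experiment(
    configs=[config_a, config_b],  # ExperimentConfig instances
    seeds=30,                      # or an explicit list such as [0, 1, 2]
    parallel=True,                 # parallel execution via joblib, if installed
    n_jobs=-1,                     # use all CPUs
    show_progress=True,            # progress bar via tqdm, if installed
)
agg = results["lms-step-0.01"]     # AggregatedResults for that config name
```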
get_metric_timeseries(results, metric='squared_error')
Get mean and standard deviation timeseries for a metric.
Args:
    results: Aggregated results
    metric: Name of the metric

Returns:
    Tuple of (mean, lower_bound, upper_bound) arrays
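A plotting sketch, assuming results comes from run_multi_seed_experiment as above and matplotlib is available:

```python
import matplotlib.pyplot as plt

from alberta_framework.utils.experiments import get_metric_timeseries

mean, lower, upper = get_metric_timeseries(results["lms-step-0.01"], metric="squared_error")
steps = range(len(mean))
plt.plot(steps, mean, label="lms-step-0.01")
plt.fill_between(steps, lower, upper, alpha=0.2)  # shaded band between the bounds
plt.xlabel("step")
plt.ylabel("squared error")
plt.legend()
plt.show()
```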
get_final_performance(results, metric='squared_error', window=100)
Get final performance (mean, std) for each config.
Args:
    results: Dictionary of aggregated results
    metric: Metric to evaluate
    window: Number of final steps to average

Returns:
    Dictionary mapping config name to (mean, std) tuple
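A sketch that ranks configurations by their final-window performance, assuming results comes from run_multi_seed_experiment:

```python
from alberta_framework.utils.experiments import get_final_performance

final = get_final_performance(results, metric="squared_error", window=100)
for name, (mean, std) in sorted(final.items(), key=lambda kv: kv[1][0]):
    print(f"{name}: {mean:.4f} ± {std:.4f}")
```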
extract_hyperparameter_results(results, metric='squared_error', param_extractor=None)
Extract results indexed by hyperparameter value.
Useful for creating hyperparameter sensitivity plots.
Args:
    results: Dictionary of aggregated results
    metric: Metric to evaluate
    param_extractor: Function to extract param value from config name

Returns:
    Dictionary mapping param value to (mean, std) tuple
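A sketch for a sensitivity sweep. It assumes config names end in the hyperparameter value (for example "lms-step-0.01"), so the extractor below is specific to that naming scheme and should be adapted to yours:

```python
from alberta_framework.utils.experiments import extract_hyperparameter_results

by_step_size = extract_hyperparameter_results(
    results,
    metric="squared_error",
    param_extractor=lambda name: float(name.rsplit("-", 1)[-1]),  # "lms-step-0.01" -> 0.01
)
for step_size, (mean, std) in sorted(by_step_size.items()):
    print(f"step_size={step_size}: {mean:.4f} ± {std:.4f}")
```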