A Self-Learning Prediction System

by Martin Russmann

AI Workshop: Prediction, Backtesting, Correlation Analysis & Causation Studies

Whitepaper v1.2

Author: Martin Russmann — mrussmann@proton.me

Date: December 23, 2025

Abstract

This whitepaper specifies a self-learning algorithm architecture for cryptocurrency direction prediction, currently instantiated for Bitcoin. The design integrates (i) an ensemble-based prediction module with heterogeneous classifiers, (ii) a continuous validation pipeline with prequential evaluation, (iii) an autonomous multi-objective optimizer that updates operating parameters without human intervention, and (iv) a walk-forward backtesting engine for strategy validation. Feature construction employs advanced technical indicators including MACD, Bollinger Bands, ATR, and Stochastic Oscillator, augmented with multi-method feature selection (variance thresholding, correlation filtering, univariate scoring, and tree-based importance). The optimization scalarizes balanced accuracy, false-positive control, probability calibration, and temporal stability. Cross-asset correlation analysis against Gold, NASDAQ, and Treasury yields provides market regime context. All statistical testing protocols, economic significance assessments, and capacity/transaction-cost models will be defined in subsequent versions.

1. Purpose and Scope

The objective is a production-feasible architecture that continuously reduces decision error under non-stationarity by closing a prediction–validation–optimization loop. In financial time series, distributional drift invalidates static models; accordingly, the system is designed to track drift rather than assume stationarity. The target use case is real-time, direction-only signals at short horizons (1h, 2h, 3h). This whitepaper documents design principles, mathematical definitions, feature engineering, validation protocols, and optimization criteria.

Design Rationale (First Principles)

Two constraints govern the design: (i) predictive distributions must be well-calibrated to support risk-aware decisions; (ii) adaptation must be safe, i.e., parameter updates should not induce instability or leak future information. The ensemble provides variance reduction and disagreement signals; the validation window provides prequential feedback using only past information; and the optimizer navigates bias–variance–drift trade-offs under explicit safeguards.

2. Notation and Symbols

Symbol	Meaning
\(t_i\)	Decision timestamp for sample \(i\)
\(h\)	Prediction horizon (\(\in\{1\text{h},2\text{h},3\text{h}\}\))
\(P_{t}\)	Asset price at time \(t\) (close price)
\(\Delta r_{i\to i+h}\)	Log-return from \(t_i\) to \(t_i{+}h\)
\(y_i^{(h)}\)	Realized label at horizon \(h\) (1: up, 0: down)
\(\hat{p}_i^{(h)}\)	Predicted probability of "up" at \(t_i\) for horizon \(h\)
\(\hat{y}_i^{(h)}\)	Hard decision: \(\mathbb{1}[\hat{p}_i^{(h)} \ge \gamma^{(h)}]\)
\(W_t^{(h)}\)	Rolling validation window up to time \(t\) for horizon \(h\)
\(\theta\)	Parameter vector (lags, thresholds, weights, etc.)
\(\bm{w}\)	Objective weights in scalarization

3. System Overview

The system is a closed loop

\[ \mathcal{S} \;=\; \langle \mathcal{P},\, \mathcal{V},\, \mathcal{O},\, \mathcal{B} \rangle, \]

with prediction module \(\mathcal{P}\), validation module \(\mathcal{V}\), optimizer \(\mathcal{O}\), and backtesting engine \(\mathcal{B}\). Each horizon \(h\in\{1\text{h},2\text{h},3\text{h}\}\) is handled independently to avoid cross-horizon interference.

3.1 Prediction Module \(\mathcal{P}\)

An intentionally heterogeneous voting ensemble is used:

\[ \mathcal{M} \;=\; \{ \text{RF},\, \text{ET},\, \text{GB},\, \text{AB},\, \text{BG} \}, \]

comprising Random Forest (RF), Extra Trees (ET), Gradient Boosting (GB), AdaBoost (AB), and Bagging (BG). Heterogeneity is intended to keep the models' errors imperfectly correlated, which is the condition under which combining their votes reduces variance.

Ensemble Configuration:

Model	Estimators	Max Depth	Learning Rate
Random Forest	1000	10	—
Extra Trees	100	None	—
Gradient Boosting	100	3	0.1
AdaBoost	50	—	1.0
Bagging	10	—	—

For each horizon \(h\) and time \(t_i\), the module outputs:
- Probability of "up" \(\hat{p}^{(h)}_{i}\in[0,1]\)
- Hard decision \(\hat{y}^{(h)}_{i}=\mathbb{1}[\hat{p}^{(h)}_{i}\ge \gamma^{(h)}]\)
- Consensus ratio (fraction of models agreeing on prediction)
- Confidence level (High/Medium/Low)
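
The first three outputs can be illustrated with a minimal sketch. It assumes an unweighted soft vote (the mean of the per-model probabilities); the text does not state whether voting is soft or hard, or how ensemble weights enter, so `aggregate_votes` and its inputs are hypothetical. The confidence level is omitted because its mapping is not specified.

```python
from statistics import mean

def aggregate_votes(model_probs, gamma=0.5):
    """Combine per-model P(up) values into probability, decision, and consensus."""
    p_hat = mean(model_probs)                    # assumed: equal-weight soft vote
    y_hat = 1 if p_hat >= gamma else 0           # hard decision at threshold gamma
    votes = [1 if p >= gamma else 0 for p in model_probs]
    consensus = votes.count(y_hat) / len(votes)  # share of models agreeing with y_hat
    return p_hat, y_hat, consensus

# Five hypothetical model outputs, one per ensemble member.
p_hat, y_hat, consensus = aggregate_votes([0.62, 0.58, 0.71, 0.49, 0.55], gamma=0.55)
```

Here four of the five models vote "up" at the threshold 0.55, so the consensus ratio is 0.8.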

3.2 Validation Module \(\mathcal{V}\)

Labels are realized after horizon \(h\):

\[ \Delta r_{i\to i+h} \;=\; \log P_{t_i+h} - \log P_{t_i}, \]

\[ y^{(h)}_{i} = \begin{cases} 1, & \Delta r_{i\to i+h} > 0,\\ 0, & \Delta r_{i\to i+h} \le 0. \end{cases} \]

A rolling window maintains recent predictions and outcomes:

\[ W_t^{(h)} \;=\; \big\{\,(\hat{y}^{(h)}_{i},\, \hat{p}^{(h)}_{i},\, y^{(h)}_{i}) \;:\; t - \tau_h < t_i \le t \,\big\},\qquad |W_t^{(h)}|\le 500. \]
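
As a minimal sketch of the label rule and the size cap, the following uses a bounded `deque`. Only the 500-entry cap is modeled; the time bound \(\tau_h\) is left out, and the prices and stored predictions are made-up values.

```python
import math
from collections import deque

def realized_label(price_now, price_later):
    """Return 1 if the log-return over the horizon is strictly positive, else 0."""
    log_return = math.log(price_later) - math.log(price_now)
    return 1 if log_return > 0 else 0

# A deque with maxlen drops the oldest entry automatically once 500 are stored.
window = deque(maxlen=500)
for i in range(600):
    later = 100.0 + (i % 3) - 1          # cycles through 99, 100, 101
    window.append((1, 0.6, realized_label(100.0, later)))
```

Note that a flat price (zero log-return) is labeled 0, matching the case definition above.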

3.3 Prequential Evaluation Protocol

Algorithm: Leakage-Safe Prequential Loop (per horizon \(h\))

1.  Initialize parameters θ, thresholds γ^(h)
2.  For each decision time t_i:
3.      Ingest raw data with timestamps ≤ t_i
4.      Compute features x_i using rolling windows ending at t_i
5.      Compute p̂_i^(h) ← EnsemblePredict(x_i; θ)
6.      ŷ_i^(h) ← 𝟙[p̂_i^(h) ≥ γ^(h)]
7.      Store (ŷ_i^(h), p̂_i^(h)) in buffer
8.      For each buffered sample j with t_j + h ≤ t_i:
9.          Form y_j^(h) using Δr_{j→j+h}
10.         Move (ŷ_j^(h), p̂_j^(h), y_j^(h)) to W_t^(h); trim to |W_t^(h)| ≤ 500
11.     If optimization cycle trigger:
12.         θ ← Optimize(θ; W_t^(h), w)
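
The loop can be sketched in runnable form under simplifying assumptions: prices are evenly spaced, the horizon is an integer number of steps, and the optimization step is omitted. `prequential_loop` and `momentum_stub` are hypothetical names, and the stub stands in for the ensemble.

```python
import math
from collections import deque

def prequential_loop(prices, horizon, predict, gamma=0.5, max_window=500):
    """A label enters the window only once its horizon has elapsed."""
    pending = deque()                  # (index, y_hat, p_hat) awaiting realization
    window = deque(maxlen=max_window)  # resolved (y_hat, p_hat, y) triples
    for i in range(len(prices)):
        # Resolve earlier predictions whose outcome is observable at step i.
        while pending and pending[0][0] + horizon <= i:
            j, y_hat_j, p_hat_j = pending.popleft()
            log_return = math.log(prices[j + horizon]) - math.log(prices[j])
            window.append((y_hat_j, p_hat_j, 1 if log_return > 0 else 0))
        history = prices[: i + 1]      # only data with timestamps <= t_i
        p_hat = predict(history)
        pending.append((i, 1 if p_hat >= gamma else 0, p_hat))
    return list(window)

def momentum_stub(history):
    """Hypothetical stand-in predictor: leans 'up' after an up move."""
    if len(history) < 2:
        return 0.5
    return 0.6 if history[-1] > history[-2] else 0.4

prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
resolved = prequential_loop(prices, horizon=2, predict=momentum_stub)
```

With six prices and a two-step horizon, only the first four predictions are resolved; the last two remain pending because their outcomes are not yet observable.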

4. Optimization Objectives

Let \(TP, TN, FP, FN\) be computed on \(W_t^{(h)}\).

Balanced Accuracy (discriminative parity):

\[ f_1(\theta) \;=\; \tfrac{1}{2}\Big(\tfrac{TP}{TP+FN} + \tfrac{TN}{TN+FP}\Big)\;\in[0,1]. \]

Specificity (false-positive control):

\[ f_2(\theta) \;=\; \tfrac{TN}{TN+FP}\;\in[0,1]. \]
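
A minimal sketch of \(f_1\) and \(f_2\) computed from window triples follows. Returning 0.0 when a denominator is empty is an assumption; the text does not say how a window with only one class is handled.

```python
def confusion_counts(window):
    """Count TP, TN, FP, FN over (y_hat, p_hat, y) triples."""
    tp = tn = fp = fn = 0
    for y_hat, _p_hat, y in window:
        if y_hat == 1 and y == 1:
            tp += 1
        elif y_hat == 0 and y == 0:
            tn += 1
        elif y_hat == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def specificity(tn, fp):
    return tn / (tn + fp) if (tn + fp) else 0.0   # assumed fallback for no negatives

def balanced_accuracy(tp, tn, fp, fn):
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # assumed fallback for no positives
    return 0.5 * (recall + specificity(tn, fp))

window = [(1, 0.7, 1), (1, 0.6, 0), (0, 0.4, 0), (0, 0.3, 1), (1, 0.8, 1), (0, 0.2, 0)]
tp, tn, fp, fn = confusion_counts(window)
f1 = balanced_accuracy(tp, tn, fp, fn)
f2 = specificity(tn, fp)
```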

Calibration (complement of the expected calibration error, ECE):

\[ f_3(\theta) \;=\; 1 - \min(1, \mathrm{ECE})\;\in[0,1]. \]
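
The binning scheme behind ECE is not specified here. The sketch below assumes the common choice of ten equal-width probability bins, with each bin's gap between mean predicted probability and observed "up" frequency weighted by the bin's share of samples.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins (assumed; the bin count is a free choice)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)
        freq_up = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(freq_up - avg_p)
    return ece

def calibration_objective(probs, labels, n_bins=10):
    return 1.0 - min(1.0, expected_calibration_error(probs, labels, n_bins))

probs = [0.85, 0.85, 0.85, 0.85, 0.25]
labels = [1, 1, 1, 0, 0]
ece = expected_calibration_error(probs, labels)
f3 = calibration_objective(probs, labels)
```

In this example the 0.85 bin is overconfident by 0.10 and the 0.25 bin by 0.25, giving ECE = 0.8 × 0.10 + 0.2 × 0.25 = 0.13.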

Belief Stability (temporal smoothness):

\[ f_4(\theta) \;=\; 1 - \frac{1}{|W|-1}\sum_{i=1}^{|W|-1}\big|\hat{p}_{i+1} - \hat{p}_{i}\big| \;\in[0,1]. \]
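
As a small worked example of \(f_4\), the sketch below averages the absolute changes between consecutive probabilities. Returning 1.0 for fewer than two predictions is an assumption.

```python
def belief_stability(probs):
    """One minus the mean absolute change between consecutive probabilities."""
    if len(probs) < 2:
        return 1.0                      # assumed: no change observable yet
    total = sum(abs(b - a) for a, b in zip(probs, probs[1:]))
    return 1.0 - total / (len(probs) - 1)

f4 = belief_stability([0.5, 0.6, 0.4, 0.5])
```

The three steps are 0.1, 0.2, and 0.1, so \(f_4 = 1 - 0.4/3 \approx 0.867\).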

Scalarization:

\[ \theta^{*} \;=\; \arg\max_{\theta \in \Theta} \; \sum_{k=1}^{4} w_k\, f_k\!\left(\theta,\, W_t^{(h)}\right),\qquad \bm{w}=(0.4,\,0.3,\,0.2,\,0.1). \]
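
The scalarization reduces to a weighted sum per candidate. The sketch below compares two hypothetical parameter settings whose objective values are made up for illustration.

```python
WEIGHTS = (0.4, 0.3, 0.2, 0.1)   # balanced accuracy, specificity, calibration, stability

def scalarize(objectives, weights=WEIGHTS):
    """Weighted sum of the four objective values."""
    if len(objectives) != len(weights):
        raise ValueError("one objective value per weight is required")
    return sum(w * f for w, f in zip(weights, objectives))

# Hypothetical candidates: (f1, f2, f3, f4) evaluated on the same window.
candidates = {
    "theta_a": (0.58, 0.62, 0.90, 0.85),
    "theta_b": (0.61, 0.50, 0.88, 0.90),
}
best = max(candidates, key=lambda name: scalarize(candidates[name]))
```

Candidate `theta_a` scores 0.683 against 0.660 for `theta_b`: its higher specificity outweighs the lower balanced accuracy under these weights.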

5. Feature Engineering

5.1 Technical Indicator Suite

The system employs a comprehensive set of technical indicators:

MACD (Moving Average Convergence Divergence):

\[ \text{MACD} = \text{EMA}_{12}(P) - \text{EMA}_{26}(P) \]
\[ \text{Signal} = \text{EMA}_{9}(\text{MACD}) \]
\[ \text{Histogram} = \text{MACD} - \text{Signal} \]

With bullish/bearish crossover detection.
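
A minimal sketch of the three MACD series follows. It assumes the standard EMA smoothing factor \(2/(n+1)\) and seeds each EMA with the first observation; the seeding convention is not stated in the text.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value (assumed)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """Return the MACD line, signal line, and histogram."""
    fast_ema, slow_ema = ema(prices, fast), ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram

rising = [100.0 + i for i in range(60)]
macd_line, signal_line, histogram = macd(rising)
flat_line, _, _ = macd([100.0] * 60)
```

On a steadily rising series the fast EMA sits above the slow EMA, so the MACD line is positive; on a flat series it is zero.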

Bollinger Bands:

\[ \text{Middle} = \text{SMA}_{20}(P) \]
\[ \text{Upper} = \text{Middle} + 2\sigma \]
\[ \text{Lower} = \text{Middle} - 2\sigma \]
\[ \%B = \frac{P - \text{Lower}}{\text{Upper} - \text{Lower}} \]

Including squeeze indicator for volatility compression detection.
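
The band formulas can be sketched as below. The population standard deviation is assumed for \(\sigma\), and a degenerate zero-width band returns \(\%B = 0.5\) by assumption. The squeeze indicator is not shown because its definition is not given.

```python
from statistics import mean, pstdev

def bollinger(prices, window=20, k=2.0):
    """Return (middle, upper, lower, percent_b) for the latest price."""
    recent = prices[-window:]
    if len(recent) < window:
        raise ValueError("not enough prices for the window")
    middle = mean(recent)
    sigma = pstdev(recent)              # assumed: population standard deviation
    upper, lower = middle + k * sigma, middle - k * sigma
    width = upper - lower
    percent_b = (prices[-1] - lower) / width if width > 0 else 0.5
    return middle, upper, lower, percent_b

prices = [99.0, 101.0] * 10             # mean 100, standard deviation 1
middle, upper, lower, percent_b = bollinger(prices)
```

With bands at 98 and 102 and a last price of 101, \(\%B = 3/4\).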

Average True Range (ATR):

\[ \text{TR} = \max(H - L, |H - C_{t-1}|, |L - C_{t-1}|) \]
\[ \text{ATR} = \text{EMA}_{14}(\text{TR}) \]
\[ \text{ATR}_\% = \frac{\text{ATR}}{P} \times 100 \]
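
A minimal sketch of the ATR computation follows. Two conventions are assumed because the text leaves them open: the first bar's true range is \(H - L\) (no previous close exists), and the EMA uses smoothing factor \(2/(n+1)\) seeded with the first true range.

```python
def true_ranges(highs, lows, closes):
    """True range per bar; the first bar uses high minus low (assumed)."""
    trs = [highs[0] - lows[0]]
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - prev_close),
                       abs(lows[i] - prev_close)))
    return trs

def atr_percent(highs, lows, closes, span=14):
    """Return (ATR, ATR as a percentage of the latest close)."""
    alpha = 2.0 / (span + 1)
    atr = None
    for tr in true_ranges(highs, lows, closes):
        atr = tr if atr is None else alpha * tr + (1 - alpha) * atr
    return atr, atr / closes[-1] * 100.0

# A gap up: the second bar's range is 2, but the gap from the prior close is 10.
gap_trs = true_ranges([102.0, 110.0], [98.0, 108.0], [100.0, 109.0])
atr, atr_pct = atr_percent([102.0] * 20, [98.0] * 20, [100.0] * 20)
```

The gap example shows why the true range, rather than the bar range, is used: the overnight move dominates the second bar.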

Stochastic Oscillator:

\[ \%K = \frac{C - L_{14}}{H_{14} - L_{14}} \times 100 \]
\[ \%D = \text{SMA}_3(\%K) \]

With overbought/oversold thresholds (80/20) and crossover signals.
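
The oscillator can be sketched as follows. Returning 50 when the 14-bar high equals the 14-bar low is an assumption to avoid division by zero.

```python
def stochastic(highs, lows, closes, k_window=14, d_window=3):
    """Return the %K series and its d_window-bar simple moving average %D."""
    k_values = []
    for i in range(k_window - 1, len(closes)):
        highest = max(highs[i - k_window + 1 : i + 1])
        lowest = min(lows[i - k_window + 1 : i + 1])
        span = highest - lowest
        k_values.append(100.0 * (closes[i] - lowest) / span if span > 0 else 50.0)
    d_values = [sum(k_values[j - d_window + 1 : j + 1]) / d_window
                for j in range(d_window - 1, len(k_values))]
    return k_values, d_values

closes = [float(c) for c in range(1, 17)]   # steady uptrend, 16 bars
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]
k_values, d_values = stochastic(highs, lows, closes)
overbought = k_values[-1] > 80.0
```

In a steady uptrend the close stays near the top of its 14-bar range, so \(\%K \approx 93.3\) and the series reads as overbought.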

RSI (Relative Strength Index):

\[ \text{RSI} = 100 - \frac{100}{1 + \frac{\text{Avg Gain}}{\text{Avg Loss}}} \]
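
The averaging method for gains and losses is not specified. The sketch below assumes simple means over the last 14 changes (Wilder's smoothing is a common alternative) and treats the edge cases by assumption: 100 when there are no losses, 50 when the series is flat.

```python
def rsi(closes, period=14):
    """RSI from simple average gain and loss over the last `period` changes."""
    if len(closes) < period + 1:
        raise ValueError("need period + 1 closes")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(max(c, 0.0) for c in changes) / period
    avg_loss = sum(max(-c, 0.0) for c in changes) / period
    if avg_loss == 0:
        return 50.0 if avg_gain == 0 else 100.0   # assumed edge-case conventions
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

balanced = rsi([100.0, 101.0] * 7 + [100.0])      # seven gains, seven equal losses
all_up = rsi([100.0 + i for i in range(15)])
```

Equal average gains and losses give a ratio of 1 and therefore an RSI of 50.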

5.2 Multi-Method Feature Selection

The system employs a consensus-based feature selection approach:

Variance Thresholding: Remove features with variance below threshold \(\tau = 0.01\).

Correlation-Based Filtering: For highly correlated feature pairs (\(|\rho| > 0.95\)), retain the feature with higher variance.

Univariate Selection: F-statistic and mutual information scoring:

\[ F = \frac{\text{Between-class variance}}{\text{Within-class variance}} \]

\[ I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} \]

Tree-Based Importance: Gini importance from Random Forest:

\[ \text{Importance}(X_j) = \sum_{t \in T} \frac{n_t}{n} \Delta i(t) \]

Consensus Selection: Features selected by at least 50% of methods are retained.
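
The consensus rule amounts to a vote count across methods. In the sketch below the per-method selections are hypothetical; only the 50% rule comes from the text.

```python
from collections import Counter

def consensus_select(selections, min_share=0.5):
    """Keep features chosen by at least min_share of the selection methods."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    needed = min_share * len(selections)
    return sorted(f for f, n in votes.items() if n >= needed)

# Hypothetical outputs of the four methods described above.
selections = {
    "variance": ["rsi", "macd", "atr_percent", "stoch_k"],
    "correlation": ["rsi", "macd", "bb_percent"],
    "univariate": ["rsi", "atr_percent"],
    "tree_importance": ["macd", "volume_ratio"],
}
kept = consensus_select(selections)
```

With four methods, a feature needs at least two votes to be retained.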

5.3 Feature Vector Construction

x_i = [RSI, MA_5, MA_10, MA_20, σ_10,
       MACD, MACD_Signal, MACD_Histogram,
       BB_Percent, BB_Squeeze,
       ATR_Percent,
       Stoch_K, Stoch_D,
       Volume_Ratio, Price_Lags_1..5]

6. Backtesting Engine

6.1 Walk-Forward Validation

The backtesting engine implements walk-forward analysis to prevent lookahead bias:

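\[ \text{For each period } [t_s, t_e) \text{ with } t_e = t_s + S: \]
\[ \text{Train on } [t_s - W, t_s), \text{ test on } [t_s, t_e) \]

Where \(W\) is the training window size (distinct from the validation window \(W_t^{(h)}\) of Section 3.2) and \(S\) is the step size. Training data always ends strictly before the test period begins.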

Default Configuration:
- Training window: 30 days
- Step size: 7 days
- Lookback for feature calculation: 60 days
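
A minimal sketch of how folds could be generated from the default configuration follows. It assumes the test start advances by one step per fold and that all intervals are half-open; `walk_forward_splits` is a hypothetical helper and the dates are arbitrary.

```python
from datetime import date, timedelta

def walk_forward_splits(start, end, train_days=30, step_days=7):
    """Return (train_start, test_start, test_end) triples; intervals are half-open."""
    splits = []
    test_start = start + timedelta(days=train_days)
    while test_start + timedelta(days=step_days) <= end:
        train_start = test_start - timedelta(days=train_days)
        test_end = test_start + timedelta(days=step_days)
        # Train on [train_start, test_start), test on [test_start, test_end).
        splits.append((train_start, test_start, test_end))
        test_start = test_end           # assumed: advance by one step per fold
    return splits

splits = walk_forward_splits(date(2025, 1, 1), date(2025, 3, 1))
```

Over this 59-day span the sketch yields four folds, the first training on January 1 to January 31 and testing on the following seven days.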

6.2 Parameter Search

Grid Search: Exhaustive search over parameter grid with up to 100 combinations.

Random Search: Stochastic sampling with configurable iterations (default: 50).

Quick Test: Predefined parameter sets for rapid validation.
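
The first two search modes can be sketched with the standard library. How the 100-combination cap is enforced is not stated; truncating the grid is an assumption here, as are the example grid and bounds. Integer-valued parameters would need rounding after sampling.

```python
import itertools
import random

def grid_candidates(grid, max_combinations=100):
    """Enumerate the Cartesian product, truncated at the cap (assumed policy)."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, c)) for c in itertools.islice(combos, max_combinations)]

def random_candidates(bounds, iterations=50, seed=0):
    """Sample each parameter uniformly within its (low, high) bounds."""
    rng = random.Random(seed)
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
            for _ in range(iterations)]

grid = {
    "rsi_period": [10, 12, 14, 16, 18, 20],
    "ma_period": [5, 10, 20, 50],
    "decision_threshold": [0.45, 0.50, 0.55, 0.60, 0.65],
}
grid_set = grid_candidates(grid)        # 120 combinations truncated to 100
random_set = random_candidates({"decision_threshold": (0.45, 0.65)})
```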

6.3 Performance Metrics

The performance analyzer computes:

- Directional accuracy
- Precision and recall (per class)
- F1-score
- Sharpe ratio (where applicable)
- Maximum drawdown
- Win/loss ratio
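
Two of the economic metrics can be sketched as follows. The Sharpe ratio here assumes a zero risk-free rate and an explicit annualization factor; the text does not give either convention, and the sample figures are made up.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(returns, periods_per_year=1):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    sd = stdev(returns)
    if sd == 0:
        return 0.0                       # assumed fallback for constant returns
    return mean(returns) / sd * math.sqrt(periods_per_year)

def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

sharpe = sharpe_ratio([0.01, 0.03])
drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0, 80.0])
```

The equity curve peaks at 120 and later falls to 80, a drawdown of one third.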

7. Cross-Asset Correlation Analysis

7.1 Multi-Asset Framework

The system analyzes correlations between Bitcoin and:
- Gold (GC=F)
- NASDAQ Composite (^IXIC)
- 10-Year Treasury Yield (^TNX)

7.2 Correlation Computation

For assets \(X\) and \(Y\) over period \(T\):

\[ \rho_{XY} = \frac{\sum_{t=1}^{T}(r_{X,t} - \bar{r}_X)(r_{Y,t} - \bar{r}_Y)}{\sqrt{\sum_{t=1}^{T}(r_{X,t} - \bar{r}_X)^2 \sum_{t=1}^{T}(r_{Y,t} - \bar{r}_Y)^2}} \]

Where \(r_{X,t}\) represents log-returns.
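
A minimal sketch of the coefficient on log-returns follows. It assumes both price series are already aligned to common timestamps, which takes care in practice because Bitcoin trades continuously while the other instruments have market hours. The prices are made up.

```python
import math

def log_returns(prices):
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

asset_x = [100.0, 102.0, 101.0, 105.0, 107.0]
asset_y = [p ** 2 for p in asset_x]      # log-returns exactly twice those of X
asset_z = [1.0 / p for p in asset_x]     # log-returns with the opposite sign
rho_xy = pearson(log_returns(asset_x), log_returns(asset_y))
rho_xz = pearson(log_returns(asset_x), log_returns(asset_z))
```

Because correlation is invariant to positive scaling, the squared series correlates at +1 and the inverted series at −1.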

7.3 Time Periods

Correlations are computed over multiple horizons:
- Short-term: 1 week, 1 month, 3 months
- Medium-term: 6 months, 1 year
- Long-term: 2 years, 5 years

This multi-horizon approach captures regime-dependent correlation dynamics.

8. Optimization and Adaptation

8.1 Search and Safeguards

The optimizer employs simulated annealing with:
- Box constraints on all parameters
- Maximum step sizes per parameter component
- Performance floor triggering rollback
- Rate limiting on consecutive accepted steps

8.2 Exploration–Exploitation Control

\[ \Delta\theta \sim \begin{cases} \mathcal{N}(0,\, \sigma_{\text{exploit}}^2 I), & \text{if performance} \ge \pi,\\[0.25em] \mathcal{N}(0,\, \sigma_{\text{explore}}^2 I), & \text{otherwise}, \end{cases} \]

with \(\sigma_{\text{exploit}}{=}0.01\), \(\sigma_{\text{explore}}{=}0.10\), and performance threshold \(\pi = 0.55\).
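
The proposal rule, combined with the box constraints of Section 8.1, can be sketched as below. The text does not say in which units \(\sigma\) is expressed; scaling each step by the width of the parameter's range is an assumption made here so that one \(\sigma\) applies to parameters of different magnitudes. `propose` is a hypothetical helper.

```python
import random

def propose(theta, bounds, performance, pi=0.55,
            sigma_exploit=0.01, sigma_explore=0.10, rng=random):
    """Draw a Gaussian step per parameter and clip the result to its box."""
    sigma = sigma_exploit if performance >= pi else sigma_explore
    proposal = {}
    for name, value in theta.items():
        lo, hi = bounds[name]
        step = rng.gauss(0.0, sigma) * (hi - lo)   # assumed: sigma relative to range
        proposal[name] = min(hi, max(lo, value + step))
    return proposal

rng = random.Random(0)
theta = {"decision_threshold": 0.55, "rsi_period": 15.0}
bounds = {"decision_threshold": (0.45, 0.65), "rsi_period": (10.0, 20.0)}
calm = [propose(theta, bounds, performance=0.60, rng=rng) for _ in range(200)]
wide = [propose(theta, bounds, performance=0.50, rng=rng) for _ in range(200)]
```

When performance is at or above \(\pi\), proposals stay close to the current parameters; below it, steps are ten times wider on average.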

8.3 Adaptive Parameter Bounds

Parameter	Range	Description
Decision threshold	[0.45, 0.65]	Classification cutoff
RSI period	[10, 20]	Momentum lookback
MA periods	[5, 50]	Trend identification
Volatility window	[5, 20]	Risk estimation
Ensemble weights	[0, 1]	Model contribution

9. Implementation Configuration

system:
  optimization_cycle: 3600        # seconds (hourly)
  prediction_horizons: [1h, 2h, 3h]
  max_predictions_stored: 500

feature_engineering:
  moving_average_periods: [5, 10, 20]
  rsi_period: 14
  lag_periods: [1, 2, 3, 4, 5]
  volatility_window: 10
  enable_volume_features: true
  enable_advanced_indicators: true

model_parameters:
  random_forest:
    n_estimators: 1000
    max_depth: 10
  extra_trees:
    n_estimators: 100
  gradient_boosting:
    n_estimators: 100
    learning_rate: 0.1
  minimum_models_agreement: 4

optimization:
  objectives:
    balanced_accuracy: 0.40
    specificity: 0.30
    calibration: 0.20
    stability: 0.10
  min_predictions_for_optimization: 3
  target_accuracy: 65.0
  max_parameter_change: 0.30

backtesting:
  window_size: 30
  step_size: 7
  lookback_days: 60
  max_workers: 4

10. Limitations

- The system focuses on short-term horizons (1-3 hours); longer horizons require different feature engineering.
- Current scalarization fixes objective weights; adaptive weight selection is under development.
- Transaction costs and market impact are not modeled in the current version.
- Statistical significance testing protocols are deferred to subsequent versions.

11. Future Work

- Evaluation Protocols: Dependence-robust testing, economic utility evaluation.
- Regime-Aware Weights: Online adaptation of feature/ensemble weights by regime classifiers.
- Extended Correlation Analysis: Dynamic correlation tracking and regime-switching models.
- Cross-Asset Transfer: Multi-task learning across BTC, ETH, and other liquid assets.
- Optimizer Upgrades: CMA-ES/NSGA-II and Pareto-front logging.

12. Conclusion

A self-learning architecture is designed to maintain usable directional performance in non-stationary markets by continuously closing the loop between prediction, validation, and optimization. The integration of technical indicators, multi-method feature selection, walk-forward backtesting, and cross-asset correlation analysis provides a framework for cryptocurrency direction prediction. Careful objective design and conservative exploration are intended to guard against overfitting while enabling adaptation to changing market conditions; statistical and economic validation of these properties is deferred to subsequent versions.
