Portfolio Backtest: Testing Combined Strategies

Why individual strategy backtests aren't enough — and how BlaveClaw's walk-forward portfolio simulation validates the whole picture.

The problem with backtesting strategies one by one

Suppose you've built three strategies, each with a respectable Sharpe ratio. You combine them in a portfolio. Is the portfolio better than any single strategy? You can't know without simulating how they interact.

Three things that only appear at the portfolio level:

  • Correlation: If two strategies both go long BTC on the same signal, combining them doesn't diversify — it just doubles the exposure.
  • Weight sensitivity: Allocating 80% to your best backtest strategy and 20% to the others might look optimal in-sample, but that allocation was "found" using data the optimizer has already seen.
  • Drawdown stacking: Strategies can draw down at the same time (e.g., all trend-following strategies fail in ranging markets). Portfolio MDD is not the average of individual MDDs.
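A toy illustration of the first and third points, using made-up daily returns rather than real strategy output. The helper names `max_drawdown` and `combine` are hypothetical and not part of BlaveClaw. Two strategies that trade the same signal give no drawdown relief when combined, and the portfolio MDD of a genuinely different pair is not the average of the two individual MDDs.

```python
def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve (negative fraction)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def combine(streams, weights):
    """Daily portfolio return = weighted sum of the strategies' returns that day."""
    return [sum(w * r for w, r in zip(weights, day)) for day in zip(*streams)]


# Made-up daily returns for three hypothetical strategies.
a = [0.02, -0.05, -0.05, 0.03, 0.04]
b = [0.02, -0.05, -0.05, 0.03, 0.04]    # same signal as a: perfectly correlated
c = [-0.01, 0.03, 0.02, -0.04, -0.03]   # tends to move against a

mdd_a = max_drawdown(a)
mdd_c = max_drawdown(c)
mdd_ab = max_drawdown(combine([a, b], [0.5, 0.5]))  # no diversification benefit
mdd_ac = max_drawdown(combine([a, c], [0.5, 0.5]))  # much shallower than either

print(f"a: {mdd_a:.2%}  c: {mdd_c:.2%}  a+b: {mdd_ab:.2%}  a+c: {mdd_ac:.2%}")
print(f"average of a and c: {(mdd_a + mdd_c) / 2:.2%}")
```

The combined figure depends on when the drawdowns happen relative to each other, which is why it has to be simulated rather than inferred from the individual reports.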

What the management backtest does

BlaveClaw's management_backtest.py runs a walk-forward simulation — the same method the live manager uses, but replayed on historical data. It is the only valid way to estimate portfolio-level performance.

1. Start from the first out-of-sample day

The first --lookback days (default: 365) are used as the initial calibration window. Performance reporting starts only after this window, so every reported day is strictly out-of-sample.

2. Optimize weights on the past window

Each day, the optimizer finds the weight vector w that maximizes the slope-to-volatility ratio of the combined equity curve over the past 365 days. Constraints: weights sum to 1 and every weight is ≥ 0 (no shorting strategies).

3. Apply weights to tomorrow's returns

The weights just computed are applied to the next day's actual strategy returns. The optimizer never sees the day it's predicting — this is what makes it out-of-sample.

4. Roll forward one day and repeat

The window slides forward. Each day gets fresh weights based on the most recent 365-day history.
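The four steps can be sketched as a single loop. This is a minimal illustration of the walk-forward idea, not the contents of management_backtest.py: `walk_forward` and `equal_weight` are hypothetical names, and the equal-weight function is only a placeholder for the real optimizer so that the loop mechanics stay visible.

```python
def walk_forward(returns, lookback, optimize):
    """Replay daily re-optimization on history.

    returns  : list of days, each a list of per-strategy returns
    lookback : number of past days the optimizer may see
    optimize : callable(window) -> list of weights (assumed interface)
    """
    oos = []
    for t in range(lookback, len(returns)):
        window = returns[t - lookback:t]   # days t-lookback .. t-1 only
        weights = optimize(window)         # day t is never visible here
        oos.append(sum(w * r for w, r in zip(weights, returns[t])))
    return oos                             # one out-of-sample return per day


def equal_weight(window):
    """Placeholder optimizer: ignores the data and splits capital evenly."""
    n = len(window[0])
    return [1.0 / n] * n


# Made-up returns for two strategies over six days.
daily = [[0.01, 0.00], [0.00, 0.02], [-0.01, 0.01],
         [0.02, -0.01], [0.00, 0.01], [0.01, 0.03]]
oos_returns = walk_forward(daily, lookback=3, optimize=equal_weight)
print(len(oos_returns))  # 3 reported days: the first 3 are calibration only
```

The slice `returns[t - lookback:t]` stops one day short of `t`, which is what keeps each reported return out-of-sample.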

The random portfolio benchmark

To know whether dynamic weight optimization actually adds value, the management backtest generates 1,000 random portfolios using the same strategies but with static random weights (sampled from a Dirichlet distribution). The key output metric is:

Managed beats X% of 1,000 random portfolios on Sharpe.
If X > 70%, dynamic allocation is adding value above chance. If X < 50%, the optimizer is not helping — the portfolio's edge comes entirely from individual strategy quality, not weight management.

This benchmark matters because a manager that "beats random" in backtesting has a stronger case for doing so live. One that doesn't is either underperforming or relying on overfitted weights.
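A rough sketch of how such a benchmark could be computed. Everything here is an assumption for illustration: the function names, the symmetric Dirichlet with alpha = 1 (uniform over valid weight vectors), a zero risk-free rate, and 365 periods per year for annualization. The random portfolios are evaluated on the same out-of-sample days as the managed portfolio.

```python
import math
import random
import statistics


def sharpe(returns, periods_per_year=365):
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def dirichlet_weights(n, rng, alpha=1.0):
    """Sample from Dirichlet(alpha, ..., alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def pct_random_beaten(managed_returns, strategy_returns, n_random=1000, seed=0):
    """Percentage of static random-weight portfolios the managed Sharpe exceeds.

    strategy_returns: list of days, each a list of per-strategy returns,
    covering the same days as managed_returns.
    """
    rng = random.Random(seed)
    n = len(strategy_returns[0])
    managed = sharpe(managed_returns)
    beaten = 0
    for _ in range(n_random):
        w = dirichlet_weights(n, rng)
        port = [sum(wi * ri for wi, ri in zip(w, day)) for day in strategy_returns]
        if managed > sharpe(port):
            beaten += 1
    return 100.0 * beaten / n_random
```

Because the random weights are static, any edge the managed portfolio shows over them comes from changing the weights over time, not from the choice of strategies.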

Reading the three-panel chart

The output manager/pnl.png has three panels:

| Panel | What it shows | What to look for |
| --- | --- | --- |
| Top: Cumulative Return | Managed portfolio (green) vs. p5–p95 band of 1,000 random portfolios (grey) | Managed line should trend above the random median. Consistent outperformance over more than 2 years is meaningful. |
| Middle: Drawdown | Managed portfolio's drawdown from equity peak | MDD should be tolerable. Deep drawdowns that recover slowly suggest low diversification or strategies that are too correlated. |
| Bottom: Weight History | Each strategy's allocation % over time | Stable weights → strategies have consistent relative performance. Rapidly switching weights → the optimizer is chasing short-term noise. |

How weights are optimized: slope / volatility

The optimizer maximizes this objective for the combined portfolio:

\[
\frac{\text{slope\_of\_cumsum}(R \cdot w,\ \text{last 365d})}{\text{volatility}(R \cdot w,\ \text{last 365d})}
\]

This is similar to a Sharpe ratio, but uses the linear slope of the cumulative returns curve rather than the arithmetic mean. Slope is more stable than mean for daily returns, which tend to be noisy — it measures trend consistency rather than average return magnitude.

The constraint is sum(w) = 1, w ≥ 0. This means all capital is allocated across the strategies, and no strategy can receive negative weight (no shorting strategies against each other). The optimizer runs with 10 random restarts to avoid local optima.
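The objective and constraints can be sketched as follows. The source does not say which optimizer BlaveClaw uses, so the search below is a stand-in: a simple hill climb that moves weight between pairs of strategies, restarted from 10 random points. The names `slope_over_vol` and `optimize_weights`, the step count, and the use of the sample standard deviation of daily returns as "volatility" are all assumptions.

```python
import math
import random
import statistics


def slope_over_vol(port_returns):
    """OLS slope of the cumulative-return curve divided by daily return stdev."""
    cum, total = [], 0.0
    for r in port_returns:
        total += r
        cum.append(total)
    n = len(cum)
    x_mean = (n - 1) / 2
    y_mean = sum(cum) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(cum))
    vol = statistics.stdev(port_returns)
    return 0.0 if vol == 0 else (sxy / sxx) / vol


def optimize_weights(window, restarts=10, steps=200, seed=0):
    """Stand-in search on sum(w) = 1, w >= 0. Needs at least two strategies."""
    rng = random.Random(seed)
    n = len(window[0])

    def score(w):
        return slope_over_vol([sum(wi * ri for wi, ri in zip(w, day))
                               for day in window])

    best_w, best_s = None, -math.inf
    for _ in range(restarts):
        draws = [rng.expovariate(1.0) for _ in range(n)]
        w = [d / sum(draws) for d in draws]      # random start on the simplex
        s = score(w)
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            shift = rng.uniform(0, w[i])         # move weight from i to j
            cand = w[:]
            cand[i] -= shift                     # stays >= 0 since shift <= w[i]
            cand[j] += shift                     # total is unchanged
            cand_s = score(cand)
            if cand_s > s:
                w, s = cand, cand_s
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s
```

Moving weight between two strategies keeps both constraints satisfied at every step, so no projection or penalty term is needed. A gradient-based constrained solver would be the more usual choice in practice.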

Leverage and target volatility

The manager has a separate concern from weight allocation: how much total leverage to apply to the portfolio.

After optimizing weights, manager.py computes the portfolio's realized annual volatility and sets leverage so that portfolio volatility matches a target:

leverage = target_vol / portfolio_ann_vol

Default target is 30% annual volatility. If the portfolio's realized vol is 15%, leverage = 2x. If it's 40%, leverage = 0.75x (de-leveraged).

MDD rule of thumb: Target vol ≈ acceptable MDD ÷ 2. If you're willing to see a −20% portfolio drawdown, set --target-vol 0.10. At the default 30% target vol, expect drawdowns in the −40% to −60% range in bad periods.
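The leverage formula and the rule of thumb are simple enough to check by hand. The sketch below uses hypothetical function names and assumes daily returns annualized with 365 periods per year, which may differ from what manager.py does internally.

```python
import math
import statistics


def annualized_vol(daily_returns, periods_per_year=365):
    """Realized volatility scaled from daily to annual (365/yr is an assumption)."""
    return statistics.stdev(daily_returns) * math.sqrt(periods_per_year)


def leverage_for_target(portfolio_ann_vol, target_vol=0.30):
    """leverage = target_vol / portfolio_ann_vol"""
    if portfolio_ann_vol <= 0:
        raise ValueError("portfolio volatility must be positive")
    return target_vol / portfolio_ann_vol


def target_vol_for_mdd(acceptable_mdd):
    """Rule of thumb from the text: target vol is about acceptable MDD / 2."""
    return abs(acceptable_mdd) / 2


print(round(leverage_for_target(0.15), 2))   # low-vol portfolio is levered up
print(round(leverage_for_target(0.40), 2))   # high-vol portfolio is scaled down
print(round(target_vol_for_mdd(-0.20), 2))   # value to pass as --target-vol
```

The three printed values correspond to the 2x, 0.75x, and 0.10 figures quoted above.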

When to run the management backtest

Run it before going live whenever:

  • You add a new strategy to the portfolio
  • You remove a strategy
  • Any individual strategy has significantly changed parameters
  • More than 3 months have passed since the last run

python3 manager/management_backtest.py --lookback 365 --random-n 1000

Run manager.py separately to update live weights after the backtest confirms the portfolio is sound:

python3 manager/manager.py --lookback 365 --account 10000 --target-vol 0.30

Warning signs in the management backtest

| Warning sign | What it likely means |
| --- | --- |
| Managed Sharpe < random median | Dynamic allocation is hurting, not helping. Consider equal-weight allocation instead. |
| Weights oscillate wildly (bottom panel) | Strategies have similar edge, so the optimizer is fitting noise. Use longer lookback or add more strategies. |
| One strategy receives >80% weight consistently | The other strategies are not contributing. Evaluate whether they're worth running at all. |
| MDD exceeds 2× individual strategy MDD | Strategies are drawing down together. They may be too correlated to provide diversification. |
| OOS period < 180 days | Not enough data; results are statistically unreliable. Run individual strategy backtests longer first. |
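The quantitative rows of the table can be turned into a quick checklist. This is a hypothetical helper, not something the backtest outputs. It reads "consistently" as the average weight over the out-of-sample period and "individual strategy MDD" as the worst single-strategy MDD; both readings are assumptions. The weight-oscillation row is left out because the table gives no threshold for it.

```python
def warning_signs(managed_sharpe, random_median_sharpe, mean_weights,
                  portfolio_mdd, worst_strategy_mdd, oos_days):
    """Return the warning signs triggered by a backtest summary.

    mean_weights       : average allocation per strategy over the OOS period
    portfolio_mdd      : managed portfolio max drawdown, e.g. -0.35
    worst_strategy_mdd : deepest individual strategy max drawdown (assumption)
    """
    found = []
    if managed_sharpe < random_median_sharpe:
        found.append("Managed Sharpe below random median")
    if max(mean_weights) > 0.80:
        found.append("One strategy receives more than 80% weight")
    if abs(portfolio_mdd) > 2 * abs(worst_strategy_mdd):
        found.append("Portfolio MDD exceeds 2x individual strategy MDD")
    if oos_days < 180:
        found.append("OOS period shorter than 180 days")
    return found


# Made-up summary numbers for illustration.
print(warning_signs(managed_sharpe=1.1, random_median_sharpe=1.4,
                    mean_weights=[0.85, 0.10, 0.05],
                    portfolio_mdd=-0.30, worst_strategy_mdd=-0.25,
                    oos_days=400))
```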