Expect a Worse Drawdown Than Your Backtest

STS Research | Published June 13, 2026 | Data through 2026-06-11

The worst drawdown in our NQ backtest was $38,695, or 12.72% of peak equity. That is the number on our tear sheet. It is also the most misleading number on it. When we reshuffled the same 5,424 trades 50,000 times, the typical worst drawdown came out around $53,000, and a bad-but-still-normal one around $80,000. The backtest figure sat near the bottom 5% of all those paths. In plain terms: our realized drawdown was lucky, and you should plan for a bigger one.

This is the honest answer to a fair question buyers ask us: will that 12.72% hold up live? Probably not. We work in dollars from here on, all per one mini contract, because a reshuffled drawdown can hit at any point on the equity curve and so does not map to one clean percent. Here is the data, and here is what to do about it.

$38,695: Worst drawdown in the backtest (12.72%)
$53k: Monte-Carlo median drawdown (a normal future low)
$80k: Monte-Carlo p95 (bad but still normal)
~5th: Percentile our realized drawdown landed in (lucky)

Whose trades are these (read this first)

These numbers are from our own book: six systematic NQ strategies run as one book that holds a single position at a time. TradingView backtests, 2011 to 2026, one to three contracts scaled by volatility, commissions and slippage included, $1,000,906 net. The style is momentum and trend continuation, intraday plus one overnight model. Not mean reversion, not scalping.

That matters here in one specific way. The drawdown math below is about the order trades happen in, and that is not unique to our system. Any track record, yours included, is one ordering of history. So the method transfers cleanly even though our exact dollar figures do not. A discretionary trader can run this same test on their own closed trades and learn the same lesson. The $53,000 is ours. The idea that your realized drawdown is one lucky draw is everyone's.

The backtest drawdown is one roll of the dice

A max drawdown is the worst peak-to-trough dip your equity took. Most people read it as a hard floor: "the worst this can do is 12.72%." That is the mistake.

Your drawdown does not depend only on which trades you took. It depends heavily on the order they arrived in. Cluster your losers together and you get a deep drawdown. Spread them out and you barely notice them. The backtest shows you exactly one order: the one history happened to deal. With trade sizes held fixed, every other order of the same trades ends at the same final profit, but the worst dip along the way can be very different.
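A minimal sketch of that order effect, using made-up toy trades rather than anything from our book. The function name `max_drawdown` and the dollar amounts are illustrative assumptions.

```python
def max_drawdown(pnls):
    """Worst peak-to-trough drop of cumulative P&L, in dollars."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)           # highest equity seen so far
        worst = max(worst, peak - equity)  # deepest dip below that high
    return worst

# Toy data (hypothetical): six $300 wins and three $500 losses, two orderings.
spread_out = [300, -500, 300, 300, -500, 300, 300, -500, 300]
clustered = [300, 300, 300, -500, -500, -500, 300, 300, 300]

# Same trades, same final profit, different worst dip.
print("final profit:", sum(spread_out), sum(clustered))
print("max drawdown, losers spread out:", max_drawdown(spread_out))
print("max drawdown, losers clustered:", max_drawdown(clustered))
```

Both orderings finish $300 ahead, but the clustered one draws down three times as far ($1,500 against $500).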

So we tested the other orders.

We took the real 5,424 trades, kept every win and loss exactly as it was, and shuffled the sequence at random 50,000 times. Each shuffle is a full alternate history with the same edge and the same trades, just dealt in a different order. For each one we measured the worst drawdown. That gives a distribution: not one drawdown number, but the full range of drawdowns this strategy can produce.
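The reshuffle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script behind the published numbers: the trades are synthetic, the shuffle count is cut to 1,000 to keep it quick, and the helper names (`shuffled_drawdowns`, `percentile`) are ours for this example.

```python
import random

def max_drawdown(pnls):
    """Worst peak-to-trough drop of cumulative P&L, in dollars."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def shuffled_drawdowns(pnls, n_shuffles=1000, seed=42):
    """Max drawdown of each random reordering of the same trades."""
    rng = random.Random(seed)   # fixed seed so the run reproduces exactly
    order = list(pnls)          # copy, so the caller's list keeps its order
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(order)      # same wins and losses, new sequence
        results.append(max_drawdown(order))
    return results

def percentile(values, q):
    """Nearest-rank percentile for an integer q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, int(-(-q * len(ordered) // 100)))  # ceil(q * n / 100)
    return ordered[rank - 1]

# Synthetic per-trade P&L (NOT the article's data): 300 made-up trades.
demo_rng = random.Random(0)
demo_trades = [demo_rng.gauss(50, 400) for _ in range(300)]

dds = shuffled_drawdowns(demo_trades)
print("as-dealt max drawdown:", round(max_drawdown(demo_trades)))
print("shuffle p5 / median / p95:",
      round(percentile(dds, 5)),
      round(percentile(dds, 50)),
      round(percentile(dds, 95)))
```

To try it on a real record, replace `demo_trades` with the list of closed-trade P&Ls in the order they occurred and raise `n_shuffles`.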

A normal future drawdown is bigger than the one you have seen

Here is that distribution.

Figure: Distribution of Monte-Carlo drawdowns for our NQ book, 50,000 reshuffles of 5,424 trades, dollars per 1 mini contract. The backtest max drawdown of $38,695 (green line) sits at about the 5th percentile, near the low end. The middle 90% of reshuffled drawdowns runs from $39k (p5) to $80k (p95), with a median of $53k. About 95 of every 100 shuffled paths drew down more than the backtest did.

Read it left to right. The green line is our actual backtested drawdown, $38,695. The amber band is the middle 90% of what the strategy can do. The median, the most ordinary outcome, is about $53,000. The p95, the level only 1 in 20 shuffles got worse than, is about $80,000.

Our realized drawdown landed at roughly the 5th percentile of all those paths. Said another way: about 95 of every 100 reshuffled histories drew down more than ours did. Our own earlier risk work flagged the same realized drawdown as low as the 2nd percentile. Either way, the verdict is the same. We did not get a typical run. We got a smooth one.
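Placing a realized drawdown inside a set of reshuffled ones is a one-line rank calculation. A small sketch, assuming you already have the simulated drawdowns in a list; `percentile_rank` is a hypothetical helper name, and the stand-in numbers are chosen only to make the arithmetic easy to follow.

```python
from bisect import bisect_left

def percentile_rank(simulated, realized):
    """Percent of simulated drawdowns strictly smaller than the realized one."""
    ordered = sorted(simulated)
    return 100.0 * bisect_left(ordered, realized) / len(ordered)

# Stand-in distribution (hypothetical): 100 simulated drawdowns, $1k to $100k.
simulated_dds = [1000 * k for k in range(1, 101)]

# A realized drawdown of $6,000 beats only 5 of the 100 simulated paths.
print(percentile_rank(simulated_dds, 6000))
```

A low rank means the realized path was smoother than most reorderings of the same trades.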

The three numbers side by side make the point harder to ignore.

Figure: Bar chart comparing three drawdown figures for the same NQ book, per 1 mini contract: the backtest max of $38,695 (12.72% of peak equity), the Monte-Carlo median of $53,000 (about 37% bigger), and the Monte-Carlo p95 of $80,000 (about twice the backtest). The tear-sheet number is the smallest of the three. Plan around the middle one, survive the right one.

The median drawdown is about 37% bigger than the backtest. The p95 is about twice the backtest. Nothing changed about the strategy. We just stopped pretending the one history we saw was the only one possible.
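The two ratios are simple to check. This uses the rounded headline figures ($53,000 and $80,000), so the results are approximate.

```python
backtest_dd = 38_695   # realized max drawdown from the tear sheet
median_dd = 53_000     # Monte-Carlo median, rounded
p95_dd = 80_000        # Monte-Carlo p95, rounded

median_excess = median_dd / backtest_dd - 1   # how much bigger the median is
p95_multiple = p95_dd / backtest_dd           # p95 as a multiple of the backtest

print(f"median is {median_excess:.0%} bigger than the backtest")
print(f"p95 is {p95_multiple:.2f}x the backtest")
```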

The takeaway

A backtest max drawdown is the best case dressed up as the worst case. Size your account so that the drawdown you have NOT seen yet, roughly the Monte-Carlo p95, is survivable. For our book that is about $80,000 per mini, not $38,695.

Why this happens, and why the edge is still real

Two questions usually come up here, so let us answer both straight.

First: does a worse-than-backtest drawdown mean the edge is broken? No. We ran a second test on the same trades, a bootstrap. Instead of just reshuffling the trades, it draws them at random with repeats allowed, so each run is a different mix of trades and not only a different order. Total profit stayed strongly positive: even the 5th-percentile run, the level 95% of runs beat, still made over $725,000. So the edge holds up. What is fragile is the smoothness of any single path, not the profit itself. A real edge and a deep drawdown live together comfortably.
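A bootstrap of total profit can be sketched like this. It is a plain resample-with-replacement under our own assumptions (synthetic trades, 2,000 runs, a fixed seed, hypothetical helper names), not a reproduction of the test behind the $725,000 figure.

```python
import random

def bootstrap_totals(pnls, n_runs=2000, seed=7):
    """Total profit of each resample drawn with replacement from the trades."""
    rng = random.Random(seed)
    trades = list(pnls)
    n = len(trades)
    # random.choices samples with replacement, so a trade can repeat or drop out.
    return [sum(rng.choices(trades, k=n)) for _ in range(n_runs)]

def percentile(values, q):
    """Nearest-rank percentile for an integer q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, int(-(-q * len(ordered) // 100)))  # ceil(q * n / 100)
    return ordered[rank - 1]

# Synthetic per-trade P&L (NOT the article's data): 300 made-up trades.
demo_rng = random.Random(0)
demo_trades = [demo_rng.gauss(50, 400) for _ in range(300)]

totals = bootstrap_totals(demo_trades)
print("actual total:", round(sum(demo_trades)))
print("bootstrap p5 / median:",
      round(percentile(totals, 5)), round(percentile(totals, 50)))
```

If the 5th-percentile total is still comfortably positive, the profit does not hinge on a handful of trades.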

Second: which number do I trust, $53k or $80k? Use both, for different jobs. The median ($53k) is what to expect over a long enough live run. A drawdown that size is the strategy working as designed, not a reason to quit. The p95 ($80k) is what your account has to survive. If a p95 drawdown would bust you or trip a hard account limit, you are trading too big, full stop. Our own risk rules work the same way. A new all-time drawdown still inside the shuffle's range counts as the edge working. We cut size as a drawdown gets close to the p95, and halt new entries if it blows past it.
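Those rules can be written as a small sizing function. The thresholds here are placeholders we made up for illustration: the article does not say how close "close to the p95" is or how much size gets cut, so `caution_fraction` and `reduced_size` are assumptions to tune yourself.

```python
def size_multiplier(current_dd, p95_dd, caution_fraction=0.8, reduced_size=0.5):
    """Fraction of normal position size to trade at a given drawdown.

    caution_fraction and reduced_size are hypothetical defaults, not
    values taken from the article.
    """
    if current_dd > p95_dd:
        return 0.0            # past the p95: halt new entries
    if current_dd >= caution_fraction * p95_dd:
        return reduced_size   # getting close to the p95: cut size
    return 1.0                # inside the normal range: edge working as designed

p95 = 80_000
for dd in (53_000, 70_000, 85_000):
    print(dd, size_multiplier(dd, p95))
```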

One honest caveat. The reshuffle assumes the order of trades carries no information, that any trade was equally likely to come at any time. Real markets do not work that way. They have streaks, and losers can pile up together in one bad stretch of market. Pure shuffling understates that. The bootstrap pushes the p95 a little higher, to about $83,000. So the true forward drawdown could be a touch worse than even the shuffle says. Treating $80,000 as the ceiling is the optimistic read, not the pessimistic one.

How we measured this

Instrument: CME Nasdaq-100 E-mini (NQ), $100,000 starting capital, no compounding, one mini contract as the base unit. The live book scales to two or three contracts when volatility allows, but these drawdown dollars are stated per single mini, so a micro (MNQ) account is one tenth of every figure here.

Data: the TradingView list-of-trades export from our live six-strategy intraday book, 2011-06-16 through 2026-06-11, 5,424 trades, commissions and slippage already inside the net P&L.

Method: this is a Monte-Carlo on real exported trades, not a new backtest. We keep the exact set of 5,424 wins and losses, shuffle their order at random 50,000 times, and record the worst peak-to-trough equity dip of each ordering. The median and the 95th percentile of those 50,000 drawdowns are the headline numbers. We cross-checked it three ways: two independent shuffle scripts (one with a fixed seed so it reproduces exactly, one using the system random source) and a direct recompute of the realized $38,695 from the export's own cumulative-P&L column. All three agreed.
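The direct recompute is the easiest of the three checks to copy. A sketch, assuming you have both the per-trade P&L and the export's cumulative-P&L column as plain lists; the toy values and function names are ours.

```python
def max_drawdown_from_trades(pnls):
    """Worst peak-to-trough drop, built up from per-trade P&L."""
    equity = peak = worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def max_drawdown_from_cumulative(cum_pnl):
    """Same measure, read straight off a cumulative-P&L column."""
    peak = worst = 0.0   # assumes the curve starts at zero before trade 1
    for equity in cum_pnl:
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

# Toy data (hypothetical): per-trade P&L and the matching cumulative column.
trades = [300, -500, 300, -200, -400, 900]
cumulative = [300, -200, 100, -100, -500, 400]

print(max_drawdown_from_trades(trades), max_drawdown_from_cumulative(cumulative))
```

If the two numbers disagree on a real export, the usual suspects are a missing trade row or a cumulative column that does not start from zero.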

The limits, plainly. A reshuffle inherits TradingView's fills because it uses TV's real trades, but it is still an approximation of the future, not a backtest of a strategy change. It assumes constant one-mini sizing and treats trades as independent of each other. It cannot model a brand-new market regime that produces losses bigger than any in the 15-year sample. What would falsify the "expect worse" claim: a live drawdown that consistently came in below the backtest's $38,695 over many years. We are not counting on it, and neither should you.

What to do with this

Run the shuffle on your own trades. If you have a backtest or a live record with at least a few hundred closed trades, reshuffle the order a few thousand times and look at the spread of drawdowns. The single max-drawdown number on your report is one draw from that spread. It is more likely to be lucky than not, because a clean run is what makes a backtest look good enough to trade.

Then size off the drawdown you have not seen. Take your strategy's Monte-Carlo p95 drawdown, not its backtested max, and make sure your account survives it with room to spare. Scale it to your own size: our $80k p95 is per one mini, so a micro (MNQ) account is a tenth of that, and running two or three minis roughly doubles or triples the dollar figure. For a prop account with a hard trailing limit, the loss line that follows your equity up, this is the difference between passing and blowing up. We worked that exact math for funded combines in our article "prop firm trailing drawdown", where the same reshuffle decides survival instead of position size. If you also want to know whether the edge behind the drawdown is real or curve-fit, that is covered in "is my backtest overfit".
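The scaling is linear, so it fits in a few lines. A sketch under simple assumptions: one micro counts as a tenth of a mini, drawdown scales in proportion to contract count, and `account_limit` stands for whatever loss your account can absorb (a hypothetical parameter, not a figure from this article).

```python
def scaled_drawdown(p95_per_mini, minis=0, micros=0):
    """Scale a per-mini drawdown to a position; one micro (MNQ) = 1/10 mini."""
    tenths_of_a_mini = 10 * minis + micros
    return p95_per_mini * tenths_of_a_mini / 10

def survives(account_limit, p95_per_mini, minis=0, micros=0):
    """True if the scaled p95 drawdown fits inside the account's loss limit."""
    return scaled_drawdown(p95_per_mini, minis, micros) <= account_limit

P95_PER_MINI = 80_000   # the article's p95, per one mini contract

print(scaled_drawdown(P95_PER_MINI, micros=1))   # one micro
print(scaled_drawdown(P95_PER_MINI, minis=3))    # three minis
print(survives(50_000, P95_PER_MINI, minis=1))   # $50k limit, one mini
```

This checks bare survival only. "Room to spare" means setting the limit you pass in below what the account can actually lose.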

We publish our drawdowns the same way we publish our profit, including the parts that flatter us less. The full set of live numbers is on the strategy page and the tear sheet, and our subscribers get the signals from the same six systems measured here; plans are on the pricing page.

We trade this book live and sell access to the signals, so judge the data accordingly. This article is educational and is not investment advice. Futures trading involves substantial risk of loss and is not suitable for every investor.

Hypothetical performance disclaimer (CFTC Rule 4.41): hypothetical or simulated performance results have certain limitations. Unlike an actual performance record, simulated results do not represent actual trading. Also, since the trades have not been executed, the results may have under- or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity. Simulated trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any account will or is likely to achieve profit or losses similar to those shown. Past performance does not indicate future results.