A backtest that produces a clean, rising equity curve is not evidence that a strategy works. It is evidence that the strategy worked on one specific historical price series — which you already had in front of you when you designed the setup. There are four ways grid bot backtests systematically flatter the strategy, and each one has a specific antidote.
1. Period selection bias
The most common problem is also the simplest. You pull 30 or 60 days of OHLCV data, run the backtest, see a strong result, and treat it as validation. What you have actually done is found a period where the market happened to cooperate — price stayed within a range, oscillated regularly, and didn't trend hard enough to break out of your boundaries.
This is not bad luck or deliberate cheating. It is a natural consequence of how humans select data. We tend to pull recent data, and recent data is whatever the market was doing when we decided to start testing. If the market has been ranging for the past two months, a backtest on those two months will look excellent. When you deploy live and the market enters a trend — as it eventually will — the result will be nothing like the backtest.
The antidote is to test across multiple periods that you did not select because they looked good. At minimum, run the same configuration against a 30-day ranging period, a 30-day trending period, and a 30-day high-volatility period. If the strategy only survives one of those three, it is not robust — it is period-specific.
Minimum backtest portfolio for a meaningful validation:

- Period A: the most recent 30–60 days (current regime)
- Period B: a month where the asset trended >20% in one direction
- Period C: a month where realised vol exceeded 80% annualised
- Period D: a quiet month where price moved <10% total

If the strategy is profitable across A, B, C, and D — or at least survives B and C without liquidating — that is a more honest signal than a single clean-looking period.
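Screening candidate months against these criteria can be automated rather than eyeballed. A minimal sketch follows; the function name and the exact thresholds (20% net move, 80% annualised vol, 10% total range) simply mirror the portfolio above and are not part of the simulator.

```python
import math

def classify_period(closes, periods_per_year=365):
    """Classify a daily close series as trending / high-vol / quiet / ranging.

    Illustrative thresholds, matching the portfolio above:
    >20% net move = trending, >80% annualised vol = high-vol,
    <10% total range = quiet, anything else = ranging.
    """
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    ann_vol = math.sqrt(var * periods_per_year)           # annualised realised vol
    net_move = abs(closes[-1] / closes[0] - 1)            # net directional move
    total_range = (max(closes) - min(closes)) / closes[0]

    if net_move > 0.20:
        return "trending"   # Period B candidate
    if ann_vol > 0.80:
        return "high-vol"   # Period C candidate
    if total_range < 0.10:
        return "quiet"      # Period D candidate
    return "ranging"        # Period A / default
```

Run it over a sliding 30-day window of your full history and keep one month from each bucket, so the test periods are picked by rule rather than by how good they make the equity curve look.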
2. Range look-ahead bias
Look-ahead bias in grid bots is subtler than in other strategies. You are not peeking at future prices to decide when to enter — but you may be using the historical data to set the range, which is just as distorting.
If you pull 60 days of BTC data, notice that price oscillated between $92,000 and $108,000, and then set your grid range to $91,000–$109,000, you have built a range that is perfectly sized for the period you are testing. The backtest will show few or no breakouts because the range was calibrated to fit the data. Live, you will not have that luxury — you will set the range before knowing what the next 60 days look like.
The test for look-ahead bias is simple: could you have arrived at this range configuration without looking at the period you are testing? If the answer is no, the backtest is contaminated. Use the volatility-based range sizing method — deriving the range from a prior period's realised vol — so the range is set from data that predates the test window.
Clean range-setting process for backtesting:

- Test period: 1 Feb – 28 Feb
- Range source: 30d realised vol from 1 Jan – 31 Jan (the prior period)
- Calculation: ±1.5σ from the entry price as at 1 Feb

This ensures the range was derivable without any knowledge of what happened during the test period.
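The calculation step can be sketched as follows. The function name, the ±1.5σ multiplier, and the lognormal scaling of daily vol to the test horizon are illustrative assumptions; the essential property is that only prior-period closes go in.

```python
import math

def grid_range_from_prior_vol(prior_closes, entry_price,
                              horizon_days=30, k=1.5):
    """Derive a grid range from a *prior* period's realised vol, so the
    range is set without any knowledge of the test window.

    Sketch only: k=1.5 gives the +/-1.5 sigma band described above.
    """
    rets = [math.log(b / a) for a, b in zip(prior_closes, prior_closes[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    daily_vol = math.sqrt(var)
    sigma = daily_vol * math.sqrt(horizon_days)   # vol scaled to the horizon
    lower = entry_price * math.exp(-k * sigma)    # symmetric in log space
    upper = entry_price * math.exp(+k * sigma)
    return lower, upper
```

For the February example, `prior_closes` would be the 1 Jan – 31 Jan daily closes and `entry_price` the price at the open on 1 Feb.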
3. Candle size overstates fill frequency
The simulator's backtest engine uses a high-low sweep model: for each candle, it checks whether the candle's high crossed any ask orders (filling them top-down) and whether the low crossed any bid orders (filling them bottom-up). This is conservative and correct for small candles. For large candles — 4-hour or daily — it becomes a significant source of overstated fills.
Consider a 4-hour candle with a low of $97,000 and a high of $103,000, on a grid entered at $100,000 with levels at $98,000, $99,000, $101,000, and $102,000. The high-low sweep model fills all four orders — two buys and two sells — and, because each fill's replacement order also sits inside the candle's range, counts four completed round trips. In reality, price may have moved from $100,000 to $103,000 to $97,000 in a single sweep, completing only two round trips before the candle closed. The sweep model cannot distinguish these cases.
The result is that backtests on 4-hour or daily candles will show more round trips, and therefore more income, than actually occurred. The effect compounds over a long backtest period. On 1-hour candles the distortion is smaller; on 15-minute candles it is small enough to be negligible for most grid spacings.
| Candle interval | Fill overcount risk | Use for |
|---|---|---|
| 1 – 5 minutes | Negligible | Short-duration accuracy checks (7–30 days) |
| 15 – 30 minutes | Low | Standard backtests up to 90 days |
| 1 hour | Moderate on tight grids | Multi-month backtests — adequate for most setups |
| 4 hours | High on spacing < 1% | Long-period overviews only — not for income estimates |
| Daily | Very high | Structural breakout analysis only — not fill counting |
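The distortion in the table can be demonstrated directly. The sketch below contrasts the high-low sweep count with a replay of an explicit intra-candle path; it assumes the engine treats each filled order whose replacement price also lies inside the candle as a completed round trip, which is what produces the four-round-trip count in the example above. Both functions are illustrative, not the simulator's engine.

```python
def sweep_round_trips(low, high, bids, asks):
    """High-low sweep model (sketch): every resting order inside the
    candle's range fills, and each fill's replacement order, one grid
    step away and also inside the range, is assumed to fill too, so
    each original order counts as one completed round trip."""
    return len([p for p in bids + asks if low <= p <= high])


def path_round_trips(path, bids, asks, step):
    """Replay an explicit intra-candle path instead. Limit buys fill on
    downward moves and limit sells on upward moves; a round trip
    completes only when a fill's replacement order is itself filled."""
    orders = {p: ("buy", False) for p in bids}
    orders.update({p: ("sell", False) for p in asks})
    trips, pos = 0, path[0]
    for nxt in path[1:]:
        rising = nxt > pos
        side = "sell" if rising else "buy"
        lo, hi = min(pos, nxt), max(pos, nxt)
        # orders of the fillable side, in the order price traverses them
        hit = sorted((p for p, (s, _) in orders.items()
                      if s == side and lo <= p <= hi), reverse=not rising)
        for p in hit:
            _, is_replacement = orders.pop(p)
            if is_replacement:
                trips += 1  # second leg of a pair: round trip complete
            # place the opposite order one grid step away
            q = p - step if side == "sell" else p + step
            orders[q] = ("buy" if side == "sell" else "sell", True)
        pos = nxt
    return trips
```

On the example candle, the sweep model counts four round trips while the explicit 100k → 103k → 97k path completes fewer, which is exactly the overcount the table warns about on large candles.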
4. The single-path problem
A backtest produces one result: what happened on one specific sequence of prices. That sequence is one path out of the enormous number of paths that could have plausibly occurred given the same starting conditions. A strategy that performed well on that specific path may have performed poorly on most other plausible paths with similar statistical properties.
This is not a flaw in backtesting — it is its inherent nature. A backtest tells you what happened. It cannot tell you what was likely to happen. A Monte Carlo simulation, calibrated to the same period's realised volatility, shows the distribution of outcomes across many paths with similar statistical properties. The two tools together give a more complete picture than either alone.
The practical implication: if a backtest shows a strong result but the Monte Carlo on the same configuration shows a median outcome that is flat or negative, the backtest result was probably a lucky path — not a signal of genuine edge. If both show strong results, the configuration is more robustly profitable. If the backtest shows a loss but Monte Carlo shows a positive median, the historical period was unusually hostile and the strategy may still have merit.
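A minimal version of that cross-check can be sketched as follows, assuming GBM paths calibrated to the period's realised vol and a toy crossing-count as the outcome metric (neither is the simulator's actual Monte Carlo engine; a level re-crossed twice is counted as roughly one round trip, and a breakout ends the run).

```python
import math
import random

def gbm_paths(s0, daily_vol, days, n_paths, drift=0.0, seed=7):
    """Generate GBM price paths calibrated to a period's realised vol."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        px, path = s0, [s0]
        for _ in range(days):
            px *= math.exp(drift - 0.5 * daily_vol ** 2
                           + daily_vol * rng.gauss(0, 1))
            path.append(px)
        paths.append(path)
    return paths

def grid_trips_on_path(path, lower, upper, n_levels):
    """Toy outcome metric: count grid-level crossings until the first
    breakout; roughly two crossings make one completed round trip."""
    step = (upper - lower) / (n_levels + 1)
    levels = [lower + step * i for i in range(1, n_levels + 1)]
    crossings = 0
    for a, b in zip(path, path[1:]):
        if b < lower or b > upper:
            break                     # breakout ends the run
        crossings += sum(1 for lv in levels
                         if min(a, b) <= lv <= max(a, b))
    return crossings // 2

# Distribution of outcomes across 2,000 paths with the same config
paths = gbm_paths(s0=100_000, daily_vol=0.03, days=30, n_paths=2_000)
trips = sorted(grid_trips_on_path(p, 92_000, 108_000, 8) for p in paths)
p10, p50, p90 = (trips[int(len(trips) * q)] for q in (0.10, 0.50, 0.90))
print(f"round trips  P10={p10}  P50={p50}  P90={p90}")
```

The single backtest result is one draw from a distribution like this; comparing it against the P50 (and P10) tells you whether the historical path was typical, lucky, or unusually hostile.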
A checklist before trusting a backtest
Before treating a backtest result as evidence:
☐ Tested across at least 3 distinct market periods (ranging, trending, high-vol)
☐ Range was set from data prior to the test period (not fitted to the test data)
☐ Used candles of 1 hour or smaller
☐ Compared against Monte Carlo on the same configuration — backtest result is consistent with the P50 outcome
☐ Checked whether the strategy would have survived the worst period in the dataset without liquidating

If any box is unticked, the backtest result carries less weight than it appears to.
Upload the same configuration against two different 30-day OHLCV files — one from a ranging period and one from a trending period. Then run Monte Carlo on the same setup with vol calibrated to each period. The contrast between the four results is the most honest picture of what the strategy actually does.
Launch the simulator →