Trading Bot Backtesting: How to Tell a Real Backtest From Curve-Fit Nonsense
Most "backtests" you see online are misleading. A guide to spotting look-ahead bias, survivorship bias, and overfitting, and what an honest walk-forward test actually looks like.

Most backtests you see posted online are bullshit. That is the flat truth and I am not going to soften it. Let me address this directly, because it matters: a clean-looking equity curve is evidence of almost nothing. The skill is not in producing a nice chart. The skill is in knowing which five ways the chart is lying to you.
I have been running backtests since 2017 and I have watched hundreds of strategies die the moment they move from historical data to live capital. Every single one of them had a beautiful chart on the way in. That pattern is not a coincidence. It is the entire industry's favorite mistake, and most "bot marketplaces" depend on you not noticing it.
The five ways a backtest lies to you
Before I show you what an honest backtest looks like, learn the failure modes. If you can spot these, you can reject about 95% of the "profitable strategies" you will ever be shown.
1. Look-ahead bias
The strategy uses information that would not have been available at the time of the trade. Most common flavor: computing an indicator on a bar's close, then placing the trade on that same bar's open. You cannot do that live. The open came first. The close came later. If your backtest reverses that order, you are trading on information from the future.
Look-ahead bias is embarrassingly easy to introduce and surprisingly hard to spot in someone else's code. If a strategy looks too clean, assume look-ahead first.
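The bug is easier to see in code than in prose. A minimal sketch in pandas, on made-up bars: the buggy signal compares a bar's close against an indicator computed on that same bar, while the fix shifts the signal one bar so the fill only ever uses past data.

```python
# The classic look-ahead bug, sketched in pandas on made-up bars.
import pandas as pd

bars = pd.DataFrame({
    "open":  [100.0, 101.0, 99.0, 102.0],
    "close": [101.0, 99.0, 102.0, 101.0],
})
sma = bars["close"].rolling(2).mean()

# WRONG: the decision on bar i uses bar i's close, but pretends to fill
# at bar i's open. The open printed before the close existed.
signal_bad = bars["close"] > sma

# RIGHT: shift the signal one bar, so the fill on bar i only uses data
# available through bar i-1.
signal_good = signal_bad.shift(1, fill_value=False)
```

That one `shift(1)` is the difference between a tradeable signal and a time machine, and it is exactly the line that tends to go missing in other people's code.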
2. Survivorship bias
The backtest runs on assets that exist today. The assets that went to zero or got delisted are silently excluded. For crypto, this is catastrophic: the dead-coin graveyard is enormous. For stocks, it is also real; most long-only backtests over long periods implicitly exclude bankrupt companies.
If a "crypto bot" backtest runs on "top 20 coins by market cap today", that list looked completely different five years ago. The backtest is measuring the performance of the winners you already know about.
3. Period cherry-picking
The backtest runs on a period where the strategy happens to work. A carefully chosen two-year window on a long-biased crypto bot looks great. Add the 2022 bear in the middle and the curve changes shape. Run it on 2018 to 2019 and the edge often disappears entirely.
Any backtest that does not show at least one full bull-bear cycle is not a backtest. It is a screenshot of a convenient window.
4. Parameter optimization without out-of-sample validation
This is the big one. The trader runs thousands of parameter combinations and picks the one with the best historical P&L. Of course that set looks great: it was selected because it looks great. Out-of-sample, the same parameters almost always underperform, because what was actually being optimized was the noise in the data, not the signal.
If somebody shows you a backtest without telling you whether the parameters were chosen on one period and then tested untouched on another, assume the parameters were curve-fit. They usually were.
5. Ignoring fees and slippage
The backtest uses mid-market prices, zero fees, infinite liquidity, and fills at the exact tick the signal fires. Live trading has spread, exchange fees, slippage on larger orders, partial fills, and the occasional failed order. A strategy that looks profitable at zero cost can easily become a net loser at realistic costs.
Rule of thumb: add at least 0.1% round-trip cost for liquid crypto spot, more for futures, much more for anything small-cap. If the strategy does not survive that haircut, it does not survive.
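As a sketch of that haircut, here is the 0.1% round-trip rule applied to a handful of invented per-trade returns; the trade figures are illustrative, not from any real strategy.

```python
# The 0.1% round-trip haircut from the rule of thumb above, applied to
# invented per-trade gross returns (fractions of notional, not real data).
ROUND_TRIP_COST = 0.001  # 0.1% per round trip; use more for futures/small caps

gross_returns = [0.003, -0.002, 0.0015, 0.002, -0.001]

net_returns = [r - ROUND_TRIP_COST for r in gross_returns]

gross_total = sum(gross_returns)  # positive before costs
net_total = sum(net_returns)      # negative after costs: does not survive
```

Five profitable-looking trades gross, a net loser after the haircut. High-turnover strategies feel this compounding on every single round trip.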
What an honest backtest actually looks like
Now the positive version. Here is what I expect to see in a backtest before I take it seriously, whether it is mine, yours, or someone else's.
- Multiple assets, same parameters. If the strategy only works on one coin at one timeframe, it is a fluke. Run it on at least ten assets with identical settings. Compare the distribution of results.
- Train/test split. Pick parameters on the first 70% of the data. Test untouched on the last 30%. The test-period performance is what matters. The train-period performance is flattery.
- Walk-forward analysis. Slide the train/test window forward in time. Re-optimize on each window's training section, evaluate on each window's test section, then concatenate all the test results. This is how real quant teams do it.
- Regime separation. Report results separately for bull markets, bear markets, and sideways regimes. A strategy that only makes money in bull markets is not a strategy. It is a bull market with extra steps.
- Transaction costs included. Exchange fees, spread, realistic slippage. Every trade pays the cost. If the costs are not in the report, the report is not real.
- Slippage modeling. For anything beyond trivial size, assume you do not get the exact price you saw. Model it pessimistically, at least half the spread, often a full spread or more for market orders on illiquid pairs.
- Drawdown, not just return. A strategy with 100% annual returns and 80% drawdown is mathematically the same as a strategy that blows you out. Always report max drawdown, Calmar ratio, and time spent underwater.
- Trade count. Fifteen trades is not a backtest. It is an anecdote. Statistical significance starts somewhere around a hundred trades; confidence really builds past a few hundred.
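The drawdown item can be made concrete with a toy equity curve; the figures are invented and `max_drawdown` is a generic helper, not any particular library's API.

```python
# Toy equity curve illustrating max drawdown and a Calmar-style ratio.
# Figures are invented; max_drawdown is a generic helper, not a library call.
def max_drawdown(equity):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for x in equity:
        peak = max(peak, x)
        worst = max(worst, (peak - x) / peak)
    return worst

equity = [100.0, 120.0, 90.0, 130.0, 110.0, 140.0]
mdd = max_drawdown(equity)                    # 0.25, from the 120 -> 90 leg
period_return = equity[-1] / equity[0] - 1.0  # 0.40 over the toy period
calmar_like = period_return / mdd             # return per unit of max drawdown
```

Two strategies with identical returns can have wildly different ratios here, and the lower-drawdown one is the one a human can actually hold through the bad stretch.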
If those eight boxes are not all checked, the backtest is entertainment, not evidence.
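The train/test and walk-forward items boil down to one discipline: parameters never see the data they are judged on. A minimal skeleton, where `optimize` and `evaluate` are hypothetical stand-ins for your own parameter search and backtest:

```python
# Minimal walk-forward skeleton. optimize and evaluate are hypothetical
# stand-ins for your own parameter search and backtest; the one rule that
# matters is that parameters see ONLY the train slice.
def walk_forward(data, train_len, test_len, optimize, evaluate):
    """Concatenate out-of-sample results from rolling train/test windows."""
    oos_results = []
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start : start + train_len]
        test = data[start + train_len : start + train_len + test_len]
        params = optimize(train)                    # in-sample only
        oos_results.append(evaluate(test, params))  # true out-of-sample
        start += test_len                           # slide by one test window
    return oos_results
```

The concatenated `oos_results` is the only curve worth showing anyone; the in-sample fits are flattery.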
Red flag vs credible, a side-by-side
| Signal | Red flag backtest | Credible backtest |
|---|---|---|
| Win rate | 87% | 45 to 60%, realistically |
| Assets tested | 1 | 10+ with identical parameters |
| Period shown | "Last 2 years" | Full cycle including a bear phase |
| Parameter selection | "Optimized for the period" | Train/test split or walk-forward |
| Fees and slippage | Not mentioned | Explicit, per-trade |
| Drawdown | Rarely shown | Reported alongside returns |
| Trade count | 15 | 300+ |
| Regime breakdown | None | Separate curves for bull, bear, sideways |
| Author | Anonymous or guru brand | Named, track record, public code or open methodology |
The left column is what most "I made 800% last year with my bot" posts look like. The right column is what a backtest looks like when somebody is actually trying to know the truth rather than sell you something.
An example of the lie
Here is a story you have seen a hundred times. "My bot has a 2-year track record. 87% win rate. 300% total return. Works on BTC and ETH. Join my Telegram for the full strategy."
Pull that apart. Two years: which two? 2023 to 2025 is easy money for long-biased strategies; try 2021 to 2023, which includes the LUNA collapse and FTX. 87% win rate: is the average loser ten times the size of the average winner? That combination is what a martingale looks like right before it blows up. 300% return: on what drawdown? If the curve went 60% underwater at any point, most humans would have turned it off and missed the recovery. Works on BTC and ETH: does it work on SOL, ADA, LINK, AVAX with the same parameters, or only on the two assets cherry-picked because they performed?
Almost always, if you press on those questions, the answers either stop coming or collapse the pitch.
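The win-rate question is simple arithmetic. With illustrative payoff sizes (assumptions, not data from any real bot), an 87% win rate where the average loser is ten times the average winner is a losing bet:

```python
# Expected value per trade for the pitch above. Payoff sizes are
# illustrative assumptions, not measurements from any real bot.
win_rate = 0.87
avg_win = 1.0    # one unit of risk on the average winner
avg_loss = 10.0  # ten units on the average loser: the martingale signature

ev_per_trade = win_rate * avg_win - (1 - win_rate) * avg_loss
# Negative: the 87% win rate is hiding a losing bet per trade.
```

You win 87 small bets, then the 13th loser hands it all back and more. That is why win rate alone is a marketing number, not a performance number.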
"If the system only works when you fine-tune it endlessly, it's not a system. It's a liability."
That is the engineering version of the same point. A strategy that requires different parameters per asset, per timeframe, per month, is not a strategy. It is a researcher chasing noise.
How we backtest, concretely
I am not going to pretend our process is secret. Here is how we actually test strategies before they go into production on our own capital.
- Write the rule unambiguously. If it cannot be specified in code, it cannot be backtested. Discretionary "feel" is not a rule.
- Run the rule on at least ten assets simultaneously. For crypto, a representative set of top-cap and mid-cap pairs. For stocks, a cross-section of sectors.
- Run it with identical parameters everywhere. No per-asset tuning. That is the single most important rule.
- Split the data. First window for training, later window for testing. Do not look at the test window during parameter selection. Ever.
- Include realistic costs. For crypto spot, we assume 0.1% round-trip plus 0.02% slippage baseline. For less liquid pairs, more.
- Plot the equity curve per asset, and plot the blended one. Look for assets where the strategy just does not work. If it fails badly on more than 20% of the set, the strategy is fragile.
- Decompose by regime. Tag every period as bull, bear, or sideways using a separate classifier. Report per-regime performance.
- Run walk-forward if the strategy has any meaningful optimization. Re-select parameters on each training window, evaluate on each test window. Look at the variance of the results.
- Paper trade. For at least 30 days. Live data, simulated execution. This catches bugs that the backtester hides.
- Scale in live gradually. Small size first, grow only when behavior matches the backtest expectation.
Steps 1 through 8 kill most strategies. Step 9 kills another chunk. Step 10 is just common sense. By the time something is running on real capital, we usually like it a lot, because it has survived many rounds of filtering.
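Step 7's regime tagging can be sketched crudely; a real classifier is more involved, and the lookback and threshold here are illustrative assumptions, not our production values.

```python
# Deliberately crude sketch of a regime tagger: label each point bull,
# bear, or sideways by the trailing return over a lookback window.
# Lookback and threshold are illustrative assumptions.
def tag_regimes(closes, lookback=4, threshold=0.10):
    """Label each index past the lookback as 'bull', 'bear', or 'sideways'."""
    tags = []
    for i in range(lookback, len(closes)):
        ret = closes[i] / closes[i - lookback] - 1.0
        if ret > threshold:
            tags.append("bull")
        elif ret < -threshold:
            tags.append("bear")
        else:
            tags.append("sideways")
    return tags
```

Once every period carries a tag, reporting per-regime P&L is a groupby, and a bull-only strategy has nowhere to hide.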
Why vyn premium's backtest philosophy is what it is
When we built vyn premium, the backtest philosophy was the point. We wanted a strategy that runs the same Smart Safety Orders parameters on every asset, not a per-coin fine-tune. That constraint forces the logic to be robust. If we had allowed ourselves to optimize differently for each pair, we could have produced a prettier backtest and a fragile live strategy. We chose the opposite.
The execution is mean-reversion-based, volatility-scaled, and time-controlled, and it runs across 3Commas, SignalPipe, Alpaca, and Capital.com without per-venue parameter tweaks. Everything goes through the same brain. That is a deliberate choice, and it is the opposite of how most marketplace strategies are built.
If you are building your own strategy and you want to learn the shape of an honest signal engine before paying for anything, block algo flex is the free tool we built for that. It will not do backtesting for you, but it will force you to think about what your rule actually is before you test it.
The tools
A few tools that actually do backtesting correctly, in rough order of how much work they require.
- TradingView Strategy Tester. Fine for a first look. Limited to what Pine Script can express. Does not natively do walk-forward. Easy to fool yourself.
- Freqtrade. Solid backtesting engine, supports walk-forward via plugins, runs in Python. Requires you to write strategies as code. The right tool if you are serious.
- Backtrader, Zipline, vectorbt. Python libraries with various strengths. vectorbt is very fast for parameter sweeps. Backtrader is more event-driven. Zipline is old but battle-tested.
- Custom pipelines. What most quant teams actually build. For anything truly serious, you end up writing your own because no framework fits your exact requirements.
What you should not do is trust the backtest button on a bot marketplace without understanding what it is doing. Some of them include fees, some do not. Some handle slippage, some do not. Assume the worst and verify.
FAQ
What is a realistic win rate for a good trading bot? Depends on the strategy type. Trend-following: often 30 to 45% with large asymmetry. Mean-reversion: often 55 to 70%. Grid bots: can be 90%+ but with rare catastrophic losers. Any bot advertising a stable 85%+ win rate with symmetric risk-reward is almost certainly hiding something.
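Those ranges follow from breakeven arithmetic: for a reward-to-risk ratio R, the breakeven win rate is 1 / (1 + R). A quick check with illustrative ratios:

```python
# Breakeven win rate for a given reward:risk ratio:
# breakeven = risk / (risk + reward). Ratios below are illustrative.
def breakeven_win_rate(reward, risk=1.0):
    """Win rate at which expected value per trade is exactly zero."""
    return risk / (risk + reward)

# Trend-following with 3:1 winners breaks even near a 25% win rate;
# mean-reversion with 0.8:1 winners needs roughly 56%.
tf = breakeven_win_rate(3.0)
mr = breakeven_win_rate(0.8)
```

Which is why a 40% win rate can be a great trend strategy and a 60% win rate can be a mediocre mean-reversion one: the number means nothing without the payoff asymmetry next to it.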
What is walk-forward analysis in one sentence? Retraining the strategy's parameters on a rolling window of historical data, then testing untouched on the next window, and repeating, so every test result is true out-of-sample.
How long should a backtest cover? At minimum, one full bull-bear cycle. For crypto, that means at least 2017 through today, or whichever sub-period includes both a meaningful rally and a meaningful drawdown. Two years of one-way market is not a backtest.
Should I trust backtests that do not include fees? No. Live trading is not fee-free. Any result without fees is overstated, often by a large multiple for high-turnover strategies.
How do I know if my parameters are overfit? Test them on data you did not use to select them. If the out-of-sample performance is meaningfully worse than the in-sample, you are overfit. If it is roughly the same, you probably have a real edge.
Can I just pay for a "proven" strategy instead of testing myself? You can, but you still need to understand the methodology. Even a legitimately good strategy, like the one behind vyn premium, will underperform if you do not understand its behavior through different regimes. The backtest is not just evidence. It is education.
Why do 99% of perfect backtests fail live? Because they were optimized, consciously or accidentally, on the specific data they were tested on. Noise gets baked in as if it were signal. Live data has fresh noise that the old parameters do not fit. Drawdown immediately, hope next, recovery rarely.
Risk disclaimer
Backtesting is a tool, not a guarantee. Historical performance does not predict future results, even when measured correctly. All trading involves substantial risk of loss. This article is not financial advice.
The honest take
A backtest is a simulation. It is as good as the assumptions baked into it, and most of the assumptions in most public backtests are wrong. The correct mental posture is skeptical by default. Your own backtest is probably lying to you in at least one of the five ways above. Somebody else's backtest is probably lying in more than one.
The only defense is methodology. Multiple assets, same parameters, train/test split, walk-forward, realistic costs, regime decomposition, meaningful trade count. If your own tests do not check those boxes, fix them before you put capital behind the strategy. If somebody else's pitch does not check those boxes, walk away; there are plenty of legitimate options and almost no reason to pay for a pitch that cannot survive basic scrutiny.
The strategies that survive this process are usually less exciting than the marketing copy around them. They have moderate win rates. They have real drawdowns. They make boring, compounding returns across regimes. That boringness is the feature, not the bug. Marketable backtests are usually the ones that die first. Survivable backtests are the ones you can actually run on real money for years. That is the bar, and it is a much higher bar than the industry wants you to believe.
Timo from blockresearch.ai
Founder of Block Research. Running automated trading systems on personal and company capital since 2017, three full crypto cycles of live execution. Author of Smart Safety Orders (volatility-adaptive DCA), the mean-reversion entries inside vyn premium, and the 3-second webhook response invariant inside SignalPipe. We ship the same strategies we run on our own money.