The difference between a betting strategy and a validated betting model is backtesting. Most bettors never test their theories against historical data - and that's why most bettors lose.
New to the API? Start with our Getting Started guide first to set up your account and API key.
What Is Backtesting?
Backtesting is running your prediction logic against historical games where you know the outcome. It's how you answer the question: "If I had bet this way for the last 5 years, would I have made money?"
The key is using opening lines, not closing lines. Opening lines are available hours before game time - that's what you can actually bet. Closing lines incorporate all the market's information right before tip-off and are nearly impossible to beat consistently. Testing against closing lines would give you unrealistically optimistic results.
The goal of backtesting is simple: measure whether your edge is real before risking money.
Why Most "Systems" Fail
Every sports bettor has heard a friend claim they have a "system." Here's why most of these systems fall apart under scrutiny:
1. Recency bias - "The Lakers are hot, bet the Lakers." Hot streaks feel predictive but are often just noise. A team that won 5 in a row is no more likely to win game 6 than their season-long win rate suggests.
2. Small sample sizes - A 5-game winning streak isn't statistically significant. Even a 20-game sample tells you almost nothing. You need hundreds of games to separate signal from noise.
3. Confirmation bias - We remember our wins vividly and forget our losses quickly. That "system" that feels like it's working might actually be underwater.
4. No baseline - Without testing, you don't know if you're beating random chance. Betting favorites at -110 wins roughly 52% of games but loses money due to the vig. Is your system actually better?
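The break-even math behind that last point is easy to verify yourself. At standard -110 odds you risk $110 to win $100, so the break-even win rate is 110/210, just over 52%. A quick sketch:

```python
# Break-even win rate at American odds of -110:
# you risk 110 to win 100, so you need p such that
# p * 100 - (1 - p) * 110 = 0  ->  p = 110 / 210
risk, payout = 110, 100
breakeven = risk / (risk + payout)
print(f"Break-even win rate at -110: {breakeven:.1%}")  # 52.4%

# A bettor winning 52% of the time sits just below break-even
p = 0.52
ev_per_dollar = p * (payout / risk) - (1 - p)
print(f"EV per $1 risked at 52%: {ev_per_dollar:+.3f}")  # slightly negative
```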
Without backtesting, you don't know if your edge is real or if you got lucky.
What Makes a Backtest Meaningful?
Not all backtests are created equal. Here's what separates a meaningful backtest from a misleading one:
1. Sample size - 50 games vs 500 games vs 5000 games matters enormously. With 50 games, a 60% win rate could easily be luck. With 5000 games, even a 52.5% win rate is statistically meaningful. More data equals more confidence in your results.
2. Time range - Testing across multiple seasons captures different market conditions. A model that only works in one specific season probably found a pattern that no longer exists.
3. Realistic lines - Always use opening lines, not closing lines. Opening lines are what you can actually bet - they're available hours before game time. Testing against closing lines will overstate your edge because you'd never get those numbers in practice.
4. Confidence filtering - Your model's conviction matters as much as its accuracy. A model that's 55% accurate overall might be 65% accurate on its highest-confidence plays. If you only bet when the model is confident, you might cut volume but significantly improve ROI.
This last point is crucial: not all predictions are equal. Backtesting lets you discover which signals are worth following and which to ignore.
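The sample-size point can be made concrete with a normal approximation: how many standard errors is an observed win rate above a 50% coin flip? This is a back-of-the-envelope sketch, not a substitute for a proper statistical test:

```python
import math

def z_vs_coinflip(win_rate, n):
    """Approximate z-score of an observed win rate against a 50%
    baseline, using the normal approximation to the binomial."""
    se = math.sqrt(0.25 / n)  # standard error of a fair-coin win rate
    return (win_rate - 0.5) / se

# 60% over 50 games: well within the range of luck
print(f"{z_vs_coinflip(0.60, 50):.2f}")    # 1.41 standard errors

# 52.5% over 5000 games: far less likely to be noise
print(f"{z_vs_coinflip(0.525, 5000):.2f}") # 3.54 standard errors
```

Note that beating 50% is only half the battle; a meaningful edge also has to clear the 52.4% break-even rate at -110.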
Case Study: Confidence Matters More Than Win Rate
Here's a realistic example of how confidence bucketing changes everything.
A model backtested over several seasons shows:
- Overall: 54.2% win rate, +3.8% ROI
- High confidence (67-100%): 58% win rate, +11% ROI
- Medium confidence (34-66%): 53% win rate, +2% ROI
- Low confidence (0-33%): 51% win rate, -2% ROI
If you bet every game the model recommends, you make a modest profit. But if you only bet when the model shows high confidence, you triple your ROI despite betting fewer games.
This insight - that selective betting beats volume betting - is exactly the kind of discovery you can only make through backtesting.
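You can sanity-check the bucket numbers above yourself. At flat -110 stakes, ROI is approximately the win rate times the payout ratio minus the loss rate. This sketch ignores pushes and line variation, which is why it lands close to, but not exactly on, the table's figures:

```python
def roi_at_minus_110(win_rate):
    """Approximate ROI per unit staked at -110 odds, ignoring pushes."""
    return win_rate * (100 / 110) - (1 - win_rate)

for label, p in [("overall", 0.542), ("high conf", 0.58), ("low conf", 0.51)]:
    print(f"{label}: {roi_at_minus_110(p):+.1%}")
# overall:   +3.5%
# high conf: +10.7%
# low conf:  -2.6%
```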
Building Your First Model with Lab
BALLDONTLIE Lab lets you build, backtest, and deploy betting models for NBA, NFL, NHL, and MLB. You can use the web UI for a visual experience, the API for programmatic control, or give the OpenAPI spec to an AI agent to build models for you.
Option 1: Use the Web UI
Lab's web interface walks you through five steps:
- Create a model - Select your sport, bet type (spread, moneyline, over/under), and assign importance to factors
- Run a preview - Test your configuration before saving to see historical performance
- Save and backtest - Get full 6-year historical analysis with per-game breakdowns
- Generate predictions - Apply your model to upcoming games
- Track results - Monitor wins/losses and refine your approach
For a detailed walkthrough with screenshots, see the Lab documentation.
Option 2: Use the API
The Lab API gives you programmatic control over model creation and backtesting. Here's a quick example:
```python
import os
import time

import requests

API_KEY = os.environ.get("BALLDONTLIE_API_KEY", "your-api-key")
BASE_URL = "https://api.balldontlie.io"
headers = {"Authorization": API_KEY}

# List available factors
response = requests.get(
    f"{BASE_URL}/lab/v1/factors",
    headers=headers,
    params={"sport": "nba"},
)
response.raise_for_status()
factors = response.json()["data"]
print(f"Found {len(factors)} factors")

# Create an NBA spread model
model_data = {
    "name": "NBA Backtest Example",
    "sport": "nba",
    "bet_type": "spread",
    "mode": "simple",
    "factors": [
        {"factor_id": factors[0]["id"], "importance": "high"},
        {"factor_id": factors[1]["id"], "importance": "high"},
        {"factor_id": factors[2]["id"], "importance": "medium"},
    ],
}
response = requests.post(
    f"{BASE_URL}/lab/v1/models",
    headers=headers,
    json=model_data,
)
response.raise_for_status()
model = response.json()["data"]
model_id = model["id"]
print(f"Created model: {model['name']} (ID: {model_id})")

# Trigger backtest evaluation
response = requests.post(
    f"{BASE_URL}/lab/v1/models/{model_id}/performance",
    headers=headers,
)
response.raise_for_status()
job = response.json()["data"]
job_id = job["id"]
print(f"Started backtest job: {job_id}")

# Poll until the job completes
while True:
    response = requests.get(
        f"{BASE_URL}/lab/v1/jobs/{job_id}",
        headers=headers,
    )
    response.raise_for_status()
    job = response.json()["data"]
    if job["status"] == "completed":
        break
    elif job["status"] == "failed":
        raise RuntimeError(f"Job failed: {job.get('error')}")
    print(f"Status: {job['status']}...")
    time.sleep(2)

# Get results
response = requests.get(
    f"{BASE_URL}/lab/v1/models/{model_id}/performance",
    headers=headers,
)
response.raise_for_status()
perf = response.json()["data"]
print("\nBacktest Results:")
print(f"Record: {perf['wins']}-{perf['losses']}-{perf['pushes']}")
print(f"Win Rate: {perf['win_rate']:.1%}")
print(f"ROI: {perf['roi']:+.1%}")

# Clean up: delete the test model
requests.delete(f"{BASE_URL}/lab/v1/models/{model_id}", headers=headers)
```
See the full API documentation for all available endpoints.
Option 3: Let an AI Agent Build It
Lab exposes a complete OpenAPI specification. Give this spec to an AI agent (Claude, GPT, etc.) along with a goal like "build me the most profitable NBA spread model" and let it iterate:
- Agent creates a model with initial factor configuration
- Agent runs a backtest and analyzes results
- Agent adjusts factor weights based on performance
- Repeat until the model meets your criteria
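Under the hood this loop is just "propose, backtest, compare". Here is a minimal sketch of the comparison step, with hypothetical factor configurations and ROI values standing in for what an agent would collect by calling the same create/backtest endpoints shown in the API example above:

```python
def pick_best(results):
    """Given (config_description, roi) pairs from backtest runs,
    return the best-performing configuration."""
    return max(results, key=lambda r: r[1])

# Hypothetical backtest results an agent might have collected
results = [
    ("pace + rest, high/high", 0.021),
    ("pace + rest + travel, high/high/medium", 0.038),
    ("travel only, high", -0.004),
]
best = pick_best(results)
print(f"Best config: {best[0]} (ROI {best[1]:+.1%})")
```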
This approach is covered in more detail in our AI-assisted development guide.
Common Backtesting Mistakes
Even with the right tools, backtesting can go wrong:
1. Overfitting - If you keep adding factors until your backtest looks perfect, you've probably found patterns that won't repeat. A model with 20 carefully tuned factors that achieves 65% in backtesting will likely underperform a simpler 3-factor model in live betting.
2. Ignoring pushes - A push isn't a loss - your stake is returned - but pushed bets still tie up bankroll without generating a return, and lumping them in with wins or losses skews your win rate. Track them separately.
3. Forgetting the vig - Standard -110 juice means you need 52.4% to break even, not 50%. A 53% model sounds profitable but barely covers the vig.
4. Cherry-picking periods - If your model only works during one season or against one type of opponent, it's probably not a real edge. Look for consistency across different time periods.
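The consistency check in point 4 is easy to automate: group your backtest's per-game results by season and compare win rates. A sketch using a hypothetical results list, where each entry is a (season, won) pair:

```python
from collections import defaultdict

def win_rate_by_season(results):
    """Compute per-season win rates from (season, won) pairs."""
    tally = defaultdict(lambda: [0, 0])  # season -> [wins, games]
    for season, won in results:
        tally[season][0] += int(won)
        tally[season][1] += 1
    return {season: wins / games for season, (wins, games) in sorted(tally.items())}

# Hypothetical per-game results from a backtest
results = [(2022, True), (2022, False), (2022, True),
           (2023, True), (2023, True), (2023, False)]
print(win_rate_by_season(results))
```

A real edge shows roughly similar win rates across seasons; a model that crushed one season and broke even everywhere else probably overfit that season.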
Start Backtesting Today
BALLDONTLIE Lab gives you everything you need:
- 20+ pre-built factors across team performance, matchup, situational, and player categories
- 6 years of historical data with real opening lines from DraftKings
- Confidence bucketing to find your highest-conviction plays
- Full API access for programmatic model optimization
Free tier: Try it with 1 model and 1 week of data
Pro tier ($99.99/mo): Unlimited models, 6 years of history, full API access
Start Building Free | View Docs
Need help? Join our Discord community to discuss betting models and strategies.