VRP Backtest: The Fill Model Is the Edge (8-Year Study, 2026) | FlashAlpha

Same VRP strategy, same signal, same 8 years, same trades. Change only the fill model and SPY total return swings from -5% to +62%. The "edge" lives in the fill assumption, not the signal.

Tomasz Dobrowolski Quant Engineer
May 20, 2026
33 min read
VRP VolatilityRiskPremium Backtest FillModel OptionsStrategy ExecutionRisk

We ran a frozen volatility-risk-premium short-put-spread strategy across six liquid US underlyings, with a leak-free signal, over 2019 to 2026, and varied nothing but the execution model. The result is uncomfortable for anyone who has ever published a "VRP returns X%" chart: the X is almost entirely a function of how you assume your limit orders fill.

This piece is the follow-up and partial correction to our earlier honest 7-year cross-sectional VRP backtest. That article used a worst-case post-and-wait fill model and concluded the edge did not survive honest execution. The conclusion was incomplete: the worst-case bound is one side of a range. The other side, using the conventional public-backtest assumption, is strongly positive. The truth is the bracket, and the bracket straddles zero.

For background on the dealer-positioning forces these strategies sell into, see our explainer on gamma exposure (GEX). For the data plumbing under these runs, see historical options analytics replay and the Historical API documentation.

-5% to +71%
Range of 7-year total return across underlyings, on identical trades and signal, varying only the fill model
QQQ: -1.6% to +71%
Same name, same signal, same period. Sharpe swings from -0.08 to +1.05 on the fill assumption alone
Same signal. Same trades.
The entire backtested "edge" lives in the gap between honest and idealized fills, not in the signal

TL;DR

One frozen VRP short-put-spread strategy, one leak-free signal, 2019 to 2026, with only the execution model varied:

  • Honest model (post a limit, wait, you only fill when the market crosses your price, i.e. when the trade is already moving against you): the strategy is breakeven-to-losing on all six underlyings. Sharpe ranges from -0.39 to +0.23; three of six lose money.
  • Idealized model (instant fill at the bid/ask midpoint, 100% fill rate, the assumption in virtually every public VRP backtest): the strategy is strongly positive. SPY +62%, QQQ +71%, NVDA +70%, AMZN +48% total return. Sharpe up to 1.12. QQQ flips from losing money over seven years to +71%.
  • The entire "edge" lives in the gap between those two assumptions. It is not in the signal. It is not in the structure. It is in the fill model.

A second, subtler result kills the obvious objection: within a realistic post-and-wait model, the posted price barely matters. Posting at mid is no better than posting above the offer (see the gate-control section below). The damage is not the cents of spread you pay. It is adverse selection: a resting limit only fills when the market comes through it, so a put-seller's limits disproportionately fill exactly as the underlying drops into them. Idealized mid-fill erases that selection for free. That erasure is the backtested profit.

Idealized mid-fill is not "a bit optimistic." It is the difference between a breakeven-to-losing strategy and one that returns +48% to +71% on four of the six names. Anyone showing you a single VRP equity curve without a disclosed, calibrated fill model is showing you their assumption, not an edge.

1. The Setup: Everything Held Constant Except the Fill Model

One signal, one parameter set, one period. The only thing that moves across runs is how the simulator assumes a limit order fills.

Held fixed across all runs | Value
Strategy | Short put credit spread, EV-ranked
Short delta / DTE / PT / SL | 0.10 / 14 / 50% / 100%
Signal | vrp_signal_v2.csv, +1-day-shifted, leak-free
Sizing | Unlevered Half-Kelly (0.05 default, 0.25 cap)
Period | 2019-02-01 to 2026-04-02 (7.16 yr)
Symbols | SPY, QQQ, IWM, AMZN, NVDA (1-min local) and SPXW (daily, API)
The only thing varied | the fill model
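One way to read the sizing row is the sketch below. It assumes the textbook binary-outcome Kelly formula, a 0.5 multiplier, a fallback to the 0.05 default when no estimate is available, and a hard 0.25 cap. The function name, the fallback rule, and the zero floor are illustrative assumptions; the engine's actual estimator is not shown in this article.

```python
def half_kelly_fraction(win_rate=None, payoff_ratio=None,
                        default=0.05, mult=0.5, cap=0.25):
    """Fraction of equity to risk on one trade (hypothetical helper).

    payoff_ratio is average win divided by average loss.
    Assumption: with no usable estimate we fall back to `default`.
    """
    if win_rate is None or payoff_ratio is None or payoff_ratio <= 0:
        return default
    # Textbook Kelly for a two-outcome bet: f* = p - (1 - p) / b
    full_kelly = win_rate - (1.0 - win_rate) / payoff_ratio
    if full_kelly <= 0.0:
        return 0.0  # negative edge: assume the trade is skipped
    return min(mult * full_kelly, cap)
```

With a payoff shaped like an insurance seller's (small wins, larger losses), the full-Kelly term is close to zero or negative for most plausible win rates, so the default and the cap are doing most of the work in a setup like this.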

Two fill models, and only these two:

Honest (ask_edge, post-and-wait)

Post a limit; it fills only on a later bar where combo_bid crosses it. The simulator adds a stale-quote guard, an EV-blind random tiebreak among simultaneous crosses, and a patient-then-cross exit. This is the open-sourced fillsim logic.

Idealized (--mid-fill)

The top-EV affordable candidate fills immediately at combo-mid, 100% of the time. No waiting, no epsilon, no edge bonus. Exits close at the trigger's combo-mid. This is the convention almost every "VRP returns X%" post silently assumes.
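The two entry rules can be sketched in a few lines. This is a simplified illustration, not the fillsim code: the `Bar` type, the function names, and the `edge` value are assumptions, and the stale-quote guard, the random tiebreak, and the exit logic are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Bar:
    combo_bid: float  # what the market pays for the spread
    combo_ask: float  # what the market asks for the spread

    @property
    def mid(self) -> float:
        return 0.5 * (self.combo_bid + self.combo_ask)


def honest_fill(bars: Sequence[Bar], edge: float = 0.04) -> Optional[Tuple[int, float]]:
    """Post a sell limit above the offer seen on bar 0, then wait.

    Fills only on a LATER bar whose combo_bid reaches the limit.
    Returns (bar_index, fill_price), or None if the limit never fills.
    """
    limit = bars[0].combo_ask + edge
    for i, bar in enumerate(bars[1:], start=1):
        if bar.combo_bid >= limit:
            return i, limit
    return None


def idealized_fill(bars: Sequence[Bar]) -> Tuple[int, float]:
    """Instant fill at combo-mid on the signal bar, 100% of the time."""
    return 0, bars[0].mid
```

The asymmetry is visible in the signatures: the idealized rule always returns a fill, while the honest rule returns one only when the spread has become more expensive since the order was posted, which for a put-seller means the underlying has moved against the position.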

The signal itself is leak-free. We document the no-lookahead percentile machinery in historical VRP percentiles without lookahead bias. That is not where the result is hiding. The result is hiding in the fill simulator.
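For readers who want a concrete picture of "leak-free, +1-day-shifted," here is a minimal sketch. It assumes an expanding percentile rank over strictly prior observations, followed by a one-day shift so the value computed from day t's data is first usable on day t+1. The function name and the `min_history` warm-up are illustrative assumptions, not the machinery behind vrp_signal_v2.csv.

```python
from bisect import bisect_left, insort


def lagged_expanding_percentile(values, min_history=3):
    """Return the tradable signal per day, or None when unavailable.

    raw[t] ranks values[t] against values[0..t-1] only (no lookahead).
    The returned list is raw shifted forward by one day.
    """
    history = []  # sorted values seen strictly before the day being ranked
    raw = []
    for v in values:
        if len(history) >= min_history:
            raw.append(bisect_left(history, v) / len(history))
        else:
            raw.append(None)
        insort(history, v)  # today's value joins history only after ranking
    return [None] + raw[:-1]
```

A quick self-check for any signal built this way: changing the most recent input must not change any output value, because today's observation is not tradable until tomorrow.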

2. The Honest Model: VRP Barely Breaks Even

Frozen strategy, honest ask_edge fills, full 7.16-year period:

Symbol | Trades | Win rate | Profit factor | Sharpe | CAGR % | Total ret %
SPY (tuned) | 335 | 71.6% | 1.10 | +0.23 | +1.05 | +7.78
NVDA | 201 | 72.1% | 1.12 | +0.22 | +0.84 | +6.19
AMZN | 139 | 71.2% | 1.08 | +0.13 | +0.42 | +3.02
QQQ | 162 | 69.8% | 0.96 | -0.08 | -0.23 | -1.64
IWM | 58 | 63.8% | 0.72 | -0.35 | -0.11 | -0.77
SPXW † | 112 | 61.6% | 0.37 | -0.39 | -0.66 | -4.64

Win rate is a stable 64% to 72% on the 1-minute names (62% on daily-resolution SPXW), but risk-adjusted return is a coin flip at best and negative on half the names. Read on its face, the conclusion is: "VRP doesn't really work after honest costs." That conclusion is incomplete, and the next two sections show exactly why.

† SPXW is daily-resolution via the Historical API, see the SPXW counter-example section.
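The CAGR and total-return columns are tied together by ordinary compounding over the 7.16-year period. A small sketch (the function name is ours) that lets you cross-check the rows:

```python
def cagr_pct(total_return_pct: float, years: float) -> float:
    """Annualized compound growth rate, in percent, from a total return in percent."""
    growth = 1.0 + total_return_pct / 100.0
    return (growth ** (1.0 / years) - 1.0) * 100.0


# Total-return figures copied from the honest table, period = 7.16 years.
for symbol, total in [("SPY", 7.78), ("NVDA", 6.19), ("QQQ", -1.64)]:
    print(symbol, round(cagr_pct(total, 7.16), 2))
```

The SPY, NVDA, and QQQ rows round to the +1.05, +0.84, and -0.23 shown in the table, which is a useful sanity check when you aggregate your own runs.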

3. The Decisive Control: It's the Gate, Not the Price

If honest fills cost you the edge, the natural fix is "just post at mid instead of above the offer." We tested exactly that. SPY, 2021 to 2024, identical except the posted-price model. Post-and-wait gate still active in all three.

Entry price model (gate ON) | Fill rate | Win rate | CAGR % | Sharpe
ask_edge (post above offer) | 19% | 69% | +2.39 | +0.53
mid_edge (mid + $0.04) | 22% | 69% | +0.94 | +0.19
mid (post at combo-mid) | 59% | 68% | -0.20 | -0.07

Posting at mid is the worst of the three. Higher fill rate (59%), more trades, and it still loses. Inside a post-and-wait model, a mid limit collects less credit per fill and is still adversely selected. Cutting the price you ask for does not help, because the problem was never the price. It is which trades fill: a resting short-put limit fills preferentially on the bars where the underlying is moving down into it. You systematically miss the trades that immediately work and keep the ones that immediately hurt.

Corroboration from a prior fixed-trade study (same 119 trades, only the pricing convention re-applied): idealized mid printed +50%, MM-style +74%, even pessimistic take-the-spread-both-ways printed +30%. All profitable. When the trade set is held fixed, no pricing convention loses. The loss in the honest cross-section is entirely a selection effect of the gate, not a cost-of-spread effect.
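The selection mechanism can be illustrated with a toy Monte Carlo. Everything here is an assumption made for illustration (Gaussian moves in arbitrary units, a two-step path, the parameter values); it is not a reproduction of the study. Every candidate trade has the same positive expected premium, and the only difference between the two averages is which trades get filled and at what price.

```python
import random
from statistics import mean


def simulate(n=20000, premium=0.20, edge=0.30, half_spread=0.10, seed=7):
    """Compare idealized mid-fills with gated resting-limit fills.

    The spread mid starts at 0 for every candidate trade.
    Positive moves mean the spread got more expensive (bad for the seller).
    """
    rng = random.Random(seed)
    ideal_pnl, honest_pnl = [], []
    for _ in range(n):
        first = rng.gauss(0.0, 1.0)      # move during the first bar after posting
        rest = rng.gauss(-premium, 1.0)  # later move; negative drift is the premium
        final_mid = first + rest
        # Idealized: sell at mid (0) on every candidate, buy back at final mid.
        ideal_pnl.append(0.0 - final_mid)
        # Honest: the limit at `edge` fills only if the bid reaches it.
        if first - half_spread >= edge:
            honest_pnl.append(edge - final_mid)
    return mean(ideal_pnl), mean(honest_pnl), len(honest_pnl) / n


ideal_avg, honest_avg, fill_rate = simulate()
print(f"ideal={ideal_avg:.3f} honest={honest_avg:.3f} fill_rate={fill_rate:.0%}")
```

In this toy the gated average is lower than the idealized one even though the limit price is better than mid, because a fill happens only after the market has already moved through the limit. That is the same direction of effect the gate-control table shows, under far cruder assumptions.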

The Mid-Article Takeaway

The "spread cost" framing is wrong. The damage is the selection bias of resting limits: short-put limits fill preferentially into adverse moves. Posting cheaper does not help, it usually hurts. The only thing that genuinely changes the result is removing the gate entirely, which is what every public backtest silently does. If you want to run these experiments yourself on real options replay data, the Historical API exposes the same minute-level bars the simulator consumes.

4. Remove the Gate, the Strategy "Works"

The --mid-fill flag removes the post-and-wait gate entirely: every signal's chosen trade is taken, instantly, at combo-mid. This is the universal public-backtest assumption. Same frozen config, same signal, same period, only the fill model differs from the honest run.

Symbol | Honest ask_edge | Idealized mid-fill | Δ total return
SPY | Sharpe 0.23 · +7.78% | Sharpe 1.12 · +61.77% | +54 pp
QQQ | Sharpe -0.08 · -1.64% | Sharpe 1.05 · +70.67% | +72 pp
NVDA | Sharpe 0.22 · +6.19% | Sharpe 0.44 · +70.18% | +64 pp
AMZN | Sharpe 0.13 · +3.02% | Sharpe 0.50 · +48.24% | +45 pp
IWM | Sharpe -0.35 · -0.77% | Sharpe 0.18 · +8.16% | +9 pp
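The Δ column is a difference in percentage points, not a relative change. A short sketch that recomputes it from the two total-return columns (values copied from the table; the helper name is ours):

```python
honest = {"SPY": 7.78, "QQQ": -1.64, "NVDA": 6.19, "AMZN": 3.02, "IWM": -0.77}
ideal = {"SPY": 61.77, "QQQ": 70.67, "NVDA": 70.18, "AMZN": 48.24, "IWM": 8.16}


def delta_pp(honest_ret, ideal_ret):
    """Idealized minus honest total return, rounded to whole percentage points."""
    return {sym: round(ideal_ret[sym] - honest_ret[sym]) for sym in honest_ret}


print(delta_pp(honest, ideal))
```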

The same strategy. The same signal. The same period. QQQ goes from a small loss over the full 7.16 years to +71%. SPY's Sharpe goes from 0.23 to 1.12. Nothing changed except the assumption about how a limit order fills. The controlled SPY microcosm (Q1 2023, identical but for fill model) shows the same direction of effect: Sharpe 1.81 to 2.78.

Honest ask_edge (post-and-wait)
  • Lower trade count (resting limit, low fill rate)
  • Smaller drawdowns (~6 to 8% on equity names)
  • Total return: -5% to +8% across symbols
  • Sharpe: -0.39 to +0.23
  • Three of six underlyings lose money
  • Reflects real adverse selection on resting limits
Idealized mid-fill (instant 100% fill)
  • 2 to 3x the trade count (always in the market)
  • Larger drawdowns (AMZN 22%, NVDA 25%)
  • Total return: +8% to +71% across equity names
  • Sharpe up to 1.12
  • Five of five equity names print money
  • No adverse selection by construction (free)

One honest caveat in the other direction: the idealized runs also carry larger drawdowns (AMZN 22%, NVDA 25% versus roughly 6% to 8% honest) and roughly 2 to 3x the trade count. Instant 100% fill means the strategy is always in the market, including through the moves the honest gate's adverse selection "protected" it from by never filling. The idealized model is not just "the same trades, better prices"; it is a structurally more-exposed strategy.

That is itself the point: the fill model doesn't just scale the result, it changes what strategy you are actually running. The strategy's apparent profitability is manufactured by the idealized fill assumption.
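Because the two fill models produce different exposure, drawdown should be reported next to return for both. A minimal sketch of a peak-to-trough drawdown calculation on an equity curve follows; the function name and the sample curve are illustrative, and this is not the engine's own reporting code.

```python
def max_drawdown_pct(equity):
    """Largest peak-to-trough decline of an equity curve, in percent."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst * 100.0


# Made-up curve: peak of 120, trough of 90, later recovery.
print(max_drawdown_pct([100, 120, 90, 130, 117]))
```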

5. The Honest Counter-Example: SPXW

Idealized mid-fill is not a universal money button, and saying so is part of being honest. SPXW (SPX weeklies, daily-resolution via the Historical API), run with the haircut set to zero (pure daily-mid, 100% fill):

SPXW | Honest (haircut $0.04) | Idealized (haircut $0)
Trades | 112 | 130
Win rate | 61.6% | 48.5%
CAGR % | -0.66 | -0.46
Sharpe | -0.39 | -0.42

SPXW stays negative even idealized, because its constraint is a different artifact: at daily resolution the patient buy-to-close limit can never fill within two daily bars, so every exit is a forced spread-cross regardless of entry model. This is a clean reminder that "remove the fill gate and it prints money" is itself a fill-model claim. True for the 1-minute names, false for the daily-resolution one. The execution model dominates in both directions.
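The exit artifact is easier to see in code. The sketch below is an assumed reading of a patient-then-cross exit: rest a buy-to-close limit for a fixed number of bars, and if the ask never comes down to it, pay the ask on the last bar of the window. The function name, the tuple layout, and the choice of which bar to cross on are assumptions, not the simulator's implementation.

```python
from typing import Sequence, Tuple


def patient_then_cross_exit(bars: Sequence[Tuple[float, float]],
                            limit: float,
                            patience: int = 2) -> Tuple[int, float, str]:
    """bars holds (combo_bid, combo_ask) pairs after the exit trigger.

    Returns (bar_index, price_paid, how) where how is 'patient' or 'forced_cross'.
    """
    window = list(bars[:patience])
    if not window:
        raise ValueError("need at least one bar after the exit trigger")
    for i, (_bid, ask) in enumerate(window):
        if ask <= limit:  # the market came down to the resting limit
            return i, limit, "patient"
    # Patience exhausted: cross the spread at the last ask seen.
    last = len(window) - 1
    return last, window[last][1], "forced_cross"
```

With 1-minute bars the patience window gives the limit some chance to fill. With daily bars sampled at a single quote per day, the window is only two observations wide, and the forced-cross branch is effectively the only exit that ever runs.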

6. What Is Actually True About VRP Harvesting

  1. The headline number of any VRP backtest is mostly a fill-model choice. On identical trades and signal, total return spans roughly -5% (honest, SPXW) to +71% (idealized, QQQ) purely on execution assumptions, and the same name (QQQ) moves -1.6% to +71%, Sharpe -0.08 to +1.05, on nothing but the fill model. Anyone showing a single VRP equity curve without a disclosed, calibrated fill model is showing you their assumption, not an edge.
  2. The harm is adverse selection, not spread cost. Posting cheaper (mid) doesn't help and often hurts. Re-pricing a fixed trade set was profitable under every convention we tested (the +30% / +50% / +74% study). What kills the strategy is which trades a resting limit fills.
  3. Idealized mid-fill, the public-backtest default, is exactly the assumption that erases adverse selection for free. That erasure is the entire backtested profit. It is not conservatively "a bit optimistic"; it is the difference between a breakeven-to-losing strategy and a strongly positive one.
  4. Neither extreme is the truth. Reality sits between instant-mid (too generous) and post-above-offer (worst-case adverse). The honest takeaway is a range, and the range straddles zero. A retail VRP seller's real result depends almost entirely on execution skill: how close to mid they can rest and still get filled without being run over, not on the signal.
  5. The robust, fill-model-invariant facts survive everything: a stable 65% to 72% win rate, a roughly 1:2.3 win/loss payoff, and (separately documented) the signal's high-conviction risk_on regime underperforming its neutral regime. None of those make money by themselves; together they describe an insurance seller running at break-even-with-tails.
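The "break-even-with-tails" description in the last point follows from simple arithmetic. Reading the roughly 1:2.3 payoff as "the average loss is about 2.3 times the average win" (our interpretation), the sketch below computes the win rate needed to break even and the per-trade expectancy at a few win rates. The function names are illustrative.

```python
def breakeven_win_rate(loss_to_win: float) -> float:
    """Win rate at which expected P&L per trade is zero."""
    return loss_to_win / (1.0 + loss_to_win)


def expectancy(win_rate: float, loss_to_win: float) -> float:
    """Expected P&L per trade, in units of the average win."""
    return win_rate - (1.0 - win_rate) * loss_to_win


print(round(breakeven_win_rate(2.3), 3))
for p in (0.65, 0.70, 0.72):
    print(p, round(expectancy(p, 2.3), 3))
```

The breakeven win rate for a 2.3 loss-to-win ratio is about 69.7%, which sits inside the observed 65% to 72% band. Small shifts in win rate or in fill quality are therefore enough to move the result from slightly negative to slightly positive.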

Adverse selection is the killer, not the spread. Backtests that assume mid-fills aren't modeling VRP. They're modeling a counterparty who never adversely selects you. That counterparty does not exist. The tradeable retail edge in this harvest is not in the signal: it is a wager on your own fill quality.

The volatility risk premium is real. The signal works in the sense that selling vol when vol is rich beats selling vol when it is cheap. But the harvest's reported P&L is dominated by what your fills actually look like, which is something a backtest can only assume.

7. Method, Honesty, and Limits

  • Engine change is non-destructive. --mid-fill is an opt-in flag; default behavior (honest ask_edge) is byte-for-byte unchanged. Honest artifacts (*_vrpfrozen_*) and idealized artifacts (*_midfill_*) are written side by side; nothing was overwritten.
  • The honest model itself is plausible, not calibrated. The fillsim SPEC explicitly states it has not been validated against real broker fills: "the single most important missing test." So ask_edge is a reasonable worst-case, not ground truth. This is precisely why the result is presented as a range bounded by two assumptions, not a verdict.
  • Leak-free signal, frozen config, no per-symbol refit, unchanged from the cross-sectional study. QQQ, IWM, AMZN, NVDA, SPXW remain genuine out-of-sample-in-symbol.
  • SPXW is daily-resolution and not execution-comparable to the 1-minute names; it is cited only for the directional point in the SPXW section above.
  • All runs final. Every figure is from a completed backtest; the aggregated tables are cross_symbol_summary.json (honest) and cross_symbol_summary_midfill.json (idealized).
  • Idealized does not equal "free money." It also runs 2 to 3x the trades at materially larger drawdowns and stays negative on SPXW. It is the optimistic bound, not a recommendation.
  • No commissions. Real costs make the honest end worse, widening the range further. They do not rescue the idealized end's realism.

How to Build Honest VRP Backtests

If you are running these experiments yourself (whether on local minute data or via our Historical API), the methodology below is the minimum bar for a backtest you can actually trust. The same discipline applies to any premium-selling strategy that rests limit orders.

  1. Report a range, not a number.

    Always show the result under at least two fill models: an optimistic bound (instant mid-fill, 100% fill rate) and a pessimistic bound (post-and-wait, fill only on a later bar where the market crosses your price). A single equity curve is a single assumption. Quote the bracket; let the reader live in it.

  2. Hold the trade set fixed when comparing pricing conventions.

    If you want to isolate "what does the spread cost me," replay one fixed trade set (we used 119 trades) under different pricing rules. If you want to isolate "what does adverse selection cost me," keep the gate on and vary only the limit price. Conflating these two experiments produces the false impression that paying less fixes the problem.

  3. Treat fill rate as a load-bearing diagnostic.

    A 19% fill rate is not noise: it is the simulator telling you that 81% of intended trades never happened, and that the 19% that did fill skew adversely. If your honest run has a 90%+ fill rate, you have probably broken the gate. If your idealized run has anything less than 100%, you have not actually built an idealized run.

  4. Disclose the fill model in plain English in any published result.

    "Mid-fill, instant, 100% fill rate" is a perfectly fine assumption to publish. So is "post above the offer, fill on cross." What is not fine is showing an equity curve and never naming the assumption. The number is the assumption.

  5. Use leak-free signals and a frozen config.

    None of the fill-model honesty matters if the signal peeks at the future or if you refit per symbol. We document the no-lookahead percentile machinery in historical VRP percentiles without lookahead. Freeze the config across symbols; that is the only way the cross-section is a real test.
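The first and third points above can be turned into a small reporting helper. This is a sketch under assumed names and thresholds: the 90% and 100% checks come from the text, while the function and the dictionary layout are ours.

```python
def bracket_report(symbol, honest_ret_pct, ideal_ret_pct,
                   honest_fill_rate, ideal_fill_rate):
    """Summarize one symbol as a range between two fill models, with sanity flags."""
    warnings = []
    if honest_fill_rate >= 0.90:
        warnings.append("honest fill rate is 90% or more: the gate may be broken")
    if ideal_fill_rate < 1.0:
        warnings.append("idealized fill rate below 100%: not an idealized run")
    low, high = sorted((honest_ret_pct, ideal_ret_pct))
    return {
        "symbol": symbol,
        "range_pct": (low, high),
        "straddles_zero": low < 0 < high,
        "warnings": warnings,
    }


# QQQ returns are from the tables above; the 0.20 honest fill rate is a placeholder.
print(bracket_report("QQQ", -1.64, 70.67, honest_fill_rate=0.20, ideal_fill_rate=1.0))
```

Publishing the output of something like this alongside each equity curve names the fill-model assumption in the same place the reader sees the number.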

Reproduction

The simulator and the aggregation scripts are designed so you can re-run both sides side by side on any of the six underlyings. The honest pole is the default; the idealized pole is a single flag.

# Honest (default) and idealized, side by side, any symbol:
python intraday_bt_ev_rank.py --symbol SPY --start 2019-02-01 --end 2026-04-02 \
  --delta 0.10 --dte 14 --pt 0.50 --sl 1.0 --vrp-signal vrp_signal_v2.csv \
  --kelly-default 0.05 --kelly-mult 0.5 --vrp-on-mult 1.0 --kelly-max 0.25 \
  --max-drawdown 0.30 --label spy_vrpfrozen           # honest ask_edge

#   ... add  --mid-fill --label spy_midfill            # idealized

bash run_cross_symbol.sh        # honest, 5 symbols      -> *_vrpfrozen_*
bash run_midfill.sh             # idealized, 6 symbols   -> *_midfill_*
bash run_fillmode_sensitivity.sh # gate-on price sweep   -> evrank_fillmode_*
python aggregate_cross_symbol.py            # honest table
python aggregate_cross_symbol.py midfill    # idealized table
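Once both aggregates exist, a few lines of Python can put them side by side. The JSON layout assumed here (a top-level object keyed by symbol, each with a `total_return_pct` field) is a guess for illustration only; check the real files and adjust the key before relying on it.

```python
import json


def load_summary(path):
    """Load one aggregate file, e.g. cross_symbol_summary.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def side_by_side(honest, ideal, key="total_return_pct"):
    """Rows of (symbol, honest, idealized, difference) for symbols present in both.

    ASSUMPTION: each input maps symbol -> {key: number}; the real schema may differ.
    """
    rows = []
    for symbol in sorted(set(honest) & set(ideal)):
        h = honest[symbol][key]
        i = ideal[symbol][key]
        rows.append((symbol, h, i, i - h))
    return rows


# Usage, once both files exist and the assumed schema has been confirmed:
#   rows = side_by_side(load_summary("cross_symbol_summary.json"),
#                       load_summary("cross_symbol_summary_midfill.json"))
```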

Companion docs: PROVENANCE.md (methodology and caveats) and our sister article, VRP Short Put Spreads: Honest 7-Year Backtest, 5 Symbols. That article's framing of "the edge doesn't survive honest execution" is corrected by this article to the bounded execution-fragility thesis: the honest pole and the idealized pole are both real, and the truth is the bracket between them.

All sections final, completed runs as of 2026-05-18. Honest aggregate: cross_symbol_summary.json. Idealized aggregate: cross_symbol_summary_midfill.json.

Frequently Asked Questions

Why does the same VRP strategy return anywhere from -5% to +71% depending on the fill model?
Because the fill model decides which trades the strategy actually takes. The honest post-and-wait model only fills resting limits when the market crosses them, which biases fills toward trades that are already moving against the seller (adverse selection). The idealized mid-fill model takes every signal's top trade instantly at the midpoint, which removes that selection entirely. Same signal, same trades considered, completely different trade sets actually executed.

Isn't the damage just the cost of the bid/ask spread?
No, and that is the central finding of this study. The price you pay for a fill is the spread cost. The selection bias of which fills you get is adverse selection. Our gate-on sensitivity sweep shows that posting cheaper (at mid) actually hurts results even though the fill rate rises from 19% to 59%, because each fill collects less credit and the fills are still adversely selected. Re-pricing a fixed trade set was profitable under every pricing convention we tested; changing which trades fill is what kills the strategy.

Is the honest post-and-wait model the true answer?
No. The honest ask_edge model is a reasonable worst-case bound, not ground truth. Its own SPEC notes it has not been calibrated against real broker fills. The point of this article is that the truth is a range: post-and-wait is the pessimistic pole, instant-mid is the optimistic pole, and a real trader's result depends on how close to mid they can actually rest a limit while still getting filled without being run over. That is execution skill, not signal skill.

Why does SPXW stay negative even under idealized fills?
Because SPXW runs at daily resolution rather than 1-minute. At daily resolution the patient buy-to-close limit can never fill within the two-bar exit window, so every exit is a forced spread-cross regardless of how the entry was modeled. That is a different fill-model artifact, but it is still a fill-model artifact. The point reinforces the article's main thesis: the execution model dominates the result in both directions, optimistic and pessimistic.

What holds up regardless of the fill model?
Three things. A stable 65% to 72% win rate across symbols and fill models. A roughly 1:2.3 win/loss payoff (the classic insurance-seller distribution). And the signal's high-conviction risk_on regime underperforming its neutral regime, which is documented separately. Together those describe an insurance seller running at break-even-with-tails. None of them make money on their own; whether the harvest is profitable depends on execution.

Run These Backtests on Real Replay Data

The bars and analytics behind every run above are exposed through the Historical API. If you want to reproduce or extend this study (alternative fill models, additional symbols, different sizing), see pricing for API access and the replay article for an end-to-end walkthrough. The point of this study is not "buy our data." It is: whatever data you use, disclose the fill model and report the range.
