VRP Backtest: The Fill Model Is the Edge (8-Year Study, 2026)
Same VRP strategy, same signal, same 8 years, same trades. Change only the fill model and SPY total return swings from -5% to +62%. The "edge" lives in the fill assumption, not the signal.
We ran a frozen volatility-risk-premium short-put-spread strategy across six liquid US underlyings, with a leak-free signal, over 2019 to 2026, and varied nothing but the execution model. The result is uncomfortable for anyone who has ever published a "VRP returns X%" chart: the X is almost entirely a function of how you assume your limit orders fill.
This piece is the follow-up and partial correction to our earlier honest 7-year cross-sectional VRP backtest. That article used a worst-case post-and-wait fill model and concluded the edge did not survive honest execution. The conclusion was incomplete: the worst-case bound is one side of a range. The other side, using the conventional public-backtest assumption, is strongly positive. The truth is the bracket, and the bracket straddles zero.
Key figures, on identical trades and signal, varying only the fill model:
Range of 7-year total return across underlyings: roughly -5% to +71%.
QQQ alone (same name, same signal, same period): -1.6% to +71% total return, with Sharpe swinging from -0.08 to +1.05 on the fill assumption.
The entire backtested "edge" lives in the gap between honest and idealized fills, not in the signal.
TL;DR
One frozen VRP short-put-spread strategy, one leak-free signal, 2019 to 2026, with only the execution model varied:
Honest model (post a limit, wait, you only fill when the market crosses your price, i.e. when the trade is already moving against you): the strategy is breakeven-to-losing on all six underlyings. Sharpe ranges from -0.39 to +0.23; three of six lose money.
Idealized model (instant fill at the bid/ask midpoint, 100% fill rate, the assumption in virtually every public VRP backtest): the strategy is strongly positive. SPY +62%, QQQ +71%, NVDA +70%, AMZN +48% total return. Sharpe up to 1.12. QQQ flips from losing money over seven years to +71%.
The entire "edge" lives in the gap between those two assumptions. It is not in the signal. It is not in the structure. It is in the fill model.
A second, subtler result kills the obvious objection: within a realistic post-and-wait model, the posted price barely matters. Posting at mid is no better than posting above the offer (see the gate-control section below). The damage is not the cents of spread you pay. It is adverse selection: a resting limit only fills when the market comes through it, so a put-seller's limits disproportionately fill exactly as the underlying drops into them. Idealized mid-fill erases that selection for free. That erasure is the backtested profit.
Idealized mid-fill is not "a bit optimistic." It is the difference between a losing strategy and one that prints +48% to +71% on SPY, QQQ, NVDA, and AMZN. Anyone showing you a single VRP equity curve without a disclosed, calibrated fill model is showing you their assumption, not an edge.
1. The Setup: Everything Held Constant Except the Fill Model
One signal, one parameter set, one period. The only thing that moves across runs is how the simulator assumes a limit order fills.
Honest (ask_edge)
Post a limit; it fills only on a later bar where combo_bid crosses it. The simulator adds a stale-quote guard, an EV-blind random tiebreak among simultaneous crosses, and a patient-then-cross exit. This is the open-sourced fillsim logic.
Idealized (--mid-fill)
The top-EV affordable candidate fills immediately at combo-mid, 100% of the time. No waiting, no epsilon, no edge bonus. Exits close at the trigger's combo-mid. This is the convention almost every "VRP returns X%" post silently assumes.
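To make the contrast concrete, here is a minimal sketch of the two entry rules described above. It is not the fillsim code: the `Bar` shape, the function names, and the `max_wait` parameter are assumptions made for illustration, and the stale-quote guard, the random tiebreak, and the exit logic are all left out. Prices are in credit terms, so a resting sell limit fills when `combo_bid` rises to it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Bar:
    """One quote bar for the spread, in credit terms (hypothetical shape)."""
    combo_bid: float
    combo_ask: float

    @property
    def combo_mid(self) -> float:
        return (self.combo_bid + self.combo_ask) / 2.0


def mid_fill(bars: Sequence[Bar], signal_idx: int) -> Tuple[int, float]:
    """Idealized rule: fill on the signal bar itself, at combo-mid, every time."""
    return signal_idx, bars[signal_idx].combo_mid


def post_and_wait_fill(
    bars: Sequence[Bar], signal_idx: int, limit: float, max_wait: int
) -> Optional[Tuple[int, float]]:
    """Honest rule: rest a sell limit and fill only on a LATER bar whose
    combo_bid has reached the limit. Returns None if it never crosses
    within max_wait bars (max_wait is an illustrative parameter)."""
    last = min(signal_idx + max_wait, len(bars) - 1)
    for i in range(signal_idx + 1, last + 1):
        if bars[i].combo_bid >= limit:
            return i, limit
    return None


# Toy quotes: the spread only becomes rich enough on the third bar.
bars = [Bar(1.00, 1.10), Bar(0.98, 1.08), Bar(1.12, 1.22)]
ideal = mid_fill(bars, 0)
honest = post_and_wait_fill(bars, 0, limit=1.12, max_wait=5)
never = post_and_wait_fill(bars, 0, limit=1.30, max_wait=5)
print(ideal, honest, never)
```

Note what the honest rule implies for a put-spread seller: the bar on which `combo_bid` reaches the limit is a bar on which the spread has become more expensive, which typically means the underlying has moved down.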
The signal itself is leak-free. We document the no-lookahead percentile machinery in historical VRP percentiles without lookahead bias. That is not where the result is hiding. The result is hiding in the fill simulator.
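For readers unfamiliar with the term, one common way to keep a percentile signal leak-free is an expanding window that ranks each observation only against strictly earlier ones. The sketch below shows that generic idea; it is an assumption about the general technique, not a copy of the machinery documented in the linked article, and `expanding_percentile` and `min_history` are illustrative names.

```python
from bisect import bisect_right, insort
from typing import List, Optional, Sequence


def expanding_percentile(
    values: Sequence[float], min_history: int = 1
) -> List[Optional[float]]:
    """Percentile rank of each value against strictly prior values only.

    Returns None until at least min_history prior observations exist.
    """
    needed = max(min_history, 1)      # avoid dividing by an empty history
    history: List[float] = []         # sorted, contains only past values
    out: List[Optional[float]] = []
    for v in values:
        if len(history) < needed:
            out.append(None)
        else:
            # Share of past observations less than or equal to today's value.
            out.append(100.0 * bisect_right(history, v) / len(history))
        insort(history, v)            # today's value joins the history afterwards
    return out


ranks = expanding_percentile([1.0, 2.0, 3.0, 2.0])
print(ranks)
```

The ordering inside the loop is the whole point: the rank is computed before the current value is inserted, so no observation is ever ranked against itself or against the future.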
2. The Honest Model: VRP Barely Breaks Even
Frozen strategy, honest ask_edge fills, full 7.16-year period:
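| Symbol | Trades | Win rate | Profit factor | Sharpe | CAGR % | Total ret % |
| --- | --- | --- | --- | --- | --- | --- |
| SPY (tuned) | 335 | 71.6% | 1.10 | +0.23 | +1.05 | +7.78 |
| NVDA | 201 | 72.1% | 1.12 | +0.22 | +0.84 | +6.19 |
| AMZN | 139 | 71.2% | 1.08 | +0.13 | +0.42 | +3.02 |
| QQQ | 162 | 69.8% | 0.96 | -0.08 | -0.23 | -1.64 |
| IWM | 58 | 63.8% | 0.72 | -0.35 | -0.11 | -0.77 |
| SPXW † | 112 | 61.6% | 0.37 | -0.39 | -0.66 | -4.64 |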
Win rate is a stable 64% to 72% on the 1-minute names (61.6% on daily-resolution SPXW), but risk-adjusted return is a coin flip at best and negative on half the names. Read on its face, the conclusion is: "VRP doesn't really work after honest costs." That conclusion is incomplete, and the next two sections show exactly why.
† SPXW is daily-resolution via the Historical API, see the SPXW counter-example section.
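As a sanity check on the honest table, the CAGR and total-return columns should agree under plain compounding over the 7.16-year period. The short script below is a reader-side check, not part of the study's tooling; it assumes total return is simply (1 + CAGR)^years - 1 and allows a small tolerance because the published CAGR is rounded to two decimals.

```python
YEARS = 7.16  # period length stated for the honest runs

# (CAGR %, reported total return %) copied from the honest ask_edge table.
honest = {
    "SPY": (1.05, 7.78),
    "NVDA": (0.84, 6.19),
    "AMZN": (0.42, 3.02),
    "QQQ": (-0.23, -1.64),
    "IWM": (-0.11, -0.77),
    "SPXW": (-0.66, -4.64),
}


def total_return_pct(cagr_pct: float, years: float) -> float:
    """Compound an annual growth rate (in percent) over a number of years."""
    return ((1.0 + cagr_pct / 100.0) ** years - 1.0) * 100.0


gaps = {}
for symbol, (cagr, reported) in honest.items():
    implied = total_return_pct(cagr, YEARS)
    gaps[symbol] = implied - reported
    print(f"{symbol:5s} implied {implied:+6.2f}%  reported {reported:+6.2f}%")
```

Every row reconciles to within a few hundredths of a percentage point, which is what rounding in the CAGR column alone would produce.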
3. The Decisive Control: It's the Gate, Not the Price
If honest fills cost you the edge, the natural fix is "just post at mid instead of above the offer." We tested exactly that. SPY, 2021 to 2024, identical except the posted-price model. Post-and-wait gate still active in all three.
| Entry price model (gate ON) | Fill rate | Win | CAGR % | Sharpe |
| --- | --- | --- | --- | --- |
| ask_edge (post above offer) | 19% | 69% | +2.39 | +0.53 |
| mid_edge (mid + $0.04) | 22% | 69% | +0.94 | +0.19 |
| mid (post at combo-mid) | 59% | 68% | -0.20 | -0.07 |
Posting at mid is the worst of the three. Higher fill rate (59%), more trades, and it still loses. Inside a post-and-wait model, a mid limit collects less credit per fill and is still adversely selected. Cutting the price you ask for does not help, because the problem was never the price. It is which trades fill: a resting short-put limit fills preferentially on the bars where the underlying is moving down into it. You systematically miss the trades that immediately work and keep the ones that immediately hurt.
Corroboration from a prior fixed-trade study (same 119 trades, only the pricing convention re-applied): idealized mid printed +50%, MM-style +74%, even pessimistic take-the-spread-both-ways printed +30%. All profitable. When the trade set is held fixed, no pricing convention loses. The loss in the honest cross-section is entirely a selection effect of the gate, not a cost-of-spread effect.
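The two experiments above separate a pricing effect from a selection effect, and the arithmetic of that split is easy to sketch. Every number below is hypothetical and chosen only to illustrate the decomposition; none of it comes from the study's trade logs, and the field names are invented for this example.

```python
from statistics import mean

# Hypothetical candidates: P&L in dollars if filled at combo-mid, and whether
# a resting limit would have been crossed (i.e. the gate would have filled it).
candidates = [
    {"pnl_at_mid": 40.0, "gate_filled": False},
    {"pnl_at_mid": 35.0, "gate_filled": False},
    {"pnl_at_mid": 30.0, "gate_filled": True},
    {"pnl_at_mid": 45.0, "gate_filled": False},
    {"pnl_at_mid": -90.0, "gate_filled": True},
    {"pnl_at_mid": 42.0, "gate_filled": False},
    {"pnl_at_mid": 38.0, "gate_filled": False},
    {"pnl_at_mid": -95.0, "gate_filled": True},
]
EXTRA_CREDIT = 4.0  # hypothetical price improvement per fill from posting above mid

# Fixed trade set, idealized pricing: take every candidate at mid.
all_at_mid = mean(c["pnl_at_mid"] for c in candidates)

# Same pricing, but only the trades the gate actually filled.
filled_at_mid = mean(c["pnl_at_mid"] for c in candidates if c["gate_filled"])

pricing_effect = EXTRA_CREDIT                   # better price on each fill
selection_effect = filled_at_mid - all_at_mid   # cost of WHICH trades filled
gated_result = filled_at_mid + EXTRA_CREDIT     # what the gated run reports

print(f"all trades at mid : {all_at_mid:+.2f}")
print(f"pricing effect    : {pricing_effect:+.2f}")
print(f"selection effect  : {selection_effect:+.2f}")
print(f"gated result      : {gated_result:+.2f}")
```

In this toy set the better posted price adds a few dollars per fill while the selection effect removes far more, which is the shape of the result the gate-control table reports. A real decomposition would need the per-trade logs from both runs.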
The Mid-Article Takeaway
The "spread cost" framing is wrong. The damage is the selection bias of resting limits: short-put limits fill preferentially into adverse moves. Posting cheaper does not help, it usually hurts. The only thing that genuinely changes the result is removing the gate entirely, which is what every public backtest silently does. If you want to run these experiments yourself on real options replay data, the Historical API exposes the same minute-level bars the simulator consumes.
4. Remove the Gate, the Strategy "Works"
The --mid-fill flag removes the post-and-wait gate entirely: every signal's chosen trade is taken, instantly, at combo-mid. This is the universal public-backtest assumption. Same frozen config, same signal, same period, only the fill model differs from the honest run.
| Symbol | Honest ask_edge | Idealized mid-fill | Δ total return |
| --- | --- | --- | --- |
| SPY | Sharpe 0.23 · +7.78% | Sharpe 1.12 · +61.77% | +54 pp |
| QQQ | Sharpe -0.08 · -1.64% | Sharpe 1.05 · +70.67% | +72 pp |
| NVDA | Sharpe 0.22 · +6.19% | Sharpe 0.44 · +70.18% | +64 pp |
| AMZN | Sharpe 0.13 · +3.02% | Sharpe 0.50 · +48.24% | +45 pp |
| IWM | Sharpe -0.35 · -0.77% | Sharpe 0.18 · +8.16% | +9 pp |
The same strategy. The same signal. The same 7.16-year period. QQQ goes from losing money over that period to +71%. SPY's Sharpe goes 0.23 to 1.12. Nothing changed except the assumption about how a limit order fills. The controlled SPY microcosm (Q1 2023, identical but for fill model) moves in the same direction on a smaller scale: Sharpe rises from 1.81 to 2.78.
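The Δ column is plain subtraction in percentage points, and it is worth recomputing because the sign changes are the interesting part. The snippet below uses only the total-return figures from the comparison table; the variable names are illustrative.

```python
# Total return % by fill model, copied from the comparison table.
honest = {"SPY": 7.78, "QQQ": -1.64, "NVDA": 6.19, "AMZN": 3.02, "IWM": -0.77}
idealized = {"SPY": 61.77, "QQQ": 70.67, "NVDA": 70.18, "AMZN": 48.24, "IWM": 8.16}

# Difference in percentage points, rounded as in the table.
delta_pp = {s: round(idealized[s] - honest[s]) for s in honest}

# Names whose sign flips from losing to winning on the fill model alone.
sign_flips = [s for s in honest if honest[s] < 0 < idealized[s]]

for s in honest:
    print(f"{s:4s} {honest[s]:+7.2f}% -> {idealized[s]:+7.2f}%  ({delta_pp[s]:+d} pp)")
print("sign flips:", sign_flips)
```

Two of the five names (QQQ and IWM) change sign outright; the other three stay positive but gain between 45 and 64 percentage points.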
| Honest ask_edge (post-and-wait) | Idealized mid-fill (instant 100% fill) |
| --- | --- |
| Lower trade count (resting limit, low fill rate) | 2 to 3x the trade count (always in the market) |
| Smaller drawdowns (~6 to 8% on equity names) | Larger drawdowns (AMZN 22%, NVDA 25%) |
| Total return: -5% to +8% across symbols | Total return: +8% to +71% across equity names |
| Sharpe: -0.39 to +0.23 | Sharpe up to 1.12 |
| Three of six underlyings lose money | Five of five equity names print money |
| Reflects real adverse selection on resting limits | No adverse selection by construction (free) |
One honest caveat in the other direction: the idealized runs also carry larger drawdowns (AMZN 22%, NVDA 25% versus roughly 6% to 8% honest) and roughly 2 to 3x the trade count. Instant 100% fill means the strategy is always in the market, including through the moves the honest gate's adverse selection "protected" it from by never filling. The idealized model is not just "the same trades, better prices"; it is a structurally more-exposed strategy.
That is itself the point: the fill model doesn't just scale the result, it changes what strategy you are actually running. The strategy's apparent profitability is manufactured by the idealized fill assumption.
5. The Honest Counter-Example: SPXW
Idealized mid-fill is not a universal money button, and saying so is part of being honest. SPXW (SPX weeklies, daily-resolution via the Historical API), run with the haircut set to zero (pure daily-mid, 100% fill):
| SPXW | Honest (haircut $0.04) | Idealized (haircut $0) |
| --- | --- | --- |
| Trades | 112 | 130 |
| Win rate | 61.6% | 48.5% |
| CAGR % | -0.66 | -0.46 |
| Sharpe | -0.39 | -0.42 |
SPXW stays negative even idealized, because its constraint is a different artifact: at daily resolution the patient buy-to-close limit can never fill within two daily bars, so every exit is a forced spread-cross regardless of entry model. This is a clean reminder that "remove the fill gate and it prints money" is itself a fill-model claim. True for the 1-minute names, false for the daily-resolution one. The execution model dominates in both directions.
6. What Is Actually True About VRP Harvesting
The headline number of any VRP backtest is mostly a fill-model choice. On identical trades and signal, total return spans roughly -5% (honest, SPXW) to +71% (idealized, QQQ) purely on execution assumptions, and the same name (QQQ) moves -1.6% to +71%, Sharpe -0.08 to +1.05, on nothing but the fill model. Anyone showing a single VRP equity curve without a disclosed, calibrated fill model is showing you their assumption, not an edge.
The harm is adverse selection, not spread cost. Posting cheaper (mid) doesn't help and often hurts. Re-pricing a fixed trade set is always profitable (the +30% / +50% / +74% study). What kills the strategy is which trades a resting limit fills.
Idealized mid-fill, the public-backtest default, is exactly the assumption that erases adverse selection for free. That erasure is the entire backtested profit. It is not conservatively "a bit optimistic"; it is the difference between a losing strategy and one that returns +48% to +71% on SPY, QQQ, NVDA, and AMZN.
Neither extreme is the truth. Reality sits between instant-mid (too generous) and post-above-offer (worst-case adverse). The honest takeaway is a range, and the range straddles zero. A retail VRP seller's real result depends almost entirely on execution skill: how close to mid they can rest and still get filled without being run over, not on the signal.
The robust, fill-model-invariant facts survive everything: a stable 65% to 72% win rate, a roughly 1:2.3 win/loss payoff, and (separately documented) the signal's high-conviction risk_on regime underperforming its neutral regime. None of those make money by themselves; together they describe an insurance seller running at break-even-with-tails.
Adverse selection is the killer, not the spread. Backtests that assume mid-fills aren't modeling VRP. They're modeling a counterparty who never adversely selects you. That counterparty does not exist. The tradeable retail edge in this harvest is not in the signal: it is a wager on your own fill quality.
The volatility risk premium is real. The signal works in the sense that selling vol when vol is rich beats selling vol when it is cheap. But the harvest's reported P&L is dominated by what your fills actually look like, which is something a backtest can only assume.
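The "break-even-with-tails" description can be checked with one line of arithmetic. The sketch below assumes that the "roughly 1:2.3 win/loss payoff" means an average win of 1 unit against an average loss of 2.3 units; that reading is an interpretation, not something the study spells out, and it ignores costs and tail clustering.

```python
def breakeven_win_rate(loss_per_win: float) -> float:
    """Win rate at which expectancy is zero when each win pays 1 unit
    and each loss costs loss_per_win units."""
    return loss_per_win / (1.0 + loss_per_win)


def expectancy(win_rate: float, loss_per_win: float) -> float:
    """Expected P&L per trade, in units of the average win."""
    return win_rate * 1.0 - (1.0 - win_rate) * loss_per_win


LOSS_PER_WIN = 2.3  # assumed reading of the quoted 1:2.3 payoff

breakeven = breakeven_win_rate(LOSS_PER_WIN)
low = expectancy(0.65, LOSS_PER_WIN)   # bottom of the quoted win-rate range
high = expectancy(0.72, LOSS_PER_WIN)  # top of the quoted win-rate range

print(f"breakeven win rate: {breakeven:.1%}")
print(f"expectancy at 65%: {low:+.3f}   at 72%: {high:+.3f}")
```

Under that reading the breakeven win rate is just under 70%, which sits inside the quoted 65% to 72% range: expectancy is slightly negative at the bottom of the range and slightly positive at the top. That is consistent with the claim that these facts do not make money by themselves.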
7. Method, Honesty, and Limits
Engine change is non-destructive. --mid-fill is an opt-in flag; default behavior (honest ask_edge) is byte-for-byte unchanged. Honest artifacts (*_vrpfrozen_*) and idealized artifacts (*_midfill_*) are written side by side; nothing was overwritten.
The honest model itself is plausible, not calibrated. The fillsim SPEC explicitly states it has not been validated against real broker fills: "the single most important missing test." So ask_edge is a reasonable worst-case, not ground truth. This is precisely why the result is presented as a range bounded by two assumptions, not a verdict.
Leak-free signal, frozen config, no per-symbol refit, unchanged from the cross-sectional study. QQQ, IWM, AMZN, NVDA, SPXW remain genuine out-of-sample-in-symbol.
SPXW is daily-resolution and not execution-comparable to the 1-minute names; it is cited only for the directional point in the SPXW section above.
All runs final. Every figure is from a completed backtest; the aggregated tables are cross_symbol_summary.json (honest) and cross_symbol_summary_midfill.json (idealized).
Idealized does not equal "free money." It also runs 2 to 3x the trades at materially larger drawdowns and stays negative on SPXW. It is the optimistic bound, not a recommendation.
No commissions. Real costs make the honest end worse, widening the range further. They do not rescue the idealized end's realism.
How to Build Honest VRP Backtests
If you are running these experiments yourself (whether on local minute data or via our Historical API), the methodology below is the minimum bar for a backtest you can actually trust. The same discipline applies to any premium-selling strategy that rests limit orders.
Report a range, not a number.
Always show the result under at least two fill models: an optimistic bound (instant mid-fill, 100% fill rate) and a pessimistic bound (post-and-wait, fill only on a later bar where the market crosses your price). A single equity curve is a single assumption. Quote the bracket; let the reader live in it.
Hold the trade set fixed when comparing pricing conventions.
If you want to isolate "what does the spread cost me," replay the same 119 trades under different pricing rules. If you want to isolate "what does adverse selection cost me," run the gate on under different limit prices. Conflating these two experiments produces the false impression that paying less fixes the problem.
Treat fill rate as a load-bearing diagnostic.
A 19% fill rate is not noise: it is the simulator telling you that 81% of intended trades never happened, and that the 19% that did fill skew adversely. If your honest run has a 90%+ fill rate, you have probably broken the gate. If your idealized run has anything less than 100%, you have not actually built an idealized run.
Disclose the fill model in plain English in any published result.
"Mid-fill, instant, 100% fill rate" is a perfectly fine assumption to publish. So is "post above the offer, fill on cross." What is not fine is showing an equity curve and never naming the assumption. The number is the assumption.
Use leak-free signals and a frozen config.
None of the fill-model honesty matters if the signal peeks at the future or if you refit per symbol. We document the no-lookahead percentile machinery in historical VRP percentiles without lookahead. Freeze the config across symbols; that is the only way the cross-section is a real test.
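The checklist above can be turned into a small reporting helper that refuses to emit a single number. This is a sketch under stated assumptions: the `Pole` and `bracket_report` names are invented here, the 90% and 100% thresholds are taken from the fill-rate rule of thumb above, and the honest fill rate in the example is a placeholder borrowed from the SPY 2021 to 2024 gate table, not a QQQ figure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Pole:
    """One end of the bracket: a disclosed fill model and its result."""
    fill_model: str           # plain-English description, always published
    total_return_pct: float
    fill_rate: float          # fraction of intended trades that filled


def bracket_report(honest: Pole, idealized: Pole) -> Dict[str, object]:
    """Report both poles as a range and flag suspicious fill rates."""
    low, high = sorted((honest.total_return_pct, idealized.total_return_pct))
    warnings: List[str] = []
    if honest.fill_rate >= 0.90:
        warnings.append("honest fill rate >= 90%: the gate is probably broken")
    if idealized.fill_rate < 1.0:
        warnings.append("idealized fill rate < 100%: not an idealized run")
    return {
        "range_pct": (low, high),
        "straddles_zero": low < 0 < high,
        "fill_models": (honest.fill_model, idealized.fill_model),
        "warnings": warnings,
    }


# QQQ total returns from the tables above; the 0.19 fill rate is a placeholder.
report = bracket_report(
    Pole("post above the offer, fill on a later cross", -1.64, 0.19),
    Pole("instant fill at combo-mid, 100% fill rate", 70.67, 1.0),
)
broken = bracket_report(
    Pole("post above the offer, fill on a later cross", -1.64, 0.95),
    Pole("instant fill at combo-mid, 100% fill rate", 70.67, 1.0),
)
print(report)
```

Requiring two `Pole` objects at the call site is the design choice that matters: there is no way to produce a report without naming both fill models.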
Reproduction
The simulator and the aggregation scripts are designed so you can re-run both sides side by side on any of the six underlyings. The honest pole is the default; the idealized pole is a single flag.
Companion docs: PROVENANCE.md (methodology and caveats) and our sister article, VRP Short Put Spreads: Honest 7-Year Backtest, 5 Symbols. That article's framing of "the edge doesn't survive honest execution" is corrected by this article to the bounded execution-fragility thesis: the honest pole and the idealized pole are both real, and the truth is the bracket between them.
All sections final, completed runs as of 2026-05-18. Honest aggregate: cross_symbol_summary.json. Idealized aggregate: cross_symbol_summary_midfill.json.
Frequently Asked Questions
Q: Why does the same strategy produce such different results under different fill models?
Because the fill model decides which trades the strategy actually takes. The honest post-and-wait model only fills resting limits when the market crosses them, which biases fills toward trades that are already moving against the seller (adverse selection). The idealized mid-fill model takes every signal's top trade instantly at the midpoint, which removes that selection entirely. Same signal, same trades considered, completely different trade sets actually executed.
Q: Is adverse selection just another name for spread cost?
No, and that is the central finding of this study. The price you pay for a fill is the spread cost. The selection bias of which fills you get is adverse selection. Our gate-on sensitivity sweep shows that posting cheaper (at mid) actually hurts results even though the fill rate rises, because a mid limit collects less credit per fill and is still adversely selected. Re-pricing a fixed trade set is always profitable; changing which trades fill is what kills the strategy.
Q: Is the honest model what a real trader would actually get?
No. The honest ask_edge model is a reasonable worst-case bound, not ground truth. Its own SPEC notes it has not been calibrated against real broker fills. The point of this article is that the truth is a range: post-and-wait is the pessimistic pole, instant-mid is the optimistic pole, and a real trader's result depends on how close to mid they can actually rest a limit while still getting filled without being run over. That is execution skill, not signal skill.
Q: Why does SPXW stay negative even with idealized fills?
Because SPXW runs at daily resolution rather than 1-minute. At daily resolution the patient buy-to-close limit can never fill within the two-bar exit window, so every exit is a forced spread-cross regardless of how the entry was modeled. That is a different fill-model artifact, but it is still a fill-model artifact. The point reinforces the article's main thesis: the execution model dominates the result in both directions, optimistic and pessimistic.
Q: What survives regardless of the fill model?
Three things. A stable 65% to 72% win rate across symbols and fill models. A roughly 1:2.3 win/loss payoff (the classic insurance-seller distribution). And the signal's high-conviction risk_on regime underperforming its neutral regime, which is documented separately. Together those describe an insurance seller running at break-even-with-tails. None of them make money on their own; whether the harvest is profitable depends on execution.
The bars and analytics behind every run above are exposed through the Historical API. If you want to reproduce or extend this study (alternative fill models, additional symbols, different sizing), see pricing for API access and the replay article for an end-to-end walkthrough. The point of this study is not "buy our data." It is: whatever data you use, disclose the fill model and report the range.