VRP Short Put Spreads: Honest 7-Year Backtest, 5 Symbols (2026)
Single VRP short-put-spread strategy, tuned once on SPY, frozen, deployed unchanged on QQQ/IWM/AMZN/NVDA/SPXW with honest fills. 70% win rate held. Sharpe and edge did not.
⚠ Framing correction (read first). This article's thesis - "the edge doesn't survive honest execution" - is overstated. A follow-up controlled study, VRP Backtest: The Fill Model Is the Edge, shows the result is bounded by the fill-model assumption: honest fills give breakeven to negative (these numbers); idealized mid-fills (the universal public-backtest default) give strongly positive. The correct conclusion is execution-fragility and a result-range straddling zero, not "VRP is dead." The data below is accurate for the honest (ask_edge) model specifically; read it as the pessimistic bound, not the verdict.
Search "volatility risk premium strategy" and you will find a hundred posts with an equity curve going up and to the right and a win rate north of 80%. They are almost all built on three quiet lies: mid-price fills, clean profit-target exits, and parameters that were chosen on the same symbol they are advertised on. This study removes all three.
We took a single short-put-spread VRP harvest, tuned it once on SPY in prior work, froze every parameter, and ran it unchanged across QQQ, IWM, AMZN, and NVDA from 2019 to 2026. We then added SPXW as a flagship out-of-sample check via a completely separate daily-resolution data path. Execution was modeled the way a real account experiences it: post-and-wait limit orders, exits that have to cross the spread, a stale-quote guard, and a signal that can only see the past.
This is the most honest thing we can publish about VRP harvesting. The 70% win rate the internet sells is real. The risk-adjusted edge that survives honest execution and out-of-sample deployment is close to zero.
| Symbol | Trades | Win rate | Profit factor | Sharpe | CAGR % | MaxDD % | Fill rate |
|---|---|---|---|---|---|---|---|
| SPY (tuned) | 335 | 71.6% | 1.10 | 0.23 | 1.05 | 6.48 | 13.7% |
| NVDA | 201 | 72.1% | 1.12 | 0.22 | 0.84 | 6.87 | 6.5% |
| AMZN | 139 | 71.2% | 1.08 | 0.13 | 0.42 | 7.38 | 3.4% |
| QQQ | 162 | 69.8% | 0.96 | −0.08 | −0.23 | 7.76 | 7.9% |
| IWM | 58 | 63.8% | 0.72 | −0.35 | −0.11 | 1.24 | 3.1% |
| SPXW † | 112 | 61.6% | 0.37 | −0.39 | −0.66 | 5.81 | 100% † |
Period: 2019-02-01 to 2026-04-02 (7.16 years). $100k notional, unlevered Half-Kelly.
† SPXW is sourced from the Historical API at daily-close resolution, not 1-minute. Its fill rate is 100% by construction (it transacts at daily-close mid less the measured −$0.04 haircut, not a post-and-wait limit). It is a valid out-of-sample direction confirmation but its Sharpe and fill rate are not methodologically comparable to the 1-minute rows. See §5.7.
The volatility risk premium is real. Implied volatility has, on average, exceeded subsequently realized volatility for decades. The economic rationale (insurance demand, dealer inventory) is sound. The open question was never "does VRP exist." It is: how much of that premium survives honest execution when you don't get to re-fit the strategy to every new market?
The genre of VRP content you find online almost always rests on three assumptions that real accounts never enjoy.
ask_edge; most don't fill

This study answers exactly that question, and nothing more. The takeaway is not that VRP is untradeable. The takeaway is that the honest, transferable, retail-executable edge is much smaller than the genre advertises - and the win rate is a misleading place to anchor your expectations.
A short put credit spread harvest on the underlying, gated by a market-regime VRP signal. Every number below was fixed before the cross-sectional run and never changed per symbol.
Defined risk; the canonical retail VRP expression. Sells a higher-strike put, buys a lower-strike put for protection. Maximum loss is the spread width minus the credit collected.
Roughly 90% out-of-the-money probability; far-wing premium. The wing is where insurance demand is densest and the empirical VRP signature is strongest.
The Calmar-leading tenor identified in prior SPY tuning. Long enough to collect meaningful theta, short enough that gamma exposure during the holding period is bounded.
Close when the spread has decayed halfway (PT) or when the loss equals the credit taken (SL). Standard VRP management; not a tuned parameter.
Candidate pool. On every entry bar, the engine evaluates all candidate widths and picks the one with the best expected value (delta-derived). EV is computed from observable greeks only.
The unlevered honest reference. Half-Kelly with a 0.25 cap deploys little capital per trade; absolute drawdowns and CAGR are correspondingly small. A leveraged variant exists but is explicitly not reported here because it has not been OOS-validated. See §6.
Post-open noise gone, spreads stabilized. The first thirty-five minutes of the regular session are intentionally excluded.
Signal file: vrp_signal_v2.csv. One SPY/VIX market-regime signal, applied to all five symbols. Outputs one of four states - risk_on, neutral, reduce, risk_off - and a continuous Kelly multiplier. risk_off means no trade that day.
Stop trading if drawdown breaches 30%. Never triggered in this study, but present as a safety floor.
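The payoff and exit rules above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 0.50 and 1.0 defaults mirror the --pt and --sl flags shown in the engine invocation, and the example width and credit are made-up numbers rather than values from the study.

```python
def max_loss(width: float, credit: float, multiplier: int = 100) -> float:
    """Worst-case dollar loss for one short put credit spread."""
    return (width - credit) * multiplier


def exit_reason(credit: float, debit_to_close: float,
                pt: float = 0.50, sl: float = 1.0):
    """Return 'pt', 'sl', or None for an open spread.

    credit         -- premium collected at entry (per share)
    debit_to_close -- current cost to buy the spread back (per share)
    """
    pnl = credit - debit_to_close
    if pnl >= pt * credit:       # spread has decayed by at least half
        return "pt"
    if pnl <= -sl * credit:      # loss has reached the credit taken
        return "sl"
    return None


# Hypothetical 5-point spread sold for a 0.50 credit.
risk = max_loss(width=5.0, credit=0.50)
states = [exit_reason(0.50, d) for d in (0.25, 1.00, 0.40)]
print(risk, states)
```

Note that these functions only detect the trigger. Whether the exit then fills at the trigger price is a separate execution question.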
The signal in detail. A daily market-regime classifier built from VIX, VIX9D, VVIX, the VIX term structure, a realized-vs-implied VRP proxy and its 252-day percentile, high-yield spreads, and stress z-scores. Crucially, this is a market-wide signal computed from SPY/VIX. Applying the identical signal to QQQ, IWM, AMZN, and NVDA is what makes those four a genuine cross-sectional out-of-sample test rather than four separately fitted strategies. The signal construction is leak-free by design - for the methodology see Historical VRP Percentiles Without Lookahead Bias.
This is the entire point of the study. Each control below removes a specific way backtests lie.
The signal CSV is shifted forward one calendar day with a 7-day walk-back lookup. Any trading day's decision uses only data fully observable before that day's open. There is no way for the strategy to "know" today's VRP when deciding to trade today.
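One way to picture the shift-plus-walk-back lookup is the minimal sketch below. It assumes the signal is held as a dict keyed by the date each row was computed; the function name, the dict layout, and the exact reading of "7-day walk-back" (up to seven further calendar days behind the shifted date) are assumptions for illustration, not the engine's actual code.

```python
from datetime import date, timedelta


def signal_for(trade_day, signal_by_date, shift_days=1, walk_back_days=7):
    """Return the newest signal row dated strictly before trade_day.

    A row computed on day D only becomes usable on D + shift_days, so the
    same-day row is never visible. If the shifted date has no row (weekend,
    holiday), walk back up to walk_back_days further calendar days.
    """
    for back in range(walk_back_days + 1):
        source_day = trade_day - timedelta(days=shift_days + back)
        if source_day in signal_by_date:
            return signal_by_date[source_day]
    return None  # nothing recent enough; the caller decides what to do


# Hypothetical rows: Friday 2024-03-01 and Monday 2024-03-04.
rows = {date(2024, 3, 1): "neutral", date(2024, 3, 4): "risk_off"}

monday = signal_for(date(2024, 3, 4), rows)    # sees Friday's row only
tuesday = signal_for(date(2024, 3, 5), rows)   # now sees Monday's row
stale = signal_for(date(2024, 3, 20), rows)    # beyond the walk-back window
print(monday, tuesday, stale)
```

The property that matters is the first call: Monday's decision cannot read Monday's row even though it exists in the file.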
We do not fill at mid. We post a limit at the offer plus a few cents (ask_edge) and wait for someone else's order to cross our price. If nobody does within the wait window, the order is cancelled and we re-rank. This is why the fill rate is 3-14% and not 100%.
When a cross is detected, we re-check that the mid hasn't moved through our limit by more than a floor (−$0.05). A one-tick bid blip during a vol spike does not get to manufacture a phantom fill.
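The post-and-wait entry and the stale-quote guard can be sketched as follows. This is a simplified illustration under stated assumptions, not the fillsim implementation: it treats the entry as a resting sell limit on the spread, calls a bar a "cross" when the spread bid reaches the limit, and measures edge as limit minus mid at that bar. The Quote class, try_fill, and the wait_bars default are all hypothetical; only the −$0.05 floor comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Quote:
    bid: float
    ask: float

    @property
    def mid(self) -> float:
        return (self.bid + self.ask) / 2


def try_fill(limit: float, quotes: Sequence[Quote], wait_bars: int = 5,
             edge_floor: float = -0.05) -> Optional[Tuple[int, float]]:
    """Return (bar_index, edge_vs_mid) for the first accepted fill, else None."""
    for i, q in enumerate(quotes[:wait_bars]):
        if q.bid >= limit:                 # someone is bidding at our price
            edge = limit - q.mid           # negative: we sold below mid
            if edge >= edge_floor:         # guard: reject blips far from mid
                return i, edge
            # cross rejected as a likely phantom; keep waiting
    return None                            # window expired: cancel and re-rank


filled = try_fill(1.00, [Quote(0.90, 1.10), Quote(1.00, 1.06)])
phantom = try_fill(1.00, [Quote(1.00, 1.30)])
unfilled = try_fill(1.00, [Quote(0.90, 1.10), Quote(0.92, 1.08)])
print(filled, phantom, unfilled)
```

In this toy run the accepted fill lands a few cents below mid, which is the same sign as the negative "avg edge captured" figures reported later.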
Profit-target and stop-loss exits are not free. We post a buy-to-close limit at the trigger and wait a few bars. If it fills, great. If it doesn't, we cross the spread and pay the offer - and that exit is tagged pt_x / sl_x so the execution tax is auditable. It is large. See §5.3 and §5.5.
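The exit rule described above can be sketched in a few lines. The assumptions are loud ones: quotes are (bid, ask) pairs for the cost to buy the spread back, the resting limit is considered filled when the ask comes down to the trigger, and a failed wait crosses at the ask of the last bar waited. The function name and the wait_bars default are hypothetical.

```python
def close_spread(trigger_debit, quotes, reason, wait_bars=3):
    """Try a resting buy-to-close limit, then cross the spread if it fails.

    Returns (tag, price_paid). The tag is the bare reason ('pt' or 'sl')
    for a clean limit fill and reason + '_x' when we had to pay the offer.
    """
    window = list(quotes[:wait_bars])
    if not window:
        raise ValueError("need at least one quote to attempt an exit")
    for bid, ask in window:
        if ask <= trigger_debit:          # offer reached our limit: clean fill
            return reason, trigger_debit
    return reason + "_x", window[-1][1]   # give up waiting, pay the ask


clean = close_spread(0.25, [(0.20, 0.30), (0.18, 0.25)], "pt")
crossed = close_spread(0.25, [(0.20, 0.30), (0.22, 0.32), (0.24, 0.34)], "pt")
tax_per_spread = (crossed[1] - 0.25) * 100   # extra dollars paid versus target
print(clean, crossed, round(tax_per_spread, 2))
```

Tagging the crossed exits separately is what lets the execution cost be audited after the fact instead of disappearing into average P&L.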
When multiple candidate spreads cross on the same bar, the winner is chosen by a timestamp-seeded random shuffle - never by which one had the best expected value. Any EV-aware tiebreak is a look-ahead oracle that silently inflates win rates.
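A timestamp-seeded tiebreak might look like the sketch below. The candidate labels and the function name are hypothetical, and seeding from the bar's epoch second is one plausible choice rather than the engine's documented one. The candidates are sorted first so the pick depends only on the bar time and the set of candidates, never on their EV ranking or arrival order.

```python
import random
from datetime import datetime, timezone


def pick_winner(crossed_candidates, bar_time):
    """Choose one crossed candidate without consulting expected value."""
    ordered = sorted(crossed_candidates)              # canonical order
    rng = random.Random(int(bar_time.timestamp()))    # reproducible per bar
    rng.shuffle(ordered)
    return ordered[0]


bar = datetime(2024, 3, 4, 15, 5, tzinfo=timezone.utc)
first = pick_winner(["width_10", "width_2", "width_5"], bar)
again = pick_winner(["width_5", "width_10", "width_2"], bar)
print(first, again)
```

Because the seed is derived from the bar, a rerun of the backtest reproduces the same picks, which keeps results auditable without letting EV leak into the selection.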
The configuration in §2 was tuned once, on SPY, in prior work. For this study it was frozen. QQQ, IWM, AMZN, and NVDA received zero per-symbol optimization. What you see on those four is what a trader who tuned on SPY in 2019 and walked away would actually have experienced.
The execution logic above is identical to the open-sourced fillsim simulator we extracted from this exact backtester, precisely because every off-the-shelf options framework we evaluated assumed mid-fills. For deeper treatment of the fill-model sensitivity, see the companion piece VRP Backtest: The Fill Model Is the Edge.
Every number in this article is reproducible from the engine commands in §8 against the raw artifacts in the Historical Analytics API replay infrastructure. For practitioners who want to run their own variant, the Historical API exposes the same option chains and greeks the SPXW row used.
| Field | Value |
|---|---|
| Symbols | SPY, QQQ, IWM, AMZN, NVDA (1-minute local chains); SPXW (daily-close via Historical API) |
| Period | 2019-02-01 to 2026-04-02 (7.16 years) |
| Option chains | 1-minute bid/ask + greeks, per symbol |
| Spot | 1-minute underlying tape |
| Signal inputs | EOD macro (VIX complex, HY spreads) |
| Starting capital | $100,000, unlevered Half-Kelly |
All five intraday symbols carry a dense weekly/daily expiry ladder, so the 14-DTE entry window is continuously reachable. SPXW is handled separately (§5.7): its 1-minute weekly chain was not available locally, so it was sourced from the FlashAlpha Historical API at daily-close resolution and is reported as an out-of-sample direction check, not a sixth execution-comparable row. The reasons minute-level options analytics are this hard to assemble are catalogued in Why Historical Options Analytics Are Rare.
Across five very different underlyings - a broad-market ETF, a tech ETF, a small-cap ETF, and two megacap single names - the win rate clusters tightly at 64-72%. The VRP edge in probability terms is robust and travels well cross-sectionally. If you stop reading here, VRP harvesting looks like a free lunch. This is exactly where most articles stop.
| Symbol | Sharpe | Profit factor | CAGR % | Verdict |
|---|---|---|---|---|
| SPY (in-sample symbol) | 0.23 | 1.10 | 1.05 | Marginally positive - on the symbol it was tuned on |
| NVDA | 0.22 | 1.12 | 0.84 | Marginally positive |
| AMZN | 0.13 | 1.08 | 0.42 | Barely positive |
| QQQ | −0.08 | 0.96 | −0.23 | Negative |
| IWM | −0.35 | 0.72 | −0.11 | Clearly negative |
Two of four out-of-sample symbols lost money. The best non-tuned result (NVDA, Sharpe 0.22) is still a Sharpe you would not trade. The premium is real; the tradeable, honest, transferable edge is close to zero.
| Symbol | Avg win | Avg loss | Win:loss ratio |
|---|---|---|---|
| SPY | $369 | −$850 | 1 : 2.3 |
| QQQ | $311 | −$751 | 1 : 2.4 |
| AMZN | $407 | −$933 | 1 : 2.3 |
| NVDA | $385 | −$888 | 1 : 2.3 |
| IWM | $53 | −$130 | 1 : 2.4 |
A 70% win rate at a 1:2.3 win/loss ratio is, by arithmetic, a coin flip. This is the defining feature of short-vol: you win small, often, and lose big, occasionally - and it is precisely why the win rate is the wrong number to look at. The strategy is selling insurance; the premium and the claims very nearly cancel.
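The "coin flip" claim can be checked directly from the table. The sketch below uses only the rounded win rates and average win/loss figures reported in this article; the function names are illustrative. The breakeven win rate is the win rate at which wins and losses exactly cancel.

```python
def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is zero for the given payoff sizes."""
    loss = abs(avg_loss)
    return loss / (avg_win + loss)


def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected dollar P&L per trade."""
    return win_rate * avg_win - (1 - win_rate) * abs(avg_loss)


# symbol: (win rate, avg win, avg loss) as reported in this article
rows = {
    "SPY":  (0.716, 369, -850),
    "QQQ":  (0.698, 311, -751),
    "AMZN": (0.712, 407, -933),
    "NVDA": (0.721, 385, -888),
    "IWM":  (0.638, 53, -130),
}

for sym, (wr, win, loss) in rows.items():
    be = breakeven_win_rate(win, loss)
    ev = expectancy(wr, win, loss)
    print(f"{sym}: breakeven {be:.1%}, actual {wr:.1%}, EV/trade ${ev:.2f}")
```

Every symbol needs roughly a 70% win rate just to break even, so the realized 64-72% leaves a margin of a point or two in either direction. Because the inputs are rounded, the per-trade expectancy computed this way lands within about a dollar of the reported average P&L per trade rather than matching it exactly.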
| Symbol | Proposed | Filled | Fill rate | Avg edge captured |
|---|---|---|---|---|
| SPY | 2,438 | 335 | 13.7% | −$0.037 |
| QQQ | 2,055 | 162 | 7.9% | −$0.042 |
| NVDA | 3,072 | 201 | 6.5% | −$0.044 |
| AMZN | 4,114 | 139 | 3.4% | −$0.040 |
| IWM | 1,896 | 58 | 3.1% | −$0.039 |
A backtest assuming mid-fills books all of the proposed trades at a better price. Reality fills a single-digit-to-low-teens percentage of them, and the ones that do fill cross at roughly 4 cents worse than mid - because the orders that fill are disproportionately the ones the market is running through (adverse selection). Per spread, on 100-multiplier contracts, that is a structural ~$4 headwind on every fill before the trade even begins.
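The fill rates and the per-spread headwind follow from the table by simple division. The sketch below recomputes them from the reported counts; the variable names are illustrative and the 100 multiplier is the standard equity-option contract size mentioned above.

```python
MULTIPLIER = 100  # shares per standard equity option contract

# symbol: (proposed, filled, avg edge captured in $/share)
fills = {
    "SPY":  (2438, 335, -0.037),
    "QQQ":  (2055, 162, -0.042),
    "NVDA": (3072, 201, -0.044),
    "AMZN": (4114, 139, -0.040),
    "IWM":  (1896, 58, -0.039),
}

fill_rate_pct = {s: 100 * f / p for s, (p, f, _) in fills.items()}
headwind_usd = {s: e * MULTIPLIER for s, (_, _, e) in fills.items()}
mean_headwind = sum(headwind_usd.values()) / len(headwind_usd)

for s in fills:
    print(f"{s}: fill rate {fill_rate_pct[s]:.1f}%, headwind ${headwind_usd[s]:.2f}")
print(f"mean headwind per filled spread: ${mean_headwind:.2f}")
```

The simple mean across the five symbols comes out at roughly four dollars per filled spread, which is where the "~$4" figure comes from.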
Close-reason breakdown (pt = clean limit fill at target; pt_x = target hit but had to cross the spread to actually get out):
| Symbol | pt (clean) | pt_x (crossed) | sl | sl_x | expiry |
|---|---|---|---|---|---|
| SPY | 103 | 133 | 63 | 32 | 4 |
| QQQ | 29 | 76 | 37 | 12 | 8 |
| AMZN | 44 | 47 | 25 | 14 | 9 |
| NVDA | 68 | 71 | 43 | 13 | 6 |
| IWM | 12 | 21 | 7 | 13 | 5 |
On every symbol, more profit-target exits required crossing the spread than filled cleanly at the limit. The idealized "close at the 50% target" that naive backtests book is, in practice, the minority outcome. This single effect - invisible in any mid-fill backtest - is a primary driver of the gap between the seductive win rate and the unimpressive Sharpe.
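To put a number on "the minority outcome", the close-reason counts above can be turned into the share of profit-target exits that had to cross the spread. The sketch uses only the counts in the table; the dictionary layout is illustrative.

```python
# symbol: (pt, pt_x, sl, sl_x, expiry) close-reason counts from the table
closes = {
    "SPY":  (103, 133, 63, 32, 4),
    "QQQ":  (29, 76, 37, 12, 8),
    "AMZN": (44, 47, 25, 14, 9),
    "NVDA": (68, 71, 43, 13, 6),
    "IWM":  (12, 21, 7, 13, 5),
}

crossed_share = {
    sym: pt_x / (pt + pt_x)
    for sym, (pt, pt_x, _sl, _sl_x, _exp) in closes.items()
}
totals = {sym: sum(counts) for sym, counts in closes.items()}

for sym, share in crossed_share.items():
    print(f"{sym}: {share:.1%} of target exits crossed the spread "
          f"({totals[sym]} trades total)")
```

The row totals reproduce the trade counts in the headline table, which is a useful sanity check that no exit category is missing.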
Net P&L by entry year (selected):
| Symbol | 2020 (COVID) | 2022 (bear/vol) | 2023 | 2024 |
|---|---|---|---|---|
| SPY | +$4,443 | +$3,692 | +$1,258 | −$172 |
| QQQ | +$2,226 | −$3,212 | −$1,982 | −$70 |
| NVDA | +$2,369 | −$4,290 | +$1,832 | −$1,896 |
| AMZN | n/a | +$776 | +$4,632 | −$3,550 |
The signal's risk_off gate genuinely cushioned 2020 (it sidestepped the worst of the COVID crash and re-entered into the rebound). But 2022 - a slow grinding vol-elevated bear - punished the single names hard even with the gate, because the strategy still traded on neutral/risk_on days while drift went against short puts.
The most uncomfortable finding: the signal's risk_on ("deploy full size") regime did not outperform neutral. In fact it underperformed on most symbols:
| Symbol | neutral P&L (win%) | risk_on P&L (win%) |
|---|---|---|
| SPY | +$5,604 (73%) | +$2,174 (70%) |
| QQQ | −$3,000 (69%) | +$1,364 (71%) |
| NVDA | +$5,918 (75%) | +$266 (69%) |
| AMZN | +$4,669 (70%) | −$1,644 (73%) |
| IWM | +$145 (72%) | −$910 (60%) |
Regime conviction does not equal forward edge. The regime the signal is most confident about (risk_on, full Kelly) carried less edge than the middling neutral state on four of five symbols. A signal that sizes up into its highest-conviction state and earns less there is a signal whose conviction is, at best, uncorrelated with forward edge - a critical caveat for anyone tempted to lever the risk_on bucket.
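The "four of five" count is easy to verify from the regime table. The sketch below simply compares the two P&L columns per symbol; the names are illustrative and the figures are the ones reported above.

```python
# symbol: (neutral P&L, risk_on P&L) in dollars, from the regime table
regime_pnl = {
    "SPY":  (5604, 2174),
    "QQQ":  (-3000, 1364),
    "NVDA": (5918, 266),
    "AMZN": (4669, -1644),
    "IWM":  (145, -910),
}

risk_on_weaker = [s for s, (neutral, risk_on) in regime_pnl.items()
                  if risk_on < neutral]
neutral_total = sum(n for n, _ in regime_pnl.values())
risk_on_total = sum(r for _, r in regime_pnl.values())

print(f"risk_on earned less than neutral on {len(risk_on_weaker)} of "
      f"{len(regime_pnl)} symbols: {risk_on_weaker}")
print(f"summed P&L: neutral ${neutral_total:,}, risk_on ${risk_on_total:,}")
```

QQQ is the lone exception. Summing dollars across symbols is only a rough aggregate, since the table does not report how many trades fell into each regime per symbol.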
SPXW - SPX weeklies, the single most-traded options complex in the world and the canonical home of the VRP trade - is the cleanest out-of-sample test available: the strategy was never tuned on it, and it had to be sourced from a different data path entirely (the FlashAlpha Historical API, at daily-close resolution, because no 1-minute SPX weekly chain was available locally).
It lost money too.
| Metric | SPXW (daily-resolution, OOS) |
|---|---|
| Trades | 112 over 7.16 yr |
| Win rate | 61.6% |
| Profit factor | 0.37 |
| Sharpe | −0.39 |
| CAGR | −0.66% |
| Avg win / avg loss | $39 / −$170 (1 : 4.4) |
| risk_on regime | 95 trades, −$5,266, 60% win |
| neutral regime | 17 trades, +$625, 71% win |
Two things matter here. First, the same "risk_on underperforms" pattern from §5.6 reappears independently - on SPX weeklies, sourced from a different feed, the "deploy full size" regime is again where all the losses are (risk_on lost $5,266 against a net loss of $4,641; neutral was net positive at +$625). A pattern that survives a complete change of instrument and data pipeline is not noise. Second, the win/loss asymmetry is even worse (1:4.4) because narrow 5-point SPX spreads collect tiny credits against a wide max loss - the short-vol payoff shape in its most unforgiving form.
Read the caveat. SPXW is not execution-comparable to the five 1-minute symbols and the article does not treat it as such:
Because the post-and-wait ask_edge mechanic does not exist at daily bars, SPXW transacts at daily-close combo-mid adjusted by the −$0.04/contract adverse-selection haircut empirically measured on the five 1-minute symbols (§5.4). That is a principled bridge to the same execution cost, but it means SPXW's fill rate is 100% by construction and its exits are all forced crosses (pt_x/sl_x), so its fill rate and Sharpe magnitude are not apples-to-apples with the 1-minute rows.

SPXW is therefore cited as a directional out-of-sample confirmation - "the frozen strategy also loses on the flagship instrument, and the high-conviction-regime failure reproduces on an independent data path" - and never as a sixth execution-comparable Sharpe.
The premium shows up reliably as a high win rate everywhere. It does not reliably show up as money once you (a) pay realistic execution and (b) refuse to re-fit per symbol.
A 70% win rate with a 1:2.3 payoff is structurally breakeven. Any post leading with win rate and not showing the loss distribution is selling you the setup, not the result.
The single biggest gap between the hyped version and this one is not the signal - it is fills. Mid-fill assumptions, clean-target exits, and 100% fill rates are where the fictional returns live. See the companion piece VRP Backtest: The Fill Model Is the Edge for the controlled fill-model sensitivity study.
SPY (tuned) was the best result. Every symbol the strategy had never seen did worse, two of them losing. Honest deployment is the non-tuned result, not the tuned one.
The risk_on regime - the one a leveraged variant would size into hardest - was the weaker regime. This is the precise mechanism by which "+5,000% backtest" leveraged VRP posts blow up.
This is not an argument that VRP is untradeable. It is an argument that the honest, transferable, retail-executable edge is small, regime-fragile, and nowhere near what the genre advertises - and that anyone trading it should size for breakeven-with-tails, not for the win rate.
Everything in this article regenerates from three commands:
bash run_cross_symbol.sh # 5 symbols, 1-min local chains
python spxw_api_backtest.py --start 2019-02-01 \
--end 2026-04-02 --label vrpfrozen # SPXW, daily-close via API (§5.7)
python aggregate_cross_symbol.py # cross_symbol_summary.{csv,json}
Per-symbol engine invocation (identical except --symbol):
python intraday_bt_ev_rank.py --symbol <SYM> \
--start 2019-02-01 --end 2026-04-02 \
--delta 0.10 --dte 14 --pt 0.50 --sl 1.0 \
--vrp-signal vrp_signal_v2.csv \
--kelly-default 0.05 --kelly-mult 0.5 --vrp-on-mult 1.0 \
--kelly-max 0.25 --max-drawdown 0.30
Raw artifacts: evrank_<sym>_vrpfrozen_{summary.json,trades.csv.gz,equity_curve.csv}. The SPXW row consumed the same Historical API surface a paying user would call. See the Historical API documentation and our broader replay infrastructure write-up: Historical Options Analytics: Replay GEX, VRP, Dealer Positioning.
| Metric | SPY | QQQ | IWM | AMZN | NVDA | SPXW † |
|---|---|---|---|---|---|---|
| Trades | 335 | 162 | 58 | 139 | 201 | 112 |
| Trades / yr | 46.8 | 22.6 | 8.1 | 19.4 | 28.1 | 15.6 |
| Win rate | 71.6% | 69.8% | 63.8% | 71.2% | 72.1% | 61.6% |
| Profit factor | 1.096 | 0.956 | 0.719 | 1.081 | 1.124 | 0.366 |
| Sharpe (ann.) | 0.231 | −0.078 | −0.347 | 0.126 | 0.221 | −0.387 |
| CAGR % | 1.05 | −0.23 | −0.11 | 0.42 | 0.84 | −0.66 |
| Total return % | 7.78 | −1.64 | −0.77 | 3.02 | 6.19 | −4.64 |
| Max drawdown % | 6.48 | 7.76 | 1.24 | 7.38 | 6.87 | 5.81 |
| Calmar | 0.162 | −0.030 | −0.087 | 0.056 | 0.122 | −0.114 |
| Avg P&L / trade | $23.22 | −$10.10 | −$13.19 | $21.76 | $30.77 | −$41.44 |
| Avg win / Avg loss | $369 / −$850 | $311 / −$751 | $53 / −$130 | $407 / −$933 | $385 / −$888 | $39 / −$170 |
| Fill rate | 13.7% | 7.9% | 3.1% | 3.4% | 6.5% | 100% † |
| Avg edge captured | −$0.037 | −$0.042 | −$0.039 | −$0.040 | −$0.044 | n/a † |
| Avg days held | 2.8 | 3.6 | 3.9 | 4.0 | 3.6 | n/a |
Generated 2026-05-17 from the frozen-config cross-sectional run. $100k start, unlevered.
† SPXW: daily-close resolution via the Historical API; 100% fill by construction (transacts at mid less the −$0.04 measured haircut, not a post-and-wait limit); exits are all forced crosses so PT-hit rate is structurally 0. Out-of-sample direction check only - see §5.7. The other five are 1-minute MM-fill from local chains.
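The CAGR and Calmar rows can be cross-checked against the total-return and drawdown rows. The sketch below assumes the conventional definitions (CAGR as the annualized compound rate over the 7.16-year window, Calmar as CAGR divided by maximum drawdown); the article does not spell these out, so treat them as assumptions. The inputs are the rounded values from the table.

```python
YEARS = 7.16

# symbol: (total return %, max drawdown %, reported CAGR %, reported Calmar)
table = {
    "SPY":  (7.78, 6.48, 1.05, 0.162),
    "QQQ":  (-1.64, 7.76, -0.23, -0.030),
    "IWM":  (-0.77, 1.24, -0.11, -0.087),
    "AMZN": (3.02, 7.38, 0.42, 0.056),
    "NVDA": (6.19, 6.87, 0.84, 0.122),
    "SPXW": (-4.64, 5.81, -0.66, -0.114),
}


def cagr_pct(total_return_pct: float, years: float = YEARS) -> float:
    """Annualized compound growth rate, in percent."""
    return ((1 + total_return_pct / 100) ** (1 / years) - 1) * 100


def calmar(cagr: float, max_drawdown_pct: float) -> float:
    """CAGR divided by maximum drawdown (both in percent)."""
    return cagr / max_drawdown_pct


for sym, (tr, dd, rep_cagr, rep_calmar) in table.items():
    c = cagr_pct(tr)
    print(f"{sym}: CAGR {c:.2f}% (reported {rep_cagr}), "
          f"Calmar {calmar(c, dd):.3f} (reported {rep_calmar})")
```

Under these definitions the recomputed CAGR matches the reported value to two decimals on every row, and Calmar agrees to within rounding of the inputs.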
The risk_on regime - where the signal sizes up to full Kelly - earned less than the middling neutral state. The same pattern reproduces on SPXW from a completely separate data feed. A regime gate whose highest-conviction state is not its highest-edge state is signaling that conviction does not equal forward edge. Practically: do not lever a strategy that relies on this signal being right when it is most confident.

The FlashAlpha Historical API exposes the option chains, greeks, and underlying tape used in this study so you can run your own variant under your own fill model. The companion piece VRP Backtest: The Fill Model Is the Edge walks through the controlled fill-model comparison. See pricing for access tiers.
by Tomasz Dobrowolski