VRP Short Put Spreads: Honest 7-Year Backtest, 5 Symbols (2026) | FlashAlpha

Single VRP short-put-spread strategy, tuned once on SPY, frozen, deployed unchanged on QQQ/IWM/AMZN/NVDA/SPXW with honest fills. 70% win rate held. Sharpe and edge did not.

Tomasz Dobrowolski Quant Engineer
May 20, 2026
49 min read
VRP VolatilityRiskPremium ShortPutSpread OptionsBacktest OptionsStrategy ExecutionRisk

⚠ Framing correction (read first). This article's thesis - "the edge doesn't survive honest execution" - is overstated. A follow-up controlled study, VRP Backtest: The Fill Model Is the Edge, shows the result is bounded by the fill-model assumption: honest fills give breakeven to negative (these numbers); idealized mid-fills (the universal public-backtest default) give strongly positive. The correct conclusion is execution-fragility and a result-range straddling zero, not "VRP is dead." The data below is accurate for the honest (ask_edge) model specifically; read it as the pessimistic bound, not the verdict.

Search "volatility risk premium strategy" and you will find a hundred posts with an equity curve going up and to the right and a win rate north of 80%. They are almost all built on three quiet lies: mid-price fills, clean profit-target exits, and parameters that were chosen on the same symbol they are advertised on. This study removes all three.

We took a single short-put-spread VRP harvest, tuned it once on SPY in prior work, froze every parameter, and ran it unchanged across QQQ, IWM, AMZN, and NVDA from 2019 to 2026. We then added SPXW as a flagship out-of-sample check via a completely separate daily-resolution data path. Execution was modeled the way a real account experiences it: post-and-wait limit orders, exits that have to cross the spread, a stale-quote guard, and a signal that can only see the past.

This is the most honest thing we can publish about VRP harvesting. The 70% win rate the internet sells is real. The risk-adjusted edge that survives honest execution and out-of-sample deployment is close to zero.

64-72%: win rate held across all 5 symbols - the VRP probability edge travels
3-14%: share of proposed orders actually filled under post-and-wait limits
2 of 4: out-of-sample symbols that lost money; SPXW (flagship) also lost

TL;DR - the cross-sectional table

Symbol Trades Win rate Profit factor Sharpe CAGR % MaxDD % Fill rate
SPY (tuned) 335 71.6% 1.10 0.23 1.05 6.48 13.7%
NVDA 201 72.1% 1.12 0.22 0.84 6.87 6.5%
AMZN 139 71.2% 1.08 0.13 0.42 7.38 3.4%
QQQ 162 69.8% 0.96 −0.08 −0.23 7.76 7.9%
IWM 58 63.8% 0.72 −0.35 −0.11 1.24 3.1%
SPXW † 112 61.6% 0.37 −0.39 −0.66 5.81 100% †

Period: 2019-02-01 to 2026-04-02 (7.16 years). $100k notional, unlevered Half-Kelly.
† SPXW is sourced from the Historical API at daily-close resolution, not 1-minute. Its fill rate is 100% by construction (it transacts at daily-close mid less the measured $0.04 haircut, not a post-and-wait limit). It is a valid out-of-sample direction confirmation, but its Sharpe and fill rate are not methodologically comparable to the 1-minute rows. See §5.7.

1. Why this study exists

The volatility risk premium is real. Implied volatility has, on average, exceeded subsequently realized volatility for decades. The economic rationale (insurance demand, dealer inventory) is sound. The open question was never "does VRP exist." It is: how much of that premium survives honest execution when you don't get to re-fit the strategy to every new market?

The genre of VRP content you find online almost always rests on three assumptions that real accounts never enjoy.

The hyped VRP backtest
  • Mid-price fills, instantly, every time
  • Profit targets and stops execute at the trigger price
  • One parameter set, one symbol, one window - in-sample
  • 100% fill rate (every proposed trade books)
  • Equity curve looks like a savings account chart
This study (honest reference)
  • Post-and-wait limit orders at ask_edge; most don't fill
  • Exits cross the spread when the limit doesn't catch
  • One frozen config tuned on SPY, deployed unchanged on 4 others
  • 3-14% fill rate (cross-sectional reality)
  • Adverse selection visible: fills cost ~$0.04 worse than mid

This study answers exactly that question, and nothing more. The takeaway is not that VRP is untradeable. The takeaway is that the honest, transferable, retail-executable edge is much smaller than the genre advertises - and the win rate is a misleading place to anchor your expectations.

2. The strategy (frozen specification)

A short put credit spread harvest on the underlying, gated by a market-regime VRP signal. Every number below was fixed before the cross-sectional run and never changed per symbol.

Structure: short put vertical spread

Defined risk; the canonical retail VRP expression. Sells a higher-strike put, buys a lower-strike put for protection. Maximum loss is the spread width minus the credit collected.

Short leg target: 0.10 delta

Roughly 90% out-of-the-money probability; far-wing premium. The wing is where insurance demand is densest and the empirical VRP signature is strongest.

DTE target: 14 days (±2)

The Calmar-leading tenor identified in prior SPY tuning. Long enough to collect meaningful theta, short enough that gamma exposure during the holding period is bounded.

Profit target / stop loss: 50% / 100% of credit

Close when the spread has decayed halfway (PT) or when the loss equals the credit taken (SL). Standard VRP management; not a tuned parameter.

Spread widths: 5 / 10 / 15 / 20 / 25 / 30 points

Candidate pool. On every entry bar, the engine evaluates all candidate widths and picks the one with the best expected value (delta-derived). EV is computed from observable greeks only.

Sizing: Half-Kelly, default 0.05, cap 0.25

The unlevered honest reference. Half-Kelly with a 0.25 cap deploys little capital per trade; absolute drawdowns and CAGR are correspondingly small. A leveraged variant exists but is explicitly not reported here because it has not been OOS-validated. See §6.

Entry time: 10:05 ET

Post-open noise gone, spreads stabilized. The first thirty-five minutes of the regular session are intentionally excluded.

Regime gate: vrp_signal_v2.csv

One SPY/VIX market-regime signal, applied to all five symbols. Outputs one of four states - risk_on, neutral, reduce, risk_off - and a continuous Kelly multiplier. risk_off means no trade that day.

Circuit breaker: 30% peak-to-trough

Stop trading if drawdown breaches 30%. Never triggered in this study, but present as a safety floor.
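
The structure, profit-target, and stop-loss entries above reduce to a few lines of arithmetic. The sketch below is illustrative only and is not the engine's code: the names `SpreadLevels` and `spread_levels`, and the example credit of $0.60 on a 5-point spread, are hypothetical. It also assumes that "the loss equals the credit taken" means buying the spread back at twice the entry credit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpreadLevels:
    max_loss: float   # dollars per spread if both legs finish in the money
    pt_debit: float   # buy-back price (per share) that triggers the profit target
    sl_debit: float   # buy-back price (per share) that triggers the stop loss


def spread_levels(width, credit, pt=0.50, sl=1.00, multiplier=100):
    """Derive exit levels for a short put vertical from its width and entry credit."""
    if not 0 < credit < width:
        raise ValueError("credit must be positive and smaller than the spread width")
    max_loss = (width - credit) * multiplier   # width minus credit, in dollars
    pt_debit = credit * (1.0 - pt)             # half of the credit has decayed away
    # Assumption: a stop at 100% of credit means buying back at credit * (1 + sl),
    # capped at the width because the spread can never be worth more than that.
    sl_debit = min(width, credit * (1.0 + sl))
    return SpreadLevels(max_loss, pt_debit, sl_debit)


levels = spread_levels(width=5.0, credit=0.60)   # hypothetical 5-point spread
print(levels)
```

With these example inputs the stop sits at a $60 loss while the maximum loss is $440, which is why a gap through the stop is the scenario that matters for a short-vol payoff.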

The signal in detail. A daily market-regime classifier built from VIX, VIX9D, VVIX, the VIX term structure, a realized-vs-implied VRP proxy and its 252-day percentile, high-yield spreads, and stress z-scores. Crucially, this is a market-wide signal computed from SPY/VIX. Applying the identical signal to QQQ, IWM, AMZN, and NVDA is what makes those four a genuine cross-sectional out-of-sample test rather than four separately fitted strategies. The signal construction is leak-free by design - for the methodology see Historical VRP Percentiles Without Lookahead Bias.

3. The honesty controls (what makes this citable)

This is the entire point of the study. Each control below removes a specific way backtests lie.

  1. Leak-free signal

    The signal CSV is shifted forward one calendar day with a 7-day walk-back lookup. Any trading day's decision uses only data fully observable before that day's open. There is no way for the strategy to "know" today's VRP when deciding to trade today.

  2. Honest fills (post-and-wait limits)

    We do not fill at mid. We post a limit at the offer plus a few cents (ask_edge) and wait for someone else's order to cross our price. If nobody does within the wait window, the order is cancelled and we re-rank. This is why the fill rate is 3-14% and not 100%.

  3. Stale-quote guard

    When a cross is detected, we re-check that the mid hasn't moved through our limit by more than a floor (−$0.05). A one-tick bid blip during a vol spike does not get to manufacture a phantom fill.

  4. Patient-then-cross exits

    Profit-target and stop-loss exits are not free. We post a buy-to-close limit at the trigger and wait a few bars. If it fills, great. If it doesn't, we cross the spread and pay the offer - and that exit is tagged pt_x / sl_x so the execution tax is auditable. It is large. See §5.3 and §5.5.

  5. EV-blind tiebreak

    When multiple candidate spreads cross on the same bar, the winner is chosen by a timestamp-seeded random shuffle - never by which one had the best expected value. Any EV-aware tiebreak is a look-ahead oracle that silently inflates win rates.

  6. Frozen, never re-fit

    The configuration in §2 was tuned once, on SPY, in prior work. For this study it was frozen. QQQ, IWM, AMZN, and NVDA received zero per-symbol optimization. What you see on those four is what a trader who tuned on SPY in 2019 and walked away would actually have experienced.
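
Control 1 (the leak-free signal) can be sketched with nothing but dates. This is a minimal illustration under stated assumptions, not the production loader: the function names, the in-memory dict standing in for vrp_signal_v2.csv, and the reading of "7-day walk-back" as "try the trade date, then up to seven earlier calendar days" are all ours.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def shift_forward(raw: Dict[date, str]) -> Dict[date, str]:
    """Re-stamp each signal one calendar day later than the day it was computed."""
    return {d + timedelta(days=1): state for d, state in raw.items()}


def lookup(shifted: Dict[date, str], trade_day: date, max_walk_back: int = 7) -> Optional[str]:
    """Return the most recent shifted signal at or before trade_day, within the window."""
    for back in range(max_walk_back + 1):
        key = trade_day - timedelta(days=back)
        if key in shifted:
            return shifted[key]
    return None   # nothing recent enough: treat as "no signal"


# Hypothetical signal states computed on a Friday and the following Monday.
raw = {date(2024, 3, 8): "risk_off", date(2024, 3, 11): "risk_on"}
shifted = shift_forward(raw)

print(lookup(shifted, date(2024, 3, 11)))   # Monday's decision sees Friday's signal
print(lookup(shifted, date(2024, 3, 12)))   # Tuesday's decision sees Monday's signal
```

The shift is what prevents Monday's trade from reading Monday's own end-of-day signal; the walk-back only exists to bridge weekends and holidays.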

The execution logic above is identical to the open-sourced fillsim simulator we extracted from this exact backtester, precisely because every off-the-shelf options framework we evaluated assumed mid-fills. For deeper treatment of the fill-model sensitivity, see the companion piece VRP Backtest: The Fill Model Is the Edge.
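
To make controls 2 and 3 concrete, here is a simplified sketch of a post-and-wait entry with a stale-quote guard. It is not the fillsim API: `Quote`, `post_and_wait`, the `wait_bars` default, and the cross condition (the combo bid reaching our sell limit) are assumptions made for illustration. Only the −$0.05 floor comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Quote:
    bid: float   # combo credit bid for the spread
    ask: float   # combo credit ask for the spread

    @property
    def mid(self) -> float:
        return (self.bid + self.ask) / 2.0


def post_and_wait(limit: float, bars: Sequence[Quote],
                  wait_bars: int = 5, stale_floor: float = -0.05) -> Optional[float]:
    """Return the edge captured versus mid if the sell limit fills, else None."""
    for quote in bars[:wait_bars]:
        if quote.bid >= limit:              # someone is willing to pay our price
            edge = limit - quote.mid        # <= 0 whenever the bid has reached us
            if edge >= stale_floor:         # mid has not run far through our limit
                return edge                 # filled at the limit
            # Otherwise treat the cross as a stale or blip quote and keep waiting.
    return None                             # wait window expired: order cancelled


filled = post_and_wait(0.60, [Quote(0.50, 0.62), Quote(0.60, 0.68)])
rejected = post_and_wait(0.60, [Quote(0.70, 0.90)])
print(filled, rejected)
```

Under these assumptions a fill can only happen once the market has moved to our price, so the captured edge is never positive. That is one plausible mechanical reading of the adverse selection measured in §5.4.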

Methodology transparency

Every number in this article is reproducible from the engine commands in §8 against the raw artifacts in the Historical Analytics API replay infrastructure. For practitioners who want to run their own variant, the Historical API exposes the same option chains and greeks the SPXW row used.

4. Data

Field Value
Symbols SPY, QQQ, IWM, AMZN, NVDA (1-minute local chains); SPXW (daily-close via Historical API)
Period 2019-02-01 to 2026-04-02 (7.16 years)
Option chains 1-minute bid/ask + greeks, per symbol
Spot 1-minute underlying tape
Signal inputs EOD macro (VIX complex, HY spreads)
Starting capital $100,000, unlevered Half-Kelly

All five intraday symbols carry a dense weekly/daily expiry ladder, so the 14-DTE entry window is continuously reachable. SPXW is handled separately (§5.7): its 1-minute weekly chain was not available locally, so it was sourced from the FlashAlpha Historical API at daily-close resolution and is reported as an out-of-sample direction check, not a sixth execution-comparable row. The reasons minute-level options analytics are this hard to assemble are catalogued in Why Historical Options Analytics Are Rare.

5. Findings

5.1 The win rate is stable. That is the headline everyone stops at.

Across five very different underlyings - a broad-market ETF, a tech ETF, a small-cap ETF, and two megacap single names - the win rate clusters tightly at 64-72%. The VRP edge in probability terms is robust and travels well cross-sectionally. If you stop reading here, VRP harvesting looks like a free lunch. This is exactly where most articles stop.

5.2 The risk-adjusted edge does not survive honest, frozen, out-of-sample deployment.

Symbol Sharpe Profit factor CAGR % Verdict
SPY (in-sample symbol) 0.23 1.10 1.05 Marginally positive - on the symbol it was tuned on
NVDA 0.22 1.12 0.84 Marginally positive
AMZN 0.13 1.08 0.42 Barely positive
QQQ −0.08 0.96 −0.23 Negative
IWM −0.35 0.72 −0.11 Clearly negative

Two of four out-of-sample symbols lost money. The best non-tuned result (NVDA, Sharpe 0.22) is still a Sharpe you would not trade. The premium is real; the tradeable, honest, transferable edge is close to zero.

5.3 Why a 70% win rate nets to nothing: the payoff is brutally asymmetric.

Symbol Avg win Avg loss Win:loss ratio
SPY $369 −$850 1:2.3
QQQ $311 −$751 1:2.4
AMZN $407 −$933 1:2.3
NVDA $385 −$888 1:2.3
IWM $53 −$130 1:2.4

Expected value of a 70% / 1:2.3 short-vol payoff, in units of the average win:
$$ E[\text{trade}] = 0.70 \times 1 - 0.30 \times 2.3 = 0.70 - 0.69 = +0.01 \approx 0 $$

A 70% win rate at a 1:2.3 win/loss ratio is, by arithmetic, a coin flip. This is the defining feature of short-vol: you win small, often, and lose big, occasionally - and it is precisely why the win rate is the wrong number to look at. The strategy is selling insurance; the premium and the claims very nearly cancel.
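
The same arithmetic can be run per symbol. The sketch below uses only the rounded win rates from the TL;DR table and the rounded average win and loss from the table above; the function names are ours, and the results will differ slightly from Appendix A because the inputs are rounded.

```python
def expected_value(win_rate, avg_win, avg_loss):
    """Expected dollars per trade; avg_loss is passed as a positive magnitude."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_win_rate(avg_win, avg_loss):
    """Win rate at which expected value is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# (win rate, avg win $, avg loss $) from the tables in this article
rows = {
    "SPY":  (0.716, 369, 850),
    "QQQ":  (0.698, 311, 751),
    "AMZN": (0.712, 407, 933),
    "NVDA": (0.721, 385, 888),
    "IWM":  (0.638, 53, 130),
}

for symbol, (p, win, loss) in rows.items():
    ev = expected_value(p, win, loss)
    p_star = breakeven_win_rate(win, loss)
    print(f"{symbol}: EV/trade ${ev:+.2f}, breakeven win rate {p_star:.1%}")
```

The breakeven win rate for a 1:2.3 payoff is just under 70%, so every symbol's result hinges on whether its realized win rate lands a point or two above or below that line.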

5.4 Only 3-14% of orders filled. And the fills were adversely selected.

Symbol Proposed Filled Fill rate Avg edge captured
SPY 2,438 335 13.7% −$0.037
QQQ 2,055 162 7.9% −$0.042
NVDA 3,072 201 6.5% −$0.044
AMZN 4,114 139 3.4% −$0.040
IWM 1,896 58 3.1% −$0.039

A backtest assuming mid-fills books all of the proposed trades at a better price. Reality fills a single-digit-to-low-teens percentage of them, and the ones that do fill cross at roughly 4 cents worse than mid - because the orders that fill are disproportionately the ones the market is running through (adverse selection). Per spread, on 100-multiplier contracts, that is a structural ~$4 headwind on every fill before the trade even begins.
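
The fill rates and the per-spread headwind follow directly from the table. A short sketch, with hypothetical helper names and the standard 100-share multiplier:

```python
# symbol: (proposed orders, filled orders, avg edge captured in $ per share)
fills = {
    "SPY":  (2438, 335, -0.037),
    "QQQ":  (2055, 162, -0.042),
    "NVDA": (3072, 201, -0.044),
    "AMZN": (4114, 139, -0.040),
    "IWM":  (1896, 58, -0.039),
}


def fill_rate(proposed, filled):
    """Fraction of proposed orders that actually filled."""
    return filled / proposed


def headwind_per_spread(edge, multiplier=100):
    """Dollar cost versus mid for one spread at the given per-share edge."""
    return edge * multiplier


for symbol, (proposed, filled, edge) in fills.items():
    rate = fill_rate(proposed, filled)
    cost = headwind_per_spread(edge)
    print(f"{symbol}: fill rate {rate:.1%}, headwind ${cost:.2f} per spread")
```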

5.5 The "profit target" is mostly a forced spread cross.

Close-reason breakdown (pt = clean limit fill at target; pt_x = target hit but had to cross the spread to actually get out):

Symbol pt (clean) pt_x (crossed) sl sl_x expiry
SPY 103 133 63 32 4
QQQ 29 76 37 12 8
AMZN 44 47 25 14 9
NVDA 68 71 43 13 6
IWM 12 21 7 13 5

On every symbol, more profit-target exits required crossing the spread than filled cleanly at the limit. The idealized "close at the 50% target" that naive backtests book is, in practice, the minority outcome. This single effect - invisible in any mid-fill backtest - is a primary driver of the gap between the seductive win rate and the unimpressive Sharpe.
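
The patient-then-cross exit and its pt / pt_x tagging can be sketched as follows. This is an assumption-laden illustration rather than the engine's logic: the function name, the three-bar wait, the passive-fill condition (the ask falling to our buy limit), and crossing at the last bar of the wait window are all hypothetical choices.

```python
from typing import List, Tuple


def patient_then_cross(trigger_debit: float, bars: List[Tuple[float, float]],
                       reason: str = "pt", wait_bars: int = 3) -> Tuple[str, float]:
    """Try to buy to close at the trigger; if the limit never fills, pay the ask.

    bars holds (bid, ask) debit quotes for the spread after the trigger fires.
    Returns (close_reason, exit_price).
    """
    if not bars:
        raise ValueError("need at least one quote after the trigger")
    window = bars[:wait_bars]
    for _bid, ask in window:
        if ask <= trigger_debit:          # the offer came down to our limit
            return reason, trigger_debit  # clean fill at the target
    _bid, ask = window[-1]
    return reason + "_x", ask             # forced cross: pay the offer


tag, price = patient_then_cross(0.30, [(0.28, 0.36), (0.27, 0.35), (0.29, 0.37)])
execution_tax = (price - 0.30) * 100      # dollars per spread lost to the cross
print(tag, price, round(execution_tax, 2))
```

In this hypothetical example the target is hit on paper at $0.30 but the position actually closes at $0.37, which is the kind of gap a mid-fill backtest never records.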

5.6 Year by year: the signal helps in crises, and the "full-risk" regime doesn't earn its name.

Net P&L by entry year (selected):

Symbol 2020 (COVID) 2022 (bear/vol) 2023 2024
SPY +$4,443 +$3,692 +$1,258 −$172
QQQ +$2,226 −$3,212 −$1,982 −$70
NVDA +$2,369 −$4,290 +$1,832 −$1,896
AMZN n/a +$776 +$4,632 −$3,550

The signal's risk_off gate genuinely cushioned 2020 (it sidestepped the worst of the COVID crash and re-entered into the rebound). But 2022 - a slow grinding vol-elevated bear - punished QQQ and NVDA hard even with the gate (AMZN stayed slightly positive), because the strategy still traded on neutral/risk_on days while drift went against short puts.

The most uncomfortable finding: the signal's risk_on ("deploy full size") regime did not outperform neutral. In fact it underperformed on most symbols:

Symbol neutral P&L (win%) risk_on P&L (win%)
SPY +$5,604 (73%) +$2,174 (70%)
QQQ −$3,000 (69%) +$1,364 (71%)
NVDA +$5,918 (75%) +$266 (69%)
AMZN +$4,669 (70%) −$1,644 (73%)
IWM +$145 (72%) −$910 (60%)

Regime conviction does not equal forward edge. The regime the signal is most confident about (risk_on, full Kelly) carried less edge than the middling neutral state on four of five symbols. A signal that sizes up into its highest-conviction state and earns less there is a signal whose conviction is, at best, uncorrelated with forward edge - a critical caveat for anyone tempted to lever the risk_on bucket.
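
The "four of five" count can be checked directly against the regime table. The snippet below is a tally of the published totals, not a significance test; note that these are total dollars per regime, and per-regime trade counts for the 1-minute symbols are not given in the article, so per-trade comparisons are not possible from this table alone.

```python
# symbol: (neutral P&L $, risk_on P&L $) from the regime table above
regime_pnl = {
    "SPY":  (5604, 2174),
    "QQQ":  (-3000, 1364),
    "NVDA": (5918, 266),
    "AMZN": (4669, -1644),
    "IWM":  (145, -910),
}

# Symbols where the high-conviction regime earned less than the neutral regime
underperformers = [s for s, (neutral, risk_on) in regime_pnl.items() if risk_on < neutral]

neutral_total = sum(neutral for neutral, _ in regime_pnl.values())
risk_on_total = sum(risk_on for _, risk_on in regime_pnl.values())

print(underperformers)
print(f"neutral total ${neutral_total:,}, risk_on total ${risk_on_total:,}")
```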

5.7 The flagship instrument, fully out-of-sample, confirms it

SPXW - SPX weeklies, the single most-traded options complex in the world and the canonical home of the VRP trade - is the cleanest out-of-sample test available: the strategy was never tuned on it, and it had to be sourced from a different data path entirely (the FlashAlpha Historical API, at daily-close resolution, because no 1-minute SPX weekly chain was available locally).

It lost money too.

Metric SPXW (daily-resolution, OOS)
Trades 112 over 7.16 yr
Win rate 61.6%
Profit factor 0.37
Sharpe −0.39
CAGR −0.66%
Avg win / avg loss $39 / −$170 (1 : 4.4)
risk_on regime 95 trades, −$5,266, 60% win
neutral regime 17 trades, +$625, 71% win

Two things matter here. First, the same "risk_on underperforms" pattern from §5.6 reappears independently - on SPX weeklies, sourced from a different feed, the "deploy full size" regime is again where all the losses are: risk_on lost $5,266, neutral gained $625, and the two net to the −$4,641 total. A pattern that survives a complete change of instrument and data pipeline is not noise. Second, the win/loss asymmetry is even worse (1:4.4) because narrow 5-point SPX spreads collect tiny credits against a wide max loss - the short-vol payoff shape in its most unforgiving form.
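
One way to see how unforgiving that shape is: restate the asymmetry as a breakeven win rate. The short derivation below uses only the rounded averages from the table, with W for the average win and L for the magnitude of the average loss (symbols introduced here for illustration).

```latex
$$ p^{*} = \frac{L}{W + L}, \qquad
   p^{*}_{\text{SPXW}} = \frac{170}{39 + 170} \approx 0.813, \qquad
   p^{*}_{1:2.3} = \frac{2.3}{1 + 2.3} \approx 0.697 $$

$$ E[\text{trade}]_{\text{SPXW}} \approx 0.616 \times 39 - 0.384 \times 170 \approx -\$41 $$
```

A 1:4.4 payoff needs a win rate above roughly 81% to break even; SPXW delivered 61.6%. The rounded inputs reproduce the −$41.44 average P&L per trade in Appendix A to within rounding.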

Read the caveat. SPXW is not execution-comparable to the five 1-minute symbols and the article does not treat it as such:

  • It is daily-close resolution: one 16:00 chain snapshot per trading day.
  • Because the 1-minute post-and-wait ask_edge mechanic does not exist at daily bars, SPXW transacts at daily-close combo-mid less the $0.04/contract adverse-selection haircut empirically measured on the five 1-minute symbols (§5.4). That is a principled bridge to the same execution cost, but it means SPXW's fill rate is 100% by construction and its exits are all forced crosses (pt_x/sl_x), so its fill rate and Sharpe magnitude are not apples-to-apples with the 1-minute rows.

SPXW is therefore cited as a directional out-of-sample confirmation - "the frozen strategy also loses on the flagship instrument, and the high-conviction-regime failure reproduces on an independent data path" - and never as a sixth execution-comparable Sharpe.

6. What this means

  1. VRP is real; the free lunch is not.

    The premium shows up reliably as a high win rate everywhere. It does not reliably show up as money once you (a) pay realistic execution and (b) refuse to re-fit per symbol.

  2. The win rate is the most misleading number in options content.

    A 70% win rate with a 1:2.3 payoff is structurally breakeven. Any post leading with win rate and not showing the loss distribution is selling you the setup, not the result.

  3. Execution is the strategy.

    The single biggest gap between the hyped version and this one is not the signal - it is fills. Mid-fill assumptions, clean-target exits, and 100% fill rates are where the fictional returns live. See the companion piece VRP Backtest: The Fill Model Is the Edge for the controlled fill-model sensitivity study.

  4. Out-of-sample is brutal and necessary.

    SPY (tuned) posted the best Sharpe and CAGR. Every symbol the strategy had never seen did worse on both measures, and two of them lost money. Honest deployment is the non-tuned result, not the tuned one.

  5. Don't lever the confidence.

    The risk_on regime - the one a leveraged variant would size into hardest - was the weaker regime. This is the precise mechanism by which "+5,000% backtest" leveraged VRP posts blow up.

This is not an argument that VRP is untradeable. It is an argument that the honest, transferable, retail-executable edge is small, regime-fragile, and nowhere near what the genre advertises - and that anyone trading it should size for breakeven-with-tails, not for the win rate.

7. Limitations and caveats (read before citing)

  • Out-of-sample in symbol, not in time. The signal's thresholds were originally chosen on a 2018-2026 SPY window. Deploying the frozen config on four other symbols is genuine cross-sectional OOS, but it is not walk-forward-in-time OOS. We do not claim it is.
  • This is the unlevered honest reference. Half-Kelly, default 0.05, cap 0.25 deploys little capital, so absolute CAGR and drawdowns are small (CAGR within roughly ±1%, max drawdown 1-8%). A leveraged variant of this exact strategy shows a far larger in-sample CAGR - that figure is explicitly not reported here because it is in-sample and not OOS-validated. Do not cite a leveraged number against this study.
  • Sharpe is trade-level (per-trade Sharpe × √trades-per-year), internally consistent across these rows but not directly comparable to a daily-return buy-and-hold Sharpe.
  • IWM is statistically thin (58 trades / 7 yr; 3.1% fill rate). Treat its negative Sharpe as directional, not conclusive.
  • SPXW is a daily-resolution, out-of-sample direction check, not an execution-comparable row. It is sourced from the FlashAlpha Historical API at daily-close cadence, transacts at mid-less-measured-haircut (100% fill by construction), and its exits are all forced crosses. Cite it for "the frozen strategy also loses on the flagship instrument and the regime pattern reproduces," never for a comparable Sharpe or fill rate. Full treatment in §5.7.
  • No commissions/fees are modeled; a real account's results would be worse, not better. This strengthens, not weakens, the central finding.
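
The trade-level Sharpe convention in the caveat above (per-trade Sharpe × √trades-per-year) can be sketched as follows. The function name, the use of the sample standard deviation, and the absence of a risk-free adjustment are assumptions; the article does not specify those details.

```python
import math
import statistics


def trade_level_sharpe(trade_returns, years):
    """Annualize a per-trade Sharpe by the square root of trades per year."""
    if len(trade_returns) < 2 or years <= 0:
        raise ValueError("need at least two trades and a positive span in years")
    mean = statistics.fmean(trade_returns)
    stdev = statistics.stdev(trade_returns)   # sample standard deviation (assumption)
    if stdev == 0:
        raise ValueError("per-trade returns have zero variance")
    per_trade_sharpe = mean / stdev
    trades_per_year = len(trade_returns) / years
    return per_trade_sharpe * math.sqrt(trades_per_year)


# Hypothetical per-trade returns: seven small wins and three larger losses.
sample = [0.004] * 7 + [-0.009] * 3
print(round(trade_level_sharpe(sample, years=0.25), 3))
```

Read backwards, SPY's annualized 0.231 at 46.8 trades per year implies a per-trade Sharpe of about 0.034, which is another way of saying the per-trade edge is small relative to per-trade noise.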

8. Reproduction

Everything in this article regenerates from three commands:

bash run_cross_symbol.sh                              # 5 symbols, 1-min local chains
python spxw_api_backtest.py --start 2019-02-01 \
       --end 2026-04-02 --label vrpfrozen             # SPXW, daily-close via API (§5.7)
python aggregate_cross_symbol.py                      # cross_symbol_summary.{csv,json}

Per-symbol engine invocation (identical except --symbol):

python intraday_bt_ev_rank.py --symbol <SYM> \
  --start 2019-02-01 --end 2026-04-02 \
  --delta 0.10 --dte 14 --pt 0.50 --sl 1.0 \
  --vrp-signal vrp_signal_v2.csv \
  --kelly-default 0.05 --kelly-mult 0.5 --vrp-on-mult 1.0 \
  --kelly-max 0.25 --max-drawdown 0.30
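
Since the invocation is identical except for --symbol, the per-symbol commands can be generated from a single frozen argument list. The contents of run_cross_symbol.sh are not shown in this article, so the sketch below is a guess at an equivalent loop; it only builds and prints the commands, and does not execute them.

```python
import shlex

SYMBOLS = ["SPY", "QQQ", "IWM", "AMZN", "NVDA"]

# Frozen arguments copied from the engine invocation above; never varied per symbol.
FROZEN_ARGS = [
    "--start", "2019-02-01", "--end", "2026-04-02",
    "--delta", "0.10", "--dte", "14", "--pt", "0.50", "--sl", "1.0",
    "--vrp-signal", "vrp_signal_v2.csv",
    "--kelly-default", "0.05", "--kelly-mult", "0.5", "--vrp-on-mult", "1.0",
    "--kelly-max", "0.25", "--max-drawdown", "0.30",
]


def engine_command(symbol):
    """Build the argv list for one symbol using the shared frozen arguments."""
    return ["python", "intraday_bt_ev_rank.py", "--symbol", symbol, *FROZEN_ARGS]


for sym in SYMBOLS:
    print(shlex.join(engine_command(sym)))
```

Keeping the arguments in one shared list makes it structurally hard to introduce a per-symbol tweak by accident, which is the property the frozen-config control depends on.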

Raw artifacts: evrank_<sym>_vrpfrozen_{summary.json,trades.csv.gz,equity_curve.csv}. The SPXW row consumed the same Historical API surface a paying user would call. See the Historical API documentation and our broader replay infrastructure write-up: Historical Options Analytics: Replay GEX, VRP, Dealer Positioning.

Appendix A: Full metrics table

Metric SPY QQQ IWM AMZN NVDA SPXW †
Trades 335 162 58 139 201 112
Trades / yr 46.8 22.6 8.1 19.4 28.1 15.6
Win rate 71.6% 69.8% 63.8% 71.2% 72.1% 61.6%
Profit factor 1.096 0.956 0.719 1.081 1.124 0.366
Sharpe (ann.) 0.231 −0.078 −0.347 0.126 0.221 −0.387
CAGR % 1.05 −0.23 −0.11 0.42 0.84 −0.66
Total return % 7.78 −1.64 −0.77 3.02 6.19 −4.64
Max drawdown % 6.48 7.76 1.24 7.38 6.87 5.81
Calmar 0.162 −0.030 −0.087 0.056 0.122 −0.114
Avg P&L / trade $23.22 −$10.10 −$13.19 $21.76 $30.77 −$41.44
Avg win / Avg loss $369/−$850 $311/−$751 $53/−$130 $407/−$933 $385/−$888 $39/−$170
Fill rate 13.7% 7.9% 3.1% 3.4% 6.5% 100% †
Avg edge captured −$0.037 −$0.042 −$0.039 −$0.040 −$0.044 n/a †
Avg days held 2.8 3.6 3.9 4.0 3.6 n/a

Generated 2026-05-17 from the frozen-config cross-sectional run. $100k start, unlevered.
† SPXW: daily-close resolution via the Historical API; 100% fill by construction (transacts at mid less the measured $0.04 haircut, not a post-and-wait limit); exits are all forced crosses so PT-hit rate is structurally 0. Out-of-sample direction check only - see §5.7. The other five are 1-minute MM-fill from local chains.
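
Several rows in this table are derived from others, and the relationships are easy to check. The sketch below assumes the conventional definitions (geometric CAGR from total return over 7.16 years, Calmar as CAGR divided by max drawdown); the article does not state its exact formulas, so treat this as a consistency check rather than the engine's calculation.

```python
YEARS = 7.16   # 2019-02-01 to 2026-04-02, as stated in the article


def trades_per_year(trades, years=YEARS):
    return trades / years


def cagr_pct(total_return_pct, years=YEARS):
    """Geometric annual growth rate, in percent, from a total return in percent."""
    return ((1.0 + total_return_pct / 100.0) ** (1.0 / years) - 1.0) * 100.0


def calmar(cagr, max_drawdown_pct):
    """CAGR divided by maximum drawdown (both in percent)."""
    return cagr / max_drawdown_pct


# SPY row: 335 trades, 7.78% total return, 6.48% max drawdown
print(round(trades_per_year(335), 1))
print(round(cagr_pct(7.78), 2))
print(round(calmar(1.05, 6.48), 3))
```

Under these definitions the SPY, NVDA, and SPXW rows reproduce to the published precision, which suggests the table is internally consistent.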

Frequently Asked Questions

Why does a 70% win rate net to roughly zero?
Because the payoff is asymmetric. Average win is roughly $370, average loss is roughly $850 - a 1:2.3 ratio. At 70% wins and 30% losses, expected value per trade is 0.70 × 1 minus 0.30 × 2.3, which is approximately zero. Short-vol strategies win small often and lose big occasionally; the win rate alone does not capture the loss distribution.

What is an honest fill model?
An honest fill model posts a limit order at or near the bid/ask and only books a fill when an actual market order crosses that limit. In this study, only 3 to 14 percent of proposed orders filled, and the ones that did filled at roughly four cents worse than mid because the orders that get hit are disproportionately the ones the market is running through (adverse selection). Mid-fill assumptions, the public-backtest default, book all proposed trades at a better price than reality allows.

Why was the configuration frozen rather than optimized per symbol?
Per-symbol optimization manufactures in-sample edge that does not transfer. The point of the study was to estimate the edge a trader would actually experience: tune once on one symbol, freeze, deploy unchanged on others. Anything else is fitting four separate strategies and reporting the union, which is how most public VRP backtests inflate their results.

What does the risk_on underperformance mean?
On four of five 1-minute symbols, the risk_on regime - where the signal sizes up to full Kelly - earned less than the middling neutral state. The same pattern reproduces on SPXW from a completely separate data feed. A regime gate whose highest-conviction state is not its highest-edge state is signaling that conviction does not equal forward edge. Practically: do not lever a strategy that relies on this signal being right when it is most confident.

Does this study show that VRP is dead?
No. The volatility risk premium is real and economically grounded. What the study shows is that under honest post-and-wait execution and a frozen out-of-sample configuration, the risk-adjusted edge is close to zero on five 1-minute underlyings and the SPXW direction check. The follow-up companion piece on fill-model sensitivity shows that idealized mid-fills produce strongly positive results, so the honest result is best read as the pessimistic bound of a range that straddles zero - not as a verdict against VRP.

Run your own honest backtest

The FlashAlpha Historical API exposes the option chains, greeks, and underlying tape used in this study so you can run your own variant under your own fill model. The companion piece VRP Backtest: The Fill Model Is the Edge walks through the controlled fill-model comparison. See pricing for access tiers.
