GEX Backtest: 8 Years of SPY Show Gamma Exposure Mostly Tracks VIX
A pre-registered GEX backtest on 1,972 SPY days (2018-2026) finds gamma exposure works raw, but GEX/DEX/VEX/CHEX add little after VIX and ATM IV controls.
If you have searched for "GEX backtest", "does gamma exposure work", "gamma exposure accuracy", or "GEX vs VIX", this is the test you probably wanted but rarely get from dealer-positioning marketing.
Most options dashboards present four dealer-exposure Greeks as four separate signals: GEX (gamma exposure), DEX (delta exposure), VEX (vanna exposure), and CHEX (charm exposure). The pitch is intuitive: dealers hedge, those hedges create flows, and those flows should forecast volatility, returns, or IV changes.
The mechanical story is real. The predictive story is much thinner.
We wrote down the hypotheses before running the statistics, then tested them on an 8-year SPY panel. Features are measured at close t; outcomes are shifted forward, so the backtest does not use future information. The key control is simple: if a dealer-exposure metric is useful, it should add information beyond volatility variables a trader can already observe, especially VIX and ATM IV.
GEX has a real raw relationship with next-day realized volatility, and it still has modest incremental value over VIX alone. But once ATM IV is added to the control set, GEX goes quiet. DEX is dead on arrival. VEX is mostly a VIX/IV proxy. CHEX is weak and not robust in the residualized rank tests. Sold as four independent signals, the exposure stack behaves more like one-and-a-half dimensions.
The dataset is one row per SPY trading day. Each row carries dealer-signed GEX, DEX, VEX, CHEX, gamma flip, VIX, ATM IV, realized volatility, and pre-computed forward outcomes. Positive exposure means dealers are net long that Greek under this convention.
| Item | Backtest setting |
|---|---|
| Universe | SPY only |
| Window | 2018-04-16 to 2026-04-02 |
| Sample | 1,972 EOD snapshots; 1,971 usable next-day outcome rows |
| Primary signals | GEX, DEX, VEX, CHEX |
| Primary outcomes | Next-day realized vol, next-day return, next-day ATM IV change |
| Controls | VIX, then VIX + ATM IV |
| Primary tests | Quintile sorts, top-minus-bottom t-tests, Spearman rank correlation |
The central question is not whether dealer exposures are mechanically meaningful. They are. The question is narrower and more useful: do the exposure Greeks add predictive information after you already know VIX and ATM IV?
Start with the classic GEX claim: positive dealer gamma suppresses realized volatility; negative dealer gamma amplifies it. We sorted every day into quintiles by same-session net_gex, then measured next-day realized vol as |log return| * sqrt(252).
What "quintile" means here. Take the 1,971 trading days with a usable next-day outcome, rank them by GEX from most negative to most positive, and cut them into five near-equal groups of 394 or 395 days each. Q1 = the 20% of days with the most negative GEX (dealers most short gamma). Q5 = the 20% of days with the most positive GEX (dealers most long gamma). Q2, Q3, Q4 are the groups in between. Same idea for VIX quintiles later: V1 = calmest 20% of days, V5 = most stressed 20%.
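The outcome definition and the bucketing rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's script: the function names are hypothetical, the multiplication by 100 is an assumption made only to express the result in percent as the tables do, and the rank-based bucketing rule is one reasonable choice among several.

```python
import math

def next_day_rv(closes):
    """Next-day realized vol in annualized percent: |log return| * sqrt(252) * 100.

    Element t uses closes[t] and closes[t + 1], so the feature at close t is
    paired with an outcome observed one day later. The last day has no outcome.
    """
    out = []
    for t in range(len(closes) - 1):
        log_ret = math.log(closes[t + 1] / closes[t])
        out.append(abs(log_ret) * math.sqrt(252) * 100.0)
    out.append(None)  # no next-day close for the final row
    return out

def quintile_labels(values):
    """Label each value 1..5 by rank (1 = lowest), keeping buckets near-equal."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

# A 1% up-move from 100 to 101 maps to roughly 15.8 annualized vol points.
rv = next_day_rv([100.0, 101.0])

# Bucket sizes for 1,971 distinct values under this rule.
labels = quintile_labels(list(range(1971)))
sizes = [labels.count(q) for q in range(1, 6)]
```

With 1,971 rows this rule yields one bucket of 395 and four of 394, the same split as the `n` column in the quintile tables, although the article does not state which bucketing rule was actually used.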
| GEX quintile | n | Mean net_gex ($B) | Mean next-day RV (%) | Median next-day RV (%) |
|---|---|---|---|---|
| Q1 - most negative GEX | 395 | -7.91 | 16.97 | 13.54 |
| Q2 - moderately negative | 394 | -2.79 | 18.57 | 12.91 |
| Q3 - roughly neutral | 394 | +0.25 | 12.71 | 10.29 |
| Q4 - moderately positive | 394 | +2.97 | 9.24 | 6.45 |
| Q5 - most positive GEX | 394 | +6.50 | 6.34 | 4.86 |
The headline number is hard to ignore: Q5 minus Q1 is -10.63 vol points, t = -13.00, p = 1.0e-33. Spearman ρ is -0.36, p = 4.6e-60 on n=1,971.
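For readers who want to see how a top-minus-bottom statistic of this kind is formed, here is a minimal sketch. It assumes a Welch-style (unequal variance) t-statistic; the article only says "t-tests", so the exact variant is an assumption, and the function name and toy inputs are hypothetical.

```python
import math
from statistics import mean, variance

def top_minus_bottom(top, bottom):
    """Return (difference in means, Welch t-statistic) for two outcome samples."""
    diff = mean(top) - mean(bottom)
    # Standard error without assuming equal variances in the two buckets.
    se = math.sqrt(variance(top) / len(top) + variance(bottom) / len(bottom))
    return diff, diff / se

# Toy inputs only; the real inputs would be next-day RV for Q5 and Q1 days.
diff, t_stat = top_minus_bottom([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

# The published difference follows directly from the table means.
published_diff = round(6.34 - 16.97, 2)
```

The last line only re-derives the headline -10.63 from the Q5 and Q1 means in the table; the t-statistic itself cannot be reproduced without the per-day data.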
What "Spearman ρ" means. Rank correlation from -1 to +1. +1 = when X goes up, Y always goes up. -1 = when X goes up, Y always goes down. 0 = no relationship. The -0.36 here means: rank the 1,971 usable days by GEX from most negative to most positive, and they tend (with noise) to rank in the opposite order on next-day realized vol. Not a perfect rule, but a moderate and statistically real pattern. Later in the article, after controlling for VIX and ATM IV, that number drops to -0.03, which is statistically indistinguishable from zero.
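The definition above amounts to a Pearson correlation computed on ranks. A minimal sketch in plain Python, with hypothetical helper names and tied values given their average rank:

```python
import math

def average_ranks(xs):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))

# Any strictly increasing relationship scores +1, even a nonlinear one.
rho_up = spearman([1, 2, 3, 4], [1, 8, 27, 64])
rho_down = spearman([1, 2, 3, 4], [40, 30, 20, 10])
```

Because only ranks matter, a handful of extreme COVID-era days cannot dominate the statistic the way they can dominate a quintile mean.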
This is the chart that can sell a subscription. It is also not enough. A strong raw GEX backtest can still be a volatility-regime proxy. Negative-GEX days and high-VIX days often describe the same market state.
Fact check: the Q1/Q2 mean inversion is real. It comes from outlier-heavy COVID-era days; the median row is much cleaner. This is why the rank correlation matters more than a strict mean monotonicity claim.
For the residualized tests, we regress both the signal and the outcome on the control set, then compute Spearman correlation between the residuals. First we control for VIX alone. Then we control for VIX + ATM IV. Bonferroni-adjusted significance for the four primary tests is p < 0.0125.
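The residualization step can be sketched as follows. This is an illustration on synthetic data, not the article's code: it assumes a linear OLS regression with an intercept, uses simple ordinal ranks (fine for continuous residuals, where ties are not expected), assumes the conventional 0.05 family-wise level behind the 0.0125 threshold, and all names are hypothetical.

```python
import numpy as np

BONFERRONI_ALPHA = 0.05 / 4  # four primary tests -> 0.0125 (0.05 level assumed)

def residualize(y, controls):
    """Residuals of y after OLS on an intercept plus the control columns."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones(len(y))] + [np.asarray(c, dtype=float) for c in controls]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def ordinal_rank(a):
    """Ranks 0..n-1; assumes no ties, which holds for continuous residuals."""
    return np.argsort(np.argsort(a)).astype(float)

def residualized_spearman(signal, outcome, controls):
    """Spearman correlation between signal and outcome residuals."""
    rs = residualize(signal, controls)
    ro = residualize(outcome, controls)
    return np.corrcoef(ordinal_rank(rs), ordinal_rank(ro))[0, 1]

# Synthetic example: signal and outcome are both driven by "vix" and by
# nothing else, so the raw correlation is strong and the controlled one is ~0.
rng = np.random.default_rng(0)
vix = rng.normal(size=2000)
signal = -vix + 0.5 * rng.normal(size=2000)
outcome = vix + 0.5 * rng.normal(size=2000)

raw = residualized_spearman(signal, outcome, [])        # intercept only
controlled = residualized_spearman(signal, outcome, [vix])
```

The synthetic case mirrors the article's argument in miniature: a signal can look strongly predictive in the raw data purely because it shares a driver with the outcome.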
| Signal to outcome | Raw Spearman (p) | After VIX control (p) | After VIX + ATM IV control (p) |
|---|---|---|---|
| GEX to next-day RV | -0.36 (4.6e-60) | -0.14 (1.2e-9) | -0.03 (0.18) |
| DEX to next-day return | -0.03 (0.19) | +0.01 (0.69) | +0.02 (0.40) |
| VEX to next-day ATM IV change | -0.16 (2.1e-13) | -0.05 (0.02) | -0.01 (0.77) |
| CHEX to next-day return | -0.05 (0.03) | -0.01 (0.63) | -0.00 (0.93) |
The raw signals are mostly volatility signals wearing more sophisticated labels. GEX survives the VIX-only control, but at roughly 40% of the raw rank-correlation magnitude. Add ATM IV and the GEX residual drops to ρ = -0.03 with p = 0.18. DEX, VEX, and CHEX do not carry robust independent predictive information in these residualized rank tests.
For GEX, the top-minus-bottom residualized realized-vol difference tells the same story: -10.63 vol points raw, -3.15 after VIX, and -0.99 after VIX + ATM IV (p = 0.25).
GEX is not fake. The raw effect is real. The incremental information after VIX and ATM IV is the part that fails. That distinction matters: useful regime descriptor is not the same thing as independent forecasting edge.
The most useful visual is a double-sort: first split days by VIX quintile, then within each VIX bucket split by GEX quintile. Each cell is mean next-day realized vol in annualized percentage points.
| VIX quintile / GEX quintile | Q1 most negative | Q2 | Q3 | Q4 | Q5 most positive |
|---|---|---|---|---|---|
| V1 lowest VIX | 8.02 | 7.32 | 6.62 | 5.22 | 5.07 |
| V2 | 11.70 | 10.30 | 8.85 | 6.48 | 6.04 |
| V3 | 12.00 | 12.05 | 12.15 | 9.62 | 8.56 |
| V4 | 15.87 | 15.35 | 17.18 | 12.31 | 8.06 |
| V5 highest VIX | 20.57 | 24.91 | 37.69 | 21.71 | 15.91 |
The table is persuasive, but the precise reading matters. V1 and V2 are close to textbook: more positive GEX means lower next-day RV. V3 and V4 still show much lower RV in the positive-GEX buckets than in the negative or middle buckets, but they are not strictly monotonic. V5, the highest-VIX bucket, is a non-monotonic mess.
That is the practical caveat for traders. GEX's predictive value is mostly a calm-to-moderate market phenomenon. In the top VIX quintile, where an extra risk signal would be most valuable, this double-sort does not show a clean edge.
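The double-sort described above (VIX quintiles first, then GEX quintiles within each VIX bucket) can be sketched like this. The function names and the toy data are hypothetical, and the rank-based bucketing rule is an assumption; the article does not say how bucket edges were drawn.

```python
from collections import defaultdict

def quintile_labels(values):
    """Label each value 1..5 by rank (1 = lowest), keeping buckets near-equal."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels

def double_sort(vix, gex, outcome):
    """Mean outcome per (VIX quintile, GEX quintile within that VIX bucket)."""
    vix_q = quintile_labels(vix)
    cells = defaultdict(list)
    for v in range(1, 6):
        idx = [i for i in range(len(vix)) if vix_q[i] == v]
        # GEX quintiles are recomputed inside each VIX bucket (conditional sort).
        gex_q = quintile_labels([gex[i] for i in idx])
        for i, g in zip(idx, gex_q):
            cells[(v, g)].append(outcome[i])
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

# Toy data: 50 days, outcome equal to the VIX level so rows are easy to check.
toy_vix = [float(i) for i in range(50)]
toy_gex = [float((i * 7) % 50) for i in range(50)]
table = double_sort(toy_vix, toy_gex, toy_vix)
```

Sorting GEX inside each VIX bucket, rather than once over the whole sample, is what keeps every row of the grid populated even though GEX and VIX are strongly correlated.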
Verdict: useful as a regime descriptor; modestly incremental over VIX alone; not significant after VIX + ATM IV. The raw GEX backtest is real, but it is mostly a vol-regime backtest.
| DEX quintile | n | Mean net_dex ($B) | Mean next-day return | Pct positive days |
|---|---|---|---|---|
| Q1 (most negative) | 395 | -67.5 | +0.03% | 51.4% |
| Q2 | 394 | -0.7 | +0.07% | 56.3% |
| Q3 | 394 | +29.5 | +0.09% | 57.6% |
| Q4 | 394 | +52.7 | +0.01% | 53.6% |
| Q5 (most positive) | 394 | +90.1 | +0.03% | 57.6% |
Top-minus-bottom return difference is 0.00%, t = 0.04, p = 0.97. Raw Spearman ρ is -0.03 with p = 0.19. DEX does not predict next-day SPY direction in this EOD test.
| VEX quintile | n | Mean net_vex ($B) | Mean next-day ATM IV change | Pct positive IV-change days |
|---|---|---|---|---|
| Q1 (most negative) | 395 | -299.7 | +0.24 vol pts | 55.2% |
| Q2 | 394 | -173.6 | +0.24 vol pts | 50.0% |
| Q3 | 394 | -98.8 | +0.09 vol pts | 47.0% |
| Q4 | 394 | -23.3 | +0.04 vol pts | 42.4% |
| Q5 (most positive) | 394 | +80.5 | -0.60 vol pts | 35.3% |
At the raw level, VEX looks useful: ρ = -0.16, p = 2.1e-13. But VEX correlates with VIX at +0.72 and with ATM IV at +0.76. After controlling for VIX + ATM IV, the residualized VEX test falls to ρ = -0.01, p = 0.77.
| CHEX quintile | n | Mean net_chex ($M) | Mean next-day return | Pct positive days |
|---|---|---|---|---|
| Q1 (most negative) | 395 | -1.04 | +0.11% | 53.9% |
| Q2 | 394 | +1.12 | +0.07% | 57.4% |
| Q3 | 394 | +1.98 | +0.04% | 57.9% |
| Q4 | 394 | +2.89 | -0.00% | 52.5% |
| Q5 (most positive) | 394 | +4.90 | +0.01% | 54.8% |
CHEX has one interesting raw result: sign agreement between CHEX and next-day SPY return is 54.9% on n=1,967, p = 1.5e-5. But the pre-registered residualized rank test collapses under VIX control (ρ = -0.01, p = 0.63) and under VIX + ATM IV (ρ = -0.00, p = 0.93). A separate OLS file does show a significant CHEX coefficient after VIX + AR(1) controls, so the clean wording is not "CHEX can never matter." It is: CHEX is not robust in the rank/quintile framework used for the primary article claim.
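As a rough sanity check on the sign-agreement figure, a two-sided sign test against a fair coin can be approximated with the normal distribution. This is a sketch under stated assumptions: the hit count is reconstructed from the rounded 54.9%, and the article does not say which test produced its p-value, so only the order of magnitude is comparable.

```python
import math

def sign_test_p(k, n):
    """Two-sided normal-approximation p-value for k hits in n fair coin flips."""
    z = (k - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

n = 1967
k = round(0.549 * n)  # hit count reconstructed from the rounded percentage
p = sign_test_p(k, n)
```

The approximation lands on the order of 1e-5, in line with the published 1.5e-5. It says nothing about whether the effect survives controls, which is the point of the residualized tests.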
The reason the controls matter is visible in the correlation matrix. These are Spearman rank correlations across all 1,972 SPY observations.
| | GEX | DEX | VEX | CHEX | VIX | ATM IV |
|---|---|---|---|---|---|---|
| GEX | +1.00 | +0.73 | -0.54 | +0.59 | -0.49 | -0.63 |
| DEX | +0.73 | +1.00 | -0.89 | +0.68 | -0.58 | -0.67 |
| VEX | -0.54 | -0.89 | +1.00 | -0.65 | +0.72 | +0.76 |
| CHEX | +0.59 | +0.68 | -0.65 | +1.00 | -0.39 | -0.46 |
| VIX | -0.49 | -0.58 | +0.72 | -0.39 | +1.00 | +0.91 |
| ATM IV | -0.63 | -0.67 | +0.76 | -0.46 | +0.91 | +1.00 |
DEX and VEX correlate at -0.89. VEX and VIX correlate at +0.72. VIX and ATM IV correlate at +0.91. That does not make the exposure Greeks useless, but it does make them dangerous to treat as independent features.
If someone sells GEX, DEX, VEX, and CHEX as four unrelated predictive signals, this correlation matrix is the rebuttal. They are four views of the same options chain, and much of what they capture is already in volatility level.
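To make the overlap concrete, the published matrix can be scanned for strongly related pairs. The values below are copied from the table above; the 0.70 cutoff is an arbitrary illustrative threshold, not one used in the article, and the function name is hypothetical.

```python
LABELS = ["GEX", "DEX", "VEX", "CHEX", "VIX", "ATM IV"]

# Spearman rank correlations as published in the table above.
RHO = [
    [+1.00, +0.73, -0.54, +0.59, -0.49, -0.63],
    [+0.73, +1.00, -0.89, +0.68, -0.58, -0.67],
    [-0.54, -0.89, +1.00, -0.65, +0.72, +0.76],
    [+0.59, +0.68, -0.65, +1.00, -0.39, -0.46],
    [-0.49, -0.58, +0.72, -0.39, +1.00, +0.91],
    [-0.63, -0.67, +0.76, -0.46, +0.91, +1.00],
]

def strongly_related(threshold=0.70):
    """List (name_a, name_b, rho) for off-diagonal pairs with |rho| >= threshold."""
    pairs = []
    for i in range(len(LABELS)):
        for j in range(i + 1, len(LABELS)):
            if abs(RHO[i][j]) >= threshold:
                pairs.append((LABELS[i], LABELS[j], RHO[i][j]))
    return pairs

pairs = strongly_related()
```

At this cutoff five of the fifteen pairs are flagged, and CHEX is the only series that does not appear in any of them.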
The regime split is where the GEX backtest becomes most useful for real trading decisions. The high-VIX row is not cherry-picked; it was pre-registered.
| Regime | n | Top-minus-bottom RV diff | t-stat | p-value | Spearman ρ |
|---|---|---|---|---|---|
| All days | 1,971 | -10.63 | -13.00 | 1.0e-33 | -0.36 |
| Pre-COVID | 463 | -11.43 | -6.78 | 4.5e-10 | -0.41 |
| COVID shock | 72 | -59.88 | -3.88 | 0.001 | -0.50 |
| Post-COVID | 1,436 | -9.62 | -9.58 | 8.7e-20 | -0.32 |
| Low-VIX days | 1,478 | -8.09 | -11.81 | 3.9e-28 | -0.32 |
| High-VIX days | 493 | -1.89 | -0.78 | 0.44 | -0.02 |
Across all days, GEX looks excellent. Inside high-VIX days, the effect is statistically absent: top-minus-bottom difference -1.89 vol points, t = -0.78, p = 0.44, Spearman ρ = -0.02.
That is the trading lesson. GEX is best at labeling calm regimes. It is not a clean crisis detector in this EOD SPY sample.
The 70/30 chronological split looks impressive at first glance.
| Split | Window | n | Top-minus-bottom diff | Spearman ρ |
|---|---|---|---|---|
| In-sample | 2018-04 to 2023-11 | 1,380 | -10.77 | -0.367 |
| Out-of-sample | 2023-12 to 2026-04 | 591 | -10.79 | -0.374 |
The naive interpretation is that GEX is exceptionally robust. The more cautious interpretation is that GEX is proxying a stable volatility-regime variable. The residualized controls point to the second explanation: VIX and ATM IV absorb the effect.
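A chronological split of this kind is a one-liner once the rows are in date order. The sketch below assumes the cut point is the row count times 0.70, rounded to the nearest integer; that rule is an assumption, though it does reproduce the published 1,380 / 591 row counts. The function name is hypothetical.

```python
def chronological_split(rows, train_frac=0.70):
    """Split date-ordered rows into (in-sample, out-of-sample) with no shuffling."""
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Stand-in for 1,971 date-ordered daily rows.
rows = list(range(1971))
in_sample, out_of_sample = chronological_split(rows)
```

Keeping the order intact matters: a shuffled split would let post-2023 days inform the in-sample estimate and would overstate how independent the two windows are.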
Here are the raw artifacts used for the article. The goal is simple: do not trust the prose if the CSVs disagree with it.
One-click bundle: download all CSVs, hypotheses, and Python scripts.
Individual result tables are also available: GEX to RV, DEX to return, VEX to IV change, CHEX to return, regime splits, and train/test split.
The analysis combines daily dealer-exposure summaries, daily stock summaries, and daily VRP snapshots from the FlashAlpha historical pipeline. The master dataset aligns features at row t with future outcomes using shifted columns, so the next-day tests do not leak future data into the signal.
The backtest pulled directly from the historical subdomain, one day at a time (the endpoints accept an at= timestamp and return a point-in-time snapshot):
```bash
# Dealer-exposure summary (GEX, DEX, VEX, CHEX, gamma flip, walls)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/exposure/summary/SPY?at=2024-06-14T20:00:00Z"

# Stock summary (spot, VIX context, ATM IV, realized vol)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/stock/SPY/summary?at=2024-06-14T20:00:00Z"

# VRP snapshot (implied-vs-realized spread, regime, harvest score)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/vrp/SPY?at=2024-06-14T20:00:00Z"

# Loop over 1,972 trading days, join on (ts, symbol), shift outcomes by -1
# for next-day tests. Residualize signals and outcomes on VIX, then on
# VIX + ATM IV. Re-run the quintile sorts, Spearman tests, and regime splits.
```
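The loop described in the comments might start from something like the sketch below. It only builds the three request URLs shown above and demonstrates the "shift outcomes by -1" alignment on toy rows; it makes no network calls and does not assume any response schema. The helper names and the `atm_iv` field are hypothetical, and reusing a fixed 20:00:00Z timestamp for every day is an assumption that ignores daylight-saving shifts in the US close.

```python
BASE = "https://historical.flashalpha.com/v1"

def snapshot_urls(symbol, at):
    """Build the three point-in-time URLs used above for one symbol and timestamp."""
    return {
        "exposure": f"{BASE}/exposure/summary/{symbol}?at={at}",
        "stock": f"{BASE}/stock/{symbol}/summary?at={at}",
        "vrp": f"{BASE}/vrp/{symbol}?at={at}",
    }

def add_next_day(rows, key):
    """Attach row t+1's value of `key` to row t, i.e. shift the outcome by -1.

    Rows must already be in date order. The last row gets None because
    its next-day value is not observed yet.
    """
    out = []
    for t, row in enumerate(rows):
        nxt = rows[t + 1][key] if t + 1 < len(rows) else None
        out.append({**row, f"next_{key}": nxt})
    return out

urls = snapshot_urls("SPY", "2024-06-14T20:00:00Z")

# Toy rows with a hypothetical field name, just to show the alignment.
toy = [{"ts": "d1", "atm_iv": 12.0}, {"ts": "d2", "atm_iv": 13.5}, {"ts": "d3", "atm_iv": 13.0}]
aligned = add_next_day(toy, "atm_iv")
```

Building the outcome column this way keeps every feature on row t strictly earlier than the value it is tested against, which is the no-lookahead property the article relies on.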
For live or historical replay outside this article, see the FlashAlpha Historical API. Current public docs list SPY historical coverage from 2018-04-16, with more symbols backfilled on request for Alpha customers.
Gamma exposure works at the raw regime level, but the independent GEX edge is much smaller than the marketing version. Across 1,972 SPY days, raw GEX to next-day realized vol is ρ = -0.36. After VIX + ATM IV controls, it is ρ = -0.03 with p = 0.18. DEX has no next-day return signal, VEX is mostly a VIX/IV proxy, and CHEX is fragile in the residualized rank tests. Use dealer exposure as context. Do not treat it as four clean standalone alpha signals.
by Tomasz Dobrowolski