GEX Backtest: 8 Years of SPY Show Gamma Exposure Mostly Tracks VIX | FlashAlpha

A pre-registered GEX backtest on 1,972 SPY days (2018-2026) finds that gamma exposure works in raw form, but GEX/DEX/VEX/CHEX add little after VIX and ATM IV controls.

Tomasz Dobrowolski Quant Engineer
Apr 23, 2026
35 min read
GEX GammaExposure Backtesting SPY VIX DealerPositioning VEX CHEX

If you have searched for GEX backtest, does gamma exposure work, gamma exposure accuracy, or GEX vs VIX, this is the test you probably wanted but rarely get from dealer-positioning marketing.

Most options dashboards present four dealer-exposure Greeks as four separate signals: GEX (gamma exposure), DEX (delta exposure), VEX (vanna exposure), and CHEX (charm exposure). The pitch is intuitive: dealers hedge, those hedges create flows, and those flows should forecast volatility, returns, or IV changes.

The mechanical story is real. The predictive story is much thinner.

We wrote down the hypotheses before running the statistics, then tested them on an 8-year SPY panel. Features are measured at close t; outcomes are shifted forward, so the backtest does not use future information. The key control is simple: if a dealer-exposure metric is useful, it should add information beyond volatility variables a trader can already observe, especially VIX and ATM IV.

SPY EOD snapshots: 1,972
Raw GEX vs RV rank correlation: -0.36
After VIX + ATM IV: -0.03
Full-control p-value: 0.18

Bottom line

GEX has a real raw relationship with next-day realized volatility, and it still has modest incremental value over VIX alone. But once ATM IV is added to the control set, GEX goes quiet. DEX is dead on arrival. VEX is mostly a VIX/IV proxy. CHEX is weak and not robust in the residualized rank tests. Sold as four independent signals, the exposure stack behaves more like one-and-a-half dimensions.

Figure: Mean next-day realized vol by GEX quintile (SPY 2018-2026 · n = 1,972 days)

Sort every day by gamma exposure. Cut into 5 equal groups. Look at next day's actual volatility. More positive GEX = calmer next day. The raw pattern is clean.

| GEX quintile | Mean next-day RV |
| --- | --- |
| Q1 - most negative GEX | 17.0% |
| Q2 | 18.6% |
| Q3 | 12.7% |
| Q4 | 9.2% |
| Q5 - most positive GEX | 6.3% |

The catch
After controlling for VIX and ATM IV, the GEX-only signal drops from a rank correlation of -0.36 (moderate) to -0.03 (indistinguishable from noise, p = 0.18). Most of what you see above is vol regime, not unique GEX information. Details and regime breakdowns below.

What We Tested

The dataset is one row per SPY trading day. Each row carries dealer-signed GEX, DEX, VEX, CHEX, gamma flip, VIX, ATM IV, realized volatility, and pre-computed forward outcomes. Positive exposure means dealers are net long that Greek under this convention.

| Item | Backtest setting |
| --- | --- |
| Universe | SPY only |
| Window | 2018-04-16 to 2026-04-02 |
| Sample | 1,972 EOD snapshots; 1,971 usable next-day outcome rows |
| Primary signals | GEX, DEX, VEX, CHEX |
| Primary outcomes | Next-day realized vol, next-day return, next-day ATM IV change |
| Controls | VIX, then VIX + ATM IV |
| Primary tests | Quintile sorts, top-minus-bottom t-tests, Spearman rank correlation |

The central question is not whether dealer exposures are mechanically meaningful. They are. The question is narrower and more useful: do the exposure Greeks add predictive information after you already know VIX and ATM IV?


Naive GEX Looks Excellent

Start with the classic GEX claim: positive dealer gamma suppresses realized volatility; negative dealer gamma amplifies it. We sorted every day into quintiles by same-session net_gex, then measured next-day realized vol as |log return| * sqrt(252).

What "quintile" means here. Take all 1,972 trading days, rank them by GEX from most negative to most positive, and cut into five equal groups of ~394 days each. Q1 = the 20% of days with the most negative GEX (dealers most short gamma). Q5 = the 20% of days with the most positive GEX (dealers most long gamma). Q2, Q3, Q4 are the groups in between. Same idea for VIX quintiles later: V1 = calmest 20% of days, V5 = most stressed 20%.
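
The bucketing and the realized-vol outcome described above can be sketched as follows. This is a minimal illustration on synthetic data, not the article's pipeline: the column names `close` and `net_gex`, both helper names, and the factor of 100 (to express RV in percent) are assumptions.

```python
import numpy as np
import pandas as pd

def add_next_day_rv(df, close_col="close"):
    """Next-day realized vol in annualized percent: |log return| * sqrt(252) * 100."""
    out = df.copy()
    log_ret = np.log(out[close_col]).diff()      # return realized on day t
    # shift(-1) stores day t+1's outcome on row t, so features at t never see it
    out["rv_next"] = log_ret.abs().shift(-1) * np.sqrt(252) * 100
    return out

def add_quintile(df, signal_col, label="quintile"):
    """Rank days by the signal and cut into five equal-count groups (1 = most negative)."""
    out = df.copy()
    out[label] = pd.qcut(out[signal_col], 5, labels=False) + 1
    return out

# Synthetic demo data (hypothetical, not the article's panel)
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame(
    {
        "close": 400 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
        "net_gex": rng.normal(0, 4, n),
    },
    index=pd.bdate_range("2020-01-01", periods=n),
)

# The last row has no next-day outcome, so drop it before bucketing
panel = add_quintile(add_next_day_rv(demo).dropna(subset=["rv_next"]), "net_gex")
summary = panel.groupby("quintile")["rv_next"].agg(["size", "mean", "median"])
print(summary)
```

Reporting the median next to the mean is deliberate: a handful of extreme days can distort a quintile mean while leaving the median almost unchanged.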

| GEX quintile | n | Mean net_gex ($B) | Mean next-day RV (%) | Median next-day RV (%) |
| --- | --- | --- | --- | --- |
| Q1 - most negative GEX | 395 | -7.91 | 16.97 | 13.54 |
| Q2 - moderately negative | 394 | -2.79 | 18.57 | 12.91 |
| Q3 - roughly neutral | 394 | +0.25 | 12.71 | 10.29 |
| Q4 - moderately positive | 394 | +2.97 | 9.24 | 6.45 |
| Q5 - most positive GEX | 394 | +6.50 | 6.34 | 4.86 |

The headline number is hard to ignore: Q5 minus Q1 is -10.63 vol points, t = -13.00, p = 1.0e-33. Spearman ρ is -0.36, p = 4.6e-60 on n=1,971.

What "Spearman ρ" means. Rank correlation from -1 to +1. +1 = when X goes up, Y always goes up. -1 = when X goes up, Y always goes down. 0 = no relationship. The -0.36 here means: rank the 1,972 days by GEX from most negative to most positive, and they tend (with noise) to rank in the opposite order on next-day realized vol. Not a perfect rule, but a moderate and statistically real pattern. Later in the article, after controlling for VIX and ATM IV, that number drops to -0.03 - indistinguishable from random.
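
For readers who want to reproduce the two headline statistics on their own data, here is a hedged sketch of a top-minus-bottom quintile test and a Spearman rank correlation. The article does not say which t-test variant it used, so the Welch (unequal-variance) form below is an assumption, as are the column and function names. The data is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats

def top_minus_bottom(df, signal_col, outcome_col):
    """Q5-minus-Q1 mean difference with a t-test, plus Spearman rho on all rows."""
    d = df[[signal_col, outcome_col]].dropna()
    q = pd.qcut(d[signal_col], 5, labels=False)   # 0 = bottom quintile, 4 = top
    top = d.loc[q == 4, outcome_col]
    bottom = d.loc[q == 0, outcome_col]
    # Welch t-test: does not assume equal variance in the two buckets (assumption)
    t_stat, t_p = stats.ttest_ind(top, bottom, equal_var=False)
    rho, rho_p = stats.spearmanr(d[signal_col], d[outcome_col])
    return {
        "diff": float(top.mean() - bottom.mean()),
        "t": float(t_stat),
        "t_p": float(t_p),
        "rho": float(rho),
        "rho_p": float(rho_p),
        "n": len(d),
    }

# Synthetic data in which a higher signal means a lower outcome
rng = np.random.default_rng(1)
n = 2000
gex = rng.normal(0, 1, n)
rv = 10 * np.exp(-0.4 * gex + rng.normal(0, 0.8, n))
demo = pd.DataFrame({"net_gex": gex, "rv_next": rv})

result = top_minus_bottom(demo, "net_gex", "rv_next")
print(result)
```

The rank correlation uses every row, while the top-minus-bottom difference uses only the two extreme buckets, which is why the two numbers can disagree when the middle quintiles misbehave.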

This is the chart that can sell a subscription. It is also not enough. A strong raw GEX backtest can still be a volatility-regime proxy. Negative-GEX days and high-VIX days often describe the same market state.

Fact check: the Q1/Q2 mean inversion is real. It comes from outlier-heavy COVID-era days; the median row is much cleaner. This is why the rank correlation matters more than a strict mean monotonicity claim.


The Control That Changes the Story

For the residualized tests, we regress both the signal and the outcome on the control set, then compute Spearman correlation between the residuals. First we control for VIX alone. Then we control for VIX + ATM IV. Bonferroni-adjusted significance for the four primary tests is p < 0.0125.
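
The residualization step can be sketched as below: regress signal and outcome on the controls by OLS, keep the residuals, and rank-correlate them. This is an illustration under stated assumptions rather than the article's code; the function names are hypothetical and the data is synthetic, built so that a shared volatility factor drives both series.

```python
import numpy as np
from scipy import stats

BONFERRONI_ALPHA = 0.05 / 4   # four primary tests -> 0.0125

def residualize(y, controls):
    """OLS residuals of y on an intercept plus the control columns."""
    X = np.column_stack([np.ones(len(y)), controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residualized_spearman(signal, outcome, controls):
    """Spearman correlation between the two sets of OLS residuals."""
    rho, p = stats.spearmanr(residualize(signal, controls),
                             residualize(outcome, controls))
    return float(rho), float(p)

# Synthetic example: a common vol factor drives both the signal and the outcome
rng = np.random.default_rng(2)
n = 2000
vix = rng.normal(0, 1, n)
atm_iv = vix + rng.normal(0, 0.4, n)
gex = -vix + rng.normal(0, 1, n)       # signal loads negatively on vol
rv = vix + rng.normal(0, 1, n)         # outcome loads positively on vol

raw_rho, raw_p = stats.spearmanr(gex, rv)
ctl_rho, ctl_p = residualized_spearman(gex, rv, np.column_stack([vix, atm_iv]))
print(f"raw rho={raw_rho:.2f}  residualized rho={ctl_rho:.2f}  "
      f"significant={ctl_p < BONFERRONI_ALPHA}")
```

In this toy setup the raw correlation is clearly negative even though the signal carries no information beyond the shared factor, which is exactly the pattern the control set is designed to expose.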

| Signal to outcome | Raw Spearman (p) | After VIX control (p) | After VIX + ATM IV control (p) |
| --- | --- | --- | --- |
| GEX to next-day RV | -0.36 (4.6e-60) | -0.14 (1.2e-9) | -0.03 (0.18) |
| DEX to next-day return | -0.03 (0.19) | +0.01 (0.69) | +0.02 (0.40) |
| VEX to next-day ATM IV change | -0.16 (2.1e-13) | -0.05 (0.02) | -0.01 (0.77) |
| CHEX to next-day return | -0.05 (0.03) | -0.01 (0.63) | -0.00 (0.93) |

The raw signals are mostly volatility signals wearing more sophisticated labels. GEX survives the VIX-only control, but at roughly 40% of the raw rank-correlation magnitude. Add ATM IV and the GEX residual drops to ρ = -0.03 with p = 0.18. DEX, VEX, and CHEX do not carry robust independent predictive information in these residualized rank tests.

For GEX, the top-minus-bottom residualized realized-vol difference tells the same story: -10.63 vol points raw, -3.15 after VIX, and -0.99 after VIX + ATM IV (p = 0.25).

The honest result

GEX is not fake. The raw effect is real. The incremental information after VIX and ATM IV is the part that fails. That distinction matters: a useful regime descriptor is not the same thing as an independent forecasting edge.


GEX vs VIX Double-Sort

The most useful visual is a double-sort: first split days by VIX quintile, then within each VIX bucket split by GEX quintile. Each cell is mean next-day realized vol in annualized percentage points.

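| VIX quintile \ GEX quintile | Q1 (most negative) | Q2 | Q3 | Q4 | Q5 (most positive) |
| --- | --- | --- | --- | --- | --- |
| V1 (lowest VIX) | 8.02 | 7.32 | 6.62 | 5.22 | 5.07 |
| V2 | 11.70 | 10.30 | 8.85 | 6.48 | 6.04 |
| V3 | 12.00 | 12.05 | 12.15 | 9.62 | 8.56 |
| V4 | 15.87 | 15.35 | 17.18 | 12.31 | 8.06 |
| V5 (highest VIX) | 20.57 | 24.91 | 37.69 | 21.71 | 15.91 |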

The table is persuasive, but the precise reading matters. V1 and V2 are close to textbook: more positive GEX means lower next-day RV. V3 and V4 still show much lower RV in the positive-GEX buckets than in the negative or middle buckets, but they are not strictly monotonic. V5, the highest-VIX bucket, is a non-monotonic mess.

That is the practical caveat for traders. GEX's predictive value is mostly a calm-to-moderate market phenomenon. In the top VIX quintile, where an extra risk signal would be most valuable, this double-sort does not show a clean edge.
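
A conditional double-sort like the grid above can be sketched in a few lines of pandas: bucket by VIX first, then bucket by GEX inside each VIX bucket. The column names, the helper name, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def double_sort(df, first, second, outcome, n=5, aggfunc="mean"):
    """Sort by `first` into n buckets, then by `second` within each bucket."""
    d = df[[first, second, outcome]].dropna().copy()
    d["v"] = pd.qcut(d[first], n, labels=False) + 1
    # Conditional sort: GEX quintile edges are recomputed inside every VIX bucket
    d["q"] = d.groupby("v")[second].transform(
        lambda s: pd.qcut(s, n, labels=False) + 1
    )
    return d.pivot_table(index="v", columns="q", values=outcome, aggfunc=aggfunc)

# Synthetic data (hypothetical): the outcome scales with VIX only
rng = np.random.default_rng(3)
n_rows = 2500
demo = pd.DataFrame({
    "vix": rng.lognormal(3, 0.3, n_rows),
    "net_gex": rng.normal(0, 4, n_rows),
})
demo["rv_next"] = demo["vix"] * np.exp(rng.normal(0, 0.3, n_rows))

grid = double_sort(demo, "vix", "net_gex", "rv_next")
counts = double_sort(demo, "vix", "net_gex", "rv_next", aggfunc="count")
print(grid.round(2))
print(counts)
```

Printing the cell counts alongside the means is worth the extra line. A 5x5 grid on roughly 2,000 days leaves about 80 observations per cell, so a single outlier day can dominate a cell mean.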


Exposure Verdicts

GEX: weak survivor

Verdict: useful as a regime descriptor; modestly incremental over VIX alone; not significant after VIX + ATM IV. The raw GEX backtest is real, but it is mostly a vol-regime backtest.

DEX: dead on arrival

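| DEX quintile | n | Mean net_dex ($B) | Mean next-day return | Pct positive days |
| --- | --- | --- | --- | --- |
| Q1 (most negative) | 395 | -67.5 | +0.03% | 51.4% |
| Q2 | 394 | -0.7 | +0.07% | 56.3% |
| Q3 | 394 | +29.5 | +0.09% | 57.6% |
| Q4 | 394 | +52.7 | +0.01% | 53.6% |
| Q5 (most positive) | 394 | +90.1 | +0.03% | 57.6% |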

Top-minus-bottom return difference is 0.00%, t = 0.04, p = 0.97. Raw Spearman ρ is -0.03 with p = 0.19. DEX does not predict next-day SPY direction in this EOD test.

VEX: mostly a VIX proxy

| VEX quintile | n | Mean net_vex ($B) | Mean next-day ATM IV change | Pct positive IV-change days |
| --- | --- | --- | --- | --- |
| Q1 (most negative) | 395 | -299.7 | +0.24 vol pts | 55.2% |
| Q2 | 394 | -173.6 | +0.24 vol pts | 50.0% |
| Q3 | 394 | -98.8 | +0.09 vol pts | 47.0% |
| Q4 | 394 | -23.3 | +0.04 vol pts | 42.4% |
| Q5 (most positive) | 394 | +80.5 | -0.60 vol pts | 35.3% |

At the raw level, VEX looks useful: ρ = -0.16, p = 2.1e-13. But VEX correlates with VIX at +0.72 and with ATM IV at +0.76. After controlling for VIX + ATM IV, the residualized VEX test falls to ρ = -0.01, p = 0.77.

CHEX: weak and fragile

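| CHEX quintile | n | Mean net_chex ($M) | Mean next-day return | Pct positive days |
| --- | --- | --- | --- | --- |
| Q1 (most negative) | 395 | -1.04 | +0.11% | 53.9% |
| Q2 | 394 | +1.12 | +0.07% | 57.4% |
| Q3 | 394 | +1.98 | +0.04% | 57.9% |
| Q4 | 394 | +2.89 | -0.00% | 52.5% |
| Q5 (most positive) | 394 | +4.90 | +0.01% | 54.8% |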

CHEX has one interesting raw result: sign agreement between CHEX and next-day SPY return is 54.9% on n=1,967, p = 1.5e-5. But the pre-registered residualized rank test collapses under VIX control (ρ = -0.01, p = 0.63) and under VIX + ATM IV (ρ = -0.00, p = 0.93). A separate OLS file does show a significant CHEX coefficient after VIX + AR(1) controls, so the clean wording is not "CHEX can never matter." It is: CHEX is not robust in the rank/quintile framework used for the primary article claim.
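
A sign-agreement check of the kind quoted above can be sketched as a two-sided binomial test against a 50% coin flip. The article does not describe how its n = 1,967 sample was formed, so dropping zero-valued and missing days here is an assumption, as are the function name and the synthetic data.

```python
import numpy as np
from scipy import stats

def sign_agreement(signal, ret):
    """Share of days where sign(signal) == sign(return), with a binomial p-value."""
    s = np.sign(np.asarray(signal, dtype=float))
    r = np.sign(np.asarray(ret, dtype=float))
    # Assumption: days with a zero or missing value on either side are excluded
    mask = np.isfinite(s) & np.isfinite(r) & (s != 0) & (r != 0)
    hits = int((s[mask] == r[mask]).sum())
    n_used = int(mask.sum())
    p_value = stats.binomtest(hits, n_used, p=0.5).pvalue
    return hits / n_used, n_used, float(p_value)

# Synthetic data with a weak positive link between signal and next-day return
rng = np.random.default_rng(4)
n = 2000
chex = rng.normal(0, 1, n)
ret_next = 0.2 * chex + rng.normal(0, 1, n)

rate, n_used, p_value = sign_agreement(chex, ret_next)
print(f"agreement={rate:.3f}  n={n_used}  p={p_value:.2e}")
```

A hit rate a few points above 50% can be highly significant on two thousand days and still vanish once the signal is residualized, because the sign test says nothing about what else explains the agreement.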


Why They Overlap

The reason the controls matter is visible in the correlation matrix. These are Spearman rank correlations across all 1,972 SPY observations.

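| | GEX | DEX | VEX | CHEX | VIX | ATM IV |
| --- | --- | --- | --- | --- | --- | --- |
| GEX | +1.00 | +0.73 | -0.54 | +0.59 | -0.49 | -0.63 |
| DEX | +0.73 | +1.00 | -0.89 | +0.68 | -0.58 | -0.67 |
| VEX | -0.54 | -0.89 | +1.00 | -0.65 | +0.72 | +0.76 |
| CHEX | +0.59 | +0.68 | -0.65 | +1.00 | -0.39 | -0.46 |
| VIX | -0.49 | -0.58 | +0.72 | -0.39 | +1.00 | +0.91 |
| ATM IV | -0.63 | -0.67 | +0.76 | -0.46 | +0.91 | +1.00 |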

DEX and VEX correlate at -0.89. VEX and VIX correlate at +0.72. VIX and ATM IV correlate at +0.91. That does not make the exposure Greeks useless, but it does make them dangerous to treat as independent features.
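
A matrix like this is a one-liner in pandas, and a small helper can flag feature pairs that are too correlated to treat as independent. The 0.7 threshold, the column names, and the synthetic data below are illustrative assumptions, not values taken from the article's scripts.

```python
import numpy as np
import pandas as pd

def spearman_matrix(df, cols):
    """Pairwise Spearman rank correlations for the chosen columns."""
    return df[cols].corr(method="spearman")

def redundant_pairs(corr, threshold=0.7):
    """List (a, b, rho) for off-diagonal pairs with |rho| at or above the threshold."""
    names = list(corr.columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = float(corr.loc[a, b])
            if abs(rho) >= threshold:
                pairs.append((a, b, round(rho, 2)))
    return pairs

# Synthetic features: dex and vex are near mirror images, chex is unrelated
rng = np.random.default_rng(5)
n = 2000
factor = rng.normal(0, 1, n)
demo = pd.DataFrame({
    "dex": factor + 0.3 * rng.normal(0, 1, n),
    "vex": -factor + 0.3 * rng.normal(0, 1, n),
    "chex": rng.normal(0, 1, n),
})

corr = spearman_matrix(demo, ["dex", "vex", "chex"])
flagged = redundant_pairs(corr)
print(corr.round(2))
print(flagged)
```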

One-and-a-half signals

If someone sells GEX, DEX, VEX, and CHEX as four unrelated predictive signals, this correlation matrix is the rebuttal. They are four views of the same options chain, and much of what they capture is already in volatility level.


High-VIX Regime Test

The regime split is where the GEX backtest becomes most useful for real trading decisions. The high-VIX row is not cherry-picked; it was pre-registered.

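| Regime | n | Top-minus-bottom RV diff | t-stat | p-value | Spearman ρ |
| --- | --- | --- | --- | --- | --- |
| All days | 1,971 | -10.63 | -13.00 | 1.0e-33 | -0.36 |
| Pre-COVID | 463 | -11.43 | -6.78 | 4.5e-10 | -0.41 |
| COVID shock | 72 | -59.88 | -3.88 | 0.001 | -0.50 |
| Post-COVID | 1,436 | -9.62 | -9.58 | 8.7e-20 | -0.32 |
| Low-VIX days | 1,478 | -8.09 | -11.81 | 3.9e-28 | -0.32 |
| High-VIX days | 493 | -1.89 | -0.78 | 0.44 | -0.02 |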

Across all days, GEX looks excellent. Inside high-VIX days, the effect is statistically absent: top-minus-bottom difference -1.89 vol points, t = -0.78, p = 0.44, Spearman ρ = -0.02.

That is the trading lesson. GEX is best at labeling calm regimes. It is not a clean crisis detector in this EOD SPY sample.
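
A regime split of this kind amounts to running the same rank test on boolean subsets of the panel. The sketch below is illustrative only: the article does not state the VIX level that separates its low-VIX and high-VIX rows, so `vix_cut` is a hypothetical parameter, and the synthetic data is built so the signal works only in the calm regime.

```python
import numpy as np
import pandas as pd
from scipy import stats

def spearman_by_regime(df, signal, outcome, regimes):
    """Run Spearman(signal, outcome) inside each named boolean mask."""
    rows = []
    for name, mask in regimes.items():
        d = df.loc[mask, [signal, outcome]].dropna()
        rho, p = stats.spearmanr(d[signal], d[outcome])
        rows.append({"regime": name, "n": len(d), "rho": float(rho), "p": float(p)})
    return pd.DataFrame(rows).set_index("regime")

# Synthetic panel: the signal matters only when VIX is below the cut
rng = np.random.default_rng(6)
n = 2000
vix_cut = 25.0                                   # hypothetical threshold
vix = rng.uniform(10, 40, n)
gex = rng.normal(0, 1, n)
loading = np.where(vix < vix_cut, -0.5, 0.0)     # no GEX effect in high-VIX rows
rv = 10 * np.exp(loading * gex + 0.5 * rng.normal(0, 1, n))
demo = pd.DataFrame({"vix": vix, "net_gex": gex, "rv_next": rv})

table = spearman_by_regime(
    demo, "net_gex", "rv_next",
    {
        "all": pd.Series(True, index=demo.index),
        "low_vix": demo["vix"] < vix_cut,
        "high_vix": demo["vix"] >= vix_cut,
    },
)
print(table.round(3))
```

The all-days row in this toy example still looks healthy because the calm regime dominates the sample, which is the same masking effect the regime table above illustrates.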


Train/Test Stability

The 70/30 chronological split looks impressive at first glance.

| Split | Window | n | Top-minus-bottom diff | Spearman ρ |
| --- | --- | --- | --- | --- |
| In-sample | 2018-04 to 2023-11 | 1,380 | -10.77 | -0.367 |
| Out-of-sample | 2023-12 to 2026-04 | 591 | -10.79 | -0.374 |

The naive interpretation is that GEX is exceptionally robust. The more cautious interpretation is that GEX is proxying a stable volatility-regime variable. The residualized controls point to the second explanation: VIX and ATM IV absorb the effect.
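
A chronological 70/30 split can be sketched as follows. Cutting at `round(len * frac)` is an assumption; on 1,971 rows it yields 1,380 and 591, the same counts as the table, though the article may have cut on a calendar boundary instead. Data and names are synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

def chrono_split(df, frac=0.7):
    """Earliest `frac` of rows as in-sample, the remainder as out-of-sample."""
    d = df.sort_index()                # never shuffle: order by date first
    cut = round(len(d) * frac)
    return d.iloc[:cut], d.iloc[cut:]

# Synthetic daily panel with the same row count as the usable sample
rng = np.random.default_rng(7)
n = 1971
gex = rng.normal(0, 1, n)
demo = pd.DataFrame(
    {"net_gex": gex, "rv_next": 10 * np.exp(-0.4 * gex + rng.normal(0, 0.8, n))},
    index=pd.bdate_range("2018-04-16", periods=n),
)

in_sample, out_sample = chrono_split(demo)
for name, part in [("in-sample", in_sample), ("out-of-sample", out_sample)]:
    rho, p = stats.spearmanr(part["net_gex"], part["rv_next"])
    print(f"{name}: n={len(part)}  rho={rho:.3f}  p={p:.1e}")
```

Stable out-of-sample numbers rule out overfitting to the first window, but they cannot distinguish a genuine signal from a stable proxy, so the split should be rerun on residualized series before drawing conclusions.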


What Traders Should Do With This

  • Use GEX as a regime label, not a standalone forecast. Positive GEX can describe a calmer market state. It does not, by itself, survive the VIX + ATM IV control as an independent next-day realized-vol signal.
  • Do not count the exposure stack as four independent factors. DEX and VEX are nearly mirror images in this sample, and VEX is tightly linked to volatility level.
  • Demand orthogonal tests. A vendor chart that beats a coin flip is not enough. The right question is whether the signal adds anything after obvious baselines like VIX, ATM IV, prior return, and outcome persistence.
  • Do not confuse statistical correlation with tradable PnL. Even GEX's VIX-only residual effect, ρ = -0.14, needs transaction costs, latency, data availability, and execution rules before it becomes a strategy.

Limitations

  1. EOD panel only. This article tests 16:00 ET snapshots against next-day outcomes. FlashAlpha's Historical API supports minute-level replay, but this specific research panel is EOD-only. The canonical intraday CHEX claim - charm flow into the last hour - needs a separate minute-level study.
  2. SPY only. SPY is the deepest, most-hedged ETF. Single stocks, less liquid ETFs, and index options may behave differently.
  3. Linear residual controls. Residualization uses OLS. A nonlinear model could, in theory, recover interaction effects not measured here.
  4. Correlation, not PnL. The tests measure statistical association, not executable trades after costs.
  5. 0DTE deserves its own article. The post-COVID split includes the 0DTE growth era, but this is not a dedicated intraday 0DTE mechanics test.
  6. Dealer-sign convention matters. Positive means dealers net long that Greek. Providers using the opposite sign convention would invert signs without changing the conclusion.
  7. Stress observations are limited. COVID, 2022, and the 2025 stress period provide meaningful turmoil, but high-VIX inference still rests on fewer days than calm-market inference.

Download the Backtest Data

Here are the raw artifacts used for the article. The goal is simple: do not trust the prose if the CSVs disagree with it.

One-click bundle: download all CSVs, hypotheses, and Python scripts.

Individual result tables are also available: GEX to RV, DEX to return, VEX to IV change, CHEX to return, regime splits, and train/test split.


Methodology

The analysis combines daily dealer-exposure summaries, daily stock summaries, and daily VRP snapshots from the FlashAlpha historical pipeline. The master dataset aligns features at row t with future outcomes using shifted columns, so the next-day tests do not leak future data into the signal.
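
The alignment described above can be sketched as an inner join on date and symbol followed by a backward shift of the outcome columns. Every column name here (`date`, `symbol`, `close`, `atm_iv`, `net_gex`, `vrp`) and the function name are assumptions for illustration; the real field names in the pipeline may differ.

```python
import numpy as np
import pandas as pd

def build_master(exposure, stock, vrp):
    """Join the three daily tables and attach next-day outcomes to row t."""
    keys = ["date", "symbol"]
    m = exposure.merge(stock, on=keys, how="inner").merge(vrp, on=keys, how="inner")
    m = m.sort_values(["symbol", "date"]).reset_index(drop=True)
    g = m.groupby("symbol")
    # shift(-1) pulls the t+1 value onto row t; features on row t stay untouched
    m["ret_next"] = g["close"].transform(lambda s: np.log(s).diff().shift(-1))
    m["atm_iv_chg_next"] = g["atm_iv"].transform(lambda s: s.diff().shift(-1))
    return m

# Tiny hypothetical inputs to show the alignment
dates = pd.to_datetime(["2024-06-12", "2024-06-13", "2024-06-14", "2024-06-17"])
exposure = pd.DataFrame({"date": dates, "symbol": "SPY",
                         "net_gex": [1.0, -2.0, 0.5, 3.0]})
stock = pd.DataFrame({"date": dates, "symbol": "SPY",
                      "close": [540.0, 542.0, 541.0, 545.0],
                      "atm_iv": [12.0, 12.5, 12.2, 11.8]})
vrp = pd.DataFrame({"date": dates, "symbol": "SPY",
                    "vrp": [2.1, 1.8, 2.4, 2.0]})

master = build_master(exposure, stock, vrp)
print(master)
```

The final row of each symbol has no next-day outcome and comes out as NaN, which is consistent with a panel of 1,972 snapshots yielding 1,971 usable outcome rows.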

The backtest pulled directly from the historical subdomain, one day at a time (the endpoints accept an at= timestamp and return a point-in-time snapshot):

# Dealer-exposure summary (GEX, DEX, VEX, CHEX, gamma flip, walls)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/exposure/summary/SPY?at=2024-06-14T20:00:00Z"

# Stock summary (spot, VIX context, ATM IV, realized vol)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/stock/SPY/summary?at=2024-06-14T20:00:00Z"

# VRP snapshot (implied-vs-realized spread, regime, harvest score)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/vrp/SPY?at=2024-06-14T20:00:00Z"

# Loop over 1,972 trading days, join on (ts, symbol), shift outcomes by -1
# for next-day tests. Residualize signals and outcomes on VIX, then on
# VIX + ATM IV. Re-run the quintile sorts, Spearman tests, and regime splits.
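
One detail the loop has to handle is daylight saving time: 16:00 in New York is 20:00 UTC in summer but 21:00 UTC in winter. The sketch below builds the three request URLs shown above for a given date. Treating `at=` as the 16:00 New York close converted to UTC is inferred from the June example and the EOD description, not from API documentation, and early-close sessions are not handled. The helper names are hypothetical and no network call is made.

```python
from datetime import date, datetime, time, timezone
from urllib.request import Request
from zoneinfo import ZoneInfo

BASE = "https://historical.flashalpha.com"
PATHS = {
    "exposure": "/v1/exposure/summary/{symbol}",
    "stock": "/v1/stock/{symbol}/summary",
    "vrp": "/v1/vrp/{symbol}",
}
NEW_YORK = ZoneInfo("America/New_York")

def close_utc(day: date) -> str:
    """16:00 New York time on `day`, as a UTC timestamp string (DST-aware)."""
    local_close = datetime.combine(day, time(16, 0), tzinfo=NEW_YORK)
    return local_close.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def snapshot_urls(symbol: str, day: date) -> dict:
    """URLs for the three point-in-time snapshots on one trading day."""
    at = close_utc(day)
    return {
        name: f"{BASE}{path.format(symbol=symbol)}?at={at}"
        for name, path in PATHS.items()
    }

def build_request(url: str, api_key: str) -> Request:
    """Request object carrying the X-Api-Key header; the caller decides when to send it."""
    return Request(url, headers={"X-Api-Key": api_key})

urls = snapshot_urls("SPY", date(2024, 6, 14))
for name, url in urls.items():
    print(name, url)
```

Building the timestamp from the local close rather than hard-coding `T20:00:00Z` keeps winter dates from silently requesting a snapshot one hour before the close.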

For live or historical replay outside this article, see the FlashAlpha Historical API. Current public docs list SPY historical coverage from 2018-04-16, with more symbols backfilled on request for Alpha customers.


The Takeaway

TL;DR

Gamma exposure works at the raw regime level, but the independent GEX edge is much smaller than the marketing version. Across 1,972 SPY days, raw GEX to next-day realized vol is ρ = -0.36. After VIX + ATM IV controls, it is ρ = -0.03 with p = 0.18. DEX has no next-day return signal, VEX is mostly a VIX/IV proxy, and CHEX is fragile in the residualized rank tests. Use dealer exposure as context. Do not treat it as four clean standalone alpha signals.

