What is a GEX backtest?

A GEX backtest tests whether gamma exposure measured at one time predicts future market outcomes such as realized volatility, returns, or VIX changes. This article tests SPY end-of-day gamma exposure from 2018-04-16 to 2026-04-02 against next-day realized volatility, then controls the result for VIX and ATM IV.

Is gamma exposure reliable?

Gamma exposure is reliable as a regime descriptor. Positive GEX days and negative GEX days describe meaningfully different market states on SPY, and that relationship is stable across the 2018-2026 sample including a 70/30 train/test split. It is less reliable as an independent forecast: the raw GEX signal loses most of its strength once VIX and ATM IV are accounted for, and the high-VIX regime double-sort is non-monotonic. Treat GEX as a regime label rather than a standalone alpha factor.

Does GEX add predictive information beyond VIX?

Not exactly. GEX has modest incremental information over VIX alone in this test. But when ATM IV is added alongside VIX, GEX no longer has statistically significant residual predictive power for next-day SPY realized volatility.

Do DEX, VEX, and CHEX predict SPY returns or IV?

DEX does not predict next-day SPY return in the primary test. VEX has a raw relationship with next-day ATM IV change, but VEX is highly correlated with VIX and ATM IV, and the relationship disappears after controls. CHEX has weak raw sign agreement with next-day return, but it is not robust in the residualized rank tests.

What does the GEX vs VIX double-sort show?

The double-sort bins days by VIX quintile and GEX quintile. Low-to-moderate VIX rows show lower next-day realized volatility in positive-GEX buckets. The highest-VIX row is non-monotonic, with no clean GEX signal. That suggests GEX is mostly useful as a calm-market regime label, not a crisis edge.

What are the main limitations of this GEX backtest?

The article uses an end-of-day SPY panel only, not an intraday minute-level panel. It tests correlation rather than executable PnL. Controls are linear OLS residual controls. Results may not generalize to single stocks, less liquid ETFs, or intraday 0DTE trading.

GEX Backtest: 8 Years of SPY Show Gamma Exposure Mostly Tracks VIX

Q: Does gamma exposure work?

In this SPY sample, gamma exposure works at the raw regime level: GEX has Spearman rho=-0.36 with next-day realized volatility, p=4.6e-60. After VIX control it falls to rho=-0.14, p=1.2e-9. After VIX plus ATM IV control it falls to rho=-0.03, p=0.18, which is not statistically significant.

A pre-registered GEX backtest on 1,972 SPY days (2018-2026) finds that gamma exposure works on raw data, but GEX, DEX, VEX, and CHEX add little after VIX and ATM IV controls.

Tomasz Dobrowolski Quant Engineer

Apr 23, 2026

35 min read

GEX GammaExposure Backtesting SPY VIX DealerPositioning VEX CHEX

If you have searched for "GEX backtest," "does gamma exposure work," "gamma exposure accuracy," or "GEX vs VIX," this is the test you probably wanted but rarely get from dealer-positioning marketing.

Most options dashboards present four dealer-exposure Greeks as four separate signals: GEX (gamma exposure), DEX (delta exposure), VEX (vanna exposure), and CHEX (charm exposure). The pitch is intuitive: dealers hedge, those hedges create flows, and those flows should forecast volatility, returns, or IV changes.

The mechanical story is real. The predictive story is much thinner.

We wrote down the hypotheses before running the statistics, then tested them on an 8-year SPY panel. Features are measured at close t; outcomes are shifted forward, so the backtest does not use future information. The key control is simple: if a dealer-exposure metric is useful, it should add information beyond volatility variables a trader can already observe, especially VIX and ATM IV.

1,972 SPY EOD snapshots

-0.36 raw GEX vs. next-day RV rank correlation

-0.03 after VIX + ATM IV controls

0.18 full-control p-value

Bottom line

GEX has a real raw relationship with next-day realized volatility, and it still has modest incremental value over VIX alone. But once ATM IV is added to the control set, GEX goes quiet. DEX is dead on arrival. VEX is mostly a VIX/IV proxy. CHEX is weak and not robust in the residualized rank tests. The exposure stack is often sold as four independent signals, but it behaves more like one and a half.

Figure: Mean next-day realized vol by GEX quintile (SPY 2018-2026, n = 1,972 days). Q1 (most negative GEX) 17.0%, Q2 18.6%, Q3 12.7%, Q4 9.2%, Q5 (most positive GEX) 6.3%.

Sort every day by gamma exposure, cut the days into five equal groups, and look at the next day's realized volatility. Apart from the Q1/Q2 inversion, more positive GEX goes with a calmer next day.

The catch

After controlling for VIX and ATM IV, the GEX-only signal drops from a rank correlation of -0.36 (moderate and highly significant) to -0.03 (statistical noise, p = 0.18). Most of what you see above is the volatility regime, not unique GEX information. Details and regime breakdowns below.


What We Tested

The dataset is one row per SPY trading day. Each row carries dealer-signed GEX, DEX, VEX, CHEX, gamma flip, VIX, ATM IV, realized volatility, and pre-computed forward outcomes. Positive exposure means dealers are net long that Greek under this convention.

Item	Backtest setting
Universe	SPY only
Window	2018-04-16 to 2026-04-02
Sample	1,972 EOD snapshots; 1,971 usable next-day outcome rows
Primary signals	GEX, DEX, VEX, CHEX
Primary outcomes	Next-day realized vol, next-day return, next-day ATM IV change
Controls	VIX, then VIX + ATM IV
Primary tests	Quintile sorts, top-minus-bottom t-tests, Spearman rank correlation

The central question is not whether dealer exposures are mechanically meaningful. They are. The question is narrower and more useful: do the exposure Greeks add predictive information after you already know VIX and ATM IV?

Naive GEX Looks Excellent

Start with the classic GEX claim: positive dealer gamma suppresses realized volatility; negative dealer gamma amplifies it. We sorted every day into quintiles by same-session net_gex, then measured next-day realized vol as |log return| * sqrt(252).

What "quintile" means here. Take all 1,971 trading days with a usable next-day outcome, rank them by GEX from most negative to most positive, and cut them into five equal groups of ~394 days each (Q1 has 395). Q1 = the 20% of days with the most negative GEX (dealers most short gamma). Q5 = the 20% of days with the most positive GEX (dealers most long gamma). Q2, Q3, and Q4 are the groups in between. The same idea applies to the VIX quintiles later: V1 = calmest 20% of days, V5 = most stressed 20%.
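A minimal stdlib Python sketch of the two steps above: next-day realized vol as |log return| × √252 (in percent), and a rank-based quintile split. The function names are hypothetical, and so is the rule that leftover days go to the lowest buckets; with 1,971 usable days that rule happens to reproduce the 395/394/394/394/394 counts in the table below, but the article does not state how its pipeline assigns leftovers or breaks ties.

```python
import math


def next_day_rv(closes):
    """Next-day realized vol in annualized percent: |ln(C[t+1] / C[t])| * sqrt(252) * 100.

    Element t is the outcome for the day AFTER t, so it pairs with a feature
    measured at close t. The last day has no next day and gets None.
    """
    rv = [abs(math.log(closes[t + 1] / closes[t])) * math.sqrt(252) * 100
          for t in range(len(closes) - 1)]
    return rv + [None]


def quintile_labels(signal):
    """Label each day 1..5 by rank of `signal` (1 = most negative, 5 = most positive).

    Groups are as equal as possible; leftover days go to the lowest buckets
    (an assumption). Ties keep their original order because sorted() is stable.
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    base, extra = divmod(n, 5)
    labels = [0] * n
    pos = 0
    for q in range(5):
        size = base + (1 if q < extra else 0)
        for i in order[pos:pos + size]:
            labels[i] = q + 1
        pos += size
    return labels


def mean_by_quintile(signal, outcome):
    """Mean outcome per signal quintile, dropping days with no forward outcome first."""
    rows = [(s, o) for s, o in zip(signal, outcome) if o is not None]
    labels = quintile_labels([s for s, _ in rows])
    totals = {q: [0.0, 0] for q in range(1, 6)}
    for (_, o), q in zip(rows, labels):
        totals[q][0] += o
        totals[q][1] += 1
    return {q: s / c for q, (s, c) in totals.items() if c}
```

Usage would look like `mean_by_quintile(net_gex, next_day_rv(spy_closes))`, where both input lists are hypothetical and aligned by date.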

GEX quintile	n	Mean net_gex ($B)	Mean next-day RV (%)	Median next-day RV (%)
Q1 - most negative GEX	395	-7.91	16.97	13.54
Q2 - moderately negative	394	-2.79	18.57	12.91
Q3 - roughly neutral	394	+0.25	12.71	10.29
Q4 - moderately positive	394	+2.97	9.24	6.45
Q5 - most positive GEX	394	+6.50	6.34	4.86

The headline number is hard to ignore: Q5 minus Q1 is -10.63 vol points, t = -13.00, p = 1.0e-33. Spearman ρ is -0.36, p = 4.6e-60 on n=1,971.
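The article does not say which t-test variant produced t = -13.00. One common choice for a top-minus-bottom comparison is Welch's unequal-variance test, so the sketch below assumes Welch. It uses a normal approximation for the p-value, which is reasonable with roughly 394 days per bucket but will not match an exact t-distribution far out in the tails. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_top_minus_bottom(top, bottom):
    """Welch two-sample t-test on mean(top) - mean(bottom).

    Returns (difference, t statistic, two-sided p). The p-value uses a normal
    approximation to the t distribution, which is close for large samples.
    """
    diff = mean(top) - mean(bottom)
    # sample variances (n - 1 denominator), each scaled by its own group size
    se = math.sqrt(variance(top) / len(top) + variance(bottom) / len(bottom))
    t = diff / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, t, p
```

Here `top` would be next-day RV on Q5 days and `bottom` next-day RV on Q1 days, so a negative difference means calmer days after the most positive GEX readings.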

What "Spearman ρ" means. Spearman ρ is a rank correlation that runs from -1 to +1. +1 means that when X goes up, Y always goes up; -1 means that when X goes up, Y always goes down; 0 means no monotonic relationship. The -0.36 here means: rank the 1,971 days by GEX from most negative to most positive, and they tend (with noise) to rank in the opposite order on next-day realized vol. It is not a perfect rule, but it is a moderate and statistically real pattern. Later in the article, after controlling for VIX and ATM IV, that number drops to -0.03, which is statistically indistinguishable from zero.
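For readers who want to compute ρ without SciPy, here is a minimal sketch: Spearman ρ is the Pearson correlation of the two series' ranks, with tied values sharing their average rank. The p-value uses the standard t transform, t = ρ·√((n - 2)/(1 - ρ²)), with a normal approximation. That is an assumption that is close for n near 2,000, but it is not necessarily the exact test the article's scripts use.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho and an approximate two-sided p-value (normal approx. to t)."""
    rho = pearson(average_ranks(x), average_ranks(y))
    n = len(x)
    if abs(rho) >= 1.0:
        return rho, 0.0
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return rho, math.erfc(abs(t) / math.sqrt(2))
```

Because only ranks enter the calculation, a single COVID-sized outlier day cannot dominate ρ the way it can dominate the quintile means.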

This is the chart that can sell a subscription. It is also not enough. A strong raw GEX backtest can still be a volatility-regime proxy. Negative-GEX days and high-VIX days often describe the same market state.

Fact check: the Q1/Q2 mean inversion is real. It comes from outlier-heavy COVID-era days; the median row is much cleaner. This is why the rank correlation matters more than a strict mean monotonicity claim.

The Control That Changes the Story

For the residualized tests, we regress both the signal and the outcome on the control set, then compute Spearman correlation between the residuals. First we control for VIX alone. Then we control for VIX + ATM IV. Bonferroni-adjusted significance for the four primary tests is p < 0.0125.
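The residualized test described above can be sketched in stdlib Python as follows. This is a minimal illustration that assumes an intercept plus linear OLS on the controls, solved through the normal equations (adequate for one or two well-scaled regressors such as VIX and ATM IV). All function names are hypothetical, and the Spearman helper repeats the rank-then-Pearson definition so the block runs on its own.

```python
import math


def _solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def ols_residuals(y, controls):
    """Residuals of y after OLS on an intercept plus each control series."""
    rows = [[1.0] + [c[i] for c in controls] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return [yi - sum(bi * xi for bi, xi in zip(beta, r)) for r, yi in zip(rows, y)]


def _ranks(v):
    """1-based average ranks (ties share the mean rank)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def _spearman(x, y):
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


def residual_spearman(signal, outcome, controls):
    """Spearman rho between signal and outcome after residualizing BOTH on the controls."""
    return _spearman(ols_residuals(signal, controls), ols_residuals(outcome, controls))


# Four primary tests at a family-wise 5% level: 0.05 / 4 = 0.0125
BONFERRONI_ALPHA = 0.05 / 4
```

The two stages in the table correspond to `residual_spearman(gex, next_rv, [vix])` and `residual_spearman(gex, next_rv, [vix, atm_iv])`, with hypothetical aligned input lists. Residualizing both sides matters: if only the outcome were residualized, any part of GEX that simply restates VIX would still be in the signal.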

Signal to outcome	Raw Spearman (p)	After VIX control (p)	After VIX + ATM IV control (p)
GEX to next-day RV	-0.36 (4.6e-60)	-0.14 (1.2e-9)	-0.03 (0.18)
DEX to next-day return	-0.03 (0.19)	+0.01 (0.69)	+0.02 (0.40)
VEX to next-day ATM IV change	-0.16 (2.1e-13)	-0.05 (0.02)	-0.01 (0.77)
CHEX to next-day return	-0.05 (0.03)	-0.01 (0.63)	-0.00 (0.93)

The raw signals are mostly volatility signals wearing more sophisticated labels. GEX survives the VIX-only control, but at roughly 40% of the raw rank-correlation magnitude. Add ATM IV and the GEX residual drops to ρ = -0.03 with p = 0.18. DEX, VEX, and CHEX do not carry robust independent predictive information in these residualized rank tests.

For GEX, the top-minus-bottom residualized realized-vol difference tells the same story: -10.63 vol points raw, -3.15 after VIX, and -0.99 after VIX + ATM IV (p = 0.25).

The honest result

GEX is not fake. The raw effect is real. What fails is the incremental information after VIX and ATM IV. That distinction matters: a useful regime descriptor is not the same thing as an independent forecasting edge.

GEX vs VIX Double-Sort

The most useful visual is a double-sort: first split days by VIX quintile, then within each VIX bucket split by GEX quintile. Each cell is mean next-day realized vol in annualized percentage points.

VIX quintile \ GEX quintile	Q1 most negative	Q2	Q3	Q4	Q5 most positive
V1 lowest VIX	8.02	7.32	6.62	5.22	5.07
V2	11.70	10.30	8.85	6.48	6.04
V3	12.00	12.05	12.15	9.62	8.56
V4	15.87	15.35	17.18	12.31	8.06
V5 highest VIX	20.57	24.91	37.69	21.71	15.91

The table is persuasive, but the precise reading matters. V1 and V2 are close to textbook: more positive GEX means lower next-day RV. V3 and V4 still show much lower RV in the positive-GEX buckets than in the negative or middle buckets, but they are not strictly monotonic. V5, the highest-VIX bucket, is a non-monotonic mess.

That is the practical caveat for traders. GEX's predictive value is mostly a calm-to-moderate market phenomenon. In the top VIX quintile, where an extra risk signal would be most valuable, this double-sort does not show a clean edge.
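The double-sort can be sketched as follows: bucket days into VIX quintiles, then re-rank GEX separately inside each VIX bucket and average next-day RV per cell. This is a minimal illustration assuming a conditional sort (GEX quintiles computed within each VIX bucket, as described above) rather than independent breakpoints; the helper names and the rule that leftover days go to the lowest buckets are assumptions.

```python
from collections import defaultdict


def quintile_labels(signal):
    """1..5 by rank of signal; equal-size groups, leftovers to the lowest buckets."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    base, extra = divmod(n, 5)
    labels, pos = [0] * n, 0
    for q in range(5):
        size = base + (1 if q < extra else 0)
        for i in order[pos:pos + size]:
            labels[i] = q + 1
        pos += size
    return labels


def double_sort(vix, gex, outcome):
    """Mean outcome per (VIX quintile, GEX quintile) cell, GEX ranked within each VIX bucket."""
    vix_q = quintile_labels(vix)
    cells = defaultdict(list)
    for v in range(1, 6):
        idx = [i for i in range(len(vix)) if vix_q[i] == v]
        gex_q = quintile_labels([gex[i] for i in idx])  # conditional sort
        for i, g in zip(idx, gex_q):
            cells[(v, g)].append(outcome[i])
    return {cell: sum(xs) / len(xs) for cell, xs in cells.items()}
```

With roughly 1,971 days, each cell holds about 79 days, which is one reason the V5 row is noisier than the single-sort quintiles.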

Historical API · Alpha tier · from $1,199/mo billed annually

Build this double-sort heatmap for your own symbol

Same dealer-signed GEX, VIX, and ATM IV series used for the 5x5 grid above. SPY, QQQ, IWM since 2018-04. Join on date, bin into quintiles, plot - the whole recipe is in the methodology section below.


Exposure Verdicts

GEX: weak survivor

Verdict: useful as a regime descriptor; modestly incremental over VIX alone; not significant after VIX + ATM IV. The raw GEX backtest is real, but it is mostly a vol-regime backtest.

DEX: dead on arrival

DEX quintile	n	Mean net_dex ($B)	Mean next-day return	Pct positive days
Q1 (most negative)	395	-67.5	+0.03%	51.4%
Q2	394	-0.7	+0.07%	56.3%
Q3	394	+29.5	+0.09%	57.6%
Q4	394	+52.7	+0.01%	53.6%
Q5 (most positive)	394	+90.1	+0.03%	57.6%

Top-minus-bottom return difference is 0.00%, t = 0.04, p = 0.97. Raw Spearman ρ is -0.03 with p = 0.19. DEX does not predict next-day SPY direction in this EOD test.

VEX: mostly a VIX proxy

VEX quintile	n	Mean net_vex ($B)	Mean next-day ATM IV change	Pct positive IV-change days
Q1 (most negative)	395	-299.7	+0.24 vol pts	55.2%
Q2	394	-173.6	+0.24 vol pts	50.0%
Q3	394	-98.8	+0.09 vol pts	47.0%
Q4	394	-23.3	+0.04 vol pts	42.4%
Q5 (most positive)	394	+80.5	-0.60 vol pts	35.3%

At the raw level, VEX looks useful: ρ = -0.16, p = 2.1e-13. But VEX correlates with VIX at +0.72 and with ATM IV at +0.76. After controlling for VIX + ATM IV, the residualized VEX test falls to ρ = -0.01, p = 0.77.

CHEX: weak and fragile

CHEX quintile	n	Mean net_chex ($M)	Mean next-day return	Pct positive days
Q1 (most negative)	395	-1.04	+0.11%	53.9%
Q2	394	+1.12	+0.07%	57.4%
Q3	394	+1.98	+0.04%	57.9%
Q4	394	+2.89	-0.00%	52.5%
Q5 (most positive)	394	+4.90	+0.01%	54.8%

CHEX has one interesting raw result: sign agreement between CHEX and next-day SPY return is 54.9% on n=1,967, p = 1.5e-5. But the pre-registered residualized rank test collapses under VIX control (ρ = -0.01, p = 0.63) and under VIX + ATM IV (ρ = -0.00, p = 0.93). A separate OLS file does show a significant CHEX coefficient after VIX + AR(1) controls, so the clean wording is not "CHEX can never matter." It is: CHEX is not robust in the rank/quintile framework used for the primary article claim.
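The sign-agreement statistic can be sketched as a binomial test against 50%. The minimal version below assumes that days where either value is exactly zero are skipped, and it uses a normal approximation to the binomial; the function name is hypothetical. One design note: a 50% null assumes neither series leans positive. SPY was up on more than half the days in every quintile table above, and four of the five CHEX quintiles have a positive mean, so the sketch also reports the agreement rate expected if the two signs were independent, p_s·p_o + (1 - p_s)(1 - p_o), as a stricter baseline for comparison.

```python
import math


def sign_agreement(signal, outcome):
    """Share of days where sign(signal) == sign(outcome), plus two reference points.

    Returns (rate, n, p_vs_50pct, independence_baseline). Days where either
    value is exactly zero are skipped (an assumption). p uses a normal
    approximation to the binomial with success probability 0.5.
    """
    pairs = [(s, o) for s, o in zip(signal, outcome) if s != 0 and o != 0]
    n = len(pairs)
    hits = sum(1 for s, o in pairs if (s > 0) == (o > 0))
    rate = hits / n
    z = (hits - 0.5 * n) / math.sqrt(0.25 * n)
    p = math.erfc(abs(z) / math.sqrt(2))
    # agreement expected by chance if the signs were independent but each skewed
    ps = sum(1 for s, _ in pairs if s > 0) / n
    po = sum(1 for _, o in pairs if o > 0) / n
    baseline = ps * po + (1 - ps) * (1 - po)
    return rate, n, p, baseline
```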

Why They Overlap

The reason the controls matter is visible in the correlation matrix. These are Spearman rank correlations across all 1,972 SPY observations.

	GEX	DEX	VEX	CHEX	VIX	ATM IV
GEX	+1.00	+0.73	-0.54	+0.59	-0.49	-0.63
DEX	+0.73	+1.00	-0.89	+0.68	-0.58	-0.67
VEX	-0.54	-0.89	+1.00	-0.65	+0.72	+0.76
CHEX	+0.59	+0.68	-0.65	+1.00	-0.39	-0.46
VIX	-0.49	-0.58	+0.72	-0.39	+1.00	+0.91
ATM IV	-0.63	-0.67	+0.76	-0.46	+0.91	+1.00

DEX and VEX correlate at -0.89. VEX and VIX correlate at +0.72. VIX and ATM IV correlate at +0.91. That does not make the exposure Greeks useless, but it does make them dangerous to treat as independent features.

One-and-a-half signals

If someone sells GEX, DEX, VEX, and CHEX as four unrelated predictive signals, this correlation matrix is the rebuttal. They are four views of the same options chain, and much of what they capture is already in volatility level.

High-VIX Regime Test

The regime split is where the GEX backtest becomes most useful for real trading decisions. The high-VIX row is not cherry-picked; it was pre-registered.

Regime	n	Top-minus-bottom RV diff	t-stat	p-value	Spearman ρ
All days	1,971	-10.63	-13.00	1.0e-33	-0.36
Pre-COVID	463	-11.43	-6.78	4.5e-10	-0.41
COVID shock	72	-59.88	-3.88	0.001	-0.50
Post-COVID	1,436	-9.62	-9.58	8.7e-20	-0.32
Low-VIX days	1,478	-8.09	-11.81	3.9e-28	-0.32
High-VIX days	493	-1.89	-0.78	0.44	-0.02

Across all days, GEX looks excellent. Inside high-VIX days, the effect is statistically absent: top-minus-bottom difference -1.89 vol points, t = -0.78, p = 0.44, Spearman ρ = -0.02.

That is the trading lesson. GEX is best at labeling calm regimes. It is not a clean crisis detector in this EOD SPY sample.

Train/Test Stability

The 70/30 chronological split looks impressive at first glance.

Split	Window	n	Top-minus-bottom diff	Spearman ρ
In-sample	2018-04 to 2023-11	1,380	-10.77	-0.367
Out-of-sample	2023-12 to 2026-04	591	-10.79	-0.374

The naive interpretation is that GEX is exceptionally robust. The more cautious interpretation is that GEX is proxying a stable volatility-regime variable. The residualized controls point to the second explanation: VIX and ATM IV absorb the effect.
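A chronological 70/30 split can be sketched as below: sort by date, cut once, and never shuffle, so the out-of-sample window sits entirely after the in-sample window. This is an illustration, not the article's script. Rounding 0.7 × 1,971 gives the 1,380/591 split in the table above, but the exact cut rule the authors used is an assumption, as are the function names.

```python
def chrono_split(rows, train_frac=0.7, key="date"):
    """Split rows into an earlier in-sample block and a later out-of-sample block.

    Rows are sorted by `key` (ISO date strings sort chronologically) and cut
    once; nothing is shuffled, so every test row is later than every train row.
    """
    ordered = sorted(rows, key=lambda r: r[key])
    cut = round(train_frac * len(ordered))
    return ordered[:cut], ordered[cut:]


def evaluate_splits(rows, metric, train_frac=0.7):
    """Apply the same metric (e.g. Spearman or top-minus-bottom) to each side."""
    train, test = chrono_split(rows, train_frac)
    return {
        "in_sample": (len(train), metric(train)),
        "out_of_sample": (len(test), metric(test)),
    }
```

A stable result on both sides of the cut shows the relationship persists over time; as the paragraph above notes, it does not by itself show that the relationship is independent of VIX.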

What Traders Should Do With This

Use GEX as a regime label, not a standalone forecast. Positive GEX can describe a calmer market state. It does not, by itself, survive the VIX + ATM IV control as an independent next-day realized-vol signal.
Do not count the exposure stack as four independent factors. DEX and VEX are nearly mirror images in this sample, and VEX is tightly linked to volatility level.
Demand orthogonal tests. A vendor chart that beats a coin flip is not enough. The right question is whether the signal adds anything after obvious baselines like VIX, ATM IV, prior return, and outcome persistence.
Do not confuse statistical correlation with tradable PnL. Even GEX's VIX-only residual effect, ρ = -0.14, has to survive transaction costs, latency, data availability, and execution rules before it becomes a strategy.

Limitations

EOD panel only. This article tests 16:00 ET snapshots against next-day outcomes. FlashAlpha's Historical API supports minute-level replay, but this specific research panel is EOD-only. The canonical intraday CHEX claim - charm flow into the last hour - needs a separate minute-level study.
SPY only. SPY has one of the deepest and most heavily hedged option markets of any ETF. Single stocks, less liquid ETFs, and index options may behave differently.
Linear residual controls. Residualization uses OLS. A nonlinear model could, in theory, recover interaction effects not measured here.
Correlation, not PnL. The tests measure statistical association, not executable trades after costs.
0DTE deserves its own article. The post-COVID split includes the 0DTE growth era, but this is not a dedicated intraday 0DTE mechanics test.
Dealer-sign convention matters. Positive means dealers net long that Greek. Providers using the opposite sign convention would invert signs without changing the conclusion.
Stress observations are limited. COVID, 2022, and the 2025 stress period provide meaningful turmoil, but high-VIX inference still rests on fewer days than calm-market inference.

Download the Backtest Data

Here are the raw artifacts used for the article. The goal is simple: do not trust the prose if the CSVs disagree with it.

One-click bundle: download all CSVs, hypotheses, and Python scripts.

Core files

master_dataset.csv - full 1,972-row panel
hypotheses.md - pre-registered hypotheses
summary_headline.csv - one row per headline test
orthogonal_results.csv - VIX and VIX + IV controls
controls_double_sort.csv - GEX x VIX heatmap data

Re-run scripts

Individual result tables are also available: GEX to RV, DEX to return, VEX to IV change, CHEX to return, regime splits, and train/test split.

Methodology

The analysis combines daily dealer-exposure summaries, daily stock summaries, and daily VRP snapshots from the FlashAlpha historical pipeline. The master dataset aligns features at row t with future outcomes using shifted columns, so the next-day tests do not leak future data into the signal.

The backtest pulled directly from the historical subdomain, one day at a time (the endpoints accept an at= timestamp and return a point-in-time snapshot):

# Dealer-exposure summary (GEX, DEX, VEX, CHEX, gamma flip, walls)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/exposure/summary/SPY?at=2024-06-14T20:00:00Z"

# Stock summary (spot, VIX context, ATM IV, realized vol)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/stock/SPY/summary?at=2024-06-14T20:00:00Z"

# VRP snapshot (implied-vs-realized spread, regime, harvest score)
curl -H "X-Api-Key: $FLASHALPHA_API_KEY" \
  "https://historical.flashalpha.com/v1/vrp/SPY?at=2024-06-14T20:00:00Z"

# Loop over 1,972 trading days, join on (ts, symbol), shift outcomes by -1
# for next-day tests. Residualize signals and outcomes on VIX, then on
# VIX + ATM IV. Re-run the quintile sorts, Spearman tests, and regime splits.
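The loop described in those comments can be sketched in stdlib Python. This is a hedged illustration, not FlashAlpha's client. The endpoint paths and the X-Api-Key header are copied from the curl examples; the response schema is not documented here, so the join works on whatever per-date field dicts the caller extracts. The 16:00 ET to UTC conversion hard-codes the US daylight-saving rules in force since 2007 (the example timestamp 20:00Z is 16:00 EDT; on standard-time dates the same close is 21:00Z). The join uses the date alone because one symbol is fetched at a time, and "next day" means the next row in the joined panel, so a date missing from either source silently widens the gap.

```python
import json
import urllib.request
from datetime import date, timedelta

BASE = "https://historical.flashalpha.com/v1"


def _nth_sunday(year, month, n):
    first = date(year, month, 1)
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)


def close_at_utc(day):
    """16:00 New York time on `day` as an ISO-8601 UTC string.

    Assumes US DST rules since 2007: EDT (UTC-4) from the second Sunday of
    March to the first Sunday of November, EST (UTC-5) otherwise.
    """
    edt = _nth_sunday(day.year, 3, 2) <= day < _nth_sunday(day.year, 11, 1)
    return f"{day.isoformat()}T{16 + (4 if edt else 5):02d}:00:00Z"


def snapshot_urls(symbol, day):
    """The three point-in-time endpoints from the curl examples, for one close."""
    at = close_at_utc(day)
    return {
        "exposure": f"{BASE}/exposure/summary/{symbol}?at={at}",
        "stock": f"{BASE}/stock/{symbol}/summary?at={at}",
        "vrp": f"{BASE}/vrp/{symbol}?at={at}",
    }


def fetch_json(url, api_key):
    """GET one snapshot with the same header as the curl examples; returns parsed JSON."""
    req = urllib.request.Request(url, headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def join_and_shift(exposure, stock, outcome_keys):
    """Inner-join two {date: fields} maps and attach next-day outcomes.

    Field names inside each dict are whatever the caller extracted (none are
    assumed here). For each outcome key k, row t gets `next_k` from row t + 1,
    so features at close t line up with what happened afterwards.
    """
    days = sorted(set(exposure) & set(stock))
    rows = [{"date": d, **exposure[d], **stock[d]} for d in days]
    for t, row in enumerate(rows):
        nxt = rows[t + 1] if t + 1 < len(rows) else None
        for k in outcome_keys:
            row[f"next_{k}"] = nxt[k] if nxt is not None else None
    return rows
```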

For live or historical replay outside this article, see the FlashAlpha Historical API. Current public docs list SPY historical coverage from 2018-04-16, with more symbols backfilled on request for Alpha customers.

The Takeaway

TL;DR

Gamma exposure works at the raw regime level, but the independent GEX edge is much smaller than the marketing version. Across 1,972 SPY days, raw GEX to next-day realized vol is ρ = -0.36. After VIX + ATM IV controls, it is ρ = -0.03 with p = 0.18. DEX has no next-day return signal, VEX is mostly a VIX/IV proxy, and CHEX is fragile in the residualized rank tests. Use dealer exposure as context. Do not treat it as four clean standalone alpha signals.

Historical API · Alpha tier · from $1,199/mo billed annually

Run the same pre-registered test on your own signal

Pull the same dealer-signed GEX, DEX, VEX, CHEX, stock summaries, and VRP used in this 8-year study. Same leak-free, walk-forward API the FlashAlpha research team used. Coverage: SPY, QQQ, IWM today - more symbols on request.


Data freshness: intraday through the previous trading day's close, refreshed by the daily pipeline. Live coverage at /v1/tickers.


#Options #Backtesting #VRP #PutSpreads #SPY #Quant #HistoricalData

GEX Backtest: 8 Years of SPY Show Gamma Exposure Mostly Tracks VIX

What We Tested

Naive GEX Looks Excellent

The Control That Changes the Story

GEX vs VIX Double-Sort

Exposure Verdicts

GEX: weak survivor

DEX: dead on arrival

VEX: mostly a VIX proxy

CHEX: weak and fragile

Why They Overlap

High-VIX Regime Test

Train/Test Stability

What Traders Should Do With This

Limitations

Download the Backtest Data

Methodology

The Takeaway

Related Articles

The SPY Put Spread Matrix - 18 Million Spreads, 8 Years, Theoretical vs Realized

SpotGamma vs Unusual Whales — Honest Comparison (2026)

GEX Calculator & Dashboard — Free Gamma Exposure Tools (2026)

Live Market Pulse

Intelligent Screening

Execution-Ready

Join the Community

Discord

Twitter / X

GitHub
