# Exposure Backtest — Pre-Registered Hypotheses

Pre-registered **before** running any statistics. Locks the question so exploratory
p-hacking is off the table. Any deviation must be logged with its reason.

## Dataset

- Source: QuestDB `spy-dataset` instance (port 8813) → `SPY_StockSummary` + `SPY_VRP`
- Universe: **SPY only**
- Window: **2018-04-16 → 2026-04-02** (1972 EOD snapshots, ~7.96 years)
- Snapshot time: **16:00 ET daily close** (only granularity available)
- No intraday data present → CHEX last-hour hypothesis is out of scope for v1

## Convention

All exposures are **dealer-signed** (positive = dealers net long that Greek).
Features are known at close t; outcomes measured from t+1 onward. No look-ahead.
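
The no-look-ahead rule amounts to a simple alignment step. The sketch below is a minimal illustration, assuming signals and closes arrive as equal-length lists in date order; `align_forward` is a hypothetical helper name, not part of the dataset tooling.

```python
import math


def align_forward(signal, close):
    """Pair signal[t] with the log return from close[t] to close[t+1]."""
    if len(signal) != len(close):
        raise ValueError("signal and close must have the same length")
    pairs = []
    # The final day is dropped: it has no t+1 close to measure an outcome against.
    for t in range(len(close) - 1):
        next_ret = math.log(close[t + 1] / close[t])
        pairs.append((signal[t], next_ret))
    return pairs
```

Each pair holds a feature observed at close t and an outcome realized strictly afterwards, so no row mixes information from the same close on both sides.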

## Four primary hypotheses

### H1 — GEX predicts next-day realized volatility
- **Signal:** `net_gex` at close t
- **Outcome:** 1-day realized vol from close t to close t+1, annualized as |close-to-close return| × √252;
  5-day forward realized vol as a secondary outcome
- **Prediction:** negative GEX days → higher next-day RV than positive GEX days
- **Test:** quintile sort; t-test top vs bottom quintile mean RV; Spearman rank correlation
- **Null:** no difference in RV across quintiles
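
The H1 test could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses log returns for the "abs return" (the text does not say log or simple), equal-count buckets for the quintiles, and Welch's unequal-variance form of the t-test. All function names are hypothetical, and the p-value lookup is left out.

```python
import math
from statistics import mean, variance

TRADING_DAYS = 252


def next_day_rv(close):
    """Annualized 1-day RV proxy: |log return| from close t to t+1, times sqrt(252)."""
    return [
        abs(math.log(close[t + 1] / close[t])) * math.sqrt(TRADING_DAYS)
        for t in range(len(close) - 1)
    ]


def quintile_means(signal, outcome, n_buckets=5):
    """Sort days by signal, cut into equal-count buckets, return mean outcome per bucket."""
    if len(signal) != len(outcome) or len(signal) < n_buckets:
        raise ValueError("need equal-length inputs with at least n_buckets rows")
    order = sorted(range(len(signal)), key=lambda i: signal[i])
    buckets = [[] for _ in range(n_buckets)]
    for rank, i in enumerate(order):
        buckets[rank * n_buckets // len(order)].append(outcome[i])
    return [mean(b) for b in buckets]


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); variances need not be equal."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se
```

Because `next_day_rv` returns one fewer value than there are closes, the signal would be truncated to match, for example `quintile_means(net_gex[:-1], next_day_rv(close))`.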

### H2 — DEX predicts next-day return direction
- **Signal:** `net_dex` at close t
- **Outcome:** next-day log return, close t → close t+1
- **Prediction:** sign and/or magnitude of DEX relates to next-day drift (direction unclear a priori —
  positive DEX could mean dealer buying pressure OR already-hedged long → fade; we let the data speak)
- **Test:** quintile sort on mean return; hit rate for sign(DEX)=sign(next_ret); Spearman
- **Null:** zero mean return per quintile
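
A minimal sketch of the hit-rate part of this test is shown below. It assumes that days with a zero signal or a zero return are skipped (the text does not specify how ties are handled) and uses a normal approximation against the 50% baseline; both function names are hypothetical.

```python
import math


def hit_rate(signal, next_ret):
    """Return (share of days where sign(signal) == sign(next_ret), days counted)."""
    hits = 0
    total = 0
    for s, r in zip(signal, next_ret):
        if s == 0 or r == 0:
            continue  # assumption: days with a zero signal or zero return are skipped
        total += 1
        if (s > 0) == (r > 0):
            hits += 1
    if total == 0:
        raise ValueError("no days with nonzero signal and return")
    return hits / total, total


def z_vs_coin_flip(rate, n):
    """Normal-approximation z-score of a hit rate against the 50% baseline."""
    return (rate - 0.5) / math.sqrt(0.25 / n)
```

The same two helpers would apply unchanged to any other signal whose test includes a hit rate.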

### H3 — VEX predicts next-day ATM IV change
- **Signal:** `net_vex` at close t
- **Outcome:** `atm_iv[t+1] - atm_iv[t]`
- **Prediction:** more negative VEX (dealers shorter vega) → dealers buy vol on up-moves and sell it on
  down-moves → expect a positive relation between VEX and next-day ΔIV, or no relation
- **Test:** quintile sort on mean ΔIV; Spearman
- **Null:** zero mean ΔIV per quintile
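
The Spearman part of this test can be illustrated with a small standard-library sketch: rank both series (ties get the average rank), then take the Pearson correlation of the ranks. The helper names are hypothetical, and `atm_iv` is assumed to be a date-ordered list.

```python
import math
from statistics import mean


def iv_changes(atm_iv):
    """Next-day change atm_iv[t+1] - atm_iv[t] for every day that has a t+1."""
    return [atm_iv[t + 1] - atm_iv[t] for t in range(len(atm_iv) - 1)]


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

As with the return outcomes, the signal would be truncated by one day to line up with `iv_changes(atm_iv)`.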

### H4 — CHEX predicts next-day return (pinning proxy)
- **Signal:** `net_chex` at close t
- **Outcome:** next-day log return
- **Prediction:** positive CHEX (charm decay pushes dealers to buy overnight) → positive next-day drift
- **Test:** quintile sort on mean return; hit rate; Spearman
- **Null:** zero mean return per quintile
- **Caveat:** a true charm test needs minute-level returns for the last hour of the day, which we don't have;
  the next-day return is our EOD proxy

## Secondary / robustness hypotheses

- **H1b:** GEX regime (`exposure_regime` string, "positive_gamma"/"negative_gamma"/"inflection")
  → split RV by regime.
- **Distance to gamma flip** (`underlying_price / gamma_flip - 1`) → mean reversion of next-day return.
- **0DTE concentration** (`zero_dte_pct_of_total`) → relation to next-day RV.
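
Two of these secondary features are simple enough to sketch directly. The example below assumes per-day values are already loaded into plain lists; `flip_distance` and `mean_by_regime` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean


def flip_distance(underlying_price, gamma_flip):
    """Signed distance of spot from the gamma flip level, as a fraction of the flip."""
    return underlying_price / gamma_flip - 1


def mean_by_regime(regimes, outcomes):
    """Average outcome (e.g. next-day RV) within each exposure_regime label."""
    grouped = defaultdict(list)
    for regime, value in zip(regimes, outcomes):
        grouped[regime].append(value)
    return {regime: mean(values) for regime, values in grouped.items()}
```

A positive `flip_distance` means spot closed above the flip level, a negative one below it.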

## Baselines each signal must beat

Each primary test is also run against:

1. **Random / coin flip** (50% hit rate baseline)
2. **Prior-day return** (momentum — especially for H2, H4)
3. **VIX level** (vol regime — especially for H1)
4. **Outcome autocorrelation** (for RV and IV-change targets, which are highly autocorrelated)

A signal "works" only if it adds incremental predictive power beyond these baselines. We do not claim
predictive power if the effect disappears once VIX is controlled for.
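
One way to check whether an effect survives a control is a partial correlation: regress both the signal and the outcome on the control, then correlate the residuals. The text does not fix a method, so this is an assumed approach rather than the registered procedure, and the function names are hypothetical.

```python
import math
from statistics import mean


def residualize(y, x):
    """Residuals of a one-variable OLS fit of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return [b - (alpha + beta * a) for a, b in zip(x, y)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(signal, outcome, control):
    """Correlation of signal and outcome after removing the linear effect of control."""
    return pearson(residualize(signal, control), residualize(outcome, control))
```

If `partial_corr(net_gex, rv, vix)` is near zero while the raw correlation is not, the apparent GEX effect is mostly a VIX effect. The same residualizing step works with prior-day return or the lagged outcome as the control.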

## Regime splits

- Pre-COVID: 2018-04-16 → 2020-02-14
- COVID shock: 2020-02-17 → 2020-05-31
- Post-COVID: 2020-06-01 → 2026-04-02
- High-VIX days (VIX > rolling 1y 75th pct) vs low-VIX days
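
These splits could be labeled as in the sketch below. The date boundaries come from the list above; the 252-row trailing window for "rolling 1y", the inclusion of day t in its own window, and the linear-interpolated percentile are assumptions, and the function names are hypothetical.

```python
from datetime import date


def covid_regime(d):
    """Label a trading date with one of the three date regimes."""
    if d < date(2018, 4, 16) or d > date(2026, 4, 2):
        raise ValueError("date outside the backtest window")
    if d <= date(2020, 2, 14):
        return "pre_covid"
    if d <= date(2020, 5, 31):
        return "covid_shock"
    return "post_covid"


def high_vix_flags(vix, window=252, pct=0.75):
    """True where VIX exceeds the trailing-window percentile; None until the window fills."""
    flags = []
    for t in range(len(vix)):
        if t + 1 < window:
            flags.append(None)
            continue
        hist = sorted(vix[t + 1 - window : t + 1])  # trailing window ending at day t
        pos = pct * (window - 1)
        lo = int(pos)
        hi = min(lo + 1, window - 1)
        threshold = hist[lo] + (pos - lo) * (hist[hi] - hist[lo])
        flags.append(vix[t] > threshold)
    return flags
```

The window only looks backwards from day t, so the flag stays consistent with the no-look-ahead convention.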

## Train/test

- In-sample: first 70% by date (2018-04-16 → ~2023-11)
- Out-of-sample: last 30% (~2023-12 → 2026-04-02)
- A finding is "robust" if the sign and rough magnitude hold in both.
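
The split and the robustness rule might look like the sketch below. It assumes "70% by date" means the first 70% of date-sorted rows, and it makes "rough magnitude" concrete with a hypothetical `max_ratio` threshold of 2, which the text does not specify.

```python
def chronological_split(rows, train_pct=70):
    """Split date-sorted rows into (in_sample, out_of_sample) by row count."""
    cut = len(rows) * train_pct // 100
    return rows[:cut], rows[cut:]


def is_robust(effect_in, effect_out, max_ratio=2.0):
    """True if both effects share a sign and their magnitudes differ by at most max_ratio."""
    if effect_in == 0 or effect_out == 0:
        return False
    if (effect_in > 0) != (effect_out > 0):
        return False
    ratio = abs(effect_in) / abs(effect_out)
    return 1 / max_ratio <= ratio <= max_ratio
```

With 1972 snapshots this split gives 1380 in-sample rows and 592 out-of-sample rows. Whatever threshold is chosen for `max_ratio` should be fixed before the out-of-sample data is examined.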

## What will be reported regardless of outcome

- All four primary tests with full quintile tables, correlations, p-values.
- Negative results published same as positive ones. "X doesn't predict Y" is the article's
  value proposition: rigorous honesty, not cherry-picking.
