For quant & systematic desks
Historical replay (GEX/DEX/VEX/CHEX, VRP, chains, SVI surfaces, point-in-time greeks) — response shapes match live (a few need a thin adapter), so research and production share one codepath. The screener is live-only.
1. Summary
Backtesting options strategies is structurally harder than backtesting equity momentum or fixed-income carry. The payoff is path-dependent. The instrument set re-generates every week. Fills are wide and asymmetric. And the data problem—point-in-time chains, computed dealer exposure, arbitrage-free surfaces—is usually bigger than the signal problem.
This guide is written for a quant or systematic developer who wants to:
- Research a dealer-positioning, VRP, or dispersion signal on real historical data.
- Build a candidate set across the options universe, not just one ticker.
- Run a fill model that reflects what you would have actually paid.
- Avoid the ten systematic mistakes that inflate backtested Sharpe before live trading erases the edge.
- Connect the same code to a live production signal without rewriting the data layer.
The historical endpoint set described below lives on a separate host, historical.flashalpha.com, and every analytic takes an ?at= timestamp (ET wall-clock, no trailing Z): GEX/DEX/VEX/CHEX exposure replay (/v1/exposure/*?at=), full option chains with greeks and IV at minute resolution (/v1/optionquote/{t}?at=), IV surfaces and SVI parameters (/v1/adv_volatility/{t}?at=). Everything replays since 2018, and the response shape matches the live endpoints, so one parser works on both. Two things do not follow that pattern, and getting them wrong is the most common mistake in this stack:
- VRP exposes its own history and point-in-time snapshot on the live host: GET /v1/vrp/{t}/history?days= for the daily time series, and GET /v1/vrp/{t}?date=YYYY-MM-DD for the persisted snapshot of a single past date.
- The screener (POST /v1/screener) runs on the live universe only. There is no historical screener and no date/at parameter on it, so a point-in-time candidate set has to be reconstructed by the researcher (see Kink 1).
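The ?at= convention described above (ET wall-clock, no trailing Z) is easy to get wrong when research code stores timestamps in UTC. A minimal sketch of a converter, assuming Python's standard zoneinfo time zone database is installed; the helper name to_at_param is hypothetical and not part of the SDK:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")

def to_at_param(ts: datetime) -> str:
    """Format a timezone-aware datetime as an ET wall-clock string for ?at=."""
    if ts.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid silent offsets")
    # Convert to Eastern Time, then drop the offset so no 'Z' or '-04:00' is emitted
    local = ts.astimezone(ET).replace(tzinfo=None, microsecond=0)
    return local.isoformat()

# 18:30 UTC on 2020-03-16 is 14:30 ET (daylight time was already in effect)
summer = to_at_param(datetime(2020, 3, 16, 18, 30, tzinfo=timezone.utc))
# 19:30 UTC on 2020-01-15 is 14:30 ET (standard time)
winter = to_at_param(datetime(2020, 1, 15, 19, 30, tzinfo=timezone.utc))
print(summer, winter)
```

Rejecting naive datetimes is a deliberate choice here: a naive value that is silently treated as local time is one way a replay ends up an hour or more off.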
2. What Systematic Options Research Actually Needs
Three requirements separate a backtest you can trust from a backtest that lies to you.
2.1 Point-in-Time Data
A request for historical.flashalpha.com/v1/exposure/gex/SPY?at=2020-03-16T14:30:00 must return the state of the book exactly as it existed at that minute (ET wall-clock). Open interest, greeks, and computed gamma exposure as of 14:30:00—not end-of-day OI stamped back, not a forward fill from the prior session.
This matters for two concrete reasons. First, dealer-positioning signals use OI to compute GEX; end-of-day OI isn't known at 14:30. Second, option greeks depend on implied vol, which moves continuously; a 15:30 vol observation is not the same as a 14:30 vol observation on a volatile day. Any historical dataset that stamps a single daily value and calls it "intraday" is baking lookahead bias into every signal that uses it.
Lookahead bias in gamma exposure
$$\text{GEX}(t) = \sum_{k} \text{OI}_k(t) \cdot \Gamma_k(S_t, \sigma_k(t), T_k - t)$$
Each term depends on spot \(S_t\), the per-strike IV \(\sigma_k(t)\), and time to expiry at exactly time \(t\). Substituting end-of-day values for any of these introduces forward-looking information.
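To make the dependence on point-in-time inputs concrete, here is a minimal sketch of the sum above using a Black-Scholes gamma with zero rates and dividends. That pricing simplification, the contract values, and the field names (oi, strike, iv, t_years) are illustrative assumptions; dealer sign conventions and contract multipliers are left out, as they are in the formula.

```python
import math

def bs_gamma(spot, strike, iv, t_years):
    """Black-Scholes gamma with zero rates and dividends (simplifying assumption)."""
    if iv <= 0 or t_years <= 0:
        return 0.0
    d1 = (math.log(spot / strike) + 0.5 * iv * iv * t_years) / (iv * math.sqrt(t_years))
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return pdf / (spot * iv * math.sqrt(t_years))

def gex(contracts, spot):
    """Sum OI_k * Gamma_k; every input must be observed at the same time t."""
    return sum(c["oi"] * bs_gamma(spot, c["strike"], c["iv"], c["t_years"])
               for c in contracts)

# Hypothetical two-strike book, once with 14:30 inputs and once with closing inputs
at_1430 = [{"oi": 1000, "strike": 240, "iv": 0.80, "t_years": 0.02},
           {"oi": 1500, "strike": 250, "iv": 0.75, "t_years": 0.02}]
at_close = [{"oi": 1000, "strike": 240, "iv": 0.95, "t_years": 0.02},
            {"oi": 1500, "strike": 250, "iv": 0.90, "t_years": 0.02}]

gex_pit = gex(at_1430, spot=245.0)
gex_eod = gex(at_close, spot=240.0)
print(gex_pit, gex_eod)  # the two values differ, which is the lookahead gap
```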
2.2 No Lookahead Bias
Lookahead is not only about timestamps. It also lives in:
- OI revision. Exchanges publish preliminary OI and revise it the next morning. A historical dataset assembled from the revised numbers makes yesterday's GEX look different from what any live system would have computed.
- Methodology drift. If the analytics vendor updates the GEX formula, historical replays under the new formula produce different numbers than a live system running the old formula would have produced. Archive raw responses if you need bit-exact reproducibility.
- Surface parameterization. SVI calibration parameters (a, b, ρ, m, σ) are an end-of-day fit in this dataset. An intraday request returns the most recent prior EOD SVI fit. If your signal reads the SVI parameters at minute resolution, you still have one observation per trading day, not one per minute. Use the minute-level surface grid for intraday surface features; use SVI params for daily cross-sectional work.
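For readers who want to turn the five SVI parameters into a vol, a minimal sketch follows. It assumes the parameters follow the standard raw SVI form, w(k) = a + b(ρ(k − m) + sqrt((k − m)² + σ²)), where k is log-moneyness and w is total implied variance; the source does not confirm which SVI variant the API fits, and the parameter values below are made up.

```python
import math

def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k."""
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))

def svi_implied_vol(k, t_years, a, b, rho, m, sigma):
    """Annualized implied vol from total variance: sqrt(w / T)."""
    w = svi_total_variance(k, a, b, rho, m, sigma)
    return math.sqrt(max(w, 0.0) / t_years)

# Hypothetical EOD fit for a one-year slice
params = dict(a=0.02, b=0.10, rho=-0.4, m=0.0, sigma=0.2)
atm_vol = svi_implied_vol(0.0, 1.0, **params)
put_wing_vol = svi_implied_vol(-0.2, 1.0, **params)
print(atm_vol, put_wing_vol)
```

Because the fit is EOD-stamped, evaluating this at several intraday timestamps on the same day returns the same curve.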
2.3 Same Schema Research → Production
The single most underrated property of a historical API: research and production should call the same function. If your backtest calls hx.exposure_summary("SPY", at="2022-06-15T14:30:00") and your live signal calls fa.exposure_summary("SPY") and the response shapes differ, you will discover the discrepancy the first time the live system hits a field the backtest parser never saw. Every undocumented difference between historical and live response shapes is a production incident waiting to happen.
The FlashAlpha API uses largely the same JSON schema across both modes (with a few documented exceptions, below), and switching is a host swap: point the same flashalpha SDK at historical.flashalpha.com and pass an at= timestamp (live calls use the default lab.flashalpha.com host and omit at). A few historical responses diverge from their live counterparts and need a thin adapter: the option-quote endpoint returns a flat array with renamed fields (implied_vol instead of iv, open_interest instead of oi) plus historical-only fields (iv_bid, iv_ask, vanna, charm, rho); the max-pain and stock-summary responses also have minor shape differences from live. Test your parser against both modes before you backtest at scale.
3. The Data: What the Historical API Covers
3.1 Dealer Exposure Replay (GEX/DEX/VEX/CHEX)
Gamma exposure (GEX), delta exposure (DEX), vanna exposure (VEX), and charm exposure (CHEX) replay at minute resolution from 2018. All four are available via their own endpoints and via the unified exposure summary, which returns the full dealer-positioning state in one call.
# Historical exposure summary - the one-call option (historical host, ?at=)
curl -H "X-Api-Key: YOUR_KEY" \
"https://historical.flashalpha.com/v1/exposure/summary/SPY?at=2020-03-16T14:30:00"
# Or per-metric:
curl -H "X-Api-Key: YOUR_KEY" \
"https://historical.flashalpha.com/v1/exposure/gex/SPY?at=2020-03-16T14:30:00"
from flashalpha import FlashAlpha
# Same SDK, pointed at the historical host
hx = FlashAlpha(api_key="YOUR_KEY", base_url="https://historical.flashalpha.com")
# Exposure summary: GEX, DEX, VEX, CHEX, gamma flip, regime, walls
snap = hx.exposure_summary("SPY", at="2020-03-16T14:30:00")
print(f"Regime: {snap['regime']['label']}")
print(f"Net GEX: {snap['net_gex']:,.0f}")
print(f"Gamma flip: ${snap['regime']['gamma_flip']}")
print(f"Call wall: ${snap['levels']['call_wall']}")
print(f"Put wall: ${snap['levels']['put_wall']}")
The response includes as_of, so you can confirm the timestamp is what you asked for. On March 16 2020 the net GEX was deeply negative; the COVID-crash dealer-positioning replay is the canonical stress-test for any strategy that conditions on GEX regime. For that event in detail, see SPY March 16 2020: a COVID-crash dealer-positioning replay and the 8-year backtest in GEX/DEX/VEX/CHEX: an 8-year backtest vs VIX.
3.2 VRP Time Series and Percentiles
The volatility risk premium (VRP) measures implied minus realized vol. The history endpoint returns a daily time series of VRP, z-score, and percentile rank against the trailing window, all point-in-time: the percentile for day T is computed using only data through day T, with no future observations mixed in.
# VRP daily history series (live host; days = 1..365)
curl -H "X-Api-Key: YOUR_KEY" \
"https://lab.flashalpha.com/v1/vrp/SPY/history?days=252"
# Point-in-time VRP snapshot for a specific past date (live host, ?date=YYYY-MM-DD)
curl -H "X-Api-Key: YOUR_KEY" \
"https://lab.flashalpha.com/v1/vrp/SPY?date=2022-10-14"
The vrp.percentile field in the point-in-time response is the percentage of days in the trailing window where VRP was below the current reading, computed with only prior observations. This is the operative number for a premium-selling trigger: a reading of 85 means the premium was lower than today's on 85% of the days in the lookback window, conditioning only on past data. For the methodology and bias analysis see Historical VRP percentiles, no lookahead bias.
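If you want to cross-check the percentile against your own series, the idea can be sketched as below. This is an illustration of a no-lookahead rank, not the vendor's implementation; whether the API's window includes the current day, and its exact window length, are assumptions to verify.

```python
def trailing_percentile(series, window):
    """Percentile rank of each value against the `window` observations before it.

    Only strictly earlier observations enter the ranking, so the value for
    day T never depends on data after T.
    """
    out = []
    for i, x in enumerate(series):
        prior = series[max(0, i - window):i]
        if len(prior) < window:
            out.append(None)  # not enough history yet
            continue
        below = sum(1 for p in prior if p < x)
        out.append(100.0 * below / window)
    return out

# Hypothetical daily VRP readings (vol points)
vrp_series = [3.0, 1.0, 2.0, 4.0, 0.5]
ranks = trailing_percentile(vrp_series, window=2)
print(ranks)
```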
3.3 Full Option Chains at Minute Resolution
The full chain endpoint returns every listed contract—bid, ask, IV, delta, gamma, theta, vega, vanna, charm, rho, OI—for a symbol at any minute since 2018. This is the raw material for signal research where you need to construct your own greeks, custom GEX variants, or custom surface metrics.
# Historical full chain = /v1/optionquote on the historical host, ?at=
curl -H "X-Api-Key: YOUR_KEY" \
"https://historical.flashalpha.com/v1/optionquote/SPY?at=2020-03-16T14:30:00"
Honest resolution table for the chain endpoint historically:
| Field | Resolution in history |
| Bid, ask, IV, delta, gamma, theta, vega | Minute-level (9:30–16:00 ET) |
| Vanna, charm, rho (historical-only fields) | Minute-level |
| Open interest | EOD-stamped (one value per trading day) |
| Volume | Always 0 in historical; use OI for liquidity proxy |
| SVI-smoothed vol (svi_vol) | Always null (svi_vol_gated: "backtest_mode") |
If your GEX calculation uses OI and you replay at minute resolution, you are using that day's opening OI for every minute of the session. That is what any live system would have done—OI is published once per morning. So the EOD stamp is correct for intraday GEX replay, not a limitation. For deeper analysis see Historical options chain: any strike, any minute since 2018.
3.4 IV Surface and SVI Parameters
The 50x50 implied vol surface grid (moneyness × maturity) evolves at minute resolution historically, driven by per-contract quotes. The SVI calibration parameters are EOD-stamped. Use the surface grid for intraday surface features; use SVI params for daily cross-sectional analysis, dispersion, and relative-value research.
surface = hx.surface("SPY", at="2022-06-15T14:30:00")
adv = hx.adv_volatility("SPY", at="2022-06-15T14:30:00")
# surface: minute-level 50x50 grid
# adv: EOD-stamped SVI params {a, b, rho, m, sigma} +
# arbitrage-free flags + variance-swap strike
3.5 The Universe Screener: Building a Candidate Set
Every cross-sectional strategy starts with a candidate set. The screener (POST /v1/screener) ranks and filters across the full symbol universe on GEX, VRP, IV rank, 0DTE contribution, and custom score formulas. Important: the screener runs on the live universe only. There is no historical screener and no date/at parameter — it always reflects the current state of the market. Use it live to design and prototype a filter, and to drive a live production signal; for a backtest candidate set, you reconstruct the cross-section yourself from a point-in-time universe list and the per-name historical analytics (see the backtest loop and Kink 1 below).
curl -X POST "https://lab.flashalpha.com/v1/screener" \
  -H "X-Api-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "op": "and",
      "conditions": [
        {"field": "vrp_percentile", "operator": "gte", "value": 75},
        {"field": "iv_rank", "operator": "gte", "value": 50},
        {"field": "regime", "operator": "eq", "value": "positive_gamma"}
      ]
    },
    "sort": [{"field": "vrp_zscore", "direction": "desc"}],
    "limit": 30
  }'
Because the screener has no historical mode, a methodologically-correct backtest cannot call it on a past rebalance date. Instead, take a point-in-time universe list (index membership as of that date — see Kink 1), pull the per-name historical analytics with ?at= for each member, and apply your filter/sort in your own code. That reproduces what the screener does, but on point-in-time data and without survivorship bias. Reserve the live screener for prototyping the filter and for the production signal.
4. Building a Backtest: The Full Stack
The canonical loop for a systematic options backtest on this data:
Backtest pipeline
$$\text{Signal}(t) \;\xrightarrow{\text{point-in-time screen}}\; \text{Candidates}(t) \;\xrightarrow{\text{history}}\; \text{Per-name data}(t) \;\xrightarrow{\text{fill model}}\; \text{Fills}(t) \;\xrightarrow{\text{accum}}\; \text{Metrics}$$
4.1 Signal Design
A signal is a function from the point-in-time state of one or more endpoints to a trade decision (direction, strike, expiry, structure, size). Concrete examples:
- Dealer-positioning (GEX): Enter a momentum position when net GEX is negative and its 5-day moving average is declining.
- VRP premium-selling: Sell a put credit spread when VRP percentile ≥ 80 and IV rank ≥ 50, exit when VRP z-score reverts below its median.
- Dispersion: Long single-name straddle, short index straddle, when the ratio of single-name IV to index IV is below its 6-month median.
Signals must be computable from endpoint fields that are available point-in-time. If the signal uses GEX, that's fine—GEX is available at minute resolution. If it uses SVI parameters in a time series, that's EOD; your signal updates once per day, not per minute. If it uses OI trajectory, OI is EOD and the "trajectory" is only one observation per day.
4.2 Candidate Set: Live Screener vs Backtest Reconstruction
The live screener is for prototyping the filter and for the production signal — it has no date parameter (it always screens the current universe):
import requests
LAB = "https://lab.flashalpha.com"
HEADERS = {"X-Api-Key": "YOUR_KEY", "Content-Type": "application/json"}
def live_screener(vrp_pctile: int = 75, limit: int = 30):
    """Live universe only - no historical/date mode."""
    body = {
        "filters": {
            "op": "and",
            "conditions": [
                {"field": "vrp_percentile", "operator": "gte", "value": vrp_pctile},
                {"field": "iv_rank", "operator": "gte", "value": 40},
                {"field": "regime", "operator": "in",
                 "value": ["positive_gamma", "neutral"]}
            ]
        },
        "sort": [{"field": "vrp_zscore", "direction": "desc"}],
        "limit": limit
    }
    r = requests.post(f"{LAB}/v1/screener", headers=HEADERS, json=body)
    r.raise_for_status()
    return [row["symbol"] for row in r.json()["data"]]
For a backtest candidate set you reconstruct the cross-section yourself: take a point-in-time universe list, pull each member's historical analytics with ?at=, and apply the same filter/sort in code. No survivorship bias, no live-screener lookahead.
from flashalpha import FlashAlpha
hx = FlashAlpha(api_key="YOUR_KEY", base_url="https://historical.flashalpha.com")
def backtest_candidates(universe: list[str], ts: str,
                        vrp_pctile: int = 75, limit: int = 30):
    """Reconstruct the screen point-in-time from per-name history."""
    rows = []
    for sym in universe:  # universe = index members as of `ts`
        try:
            # Section 1 places the VRP snapshot on the live host (?date=YYYY-MM-DD);
            # confirm that hx.vrp(..., at=) resolves to it in your SDK version.
            vrp = hx.vrp(sym, at=ts)
            vol = hx.volatility(sym, at=ts)
        except Exception:
            continue  # not covered / no data at ts
        if (vrp["vrp"]["percentile"] >= vrp_pctile
                and vol["iv_rv_spreads"]["vrp_30d"] > 0):
            rows.append((sym, vrp["vrp"]["z_score"]))
    rows.sort(key=lambda r: r[1], reverse=True)
    return [sym for sym, _ in rows[:limit]]

# Example: candidates as of 2022-10-03, 15:30 ET.
# universe_on() is not part of the SDK: it stands for your own
# point-in-time index-membership lookup (see Kink 1).
candidates = backtest_candidates(universe_on("2022-10-03"),
                                 "2022-10-03T15:30:00")
4.3 Per-Name History Retrieval
from flashalpha import FlashAlpha
hx = FlashAlpha(api_key="YOUR_KEY", base_url="https://historical.flashalpha.com")
def fetch_signal_state(symbol: str, ts: str) -> dict:
    """Fetch point-in-time VRP + exposure for a candidate."""
    vrp = hx.vrp(symbol, at=ts)
    summ = hx.exposure_summary(symbol, at=ts)
    vol = hx.volatility(symbol, at=ts)
    return {
        "symbol": symbol,
        "as_of": ts,
        "vrp_pctile": vrp["vrp"]["percentile"],
        "vrp_zscore": vrp["vrp"]["z_score"],
        "vrp_regime": vrp["regime"]["vrp_regime"],
        "gex_regime": summ["regime"]["label"],
        "gamma_flip": summ["regime"]["gamma_flip"],
        "net_gex": summ["net_gex"],
        "vrp_30d": vol["iv_rv_spreads"]["vrp_30d"],
        "atm_iv": vol["atm_iv"],
    }
4.4 The Fill Model Is the Edge
This is where most backtests lie. The naive fill model uses the midpoint price of the option at decision time. That model is wrong in every direction that matters for premium sellers and spread traders:
- You sell at the bid, not the mid. For a short put spread with a mid of $1.20 and a $0.10-wide market on each leg, the natural credit is $1.10 (sell the short leg at its bid, buy the long leg at its ask), not $1.20. At 250 trades per year, that $0.10 slip compounds.
- The bid-ask widens before earnings and into events. Exactly when VRP looks richest, spreads are widest. A screener that fires on high VRP percentile during earnings week will systematically show fills that aren't achievable.
- Volume is zero in historical chains. The historical API does not carry intraday volume data. Use OI as a liquidity proxy and apply a conservative fill model for low-OI strikes.
A tractable fill model for backtesting premium-selling strategies:
Conservative fill model
$$\text{Fill}_{\text{sell}} = \text{Mid}(t) - \alpha \cdot \frac{\text{Ask}(t) - \text{Bid}(t)}{2}$$
$$\text{Fill}_{\text{buy}} = \text{Mid}(t) + \alpha \cdot \frac{\text{Ask}(t) - \text{Bid}(t)}{2}$$
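Where \(\alpha \in [0.3, 0.7]\) is the fraction of the half-spread you give up: \(\alpha = 0\) fills at mid and \(\alpha = 1\) fills at the touch (bid for a sale, ask for a purchase). Use \(\alpha = 0.5\), halfway between mid and the touch, as a baseline; test sensitivity to \(\alpha\) before claiming edge. Apply an OI-based liquidity discount for strikes with OI below a threshold.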
def fill_price(bid: float, ask: float, side: str,
               alpha: float = 0.5) -> float:
    """
    Realistic fill: alpha fraction of the half-spread from mid.
    side: 'sell' (premium seller) or 'buy' (premium buyer)
    """
    mid = (bid + ask) / 2
    half_spread = (ask - bid) / 2
    if side == "sell":
        return mid - alpha * half_spread
    else:
        return mid + alpha * half_spread

def fill_spread(leg1: dict, leg1_side: str,
                leg2: dict, leg2_side: str,
                alpha: float = 0.5) -> float:
    """
    Leg-1 fill minus leg-2 fill for a two-leg spread with opposite sides.
    With leg1 sold and leg2 bought the result is the net credit;
    with leg1 bought and leg2 sold it is the net debit.
    """
    f1 = fill_price(leg1["bid"], leg1["ask"], leg1_side, alpha)
    f2 = fill_price(leg2["bid"], leg2["ask"], leg2_side, alpha)
    return f1 - f2
4.5 Metrics
Standard risk-adjusted metrics for options strategies require additional care. Sharpe summarizes risk with a single standard deviation, which is only an adequate description when returns are close to normal; options premium-selling has negative skewness and excess kurtosis. Report Sharpe as a directional indicator but lean on Sortino, Calmar, and maximum drawdown for sizing decisions. For strategies with known tail events (selling puts into COVID), the conditional drawdown is the number that matters.
- Annualized Sharpe: directional, not sufficient alone.
- Sortino ratio: penalizes only downside deviation—correct for skewed strategies.
- Maximum drawdown and Calmar ratio: size positions so a maximum-drawdown event does not exceed a pre-defined risk budget.
- Win rate and payoff ratio: required for premium-selling strategies; a 70% win rate with a 3:1 loss-to-win payoff is a losing strategy.
- P&L per unit of delta/gamma/vega: useful for isolating the greek-specific source of returns.
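A minimal, dependency-free sketch of three of the metrics above. The function names and the 252-period annualization are assumptions for illustration; returns are simple per-period returns.

```python
import math

def sortino(returns, periods_per_year=252, target=0.0):
    """Annualized Sortino ratio: mean excess return over downside deviation."""
    downside = [min(r - target, 0.0) for r in returns]
    downside_dev = math.sqrt(sum(d * d for d in downside) / len(returns))
    if downside_dev == 0:
        return float("inf")  # no observation fell below the target
    mean_excess = sum(r - target for r in returns) / len(returns)
    return mean_excess / downside_dev * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve (negative)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

def expectancy(win_rate, avg_win, avg_loss):
    """Expected P&L per trade in units of avg_win / avg_loss."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss

# The 70% win rate with a 3:1 loss-to-win payoff from the list above
edge = expectancy(0.70, avg_win=1.0, avg_loss=3.0)
print(edge)  # negative: the strategy loses on average
```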
5. Example Strategies
5.1 Dealer-Positioning Momentum (Negative-Gamma Momentum)
The thesis: when dealers are net short gamma (net GEX negative), they must buy into rallies and sell into drops to stay delta-neutral. Their hedging activity amplifies directional moves. A momentum signal conditioned on negative-gamma regime should outperform an unconditional momentum signal because the dealer hedging creates a mechanical tailwind.
Signal construction:
- At 9:45 ET, pull /v1/exposure/summary/{t} to determine the gamma regime.
- If net_gex < 0 and the 5-day exponential moving average of net_gex is declining, the regime is "amplifying."
- Compute the opening 15-minute return (9:30 to 9:45). If positive in an amplifying regime, go long a 1-week ATM call debit spread; if negative, go long a 1-week ATM put debit spread.
- Exit at 14:30 or at 2x the initial mid-price, whichever comes first.
- Flat when regime is positive gamma or undefined.
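The regime test and direction choice in the steps above can be sketched as a pure function. The names ema and momentum_signal, the span-based smoothing constant 2/(span+1), and the return labels are illustrative assumptions; strike selection and the exit rules are not modeled here.

```python
def ema(values, span):
    """Exponential moving average seeded with the first value."""
    alpha = 2.0 / (span + 1.0)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out

def momentum_signal(net_gex_history, open_return, span=5):
    """Decide the structure for today.

    net_gex_history: daily net GEX readings, oldest first, ending with
    today's 9:45 ET value. open_return: the 9:30 to 9:45 return.
    """
    if len(net_gex_history) < 2:
        return "flat"
    smoothed = ema(net_gex_history, span)
    amplifying = net_gex_history[-1] < 0 and smoothed[-1] < smoothed[-2]
    if not amplifying or open_return == 0:
        return "flat"
    return "long_call_spread" if open_return > 0 else "long_put_spread"

# Hypothetical readings: GEX sliding into negative territory, positive open
print(momentum_signal([1.0, 0.0, -1.0, -2.0], open_return=0.003))
```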
Historical grounding: the 8-year GEX backtest (see the linked study) shows that negative-GEX regimes on SPY correlate with realized vol that exceeds VIX-implied vol more often than positive-GEX regimes. The conditional signal is not guaranteed alpha; it is a structurally motivated filter.
5.2 VRP Premium-Selling
The thesis: implied vol systematically overstates realized vol on average (the VRP). When VRP is in the upper quartile of its trailing distribution, selling premium (short straddles, short strangles, put credit spreads) has historically had a positive expected value. When VRP is compressed or negative, stand down.
Signal:
- Daily, pull /v1/vrp/{t} for each candidate. Gate on vrp.percentile ≥ 75 and regime.vrp_regime == "harvestable".
- Sell a 30-delta put credit spread 21 DTE, short put at the put wall from /v1/exposure/levels/{t}, long put $5 further out-of-the-money.
- Exit at 50% of the initial credit or 7 DTE, whichever comes first.
- Apply the fill model with α = 0.5. Apply a position-size cap of 2% of notional per trade.
For the detailed backtest study see Historical VRP percentiles, no lookahead bias and Architectures for VRP harvesting.
5.3 Realized-Vol Dispersion
The thesis: index implied vol is typically higher than the index vol you would reconstruct from constituent implied vols, index weights, and realized correlations. The excess is the correlation risk premium. Dispersion trades exploit this: long single-name straddles, short index straddle, sized vega-neutral.
The historical API supports dispersion research: pull /v1/adv_volatility/{name} for each constituent and /v1/adv_volatility/SPY for the index on the same timestamp. The dispersion metric is:
Dispersion spread
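$$\text{Disp}(t) = \sigma_{\text{index}}^{\text{IV}}(t) - \sqrt{\sum_{i} w_i^2 \cdot (\sigma_i^{\text{IV}}(t))^2 + 2\sum_{i<j} w_i w_j \cdot \rho_{ij}^{\text{realized}} \cdot \sigma_i^{\text{IV}}(t) \cdot \sigma_j^{\text{IV}}(t)}$$
Here \(\rho_{ij}^{\text{realized}}\) is a trailing realized-correlation estimate. A correlation backed out of the index option prices themselves would reproduce \(\sigma_{\text{index}}^{\text{IV}}\) and force the spread to zero.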
Enter when Disp(t) is above its trailing median (index vol rich relative to constituents); unwind when it reverts. The SVI surface endpoint gives the arbitrage-free ATM vol per name needed for this calculation.
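A minimal sketch of the dispersion spread as a function of inputs you supply. The function names are hypothetical, and the weights, vols, and correlation matrix below are made-up numbers; the double loop over all (i, j) pairs with a unit diagonal is algebraically the same as the squared-weight term plus twice the i<j cross terms.

```python
import math

def basket_vol(weights, vols, corr):
    """Vol of a weighted basket given per-name vols and a correlation matrix."""
    n = len(weights)
    var = 0.0
    for i in range(n):
        for j in range(n):
            var += weights[i] * weights[j] * corr[i][j] * vols[i] * vols[j]
    return math.sqrt(var)

def dispersion_spread(index_iv, weights, vols, corr):
    """Index IV minus the basket vol reconstructed from constituents."""
    return index_iv - basket_vol(weights, vols, corr)

# Hypothetical two-name index
w = [0.5, 0.5]
ivs = [0.20, 0.40]
corr = [[1.0, 0.3],
        [0.3, 1.0]]
spread = dispersion_spread(0.26, w, ivs, corr)
print(round(spread, 4))  # positive means index vol is rich versus constituents
```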
6. Validation and Pitfalls
Before reviewing the full kinks list, the high-level validation discipline:
- Walk-forward, not in-sample optimization. Fit signal parameters on a training window, evaluate on the next held-out window, roll forward. Never select parameters using the full sample period.
- Out-of-sample periods should include stress events. A backtest that excludes March 2020, August 2024, and Q4 2018 is not a backtest; it is an in-sample fit on calm markets.
- Costs must be modeled realistically. If adding commissions and slippage lowers the Sharpe by more than 0.3, the apparent edge came from the zero-cost assumption, not the signal.
- Regime sensitivity test. Split the backtest by VIX tercile (low/mid/high). If the strategy only works in one regime, state that clearly. Do not average across regimes and claim it works "in general."
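The walk-forward discipline in the first bullet can be sketched as an index generator. This is one common rolling-window variant, with hypothetical names; an expanding training window is an equally valid choice.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges, rolling forward by test_size each step.

    Every test range starts right after its training range, so no parameter
    is ever fitted on data from the period it is evaluated on.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), list(test))
```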
7. The Kinks and Common Mistakes
The ten failure modes, ordered roughly by how often they appear and how silently they operate.
Kink 1: Survivorship Bias in Cross-Sectional Work
A cross-sectional options backtest that uses a fixed ticker list—e.g., the current S&P 500 constituents—is biased. Names that got delisted, merged, or dropped from the index between 2018 and today are excluded from history but would have been in the universe at the time. Their exclusion means your universe is tilted toward long-term survivors, which are generally less volatile and less likely to blow up.
Mitigation: construct the universe from a point-in-time index membership list, for example the ETF constituents that were in the index at the start of each calendar year. The screener works on the live universe only; for historical cross-sectional studies, you have to supply the universe as of each rebalance date yourself.
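One way to hold point-in-time membership is a table of add and remove dates. The sketch below is hypothetical throughout: the tickers and dates are placeholders, and sourcing a real membership history is up to the researcher.

```python
from datetime import date

# Placeholder membership intervals: (symbol, added, removed or None if still in)
MEMBERSHIP = [
    ("AAA", date(2015, 1, 2), None),
    ("BBB", date(2016, 6, 1), date(2020, 9, 18)),
    ("CCC", date(2021, 3, 22), None),
]

def universe_on(as_of, membership=MEMBERSHIP):
    """Return the symbols that were index members on `as_of`.

    Accepts a date or a 'YYYY-MM-DD' string. A name removed on a given
    date is treated as no longer a member on that date.
    """
    if isinstance(as_of, str):
        as_of = date.fromisoformat(as_of)
    return sorted(
        sym for sym, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )

print(universe_on("2019-01-02"))  # includes BBB, which a current list would drop
print(universe_on("2022-10-03"))
```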
Kink 2: Lookahead Bias in Computed Analytics
Lookahead is not only about raw timestamps. It lives in derived analytics. If VRP percentile is computed using the full 8-year sample, a reading in 2019 is ranked against observations that hadn't happened yet. The historical VRP endpoint computes percentile with a trailing window, but you need to verify the window definition matches your signal. A 52-week window should use only the trailing 252 trading days, not a fixed 2018–present baseline.
Kink 3: Restatement and Non-Determinism
Analytics that depend on a methodology that gets updated will produce different historical numbers if you replay today versus replaying a year ago. This is not a bug in the data; it is a property of any system that recomputes rather than archives. Consequences:
- Do not assume a historical response today matches a historical response from 18 months ago if the analytics methodology was updated.
- Archive raw responses at the time of your backtest if you need bit-exact reproducibility for a subsequent live/research comparison.
- Report the endpoint version or methodology version in your research notes.
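Archiving can be as simple as writing each response to disk next to a content hash. A minimal sketch, assuming JSON-serializable payloads; archive_response and the methodology_version label are hypothetical, and the payload shown is a made-up stand-in for a real response.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def archive_response(root, endpoint, params, payload, methodology_version="unknown"):
    """Write a raw response plus metadata to <root>/<hash prefix>.json."""
    # Canonical serialization so identical payloads always hash identically
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    record = {
        "endpoint": endpoint,
        "params": params,
        "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
        "methodology_version": methodology_version,
        "sha256": digest,
        "payload": payload,
    }
    path = Path(root) / f"{digest[:16]}.json"
    path.write_text(json.dumps(record, sort_keys=True))
    return path

archive_dir = tempfile.mkdtemp()
saved = archive_response(
    archive_dir,
    "/v1/exposure/gex/SPY",
    {"at": "2020-03-16T14:30:00"},
    {"net_gex": -1.0},  # placeholder payload
)
print(saved.name)
```

Comparing the stored sha256 with the hash of a fresh replay is a cheap way to detect that a methodology change has restated history.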
Kink 4: Ignoring Fills and Slippage
The most common error in published options backtests. A 1-DTE short straddle on SPY mid-priced at $3.20 and quoted $0.50 wide ($0.25 either side of mid) has an effective premium of $2.95 to open (selling at bid) and $3.45 to close (buying at ask). That $0.50 round-trip is 15.6% of the premium on entry. On a strategy that returns 20% annualized at mid, half the edge disappears when fills are modeled honestly. Apply the fill model in section 4.4 and perform fill-sensitivity analysis (vary α from 0.3 to 0.7).
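The sensitivity check can be sketched directly from the $2.95 bid and $3.45 ask in the example above. The helper name round_trip_cost is hypothetical, and the mid is assumed unchanged between entry and exit so that only the spread cost is isolated.

```python
def round_trip_cost(bid, ask, alpha):
    """Cost of selling and then buying back at an unchanged quote.

    alpha is the fraction of the half-spread conceded on each side,
    so the round trip costs alpha * (ask - bid).
    """
    mid = (bid + ask) / 2.0
    half = (ask - bid) / 2.0
    sell = mid - alpha * half
    buy = mid + alpha * half
    return buy - sell

bid, ask = 2.95, 3.45
mid = (bid + ask) / 2.0
for alpha in (0.3, 0.5, 0.7, 1.0):
    cost = round_trip_cost(bid, ask, alpha)
    print(f"alpha={alpha:.1f}  cost=${cost:.3f}  share of mid={cost / mid:.1%}")
```

If the strategy's edge per trade is smaller than the cost at the alpha you consider realistic, the backtest result depends on the fill assumption rather than the signal.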
Kink 5: Overfitting Strike, Expiry, and Signal Threshold Selection
Options strategies have many free parameters: which strike (30-delta? 20-delta? ATM?), which expiry (7 DTE? 21 DTE? 45 DTE?), which signal threshold (VRP percentile ≥ 70? 75? 80?), which exit (50% profit? 21 DTE? EOD?). Optimizing all of these on the same sample produces a strategy that is precisely tuned to past history and will not generalize. Treat signal threshold as a hyper-parameter, fit it on the training window only, and test the out-of-sample period with the threshold locked.
Kink 6: Regime Dependence Without Disclosure
VRP premium-selling has a well-known asymmetry: it wins slowly in calm markets and loses quickly in vol spikes. A backtest that ends in January 2020 looks very different from one that ends in May 2020. Many published results fail to disclose the maximum drawdown in the worst regime or the time to recovery after a drawdown event. Report Calmar ratio, worst drawdown, and drawdown duration alongside Sharpe. If the strategy requires an active drawdown breaker (e.g., go flat when VIX > 30), disclose that explicitly and backtest it as a system parameter, not an ad-hoc override.
Kink 7: Point-in-Time OI vs Revised OI
Exchanges publish preliminary OI after the close and finalize it the following morning. Some historical datasets use the finalized OI for all past dates. If your signal uses OI to compute GEX or max pain, using finalized OI for a 14:30 signal is lookahead: you're using tomorrow morning's print for a trade decision made this afternoon. Verify that the OI attached to each historical date is the figure that was already published before that session opened, not a value finalized after the 14:30 decision time.
Kink 8: Train/Test Leakage via Information Shared Across Splits
Even with a chronological train/test split, leakage can enter through shared normalization: z-scoring a signal on the full sample, computing rolling statistics using data from the test window in the denominator, or selecting features based on correlation to the full-sample label. The fix is rigid pipeline serialization: fit all preprocessors (scalers, rolling windows, quantile functions) on the training data only, then transform both train and test with those frozen parameters.
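The frozen-parameter pattern for the simplest preprocessor, a z-score, can be sketched as follows. The function names are hypothetical and the numbers are placeholders; the same fit-then-apply split applies to quantile maps and rolling statistics.

```python
import statistics

def fit_zscore(train):
    """Estimate scaling parameters on the training window only."""
    return {"mean": statistics.fmean(train), "std": statistics.pstdev(train)}

def apply_zscore(values, params):
    """Transform any window with parameters frozen at fit time."""
    if params["std"] == 0:
        raise ValueError("training window has zero variance")
    return [(v - params["mean"]) / params["std"] for v in values]

train_signal = [1.0, 2.0, 3.0]
test_signal = [2.0, 5.0]

params = fit_zscore(train_signal)               # the test window is never seen here
train_z = apply_zscore(train_signal, params)
test_z = apply_zscore(test_signal, params)
print(test_z)
```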
Kink 9: The Research-vs-Production Data Mismatch
Described in section 2.3 but worth restating as a kink. The shape of the historical option-quote response differs from the live response: flat array vs wrapped object, renamed fields (implied_vol vs iv), historical-only fields (vanna, charm). If your backtest parser only tests against the historical shape and your production parser only tests against the live shape, the first time you try to run backtest code in a live context you will hit a key error on a field that has a different name in the other endpoint. Write a single parser function that handles both shapes explicitly, and test it against both a historical response and a live response before deploying.
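A minimal sketch of such a single parser. The two renames come from section 2.3; the wrapper key "quotes" for the live response is a guess and must be checked against a real live payload, and the sample rows are made up.

```python
# Historical name -> live name, per the renames listed in section 2.3
FIELD_ALIASES = {"implied_vol": "iv", "open_interest": "oi"}

def normalize_quotes(resp, wrapper_key="quotes"):
    """Return a list of rows with live-style field names for either shape.

    resp may be a flat list (historical) or a dict wrapping the rows
    under wrapper_key (live; the key name is an assumption).
    """
    rows = resp if isinstance(resp, list) else resp.get(wrapper_key, [])
    return [{FIELD_ALIASES.get(k, k): v for k, v in row.items()} for row in rows]

historical = [{"strike": 400, "implied_vol": 0.5, "open_interest": 10, "vanna": 0.1}]
live = {"quotes": [{"strike": 400, "iv": 0.5, "oi": 10}]}

hist_rows = normalize_quotes(historical)
live_rows = normalize_quotes(live)
# Historical-only fields such as vanna are absent live, so read them with .get()
print(hist_rows[0]["iv"], live_rows[0]["iv"], live_rows[0].get("vanna"))
```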
Kink 10: Assignment Risk on Short Strikes Near Expiry
A failure mode that kink 4 does not capture, because it is not about fills at entry—it is about what happens to the position as expiry approaches. A short put that goes in-the-money within the final 2–3 DTE carries early-assignment risk (for American-style options) and, if held to expiration, delivers a long equity position rather than a cash settlement. Most backtests model the short put as expiring worthless or being bought back at a modeled price, without accounting for the possibility that the broker assigns the underlying shares, triggering margin calls and forced liquidations that are not in the P&L model.
For credit spreads, the additional risk is pin risk: if the underlying closes between the short and long strikes at expiry, one leg expires in-the-money and the other does not. The resulting net position can be an unhedged long or short position in the underlying, carried with full gap risk until it can be closed. Any backtest that treats a credit spread as simply "expired worthless" or "expired at max loss" without modeling the pin-risk scenario is missing a tail outcome that occurs with non-trivial frequency around major strikes at OpEx. The fix: in the exit loop, flag any position that reaches 1 DTE with the short strike within 1% of spot, and apply a conservative exit fill rather than holding to expiry.
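The flag described in the fix is a one-line predicate. A minimal sketch with a hypothetical function name; the 1 DTE and 1% defaults are the thresholds from the paragraph above.

```python
def needs_early_exit(dte, short_strike, spot, dte_limit=1, band=0.01):
    """True when a position is near expiry with the short strike close to spot."""
    return dte <= dte_limit and abs(short_strike - spot) / spot <= band

# Short strike 400: spot 402 is inside the 1% band, spot 410 is not
print(needs_early_exit(1, 400.0, 402.0))
print(needs_early_exit(1, 400.0, 410.0))
print(needs_early_exit(5, 400.0, 400.0))
```

Positions that trip the flag would then be closed with the buy-side fill model rather than carried into expiration.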
8. Worked Example: A VRP Backtest Sketch
The following is a condensed but complete sketch of a 30-delta put credit spread backtest on SPY using the VRP signal. Not a publication-quality study; a worked illustration of the pipeline pieces from this guide.
from datetime import date
import pandas as pd
import requests
from flashalpha import FlashAlpha

HIST_BASE = "https://historical.flashalpha.com"
API_KEY = "YOUR_KEY"
hx = FlashAlpha(api_key=API_KEY, base_url=HIST_BASE)
# fill_price() is the helper defined in section 4.4

# ---- Parameters ----
SIGNAL_OPEN_HOUR = "15:30:00"   # signal 30 min before close (15:30 ET)
VRP_PCTILE_THRESH = 75          # enter when VRP >= 75th percentile
ALPHA_FILL = 0.5                # fill model: 50% of half-spread
PROFIT_TARGET_PCT = 0.50        # close at 50% of initial credit
DTE_OPEN = 21
DTE_CLOSE = 7
MAX_NOTIONAL_PCT = 0.02         # 2% of portfolio per trade

# ---- Build rebalance calendar ----
start, end = date(2019, 1, 2), date(2024, 12, 31)
cal_days = pd.bdate_range(start, end, freq="B")
# Take every 5th business day (roughly a weekly rebalance)
rebalance_dates = cal_days[::5]

trades = []
for rd in rebalance_dates:
    # rd is a pandas Timestamp: format the date part only, because
    # rd.isoformat() would already contain a T00:00:00 time component
    ts = f"{rd.date().isoformat()}T{SIGNAL_OPEN_HOUR}"

    # 1. Fetch the point-in-time signal state
    try:
        vrp_data = hx.vrp("SPY", at=ts)
        vol_data = hx.volatility("SPY", at=ts)
        exp_data = hx.exposure_summary("SPY", at=ts)
    except Exception:
        continue  # skip dates with no data (holidays, etc.)

    # 2. Gate on signal
    if (vrp_data["vrp"]["percentile"] < VRP_PCTILE_THRESH or
            vol_data["iv_rv_spreads"]["vrp_30d"] <= 0 or
            exp_data["regime"]["label"] not in
            ("positive_gamma", "neutral")):
        continue

    # 3. Select structure: put credit spread at the put wall
    put_wall = exp_data["levels"]["put_wall"]
    chain_resp = requests.get(
        f"{HIST_BASE}/v1/optionquote/SPY",
        headers={"X-Api-Key": API_KEY},
        params={"at": ts}
    )
    chain_resp.raise_for_status()
    chain = chain_resp.json()  # flat array per REST /v1/optionquote/{t}?at=

    # Find short put near the put wall, DTE_OPEN days out
    short_put = next(
        (c for c in chain
         if c["option_type"] == "P"
         and abs(c["strike"] - put_wall) <= 2.5
         and DTE_OPEN - 3 <= c["days_to_expiry"] <= DTE_OPEN + 3),
        None
    )
    if short_put is None:
        continue

    long_put_strike = short_put["strike"] - 5
    long_put = next(
        (c for c in chain
         if c["option_type"] == "P"
         and c["strike"] == long_put_strike
         and c["days_to_expiry"] == short_put["days_to_expiry"]),
        None
    )
    if long_put is None:
        continue

    # 4. Apply fill model
    short_fill = fill_price(
        short_put["bid"], short_put["ask"], "sell", ALPHA_FILL
    )
    long_fill = fill_price(
        long_put["bid"], long_put["ask"], "buy", ALPHA_FILL
    )
    net_credit = short_fill - long_fill
    if net_credit <= 0:
        continue  # no credit after fills

    trades.append({
        "open_date": rd,
        "symbol": "SPY",
        "short_strike": short_put["strike"],
        "long_strike": long_put_strike,
        "expiry": short_put["expiration"],
        "net_credit": round(net_credit, 4),
        "vrp_pctile": vrp_data["vrp"]["percentile"],
        "vrp_zscore": vrp_data["vrp"]["z_score"],
        "vrp_30d": vol_data["iv_rv_spreads"]["vrp_30d"],
        "regime": exp_data["regime"]["label"],
    })

df = pd.DataFrame(trades)
print(f"Trades: {len(df)}")
print(df[["open_date", "short_strike", "net_credit", "vrp_pctile"]].head())
What this sketch leaves out (deliberately, as exercises in kink-avoidance):
- The exit loop: pull the chain at each DTE_CLOSE and profit-target check date, apply the buy fill, record P&L.
- The assignment model: short puts near expiry risk assignment when ITM; handle the delta of the underlying position.
- The drawdown breaker: flatten all positions if VIX exceeds a threshold; record the re-entry gate.
- Walk-forward parameter selection: this sketch uses fixed parameters; in production, fit them on each rolling training window.
For a complete worked version with exit logic and walk-forward results, see the ML on options data guide and the VRP architecture study at Architectures for VRP harvesting.
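As a starting point for the exit loop listed above, the decision itself can be isolated into a pure function. This is a minimal sketch, not the worked version from the linked guides: `exit_decision` and `realized_pnl` are hypothetical names, the debit to close is assumed to come from a buy-side fill on the short leg and a sell-side fill on the long leg, and the default thresholds mirror PROFIT_TARGET_PCT and DTE_CLOSE from the sketch.

```python
from typing import Optional


def exit_decision(net_credit: float, debit_to_close: float, dte: int,
                  profit_target_pct: float = 0.50,
                  dte_close: int = 7) -> Optional[str]:
    """Return the exit reason for an open credit spread, or None to hold.

    net_credit and debit_to_close are per-share premiums after fills.
    """
    pnl = net_credit - debit_to_close
    if pnl >= profit_target_pct * net_credit:
        return "profit_target"   # captured the target share of the credit
    if dte <= dte_close:
        return "time_exit"       # inside the DTE_CLOSE window
    return None


def realized_pnl(net_credit: float, debit_to_close: float,
                 contracts: int = 1, multiplier: int = 100) -> float:
    """Dollar P&L, assuming the standard 100-share US option multiplier."""
    return (net_credit - debit_to_close) * multiplier * contracts
```

Keeping the rule free of I/O makes it easy to unit-test separately from the chain pulls that feed it.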
9. Tooling: The Historical API and the MCP Connector
9.1 Historical API Endpoints
Replay endpoints live on historical.flashalpha.com and take an ?at= timestamp (ET wall-clock). VRP history/snapshot and the screener live on lab.flashalpha.com.
| Endpoint | Host | Use | Tier |
|---|---|---|---|
| GET /v1/exposure/gex/{t}?at= | historical | Net GEX by strike, gamma flip, call/put walls | Alpha |
| GET /v1/exposure/summary/{t}?at= | historical | Full dealer-positioning state: GEX/DEX/VEX/CHEX, regime, walls | Alpha |
| GET /v1/optionquote/{t}?at= | historical | Full option chain: IV, greeks, OI, bid/ask (flat array) | Alpha |
| GET /v1/volatility/{t}?at= | historical | IV, realized vol, skew, term structure, IV rank | Alpha |
| GET /v1/surface/{t}?at= | historical | 50×50 IV surface grid (minute-level) | Alpha |
| GET /v1/adv_volatility/{t}?at= | historical | SVI params (EOD), variance surface, arb flags | Alpha |
| GET /v1/exposure/levels/{t}?at= | historical | Call wall, put wall, gamma flip, max pain | Alpha |
| GET /v1/vrp/{t}/history?days= | lab | VRP daily time series (1–365 days) | Alpha |
| GET /v1/vrp/{t}?date= | lab | Persisted VRP snapshot for one past date (YYYY-MM-DD) | Alpha |
| POST /v1/screener | lab | Live-universe filter on GEX, VRP, IV rank, regime (no historical mode) | Growth |
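The host split in the table can be made explicit in code, so that a replay call never goes to the wrong host or carries a UTC-style timestamp. The helpers below are a hedged sketch: `replay_url` and `vrp_snapshot_url` are hypothetical names, and the timestamp format (seconds precision, no trailing Z) is assumed from the examples in this guide.

```python
from datetime import datetime
from urllib.parse import urlencode

HISTORICAL_HOST = "https://historical.flashalpha.com"
LAB_HOST = "https://lab.flashalpha.com"


def replay_url(path: str, at: str) -> str:
    """Build a historical-host URL with an ET wall-clock ?at= timestamp."""
    if at.endswith("Z"):
        raise ValueError("at= is ET wall-clock; drop the trailing Z")
    # Raises ValueError if the timestamp is not YYYY-MM-DDTHH:MM:SS
    datetime.strptime(at, "%Y-%m-%dT%H:%M:%S")
    return f"{HISTORICAL_HOST}{path}?{urlencode({'at': at})}"


def vrp_snapshot_url(ticker: str, date: str) -> str:
    """Build the lab-host URL for a persisted VRP snapshot (?date=)."""
    datetime.strptime(date, "%Y-%m-%d")  # validate YYYY-MM-DD
    return f"{LAB_HOST}/v1/vrp/{ticker}?{urlencode({'date': date})}"
```

For example, `replay_url("/v1/exposure/summary/SPY", "2022-06-15T14:30:00")` yields a URL on the historical host, while a timestamp ending in Z is rejected before any request is made.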
9.2 Python SDKs
# pip install flashalpha (one SDK for both modes)
from flashalpha import FlashAlpha
# Historical replay: point at the historical host + pass at=
hx = FlashAlpha(api_key="YOUR_KEY", base_url="https://historical.flashalpha.com")
snap = hx.exposure_summary("SPY", at="2022-06-15T14:30:00")
# Live production: default host, no at= parameter
fa = FlashAlpha(api_key="YOUR_KEY")
snap = fa.exposure_summary("SPY")
# Shared parser: one function handles both response shapes
def parse_regime(response: dict) -> str:
    return response.get("regime", {}).get("label", "unknown")
One SDK, two hosts. For the option-quote endpoint specifically, write a thin adapter to normalize the field names between historical (implied_vol, open_interest) and live (iv, oi) before passing to downstream code.
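The thin adapter described above might look like the following. It is a sketch under stated assumptions: it handles only the two renames named in this section (iv/implied_vol and oi/open_interest), it treats the historical names as canonical (an arbitrary choice), and `normalize_quote` is a hypothetical helper rather than part of the SDK.

```python
# Live field name -> historical field name (only the renames named above)
FIELD_ALIASES = {"iv": "implied_vol", "oi": "open_interest"}


def normalize_quote(row: dict) -> dict:
    """Return a copy of one option-quote row using the historical names."""
    out = dict(row)
    for live_name, hist_name in FIELD_ALIASES.items():
        if live_name in out and hist_name not in out:
            out[hist_name] = out.pop(live_name)
    return out
```

Downstream code can then read `row["implied_vol"]` and `row["open_interest"]` regardless of which host produced the row. Other shape differences, such as the flat array and the historical-only greeks, would need their own handling.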
9.3 MCP Connector for AI-Assisted Research
If you use an LLM-augmented research workflow, the FlashAlpha quant MCP connector exposes the full Historical API as callable tools that an LLM can invoke with context:
# Claude Desktop / compatible client: add to mcp-config.json
{
  "mcpServers": {
    "flashalpha-quant": {
      "url": "https://lab.flashalpha.com/mcp-oauth/quant"
    }
  }
}
Via the MCP connector, an assistant can call get_historical_gex("SPY", "2020-03-16T14:30:00") directly from a research notebook prompt, and the response is fed back into the conversation context. The timestamp is written here in the same ET wall-clock form as ?at=, with no trailing Z. The connector endpoint for the quant persona is https://lab.flashalpha.com/mcp-oauth/quant (OAuth; requires an Alpha key). For the MCP connector overview and auth setup, see the MCP/OAuth guide.
10. FAQ
- How far back does the historical data go?
- Every analytics endpoint (GEX, DEX, VEX, CHEX, exposure summary, VRP, IV surface, SVI, option chains, stock quotes) replays since 2018 at minute-level resolution (9:30–16:00 ET on US trading days) via historical.flashalpha.com with an ?at= timestamp. The screener is live-only and has no historical mode; reconstruct a backtest candidate set from a point-in-time universe list plus per-name ?at= analytics.
- Is the historical data truly point-in-time?
- Yes for the analytics layer. A request for timestamp T returns the state of the book as it would have been computed at T: spot, greeks, IV, and computed dealer exposure all as of that minute. OI is an EOD field; it takes the morning publication value for all intraday minutes on that date, which is what any live system would have used. SVI calibration parameters are EOD-stamped per trading day.
- Do historical and live endpoints return the same response shape?
- Mostly. Most analytics endpoints (exposure summary, VRP, volatility, surface, advanced volatility) match. A few diverge and need a thin adapter: the option-chain endpoint (flat array, renamed fields like implied_vol/open_interest, historical-only greeks), and the max-pain and stock-summary responses, which have minor shape differences from live. Test your parser against both modes.
- Which plan do I need for historical backtesting?
- Alpha. It unlocks the full Historical API since 2018 with unlimited requests, no caching, advanced volatility (SVI), VRP analytics, and the option-chain endpoint at minute resolution. The screener is available from Growth. Tiers below Alpha return 403 on historical endpoints.
- Can I screen the full universe, not just one ticker?
- Yes, live. The screener endpoint ranks and filters across all covered symbols on the current universe; there is no date parameter and no historical screener. For historical cross-sectional studies, build the candidate set yourself: take index membership as of the rebalance date, pull each name's analytics with ?at= from the historical host, and apply your filter/sort in code. That avoids both survivorship bias and live-screener lookahead.
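The reconstruction described in this answer can be sketched as a function that takes the point-in-time membership list and a caller-supplied fetcher. Everything here is illustrative: `build_candidates` is a hypothetical helper, `fetch_metric` stands in for whatever per-name ?at= call you use, and the membership list has to come from your own point-in-time index data.

```python
from typing import Callable, Iterable, List, Tuple


def build_candidates(members: Iterable[str], at: str,
                     fetch_metric: Callable[[str, str], float],
                     min_value: float, top_n: int = 10
                     ) -> List[Tuple[str, float]]:
    """Rank point-in-time members by one metric fetched as of `at`.

    members: index constituents as of the rebalance date (caller-supplied)
    fetch_metric(symbol, at): returns the metric from an ?at= replay call
    """
    rows = []
    for symbol in members:
        try:
            value = fetch_metric(symbol, at)
        except Exception:
            continue  # no data for this name at this timestamp
        if value >= min_value:
            rows.append((symbol, value))
    rows.sort(key=lambda r: r[1], reverse=True)  # highest metric first
    return rows[:top_n]
```

Because the fetcher is injected, the same function can be exercised in tests with a stub and pointed at the historical host in research.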
- How do I handle the missing-volume field in historical option chains?
- Volume is always zero in the historical chain. Use OI as a liquidity proxy: apply a liquidity discount or filter strikes below a minimum OI threshold. For deep-OTM strikes with very low OI, widen the fill-model alpha or exclude them from the candidate set entirely.
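One way to express the OI-based filter and the wider fill alpha in code is shown below. The thresholds are placeholders rather than recommendations, and `liquid_strikes` and `alpha_for_oi` are hypothetical helpers; the only field assumed from the historical chain is open_interest.

```python
def liquid_strikes(chain: list, min_oi: int = 100) -> list:
    """Keep rows whose open interest meets the minimum (missing OI counts as 0)."""
    return [row for row in chain if row.get("open_interest", 0) >= min_oi]


def alpha_for_oi(oi: int, base_alpha: float = 0.5,
                 wide_alpha: float = 0.8, thin_oi: int = 500) -> float:
    """Use a wider fill-model alpha for thinly held strikes."""
    return wide_alpha if oi < thin_oi else base_alpha
```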
- What is the right unit test for point-in-time correctness?
- Pull the same symbol at two timestamps 30 minutes apart on a volatile day (e.g., SPY on March 16, 2020 at 13:00 and 13:30). The underlying price, net GEX, and regime should all differ. If they are identical or if the as_of field shows the same value despite different query parameters, the endpoint is returning cached data rather than per-minute state.
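The check in this answer can be written as a small comparison over two already-fetched snapshots. This is a sketch: `stale_reasons` is a hypothetical helper, the snapshots are assumed to be flattened to top-level keys, and every key name other than as_of is a placeholder for wherever those values sit in the real response.

```python
from typing import Iterable, List


def stale_reasons(snap_a: dict, snap_b: dict,
                  keys: Iterable[str]) -> List[str]:
    """Return reasons two snapshots look cached; an empty list means pass."""
    reasons = []
    if snap_a.get("as_of") == snap_b.get("as_of"):
        reasons.append("as_of is identical across different timestamps")
    keys = list(keys)
    if keys and all(snap_a.get(k) == snap_b.get(k) for k in keys):
        reasons.append("all compared fields are identical")
    return reasons
```

In a test suite, fetch the two timestamps once, flatten the fields of interest, and assert that `stale_reasons(a, b, keys)` is empty.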
- How should I handle event days (FOMC, CPI) in a backtest?
- Treat event days as a separate regime. Build a flag column in your backtest dataframe that marks FOMC decision days, CPI/PPI release days, and quarterly OpEx. Report strategy performance with and without those days. Many premium-selling strategies look markedly better or worse when event days are included or excluded; that behavior is itself an important empirical finding that should be disclosed.
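The event-day flags described in the last answer can be computed per trade date as follows. This is a minimal standard-library sketch: `event_flags` and `third_friday` are hypothetical helpers, the FOMC and CPI calendars must be supplied by the caller, and quarterly OpEx is taken to be the third Friday of March, June, September, and December.

```python
from datetime import date, datetime


def third_friday(year: int, month: int) -> date:
    """Third Friday of the month (Monday=0, so Friday=4)."""
    first_weekday = date(year, month, 1).weekday()
    first_friday = 1 + (4 - first_weekday) % 7
    return date(year, month, first_friday + 14)


def event_flags(d, fomc_dates: set, cpi_dates: set) -> dict:
    """Flag one trade date; accepts date, datetime, or pandas Timestamp."""
    if isinstance(d, datetime):  # pandas Timestamp is a datetime subclass
        d = d.date()
    is_q_opex = d.month in (3, 6, 9, 12) and d == third_friday(d.year, d.month)
    flags = {
        "is_fomc": d in fomc_dates,
        "is_cpi": d in cpi_dates,
        "is_q_opex": is_q_opex,
    }
    flags["is_event"] = any(flags.values())
    return flags
```

Applied to each open_date in the trades dataframe, the resulting columns let you report performance with and without event days by filtering on is_event.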
11. Conclusion
The systematic options research pipeline has three honest bottlenecks: point-in-time data, a realistic fill model, and discipline against the ten kinks that inflate backtested Sharpe before live trading reveals the truth.
The Historical API covers the first: GEX, DEX, VEX, CHEX, VRP, full option chains, IV surfaces, and SVI parameters, all available at any minute since 2018, with response shapes that mostly match the live endpoints (a few documented exceptions — the option-chain, max-pain, and stock-summary responses — need a thin adapter). The second is a function of how you model fills—use the mid only if you want to benchmark a perfect-execution strategy, and test sensitivity across α from 0.3 to 0.7. The third is a checklist: survivorship, lookahead, restatement, fills, overfitting, regime dependence, OI revision, train/test leakage, schema mismatch, and bid-ask widening at high VRP.
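A sensitivity sweep over α can be as small as the sketch below. It assumes the fill convention implied by the backtest sketch (α is the fraction of the half-spread conceded, so α = 0 is the mid); `spread_fill` is a hypothetical stand-in for the guide's fill_price, whose definition lies outside this section, and quotes are passed as (bid, ask) tuples.

```python
def spread_fill(bid: float, ask: float, side: str, alpha: float) -> float:
    """Fill at mid, moved against you by alpha times the half-spread."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    mid = (bid + ask) / 2
    half_spread = (ask - bid) / 2
    return mid - alpha * half_spread if side == "sell" else mid + alpha * half_spread


def credit_sensitivity(short_quote: tuple, long_quote: tuple,
                       alphas=(0.3, 0.5, 0.7)) -> dict:
    """Net credit of a credit spread at each alpha."""
    return {
        a: spread_fill(*short_quote, "sell", a) - spread_fill(*long_quote, "buy", a)
        for a in alphas
    }
```

If the strategy's edge disappears between α = 0.3 and α = 0.7, the backtest is measuring the fill assumption more than the signal.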
Once the backtest is validated, the production migration is small: drop at= from each call and point the client at the default (live) host, as in section 9.2, and the same code runs live. That is the architectural reason to use a pre-computed analytics layer rather than building a raw-chain pipeline from scratch: not because it is convenient, but because the research-to-production data mismatch is itself one of the ten kinks.
Related Reading
Quant use-case page
Alpha pricing