Frame the question
We begin with a written, falsifiable claim about a specific mispricing: what should be true, and why it should persist.
Define the evidence
Before testing, we decide what would confirm or disconfirm the idea, so the answer is not chosen after the fact.
Control for leakage
We reconstruct data as it existed at decision time, guarding against lookahead, survivorship, and subtle information bleed.
Test through time
Walk-forward evaluation across regimes replaces in-sample fit, which reliably flatters ideas that do not hold up live.
Evaluate probability quality
We judge calibration, not just direction — whether stated probabilities match observed frequencies out of sample.
Compare to market
An estimate is only interesting where it disagrees with the market in a way that survives fees and structure.
Assess tradability
We ask whether the edge survives liquidity, queue position, and execution cost before it is treated as real.
Deploy carefully
Promising strategies go live in shadow, then at fractional size, with the live-versus-expected gap watched explicitly.
Monitor live behavior
Calibration, drift, and data health are tracked continuously; a strategy is never considered finished.
Study failure
When something breaks, we decompose why, whether signal, execution, sizing, or luck, before changing anything.
