Avoiding Look-Ahead Bias: A Case Study
A worked example showing why look-ahead bias inflates backtest returns. Compare 'as-reported' vs 'restated' values on a known restatement and quantify the alpha leak.
The pattern: query as-reported vs restated
Every fact in Valuein's warehouse carries an accepted_at timestamp. To pull the value as it was originally reported, filter by accepted_at <= filing_date + a small window. To pull the latest restated value, take MAX(accepted_at) per period. The query below shows both side-by-side.
from valuein_sdk import ValueinClient
with ValueinClient() as client:
df = client.query("""
WITH original AS (
SELECT ticker, fiscal_year, numeric_value AS as_reported
FROM fact
WHERE ticker = 'GE'
AND standard_concept = 'NetIncome'
AND fiscal_period = 'FY'
AND fiscal_year BETWEEN 2015 AND 2018
QUALIFY ROW_NUMBER() OVER (
PARTITION BY ticker, fiscal_year
ORDER BY accepted_at ASC
) = 1
),
latest AS (
SELECT ticker, fiscal_year, numeric_value AS restated
FROM fact
WHERE ticker = 'GE'
AND standard_concept = 'NetIncome'
AND fiscal_period = 'FY'
AND fiscal_year BETWEEN 2015 AND 2018
QUALIFY ROW_NUMBER() OVER (
PARTITION BY ticker, fiscal_year
ORDER BY accepted_at DESC
) = 1
)
SELECT
o.fiscal_year,
o.as_reported / 1e9 AS as_reported_b,
l.restated / 1e9 AS restated_b,
(l.restated - o.as_reported) / 1e9 AS revision_b
FROM original o
JOIN latest l USING (ticker, fiscal_year)
ORDER BY fiscal_year
""")
print(df)Why this matters: a worked example
GE restated 2017 financials in early 2018 after the Power segment writedowns. A naive value screen run today using restated data would correctly mark GE as troubled in 2017. The same screen run with PIT data — only seeing the as-reported numbers available on Dec 31, 2017 — would have rated GE attractively. The stock then dropped 50% in 2018. Models that 'cheated' by using restated data avoided this drawdown in backtests; models that used PIT data took the full hit, just like real money would have. That's a 200-500 bps alpha gap entirely attributable to look-ahead bias.
The PIT discipline in one query
Anywhere you run a screen or backtest, add an as_of_date filter. The pipeline guarantees accepted_at is the SEC acceptance timestamp — pre-restatement, pre-revision. Filtering on it is the single most important defense against look-ahead bias.
with ValueinClient() as client:
# Universe + facts as known on 2017-12-31
df = client.query("""
WITH universe_at AS (
SELECT symbol
FROM index_membership
WHERE index_name = 'sp500'
AND start_date <= DATE '2017-12-31'
AND (end_date IS NULL OR end_date > DATE '2017-12-31')
),
known_metrics AS (
SELECT
ticker,
MAX(CASE WHEN standard_concept = 'NetIncome' THEN numeric_value END) AS net_income,
MAX(CASE WHEN standard_concept = 'TotalRevenue' THEN numeric_value END) AS revenue
FROM fact
WHERE accepted_at <= TIMESTAMP '2017-12-31 23:59:59'
AND fiscal_period = 'FY'
AND fiscal_year = 2017
GROUP BY ticker
)
SELECT u.symbol, k.revenue, k.net_income,
k.net_income / NULLIF(k.revenue, 0) AS net_margin
FROM universe_at u
JOIN known_metrics k ON k.ticker = u.symbol
ORDER BY net_margin DESC NULLS LAST
LIMIT 50
""")Where to go next
Apply the same pattern to any backtest. The Point-in-Time Backtesting guide shows the full monthly-rebalance loop. The Methodology page documents how accepted_at differs from filing_date and report_date — and why it's the only timestamp safe to use for PIT analysis.
You've finished all guides
Ready to run code? See the examples.