How we turn raw XBRL into point-in-time queryable data
Bloomberg, WRDS, and Compustat make claims about point-in-time accuracy and survivorship-bias-free coverage. We document exactly how ours work — concept standardization, amendment handling, accepted_at semantics, and the validation checks that run on every release.
Why XBRL is hard
The SEC has required XBRL submissions since 2009. The format is machine-readable, but standardization stops there. Each filer picks their own taxonomy: us-gaap, ifrs-full, or a custom extension. A single concept like “revenue” resolves to a dozen possible XBRL tags depending on the company, the year, and whether ASC 606 had been adopted.
Restatements are not corrections — they are new filings with the same fiscal period but different values. Quarterly cash flow statements report year-to-date totals, not quarter-only figures. Foreign filers use 20-F instead of 10-K with subtly different concept names.
Anyone can parse XBRL. Producing a dataset where SELECT revenue FROM fact WHERE ticker = 'AAPL' returns the same values 30 years apart — that's the work.
Concept standardization
We map roughly 12,000 raw XBRL tags to ~200 canonical concepts. Mappings are versioned in a taxonomy_guide table that ships with every Parquet bucket — so you can audit every transformation we apply.
Worked example: Revenue
| Source XBRL tag | Used by | Note |
|---|---|---|
| us-gaap:Revenues | Apple, Microsoft | Most common |
| us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax | Tesla, Walmart | Post-ASC 606 adoption |
| us-gaap:SalesRevenueNet | Pre-2018 filers | Legacy tag, deprecated |
| us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax | Some retailers | Includes sales tax pass-through |
| msft:Revenues | Microsoft (custom extension) | Custom XBRL extension |
All five resolve to standard_concept = 'TotalRevenue'. Every fact also keeps its source_concept, so you can trace any standardized value back to the exact XBRL tag the company filed.
Point-in-time, not point-in-hindsight
The most common look-ahead bias in financial data isn't malicious — it's using the wrong date column. Three timestamps live on every fact, and they mean different things.
report_dateWhen the period endede.g. 2024-09-28 (Apple FY2024)
Aligns financials to a fiscal calendar. Never use as a PIT cutoff — companies file weeks or months later.
filing_dateWhen the filing was submittede.g. 2024-11-01
Useful for filing-cadence analysis. Still not PIT-safe — filings can be accepted hours after the date stamp.
accepted_atWhen SEC accepted it (the canonical PIT field)e.g. 2024-11-01T06:01:36Z
The exact moment the data became public. Use this — and only this — for backtests and any look-ahead-free analysis.
Every PIT-safe MCP tool and SDK method accepts an as_of_date parameter. Internally, that filters on accepted_at <= as_of_date — the queryable equivalent of “what did the market know on this date?”
Amendments and restatements
When a company files a 10-K/A, the SEC treats it as a new filing — not an overwrite. Most data vendors collapse the amendment over the original, destroying the historical view. We keep both.
Two rows, one fiscal period
-- Apple FY2018 net income, original filing ticker fiscal_year standard_concept numeric_value accepted_at AAPL 2018 NetIncome 59531000000 2018-11-05T18:23:00Z -- Apple FY2018 net income, after restatement (hypothetical) ticker fiscal_year standard_concept numeric_value accepted_at AAPL 2018 NetIncome 59300000000 2019-02-12T14:51:00Z
A backtest that ran on 2018-12-01 sees the first row only — the original $59.531B. A current dashboard sees the latest accepted value. Both are correct; both are queryable. The PIT discipline is what guarantees you get the right one.
Quarterly cash flow derivation
In Q2 and Q3 10-Q filings, US GAAP requires cash flow statements to report year-to-date totals. Computing a clean quarterly time series requires subtracting the prior quarter — every time, for every issuer, for every line item.
Example: operating cash flow
Period numeric_value (YTD) derived_quarterly_value Q1 2024 12.0B 12.0B Q2 2024 28.0B 16.0B ← 28.0 − 12.0 Q3 2024 45.0B 17.0B ← 45.0 − 28.0 Q4 2024 62.0B 17.0B ← 62.0 − 45.0
Both columns ship in every Parquet bucket. Use COALESCE(derived_quarterly_value, numeric_value) when you want a true quarterly time series; use numeric_value when you specifically want the as-reported YTD figure.
Validation checks on every release
Every fact returned by the MCP server includes a _meta.data_quality block listing which checks passed. The set runs on every Parquet build before we publish.
no_duplicate_period_endsA company cannot report two FY2024 income statements. Detects dirty XBRL submissions and amendment collisions.
monotonic_period_endsQuarterly periods must be strictly ordered. Catches mis-tagged fiscal periods that would corrupt time-series queries.
no_byte_identical_metricsAdjacent periods with identical revenue, net income, and EPS down to the cent are almost always copy-paste filing errors.
amendment_lineage_intactEvery restated value must trace back to its original via the same accession_id chain. Orphan amendments are quarantined.
concept_coverage_thresholdEach S&P500 entity must resolve at least 80% of canonical concepts per period. Below threshold flags pipeline regression.
Verify it yourself
Every claim on this page is testable from the sample tier — no token, no signup. Pick any S&P500 ticker and inspect the lineage of any fact via verify_fact_lineage:
curl -X POST https://mcp.valuein.biz/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "verify_fact_lineage",
"arguments": {
"ticker": "AAPL",
"concept": "TotalRevenue",
"period_end": "2024-12-31"
}
}
}'The response chains the standardized value back to its source XBRL tag, the SEC accession ID, and the filing URL. If we changed it, you can see why.
Methodology you can audit, data you can trust.
Every step above ships with the data. Read the docs, query the sample tier, and compare against the SEC filings yourself.