Valuein
Methodology

How we turn raw XBRL into point-in-time queryable data

Bloomberg, WRDS, and Compustat make claims about point-in-time accuracy and survivorship-bias-free coverage. We document exactly how ours work — concept standardization, amendment handling, accepted_at semantics, and the validation checks that run on every release.

Why XBRL is hard

The SEC has required XBRL submissions since 2009. The format is machine-readable, but standardization stops there. Each filer picks their own taxonomy: us-gaap, ifrs-full, or a custom extension. A single concept like “revenue” resolves to a dozen possible XBRL tags depending on the company, the year, and whether ASC 606 had been adopted.

Restatements are not corrections — they are new filings with the same fiscal period but different values. Quarterly cash flow statements report year-to-date totals, not quarter-only figures. Foreign filers use 20-F instead of 10-K with subtly different concept names.

Anyone can parse XBRL. Producing a dataset where SELECT revenue FROM fact WHERE ticker = 'AAPL' returns the same values 30 years apart — that's the work.

Concept standardization

We map roughly 12,000 raw XBRL tags to ~200 canonical concepts. Mappings are versioned in a taxonomy_guide table that ships with every Parquet bucket — so you can audit every transformation we apply.

Worked example: Revenue

Source XBRL tag                                              Used by                       Note
us-gaap:Revenues                                             Apple, Microsoft              Most common
us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax  Tesla, Walmart                Post-ASC 606 adoption
us-gaap:SalesRevenueNet                                      Pre-2018 filers               Legacy tag, deprecated
us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax  Some retailers                Includes sales tax pass-through
msft:Revenues                                                Microsoft (custom extension)  Custom XBRL extension

All five resolve to standard_concept = 'TotalRevenue'. Every fact also keeps its source_concept, so you can trace any standardized value back to the exact XBRL tag the company filed.
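A minimal sketch of how such a mapping could be applied, assuming the five tags from the table are held in a plain dictionary. `TAXONOMY_GUIDE` and `standardize` are hypothetical names used for illustration; the schema of the real taxonomy_guide table may differ.

```python
# Hypothetical in-memory stand-in for the taxonomy_guide table,
# populated only with the five revenue tags listed above.
TAXONOMY_GUIDE = {
    "us-gaap:Revenues": "TotalRevenue",
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "TotalRevenue",
    "us-gaap:SalesRevenueNet": "TotalRevenue",
    "us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax": "TotalRevenue",
    "msft:Revenues": "TotalRevenue",
}


def standardize(fact: dict) -> dict:
    """Return a copy of the fact with standard_concept added.

    The filed tag stays in source_concept so the value remains traceable.
    Unmapped tags get standard_concept = None rather than a guess.
    """
    out = dict(fact)
    out["standard_concept"] = TAXONOMY_GUIDE.get(fact["source_concept"])
    return out


raw = {"ticker": "MSFT", "source_concept": "msft:Revenues"}
print(standardize(raw))
```

Returning a copy instead of mutating the input keeps the as-filed record untouched, which mirrors the idea of shipping source_concept alongside standard_concept.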

Point-in-time, not point-in-hindsight

The most common look-ahead bias in financial data isn't malicious — it's using the wrong date column. Three timestamps live on every fact, and they mean different things.

report_date: When the period ended

e.g. 2024-09-28 (Apple FY2024)

Aligns financials to a fiscal calendar. Never use as a PIT cutoff — companies file weeks or months later.

filing_date: When the filing was submitted

e.g. 2024-11-01

Useful for filing-cadence analysis. Still not PIT-safe — filings can be accepted hours after the date stamp.

accepted_at: When SEC accepted it (the canonical PIT field)

e.g. 2024-11-01T06:01:36Z

The exact moment the data became public. Use this — and only this — for backtests and any look-ahead-free analysis.

Every PIT-safe MCP tool and SDK method accepts an as_of_date parameter. Internally, that filters on accepted_at <= as_of_date — the queryable equivalent of “what did the market know on this date?”
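The filter can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's implementation: `visible_as_of` and `end_of_day_utc` are hypothetical helpers, and treating a bare date as "through the end of that UTC day" is an assumed convention that the text above does not specify.

```python
from datetime import date, datetime, time, timezone


def end_of_day_utc(as_of_date: date) -> datetime:
    """Assumed convention: a bare date means 'through the end of that UTC day'."""
    return datetime.combine(as_of_date, time.max, tzinfo=timezone.utc)


def visible_as_of(facts: list[dict], as_of_date: date) -> list[dict]:
    """Keep only facts whose accepted_at is on or before the cutoff."""
    cutoff = end_of_day_utc(as_of_date)
    return [f for f in facts if f["accepted_at"] <= cutoff]


# The three timestamps from the Apple FY2024 example above.
fact = {
    "report_date": date(2024, 9, 28),
    "filing_date": date(2024, 11, 1),
    "accepted_at": datetime(2024, 11, 1, 6, 1, 36, tzinfo=timezone.utc),
}

# A cutoff between report_date and accepted_at must not see the fact.
print(len(visible_as_of([fact], date(2024, 10, 15))))  # 0
print(len(visible_as_of([fact], date(2024, 11, 1))))   # 1
```

Filtering on report_date with the same 2024-10-15 cutoff would have returned the fact more than two weeks before it was public. For intraday work, passing a full timestamp as the cutoff is stricter than an end-of-day convention.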

Amendments and restatements

When a company files a 10-K/A, the SEC treats it as a new filing — not an overwrite. Most data vendors collapse the amendment over the original, destroying the historical view. We keep both.

Two rows, one fiscal period

-- Apple FY2018 net income, original filing
ticker     fiscal_year  standard_concept  numeric_value  accepted_at
AAPL       2018         NetIncome         59531000000    2018-11-05T18:23:00Z

-- Apple FY2018 net income, after restatement (hypothetical)
ticker     fiscal_year  standard_concept  numeric_value  accepted_at
AAPL       2018         NetIncome         59300000000    2019-02-12T14:51:00Z

A backtest run with an as_of_date of 2018-12-01 sees only the first row, the original $59.531B. A current dashboard sees the latest accepted value. Both are correct; both are queryable. The PIT discipline is what guarantees you get the right one.
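As a rough sketch of that selection rule, the Python below takes the two example rows and returns whichever was most recently accepted at a given cutoff. `value_as_of` is a hypothetical helper, and the second row is the same hypothetical restatement used in the example.

```python
from datetime import datetime, timezone

ROWS = [
    {"ticker": "AAPL", "fiscal_year": 2018, "standard_concept": "NetIncome",
     "numeric_value": 59_531_000_000,
     "accepted_at": datetime(2018, 11, 5, 18, 23, tzinfo=timezone.utc)},
    # Hypothetical restatement, as in the example above.
    {"ticker": "AAPL", "fiscal_year": 2018, "standard_concept": "NetIncome",
     "numeric_value": 59_300_000_000,
     "accepted_at": datetime(2019, 2, 12, 14, 51, tzinfo=timezone.utc)},
]


def value_as_of(rows, as_of):
    """Return the most recently accepted row visible at `as_of`, or None."""
    visible = [r for r in rows if r["accepted_at"] <= as_of]
    return max(visible, key=lambda r: r["accepted_at"], default=None)


backtest = value_as_of(ROWS, datetime(2018, 12, 1, tzinfo=timezone.utc))
latest = value_as_of(ROWS, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(backtest["numeric_value"])  # 59531000000
print(latest["numeric_value"])    # 59300000000
```

Because neither row is overwritten, the same function serves both the backtest and the dashboard; only the cutoff changes.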

Quarterly cash flow derivation

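In Q2 and Q3 10-Q filings, US GAAP requires cash flow statements to report year-to-date totals. Computing a clean quarterly time series requires subtracting the prior quarter's year-to-date total from the current one, every time, for every issuer, for every line item. Q4 is derived the same way, by subtracting the Q3 year-to-date total from the full-year figure.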

Example: operating cash flow

Period        numeric_value (YTD)   derived_quarterly_value
Q1 2024       12.0B                 12.0B
Q2 2024       28.0B                 16.0B   ← 28.0 − 12.0
Q3 2024       45.0B                 17.0B   ← 45.0 − 28.0
Q4 2024       62.0B                 17.0B   ← 62.0 − 45.0

Both columns ship in every Parquet bucket. Use COALESCE(derived_quarterly_value, numeric_value) when you want a true quarterly time series; use numeric_value when you specifically want the as-reported YTD figure.
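The subtraction in the table can be written as a short function. This is a minimal sketch assuming the four YTD values of one fiscal year arrive in order; `derive_quarterly` is a hypothetical name, and the rounding is only there to suppress floating-point noise in the illustration.

```python
def derive_quarterly(ytd_by_quarter: list[float]) -> list[float]:
    """Convert year-to-date totals (Q1..Q4 of one fiscal year) to quarter-only values.

    Q1 is already a single quarter; each later quarter is its YTD total
    minus the previous quarter's YTD total.
    """
    quarterly = []
    previous_ytd = 0.0
    for ytd in ytd_by_quarter:
        quarterly.append(round(ytd - previous_ytd, 6))
        previous_ytd = ytd
    return quarterly


# Operating cash flow from the table above, in billions.
print(derive_quarterly([12.0, 28.0, 45.0, 62.0]))  # [12.0, 16.0, 17.0, 17.0]
```

The derived values sum back to the full-year total (62.0), which is a cheap sanity check on any derivation of this kind.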

Validation checks on every release

Every fact returned by the MCP server includes a _meta.data_quality block listing which checks passed. The set runs on every Parquet build before we publish.

no_duplicate_period_ends

A company cannot report two FY2024 income statements. Detects dirty XBRL submissions and amendment collisions.

monotonic_period_ends

Quarterly periods must be strictly ordered. Catches mis-tagged fiscal periods that would corrupt time-series queries.

no_byte_identical_metrics

Adjacent periods with identical revenue, net income, and EPS down to the cent are almost always copy-paste filing errors.

amendment_lineage_intact

Every restated value must trace back to its original via the same accession_id chain. Orphan amendments are quarantined.

concept_coverage_threshold

Each S&P 500 entity must resolve at least 80% of canonical concepts per period. A result below the threshold flags a pipeline regression.
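To make the first two checks concrete, here is a hedged sketch of what they could look like for the period ends of a single company and statement. The function bodies are assumptions based only on the descriptions above, not the production checks, and the dates are illustrative.

```python
from collections import Counter
from datetime import date


def no_duplicate_period_ends(period_ends: list[date]) -> bool:
    """Pass when no period end date appears more than once."""
    return all(count == 1 for count in Counter(period_ends).values())


def monotonic_period_ends(period_ends: list[date]) -> bool:
    """Pass when each period end is strictly later than the one before it."""
    return all(a < b for a, b in zip(period_ends, period_ends[1:]))


# Illustrative period ends, not taken from any real filer.
good = [date(2024, 3, 31), date(2024, 6, 30), date(2024, 9, 30)]
bad = [date(2024, 3, 31), date(2024, 9, 30), date(2024, 6, 30), date(2024, 9, 30)]

print(no_duplicate_period_ends(good), monotonic_period_ends(good))  # True True
print(no_duplicate_period_ends(bad), monotonic_period_ends(bad))    # False False
```

Each check returns a plain boolean, so the results could be collected into a structure like the _meta.data_quality block described above.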

Verify it yourself

Every claim on this page is testable from the sample tier, with no token and no signup. Pick any S&P 500 ticker and inspect the lineage of any fact via verify_fact_lineage:

curl -X POST https://mcp.valuein.biz/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "verify_fact_lineage",
      "arguments": {
        "ticker": "AAPL",
        "concept": "TotalRevenue",
        "period_end": "2024-09-28"
      }
    }
  }'

The response chains the standardized value back to its source XBRL tag, the SEC accession ID, and the filing URL. If we changed it, you can see why.

Methodology you can audit, data you can trust.

Every step above ships with the data. Read the docs, query the sample tier, and compare against the SEC filings yourself.