
Point-in-Time Accuracy

Financial backtests fail when data appears to be available too early: the strategy ends up using information that was not publicly known at the time. Valuein timestamps every fact with knowledge_at, the exact moment the SEC accepted the filing. Filter by it and your backtest is protected from look-ahead bias.

knowledge_at: SEC acceptance timestamp
filing_date: Date filed with SEC
report_date: Fiscal period end

Why Most Databases Introduce Look-Ahead Bias

Most financial databases store data as it exists today, not as it was known historically. A fiscal year 2021 annual report filed in March 2022 is often backdated to December 31, 2021 — making it appear as if that data was available before it was. Backtests built on this data are invalid.

Timeline for a 2022 10-K Filing

Dec 31, 2022
report_date

Fiscal period ends. Results for the full year are computed internally. The public knows nothing yet.

Feb 15, 2023
filing_date

The company submits the 10-K to SEC EDGAR. The filing is not yet fully indexed; processing can take hours.

Feb 15, 2023 17:42 UTC
knowledge_at

SEC accepts and timestamps the filing. This is the earliest moment any investor could have seen this data. Filter by this.

The hidden trap

If a data vendor stores this filing against fiscal_year = 2022 with no timestamp, a backtest that says "use 2022 annual data as of Jan 1, 2023" will include it — but the filing wasn't accepted until Feb 15, 2023. Your simulated portfolio used information that didn't exist yet. This introduces look-ahead bias and inflates backtest performance.
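The trap can be reproduced in a few lines of Python. This is a minimal sketch built on the timeline above: the record layout and the two helper functions are hypothetical and exist only for illustration; the fiscal year and acceptance timestamp are the ones from the timeline.

```python
from datetime import datetime, timezone

# One fact from the FY2022 10-K in the timeline above (record layout is hypothetical).
fact = {
    "fiscal_year": 2022,
    "knowledge_at": datetime(2023, 2, 15, 17, 42, tzinfo=timezone.utc),
}

def visible_by_fiscal_year(fact, as_of):
    # Naive rule: a fiscal year counts as "known" once its calendar year is over.
    return fact["fiscal_year"] < as_of.year

def visible_by_knowledge_at(fact, as_of):
    # PIT rule: visible only once the SEC has accepted the filing.
    return fact["knowledge_at"] <= as_of

as_of = datetime(2023, 1, 1, tzinfo=timezone.utc)
naive_visible = visible_by_fiscal_year(fact, as_of)  # True: look-ahead bias
pit_visible = visible_by_knowledge_at(fact, as_of)   # False: not accepted yet
print(naive_visible, pit_visible)
```

The naive rule admits the filing about six weeks before anyone could have read it; the knowledge_at rule admits it only from Feb 15, 2023 17:42 UTC onward.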

The Three Key Fields

knowledge_at (table: fact, type: TIMESTAMPTZ)

The exact UTC timestamp the SEC accepted the filing. Derived from filing.accepted_at. This is your PIT filter — use it exclusively for backtest-safe queries. It represents the earliest moment any investor could have read this data.

Always filter: WHERE knowledge_at <= your_date

filing_date (table: filing, type: DATE)

The date the SEC received the filing. Very close to knowledge_at but lacks the exact time component. Suitable for rough date-range filtering but knowledge_at is more precise for PIT analysis.

Safe for range filtering, less precise than knowledge_at

report_date (table: filing, type: DATE)

The fiscal period end date (e.g. December 31 for a calendar-year company). This is NOT a PIT field — using it as a filter introduces look-ahead bias because the data wasn't known until the filing date weeks or months later.

For display purposes only — never use as a PIT filter
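The gap between day precision and timestamp precision matters for intraday decisions. The sketch below reuses the dates from the timeline example; the 14:30 UTC decision time is a hypothetical choice, and the comparison logic illustrates the idea rather than how Valuein implements filtering.

```python
from datetime import date, datetime, timezone

# Values from the timeline example above.
filing_date = date(2023, 2, 15)
knowledge_at = datetime(2023, 2, 15, 17, 42, tzinfo=timezone.utc)

# Hypothetical trading decision on the filing day, made before acceptance.
decision_time = datetime(2023, 2, 15, 14, 30, tzinfo=timezone.utc)

# Day precision: the filing looks available for the whole of Feb 15.
passes_date_filter = filing_date <= decision_time.date()

# Timestamp precision: the filing stays excluded until 17:42 UTC.
passes_pit_filter = knowledge_at <= decision_time

print(passes_date_filter, passes_pit_filter)
```

Here the date filter lets the filing through about three hours early, which is why filing_date suits rough range filtering only.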

Wrong vs. Right Queries

The difference between a biased and a valid backtest often comes down to a single WHERE clause.

Wrong — look-ahead bias
-- WRONG: look-ahead bias introduced
-- This returns data as if you knew it on Jan 1 2022,
-- but 10-K filings for fiscal year 2021 weren't published
-- until Feb–March 2022. You're using future information.
SELECT
  r.symbol,
  fa.numeric_value / 1e9 AS revenue_billions
FROM references r
JOIN filing f  ON f.entity_id = r.cik
JOIN fact fa   ON fa.accession_id = f.accession_id
WHERE fa.standard_concept = 'Revenues'
  AND f.fiscal_year = 2021          -- WRONG: fiscal year is NOT when data was known
  AND f.form_type   = '10-K'
ORDER BY revenue_billions DESC;
Right — PIT-safe
-- RIGHT: point-in-time safe using knowledge_at
-- Only returns data that was publicly available on 2022-01-01.
-- If a company filed its 2020 10-K late (e.g. Feb 2022),
-- it will NOT appear in this query -- correct behavior.
SELECT
  r.symbol,
  fa.numeric_value / 1e9 AS revenue_billions,
  fa.knowledge_at                            -- visible timestamp
FROM references r
JOIN filing f  ON f.entity_id = r.cik
JOIN fact fa   ON fa.accession_id = f.accession_id
WHERE fa.standard_concept = 'Revenues'
  AND f.form_type         = '10-K'
  AND fa.knowledge_at     <= '2022-01-01'   -- RIGHT: PIT filter
ORDER BY revenue_billions DESC;
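The PIT-safe query returns every fact accepted before the cutoff, so a company can appear once per fiscal year. A backtest usually wants only the most recently known value per company. The sketch below shows one way to do that with Python's built-in sqlite3 module. The fact_demo table, its columns, the symbols AAA and BBB, and all figures are invented for illustration and are not the Valuein schema. Timestamps are stored as ISO-8601 UTC strings in a single fixed format, which is what makes plain string comparison valid here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_demo (symbol TEXT, fiscal_year INTEGER, "
    "revenue REAL, knowledge_at TEXT)"
)
conn.executemany(
    "INSERT INTO fact_demo VALUES (?, ?, ?, ?)",
    [
        ("AAA", 2020, 10.0, "2021-02-10T21:05:00Z"),
        ("AAA", 2021, 12.0, "2022-02-09T21:10:00Z"),  # accepted after the cutoff
        ("BBB", 2020, 7.0, "2021-03-01T13:00:00Z"),
        ("BBB", 2021, 8.0, "2021-12-20T22:30:00Z"),   # accepted before the cutoff
    ],
)

def latest_known(conn, as_of):
    """Return {symbol: (fiscal_year, revenue)} using only rows known at as_of."""
    rows = conn.execute(
        """
        SELECT f.symbol, f.fiscal_year, f.revenue
        FROM fact_demo f
        WHERE f.knowledge_at <= :as_of
          AND f.knowledge_at = (
              SELECT MAX(g.knowledge_at)
              FROM fact_demo g
              WHERE g.symbol = f.symbol
                AND g.knowledge_at <= :as_of
          )
        ORDER BY f.symbol
        """,
        {"as_of": as_of},
    ).fetchall()
    return {symbol: (year, revenue) for symbol, year, revenue in rows}

snapshot = latest_known(conn, "2022-01-01T00:00:00Z")
print(snapshot)
```

For the cutoff of 2022-01-01, AAA falls back to its fiscal 2020 figure because its fiscal 2021 filing had not been accepted yet, while BBB already shows fiscal 2021.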

Survivorship Bias

Look-ahead bias is temporal — using today's data in the past. Survivorship bias is structural — only analyzing companies that still exist today. Both inflate backtest returns and both are invisible unless your dataset is specifically built to prevent them.

🏚️

Delisted companies

Valuein tracks all entities including those that were delisted, acquired, or went bankrupt. The full plan includes 12,000+ tickers — active and inactive.

📅

Historical index membership

The index_membership table records exact start and end dates for each company in each index. A 2010 S&P 500 backtest uses the 2010 constituents, not today's.

🌐

PIT universe construction

Use get_pit_universe(as_of_date) to reconstruct the exact investable universe on any historical date — free of additions that happened after.

Survivorship-bias-free universe construction (SQL)
-- Build a survivorship-bias-free universe for March 2020
-- This returns exactly who was in the S&P 500 on that date --
-- before COVID additions/removals, before failures, before mergers.
SELECT
  cik,
  ticker,
  name,
  sector
FROM get_pit_universe(
  as_of_date => '2020-03-01',
  index       => 'SP500'
);

-- WRONG alternative (survivorship bias):
-- Using the current S&P 500 list for 2020 data excludes
-- companies that were dropped and includes companies that
-- didn't exist in the index yet.
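Conceptually, a point-in-time universe is an interval lookup over membership start and end dates. The sketch below illustrates that idea in plain Python. It is not the implementation of get_pit_universe: the row layout, the tickers OLDCO, NEWCO, and STAYCO, their dates, and the choice to treat the end date as exclusive are all assumptions made for the example.

```python
from datetime import date

# Hypothetical rows shaped like an index membership table:
# (ticker, index_name, start_date, end_date); end_date None = still a member.
MEMBERSHIP = [
    ("OLDCO", "SP500", date(2005, 1, 3), date(2020, 6, 22)),
    ("NEWCO", "SP500", date(2020, 12, 21), None),
    ("STAYCO", "SP500", date(1999, 5, 10), None),
]

def pit_universe(rows, as_of, index_name):
    """Tickers whose membership interval contains as_of (end date exclusive)."""
    return sorted(
        ticker
        for ticker, idx, start, end in rows
        if idx == index_name
        and start <= as_of
        and (end is None or as_of < end)
    )

universe = pit_universe(MEMBERSHIP, date(2020, 3, 1), "SP500")
print(universe)
```

On 2020-03-01 the lookup keeps OLDCO, which was removed later, and leaves out NEWCO, which had not joined yet. A list built from current members would get both wrong.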

PIT in the Python SDK

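Every SDK method that returns time-series data accepts an as_of_date parameter; pass it and the SDK filters by knowledge_at for you. When you write raw SQL through client.query, as in the example below, apply the knowledge_at filter yourself.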

from valuein_sdk import ValueinClient, ValueinError

try:
    with ValueinClient() as client:

        # PIT-safe: only data known as of the backtest date
        df = client.query("""
            SELECT r.symbol, fa.fiscal_year,
                   fa.numeric_value / 1e9 AS revenue_bn,
                   fa.knowledge_at
            FROM fact fa
            JOIN references r USING (entity_id)
            WHERE r.symbol              = 'AAPL'
              AND fa.standard_concept   = 'TotalRevenue'
              AND fa.fiscal_period      = 'FY'
              AND fa.knowledge_at      <= '2023-01-01'   -- PIT filter
            ORDER BY fa.fiscal_year DESC
            LIMIT 10
        """)

        # All rows have knowledge_at <= 2023-01-01
        print(df[["fiscal_year", "revenue_bn", "knowledge_at"]])

except ValueinError as e:
    print(f"Error: {e}")
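A cheap safeguard in a backtest loop is to check every result set against the backtest date before using it. The helper below is a hypothetical sketch, not part of the Valuein SDK. It assumes each row is a dict holding a timezone-aware knowledge_at value, and the sample timestamps are invented.

```python
from datetime import datetime, timezone

def assert_pit_safe(rows, as_of):
    """Raise ValueError if any row was not yet public at as_of."""
    late = [row for row in rows if row["knowledge_at"] > as_of]
    if late:
        raise ValueError(
            f"{len(late)} row(s) have knowledge_at after {as_of.isoformat()}"
        )
    return rows

as_of = datetime(2023, 1, 1, tzinfo=timezone.utc)

# Invented rows standing in for a query result.
rows = [
    {"fiscal_year": 2022, "knowledge_at": datetime(2022, 10, 28, 10, 1, tzinfo=timezone.utc)},
    {"fiscal_year": 2021, "knowledge_at": datetime(2021, 10, 29, 10, 3, tzinfo=timezone.utc)},
]

checked = assert_pit_safe(rows, as_of)  # passes: both rows predate as_of
print(len(checked))
```

If the query result is a pandas DataFrame, as the column selection in the example above suggests, df.to_dict("records") produces rows in this shape.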

Frequently Asked Questions

Why is knowledge_at sometimes later than filing_date?

The SEC processes filings asynchronously. A filing submitted on February 14 may not receive its EDGAR acceptance timestamp until late that evening or early the next day. knowledge_at captures the exact millisecond of acceptance — always later than or equal to the submission time.

Can I trust filing_date for PIT backtests?

It's usable for rough filtering but knowledge_at is strictly more accurate. Some data providers conflate the two. In Valuein's schema, filing_date is a DATE (day precision) and knowledge_at is a TIMESTAMPTZ (millisecond precision). For production backtests, always use knowledge_at.

How do I handle the fact that Q2 and Q3 10-Q cash flow figures are year-to-date?

Use COALESCE(derived_quarterly_value, numeric_value) on cash flow concepts. The pipeline computes derived_quarterly_value for Q2 and Q3 by subtracting the prior period's year-to-date value from the current one, so all quarters are directly comparable without manual adjustments.
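The arithmetic behind the derived value can be sketched as follows. This illustrates the subtraction described above under simple assumptions and is not the pipeline's actual code: the function names and the cash flow figures are invented, and Q1 is treated as already standalone because its year-to-date value covers a single quarter.

```python
def derive_quarterly(ytd_by_quarter):
    """Convert year-to-date values into standalone quarterly values.

    ytd_by_quarter maps "Q1", "Q2", "Q3" to cumulative values
    within one fiscal year.
    """
    quarterly = {}
    previous_ytd = 0.0
    for quarter in ("Q1", "Q2", "Q3"):
        if quarter not in ytd_by_quarter:
            break  # later quarters cannot be derived without the prior YTD
        current_ytd = ytd_by_quarter[quarter]
        quarterly[quarter] = current_ytd - previous_ytd
        previous_ytd = current_ytd
    return quarterly

def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

# Invented operating cash flow figures, in millions, reported year-to-date.
quarters = derive_quarterly({"Q1": 100.0, "Q2": 250.0, "Q3": 420.0})
print(quarters)

# Prefer the derived value when present, else fall back to the reported one.
q1_value = coalesce(None, 100.0)
q2_value = coalesce(150.0, 250.0)
```

With these figures, Q2 standalone is 250 - 100 = 150 and Q3 standalone is 420 - 250 = 170.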

Does the sample tier support PIT queries?

Yes. The sample tier includes knowledge_at on all fact rows. You can build and validate PIT query patterns on free sample data before upgrading to sp500 or full.

Ready to build a PIT-safe backtest?

Start with the free sample tier — all PIT fields are included. Upgrade for full S&P 500 history or the complete 12,000+ ticker universe.