NEW: Access all datasets via MCP Server — no SDK required. Any AI agent, any MCP-compatible client.

Financial Dataset

SEC EDGAR data your backtest can trust.

Name: Valuein SEC EDGAR Financial Dataset
Creator: Valuein
License: https://valuein.biz/data-license

111M+ financial facts from 19,000+ entities spanning 30+ years. Point-in-time accurate. Zero survivorship bias. Query it with DuckDB, Python, or the MCP Server — then layer on the 78M-row smart-money dataset.

111M+ Financial Facts
19,000+ Entities
30+ Years of History
78M+ Smart-Money Rows

Data Source

Built on the authoritative source for US public company financials

The SEC requires every public company to file structured XBRL financial statements. These filings are published as the EDGAR Financial Statements Data Sets — quarterly ZIP archives containing machine-readable data for every 10-K, 10-Q, 8-K, and 20-F filed with the Commission.

Each quarterly release includes five core files: num.txt (numeric values), sub.txt (submission metadata), tag.txt (XBRL tag definitions), pre.txt (presentation linkbase), and cal.txt (calculation linkbase).

Source Details

Publisher: U.S. Securities and Exchange Commission
Dataset: EDGAR Financial Statements Data Sets
Format: Quarterly ZIP archives (TSV files)
Update Frequency: Quarterly + amendments filed continuously
Coverage: All SEC-registered entities filing XBRL (2009+, with historical data back to 1993)
Source URL: sec.gov/dera/data/financial-statement-data-sets

Processing Pipeline

From raw SEC filings to queryable Parquet in 10 steps

Every financial fact passes through a deterministic pipeline that standardizes, enriches, and validates before export. No manual intervention. No estimation.

SEC EDGAR Ingestion

Quarterly XBRL bulk downloads from SEC EDGAR Financial Statements Data Sets are ingested automatically. Each release contains num.txt, sub.txt, tag.txt, pre.txt, and cal.txt files covering every public filing.

XBRL Submission Parsing

Every XBRL submission in the quarterly dump is parsed. Filing metadata, entity info, tagged numeric values, and calculation linkbases are extracted and validated.

Entity & Security Normalization

CIK numbers are resolved to standardized entity records. SIC codes are mapped to sectors and industries. Exchange information, ticker symbols, and CUSIP identifiers are enriched from multiple sources.

Concept Standardization

11,966 raw XBRL tags are mapped to 292 canonical standard_concept values. Revenue synonyms, debt variants, and custom extensions all resolve to a single canonical concept name; unmapped tags fall through to Other.

Point-in-Time Indexing

Every fact receives an accepted_at timestamp equal to the SEC acceptance date of the filing that introduced it. No backfilling, no estimation. What was known on any date is precisely queryable.

Amendment Reconciliation

10-K/A and 10-Q/A filings (restated financials) are tracked separately. Original values and restated values coexist in the dataset, each with their own accepted_at timestamp.

Derived Quarterly Values

Q2 and Q3 cash flow statements in 10-Qs report year-to-date totals. The pipeline computes the incremental quarterly figure and stores it as derived_quarterly_value alongside the raw YTD number.
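The year-to-date subtraction described in this step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function name derive_quarterly and the input shape (a dict from fiscal period to YTD total) are invented for the example.

```python
from typing import Dict, Optional


def derive_quarterly(ytd: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Convert year-to-date totals into standalone quarterly increments.

    Q1 is already a single quarter; Q2 and Q3 are the change in the
    YTD total since the previous period.
    """
    quarterly: Dict[str, Optional[float]] = {}
    prior_ytd: Optional[float] = 0.0  # nothing has accumulated before Q1
    for period in ("Q1", "Q2", "Q3"):
        current = ytd.get(period)
        if current is None or prior_ytd is None:
            # Without both endpoints the increment cannot be derived.
            quarterly[period] = None
        else:
            quarterly[period] = current - prior_ytd
        prior_ytd = current
    return quarterly


# Illustrative operating cash flow, in millions (made-up numbers).
print(derive_quarterly({"Q1": 10.0, "Q2": 25.0, "Q3": 45.0}))
# {'Q1': 10.0, 'Q2': 15.0, 'Q3': 20.0}
```

Returning None when a prior period is missing, rather than guessing, mirrors the "no estimation" stance stated for the pipeline.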

Index Membership Enrichment

S&P500 and Russell 1000/2000/3000 membership is tracked historically in index_membership with effective_date / removal_date and [) interval semantics. JOIN references on cik = cik for any membership question (current or historical) — there is no is_sp500 flag.
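The [) interval semantics can be made concrete with a small as-of check. The sketch below is illustrative only: the rows and CIK values are invented, and treating a missing removal_date as "still a member" is an assumption rather than documented behavior.

```python
from datetime import date
from typing import Optional


def is_member(as_of: date, effective_date: date, removal_date: Optional[date]) -> bool:
    """Half-open [effective_date, removal_date): start included, end excluded."""
    if as_of < effective_date:
        return False
    # Assumption: a missing removal_date means the membership is still open.
    return removal_date is None or as_of < removal_date


# Hypothetical index_membership rows; the CIK values are invented.
rows = [
    {"cik": 1001, "index_name": "SP500",
     "effective_date": date(2010, 1, 4), "removal_date": date(2018, 6, 18)},
    {"cik": 1002, "index_name": "SP500",
     "effective_date": date(2018, 6, 18), "removal_date": None},
]


def members_on(rows, index_name, as_of):
    """Return the CIKs that belonged to index_name on the as_of date."""
    return sorted(
        r["cik"] for r in rows
        if r["index_name"] == index_name
        and is_member(as_of, r["effective_date"], r["removal_date"])
    )


print(members_on(rows, "SP500", date(2018, 6, 17)))  # [1001]
print(members_on(rows, "SP500", date(2018, 6, 18)))  # [1002]
```

Because the end of the interval is excluded, a constituent swap on a single date never double-counts: the outgoing company drops out on the same day the incoming one appears.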

Parquet Export

Column-oriented Parquet files with ZSTD compression are generated for each tier. Optimized for DuckDB, Polars, and Spark. Exported to distributed object storage after every EDGAR quarterly release.

Manifest Update

manifest.json records the snapshot date, last_updated timestamp, and row counts for every table. SDKs and integrations use this to detect fresh data automatically.
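A freshness check against the manifest might look like the sketch below. Only last_updated is named in the text above; the other key names (snapshot_date, row_counts), the example values, and the helper is_newer are assumptions made for illustration, so check the real manifest.json before relying on them.

```python
import json
from datetime import datetime


def is_newer(manifest_text: str, last_seen_iso: str) -> bool:
    """Return True when the manifest was updated after the last sync we recorded."""
    manifest = json.loads(manifest_text)
    published = datetime.fromisoformat(manifest["last_updated"])
    return published > datetime.fromisoformat(last_seen_iso)


# Hypothetical manifest content; keys other than last_updated are guesses.
example_manifest = json.dumps({
    "snapshot_date": "2024-03-31",
    "last_updated": "2024-04-05T12:00:00+00:00",
    "row_counts": {"entity": 19888, "security": 9099},
})

print(is_newer(example_manifest, "2024-01-01T00:00:00+00:00"))  # True
print(is_newer(example_manifest, "2024-04-05T12:00:00+00:00"))  # False
```

Parsing the timestamps instead of comparing strings keeps the check correct if the two values carry different UTC offsets.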

Schema

Eleven core tables — seventeen with smart-money

Each is a column-oriented Parquet file with ZSTD compression — query with DuckDB, Polars, Spark, or any engine that reads Parquet. The key core tables are below; the six FULL-tier smart-money tables live in the smart-money dataset.

How the core tables relate: entity is the hub; every fact ties to an entity and the SEC filing it came from.

Table groups: Core financials, Reference, Derived.

Relationships (text)

security references entity via entity_id → cik (many → 1)
filing references entity via entity_id → cik (many → 1)
fact references entity via entity_id → cik (many → 1)
fact references filing via accession_id (many → 1)
fact references standard_concept via standard_concept (many → 1)
standard_concept references taxonomy_guide via standard_concept (many → 1)
ratio references entity via entity_id → cik (many → 1)
references references entity via cik (many → 1)
index_membership references references via cik = cik (many → 1)

entity

19,888 rows

Every SEC-registered entity that has filed XBRL financial statements. Includes active, delisted, bankrupt, and acquired companies.

Columns (14): cik, name, sic_code, sic_description, sector, industry, state_of_incorporation, fiscal_year_end, business_address, mailing_address, former_names, is_foreign, flags, category

security

9,099 rows

Ticker symbols and exchange listings (SCD Type 2 with valid_from / valid_to). One entity may have multiple securities — filter is_primary_ticker = TRUE for one row per CIK.

Columns (14): id, entity_id, symbol, exchange, mic, valid_from, valid_to, is_active, is_primary_ticker, figi, composite_figi, share_class_figi, security_type, market_sector
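To illustrate how the SCD Type 2 columns are typically used, here is a small sketch over invented rows. The symbols and entity_id are made up, and two details are assumptions rather than documented behavior: an open-ended row has valid_to set to None, and valid_to is treated as exclusive.

```python
from datetime import date

# Hypothetical security rows for one entity (all values invented).
securities = [
    {"entity_id": 1001, "symbol": "OLDT", "is_primary_ticker": False,
     "valid_from": date(2012, 1, 1), "valid_to": date(2019, 7, 1)},
    {"entity_id": 1001, "symbol": "NEWT", "is_primary_ticker": True,
     "valid_from": date(2019, 7, 1), "valid_to": None},
    {"entity_id": 1001, "symbol": "NEWT.B", "is_primary_ticker": False,
     "valid_from": date(2019, 7, 1), "valid_to": None},
]


def symbols_as_of(rows, entity_id, as_of):
    """Symbols whose validity window covers as_of (valid_to assumed exclusive)."""
    return sorted(
        r["symbol"] for r in rows
        if r["entity_id"] == entity_id
        and r["valid_from"] <= as_of
        and (r["valid_to"] is None or as_of < r["valid_to"])
    )


def primary_symbol(rows, entity_id):
    """The single is_primary_ticker row for an entity, or None if absent."""
    matches = [r["symbol"] for r in rows
               if r["entity_id"] == entity_id and r["is_primary_ticker"]]
    return matches[0] if matches else None


print(symbols_as_of(securities, 1001, date(2015, 3, 1)))  # ['OLDT']
print(symbols_as_of(securities, 1001, date(2020, 3, 1)))  # ['NEWT', 'NEWT.B']
print(primary_symbol(securities, 1001))                   # NEWT
```

The as-of lookup matters for backtests that map historical prices to filings, while the primary-ticker filter is the simpler choice when one row per company is all that is needed.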

filing

2.35M rows

Every XBRL filing processed: 10-K, 10-Q, 8-K, 20-F, and their amendments since 1993. accepted_at is the SEC acceptance timestamp; superseded_by chains the amendment lineage.

Columns (18): accession_id, entity_id, form_type, core_type, filing_date, report_date, accepted_at, is_amendment, amendment_no, superseded_by, is_xbrl, is_inline_xbrl, is_xbrl_numeric, is_audited, primary_document, size, file_number, act

fact

111M+ rows

Core Table

The core table. Every standardized financial fact extracted from every filing. Supports point-in-time queries via accepted_at, quarterly derivation via derived_quarterly_value, and Bloomberg Option-C view (value_current vs. value_as_filed) for restated values.

Columns (19): fact_id, entity_id, accession_id, concept, standard_concept, numeric_value, derived_quarterly_value, value_current, value_as_filed, first_filed_at, restated, unit, reporting_currency, fiscal_year, fiscal_period, period_end, period_span_days, is_cumulative, accepted_at

valuation

42,100 rows

Pre-computed intrinsic value estimates per entity. Multiple model_type rows coexist per (entity_id, valuation_date): 'dcf', 'dcf_fcf', 'ddm'. Recomputed each pipeline run; not point-in-time.

Columns (10): entity_id, valuation_date, model_type, per_share_value, current_price, margin_of_safety, valuation_label, discount_rate, growth_rate, terminal_rate

taxonomy_guide

292 rows

Definitions for every standard_concept used in the fact table — human_name, definition, unit_type, balance_type, and source_reference (US-GAAP taxonomy reference).

Columns (6): standard_concept, human_name, definition, unit_type, balance_type, source_reference

index_membership

6,485 rows

Historical index constituents (SP500, RUSSELL1000, RUSSELL2000, RUSSELL3000). Keyed on cik (since migration 0015). [) interval semantics. JOIN references on cik = cik to attach company metadata.

Columns (10): cik, index_name, effective_date, removal_date, announcement_date, removal_announcement_date, removal_reason, successor_cik, source, confidence

references

9,099 rows

Start Here

Derived flat join of entity + security. One row per security. Eliminates 2-table joins for company metadata. The starting point for any cross-company analysis. For index membership (current or historical), JOIN with index_membership on cik = cik.

Columns (14): cik, symbol, name, sector, industry, exchange, mic, is_active, valid_from, valid_to, sic_code, entity_type, figi, composite_figi

ratio

5.3M rows

Pipeline-computed financial ratios per entity per fiscal period. Append-on-restatement: accepted_at is the PIT vintage (max accepted_at of the input facts) — filter accepted_at <= as_of then take the latest vintage to avoid look-ahead bias. Filter by category for grouped screens.

Columns (13): entity_id, ratio_name, category, value, confidence_score, unit, period_end, fiscal_year, fiscal_period, is_ttm, accepted_at, computed_at, ingested_at
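The "filter accepted_at <= as_of, then take the latest vintage" rule for ratio can be sketched as follows. The rows, the ratio name gross_margin, and the grouping key are illustrative assumptions; accepted_at is shown as a plain date for brevity although the dataset describes it as a timestamp.

```python
from datetime import date


def latest_vintage(rows, as_of):
    """Keep, per (entity, ratio, period), the newest row already public on as_of."""
    chosen = {}
    for row in rows:
        if row["accepted_at"] > as_of:
            continue  # not yet known on the as-of date: skipping avoids look-ahead
        key = (row["entity_id"], row["ratio_name"], row["period_end"])
        if key not in chosen or row["accepted_at"] > chosen[key]["accepted_at"]:
            chosen[key] = row
    return chosen


# Hypothetical ratio rows: an original value and a later restated vintage.
ratios = [
    {"entity_id": 1001, "ratio_name": "gross_margin", "period_end": date(2019, 12, 31),
     "value": 0.40, "accepted_at": date(2020, 2, 15)},
    {"entity_id": 1001, "ratio_name": "gross_margin", "period_end": date(2019, 12, 31),
     "value": 0.38, "accepted_at": date(2020, 8, 1)},
]

key = (1001, "gross_margin", date(2019, 12, 31))
print(latest_vintage(ratios, date(2020, 3, 1))[key]["value"])  # 0.4
print(latest_vintage(ratios, date(2020, 9, 1))[key]["value"])  # 0.38
```

A backtest dated March 2020 sees only the original 0.40; the restated 0.38 appears once the as-of date passes its acceptance. If TTM and single-period rows share a period_end, is_ttm would likely need to be part of the key as well.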

Point-in-Time Accuracy

Know exactly what was known, and when

Every fact in the dataset carries an accepted_at timestamp: the exact date and time the SEC accepted the filing that introduced that fact.

This is critical for backtesting. Without point-in-time data, your 2015 backtest unknowingly uses data that was only available in 2016 (look-ahead bias). The result: inflated returns that evaporate in live trading.

Concrete Example: AAPL Q2 FY2020

April 30, 2020: Apple files its 10-Q for fiscal Q2 2020 (the quarter ended March 2020). Revenue: $58.3B. accepted_at = 2020-04-30.

June 15, 2020: Apple files 10-Q/A with restated figures. Revised revenue: $58.3B (confirmed). accepted_at = 2020-06-15.

Your backtest on May 1, 2020: Filtering by accepted_at <= '2020-05-01' returns only the original filing. The amendment is invisible — exactly as it would have been in real time.

Point-in-time query with the Python SDK

Fetch AAPL revenue as it was known on a specific date

point_in_time.py

```python
from valuein_sdk import ValueinClient, ValueinError

# period_end is read from fact (fa): the filing columns listed in the
# schema include report_date but no period_end.
sql = """
SELECT
  r.symbol,
  f.filing_date,
  fa.period_end,
  f.accepted_at,
  fa.numeric_value / 1e9 AS revenue_billions,
  f.form_type
FROM references r
JOIN filing f
  ON f.entity_id = r.cik
JOIN fact fa
  ON fa.accession_id = f.accession_id
WHERE r.symbol = 'AAPL'
  AND fa.standard_concept = 'Revenues'
  AND f.form_type IN ('10-Q', '10-Q/A')
  AND f.accepted_at <= '2020-05-01'  -- only filings known by the as-of date
ORDER BY fa.period_end DESC, f.accepted_at DESC
LIMIT 5;
"""

try:
    with ValueinClient() as client:
        df = client.run_query(sql)
        print(df)
except ValueinError as e:
    print(f"Valuein error: {e}")
```

Why this matters

Eliminates look-ahead bias in walk-forward backtests
Supports event studies around filing dates
Amendment tracking shows original vs. restated values
Reproduces any historical research state exactly

Zero Survivorship Bias

Every company that ever filed. Including the ones that failed.

Most financial datasets only include companies that are still active today. That means your backtest never considers the Enrons, the Lehmans, the RadioShacks — companies that went bankrupt and dragged portfolios down.

The result? Inflated historical returns that don't replicate in live trading. Academic research estimates survivorship bias overstates annual returns by 1-2 percentage points.

Valuein includes every entity that ever filed XBRL financial statements with the SEC. Delisted, bankrupt, acquired, merged — they are all here, with their complete filing history up to the date they ceased operations.

Point-in-time and survivorship-free are the headline guarantees; the trust & security overview covers provenance, zero-retention, and reliability, and the methodology documents exactly how each is constructed.

Universe Composition

Active Entities

~8,000

Delisted / Acquired / Bankrupt

~8,000

~50% of the 19,000-entity universe consists of companies that are no longer actively trading. Excluding them fundamentally distorts any historical analysis.

Notable companies in the full universe

Enron Corp

Lehman Brothers

RadioShack

Toys R Us

Blockbuster

WorldCom

Bear Stearns

Washington Mutual

Kodak

Sears Holdings

All with complete financial statements through their final SEC filing.

Concept Standardization

11,966 XBRL tags. 292 standardized concepts.

The XBRL taxonomy is sprawling. Apple reports revenue as RevenueFromContractWithCustomerExcludingAssessedTax. Older filings use SalesRevenueNet. Some companies create custom extensions entirely. Cross-company analysis becomes impossible without standardization.

Valuein maps every raw tag to a canonical standard_concept and keeps the original tag next to it (the concept column on the fact table); taxonomy_guide defines each standardized concept. No black box. Full provenance.

Raw XBRL Tag		Standardized Concept
`us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax`		`Revenues`
`us-gaap:SalesRevenueNet`		`Revenues`
`us-gaap:Revenues`		`Revenues`
`us-gaap:SalesRevenueGoodsNet`		`Revenues`
`us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax`		`Revenues`
`custom:TotalNetRevenues`		`Revenues`

This is just one concept. Revenue alone has 80+ raw XBRL synonyms across the filing universe. The taxonomy_guide table documents every mapping — browse it in the Data Catalog.
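Conceptually, the mapping is a many-to-one lookup with a fallback. The sketch below uses only the six tags from the table above and the Other fallback mentioned in the pipeline description; the dictionary and the standardize helper are illustrative, not the production mapping.

```python
# Raw XBRL tag -> standard_concept, taken from the table above.
TAG_TO_CONCEPT = {
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "Revenues",
    "us-gaap:SalesRevenueNet": "Revenues",
    "us-gaap:Revenues": "Revenues",
    "us-gaap:SalesRevenueGoodsNet": "Revenues",
    "us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax": "Revenues",
    "custom:TotalNetRevenues": "Revenues",
}


def standardize(raw_tag: str) -> str:
    """Resolve a raw tag to its canonical concept; unmapped tags become 'Other'."""
    return TAG_TO_CONCEPT.get(raw_tag, "Other")


print(standardize("us-gaap:SalesRevenueNet"))      # Revenues
print(standardize("custom:SomeUnmappedExtension"))  # Other
```

With every synonym resolved to one name, a cross-company revenue screen can filter on a single standard_concept value instead of enumerating tags.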

Amendment Tracking

Original and restated values, side by side

When a company files a 10-K/A or 10-Q/A, it is restating previously reported financial data. Most datasets silently overwrite the original values. Valuein keeps both.

The original filing and the amendment each have their own accepted_at timestamp. The is_amendment flag on the filing table distinguishes them. You can query original-only, amended-only, or compare both.

10-K/A: Amended annual report — restated annual financials
10-Q/A: Amended quarterly report — restated quarterly financials
Both original and restated values stored with distinct accepted_at
is_amendment flag on the filing table for easy filtering

Query both original and restated values

Compare a company's original 10-K with its amendment

amendments.py

```python
from valuein_sdk import ValueinClient, ValueinError

# 'XYZ' is a placeholder symbol. period_end is filtered on fact (fa):
# the filing columns listed in the schema include report_date but no period_end.
sql = """
SELECT
  r.symbol,
  f.form_type,
  f.filing_date,
  f.accepted_at,
  f.is_amendment,
  fa.standard_concept,
  fa.numeric_value / 1e9 AS value_billions
FROM references r
JOIN filing f
  ON f.entity_id = r.cik
JOIN fact fa
  ON fa.accession_id = f.accession_id
WHERE r.symbol = 'XYZ'
  AND fa.standard_concept = 'Revenues'
  AND f.form_type IN ('10-K', '10-K/A')
  AND fa.period_end = '2023-12-31'
ORDER BY f.accepted_at ASC;  -- original first, then amendments
"""

try:
    with ValueinClient() as client:
        df = client.run_query(sql)
        print(df)
except ValueinError as e:
    print(f"Valuein error: {e}")
```

Coverage

Coverage at a glance

Filing Types Covered

10-K: Annual report
10-Q: Quarterly report
8-K: Current report (material events)
20-F: Annual report (foreign private issuers)
10-K/A: Annual report amendment
10-Q/A: Quarterly report amendment

Date Range: 1993 to present
Update Frequency: Quarterly (with continuous amendments)
Standardized Concepts: 292
Raw XBRL Tags Mapped: 11,966 → 292
Delivery Format: Parquet with ZSTD compression

Tier Breakdown

Tier	Data scope	Rate limit	Price
S&P500	S&P500 · 500+ tickers · 1993–present	60 req/min · 1,000 req/hr	Free
Pro (Popular)	Active + delisted US universe · 19,000+ entities · 15-year history (2011→present)	100 req/min · 3,000 req/hr	$49/mo
Institutional	Two datasets: fundamentals (111M+ facts, 19,000+ entities, 1993–present) + smart-money (~78M rows · 6 tables: Forms 3/4/5/144 + 13F/13D/13G)	300 req/min · 10,000 req/hr	$499/mo

Start querying 111M+ facts today

Register free to access the full S&P500 universe — no credit card required. Pro full-universe + 15-year history at $49/mo. Institutional with smart-money data (insider + institutional ownership) + webhooks + redistribution at $499/mo.

Also available via direct Parquet download.