MCP Server live — AI agents can now query 105M+ SEC facts. Connect your agent →
ValueinValuein
Schema

Parquet Schema Reference

Eight ZSTD-compressed Parquet tables. 105M+ facts, 12,000+ tickers, 1994–present. All tables use consistent join keys — cik for companies, accession_id for filings.

8
Tables
105M+
Facts
12,000+
Tickers
ZSTD
Compression

Entity-Relationship Diagram

All tables link via two universal keys: cik (company) and accession_id (filing). The references table is a derived flat join — use it as your query starting point to avoid manual joins.

┌─────────────┐     ┌────────────────┐     ┌──────────────────┐
│   entity    │     │    security    │     │ index_membership │
│─────────────│     │────────────────│     │──────────────────│
│ cik (PK)   │──┐  │ id (PK)        │──┐  │ id (PK)          │
│ name        │  └──│ entity_id (FK) │  └──│ security_id (FK) │
│ sector      │     │ symbol         │     │ index_name       │
│ sic_code    │     │ exchange       │     │ start_date       │
│ status      │     │ is_active      │     │ end_date         │
└─────────────┘     └────────────────┘     └──────────────────┘
       │
       │            ┌─────────────────┐
       └────────────│     filing      │
                    │─────────────────│
                    │ accession_id(PK)│
                    │ entity_id (FK)  │
                    │ form_type       │
                    │ filing_date     │
                    │ accepted_at     │
                    └─────────────────┘
                            │
                            └────────────┌──────────────────┐
                                         │       fact       │
                                         │──────────────────│
                                         │ entity_id (FK)   │
                                         │ accession_id(FK) │
                                         │ standard_concept │
                                         │ numeric_value    │
                                         │ knowledge_at     │
                                         └──────────────────┘

 ┌────────────────────────────────────────────────────────┐
 │  references  (derived — start every cross-co query here)│
 │────────────────────────────────────────────────────────│
 │  cik, name, sector, industry, sic_code, status        │
 │  symbol, exchange, is_active                           │
 │  is_sp500  ← replaces 3-table join                    │
 └────────────────────────────────────────────────────────┘

Tables

entity

~12,000+ rowsPK: cik

One row per SEC-registered company. Contains company profile data: name, sector, SIC code, location, CEO, description, and status. The root table — all other tables link back via entity_id = entity.cik.

Tip: Always filter status = 'ACTIVE' unless you specifically need delisted or inactive companies.
23 fieldsBrowse in catalog

security

~18,000+ rowsPK: id

Exchange listings with SCD Type 2 history. Multiple rows per company — one per ticker/exchange combination. FIGI identifiers included for cross-system mapping.

Tip: Filter is_active = TRUE for current tickers. Use valid_from / valid_to for historical ticker reconstruction.
17 fieldsBrowse in catalog

filing

~800,000+ rowsPK: accession_id

SEC EDGAR filing index. One row per accepted filing. The bridge between companies and their financial facts. accepted_at is the source for fact.knowledge_at.

Tip: Use form_type = '10-K' for annual data, '10-Q' for quarterly. Watch is_amendment for restated filings.
11 fieldsBrowse in catalog

fact

~105M+ rowsPK: entity_id + accession_id + standard_concept

The core financial data table. 105M+ XBRL-sourced data points normalized to ~150 standard_concept labels. Each row is a single financial value for a specific company, period, and concept.

Tip: Start with standard_concept (not raw concept) for cross-company queries. Use knowledge_at for PIT filtering. Use COALESCE(derived_quarterly_value, numeric_value) for cash flow metrics.
30 fieldsBrowse in catalog

valuation

~50,000+ rowsPK: id

Pipeline-computed DCF and DDM intrinsic values. One row per company per valuation run. Includes WACC, growth rates, and margin of safety vs. market price.

Tip: Always check data_quality field. model_type = 'dcf' is the primary model. DDM only applies to dividend-paying companies.
26 fieldsBrowse in catalog

taxonomy_guide

~~150 rowsPK: standard_concept

Lookup table for standard_concept labels used in the fact table. Maps canonical names to human-readable labels, definitions, and GAAP taxonomy references.

Tip: Join fact on standard_concept to get human_name and definition for any concept.
13 fieldsBrowse in catalog

index_membership

~50,000+ rowsPK: id

Historical index constituent records. One row per security/index/membership-period. Essential for point-in-time universe construction — prevents survivorship bias.

Tip: Filter end_date IS NULL for current members. Use start_date / end_date for historical constituent reconstruction.
5 fieldsBrowse in catalog

references

Start here
~18,000+ rowsPK: symbol + cik

Derived flat join of entity + security + index_membership. One row per security. Eliminates the need for 3-table joins in most queries. Always start here.

Tip: Use is_sp500 = TRUE to filter to current S&P 500. is_active = TRUE for current tickers. This is the recommended entry point for all cross-company queries.
18 fieldsBrowse in catalog

Key Columns Guide

The most important columns across tables. Master these and you can write any query.

ColumnTableTypeNullableDescription
knowledge_at
PIT
factTIMESTAMPTZNoPIT timestamp — when this fact became publicly known via SEC EDGAR acceptance. Use knowledge_at <= trade_date for look-ahead-free backtests.
standard_concept
factVARCHARYesCanonical normalized concept name (e.g. Revenues, NetIncomeLoss). Use this — not raw concept — for cross-company queries.
numeric_value
factFLOATYesParsed financial value. NULL for non-numeric XBRL facts. Always use ABS() on capex; divide by NULLIF(denom, 0) on ratios.
derived_quarterly_value
factFLOATYesQ2/Q3 10-Q cash flows adjusted from YTD to single-quarter. Use COALESCE(derived_quarterly_value, numeric_value) for all cash flow metrics.
is_sp500
referencesBOOLEANNoTRUE if this security is currently a member of the S&P 500. Fastest filter for S&P 500 screens — no join to index_membership needed.
is_active
references / securityBOOLEANNoTRUE if this ticker is currently active (valid_to IS NULL). Filter to avoid duplicate rows from ticker changes.
accession_id
filingVARCHARNoSEC EDGAR accession number — the unique identifier for a filing. Format: XXXXXXXXXX-YY-ZZZZZZ. The join key between filing and fact.
cik
entity / referencesVARCHARNoSEC Central Index Key — 10-digit company identifier. The universal join key across all tables. entity.cik = security.entity_id = filing.entity_id = fact.entity_id.

All Fields by Table

Quick-reference field list. For full descriptions and SEC XBRL tag mappings, use the Data Catalog.

entity

23 fields
FieldTypeDescription
cikVARCHARSEC Central Index Key — 10-digit unique company identifier
nameTEXTLegal registered company name
leiVARCHARLegal Entity Identifier (ISO 17442 20-character code)
industryTEXTIndustry classification description
sectorVARCHARBroad market sector (e.g. Technology, Healthcare, Financials)
sic_codeVARCHARStandard Industrial Classification code (4-digit)
fiscal_year_endVARCHARFiscal year end as MMDD (e.g. 1231 = December 31)
locationTEXTCompany headquarters location (city, state, country)
websiteTEXTCompany primary website URL
ceoTEXTChief Executive Officer name at most recent filing
founded_yearSMALLINTYear the company was founded
descriptionTEXTCompany business description extracted from SEC filings
statusVARCHAREntity status: ACTIVE, INACTIVE, or DELISTED. Filter status != 'ACTIVE' to include all non-active companies
entity_typeTEXTSEC-defined filer category (e.g. Non-accelerated Filer, Large Accelerated Filer)
einTEXTEmployer Identification Number (IRS tax ID)
owner_orgTEXTParent or controlling organization name
investor_websiteTEXTInvestor relations website URL
categoryTEXTSEC registrant category
state_of_incorporationTEXTUS state or country of incorporation
phoneTEXTCompany contact phone number
addressTEXTPrincipal registered business address
last_seen_atDATEDate this entity last appeared in any SEC filing
ingested_atTIMESTAMPTZTimestamp when Valuein first ingested this entity record

security

17 fields
FieldTypeDescription
idINTEGERSurrogate primary key for the security record
entity_idVARCHARForeign key to entity.cik — the issuing company
symbolVARCHARExchange ticker symbol
exchangeTEXTStock exchange name (e.g. NASDAQ, NYSE)
micVARCHARMarket Identification Code (ISO 10383)
valid_fromDATESCD Type 2 start date — when this ticker became active on this exchange
valid_toDATESCD Type 2 end date — when this ticker was retired. NULL means currently active
is_activeBOOLEANDerived: TRUE when valid_to IS NULL. Use to filter current tickers only
figiVARCHARFinancial Instrument Global Identifier at share-class level (OpenFIGI)
composite_figiVARCHARComposite FIGI at the exchange level
share_class_figiVARCHARShare class FIGI
security_typeVARCHARInstrument type (e.g. Common Stock, ADR, ETF)
market_sectorVARCHARFIGI market sector classification
figi_tickerVARCHARTicker symbol as recorded in the OpenFIGI registry
figi_nameTEXTSecurity name from the OpenFIGI registry
figi_exch_codeVARCHARExchange code from the OpenFIGI registry
ingested_atTIMESTAMPTZTimestamp when Valuein first ingested this security record

filing

11 fields
FieldTypeDescription
accession_idVARCHARSEC EDGAR accession number — unique filing identifier (format: XXXXXXXXXX-YY-ZZZZZZ)
entity_idVARCHARForeign key to entity.cik — the filing company
filing_dateDATEDate the SEC received and accepted the filing. Use this for PIT backtest filters — not report_date
form_typeVARCHARFiling type: 10-K (annual), 10-Q (quarterly), 20-F (foreign annual), 8-K (current report), etc.
filing_urlTEXTFull URL to the filing document on SEC EDGAR
is_amendmentBOOLEANTRUE if this is an amended filing (e.g. 10-K/A, 10-Q/A)
amendment_noSMALLINTAmendment sequence number — 0 for the original filing, 1 for the first amendment, etc.
report_dateDATEFiscal period end date covered by this filing. Using this as a PIT filter introduces look-ahead bias — use filing_date instead
itemsTEXT[]Array of 8-K item numbers reported (e.g. ['1.01', '9.01']). NULL for non-8-K filings
accepted_atTIMESTAMPTZExact timestamp the SEC accepted the filing (millisecond precision). Source for fact.knowledge_at
ingested_atTIMESTAMPTZTimestamp when Valuein ingested this filing record

fact

30 fields
FieldTypeDescription
entity_idVARCHARForeign key to entity.cik
accession_idVARCHARForeign key to filing.accession_id
conceptTEXTRaw XBRL tag as filed by the company (e.g. us-gaap:Revenues). Use standard_concept for cross-company comparisons
standard_conceptVARCHARCanonical normalized concept label (e.g. TotalRevenue). Maps 10K+ raw XBRL tags to ~150 standard concepts for cross-company queries
prioritySMALLINTStandardization priority — lower value = higher confidence in the concept mapping
labelTEXTHuman-readable label from the company's XBRL filing context
valueTEXTRaw string value as filed in the XBRL document
numeric_valueFLOATParsed numeric value. NULL for non-numeric or string facts
unitVARCHARUnit of measure: USD, shares, pure (dimensionless ratio), etc.
period_startDATEReporting period start date. NULL for instant/point-in-time facts (e.g. balance sheet items)
period_endDATEReporting period end date. Table is range-partitioned on this column by year
fiscal_yearSMALLINTFiscal year number (e.g. 2024)
fiscal_periodVARCHARFiscal period label: FY (annual), Q1, Q2, Q3, Q4
frameVARCHARXBRL frame identifier linking this fact to a specific reporting period context
statement_typeVARCHARFinancial statement category: income, balance_sheet, cash_flow, dei, notes
accuracy_scoreREALStandardization confidence: 1.0 = manual/exact, 0.85 = taxonomy match, 0.55 = keyword fallback, 0.0 = unmapped. Use >= 0.70 for analytics
confidence_scoreFLOATComposite data quality confidence metric (0.0–1.0)
is_negativeBOOLEANTRUE if the originally reported value was negative
is_auditedBOOLEANTRUE if this fact comes from an audited annual filing (10-K or 20-F)
is_estimatedBOOLEANTRUE if the value is estimated rather than directly reported
data_qualityTEXTData quality flag: reported, estimated, or provisional
knowledge_atTIMESTAMPTZPoint-in-time timestamp — when this fact became publicly known via SEC EDGAR. Filter knowledge_at <= trade_date for look-ahead-free backtests
ingested_atTIMESTAMPTZTimestamp when Valuein ingested this fact
accounting_standardVARCHARAccounting framework: US-GAAP or IFRS
reporting_currencyVARCHARISO 4217 currency code for the reported value (e.g. USD, EUR, GBP)
period_span_daysINTEGERNumber of calendar days spanned by this reporting period
is_cumulativeBOOLEANTRUE if the value is cumulative year-to-date rather than for the period in isolation (common in 10-Q cash flow statements)
derived_quarterly_valueFLOATQuarterly value derived by subtracting prior YTD periods. Use COALESCE(derived_quarterly_value, numeric_value) for single-quarter cash flow metrics
derivation_typeVARCHARMethod used to derive the quarterly value: ytd_subtraction, direct, or null if not derived
derivation_uncertainBOOLEANTRUE if the quarterly derivation has reduced confidence due to fiscal period ambiguity

valuation

26 fields
FieldTypeDescription
idBIGINTSurrogate primary key for the valuation record
entity_idVARCHARForeign key to entity.cik — the company being valued
security_idINTEGERForeign key to security.id — the specific security valued (NULL if company-level only)
valuation_dateDATEDate the valuation was computed, based on the most recent available financials at that point in time
model_typeVARCHARValuation model used: dcf (Discounted Cash Flow) or ddm (Dividend Discount Model)
model_versionVARCHARVersion string of the valuation model algorithm (e.g. 1.0)
intrinsic_valueNUMERICTotal estimated intrinsic equity value in USD
per_share_valueNUMERICIntrinsic value per diluted share in USD
margin_of_safetyNUMERICPercentage discount of current market price to intrinsic value. Positive = undervalued, negative = overvalued
valuation_labelVARCHARSummary label derived from margin of safety: undervalued, fairvalued, or overvalued
base_earningsNUMERICBase earnings or free cash flow used as the primary model input (USD)
growth_rateNUMERICNear-term earnings or FCF growth rate assumption applied in the high-growth stage
terminal_rateNUMERICPerpetuity terminal growth rate assumption for the Gordon Growth terminal value calculation
discount_rateNUMERICDiscount rate applied: WACC for DCF models, required return for DDM models
growth_yearsINTEGERNumber of years modelled in the explicit high-growth stage before applying terminal value
terminal_yearsINTEGERNumber of years used to estimate the terminal value horizon
shares_outstandingBIGINTDiluted shares outstanding used to convert total equity value to a per-share figure
current_priceNUMERICMarket price per share at the valuation date (USD) — used to compute margin of safety
base_dividendNUMERICAnnual dividend per share used as the DDM base input (USD)
dividend_growth_rateNUMERICExpected long-term dividend growth rate used in the DDM Gordon Growth Model
payout_ratioNUMERICDividend payout ratio (dividends ÷ earnings) used in DDM calibration
data_qualityVARCHARQuality of the inputs used: reported (direct from filings), estimated (derived), or provisional (incomplete data)
source_periodVARCHARFiscal period used as the model input (e.g. FY2024, TTM-2024Q3)
assumptionsJSONBJSON overflow field for additional model parameters and sensitivity analysis inputs
created_atTIMESTAMPTZTimestamp when this valuation record was first computed
updated_atTIMESTAMPTZTimestamp when this valuation record was last recomputed (ON CONFLICT DO UPDATE)

taxonomy_guide

13 fields
FieldTypeDescription
standard_conceptTEXTCanonical concept name (primary key) — matches fact.standard_concept for joins
human_nameTEXTHuman-readable display name for the concept (e.g. Total Revenue)
definitionTEXTPlain-English definition of what this financial concept measures
unit_typeVARCHARExpected unit of measure: monetary, shares, ratio, pure, etc.
balance_typeVARCHARAccounting balance type: debit or credit
source_referenceTEXTUS GAAP taxonomy reference identifier for this concept
levelSMALLINTHierarchy level in the taxonomy structure (lower = higher in the hierarchy)
statement_typeVARCHARFinancial statement this concept belongs to: income, balance_sheet, cash_flow, dei, notes
display_orderSMALLINTPreferred display order within its statement type for financial statement presentation
is_activeBOOLEANTRUE if this concept is actively used in the current GAAP taxonomy
created_atTIMESTAMPTZTimestamp when this concept was added to the taxonomy guide
updated_atTIMESTAMPTZTimestamp when this concept definition was last updated
gaap_ifrs_comparableBOOLEANTRUE if this concept has a directly comparable IFRS equivalent

index_membership

5 fields
FieldTypeDescription
idINTEGERSurrogate primary key
security_idINTEGERForeign key to security.id — the index constituent
index_nameVARCHARIndex identifier (e.g. SP500, RUSSELL2000, MSCI_US)
start_dateDATEDate the security was added to the index
end_dateDATEDate the security was removed from the index. NULL means it is currently a member

references

18 fields
FieldTypeDescription
cikVARCHARSEC Central Index Key — 10-digit unique company identifier (from entity)
nameTEXTLegal registered company name (from entity)
sectorTEXTBroad market sector (from entity)
industryTEXTIndustry classification description (from entity)
sic_codeVARCHARStandard Industrial Classification code 4-digit (from entity)
statusVARCHAREntity status: ACTIVE, INACTIVE, or DELISTED (from entity)
entity_typeVARCHARSEC-defined filer category (from entity)
security_idINTEGERSurrogate PK of the security record (from security)
symbolVARCHARExchange ticker symbol (from security)
exchangeVARCHARStock exchange name e.g. NASDAQ, NYSE (from security)
micVARCHARMarket Identification Code ISO 10383 (from security)
valid_fromDATESCD Type 2 start date — when this ticker became active (from security)
valid_toDATESCD Type 2 end date — NULL means currently active (from security)
is_activeBOOLEANTRUE when valid_to IS NULL — use to filter current tickers only (from security)
figiVARCHARFinancial Instrument Global Identifier share-class level (from security)
composite_figiVARCHARComposite FIGI at the exchange level (from security)
share_class_figiVARCHARShare class FIGI (from security)
is_sp500BOOLEANTRUE if the security is currently an S&P500 constituent (from index_membership)

DuckDB Usage

All Parquet files load natively with DuckDB's read_parquet(). DuckDB handles ZSTD decompression automatically and supports lazy projection — only the columns you SELECT are read from disk.

-- Load a table directly from disk
SELECT *
FROM read_parquet('fact.parquet')
WHERE standard_concept = 'Revenues'
  AND knowledge_at <= '2023-01-01'
LIMIT 10;

-- Query directly from the API URL (presigned or streamed)
SELECT r.symbol, fa.numeric_value / 1e9 AS revenue_billions
FROM read_parquet('https://data.valuein.biz/v1/sp500/references') r
JOIN read_parquet('https://data.valuein.biz/v1/sp500/fact') fa
  ON fa.entity_id = r.cik
WHERE r.is_sp500 = TRUE
  AND fa.standard_concept = 'Revenues'
  AND fa.knowledge_at <= '2024-01-01'
ORDER BY revenue_billions DESC
LIMIT 20;

ZSTD compression

All files are ZSTD-compressed. Typical compression ratio is 6–10x vs uncompressed. DuckDB decompresses transparently.

Column pruning

SELECT only the columns you need — DuckDB reads only those column chunks from the Parquet file, not the full table.

Predicate pushdown

DuckDB pushes WHERE clauses into Parquet row group statistics, skipping groups that cannot contain matching rows.

Explore the full field catalog

Search and filter all fields with SEC XBRL tag mappings, types, and example values in the interactive Data Catalog.