Workspace beta is live — BYO-LLM chat wired to 57 SEC tools. Try it free →
ValueinValuein
Schema

Parquet Schema Reference

Fifteen ZSTD-compressed Parquet tables — a 9-table fundamentals dataset plus the 6-table smart-money dataset (Institutional). 111M+ facts, ~78M smart-money rows, 19,000+ entities, 1993–present. All tables use consistent join keys — cik for companies, accession_id for filings.

15
Tables
111M+
Facts
78M+
Smart-Money Rows
ZSTD
Compression

Entity-Relationship Diagram

All tables link via two universal keys: cik (company) and accession_id (filing). The references table is a derived flat join — use it as your query starting point to avoid manual joins.

schema.erd
┌─────────────┐     ┌────────────────┐     ┌──────────────────────┐
│   entity    │     │    security    │     │   index_membership   │
│─────────────│     │────────────────│     │──────────────────────│
│ cik (PK)    │──┐  │ id (PK)        │  ┌──│ cik (FK → entity)    │
│ name        │  └──│ entity_id (FK) │  │  │ index_name           │
│ sector      │     │ symbol         │  │  │ effective_date       │
│ sic_code    │     │ exchange       │  │  │ removal_date         │
│ status      │     │ is_active      │  │  │ removal_reason       │
└─────────────┘     └────────────────┘  │  │ successor_cik        │
       │                                │  └──────────────────────┘
       │            ┌─────────────────┐ │           ▲
       └────────────│     filing      │ │           │
                    │─────────────────│ │           │
                    │ accession_id(PK)│ │           │
                    │ entity_id (FK)  │ │           │
                    │ form_type       │ │           │
                    │ filing_date     │ │           │
                    │ accepted_at     │ │           │
                    └─────────────────┘ │           │
                            │           │           │
                            └────────────┌──────────────────────┐
                                         │         fact         │
                                         │──────────────────────│
                                         │ fact_id (deterministic)│
                                         │ entity_id (FK)       │
                                         │ accession_id (FK)    │
                                         │ standard_concept     │
                                         │ numeric_value        │
                                         │ value_current        │
                                         │ value_as_filed       │
                                         │ accepted_at (PIT)    │
                                         └──────────────────────┘

 ┌──────────────────────────────────────────────────────────────┐
 │  references  (derived — start every cross-co query here)     │
 │──────────────────────────────────────────────────────────────│
 │  cik, name, sector, industry, sic_code, status, entity_type  │
 │  security_id, symbol, exchange, mic, is_active, valid_*      │
 │  figi, composite_figi, share_class_figi                      │
 │                                                              │
 │  Index membership? JOIN index_membership ON cik = cik        │
 │  (same column name both sides — migration 0015).             │
 └──────────────────────────────────────────────────────────────┘

 ┌──────────────────────────────────────────────────────────────┐
 │  Derived analytic table (recomputed each pipeline run):      │
 │  ratio                                                       │
 │  Keys on entity_id (= references.cik) + period_end.          │
 └──────────────────────────────────────────────────────────────┘

 ╔══════════════════════════════════════════════════════════════╗
 ║  SMART-MONEY DATASET  (Institutional tier — ~78M rows)        ║
 ╠══════════════════════════════════════════════════════════════╣
 ║  insider_party ─┬─ insider_filing ─┬─ insider_transaction     ║
 ║  (who)          │  (the disclosure)└─ insider_ownership (13D/G)║
 ║                 │                                             ║
 ║  institutional_filing ── institutional_holding (13F-HR)       ║
 ║                                                              ║
 ║  Issuer link (soft): insider_transaction.entity_id,           ║
 ║  *_holding/_ownership.subject_entity_id  →  entity.cik        ║
 ╚══════════════════════════════════════════════════════════════╝

Tables

entity

~19,888 rowsPK: cik

One row per SEC-registered company. Contains company profile data: name, sector, SIC code, location, CEO, description, and status. The root table — all other tables link back via entity_id = entity.cik.

Tip: Always filter status = 'ACTIVE' unless you specifically need delisted or inactive companies.
32 fieldsBrowse in catalog

security

~9,099 rowsPK: id

Exchange listings with SCD Type 2 history. Multiple rows per company — one per ticker/exchange combination. FIGI identifiers included for cross-system mapping.

Tip: Filter is_active = TRUE for current tickers. Use valid_from / valid_to for historical ticker reconstruction.
18 fieldsBrowse in catalog

filing

~2.35M rowsPK: accession_id

SEC EDGAR financial-statement filing index — 10-K, 10-Q, 8-K, 20-F, 40-F and amendments. One row per accepted filing. The bridge between companies and their financial facts. Every fact row carries this filing's accepted_at forward as its own PIT anchor.

Tip: Use form_type = '10-K' for annual data, '10-Q' for quarterly. Watch is_amendment for restated filings.
23 fieldsBrowse in catalog

fact

~111M rowsPK: entity_id + accession_id + standard_concept

The core financial data table. 111M+ XBRL-sourced data points normalized to 292 standard_concept labels. Each row is a single financial value for a specific company, period, and concept.

Tip: Start with standard_concept (not raw concept) for cross-company queries. Use accepted_at for PIT filtering. Use COALESCE(derived_quarterly_value, numeric_value) for cash flow metrics.
38 fieldsBrowse in catalog

valuation

~42K rowsPK: id

Pipeline-computed DCF and DDM intrinsic values. One row per company per valuation run. Includes WACC, growth rates, and margin of safety vs. market price.

Tip: model_type = 'dcf' is the primary model. DDM only applies to dividend-paying companies. Recomputed each run — not point-in-time.
26 fieldsBrowse in catalog

taxonomy_guide

~292 rowsPK: standard_concept

Lookup table for standard_concept labels used in the fact table. Maps canonical names to human-readable labels, definitions, and GAAP taxonomy references.

Tip: Join fact on standard_concept to get human_name and definition for any concept.
13 fieldsBrowse in catalog

index_membership

~6,485 rowsPK: id

Historical index constituent records — SP500, RUSSELL1000, RUSSELL2000, RUSSELL3000. One row per (cik, index, membership-period). Essential for point-in-time universe construction — prevents survivorship bias.

Tip: Filter removal_date IS NULL for current members. Use effective_date / removal_date for the half-open membership window. Keys on cik (since migration 0015) — JOIN with references on cik = cik.
17 fieldsBrowse in catalog

references

Start here
~ rowsPK: symbol + cik

Derived flat join of entity + security. One row per security. Eliminates the need for 2-table joins for company-level metadata. Always start here for cross-company queries.

Tip: is_active = TRUE for current tickers. For index membership (current OR historical), JOIN with index_membership on cik = cik — there is no is_sp500 flag (dropped 2026-05-02 because it was snapshot-only and single-index).
17 fieldsBrowse in catalog

ratio

~5.3M rowsPK: entity_id + ratio_name + period_end + fiscal_period + accepted_at

Pipeline-computed financial ratios per entity per fiscal period. Append-on-restatement: accepted_at is the PIT vintage (max accepted_at of the input facts), so a restated period is a new vintage row — filter accepted_at <= as_of then take the latest vintage to avoid look-ahead bias, mirroring the fact table.

Tip: Filter by category (profitability | per_share | liquidity | leverage | efficiency | owner_earnings | valuation) for grouped screens. is_ttm = TRUE returns trailing-twelve-months rows. accepted_at is NULL for *_sector_pctile rank rows.
13 fieldsBrowse in catalog

institutional_holding

~59.5M rowsPK: id

Smart-money · Form 13F-HR holdings — institutional managers' quarterly positions. One row per (filing, CUSIP, share-class, put/call): shares, USD market value, options, and voting authority. Institutional tier only.

Tip: subject_entity_id LEFT JOINs to entity.cik via the CUSIP→CIK lookup (NULL until resolved). Join the 13F record via accession_id = institutional_filing.accession_id.
18 fieldsBrowse in catalog

insider_transaction

~11.4M rowsPK: id

Smart-money · Form 3/4/5/144 line items — one row per insider transaction, initial holding, or proposed sale. The core insider table: transaction code, shares, price, post-transaction holdings, role, and direct/indirect ownership. Institutional tier only.

Tip: transaction_code P=buy / S=sell; transaction_type = transaction | initial_holding | proposed_sale. entity_id soft-joins to entity.cik; insider_party_id joins insider_party.
25 fieldsBrowse in catalog

insider_filing

~5.2M rowsPK: id

Smart-money · subject-rooted insider disclosure filing records (Form 3/4/5/SC 13D/SC 13G/144 + amendments) — kept separate from the financial filing table. Parent of insider_transaction and insider_ownership via accession_id. Institutional tier only.

Tip: subject_cik is a soft reference to entity.cik. accepted_at is intraday on Institutional.
13 fieldsBrowse in catalog

insider_ownership

~1.33M rowsPK: id

Smart-money · SC 13D / SC 13G beneficial-ownership disclosures (5%+ stakes). One row per reporting person; group filings emit multiple rows. Carries percent owned and the full voting/dispositive power breakdown. Institutional tier only.

Tip: schedule_type distinguishes 13D (activist) from 13G (passive). subject_entity_id soft-joins to entity.cik.
17 fieldsBrowse in catalog

insider_party

~406K rowsPK: id

Smart-money · deduplicated directory of every insider / reporting person referenced across Form 3/4/5/144. Keyed on CIK when SEC-registered, otherwise on a normalized name. Institutional tier only.

Tip: Join insider_transaction via insider_party_id = insider_party.id for cross-filing roll-ups of one person's activity.
6 fieldsBrowse in catalog

institutional_filing

~386K rowsPK: id

Smart-money · manager-rooted 13F-HR / 13F-NT filing records (and amendments) — distinct from the issuer-rooted filing table because a 13F has no single subject issuer. FK to insider_party for the manager. Institutional tier only.

Tip: Join institutional_holding via accession_id. period_end is the quarter-end the holdings reflect.
12 fieldsBrowse in catalog

Key Columns Guide

The most important columns across tables. Master these and you can write any query.

ColumnTableTypeNullableDescription
accepted_at
PIT
factTIMESTAMPTZNoPIT timestamp — when this fact became publicly known via SEC EDGAR acceptance. Use accepted_at <= trade_date for look-ahead-free backtests.
standard_concept
factVARCHARYesCanonical normalized concept name (e.g. Revenues, NetIncomeLoss). Use this — not raw concept — for cross-company queries.
numeric_value
factFLOATYesParsed financial value. NULL for non-numeric XBRL facts. Always use ABS() on capex; divide by NULLIF(denom, 0) on ratios.
derived_quarterly_value
factFLOATYesQ2/Q3 10-Q cash flows adjusted from YTD to single-quarter. Use COALESCE(derived_quarterly_value, numeric_value) for all cash flow metrics.
cik
index_membership / referencesVARCHARNoIndex membership join key. Same column name on both tables (since migration 0015) — JOIN index_membership ON references.cik = index_membership.cik. Filter index_name = 'SP500' AND removal_date IS NULL for the current S&P500.
is_active
references / securityBOOLEANNoTRUE if this ticker is currently active (valid_to IS NULL). Filter to avoid duplicate rows from ticker changes.
accession_id
filingVARCHARNoSEC EDGAR accession number — the unique identifier for a filing. Format: XXXXXXXXXX-YY-ZZZZZZ. The join key between filing and fact.
cik
entity / referencesVARCHARNoSEC Central Index Key — 10-digit company identifier. The universal join key across all tables. entity.cik = security.entity_id = filing.entity_id = fact.entity_id.

All Fields by Table

Quick-reference field list. For full descriptions and SEC XBRL tag mappings, use the Data Catalog.

DuckDB Usage

All Parquet files load natively with DuckDB's read_parquet(). DuckDB handles ZSTD decompression automatically and supports lazy projection — only the columns you SELECT are read from disk.

sql
-- Load a table directly from disk
SELECT *
FROM read_parquet('fact.parquet')
WHERE standard_concept = 'Revenues'
  AND accepted_at <= '2023-01-01'
LIMIT 10;

-- Query directly from the API URL (presigned or streamed).
-- Membership goes through index_membership — references no longer
-- carries an is_sp500 flag (dropped 2026-05-02).
SELECT r.symbol, fa.numeric_value / 1e9 AS revenue_billions
FROM read_parquet('https://data.valuein.biz/v1/sp500/references') r
JOIN read_parquet('https://data.valuein.biz/v1/sp500/index_membership') im
  ON im.cik = r.cik
JOIN read_parquet('https://data.valuein.biz/v1/sp500/fact') fa
  ON fa.entity_id = r.cik
WHERE im.index_name = 'SP500'
  AND im.removal_date IS NULL
  AND fa.standard_concept = 'Revenues'
  AND fa.accepted_at <= '2024-01-01'
ORDER BY revenue_billions DESC
LIMIT 20;

ZSTD compression

All files are ZSTD-compressed. Typical compression ratio is 6–10x vs uncompressed. DuckDB decompresses transparently.

Column pruning

SELECT only the columns you need — DuckDB reads only those column chunks from the Parquet file, not the full table.

Predicate pushdown

DuckDB pushes WHERE clauses into Parquet row group statistics, skipping groups that cannot contain matching rows.

Explore the full field catalog

Search and filter all fields with SEC XBRL tag mappings, types, and example values in the interactive Data Catalog.