For the engineers who own the financial-data pipeline

Buy the EDGAR-to-Parquet pipeline. Skip the rebuild treadmill.

Stop maintaining a brittle EDGAR pipeline. Get standardized, point-in-time SEC facts as open Parquet, on a versioned schema you can pin against. The catastrophe this removes: the silent schema drift that breaks every consumer downstream.

11,966 raw XBRL tags → 292 canonical concepts — delete your mapping repo.
Manifest-driven, versioned schema — pin against it, no silent schema-drift breakage.
Point-in-time done right: original facts + accepted_at, survivorship-free.
Open Parquet via the Bulk Data API — fundamentals, ratios, daily OHLCV prices, smart money. No IEX-Cloud-style lock-in.

Start free (no card) · View the Parquet schema

Built for

Data Engineers

Point-in-time accurate
Survivorship-bias-free
Every number cited to its filing

Works where you do

Bulk Data API · Python SDK · MCP Server

Recommended plan

Institutional

~$520K

build cost, avoided

11,966 → 292

tags standardized

versioned Parquet tables

The pain points we remove

Ingesting EDGAR is never 'done.' Tag sprawl, schema drift, and freshness keep the rebuild treadmill running. Buying the data-manufacturing layer takes it off your roadmap.

XBRL tag chaos

The US-GAAP taxonomy has 18,000+ elements, and ~19% of filed concepts are custom extensions. Mapping thousands of tags to canonical concepts is costly and error-prone.
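To make the mapping problem concrete, here is a minimal sketch of what a hand-maintained mapping layer does for each filing: pick the highest-priority raw tag for each canonical concept, and flag any tag it does not recognize. The mapping table, the `map_facts` helper, and the `acme_` extension tag are hypothetical illustrations, not Valuein's mapping; the other tag names are standard US-GAAP elements.

```python
# Hypothetical mapping table: canonical concept -> raw US-GAAP tags, highest priority first.
# A production mapping covers hundreds of concepts plus sector-specific edge cases.
CANONICAL_MAP: dict[str, list[str]] = {
    "Revenue": [
        "Revenues",
        "RevenueFromContractWithCustomerExcludingAssessedTax",
        "SalesRevenueNet",
    ],
    "NetIncome": ["NetIncomeLoss"],
}

# Every raw tag the table knows about, used to spot tags nobody has mapped yet.
KNOWN_TAGS = {tag for tags in CANONICAL_MAP.values() for tag in tags}


def map_facts(raw_facts: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Map one filing's raw tag -> value facts onto canonical concepts.

    Returns the canonical values and the sorted list of raw tags the table does not
    cover (typically company-specific extensions that need a human decision).
    """
    canonical: dict[str, float] = {}
    for concept, candidates in CANONICAL_MAP.items():
        for tag in candidates:
            if tag in raw_facts:
                canonical[concept] = raw_facts[tag]  # first tag in priority order wins
                break
    unmapped = sorted(tag for tag in raw_facts if tag not in KNOWN_TAGS)
    return canonical, unmapped
```

Every new extension tag lands in the `unmapped` list, which is the queue someone on your team has to work through each filing season.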

Comparability isn't there out of the box

Even after normalization, coverage gaps, shifting fiscal year-ends, and Q4 never being reported as its own period mean you still owe a whole quality and derivation layer.
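The Q4 gap is a derivation, not a lookup: a 10-K reports the full fiscal year, so for a flow concept such as revenue, Q4 has to be computed as the annual total minus the nine-month year-to-date value (or minus Q1 through Q3). A minimal sketch, assuming all inputs already share the same fiscal calendar and units; `derive_q4` is a hypothetical helper.

```python
from typing import Optional, Sequence


def derive_q4(fy_total: float,
              ytd_9m: Optional[float] = None,
              q1_to_q3: Optional[Sequence[float]] = None) -> float:
    """Derive Q4 for a flow (duration) concept such as revenue.

    Only valid for duration facts. Balance-sheet (instant) facts need no derivation:
    the Q4-end value is simply the fiscal-year-end value.
    """
    if ytd_9m is not None:
        return fy_total - ytd_9m          # FY minus nine-month YTD
    if q1_to_q3 is not None and len(q1_to_q3) == 3:
        return fy_total - sum(q1_to_q3)   # FY minus the three reported quarters
    raise ValueError("need a nine-month YTD value or all three quarterly values")
```

The arithmetic is trivial; the hard part in practice is confirming that the annual and year-to-date facts cover the same fiscal-year boundary before subtracting them.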

Pipelines are never finished

80% of data leaders rebuild pipelines after deployment; 39% do it constantly. EDGAR adds and changes tags every filing season, keeping the treadmill running.

Keeping it fresh against EDGAR's limits

EDGAR's 10 requests/second cap, mandatory User-Agent header, 429 and IP-block penalties, and the tradeoff between a slow API and bulk dumps turn reliable, idempotent incremental refresh into real engineering work that nobody budgets for.
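A minimal sketch of the politeness layer an in-house EDGAR client ends up needing: space requests to stay under 10 per second, declare a User-Agent, and back off exponentially after a 429. The class and helper names, the 60-second cap, and the example User-Agent string are assumptions for illustration; the clock and sleep functions are injectable so the spacing can be tested without actually waiting.

```python
import time
import urllib.request
from typing import Callable


class MinIntervalLimiter:
    """Keep calls at least 1/rate seconds apart (10 req/s -> 100 ms spacing)."""

    def __init__(self, rate_per_sec: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff schedule (seconds) to use after HTTP 429 responses."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def edgar_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that declares who is calling, as SEC asks automated clients to do."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A real client would call `limiter.wait()` before every `urllib.request.urlopen(...)`, and sleep through `backoff_delays()` whenever it catches `urllib.error.HTTPError` with `code == 429`. That is before any work on incremental state or idempotency.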

Point-in-time storage vs. lock-in

Doing PIT correctly (retaining the original value and flagging each revision with its date) is a storage-design problem most teams botch, and buying from a vendor raises the fear of lock-in.

The grind we take off your plate

From the daily check-ins to the month-end scramble — this is the recurring work Valuein automates so you spend your hours on the thesis, not the data.

Every day

Monitor ingestion jobs and handle throttled EDGAR pulls
Catch tag-mapping misses on new filings
Triage data-quality alerts

Every week

Reconcile coverage gaps
Extend canonical-concept mapping for new extensions
Run schema-drift checks and backfill corrections

Month-end & earnings

Absorb the 10-Q/10-K filing-season surge
Re-validate the standardization layer
Run the recurring buy-vs-build review

What you can do with Valuein

Each job you need done, mapped to the exact capability that delivers it.

Don't map 18,000 XBRL tags

Canonical concepts ship already standardized, with sector packs for the edge cases.

Standardized concepts

Stop the schema-drift rebuild treadmill

A versioned, manifest-driven schema consumers pin against — no silent breakage.

Versioned schema · manifest

PIT storage done right

Original facts plus accepted_at, survivorship-free — you don't design the bitemporal layer.
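The as-of logic that `accepted_at` enables can be sketched in a few lines: keep every reported value, and when answering "what did we know on date X?", take the most recently accepted value at or before X. The `Fact` record and its field names are hypothetical, chosen for illustration; they are not the shipped Parquet schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

Key = tuple[int, str, str]  # (cik, concept, period_end)


@dataclass(frozen=True)
class Fact:
    cik: int
    concept: str
    period_end: str        # ISO date of the fiscal period the value describes
    value: float
    accepted_at: datetime  # when EDGAR accepted the filing (all timestamps in UTC)


def as_of(facts: Iterable[Fact], when: datetime) -> dict[Key, float]:
    """Return, per (cik, concept, period_end), the value that was public at `when`."""
    latest: dict[Key, Fact] = {}
    for f in facts:
        if f.accepted_at > when:
            continue  # not yet public at `when`: skipping it avoids look-ahead bias
        key = (f.cik, f.concept, f.period_end)
        if key not in latest or f.accepted_at > latest[key].accepted_at:
            latest[key] = f  # a later restatement supersedes the original, but only from its own date
    return {k: f.value for k, f in latest.items()}
```

Because the original row is never overwritten, a backtest dated before a restatement still sees the number the market saw at the time.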

Point-in-time datasets

Replace the ingestion layer

A managed EDGAR→Parquet pipeline, refreshed nightly, so your team moves off plumbing and onto higher-value work.
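If you land the nightly export in your own store, the usual way to make re-runs safe is to key rows on a natural key, so that loading the same night twice changes nothing. A minimal sketch using the standard-library `sqlite3` module as a stand-in for your warehouse; the `facts` table, its columns, and the choice of (accession, concept, period_end) as the key are assumptions for illustration.

```python
import sqlite3

Row = tuple[str, str, str, float]  # (accession, concept, period_end, value)


def make_store() -> sqlite3.Connection:
    """In-memory stand-in for a warehouse table keyed on a natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts ("
        " accession TEXT NOT NULL, concept TEXT NOT NULL,"
        " period_end TEXT NOT NULL, value REAL,"
        " PRIMARY KEY (accession, concept, period_end))"
    )
    return conn


def load_batch(conn: sqlite3.Connection, rows: list[Row]) -> int:
    """Insert rows, silently skipping keys already present; return how many were new."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO facts (accession, concept, period_end, value)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
    return conn.total_changes - before
```

`INSERT OR IGNORE` suits append-only, point-in-time facts: an amendment arrives as a new filing with its own accession number, so it adds rows instead of overwriting the original ones.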

Bulk Data API

A measurable quality bar

A published, CI-gated accuracy baseline (all 19,607 S&P 500 annual filings passing every published accounting identity, 0 failures) — a result you can hold us to, re-derivable from one DuckDB script.
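For a sense of what an accounting-identity check looks like, here is a minimal sketch of one such identity (total assets equal total liabilities plus total equity) applied per filing, with a small tolerance for rounding. It is written in plain Python rather than the DuckDB script the baseline refers to; the row fields, the tolerance values, and the `identity_failures` helper are assumptions, not the published check.

```python
import math
from typing import Any, Iterable, Mapping


def identity_failures(rows: Iterable[Mapping[str, Any]],
                      rel_tol: float = 1e-4,
                      abs_tol: float = 1.0) -> list[str]:
    """Return IDs of filings where assets != liabilities + equity beyond tolerance.

    Each row needs: filing_id, assets, liabilities, equity (equity including
    noncontrolling interests, all in the same units).
    """
    failures = []
    for r in rows:
        expected = r["liabilities"] + r["equity"]
        if not math.isclose(r["assets"], expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(str(r["filing_id"]))
    return failures
```

A CI gate then only has to fail the build whenever the returned list is non-empty, which is what makes a "0 failures" claim checkable.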

Published accuracy baseline

Open format, no lock-in

Parquet via HTTP plus an open-source SDK — portable, never an overnight-shutdown casualty.

Open Parquet

Works where you do

One Bearer token reaches the same point-in-time data from your AI agent, your notebook, or your browser. Use the surface that fits the job.

Bulk Data API

The edge gateway streams the full Parquet universe as a drop-in source for your warehouse.


Python SDK

Reference client + 60 SQL templates; schema read from the manifest at runtime.


MCP Server

Programmatic access for internal agents and data-ops automations.

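From a notebook, using the token amounts to sending it in an `Authorization: Bearer` header. A minimal standard-library sketch; the URL, the environment-variable name, and both helper functions are hypothetical placeholders, not the documented Bulk Data API endpoints or the SDK's interface.

```python
import shutil
import urllib.request


def authed_request(url: str, token: str) -> urllib.request.Request:
    """Attach the Bearer token as an Authorization header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download(url: str, token: str, dest: str) -> None:
    """Stream a (possibly large) Parquet file to disk without holding it in memory."""
    with urllib.request.urlopen(authed_request(url, token)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage would look like `download(url, os.environ["VALUEIN_TOKEN"], "fact.parquet")`, with `url` taken from the API reference. Keeping the token in an environment variable keeps it out of notebooks and version control.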

We turned 11,966 raw XBRL tags into 292 canonical concepts. Delete your tag-mapping repo.

Buy the EDGAR-to-Parquet pipeline. 80% of data teams rebuild theirs post-deploy — skip the treadmill.

Point-in-time, survivorship-free, in open Parquet. No proprietary blob, no overnight shutdown.

Frequently asked

How do I pin against the schema so my pipeline doesn't break?

The Parquet schema is published in the R2 manifest and versioned. The SDK and MCP read it at runtime, and breaking changes only ship behind a major version bump — so you pin a version and upgrade on your schedule.
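Pinning can be enforced on the consumer side with a small pre-flight check that reads the manifest before a job runs. A minimal sketch, assuming the manifest is JSON with a semantic `schema_version` string and a per-table `columns` list; that layout, the table name, and the `check_manifest` helper are illustrative assumptions, not the published manifest format.

```python
import json


def check_manifest(manifest_json: str,
                   pinned_major: int,
                   required: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means the job can run against this manifest."""
    manifest = json.loads(manifest_json)
    problems: list[str] = []

    # Breaking changes only ship behind a major bump, so the major number is what we pin.
    major = int(manifest["schema_version"].split(".")[0])
    if major != pinned_major:
        problems.append(f"schema major version {major} != pinned {pinned_major}")

    # Also confirm every column this consumer reads is still present.
    tables = manifest.get("tables", {})
    for table, columns in required.items():
        present = set(tables.get(table, {}).get("columns", []))
        missing = sorted(columns - present)
        if missing:
            problems.append(f"{table}: missing columns {missing}")
    return problems
```

Failing fast here turns a would-be silent breakage into an explicit decision to upgrade the pin.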

What tables and history do I get?

20 Parquet tables: 14 core (entity, security, filing, fact, valuation, ratios, standard concepts, daily OHLCV price history, and more) plus six smart-money tables on the Institutional tier, covering 1993 to present, all keyed on CIK and survivorship-free.

How fresh is the data?

The pipeline ingests and re-exports nightly, and the Institutional tier adds intraday acceptance times and filing-event webhooks so you can react to new 8-Ks without polling.

Is the data portable if we ever leave?

Yes — it's open Parquet served over HTTP and an Apache-licensed SDK. There's no proprietary container; everything you pull is yours to keep in your own storage.


111M+ standardized SEC facts across 19,000+ companies, 1993–present. Free to start — no credit card.

Start free (no card) · View pricing

Bulk Data API reference · Parquet schema + ERD · Data catalog