Buy the EDGAR-to-Parquet pipeline. Skip the rebuild treadmill.
80% of data teams rebuild their pipelines after deployment, and a typical build-and-maintain effort runs ~$520K. We turned 11,966 raw XBRL tags into ~286 canonical concepts, store them point-in-time, and serve them as open Parquet with a versioned schema you can pin against.
- 11,966 raw XBRL tags → ~286 canonical concepts — delete your mapping repo.
- Manifest-driven, versioned schema — pin against it, no silent schema-drift breakage.
- Point-in-time done right: original facts + accepted_at, survivorship-free.
- Open Parquet via the Bulk Data API — portable, no IEX-Cloud-style lock-in.
Built for
Data Engineers
- Point-in-time accurate
- Survivorship-bias-free
- Every number cited to its filing
The pain points we remove
Ingesting EDGAR is never 'done.' Tag sprawl, schema drift, and freshness keep the rebuild treadmill running. Buying the manufacturer layer takes it off your roadmap.
XBRL tag chaos
The US-GAAP taxonomy has 18,000+ elements and ~19% of filed concepts are custom extensions. Mapping thousands of tags to canonical concepts is costly and error-prone.
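To make the mapping problem concrete, here is a minimal Python sketch of a tag-to-concept lookup. The us-gaap tags are real taxonomy elements, but the canonical names, the `CANONICAL_MAP` table, the `acme:` extension tag, and both helper functions are hypothetical illustrations, not Valuein's actual mapping.

```python
from typing import Iterable, List, Optional

# Hypothetical mapping table: several raw tags collapse into one concept.
CANONICAL_MAP = {
    "us-gaap:Revenues": "revenue",
    "us-gaap:SalesRevenueNet": "revenue",
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "revenue",
    "us-gaap:NetIncomeLoss": "net_income",
}


def canonicalize(raw_tag: str) -> Optional[str]:
    """Return the canonical concept for a raw tag, or None if unmapped."""
    return CANONICAL_MAP.get(raw_tag)


def unmapped(tags: Iterable[str]) -> List[str]:
    """List the distinct tags that have no canonical concept yet."""
    return sorted({t for t in tags if canonicalize(t) is None})


filed = ["us-gaap:SalesRevenueNet", "acme:AdjustedWidgetRevenue"]
print(unmapped(filed))  # the custom extension is the one left to triage
```

Every custom extension that lands in the unmapped list is a manual decision, which is where the cost and the errors come from.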
Comparability isn't there out of the box
Even after normalization, you still owe a whole quality and derivation layer: coverage gaps, fiscal years that drift off the calendar, and a Q4 that is not filed as its own period and has to be derived from the annual figures.
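The Q4 case is a good example of that derivation work. A minimal Python sketch, assuming a flow item such as revenue and standalone (not year-to-date) quarterly values; the function name and the numbers are made up for illustration:

```python
from decimal import Decimal


def derive_q4(full_year: Decimal, q1: Decimal, q2: Decimal, q3: Decimal) -> Decimal:
    """Derive standalone Q4 for a flow item: full year minus the first three quarters.

    Only valid for duration-type items (revenue, net income, cash flows).
    Balance-sheet items are point-in-time values and must not be subtracted.
    """
    return full_year - (q1 + q2 + q3)


# Hypothetical revenue figures, in millions.
q4 = derive_q4(Decimal("1000"), Decimal("220"), Decimal("240"), Decimal("250"))
print(q4)  # 290
```

Real filings complicate this further: 10-Q figures are often reported year-to-date, and restated quarters can make the subtraction disagree with what was originally filed.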
Pipelines are never finished
80% of data leaders rebuild pipelines after deployment; 39% do it constantly. EDGAR adds and changes tags every filing season, keeping the treadmill running.
Keeping it fresh against EDGAR's limits
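EDGAR caps clients at 10 requests per second, requires a declared User-Agent, and answers violations with 429s and IP blocks. Add the tradeoff between the slow API and the bulk dumps, and a reliable, idempotent incremental refresh becomes real engineering work that nobody budgets for.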
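For a sense of what staying inside those limits involves, here is a minimal standard-library Python sketch that paces requests and backs off on HTTP 429. The `Throttle` class, the `fetch` helper, and the retry schedule are illustrative assumptions, not Valuein's ingestion code, and a production client would also need persistence and idempotency on top.

```python
import time
import urllib.error
import urllib.request


class Throttle:
    """Space calls at least 1 / max_per_sec seconds apart."""

    def __init__(self, max_per_sec=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, throttle, user_agent, max_retries=5,
          opener=urllib.request.urlopen, sleep=time.sleep):
    """GET url with a declared User-Agent; back off exponentially on 429."""
    for attempt in range(max_retries):
        throttle.wait()
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        try:
            with opener(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only throttling is retried here
            sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up on {url} after {max_retries} throttled attempts")
```

The clock, sleep, and opener are injectable so the pacing logic can be tested without touching the network.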
Point-in-time storage vs. lock-in
Doing PIT correctly (retain the original value, flag each revision and its date) is a storage-design problem most teams botch, and buying from a vendor raises the fear of lock-in.
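The core of the design is that revisions are appended, never overwritten, so a query can ask what was known on a given date. A minimal Python sketch under that assumption; the `Fact` fields other than `accepted_at`, the `as_of` helper, and the sample values are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    cik: str
    concept: str
    period_end: str
    value: float
    accepted_at: datetime  # when the filing carrying this value was accepted


def as_of(facts: List[Fact], cik: str, concept: str, period_end: str,
          when: datetime) -> Optional[Fact]:
    """Return the latest version of a fact that was already known at `when`."""
    known = [
        f for f in facts
        if (f.cik, f.concept, f.period_end) == (cik, concept, period_end)
        and f.accepted_at <= when
    ]
    return max(known, key=lambda f: f.accepted_at, default=None)


# Hypothetical original value and a later restatement of the same period.
facts = [
    Fact("0000000001", "revenue", "2022-12-31", 100.0, datetime(2023, 2, 15)),
    Fact("0000000001", "revenue", "2022-12-31", 97.5, datetime(2024, 2, 14)),
]
original = as_of(facts, "0000000001", "revenue", "2022-12-31", datetime(2023, 6, 30))
restated = as_of(facts, "0000000001", "revenue", "2022-12-31", datetime(2024, 6, 30))
print(original.value, restated.value)  # 100.0 97.5
```

A backtest dated mid-2023 sees the original figure, which is the property an overwrite-in-place table loses.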
Built around your actual cadence
From the daily grind to the month-end crunch — Valuein fits the rhythm of the work, not the other way around.
- Monitor ingestion jobs and handle throttled EDGAR pulls
- Catch tag-mapping misses on new filings
- Triage data-quality alerts
- Reconcile coverage gaps
- Extend canonical-concept mapping for new extensions
- Run schema-drift checks and backfill corrections
- Absorb the 10-Q/10-K filing-season surge
- Re-validate the standardization layer
- Run the recurring buy-vs-build review
What you can do with Valuein
Each job you need done, mapped to the exact capability that delivers it.
Don't map 18,000 XBRL tags
Canonical concepts ship already standardized, with sector packs for the edge cases.
Stop the schema-drift rebuild treadmill
A versioned, manifest-driven schema consumers pin against — no silent breakage.
PIT storage done right
Original facts plus accepted_at, survivorship-free — you don't design the bitemporal layer.
Replace the ingestion layer
A managed EDGAR→Parquet pipeline, refreshed nightly — redeploy your team onto value, not plumbing.
Open format, no lock-in
Parquet via HTTP plus an open-source SDK — portable, never an overnight-shutdown casualty.
One token. Every channel.
A single Stripe-issued token unlocks every surface at your tier — use Valuein from your AI client, your code, or the browser.
We turned 11,966 raw XBRL tags into ~286 canonical concepts. Delete your tag-mapping repo.
Buy the EDGAR-to-Parquet pipeline. 80% of data teams rebuild theirs post-deploy — skip the treadmill.
Point-in-time, survivorship-free, in open Parquet. No proprietary blob, no overnight shutdown.
Frequently asked
How do I pin against the schema so my pipeline doesn't break?
The Parquet schema is published in the R2 manifest and versioned. The SDK and MCP read it at runtime, and breaking changes only ship behind a major version bump — so you pin a version and upgrade on your schedule.
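On the consumer side, a pin can be as small as a major-version check before a job runs. A minimal Python sketch: the manifest layout, the `schema_version` key, and the `assert_compatible` helper are assumptions for illustration, so check the published manifest for the real field names.

```python
class SchemaPinError(RuntimeError):
    """Raised when the published schema's major version differs from the pin."""


def parse_major(version: str) -> int:
    """Extract the major component from a 'MAJOR.MINOR.PATCH' string."""
    return int(version.split(".")[0])


def assert_compatible(manifest: dict, pinned_major: int) -> str:
    """Return the manifest's schema version if its major matches the pin."""
    version = manifest["schema_version"]  # hypothetical key name
    if parse_major(version) != pinned_major:
        raise SchemaPinError(
            f"pinned to major {pinned_major}, manifest publishes {version}"
        )
    return version


# Hypothetical manifest content; minor and patch bumps pass, a major bump fails.
print(assert_compatible({"schema_version": "2.4.0"}, pinned_major=2))  # 2.4.0
```

Failing fast on a major bump turns a silent schema break into an explicit upgrade task.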
What tables and history do I get?
17 Parquet tables: core fundamentals (entity, security, filing, fact, valuation, ratios, and more) plus six smart-money tables on the Institutional tier. History runs from 1993 to present, and every table is keyed on CIK and survivorship-free.
How fresh is the data?
The pipeline ingests and re-exports nightly, and the Institutional tier adds intraday acceptance times and filing-event webhooks so you can react to new 8-Ks without polling.
Is the data portable if we ever leave?
Yes. The data is open Parquet served over HTTP, and the SDK is Apache-licensed. There's no proprietary container; everything you pull is yours to keep in your own storage.
105M+ standardized SEC facts across 19,000+ companies, 1993–present. Free to start — no credit card.