Workspace beta is live — BYO-LLM chat wired to 57 SEC tools. Try it free →
Valuein
For the engineers who own the financial-data pipeline

Buy the EDGAR-to-Parquet pipeline. Skip the rebuild treadmill.

80% of data teams rebuild their pipelines after deployment, and a typical build-and-maintain runs ~$520K. We turned 11,966 raw XBRL tags into ~286 canonical concepts, store them point-in-time, and serve open Parquet with a versioned schema you can pin against.

  • 11,966 raw XBRL tags → ~286 canonical concepts — delete your mapping repo.
  • Manifest-driven, versioned schema — pin against it, no silent schema-drift breakage.
  • Point-in-time done right: original facts + accepted_at, survivorship-free.
  • Open Parquet via the Bulk Data API — portable, no IEX-Cloud-style lock-in.

Built for

Data Engineers

  • Point-in-time accurate
  • Survivorship-bias-free
  • Every number cited to its filing

Works where you do

Bulk Data API · Python SDK · MCP Server

Recommended plan: Institutional

  • ~$520K build cost, avoided
  • 11,966 → ~286 tags standardized
  • 17 versioned Parquet tables

The pain points we remove

Ingesting EDGAR is never 'done.' Tag sprawl, schema drift, and freshness keep the rebuild treadmill running. Buying the manufacturer layer takes it off your roadmap.

1. XBRL tag chaos

The US-GAAP taxonomy has 18,000+ elements and ~19% of filed concepts are custom extensions. Mapping thousands of tags to canonical concepts is costly and error-prone.

2. Comparability isn't there out of the box

Even after normalization, coverage gaps, drifting fiscal years, and the lack of a standalone Q4 period (the 10-K reports the full fiscal year, so Q4 has to be derived) mean you still owe a whole quality and derivation layer.
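
To make the Q4 gap concrete, here is a minimal sketch of the usual derivation for flow items such as revenue: Q4 equals the fiscal-year total minus the first three quarters. The function name and sample figures are hypothetical and only illustrate the arithmetic; this is not Valuein's derivation logic.

```python
from decimal import Decimal
from typing import Optional


def derive_q4(
    fy_total: Optional[Decimal],
    q1: Optional[Decimal],
    q2: Optional[Decimal],
    q3: Optional[Decimal],
) -> Optional[Decimal]:
    """Derive a Q4 value for a flow (duration) concept.

    Returns None when any input is missing, so a coverage gap
    never turns into a silently wrong number.
    """
    if fy_total is None or q1 is None or q2 is None or q3 is None:
        return None
    return fy_total - (q1 + q2 + q3)


# Hypothetical figures, in millions.
q4_revenue = derive_q4(Decimal("400"), Decimal("90"), Decimal("100"), Decimal("95"))
```

This only applies to flow concepts. Balance-sheet (instant) concepts are read directly at the fiscal year-end date and need no subtraction.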

3. Pipelines are never finished

80% of data leaders rebuild pipelines after deployment; 39% do it constantly. EDGAR adds and changes tags every filing season, keeping the treadmill running.

4. Keeping it fresh against EDGAR's limits

EDGAR's 10 req/sec limit, mandatory User-Agent header, 429 and IP-block penalties, and the tradeoff between the slow API and bulk dumps turn reliable, idempotent incremental refresh into real, unpaid engineering work.
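
For a sense of what that plumbing involves, the sketch below shows one way a homegrown client might space requests under 10 per second, send a User-Agent, and back off on HTTP 429. The `Throttle` class, the `fetch` helper, and the retry counts are illustrative assumptions, not part of any Valuein or SEC library.

```python
import time
import urllib.error
import urllib.request


class Throttle:
    """Enforces a minimum gap of 1/max_per_sec seconds between calls."""

    def __init__(self, max_per_sec=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # The next slot opens one interval after this call actually starts.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


def fetch(url, user_agent, throttle, retries=3):
    """GET a URL with a User-Agent, retrying on 429 with exponential backoff."""
    for attempt in range(retries + 1):
        throttle.wait()
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == retries:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
```

Even this leaves out checkpointing, deduplication, and resumable backfills, which is where most of the idempotency work lives.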

5. Point-in-time storage vs. lock-in

Doing PIT correctly (retain original, flag the revision and its date) is a storage-design problem most teams botch — and buying a vendor raises the lock-in fear.
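
As a rough illustration of "retain the original, flag the revision and its date," the sketch below keeps every version of a fact and answers "what was known at time T" by filtering on `accepted_at`. Only `accepted_at` comes from this page; the other field names, the `Fact` record, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    cik: str
    concept: str
    period_end: str        # fiscal period the value describes
    value: float
    accepted_at: datetime  # when the filing carrying this value was accepted


def as_of(facts: Iterable[Fact], cik: str, concept: str,
          period_end: str, when: datetime) -> Optional[Fact]:
    """Return the latest version of a fact that was visible at `when`."""
    visible = [
        f for f in facts
        if f.cik == cik and f.concept == concept
        and f.period_end == period_end and f.accepted_at <= when
    ]
    return max(visible, key=lambda f: f.accepted_at, default=None)


# Hypothetical data: an original value and a later restatement of the same period.
history = [
    Fact("0000000001", "Revenue", "2019-12-31", 100.0, datetime(2020, 2, 1)),
    Fact("0000000001", "Revenue", "2019-12-31", 95.0, datetime(2021, 2, 1)),
]
```

A backtest dated mid-2020 sees the original 100.0, while one dated mid-2021 sees the restated 95.0. Overwriting the original row in place would make the first answer unrecoverable.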

Built around your actual cadence

From the daily grind to the month-end crunch — Valuein fits the rhythm of the work, not the other way around.

Every day
  • Monitor ingestion jobs and handle throttled EDGAR pulls
  • Catch tag-mapping misses on new filings
  • Triage data-quality alerts
Every week
  • Reconcile coverage gaps
  • Extend canonical-concept mapping for new extensions
  • Run schema-drift checks and backfill corrections
Month / quarter-end
  • Absorb the 10-Q/10-K filing-season surge
  • Re-validate the standardization layer
  • Run the recurring buy-vs-build review

What you can do with Valuein

Each job you need done, mapped to the exact capability that delivers it.

Don't map 18,000 XBRL tags

Canonical concepts ship already standardized, with sector packs for the edge cases.

Standardized concepts

Stop the schema-drift rebuild treadmill

A versioned, manifest-driven schema consumers pin against — no silent breakage.

Versioned schema · manifest

PIT storage done right

Original facts plus accepted_at, survivorship-free — you don't design the bitemporal layer.

Point-in-time datasets

Replace the ingestion layer

A managed EDGAR→Parquet pipeline, refreshed nightly — redeploy your team onto value, not plumbing.

Bulk Data API

Open format, no lock-in

Parquet via HTTP plus an open-source SDK — portable, never an overnight-shutdown casualty.

Open Parquet

We turned 11,966 raw XBRL tags into ~286 canonical concepts. Delete your tag-mapping repo.

Buy the EDGAR-to-Parquet pipeline. 80% of data teams rebuild theirs post-deploy — skip the treadmill.

Point-in-time, survivorship-free, in open Parquet. No proprietary blob, no overnight shutdown.

Frequently asked

How do I pin against the schema so my pipeline doesn't break?

The Parquet schema is published in the R2 manifest and versioned. The SDK and MCP server read it at runtime, and breaking changes ship only behind a major version bump, so you pin a version and upgrade on your schedule.
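
A pin can be as simple as a guard that runs before any load. The sketch below assumes, purely for illustration, that the manifest is JSON with a semver-style `schema_version` field; the real manifest layout, the field name, and the `check_pinned_major` helper are assumptions, so check the published manifest before relying on any of them.

```python
import json


class SchemaVersionError(RuntimeError):
    """Raised when the published schema's major version differs from the pin."""


def check_pinned_major(manifest_json: str, pinned_major: int) -> str:
    """Return the published version if its major part matches the pin.

    Assumes a hypothetical manifest shape: {"schema_version": "MAJOR.MINOR.PATCH"}.
    """
    manifest = json.loads(manifest_json)
    version = manifest["schema_version"]
    major = int(version.split(".")[0])
    if major != pinned_major:
        raise SchemaVersionError(
            f"pinned to major {pinned_major}, manifest publishes {version}"
        )
    return version


# Hypothetical manifest content.
current = check_pinned_major('{"schema_version": "2.3.1"}', pinned_major=2)
```

Failing loudly at the start of a job is the point: a major bump stops the run instead of letting renamed or retyped columns flow downstream.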

What tables and history do I get?

17 Parquet tables: core fundamentals (entity, security, filing, fact, valuation, ratios, and more) plus six smart-money tables on the Institutional tier, covering 1993 to present, all keyed on CIK and survivorship-free.

How fresh is the data?

The pipeline ingests and re-exports nightly, and the Institutional tier adds intraday acceptance times and filing-event webhooks so you can react to new 8-Ks without polling.

Is the data portable if we ever leave?

Yes. It's open Parquet served over HTTP, plus an Apache-licensed SDK. There's no proprietary container; everything you pull is yours to keep in your own storage.

105M+ standardized SEC facts across 19,000+ companies, 1993–present. Free to start — no credit card.