Bulk Data API Reference

Parquet-first edge gateway for bulk financial data. All endpoints stream ZSTD-compressed Parquet files — load directly into DuckDB, Pandas, or Polars.

Base URL: https://data.valuein.biz

Authentication

Authenticated endpoints require a Bearer token in the Authorization header. Tokens are provisioned automatically when you subscribe via Stripe. The /v1/sample/* endpoints are always public — no token required.

Unauthenticated (sample tier)

bash
$ curl https://data.valuein.biz/v1/sample/entity \
    --output entity.parquet

Authenticated (sp500 / full tier)

bash
$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer YOUR_TOKEN" \
    --output fact.parquet

Check your token plan

bash
$ curl https://data.valuein.biz/v1/me \
    -H "Authorization: Bearer YOUR_TOKEN"
json
{
  "plan":   "sp500",
  "status": "active",
  "email":  "you@example.com"
}

Response Format

Data endpoints return raw Parquet bytes. The Content-Type is application/octet-stream. Files are ZSTD-compressed — DuckDB, Pandas, and Polars decompress automatically via read_parquet(). Non-data endpoints return application/json.

Response Type | Content-Type | Endpoints
Parquet stream | application/octet-stream | /v1/sample/*, /v1/sp500/*, /v1/full/*
JSON | application/json | /health, /v1/me, /v1/manifest, /v1/usage
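
Because data endpoints return raw bytes, a JSON body saved to disk by mistake will only fail later, when the file is opened. The sketch below is one way to catch that early. It relies on the Parquet file format, which places the 4-byte magic `PAR1` at both the start and the end of a file. The helper name `looks_like_parquet` is hypothetical and not part of this API.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def looks_like_parquet(path: str) -> bool:
    """Return True if the file carries the Parquet magic bytes at both ends."""
    # Lower bound: header magic + 4-byte footer length + footer magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This is a cheap sanity check, not a validation of the schema or the row groups. A file that passes can still be handed to read_parquet() as usual.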

Plans

Your token's plan determines which bucket you can access. A higher plan grants access to all lower tiers as well.

Plan | Auth Required | Bucket | Coverage
sample | No | R2_SAMPLE | Public 5-year S&P500 slice
sp500 | Yes | R2_SP500 | Full S&P500 history 1993–present
pro | Yes | R2_PRO | Active + delisted US universe (19,000+ entities), 15-year history (2011→present)
full | Yes | R2_FULL | Institutional tier: US + foreign issuers, 1993→present, intraday accepted_at, webhooks, redistribution license
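
A client can mirror the "higher plan includes lower tiers" rule locally to avoid requests that would end in a 403. The sketch below assumes the tier order follows the row order of the table above (sample, sp500, pro, full). That ordering and the helper name `can_access` are assumptions, and the gateway's own response remains authoritative.

```python
# Assumption: tiers are ranked in the order the Plans table lists them.
PLAN_ORDER = ["sample", "sp500", "pro", "full"]


def can_access(token_plan: str, bucket_plan: str) -> bool:
    """True if a token on `token_plan` should be able to read `bucket_plan`.

    Raises ValueError for a plan name that is not in PLAN_ORDER.
    """
    return PLAN_ORDER.index(token_plan) >= PLAN_ORDER.index(bucket_plan)
```

For example, `can_access("sp500", "sample")` is true, while `can_access("sample", "sp500")` is false. The `plan` field returned by /v1/me can be passed as `token_plan`.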

Endpoints

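The gateway exposes 8 endpoints. Those referenced on this page include /health, /v1/me, /v1/manifest, /v1/usage, and the table routes /v1/sample/{table}, /v1/sp500/{table}, and /v1/full/{table}.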

Python Example

Download a Parquet table and query it locally with DuckDB in a short script.

quickstart.py
python
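import duckdb
import requests

token = "YOUR_TOKEN"
url   = "https://data.valuein.biz/v1/sp500/fact"

# Stream the response so the full table is never held in memory.
with requests.get(
    url,
    headers={"Authorization": f"Bearer {token}"},
    stream=True,
    timeout=60,
) as r:
    r.raise_for_status()  # stop on 4xx/5xx instead of saving an error body
    with open("fact.parquet", "wb") as f:
        for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)

# Query the local file; DuckDB decompresses ZSTD transparently.
conn = duckdb.connect()
df   = conn.execute(
    "SELECT * FROM read_parquet('fact.parquet') LIMIT 5"
).df()
print(df)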

Available Tables

9 tables cover the full schema. Pass any table name as the {table} path segment. See Parquet Schema Reference for full field definitions.

Table | Description
entity | Company profiles: name, sector, SIC code, location, CEO, founding year, description. One row per CIK.
security | Exchange listings: ticker, exchange, FIGI, valid date range (SCD Type 2). Multiple rows per company.
filing | SEC EDGAR filing index: accession ID, form type, filing date, acceptance timestamp. Links entity to facts.
fact | 111M+ financial data points: XBRL concept values with accepted_at timestamps for PIT accuracy.
valuation | Pipeline-computed DCF and DDM intrinsic values with WACC and growth rate assumptions.
taxonomy_guide | Mapping of 292 standard_concept labels to raw XBRL tags and human-readable descriptions.
index_membership | Historical index constituents (SP500, NASDAQ100, RUSSELL3000, WILSHIRE5000) with effective_date / removal_date for PIT universe construction. Keys on cik (since migration 0015).
references | Derived flat join of entity + security. One row per security. Start here for cross-company queries; JOIN index_membership on cik = cik for membership filters.
ratio | Pipeline-computed financial ratios per entity per fiscal period (recomputed on every pipeline run; not PIT). Filter by category for grouped screens.
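
Since a misspelled table name comes back as a 400, it can help to validate the name before building the URL. This is a minimal sketch: the table names are copied from the list above, the plan prefixes are limited to the ones this page shows in paths, and `table_url` is a hypothetical helper name.

```python
BASE_URL = "https://data.valuein.biz"

# Table names as listed in the Available Tables section.
TABLES = {
    "entity", "security", "filing", "fact", "valuation",
    "taxonomy_guide", "index_membership", "references", "ratio",
}

# Path prefixes that appear on this page; other prefixes are not assumed.
PLANS = {"sample", "sp500", "full"}


def table_url(plan: str, table: str) -> str:
    """Build the download URL for a table, rejecting unknown names early."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan prefix: {plan!r}")
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return f"{BASE_URL}/v1/{plan}/{table}"
```

Calling `table_url("sample", "entity")` yields the same URL used in the unauthenticated curl example.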

Manifest Response

Call GET /v1/manifest to discover available tables and the current snapshot timestamp for your plan. Check this before downloading tables to detect updates.

json
{
  "snapshot":     "snapshot_20260411",
  "last_updated": "2026-04-11T00:00:00Z",
  "tables": ["entity", "security", "filing", "fact",
             "valuation", "taxonomy_guide",
             "index_membership", "references", "ratio"]
}
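
One way to act on the manifest is to remember the last `snapshot` value seen and download only when it differs. The sketch below keeps that value in a small local text file. The state-file approach and both helper names are assumptions for illustration, not something the gateway prescribes.

```python
from pathlib import Path


def snapshot_changed(manifest: dict, state_file: Path) -> bool:
    """Compare manifest["snapshot"] with the value recorded on the last sync."""
    previous = state_file.read_text().strip() if state_file.exists() else None
    return manifest["snapshot"] != previous


def record_snapshot(manifest: dict, state_file: Path) -> None:
    """Persist the snapshot id after all tables downloaded successfully."""
    state_file.write_text(manifest["snapshot"])
```

A sync job would fetch /v1/manifest, call `snapshot_changed`, download the tables it needs if the result is true, and call `record_snapshot` only once every download has finished, so an interrupted run is retried next time.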

Rate Limits & Retries

Limits are per Bearer token and enforced at the Cloudflare edge. Every response includes the standard rate-limit headers — read them before retrying so your client never busy-waits.

Response headers on every request

http
X-RateLimit-Limit:     120
X-RateLimit-Remaining: 117
X-RateLimit-Reset:     1735680000
Retry-After:           42

Retry-After is sent only on 429 responses and gives the wait in seconds. X-RateLimit-Reset is the Unix epoch time, in seconds, at which the current window rolls over.
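
The headers can be turned into a single "how long should I pause" number. The following is a sketch under two assumptions: header values are the plain integers shown above, and `headers` is a mapping keyed by the exact header names (a case-insensitive mapping such as the one `requests` provides also works). The function name `seconds_to_wait` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next request, derived from response headers."""
    now = time.time() if now is None else now
    # Retry-After is only present on 429 responses and takes priority.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Quota left in the current window: no need to wait.
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    # Quota exhausted: wait until the window rolls over.
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)
```

Passing `now` explicitly keeps the function deterministic in tests; in production code it can be left out.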

Recommended retry policy

  • 429 — sleep for Retry-After seconds, then retry. Never retry sooner.
  • 503 — exponential backoff: 1s, 2s, 4s. Cap at 3 retries.
  • 5xx other — retry once after 1s, then surface the error.
  • 4xx other — never retry. Fix the request.
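
The policy above can be written as one small decision function. This is a sketch, not an official client: `retry_delay` is a hypothetical name, `attempt` counts retries starting at 1, and a 429 without a Retry-After value is treated as non-retryable because the policy gives no fallback delay for that case.

```python
from typing import Optional


def retry_delay(status: int, attempt: int,
                retry_after: Optional[float] = None) -> Optional[float]:
    """Seconds to sleep before retry number `attempt`, or None to give up."""
    if status == 429:
        # Sleep exactly as long as the server asked; never sooner.
        return retry_after
    if status == 503:
        # Exponential backoff 1s, 2s, 4s, capped at 3 retries.
        return float(2 ** (attempt - 1)) if attempt <= 3 else None
    if 500 <= status < 600:
        # Any other 5xx: one retry after 1s, then surface the error.
        return 1.0 if attempt == 1 else None
    # Other 4xx (and anything else): never retry, fix the request.
    return None
```

A caller loops while the function returns a number, sleeping for that long between attempts, and raises as soon as it returns None.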

curl with retry-aware streaming

bash
# --retry honors Retry-After (curl 7.66.0+) and otherwise doubles the wait between
# attempts on transient errors: timeouts and HTTP 408, 429, 500, 502, 503, 504.
# --retry-max-time bounds total elapsed time; --retry-connrefused covers
# the cold-start case after a cache eviction.
$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer $VALUEIN_TOKEN" \
    --retry 5 --retry-max-time 120 \
    --retry-connrefused \
    --output fact.parquet

Parquet responses are a single binary stream — no pagination, no cursor tokens. The gateway sets Content-Length on every data response so clients can show progress and pre-allocate buffers. For incremental reads, query the latest snapshot from /v1/manifest and download only when the timestamp changes.
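
Because Content-Length is set, a client can report progress and also detect a truncated download. The sketch below copies any readable binary stream to a writable one; it is written against file-like objects so it works with the response object from `urllib.request.urlopen` as well as in tests. The function name and the callback signature are assumptions.

```python
from typing import BinaryIO, Callable, Optional


def copy_stream(src: BinaryIO, dst: BinaryIO, total: Optional[int],
                on_progress: Callable[[int, Optional[int]], None],
                chunk_size: int = 1 << 20) -> int:
    """Copy src to dst in chunks, reporting bytes written after each chunk.

    `total` is the Content-Length value (or None if unknown). Raises IOError
    if the stream ends before `total` bytes have arrived.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        written += len(chunk)
        on_progress(written, total)
    if total is not None and written != total:
        raise IOError(f"truncated download: got {written} of {total} bytes")
    return written
```

With `urllib`, `total` would come from `int(resp.headers["Content-Length"])` and `src` would be the response itself.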

Bulk Data API vs MCP Server vs Python SDK

Three channels, one Bearer token, same warehouse. Pick by access pattern.

Channel | Best for | Returns | Latency
Bulk Data API | Loading full tables into DuckDB, Spark, or a warehouse. Periodic syncs. | ZSTD Parquet (full table) | Edge stream · MB-scale
MCP Server | Single-fact lookups from AI agents. Conversational queries. | JSON tool responses | Sub-100ms typical
Python SDK | DataFrame-shaped queries from notebooks and scripts. | DuckDB over R2 Parquet · as-filed vs latest restatement columns | Local DuckDB · ms

Use the SDK first if you're writing Python — it wraps this API and the MCP server with sensible defaults. Use the raw Bulk Data API when you're in a non-Python stack or building a partner integration.

Error Codes

Status | Meaning | Common Cause
200 OK | Success | Request succeeded. Parquet bytes or JSON body in the response.
400 Bad Request | Invalid table | The table name in the path is not in the valid tables list. Check spelling and trailing slashes.
401 Unauthorized | Missing or invalid token | No Authorization header, malformed Bearer token, or token not found in KV store.
403 Forbidden | Plan too low | Your token exists but its plan does not grant access to this bucket (e.g. sample token accessing /v1/sp500/).
429 Too Many Requests | Rate limit exceeded | You have exceeded your daily request quota. Resets at UTC midnight. Upgrade to a higher plan for higher limits.
503 Service Unavailable | Snapshot loading | The R2 snapshot is being refreshed. Retry after 30–60 seconds. This is rare and brief.

Get your API token

Subscribe to the S&P500 or Full plan to receive a Bearer token instantly. The sample tier is always free — no credit card required.