Bulk Data API Reference

Parquet-first edge gateway for bulk financial data. All endpoints stream ZSTD-compressed Parquet files — load directly into DuckDB, Pandas, or Polars.

Base URL: https://data.valuein.biz

Authentication

Authenticated endpoints require a Bearer token in the Authorization header. Tokens are provisioned automatically when you subscribe via Stripe. The /v1/sample/* endpoints are always public — no token required.

Unauthenticated (sample tier)

bash

$ curl https://data.valuein.biz/v1/sample/entity \
    --output entity.parquet

Authenticated (sp500 / pro / full tier)

bash

$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer YOUR_TOKEN" \
    --output fact.parquet

Check your token plan

bash

$ curl https://data.valuein.biz/v1/me \
    -H "Authorization: Bearer YOUR_TOKEN"

json

{
  "plan": "sp500",
  "status": "active",
  "email": "user@example.com"
}
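The header rules above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name `build_request` and the constants `BASE_URL` and `PUBLIC_PREFIXES` are hypothetical, and treating `/health` as public follows the endpoint list on this page.

```python
from typing import Optional
from urllib.request import Request

BASE_URL = "https://data.valuein.biz"  # gateway base URL from this page
# Paths this page lists as public (assumption: prefix matching is sufficient).
PUBLIC_PREFIXES = ("/v1/sample/", "/health")


def build_request(path: str, token: Optional[str] = None) -> Request:
    """Build a GET request, attaching a Bearer token when one is supplied."""
    is_public = path.startswith(PUBLIC_PREFIXES)
    if not is_public and not token:
        # Fail locally instead of waiting for a 401 from the gateway.
        raise ValueError(f"{path} requires a Bearer token")
    req = Request(BASE_URL + path, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to `urllib.request.urlopen(build_request("/v1/me", token))` would perform the same call as the curl example.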

Response Format

Data endpoints return raw Parquet bytes. The Content-Type is application/octet-stream. Files are ZSTD-compressed — DuckDB, Pandas, and Polars decompress automatically via read_parquet(). Non-data endpoints return application/json.

Response Type	Content-Type	Endpoints
Parquet stream	application/octet-stream	/v1/sample/, /v1/sp500/, /v1/pro/, /v1/full/
JSON	application/json	/health, /v1/me, /v1/manifest, /v1/usage
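Because data responses arrive as application/octet-stream, the Content-Type alone does not confirm that a saved file is really Parquet (for example, when a tool writes an error body to the output path). A minimal sanity check, assuming only the Parquet file format's rule that a file begins and ends with the 4-byte magic `PAR1`; the function name `looks_like_parquet` is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_parquet(path: str) -> bool:
    """Return True if the file begins and ends with the Parquet magic bytes."""
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False  # too short to hold both magic markers
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This only checks the framing bytes; a truncated download would still fail the tail check, but a full validation is left to the Parquet reader.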

Plans

Your token's plan determines which bucket you can access. A higher plan grants access to all lower tiers as well.

Plan	Auth Required	Bucket	Coverage
sample	No	R2_SAMPLE	Public 5-year S&P500 slice
sp500	Yes	R2_SP500	Full S&P500 history 1993–present
pro	Yes	R2_PRO	Active + delisted US universe (19,000+ entities), 15-year history (2011→present)
full	Yes	R2_FULL	Institutional tier: US + foreign issuers, 1993→present, intraday accepted_at, webhooks, redistribution license
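The tier hierarchy can be mirrored client-side to avoid requests that would return 403. A small sketch, assuming the ordering sample < sp500 < pro < full implied by the table; `PLAN_ORDER`, `can_access`, and `accessible_tiers` are illustrative names, and the gateway remains the authority on what a token may read.

```python
from typing import List

# Lowest to highest tier, following the order of the Plans table (assumption).
PLAN_ORDER = ["sample", "sp500", "pro", "full"]


def can_access(token_plan: str, tier: str) -> bool:
    """True if a token on `token_plan` may read the `tier` bucket."""
    return PLAN_ORDER.index(token_plan) >= PLAN_ORDER.index(tier)


def accessible_tiers(token_plan: str) -> List[str]:
    """All tiers readable by a token on `token_plan`, lowest first."""
    return PLAN_ORDER[: PLAN_ORDER.index(token_plan) + 1]
```

An unknown plan name raises `ValueError` from `list.index`, which is usually what a caller wants for a typo.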

Endpoints

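The gateway exposes 9 endpoints, all GET:

Method	Path	Auth	Category
GET	/health	Public	System
GET	/v1/me	Bearer token	Auth
GET	/v1/manifest	Bearer token	Discovery
GET	/v1/sample/manifest	Public	Sample (No Auth)
GET	/v1/sample/{table}	Public	Sample (No Auth)
GET	/v1/sp500/{table}	Bearer token	Data
GET	/v1/pro/{table}	Bearer token	Data
GET	/v1/full/{table}	Bearer token	Data
GET	/v1/usage	Bearer token	Analytics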

Python Example

Download a Parquet table and query it locally with DuckDB in about a dozen lines.

quickstart.py

python

import duckdb
import requests

token = "YOUR_TOKEN"
url = "https://data.valuein.biz/v1/sp500/fact"

# Stream the response so the whole table is never held in memory.
r = requests.get(url, headers={"Authorization": f"Bearer {token}"}, stream=True)
r.raise_for_status()

# Write the Parquet bytes to disk in 8 KiB chunks.
with open("fact.parquet", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)

# Query the local file; DuckDB handles the ZSTD decompression.
conn = duckdb.connect()
df = conn.execute(
    "SELECT * FROM read_parquet('fact.parquet') LIMIT 5"
).df()
print(df)

Available Tables

14 core tables ship on every tier. Pass any table name as the {table} path segment. Institutional adds 6 smart-money tables (insider Forms 3/4/5/144 and 13F/13D/13G holdings): insider_party, insider_filing, insider_transaction, institutional_filing, institutional_holding, insider_ownership. See Parquet Schema Reference for full field definitions.

Table	Description
`references`	Derived flat join of entity + security. One row per security. Start here for cross-company queries; JOIN index_membership on cik = cik for membership filters.
`entity`	Company profiles: name, sector, SIC code, location, CEO, founding year, description. One row per CIK.
`security`	Exchange listings: ticker, exchange, FIGI, valid date range (SCD Type 2). Multiple rows per company.
`filing`	SEC EDGAR filing index: accession ID, form type, filing date, acceptance timestamp. Links entity to facts.
`fact`	111M+ financial data points: XBRL concept values with accepted_at timestamps for PIT accuracy.
`valuation`	Pipeline-computed DCF and DDM intrinsic values with WACC and growth rate assumptions.
`ratio`	Pipeline-computed financial ratios per entity per fiscal period. Carries an append-only accepted_at vintage, so ratios are point-in-time reconstructable like fundamentals. Filter by category for grouped screens.
`factor_scores`	Cross-sectional factor scores (value, quality, momentum families) per entity — powers universe screens.
`earnings_signals`	Per-entity earnings signals: surprises, revision momentum, and report-timing features.
`stock_price`	Latest end-of-day close per entity. Available on every tier.
`stock_price_daily`	Full daily OHLCV bar series with adjusted_close and corporate-action factors — join to fundamentals for PIT valuation multiples.
`index_membership`	Historical index constituents (SP500, RUSSELL1000, RUSSELL2000, RUSSELL3000) with effective_date / removal_date for PIT universe construction. Keys on cik.
`taxonomy_guide`	Mapping of 292 standard_concept labels to raw XBRL tags and human-readable descriptions.
`standard_concept`	The standardization layer itself: every canonical concept with definition, statement type, and CPA review status.
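Since an unknown table name returns 400, it can help to validate the `{table}` segment before sending a request. A minimal sketch under stated assumptions: the set below copies the 14 core tables from this page, the helper name `table_url` is hypothetical, and the institutional-only tables are deliberately left out. The live list from /v1/manifest is the better source when available.

```python
BASE_URL = "https://data.valuein.biz"
DATA_PLANS = ("sample", "sp500", "pro", "full")

# The 14 core tables listed above (institutional-only tables omitted).
CORE_TABLES = frozenset({
    "references", "entity", "security", "filing", "fact", "valuation",
    "ratio", "factor_scores", "earnings_signals", "stock_price",
    "stock_price_daily", "index_membership", "taxonomy_guide",
    "standard_concept",
})


def table_url(plan: str, table: str) -> str:
    """Build the data URL for a table, rejecting unknown plans or tables."""
    if plan not in DATA_PLANS:
        raise ValueError(f"unknown plan: {plan!r}")
    if table not in CORE_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # No trailing slash: the path segment is the bare table name.
    return f"{BASE_URL}/v1/{plan}/{table}"
```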

Manifest Response

Call GET /v1/manifest to discover available tables and the current snapshot timestamp for your plan. Check this before downloading tables to detect updates.

json

{
  "snapshot": "snapshot_20260704",
  "last_updated": "2026-07-04T03:39:00Z",
  "tables": ["earnings_signals", "entity", "fact", "factor_scores",
             "filing", "index_membership", "ratio", "references",
             "security", "standard_concept", "stock_price",
             "stock_price_daily", "taxonomy_guide", "valuation"]
}
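One way to detect updates is to remember the last snapshot identifier locally and compare it with each new manifest. The sketch below assumes the manifest has already been parsed into a dict with the `snapshot` and `last_updated` keys shown above; the state-file layout and the function names are hypothetical choices, not part of the API.

```python
import json
from pathlib import Path
from typing import Optional


def last_seen_snapshot(state_file: Path) -> Optional[str]:
    """Return the snapshot id recorded by a previous sync, if any."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text())["snapshot"]


def needs_download(manifest: dict, state_file: Path) -> bool:
    """True when the manifest's snapshot differs from the recorded one."""
    return manifest["snapshot"] != last_seen_snapshot(state_file)


def record_snapshot(manifest: dict, state_file: Path) -> None:
    """Persist the snapshot id; call this only after downloads succeed."""
    state = {
        "snapshot": manifest["snapshot"],
        "last_updated": manifest["last_updated"],
    }
    state_file.write_text(json.dumps(state))
```

Recording the snapshot only after a successful download means a failed sync is retried on the next run rather than silently skipped.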

Rate Limits & Retries

Limits are per Bearer token and enforced at the Cloudflare edge. Every response includes the standard rate-limit headers — read them before retrying so your client never busy-waits.

Response headers on every request

http

X-RateLimit-Limit:     120
X-RateLimit-Remaining: 117
X-RateLimit-Reset:     1735680000
Retry-After:           42

Retry-After appears only on 429 responses (seconds). X-RateLimit-Reset is the Unix epoch when the window rolls over.
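A client can use these headers to pace itself before it ever hits a 429. A minimal sketch, assuming the header names and units described above; `wait_before_next_request` is a hypothetical helper, and the lookup is case-sensitive, so pass a mapping that preserves these exact names (or a case-insensitive one such as the headers object from `requests`).

```python
import time
from typing import Mapping, Optional


def wait_before_next_request(
    headers: Mapping[str, str], now: Optional[float] = None
) -> float:
    """Seconds to sleep before the next request; 0.0 if quota remains."""
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0
    now = time.time() if now is None else now
    reset_epoch = float(headers["X-RateLimit-Reset"])  # Unix epoch seconds
    # Never return a negative wait if the window has already rolled over.
    return max(0.0, reset_epoch - now)
```

The `now` parameter exists so the calculation can be checked against a fixed clock; production callers can omit it.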

Recommended retry policy

429 — sleep for Retry-After seconds, then retry. Never retry sooner.
503 — exponential backoff: 1s, 2s, 4s. Cap at 3 retries.
5xx other — retry once after 1s, then surface the error.
4xx other — never retry. Fix the request.
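The policy above can be expressed as a single decision function. This is a sketch of one reading of those four rules, not a reference implementation: the name `retry_delay` is hypothetical, Retry-After is assumed to be given in seconds as stated on this page, and giving up on a 429 that lacks Retry-After is an assumption, since the page does not cover that case.

```python
from typing import Optional


def retry_delay(
    status: int, attempt: int, retry_after: Optional[str] = None
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to stop."""
    if status == 429:
        # Sleep exactly as long as the gateway asks; never sooner.
        return float(retry_after) if retry_after is not None else None
    if status == 503:
        # Exponential backoff 1s, 2s, 4s, capped at 3 retries.
        return float(2 ** (attempt - 1)) if attempt <= 3 else None
    if 500 <= status < 600:
        # Other 5xx: one retry after 1s, then surface the error.
        return 1.0 if attempt == 1 else None
    # Other 4xx (and anything else): do not retry.
    return None
```

A caller would loop, incrementing `attempt` after each failed response, and stop as soon as the function returns `None`.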

curl with retry-aware streaming

bash

# --retry honors Retry-After on 429 and exponentially backs off on 5xx.
# --retry-max-time bounds total elapsed time; --retry-connrefused covers
# the cold-start case after a cache eviction.
$ curl https://data.valuein.biz/v1/sp500/fact \
    -H "Authorization: Bearer $VALUEIN_TOKEN" \
    --retry 5 --retry-max-time 120 \
    --retry-connrefused \
    --output fact.parquet

Parquet responses are a single binary stream — no pagination, no cursor tokens. The gateway sets Content-Length on every data response so clients can show progress and pre-allocate buffers. For incremental reads, query the latest snapshot from /v1/manifest and download only when the timestamp changes.
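Since Content-Length is set on data responses, a download loop can report progress and verify that the stream was not cut short. A minimal sketch that works on any iterable of byte chunks; `copy_with_progress` and its `report` callback are hypothetical names, and the length check assumes the chunks are the raw body bytes with no transport-level decoding applied.

```python
from typing import BinaryIO, Callable, Iterable


def copy_with_progress(
    chunks: Iterable[bytes],
    total: int,
    out: BinaryIO,
    report: Callable[[int, int], None],
) -> int:
    """Write chunks to `out`, reporting (written, total) after each chunk."""
    written = 0
    for chunk in chunks:
        out.write(chunk)
        written += len(chunk)
        report(written, total)
    if written != total:
        # A mismatch usually means the connection dropped mid-stream.
        raise IOError(f"expected {total} bytes, received {written}")
    return written
```

With `requests`, the call could look like `copy_with_progress(r.iter_content(8192), int(r.headers["Content-Length"]), f, report)`, where `r` is a streamed response and `f` an open binary file.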

Bulk Data API vs MCP Server vs Python SDK

Three channels, one Bearer token, same warehouse. Pick by access pattern.

Channel	Best for	Returns	Latency
Bulk Data API	Loading full tables into DuckDB, Spark, or a warehouse. Periodic syncs.	ZSTD Parquet (full table)	Edge stream · MB-scale
MCP Server	Single-fact lookups from AI agents. Conversational queries.	JSON tool responses	Sub-100ms typical
Python SDK	DataFrame-shaped queries from notebooks and scripts.	DuckDB over R2 Parquet · as-filed vs latest restatement columns	Local DuckDB · ms

Use the SDK first if you're writing Python — it wraps this API and the MCP server with sensible defaults. Use the raw Bulk Data API when you're in a non-Python stack or building a partner integration.

Error Codes

Status	Meaning	Common Cause
200 OK	Success	Request succeeded. Parquet bytes or JSON body in the response.
400 Bad Request	Invalid table	The table name in the path is not in the valid tables list. Check spelling and trailing slashes.
401 Unauthorized	Missing or invalid token	No Authorization header, malformed Bearer token, or token not found in KV store.
403 Forbidden	Plan too low	Your token exists but its plan does not grant access to this bucket (e.g. sample token accessing /v1/sp500/).
429 Too Many Requests	Rate limit exceeded	You have exceeded your daily request quota. Resets at UTC midnight. Upgrade to a higher plan for higher limits.
503 Service Unavailable	Snapshot loading	The R2 snapshot is being refreshed. Retry after 30–60 seconds. This is rare and brief.

Get your API token

Subscribe to the free S&P500 plan, Pro, or Institutional to receive a Bearer token instantly. The sample tier is always free — no credit card required.
