A high-performance toolkit for querying point-in-time US fundamentals from SEC EDGAR, built for quants, analysts, and data engineers.
The Valuein SDK is a complete infrastructure solution for consuming point-in-time-accurate US core financial fundamentals (facts) in your daily workflow. Whether you are building complex asynchronous Python pipelines, performing forensic financial research in Excel, or executing templated SQL, this library provides frictionless, zero-setup access to institutional-grade data.
Powered by survivorship-bias-free data containing 12M+ filings and 108M+ facts from 10-Ks, 10-Qs, 8-Ks, 20-Fs, and amendments across 10,000+ active and delisted US companies since 1990.
Lightning-Fast Python SDK: Execute blazing-fast queries against remote Parquet files hosted on R2, powered entirely by DuckDB under the hood. No database setup, no massive local downloads.
Excel & Power Query Ready: Not a Python developer? Fetch, transform, and analyze the data directly within Excel spreadsheets using our provided integrations.
Plug-and-Play SQL Templates: Skip the boilerplate. Use our pre-built SQL templates to immediately extract insights, calculate intrinsic values, or model standardized financial statements.
Comprehensive Context: Deep-dive documentation mapping out table schemas, primary keys, and field definitions to support your specific financial research use cases.
Ease of use and intelligence.
| Feature | Benefit |
|---|---|
| Point-in-Time Data | Eliminate look-ahead bias in backtests |
| Survivorship-Bias Free | Includes bankrupt, delisted, and acquired firms |
| Standardized Concepts | 15k+ XBRL tags mapped to ~150 canonical financial concepts |
| DuckDB SQL Engine | Millisecond analytics directly in Python |
| Cloud Parquet Streaming | No local data downloads required |
| Financial Templates | Production-ready investment signals |
| Use Case | Who | Where to Start |
|---|---|---|
| Query financial data via Python | Quants, data engineers | Quickstart |
| Run 39 pre-built financial signals | Analysts, quants | SQL Templates |
| Learn with interactive notebooks | Students, new users | Examples & Notebooks |
| Pull data into Excel | Financial analysts | Excel Integration |
| Prove data quality to stakeholders | Institutional buyers, compliance | Research & Quality Proofs |
| Read methodology and compliance docs | Due diligence, enterprise | Documentation |
| Contribute templates, examples, research | Open-source contributors | Contributing |
1. Install Package
```bash
pip install valuein-sdk
```

or

```bash
uv add valuein-sdk
```

| Data Plan | Coverage | Price | Get Access |
|---|---|---|---|
| Free | S&P 500 coverage, full history, active & inactive companies | Free | Register |
| Pro | Full universe of US stocks, US core fundamentals, point-in-time & survivorship-bias free | $200 / month | Subscribe |
| Pro (Annual) | Same as Pro plan | $1,920 / year | Subscribe |
```bash
export VALUEIN_API_KEY="your_token"
```

or

```bash
echo 'VALUEIN_API_KEY="your_token"' >> .env
```
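If you want a script to fail fast with a clear message before any network call, you can resolve the token yourself. This is a minimal sketch: `load_api_key` is a hypothetical helper (the SDK reads `VALUEIN_API_KEY` on its own); it only mirrors that lookup, preferring the environment over a `.env` file.

```python
import os
from typing import Optional

def load_api_key(env_file: str = ".env") -> Optional[str]:
    """Return the Valuein API token, preferring the environment over a .env file.

    Hypothetical helper -- the SDK performs this lookup itself; use this only
    when you want to validate configuration before constructing a client.
    """
    key = os.environ.get("VALUEIN_API_KEY")
    if key:
        return key
    try:
        with open(env_file) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith("VALUEIN_API_KEY="):
                    # Strip optional surrounding quotes from the value
                    return line.split("=", 1)[1].strip().strip('"').strip("'")
    except FileNotFoundError:
        pass
    return None
```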
The ValueinClient handles authentication, table discovery, and local caching in a high-performance DuckDB instance.
The Recommended Way (Context Manager)
Using a with statement ensures that temporary files and database connections are closed automatically, even if your script crashes.
```python
from valuein_sdk import ValueinClient

try:
    # Load specific tables, or omit the argument to load all tables
    with ValueinClient() as client:
        client.health()    # dict: ok, connection status (no token required)
        client.me()        # dict: plan, status, email, createdAt
        client.manifest()  # dict: snapshot, last_updated, tables
        client.tables()    # list of loaded table names

        sql = "SELECT COUNT(cik) FROM entity"  # Build a SQL query
        result_df = client.query(sql)          # Execute it with DuckDB -> pandas DataFrame

        f_df = client.get(table="filing")      # Download a full table -> pandas DataFrame

        name, tickers = "01_fundamentals_by_ticker", "'AAPL','MSFT'"
        df = client.run_template(name, tickers)  # Named SQL template -> pandas DataFrame
except Exception as e:
    # Catch-all for unexpected errors (network issues, API errors, etc.),
    # or catch them individually -- see the error-handling section below.
    print(f"Unexpected error while initializing ValueinClient: {e}")
```

Note

The first run takes about a minute while all tables are fetched.
Stream live SEC fundamental data directly into your spreadsheets. No Python, no complex scripts: just pure data power.
Get up and running in less than 60 seconds.
- Download Template: Get the `valuein-fundamentals.xlsx` workbook.
- Authorize: Open the workbook and enter your API token in the Connectivity Guide sheet.
- Sync Data: Click `Data > Refresh All`.
Tip
Data streams directly from Parquet files on Cloudflare R2, ensuring high-speed transfers and minimal local overhead.
- Microsoft 365 (Build 16.0.17531 or later)
- Active API Token
The workbook includes 8 high-performance sheets ready for analysis:
- Financials: Income Statement, Balance Sheet, Cash Flow
- Metadata: Entities, Securities, Filings
- Reference: Index Membership, Data Dictionary
For those who prefer a "Do It Yourself" approach, the raw M-language source files for Power Query are available in the excel/power-query/ directory. You can use these to build custom connections in your existing workbooks.
π Need more help? View the Full Setup Walkthrough.
Find more information on all tables and their fields in docs/schema.json.
| Table | Description | Records |
|---|---|---|
| entity | Company metadata | 19K+ |
| security | Ticker history | 7K+ |
| filing | Filing metadata | 12M+ |
| fact | Financial statement facts | 108M+ |
| taxonomy_guide | 2026 US GAAP Taxonomy Guide | 11,966 |
| index_membership | Historical index membership (work-in-progress) | 8K+ |
```
security.entity_id         → entity.cik
filing.entity_id           → entity.cik
fact.entity_id             → entity.cik
fact.accession_id          → filing.accession_id
index_membership.security_id → security.id
```
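The join keys above can be exercised end-to-end. The sketch below builds a tiny synthetic copy of the schema in in-memory SQLite purely for illustration (the row values are made up); against the real lake you would pass the same SELECT to `ValueinClient.query()`.

```python
import sqlite3

# Miniature, synthetic mirror of the join keys listed above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE entity   (cik INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE security (id INTEGER PRIMARY KEY, entity_id INTEGER, symbol TEXT);
CREATE TABLE filing   (accession_id TEXT PRIMARY KEY, entity_id INTEGER, filing_date TEXT);
CREATE TABLE fact     (entity_id INTEGER, accession_id TEXT, standard_concept TEXT, value REAL);

INSERT INTO entity   VALUES (320193, 'Apple Inc.');
INSERT INTO security VALUES (1, 320193, 'AAPL');
INSERT INTO filing   VALUES ('0000320193-24-000001', 320193, '2024-02-02');
INSERT INTO fact     VALUES (320193, '0000320193-24-000001', 'Revenue', 119575000000.0);
""")

# Walk every relationship: fact -> filing, fact -> entity, security -> entity.
rows = con.execute("""
    SELECT e.name, s.symbol, fi.filing_date, f.standard_concept, f.value
    FROM fact f
    JOIN filing fi  ON f.accession_id = fi.accession_id
    JOIN entity e   ON f.entity_id    = e.cik
    JOIN security s ON s.entity_id    = e.cik
""").fetchall()
print(rows)
```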
Note

Raw XBRL tags (11,966 unique) are normalized to canonical `standard_concept` values. We standardized the ~3,200 most-used concepts from the US GAAP Taxonomy, which covers 95% of all facts; the rest fall into the 'Other' category. Both the raw `concept` tag (`xbrl_tag`) and the normalized `standard_concept` are on the `fact` table, so no join to a separate mapping table is needed.
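Conceptually, the normalization described in the note behaves like a lookup with an 'Other' fallback. The mapping entries below are invented for illustration; the real assignment already ships on the `fact` table as `standard_concept`.

```python
# Hypothetical sketch of tag normalization: raw XBRL tags map to canonical
# concepts, and any unmapped tag falls into 'Other'. These entries are made
# up for illustration -- the shipped data carries the real mapping.
CANONICAL = {
    "Revenues": "Revenue",
    "RevenueFromContractWithCustomerExcludingAssessedTax": "Revenue",
    "NetIncomeLoss": "NetIncome",
}

def standardize(xbrl_tag: str) -> str:
    """Return the canonical concept for a raw XBRL tag, or 'Other'."""
    return CANONICAL.get(xbrl_tag, "Other")
```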
| Column | Table | Use for |
|---|---|---|
| `report_date` / `period_end` | `filing` / `fact` | Aligning to fiscal calendar |
| `filing_date` | `filing` | PIT backtest filter: when the SEC received the filing |
| `knowledge_at` | `fact` | Millisecond-precision PIT for intraday signal research |
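The PIT rule in the table reduces to one predicate: a fact is usable only once its `filing_date` (or `knowledge_at`) is at or before your as-of date. A toy illustration with synthetic rows; on the real data the same condition goes into your SQL WHERE clause.

```python
from datetime import date

# Synthetic fact rows, invented for illustration only.
facts = [
    {"standard_concept": "Revenue", "period_end": date(2023, 12, 31),
     "filing_date": date(2024, 2, 2), "value": 119_575.0},
    {"standard_concept": "Revenue", "period_end": date(2024, 3, 31),
     "filing_date": date(2024, 5, 3), "value": 90_753.0},
]

def known_as_of(rows, as_of):
    """Keep only rows whose filing had reached the SEC by `as_of`."""
    return [r for r in rows if r["filing_date"] <= as_of]

# As of 2024-03-01, only the FY2023 filing had been received.
print(known_as_of(facts, date(2024, 3, 1)))
```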
| Range | Category | Examples |
|---|---|---|
| 01–04 | Data Access | Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen |
| 05–09 | Income Statement | YoY revenue growth, TTM, margin analysis, FCF, R&D intensity |
| 10–15 | Balance Sheet | Liquidity, solvency, interest coverage, cash conversion, capex ratios |
| 16–20 | Investment Scores | DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly |
| 21–26 | Valuation & Screening | Sector aggregates, peer ranking, dilution, arbitrage signals |
| 27–33 | Short Signals | Late filers, restatements, 8-K events, ghost companies |
| 34–39 | Advanced Analytics | PIT backtest engine, Z-score outliers, seasonality, XBRL audit |
See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.
| Document | Description | Format |
|---|---|---|
| `docs/METHODOLOGY.md` | Data sourcing, PIT architecture, restatement handling, XBRL normalization logic | Markdown |
| `docs/COMPLIANCE_AND_DDQ.md` | Data provenance, MNPI policy, PIT integrity, security, SLA summary | Markdown |
| `docs/SLA.md` | Uptime targets, data freshness SLAs, support response times, SLA credits | Markdown |
| `docs/excel-guide.md` | Full Excel / Power Query setup walkthrough | Markdown |
| `docs/DATA_CATALOG.xlsx` | All columns, types, definitions, sample values | Excel |
| `docs/schema.json` | Machine-readable JSON schema | JSON |
Standalone Python scripts and four Jupyter notebooks, designed to take you from install to insight in under 3 minutes.
Run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process.
```python
from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])
# This client fetches only these two tables, making startup faster!

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM security s
    JOIN entity e ON s.entity_id = e.cik
    WHERE s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)
```

You are now querying SEC financial statements directly from the cloud.
| Script | Level | What it demonstrates |
|---|---|---|
| `getting_started.py` | Beginner | Auth check, first query, entity counts by sector |
| `usage.py` | Reference | Every public SDK method demonstrated |
| `entity_screening.py` | Beginner | Screen by sector, SIC code, active vs inactive status |
| `financial_analysis.py` | Intermediate | Revenue trends, margins, concept normalization, peer comparison |
| `pit_backtest.py` | Intermediate | Correct PIT discipline, restatement impact, filing_date vs report_date |
| `survivorship_bias.py` | Intermediate | Delisted/bankrupt companies, index_membership, bias quantification |
| Notebook | Open in Colab |
|---|---|
| Quickstart | |
| Fundamental Analysis | |
| PIT Backtest | |
| Survivorship Bias | |
```python
from valuein_sdk import (
    ValueinAuthError,       # HTTP 401/403: invalid or expired token
    ValueinPlanError,       # HTTP 403: endpoint requires a higher plan
    ValueinNotFoundError,   # HTTP 404: no such table
    ValueinRateLimitError,  # HTTP 429: includes .retry_after (seconds)
    ValueinAPIError,        # HTTP 5xx: includes .status_code
    ValueinClient,
)

client = None
try:
    client = ValueinClient()
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinAuthError:
    print("Check your VALUEIN_API_KEY. It might be expired or invalid.")
except ValueinPlanError:
    print("This requires a higher-tier plan. Upgrade at valuein.biz.")
except ValueinRateLimitError as e:
    print(f"Slow down! Retry allowed in {e.retry_after}s.")
except ValueinNotFoundError as e:
    print(f"That table or endpoint doesn't exist: {e}")
except ConnectionError as e:
    print(f"Network issue: {e}")
except ValueinAPIError as e:
    print(f"The gateway is having a bad day (status {e.status_code}).")
except Exception as e:
    print(f"Non-SDK error (Python/logic): {e}")
finally:
    # Close manually when not using a context manager,
    # and only if a client was actually created.
    if client is not None:
        client.close()
```

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.
```bash
# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py
```

research/fundamental/: Financial statement analysis workflows
- Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

research/quantitative/: Factor model and strategy research
- PIT correctness proof, survivorship bias quantification, restatement tracking as a short signal, sector rotation

research/data_engineering/: XBRL normalization and pipeline analysis
- Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

research/quality_proof/: Automated data quality validation
- Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check
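The balance sheet equation check boils down to a relative-tolerance comparison. Here is a minimal sketch of that test; the 1% tolerance matches the description above, but the function name is ours, not the module's API.

```python
def balance_sheet_ok(assets: float, liabilities: float, equity: float,
                     tol: float = 0.01) -> bool:
    """Accounting-identity check in the spirit of the quality proofs:
    Assets == Liabilities + Equity, within a relative tolerance (1% default).
    Hypothetical helper name -- see research/quality_proof/ for the real check."""
    if assets == 0:
        return liabilities + equity == 0
    return abs(assets - (liabilities + equity)) / abs(assets) <= tol
```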
See research/README.md for a full breakdown of what each module proves and the key metric it validates.
We welcome contributions including SQL templates, notebooks, scripts, research modules, and documentation improvements.
See CONTRIBUTING.md for code standards, naming conventions, and the PR process.
Apache-2.0 License; see LICENSE.
Disclosure: This repository is for research and educational purposes only and does not constitute financial advice.
