Skip to content

valuein/quants

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

193 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Valuein Logo

PyPI version Python 3.10+ License CI

πŸ’Ž Valuein Python SDK: Frictionless Financial Data

A high-performance toolkit for querying point-in-time US fundamentals from SEC EDGAR, built for quants, analysts, and data engineers.

The Valuein SDK, is a complete infrastructure solution for consuming point-in-time accurate US Core financial fundamentals (facts) on your daily workflow. Whether you are building complex asynchronous Python pipelines, performing forensic financial research in Excel, or executing templated SQL, this library provides frictionless, zero-setup access to institutional-grade data.

The Data Engine

Powered by survivorship-bias-free data containing 12M+ filings and 108M+ facts from 10-Ks, 10-Qs, 8-Ks, 20-Fs, and amendments across 10,000+ active and delisted US companies since 1990.

Why use this toolkit?

⚑ Lightning-Fast Python SDK: Execute blazing-fast queries against remote Parquet files hosted on R2, powered entirely by DuckDB under the hood. No database setup, no massive local downloads.

πŸ“Š Excel & Power Query Ready: Not a Python developer? Fetch, transform, and analyze the data directly within Excel spreadsheets using our provided integrations.

πŸ› οΈ Plug-and-Play SQL Templates: Skip the boilerplate. Use our pre-built SQL templates to immediately extract insights, calculate intrinsic values, or model standardized financial statements.

πŸ“š Comprehensive Context: Deep-dive documentation mapping out table schemas, primary keys, and field definitions to support your specific financial research use cases.

πŸš€ Why Valuein Data and SDK Library

Easy of use and intelligence.

Feature Benefit
πŸ•’ Point‑in‑Time Data Eliminate look‑ahead bias in backtests
βš–οΈ Survivorship‑Bias Free Includes bankrupt, delisted, and acquired firms
πŸ“Š Standardized Concepts 15k+ XBRL tags mapped to ~150 canonical financial concepts
πŸš€ DuckDB SQL Engine Millisecond analytics directly in Python
☁️ Cloud Parquet Streaming No local data downloads required
🧩 Financial Templates Production‑ready investment signals

🧠 What You Can Do With This Repository

Use Case Who Where to Start
Query financial data via Python Quants, data engineers Quickstart
Run 39 pre-built financial signals Analysts, quants SQL Templates
Learn with interactive notebooks Students, new users Examples & Notebooks
Pull data into Excel Financial analysts Excel Integration
Prove data quality to stakeholders Institutional buyers, compliance Research & Quality Proofs
Read methodology and compliance docs Due diligence, enterprise Documentation
Contribute templates, examples, research Open-source contributors Contributing

⚑ Quickstart

1. Install Package

pip install valuein-sdk

or

uv add valuein-sdk

πŸ”‘ 2. Get Your API Token

Data Plan Coverage Price Get Access
Free S&P 500 coverage
Full history
Active & inactive companies
Free Register
Pro Full universe of US stocks
US core fundamentals
Point-in-time & survivorship-bias free
$200 / month Subscribe
Pro (Annual) Same as Pro plan $1920 / year Subscribe

πŸ” 3. Set Your API Token

export VALUEIN_API_KEY="your_token"

or

echo 'VALUEIN_API_KEY="your_token"' >> .env

▢️ 4. Run Your First Query

Include API reference

The ValueinClient handles authentication, table discovery, and local caching in a high-performance DuckDB instance.

The Recommended Way (Context Manager)

Using a with statement ensures that temporary files and database connections are closed automatically, even if your script crashes.

from valuein_sdk import ValueinClient

try:
    # load specific tables OR omit to load all tables
    with ValueinClient() as client:    
        client.health()     # dict: ok, connection status (no token required).
        client.me()         # dict: plan, status, email, createdAt
        client.manifest()   # dict: snapshot, last_updated, tables
        client.tables()     # List loaded table names
        
        sql = "SELECT COUNT(cik) FROM entity"   # Build SQL query
        result_df = client.query(sql)           # Execute it with DuckDB β†’ pandas DataFrame
        f_df = client.get(table="filing")       # Download full table β†’ pandas DataFrame
        
        name, tickers = "01_fundamentals_by_ticker", "'AAPL','MSFT'"
        df = client.run_template(name, tickers) # Named SQL template β†’ pandas DataFrame
except Exception as e:
    # Catch-all for unexpected errors (network issues, API errors, etc.)
    # Or catch-them individually, more details in the error handling section at the bottom.
    print(f"Unexpected error while initializing ValueinClient: {e}")

Note

The first time it runs it will take a minute as it's fetching all tables.


πŸ“Š Excel Integration

Stream live SEC fundamental data directly into your spreadsheets. No Python, or complex scripts, just pure data power.

⚑ Quick Start

Get up and running in less than 60 seconds.

  1. Download Template: Get the valuein-fundamentals.xlsx workbook.
  2. Authorize: Open the workbook and enter your API token in the Connectivity Guide sheet.
  3. Sync Data: Click Data > Refresh All.

Tip

Data streams directly from Parquet files on Cloudflare R2, ensuring high-speed transfers and minimal local overhead.

πŸ›  Requirements

  • Microsoft 365 (Build 16.0.17531 or later)
  • Active API Token

πŸ“‚ Pre-Configured Sheets

The workbook includes 8 high-performance sheets ready for analysis:

  • Financials: Income Statement, Balance Sheet, Cash Flow
  • Metadata: Entities, Securities, Filings
  • Reference: Index Membership, Data Dictionary

πŸ§‘β€πŸ’» Developer Customization

For those who prefer a "Do It Yourself" approach, the raw M-language source files for Power Query are available in the excel/power-query/ directory. You can use these to build custom connections in your existing workbooks.

πŸ“– Need more help? View the Full Setup Walkthrough.


πŸ—‚οΈ Data Schema

Find more information regarding all tables, and it fields in docs/schema.json.

Table Description Records
entity Company metadata 19K+
security Ticker history 7K+
filing Filing metadata 12M+
fact Financial statement facts 108M+
taxonomy_guide 2026 US GAAP Taxonomy Guide 11,966
index_membership Historical index membership (work-in-progress) 8K+

πŸ”— Key Joins

security.entity_id              β†’  entity.cik
filing.entity_id                β†’  entity.cik
fact.entity_id                  β†’  entity.cik
fact.accession_id               β†’  filing.accession_id
index_membership.security_id    β†’  security.id

🏷️ Standard Concept Names

Note

Raw XBRL tags (11,966 unique) are normalized to canonical standard_concept values. We standardized the most used 3,200 concepts from the US GAAP Taxonomy Code which allows to categorize 95% of all facts, the rest has the 'Other' category.

Both the raw concept tag (xbrl_tag) and the normalized standard_concept are on the fact table β€” no join to a separate mapping table needed.

πŸ“… Date Columns Reference

Column Table Use for
report_date / period_end filing / fact Aligning to fiscal calendar
filing_date filing PIT backtest filter β€” when the SEC received the filing
knowledge_at fact Millisecond-precision PIT for intraday signal research

🧩 Template Categories

Range Category Examples
01–04 Data Access Fundamentals by ticker, FIGI lookup, peer comparison, survivorship-bias-free screen
05–09 Income Statement YoY revenue growth, TTM, margin analysis, FCF, R&D intensity
10–15 Balance Sheet Liquidity, solvency, interest coverage, cash conversion, capex ratios
16–20 Investment Scores DuPont, Piotroski F-Score, Altman Z-Score, accruals anomaly
21–26 Valuation & Screening Sector aggregates, peer ranking, dilution, arbitrage signals
27–33 Short Signals Late filers, restatements, 8-K events, ghost companies
34–39 Advanced Analytics PIT backtest engine, Z-score outliers, seasonality, XBRL audit

See valuein_sdk/queries/SQL_CHEATSHEET.md for the full template reference.

πŸ“š Documentation

Document Description Format
docs/METHODOLOGY.md Data sourcing, PIT architecture, restatement handling, XBRL normalization logic Markdown
docs/COMPLIANCE_AND_DDQ.md Data provenance, MNPI policy, PIT integrity, security, SLA summary Markdown
docs/SLA.md Uptime targets, data freshness SLAs, support response times, SLA credits Markdown
docs/excel-guide.md Full Excel / Power Query setup walkthrough Markdown
docs/DATA_CATALOG.xlsx All columns, types, definitions, sample values Excel
docs/schema.json Machine-readable JSON schema JSON

🐍 Python Examples

Standalone Python scripts and four Jupyter notebooks, designed to go from install to insight in under 3 minutes.*

Ticker lookup example

Run any SQL against the data lake. No downloads. No local database. DuckDB executes your queries in-process.

from valuein_sdk import ValueinClient

client = ValueinClient(tables=["entity", "security"])
# This client only fetch these 2 tables, making it faster!

df = client.query("""
    SELECT e.cik, e.name, e.sector, e.status,
           s.symbol, s.exchange
    FROM   security s
    JOIN   entity   e ON s.entity_id = e.cik
    WHERE  s.symbol = 'AAPL' AND s.is_active = TRUE
""")
print(df)

You are now querying SEC financial statements directly from the cloud.

Python scripts (examples/python/)

Script Level What it demonstrates
getting_started.py Beginner Auth check, first query, entity counts by sector
usage.py Reference Every public SDK method demonstrated
entity_screening.py Beginner Screen by sector, SIC code, active vs inactive status
financial_analysis.py Intermediate Revenue trends, margins, concept normalization, peer comparison
pit_backtest.py Intermediate Correct PIT discipline, restatement impact, filing_date vs report_date
survivorship_bias.py Intermediate Delisted/bankrupt companies, index_membership, bias quantification

Jupyter notebooks (examples/notebooks/)

Notebook Open in Colab
Quickstart Open in Colab
Fundamental Analysis Open in Colab
PIT Backtest Open in Colab
Survivorship Bias Open in Colab

πŸ›‘οΈ Error Handling

from valuein_sdk import (
    ValueinAuthError,      # HTTP 401/403 β€” invalid or expired token
    ValueinPlanError,      # HTTP 403 β€” endpoint requires a higher plan
    ValueinNotFoundError,  # HTTP 404 β€” no table found 
    ValueinRateLimitError, # HTTP 429 β€” includes .retry_after (seconds)
    ValueinAPIError,       # HTTP 5xx β€” includes .status_code
    ValueinClient
)

client = None
try:
    client = ValueinClient()
    df = client.query("SELECT * FROM fact LIMIT 1000000")
except ValueinAuthError:
    print("Check your VALUEIN_API_KEY. It might be expired or invalid.")
except ValueinPlanError:
    print("This requires a higher-tier plan. Upgrade at valuein.biz.")
except ValueinRateLimitError as e:
    print(f"Slow down! Retry allowed in {e.retry_after}s.")
except ValueinNotFoundError as e:
    print(f"That table or endpoint doesn't exist: {e}")
except ConnectionError as e:
    print(f"Physical network issue: {e}")
except ValueinAPIError as e:
    print(f"The Gateway is having a bad day (Status {e.status_code}).")
except Exception as e:
    print(f"Non-SDK error (Python/Logic): {e}")
finally:
    # Always close manually if not using a context manager and if a client was created.
    if client is not None:
        client.close()

πŸ”¬ Research & Quality Proofs

16 runnable research modules that prove every data quality claim with code. Designed for institutional due diligence and quantitative research.

# Install research dependencies
uv sync --group research

# Run a proof
python research/quantitative/pit_correctness_proof.py
python research/quality_proof/balance_sheet_check.py

πŸ“Š Research Modules

research/fundamental/ β€” Financial statement analysis workflows

  • Income statement, balance sheet, cash flow, DuPont decomposition, Altman Z-Score

research/quantitative/ β€” Factor model and strategy research

  • PIT correctness proof, survivorship bias quantification, restatement tracking as short signal, sector rotation

research/data_engineering/ β€” XBRL normalization and pipeline analysis

  • Concept mapping explorer, taxonomy coverage, filing timeline, data freshness by sector

research/quality_proof/ β€” Automated data quality validation

  • Zero PIT violations, balance sheet equation check (Assets = Liabilities + Equity within 1%), coverage report, SEC cross-reference spot-check

See research/README.md for a full breakdown of what each module proves and the key metric it validates.


🀝 Contributing

We welcome contributions including SQL templates, notebooks, scripts, research modules, and documentation improvements.

See CONTRIBUTING.md for code standards, naming conventions, and the PR process.

πŸ“„ License

Apache-2.0 License β€” see LICENSE.

Disclosure: This repository is for research and educational purposes only and does not constitute financial advice.

About

We provide documentation and free resources for 12M+ SEC filings and 105M+ raw facts. This is a survivor-bias free and Point-in-Time dataset.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors