
Conversation


@bhavishya-pohani bhavishya-pohani commented Jan 26, 2026

Summary

Adds FinQA environment for evaluating LLMs on financial question-answering tasks using SEC 10-K filing data.

  • Tool-calling based environment with SQL queries on financial tables
  • 290 questions across multiple companies (Alphabet, Amazon, Apple, etc.)
  • Fuzzy numerical matching for reward computation (handles percentages, fractions, LaTeX formatting)
  • Auto-generated OpenAI tool schemas from function docstrings (see the sketch after this list)
  • HuggingFace data download script
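
For the schema auto-generation, a minimal sketch of the docstring-introspection approach; all names here are hypothetical, and the PR's tool_schema.py may map types and docstrings differently:

```python
# Sketch: derive an OpenAI function-calling schema from a Python function.
import inspect
from typing import get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_schema(fn) -> dict:
    """Build a tool schema from a function's signature and docstring."""
    hints = get_type_hints(fn)
    props, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        if name == "self":
            continue
        props[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value => required argument
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": required},
        },
    }

def sql_query(query: str) -> str:
    """Run a read-only SQL query against the filing's financial tables."""
    ...

print(build_tool_schema(sql_query)["function"]["name"])  # -> sql_query
```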

Features

  • Tools: get_descriptions, get_table_info, sql_query, submit_answer
  • Reward: Binary (1.0 correct, 0.0 incorrect) with 1% tolerance; an illustrative matcher is sketched after this list
  • Data: Downloaded from HuggingFace via download_data.sh
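
As an illustration of how such a fuzzy matcher can work, here is a minimal sketch; parse_number and compute_reward are hypothetical names, and the PR's rewards.py handles more formats (e.g. multiple \boxed{} values and an absolute-difference fallback):

```python
import re

def parse_number(text: str) -> float | None:
    """Extract a float from strings like '12.5%', '$1,234', or '\\boxed{0.125}'."""
    m = re.search(r"-?\d+\.?\d*", text.replace(",", ""))
    if m is None:
        return None
    value = float(m.group())
    if "%" in text:
        value /= 100.0  # normalize '12.5%' to 0.125 so percentages and fractions compare
    return value

def compute_reward(submitted: str, ground_truth: str, rel_tol: float = 0.01) -> float:
    """Binary reward: 1.0 if the numbers match within 1% relative tolerance."""
    a, b = parse_number(submitted), parse_number(ground_truth)
    if a is None or b is None:
        return 0.0
    return 1.0 if abs(a - b) <= rel_tol * max(abs(b), 1e-9) else 0.0
```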

Test Plan

  • Unit tests for reward matching (49 tests passing)
  • Docker build and inference script working

…ference script

  - Add /tools endpoint to expose tool schemas in OpenAI function calling format (a minimal sketch follows this list)
  - Auto-generate tool schemas from function docstrings (tool_schema.py)
  - Add download_data.sh to fetch data from HuggingFace
  - Fix reward computation for multi-value answers (multiple \boxed{} values)
  - Add comprehensive tests for reward matching
  - Remove unused imports, clean up dead code
  - Update README with download instructions
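
A minimal sketch of what such an endpoint can look like, assuming the build_tool_schema helper sketched earlier and a hypothetical tool registry; the PR's FastAPI wiring may differ:

```python
# Hypothetical /tools endpoint; build_tool_schema is the generator sketched above.
from fastapi import FastAPI

app = FastAPI()

def get_descriptions() -> str:
    """List the available financial tables and what each one contains."""
    ...

def submit_answer(answer: str) -> str:
    """Submit the final answer and end the episode."""
    ...

TOOLS = [get_descriptions, submit_answer]  # plus get_table_info, sql_query

@app.get("/tools")
def list_tools() -> list[dict]:
    """Expose tool schemas in OpenAI function-calling format."""
    return [build_tool_schema(fn) for fn in TOOLS]
```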

meta-cla bot commented Jan 26, 2026

Hi @bhavishya-pohani!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


greptile-apps bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

Adds FinQA environment for evaluating LLMs on financial question-answering using SEC 10-K filing data. The environment correctly follows OpenEnv architecture patterns, with client-server separation, reward computation in the server, and a tool-based interaction model.

Key Changes:

  • Tool-calling environment with 4 tools: get_descriptions, get_table_info, sql_query, submit_answer
  • Fuzzy numerical matching for rewards (handles percentages, fractions, LaTeX formatting with 1% tolerance)
  • Auto-generated OpenAI tool schemas from function docstrings via introspection
  • 290 questions across multiple companies from HuggingFace dataset
  • Comprehensive test suite (49 tests) covering various number formats and edge cases; a representative slice is sketched below
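
A hypothetical slice of such tests, written against the compute_reward sketch from the PR description above; the real suite's helpers and cases differ:

```python
import pytest
# reuses compute_reward from the fuzzy-matcher sketch earlier in this PR

@pytest.mark.parametrize(
    "submitted, truth, expected",
    [
        ("12.5%", "0.125", 1.0),       # percentage vs. fraction
        ("\\boxed{4.2}", "4.2", 1.0),  # LaTeX \boxed{} formatting
        ("101", "100", 1.0),           # inside the 1% relative tolerance
        ("110", "100", 0.0),           # outside the tolerance
    ],
)
def test_compute_reward(submitted, truth, expected):
    assert compute_reward(submitted, truth) == expected
```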

Architecture Alignment:

  • ✅ Rewards computed inside environment server (not client)
  • ✅ Client-server separation maintained (follows HTTPEnvClient pattern)
  • ✅ Environment inherits from core Environment interface
  • ✅ Action/Observation/State follow core type patterns
  • ✅ Docker-based deployment matching existing environments

Issues Found:

  • Critical bug in examples/finqa_inference.py:150: an undefined variable is referenced when the model returns no tool calls (a guard pattern is sketched after this list)
  • Minor style issue: the inference script accesses the private _base attribute
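
For context, a common shape of this class of bug and a guard that avoids it; this is illustrative only, not the actual code from finqa_inference.py:

```python
def pick_tool_call(response):
    """Return (name, arguments) for the first tool call, or None if there is none."""
    tool_calls = response.choices[0].message.tool_calls
    # Buggy shape: binding `call` only inside `for call in tool_calls: ...`
    # leaves it undefined when tool_calls is empty or None, so later code
    # that references `call` raises NameError in exactly the error path.
    call = None  # fix: initialize before branching
    if tool_calls:
        call = tool_calls[0]
    if call is None:
        return None  # caller treats the reply as plain text instead
    return call.function.name, call.function.arguments
```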

Confidence Score: 4/5

  • Safe to merge after fixing the critical undefined variable bug in the inference script
  • Score reflects a solid architecture following OpenEnv patterns with comprehensive testing, docked one point for the critical logic error in examples/finqa_inference.py, which would crash at runtime in the error-handling path. The core environment implementation is well designed, with proper client-server separation, reward computation in the server, extensive test coverage, and clean integration with existing patterns.
  • examples/finqa_inference.py requires immediate attention to fix the undefined-variable bug on line 150

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/finqa_inference.py | Adds an inference script; has an undefined-variable bug in the error-handling path and accesses a private attribute |
| src/envs/finqa_env/client.py | HTTP client implementation correctly following the HTTPEnvClient pattern |
| src/envs/finqa_env/server/finqa_environment.py | Environment implementation correctly inheriting from the core Environment, with reward computation in the server |
| src/envs/finqa_env/server/tools.py | Tool implementations with SQL query validation and lazy loading of table metadata |
| src/envs/finqa_env/server/rewards.py | Comprehensive reward matching with fuzzy numerical comparison and percentage/fraction handling |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant Client as FinQAEnv<br/>(HTTP Client)
    participant Server as FastAPI Server
    participant Env as FinQAEnvironment
    participant Tools as FinQATools
    participant Rewards as Reward System
    
    Agent->>Client: from_docker_image("finqa-env:latest")
    Client->>Server: Start Docker container
    Server-->>Client: base_url
    
    Agent->>Client: reset()
    Client->>Server: POST /reset
    Server->>Env: reset()
    Env->>Env: Load next question from shuffled dataset
    Env-->>Server: FinQAObservation(question, company, tools)
    Server-->>Client: JSON response
    Client-->>Agent: StepResult(observation, reward=None, done=False)
    
    loop Until answer submitted or max_steps
        Agent->>Client: step(FinQAAction(tool_name, tool_args))
        Client->>Server: POST /step {tool_name, tool_args}
        Server->>Env: step(action)
        
        alt Tool is get_descriptions/get_table_info/sql_query
            Env->>Tools: execute_tool(tool_name, tool_args)
            Tools->>Tools: Load data from JSON files
            Tools->>Tools: Execute SQL in-memory (sqlite3)
            Tools-->>Env: (result_string, is_final=False)
            Env-->>Server: FinQAObservation(tool_result, done=False)
        else Tool is submit_answer
            Env->>Tools: execute_tool("submit_answer", {answer})
            Tools-->>Env: (confirmation, is_final=True)
            Env->>Rewards: compute_reward(submitted, ground_truth)
            Rewards->>Rewards: Parse numbers (%, fractions, LaTeX)
            Rewards->>Rewards: Compare with 1% tolerance + 1.0 abs diff
            Rewards-->>Env: 1.0 (correct) or 0.0 (incorrect)
            Env-->>Server: FinQAObservation(result, done=True, reward)
        end
        
        Server-->>Client: JSON response
        Client-->>Agent: StepResult(observation, reward, done)
    end
    
    Agent->>Server: GET /tools
    Server-->>Agent: OpenAI tool schemas (auto-generated from docstrings)
```
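
Read end to end, the diagram corresponds to roughly this driver loop. This is a sketch: FinQAEnv, FinQAAction, from_docker_image, reset, and step appear in the diagram, while the import path and field names are assumptions:

```python
# Import path assumed from the file list above (src/envs/finqa_env/).
from envs.finqa_env import FinQAAction, FinQAEnv

env = FinQAEnv.from_docker_image("finqa-env:latest")
result = env.reset()
print(result.observation.question)

while not result.done:
    # A real agent would pick the tool from the model's function call;
    # here we immediately submit an answer to end the episode.
    action = FinQAAction(tool_name="submit_answer", tool_args={"answer": "42"})
    result = env.step(action)

print("reward:", result.reward)  # 1.0 if the fuzzy matcher accepts the answer
```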


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments


meta-cla bot added the CLA Signed label Jan 26, 2026
…numbers, add fixes & tests for multiple numbers in labels