
Add jfinqa: Japanese Financial Numerical Reasoning QA #1168

@ajtgjmdjp


Summary

I'd like to add jfinqa — a Japanese financial numerical reasoning QA benchmark — as a new task in lighteval.

About jfinqa

  • 1,000 questions across 3 subtasks:
    • Numerical Reasoning (550): Calculate growth rates, margins, and ratios from financial statements
    • Consistency Checking (200): Verify internal consistency of figures
    • Temporal Reasoning (250): Analyze year-over-year trends
  • 68 companies from EDINET (Japan's securities filing system)
  • Covers J-GAAP, IFRS, and US-GAAP accounting standards
  • Hugging Face dataset: ajtgjmdjp/jfinqa
  • GitHub: ajtgjmdjp/jfinqa
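To make the three subtask types concrete, here is a sketch of the kind of arithmetic each one exercises. The helper names, the figures (in 百万円), and the rounding tolerance in the consistency check are hypothetical and illustrative only. They are not taken from the dataset or the PR.

```python
import math


def growth_rate(current: float, prior: float) -> float:
    """Year-over-year growth rate in percent (Numerical / Temporal)."""
    return (current - prior) / prior * 100


def margin(profit: float, revenue: float) -> float:
    """Profit margin in percent, e.g. operating margin (Numerical)."""
    return profit / revenue * 100


def is_consistent(parts: list[float], total: float, tol: float = 1.0) -> bool:
    """Check that line items add up to a reported total (Consistency).

    tol is an assumed allowance for rounding in the reported units.
    """
    return math.isclose(sum(parts), total, abs_tol=tol)


def yoy_changes(series: list[float]) -> list[float]:
    """Growth rate for each consecutive pair of fiscal years (Temporal)."""
    return [growth_rate(cur, prev) for prev, cur in zip(series, series[1:])]


# Hypothetical figures in 百万円, for illustration only.
print(f"{growth_rate(12_000, 10_000):.1f}%")             # 20.0%
print(f"{margin(1_500, 12_000):.1f}%")                   # 12.5%
print(is_consistent([4_000, 5_000, 3_000], 12_000))      # True
print([f"{c:.1f}" for c in yoy_changes([10_000, 12_000, 11_400])])  # ['20.0', '-5.0']
```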

Metrics

Two metrics per subtask:

  1. Exact Match — with Japanese financial normalisation (fullwidth→halfwidth, △→minus, comma removal, NFKC)
  2. Numerical Match — 1% relative tolerance, handles kanji multipliers (千/百万/億/兆) and unit suffixes (円/ドル/bps)
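Here is a minimal sketch of how the two metrics could be implemented, assuming predictions and gold answers are plain strings. The function names, the order in which suffixes are stripped, the extra `%` suffix, and measuring the 1% tolerance relative to the gold value are all assumptions of this sketch, not details taken from the jfinqa PR.

```python
import unicodedata
from typing import Optional

# Kanji scale words listed in the issue (千 = 10^3, 百万 = 10^6, 億 = 10^8, 兆 = 10^12).
KANJI_MULTIPLIERS = [("百万", 1e6), ("千", 1e3), ("億", 1e8), ("兆", 1e12)]
# Unit suffixes stripped before parsing; "%" is an assumption, not listed in the issue.
UNIT_SUFFIXES = ["円", "ドル", "bps", "%"]


def normalize(text: str) -> str:
    """Exact-match normalisation: NFKC, △ -> minus, comma removal."""
    # NFKC folds fullwidth digits and punctuation (e.g. "１，２３４") to halfwidth.
    text = unicodedata.normalize("NFKC", text)
    # △ marks negative figures in Japanese financial statements.
    text = text.replace("△", "-")
    # Drop thousands separators.
    text = text.replace(",", "")
    return text.strip()


def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)


def parse_number(text: str) -> Optional[float]:
    """Parse e.g. "1.5億円" -> 150000000.0; returns None if unparsable."""
    s = normalize(text)
    # Strip one trailing unit suffix, if present.
    for unit in UNIT_SUFFIXES:
        if s.endswith(unit):
            s = s[: -len(unit)].strip()
            break
    # Then strip one trailing kanji multiplier, if present.
    multiplier = 1.0
    for word, value in KANJI_MULTIPLIERS:
        if s.endswith(word):
            s = s[: -len(word)].strip()
            multiplier = value
            break
    try:
        return float(s) * multiplier
    except ValueError:
        return None


def numerical_match(prediction: str, gold: str, rel_tol: float = 0.01) -> bool:
    """True if the parsed prediction is within rel_tol (1%) of the gold value."""
    pred_value, gold_value = parse_number(prediction), parse_number(gold)
    if pred_value is None or gold_value is None:
        return False
    if gold_value == 0:
        return pred_value == 0
    return abs(pred_value - gold_value) / abs(gold_value) <= rel_tol


print(exact_match("△1,234", "-1234"))             # True
print(numerical_match("150,000,000円", "1.5億円"))  # True
```

This sketch does not parse compound amounts such as 1億2000万円, and it does not handle 万, which is not in the issue's list of multipliers.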

Prior Art

Baselines (zero-shot, temperature=0)

| Model | Overall | Numerical | Consistency | Temporal |
|---|---|---|---|---|
| GPT-4o | 87.0% | 80.2% | 90.5% | 99.2% |
| Gemini 2.0 Flash | 80.4% | 86.2% | 83.5% | 65.2% |
| GPT-4o-mini | 67.7% | 79.3% | 83.5% | 29.6% |
| Qwen2.5-3B | 39.6% | 46.4% | 51.0% | 15.6% |
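For every row, the Overall column appears to match a question-weighted average of the three subtask scores (550, 200, and 250 questions). The sketch below recomputes it from the table. This reading is inferred from the numbers and is not stated in the issue. The dictionary keys and the function name are illustrative.

```python
# Question counts per subtask, from the issue.
SUBTASK_SIZES = {"numerical": 550, "consistency": 200, "temporal": 250}

# Per-subtask accuracies (%) copied from the baseline table.
BASELINES = {
    "GPT-4o": {"numerical": 80.2, "consistency": 90.5, "temporal": 99.2},
    "Gemini 2.0 Flash": {"numerical": 86.2, "consistency": 83.5, "temporal": 65.2},
    "GPT-4o-mini": {"numerical": 79.3, "consistency": 83.5, "temporal": 29.6},
    "Qwen2.5-3B": {"numerical": 46.4, "consistency": 51.0, "temporal": 15.6},
}


def overall_accuracy(per_subtask: dict[str, float]) -> float:
    """Question-weighted mean of per-subtask accuracies."""
    total = sum(SUBTASK_SIZES.values())
    return sum(per_subtask[name] * n for name, n in SUBTASK_SIZES.items()) / total


for model, scores in BASELINES.items():
    print(f"{model}: {overall_accuracy(scores):.1f}%")
# GPT-4o: 87.0%
# Gemini 2.0 Flash: 80.4%
# GPT-4o-mini: 67.7%
# Qwen2.5-3B: 39.6%
```

Because Numerical has 550 of the 1,000 questions, it dominates the Overall score. This is why GPT-4o-mini trails Gemini 2.0 Flash overall despite their similar Numerical and Consistency scores.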

I have a PR ready — happy to adjust the implementation based on your feedback (e.g., inspect-ai format if preferred).
