
feat: add HiPhO benchmark task #1186

Merged: Luodian merged 1 commit into dev-v0d7 from feat/lmm-299-hipho on Feb 23, 2026

Luodian commented Feb 22, 2026

Summary

  • add a new hipho benchmark under lmms_eval/tasks/hipho with 13 split tasks mapped to all HY-Wan/HiPhO official splits
  • implement shared task utilities for prompt construction, visual extraction, boxed-answer parsing, and weighted scoring using per-subquestion points
  • wire task auto-discovery through YAML-only task definitions (hipho.yaml group + split YAMLs + _default_template_yaml)
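The PR page does not show the shared utilities themselves, so the following is only a minimal sketch of what boxed-answer parsing and point-weighted scoring could look like. The function names `extract_boxed` and `weighted_score` are hypothetical and are not taken from `lmms_eval/tasks/hipho`.

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces, e.g. a truncated response


def weighted_score(correct: List[bool], points: List[float]) -> float:
    """Fraction of the available points earned across sub-questions."""
    total = sum(points)
    if total == 0:
        return 0.0
    earned = sum(p for ok, p in zip(correct, points) if ok)
    return earned / total
```

Dividing the earned total by the point total once, as in this sketch, returns exactly 1.0 when every sub-question is correct. A value such as 0.9999999999999999 may instead appear when per-part weights are normalized before being summed.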

Validation

  • uv run pre-commit run --all-files
  • HF_TOKEN='' HUGGINGFACE_HUB_TOKEN='' uv run python -m lmms_eval --model dummy_video_reader --model_args "response=The final answer is \\boxed{1}" --tasks hipho --limit 8 --batch_size 1 --log_samples --output_path ./outputs/hipho-smoke

Validation Results

  • score table printed for hipho group and all 13 subtasks (e.g., hipho_apho_2025, hipho_ipho_2025, hipho_panpho_2025)
  • aggregated metrics artifact: outputs/hipho-smoke/20260222_225825_results.json
  • sample logs artifact: outputs/hipho-smoke/20260222_225825_samples_hipho_apho_2025.jsonl
  • verified non-empty model outputs in JSONL ("filtered_resps": "The final answer is \\boxed{1}")
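The non-empty-output check above can be scripted. This is a sketch that assumes each JSONL record carries a `filtered_resps` field holding either a string or a (possibly nested) list of strings. The helper name `empty_response_lines` is hypothetical.

```python
import json
from typing import Any, Iterable, List


def _has_text(value: Any) -> bool:
    """True if value is a non-blank string or a list containing one."""
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple)):
        return any(_has_text(v) for v in value)
    return False


def empty_response_lines(lines: Iterable[str], key: str = "filtered_resps") -> List[int]:
    """Return 1-based line numbers whose `key` field is missing or blank."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        if not _has_text(record.get(key)):
            bad.append(lineno)
    return bad
```

To use it on a samples file, open the file and pass the handle directly, for example `empty_response_lines(open(path, encoding="utf-8"))`. An empty result means every record has a response.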

Smoke Validation (limit=8)

Status: PASS (LMM-299 / hipho)

Output Table

| Metric | Value |
| --- | --- |
| hipho_apho_2025 | 25.0 |
| hipho_eupho_2024 | 14.3 |
| hipho_ipho_2024 | 21.7 |
| hipho_ipho_2025 | 37.5 |
| hipho_nbpho_2025 | 12.5 |
| hipho_panmechanics_2024 | 12.5 |
| hipho_panmechanics_2025 | 12.5 |
| hipho_panpho_2024 | 2.1 |
| hipho_panpho_2025 | 12.5 |

Sample Output

Sample 1 (doc_id: 0)

  • Input: You are solving a physics olympiad question. ↵ ↵ If the problem has multiple sub-answers, provide them in order. ↵ ↵ End with a single line in this format: Final Answer: \boxed{...}. ↵ ↵ Context: ↵ [Precession of the Earth's axis] ↵ ↵ [Introduction] ↵ ↵ It has been known since ancient times t…
  • Model Output: This problem requires a dimensional analysis to determine the exponents $\beta$, $\gamma$, and $\delta$ in the relationship for $h_{\max}$: ↵ $$h_{\max} \propto G^{-1} \omega^{\beta} M_E^{\gamma} R^{\delta}$$ ↵ ↵ First, we establish the dimensions of each quantity using the base dimensions Length ($L$), Mass ($M$), and Time ($T$). ↵ ↵ 1. Dimension of $h_{\max}$ (and $R$): ↵ Since $h_{\max} = R_e - R_p$ and $R$ is the average radius, $h_{\max}$ has the dimension of length: ↵ $$[h_{…
  • Reference: ["\boxed{$\beta = 2$}", "\boxed{$\gamma = -1$}", "\boxed{$\delta = 4$}"]
  • Scores: hipho_score = 0.9999999999999999 (id: APhO_2025_1_A_1, source: APhO_2025, field: Mechanics, parsed_prediction: ['\beta=2', '\gamma=-1', '\delta=4'], prediction: This problem requires a dimensional analysis to determine the exponents $\beta$, $\gamma$, and $\delta$ in the relationship for $h_{\max}$:
    $$h_{\max} \propto G^{-1} \omega^{\beta} M_E^{\gamma} R^{\delta}$$

First, we establish the dimensions of each quantity using the base dimensions Length ($L$), Mass ($M$), and Time ($T$).

  1. Dimension of $h_{\max}$ (and $R$):
    Since $h_{\max} = R_e - R_p$ and $R$ is the average radius, $h_{\max}$ has the dimension of length:
    $$[h_{\max}] = L$$

  2. Dimension of $G$ (Gravitational Constant):
    From Newton's Law of Gravitation, $F = G \frac{M_1 M_2}{r^2}$.
    $$[G] = \frac{[F] [r^2]}{[M]^2} = \frac{(M L T^{-2}) L^2}{M^2} = L^3 M^{-1} T^{-2}$$
    (This corresponds to the first guideline point.)

  3. Dimension of $\omega$ (Angular Speed):
    $$\text{Angular speed} = \frac{\text{Angle}}{\text{Time}} \implies [\omega] = T^{-1}$$

  4. Dimension of $M_E$ (Mass of Earth):
    $$[M_E] = M$$

  5. Dimension of $R$ (Radius):
    $$[R] = L$$

Now, we equate the dimensions on both sides of the proportionality relation:
$$[h_{\max}] = [G^{-1}] [\omega]^{\beta} [M_E]^{\gamma} [R]^{\delta}$$
$$L = (L^3 M^{-1} T^{-2})^{-1} (T^{-1})^{\beta} (M)^{\gamma} (L)^{\delta}$$
$$L^1 M^0 T^0 = L^{-3} M^{1} T^{2} \cdot T^{-\beta} \cdot M^{\gamma} \cdot L^{\delta}$$
$$L^1 M^0 T^0 = L^{\delta - 3} M^{1 + \gamma} T^{2 - \beta}$$

By comparing the exponents for $L$, $M$, and $T$:

For Time ($T$):
$$0 = 2 - \beta$$
$$\beta = 2$$
(This corresponds to the second and fifth guideline points.)

For Mass ($M$):
$$0 = 1 + \gamma$$
$$\gamma = -1$$
(This corresponds to the third and sixth guideline points.)

For Length ($L$):
$$1 = \delta - 3$$
$$\delta = 4$$
(This corresponds to the fourth and seventh guideline points.)

The required exponents are:
(1) $\beta = 2$
(2) $\gamma = -1$
(3) $\delta = 4$

Final Answer: \boxed{\beta=2, \gamma=-1, \delta=4}, answers: ['\boxed{$\beta = 2$}', '\boxed{$\gamma = -1$}', '\boxed{$\delta = 4$}'])

  • Tokens: output=697, reasoning=0
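The exponents in Sample 1 can be cross-checked mechanically. The sketch below is an illustration added for the reader and is not part of the benchmark code. It represents each quantity by its exponents of (L, M, T) and confirms that $G^{-1} \omega^{2} M_E^{-1} R^{4}$ has the dimension of length.

```python
from typing import Dict, Tuple

Dim = Tuple[int, ...]  # exponents of (L, M, T)

DIMS: Dict[str, Dim] = {
    "G": (3, -1, -2),     # L^3 M^-1 T^-2
    "omega": (0, 0, -1),  # T^-1
    "M_E": (0, 1, 0),     # M
    "R": (1, 0, 0),       # L
}


def combine(powers: Dict[str, int]) -> Dim:
    """Dimension of the product of each quantity raised to its power."""
    total = [0, 0, 0]
    for name, power in powers.items():
        for axis, exponent in enumerate(DIMS[name]):
            total[axis] += power * exponent
    return tuple(total)


# beta = 2, gamma = -1, delta = 4, with G fixed at power -1
result = combine({"G": -1, "omega": 2, "M_E": -1, "R": 4})
print(result)  # (1, 0, 0), i.e. a pure length
```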

Sample 2 (doc_id: 1)

  • Input: You are solving a physics olympiad question. ↵ ↵ If the problem has multiple sub-answers, provide them in order. ↵ ↵ End with a single line in this format: Final Answer: \boxed{...}. ↵ ↵ Context: ↵ [Precession of the Earth's axis] ↵ ↵ [Introduction] ↵ ↵ It has been known since ancient times t…
  • Model Output: This problem asks for the exponents $\beta, \gamma, \delta$ in the dimensional relationship for $h_{\max}$, and then to calculate the numerical value of $h_{\max}$ using the provided constants and an assumed dimensionless factor of 1. ↵ ↵ ### Part (A.1): Finding the exponents $\beta$, $\gamma$, and $\delta$ ↵ ↵ The relationship is given by dimensional analysis: ↵ $$h_{\max} \propto G^{-1} \omega^{\beta} M_E^{\gamma} R^{\delta}$$ ↵ ↵ We first write down the dimensions of each quantity using th…
  • Reference: ["\boxed{21.9}"]
  • Scores: hipho_score = 0.0 (id: APhO_2025_1_A_2, source: APhO_2025, field: Mechanics, parsed_prediction: ['$$h_{\max} = \frac{(7.2722 \times 10^{-5} \text{ s}^{-1})^2 \cdot (6.371 \times 10^{6} \text{ m})^4}{(6.67 \times 1'], prediction: This problem asks for the exponents $\beta, \gamma, \delta$ in the dimensional relationship for $h_{\max}$, and then to calculate the numerical value of $h_{\max}$ using the provided constants and an assumed dimensionless factor of 1.

Part (A.1): Finding the exponents $\beta$, $\gamma$, and $\delta$

The relationship is given by dimensional analysis:
$$h_{\max} \propto G^{-1} \omega^{\beta} M_E^{\gamma} R^{\delta}$$

We first write down the dimensions of each quantity using the fundamental dimensions: Mass ($M$), Length ($L$), and Time ($T$).

  1. $h_{\max}$ (Difference in radii): Length, $[h_{\max}] = L$
  2. $G$ (Gravitational constant): $[G] = \frac{\text{Force} \cdot \text{Distance}^2}{\text{Mass}^2} = \frac{(MLT^{-2}) L^2}{M^2} = M^{-1} L^3 T^{-2}$
  3. $\omega$ (Angular speed): $[ \omega ] = T^{-1}$
  4. $M_E$ (Mass of Earth): $[M_E] = M$
  5. $R$ (Radius): $[R] = L$

Now we substitute these dimensions into the proportionality relation:
$$[h_{\max}] = [G^{-1}] [\omega]^{\beta} [M_E]^{\gamma} [R]^{\delta}$$
$$L = (M^{-1} L^3 T^{-2})^{-1} (T^{-1})^{\beta} (M)^{\gamma} (L)^{\delta}$$
$$L = (M L^{-3} T^{2}) (T^{-\beta}) (M^{\gamma}) (L^{\delta})$$
$$L^1 M^0 T^0 = M^{1+\gamma} L^{-3+\delta} T^{2-\beta}$$

We equate the exponents for $M$, $L$, and $T$:

For Mass ($M$):
$$0 = 1 + \gamma \implies \gamma = -1$$

For Time ($T$):
$$0 = 2 - \beta \implies \beta = 2$$

For Length ($L$):
$$1 = -3 + \delta \implies \delta = 4$$

The exponents are $\beta = 2$, $\gamma = -1$, and $\delta = 4$.
Thus, the dimensional relationship is:
$$h_{\max} \propto G^{-1} \omega^{2} M_E^{-1} R^{4}$$
$$h_{\max} \propto \frac{\omega^2 R^4}{G M_E}$$

Calculation of $h_{\max}$

The problem asks to calculate the numerical value of $h_{\max}$ assuming the dimensionless factor is 1:
$$h_{\max} = \frac{\omega^2 R^4}{G M_E}$$

Step 1: Calculate the angular speed $\omega$.
The Earth rotates once per day (approximately 24 hours).
$$T = 24 \text{ hours} = 24 \times 3600 \text{ s} = 86400 \text{ s}$$
$$\omega = \frac{2\pi}{T} = \frac{2\pi}{86400 \text{ s}} \approx 7.2722 \times 10^{-5} \text{ s}^{-1}$$
(The guideline suggests $7.27 \times 10^{-5} \text{ s}^{-1}$, which we will use for consistency if needed, but we use the more precise value for the final calculation.)

Step 2: Substitute the values.
Constants provided:

  • $G = 6.67 \times 10^{-11} \mathrm{Nm}^{2} / \mathrm{kg}^{2}$
  • $M_E = 5.972 \times 10^{24} \mathrm{kg}$
  • $R = 6.371 \times 10^{6} \mathrm{m}$ (Using the average radius as specified in the context for the calculation, although $R_e$ is the radius relevant to $h_{\max}$)

$$h_{\max} = \frac{(7.2722 \times 10^{-5} \text{ s}^{-1})^2 \cdot (6.371 \times 10^{6} \text{ m})^4}{(6.67 \times 1, answers: ['\boxed{21.9}'])

  • Tokens: output=1024, reasoning=0
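Sample 2's response ends mid-expression at 1024 output tokens, so it appears to have been cut off before any `\boxed{...}` line was produced, which is consistent with the score of 0.0. The sketch below completes the arithmetic from the constants quoted in the response. Reading the reference value 21.9 as kilometres is an assumption, since the log does not state units.

```python
import math

# Constants as quoted in the model response
G = 6.67e-11    # N m^2 / kg^2
M_E = 5.972e24  # kg
R = 6.371e6     # m
T = 24 * 3600   # s, one rotation per day as assumed in the response

omega = 2 * math.pi / T  # angular speed in s^-1

# h_max = omega^2 R^4 / (G M_E), with the dimensionless factor set to 1
h_max_m = omega**2 * R**4 / (G * M_E)
h_max_km = h_max_m / 1000.0  # assumption: the reference 21.9 is in km

print(f"{h_max_km:.1f} km")
```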

Test Params

uv run python -m lmms_eval --model openai --model_args "model_version=google/gemini-2.5-flash-lite-preview-09-2025" --tasks hipho --batch_size 1 --limit 8 --log_samples

Luodian merged commit 5dee848 into dev-v0d7 on Feb 23, 2026 (2 checks passed)
Luodian deleted the feat/lmm-299-hipho branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026