feat: add HiPhO benchmark task #1186
Merged
Luodian added a commit that referenced this pull request Feb 28, 2026
Summary
- `hipho` benchmark under `lmms_eval/tasks/hipho` with 13 split tasks mapped to all `HY-Wan/HiPhO` official splits (`hipho.yaml` group + split YAMLs + `_default_template_yaml`)

Validation
- `uv run pre-commit run --all-files`
- `HF_TOKEN='' HUGGINGFACE_HUB_TOKEN='' uv run python -m lmms_eval --model dummy_video_reader --model_args "response=The final answer is \\boxed{1}" --tasks hipho --limit 8 --batch_size 1 --log_samples --output_path ./outputs/hipho-smoke`

Validation Results
- `hipho` group and all 13 subtasks (e.g., `hipho_apho_2025`, `hipho_ipho_2025`, `hipho_panpho_2025`)
- `outputs/hipho-smoke/20260222_225825_results.json`
- `outputs/hipho-smoke/20260222_225825_samples_hipho_apho_2025.jsonl`
- `"filtered_resps": "The final answer is \\boxed{1}"`

Smoke Validation (limit=8)
Status: PASS (LMM-299 / hipho)
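The dummy response in the smoke run ends in a `\boxed{...}` answer, and the logged samples carry a `parsed_prediction` field. As an illustration only, the sketch below shows one way a final boxed answer could be pulled out of a response string. The helper name `extract_last_boxed` is hypothetical and this is not claimed to be the parser the task actually uses.

```python
def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper for illustration; not taken from the PR.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer at all
    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # braces never closed, e.g. a truncated response


print(extract_last_boxed("The final answer is \\boxed{1}"))
```

Tracking brace depth rather than using a simple regular expression keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact, and returning `None` on unbalanced braces makes truncated responses easy to detect.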
Output Table
Sample Output
Sample 1 (doc_id: 0)
hipho_score= 0.9999999999999999 (id: APhO_2025_1_A_1, source: APhO_2025, field: Mechanics, parsed_prediction: ['\beta=2', '\gamma=-1', '\delta=4'], prediction: This problem requires a dimensional analysis to determine the exponents. First, we establish the dimensions of each quantity using the base dimensions Length ($L$), Mass ($M$), and Time ($T$).
Dimension of $h_{\max}$ (and $R$): Since $h_{\max} = R_e - R_p$ and $R$ is the average radius, $h_{\max}$ has the dimension of length:
$$[h_{\max}] = L$$
Dimension of $G$ (Gravitational Constant): From Newton's Law of Gravitation, $F = G \frac{M_1 M_2}{r^2}$.
$$[G] = \frac{[F] [r^2]}{[M]^2} = \frac{(M L T^{-2}) L^2}{M^2} = L^3 M^{-1} T^{-2}$$
(This corresponds to the first guideline point.)
Dimension of $\omega$ (Angular Speed):
$$\text{Angular speed} = \frac{\text{Angle}}{\text{Time}} \implies [\omega] = T^{-1}$$
Dimension of $M_E$ (Mass of Earth):
$$[M_E] = M$$
Dimension of $R$ (Radius):
$$[R] = L$$
Now, we equate the dimensions on both sides of the proportionality relation:
$$[h_{\max}] = [G^{-1}] [\omega]^{\beta} [M_E]^{\gamma} [R]^{\delta}$$
$$L = (L^3 M^{-1} T^{-2})^{-1} (T^{-1})^{\beta} (M)^{\gamma} (L)^{\delta}$$
$$L^1 M^0 T^0 = L^{-3} M^{1} T^{2} \cdot T^{-\beta} \cdot M^{\gamma} \cdot L^{\delta}$$
$$L^1 M^0 T^0 = L^{\delta - 3} M^{1 + \gamma} T^{2 - \beta}$$
By comparing the exponents for $L$, $M$, and $T$:
For Time ($T$):
$$0 = 2 - \beta$$
$$\beta = 2$$
(This corresponds to the second and fifth guideline points.)
For Mass ($M$):
$$0 = 1 + \gamma$$
$$\gamma = -1$$
(This corresponds to the third and sixth guideline points.)
For Length ($L$):
$$1 = \delta - 3$$
$$\delta = 4$$
(This corresponds to the fourth and seventh guideline points.)
The required exponents are: (1) $\beta = 2$, (2) $\gamma = -1$, (3) $\delta = 4$
Final Answer: \boxed{\beta=2, \gamma=-1, \delta=4}, answers: ['\boxed{$\beta = 2$}', '\boxed{$\gamma = -1$}', '\boxed{$\delta = 4$}'])
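The exponent matching in Sample 1 is a small linear system, so it can be checked mechanically. The sketch below encodes each quantity as a vector of (L, M, T) exponents, using the dimensions stated in the sample, and solves for the three exponents with exact fractions. The function and constant names are illustrative and are not part of the PR.

```python
from fractions import Fraction

# Dimension vectors as (L, M, T) exponents, as stated in the sample.
DIM_G = (3, -1, -2)      # [G] = L^3 M^-1 T^-2
DIM_OMEGA = (0, 0, -1)   # [omega] = T^-1
DIM_MASS = (0, 1, 0)     # [M_E] = M
DIM_RADIUS = (1, 0, 0)   # [R] = L
DIM_H = (1, 0, 0)        # [h_max] = L


def solve_linear(matrix, rhs):
    """Solve a square, non-singular system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# h_max ~ G^-1 omega^beta M_E^gamma R^delta, so for each base dimension i:
#   beta*omega[i] + gamma*mass[i] + delta*radius[i] = h[i] + G[i]
matrix = [[DIM_OMEGA[i], DIM_MASS[i], DIM_RADIUS[i]] for i in range(3)]
rhs = [DIM_H[i] + DIM_G[i] for i in range(3)]
beta, gamma, delta = solve_linear(matrix, rhs)
print(beta, gamma, delta)  # prints: 2 -1 4
```

The result agrees with the boxed answer in the sample ($\beta = 2$, $\gamma = -1$, $\delta = 4$).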
Sample 2 (doc_id: 1)
hipho_score= 0.0 (id: APhO_2025_1_A_2, source: APhO_2025, field: Mechanics, parsed_prediction: ['$$h_{\max} = \frac{(7.2722 \times 10^{-5} \text{ s}^{-1})^2 \cdot (6.371 \times 10^{6} \text{ m})^4}{(6.67 \times 1'], prediction: This problem asks for the exponents
Part (A.1): Finding the exponents $\beta$, $\gamma$, and $\delta$
The relationship is given by dimensional analysis:
$$h_{\max} \propto G^{-1} \omega^{\beta} M_E^{\gamma} R^{\delta}$$
We first write down the dimensions of each quantity using the fundamental dimensions: Mass ($M$), Length ($L$), and Time ($T$).
Now we substitute these dimensions into the proportionality relation:
$$[h_{\max}] = [G^{-1}] [\omega]^{\beta} [M_E]^{\gamma} [R]^{\delta}$$
$$L = (M^{-1} L^3 T^{-2})^{-1} (T^{-1})^{\beta} (M)^{\gamma} (L)^{\delta}$$
$$L = (M L^{-3} T^{2}) (T^{-\beta}) (M^{\gamma}) (L^{\delta})$$
$$L^1 M^0 T^0 = M^{1+\gamma} L^{-3+\delta} T^{2-\beta}$$
We equate the exponents for $M$, $L$, and $T$:
For Mass ($M$):
$$0 = 1 + \gamma \implies \gamma = -1$$
For Time ($T$):
$$0 = 2 - \beta \implies \beta = 2$$
For Length ($L$):
$$1 = -3 + \delta \implies \delta = 4$$
The exponents are $\beta = 2$, $\gamma = -1$, and $\delta = 4$.
Thus, the dimensional relationship is:
$$h_{\max} \propto G^{-1} \omega^{2} M_E^{-1} R^{4}$$
$$h_{\max} \propto \frac{\omega^2 R^4}{G M_E}$$
Calculation of $h_{\max}$
The problem asks to calculate the numerical value of $h_{\max}$ assuming the dimensionless factor is 1:
$$h_{\max} = \frac{\omega^2 R^4}{G M_E}$$
Step 1: Calculate the angular speed $\omega$.
The Earth rotates once per day (approximately 24 hours).
$$T = 24 \text{ hours} = 24 \times 3600 \text{ s} = 86400 \text{ s}$$
$$\omega = \frac{2\pi}{T} = \frac{2\pi}{86400 \text{ s}} \approx 7.2722 \times 10^{-5} \text{ s}^{-1}$$
(The guideline suggests $7.27 \times 10^{-5} \text{ s}^{-1}$, which we will use for consistency if needed, but we use the more precise value for the final calculation.)
Step 2: Substitute the values.
Constants provided:
$$h_{\max} = \frac{(7.2722 \times 10^{-5} \text{ s}^{-1})^2 \cdot (6.371 \times 10^{6} \text{ m})^4}{(6.67 \times 1, answers: ['\boxed{21.9}'])
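The logged prediction for Sample 2 is cut off mid-expression, so no final boxed value was available to score. For reference, the arithmetic it was setting up can be sketched as below. The rotation period and Earth radius are the values quoted in the sample. The gravitational constant and Earth mass are assumptions: the log truncates before they appear in full, so standard textbook values are used here and may differ from the constants given in the problem statement.

```python
import math

# Values quoted in the sample's reasoning.
ROTATION_PERIOD_S = 24 * 3600   # one day, in seconds
EARTH_RADIUS_M = 6.371e6

# ASSUMED values: the log is truncated before these constants are shown
# in full. They are textbook figures, not taken from the PR or the problem.
G_ASSUMED = 6.67e-11            # m^3 kg^-1 s^-2
EARTH_MASS_ASSUMED_KG = 5.97e24

# h_max = omega^2 R^4 / (G M_E), with the dimensionless factor set to 1.
omega = 2 * math.pi / ROTATION_PERIOD_S
h_max_m = omega**2 * EARTH_RADIUS_M**4 / (G_ASSUMED * EARTH_MASS_ASSUMED_KG)
h_max_km = h_max_m / 1000

print(f"omega = {omega:.4e} s^-1")
print(f"h_max = {h_max_km:.1f} km")
```

With these assumed constants the result is about 21.9 km, which is numerically consistent with the reference answer `\boxed{21.9}`. The log does not state the unit of the reference answer.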
Test Params
uv run python -m lmms_eval --model openai --model_args "model_version=google/gemini-2.5-flash-lite-preview-09-2025" --tasks hipho --batch_size 1 --limit 8 --log_samples