
OLMo evaluation tasks not found #638

@chen-hao-chao

Hi,

Thank you for your great work. I am using your Docker image (docker://ghcr.io/allenai/olmo-core:tch270cu128-v2.1.0) to launch a pretraining run for the 7B model (src/scripts/official/OLMo3/OLMo-3-1025-7B-pretrain-1.py), but the run fails at the downstream evaluation step. The error message says that the requested tasks do not exist. I then called list_tasks() from the olmo_eval package, and none of the tasks the script requests appear in the list it returns.
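Because the task registry comes from the olmo_eval package bundled in the image, it may help to attach the installed package versions to a report like this. The sketch below uses only the standard library's importlib.metadata. The distribution names ai2-olmo-eval and ai2-olmo-core are assumptions about how the packages are published, so adjust them if the image installs them under other names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: these are the distribution names the image installs.
    for name in ("ai2-olmo-eval", "ai2-olmo-core", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```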

Could this be a configuration issue?

Below is my code for checking task availability:

Checking code:
from olmo_eval import list_tasks

all_tasks = list_tasks()
print(f"Total tasks available: {len(all_tasks)}\n")

needed_tasks = [
    "arc_challenge_test_bpb_5shot",
    "arc_challenge_test_mc_5shot_fast",
    "arc_easy_test_bpb_5shot",
    "arc_easy_test_mc_5shot_fast",
    "hellaswag_bpb_5shot",
    "mmlu_humanities_test_bpb_5shot",
    "mmlu_humanities_test_mc_5shot_fast",
    "mmlu_other_test_bpb_5shot",
    "mmlu_other_test_mc_5shot_fast",
    "mmlu_social_sciences_test_bpb_5shot",
    "mmlu_social_sciences_test_mc_5shot_fast",
    "mmlu_stem_test_bpb_5shot",
    "mmlu_stem_test_mc_5shot_fast",
    # Basic Skills
    "basic_skills_arithmetic_rc_5shot",
    "basic_skills_coding_rc_5shot",
    "basic_skills_common_knowledge_rc_5shot",
    "basic_skills_logical_reasoning_rc_5shot",
    "basic_skills_pattern_rc_5shot",
    "basic_skills_string_operations_rc_5shot",
    # Gen tasks BPB
    "codex_humaneval_gold_bpb_3shot",
    "codex_mbpp_gold_bpb_3shot",
    "minerva_math_500_gold_bpb_0shot",
    "mt_mbpp_cpp_gold_bpb_3shot",
    "mt_mbpp_java_gold_bpb_3shot",
    "mt_mbpp_rust_gold_bpb_3shot",
    # Sanity check for MCQA ability
    "copycolors_10way_fast",
]

print("Checking for needed tasks:")
for task in needed_tasks:
    status = "✓ FOUND" if task in all_tasks else "✗ MISSING"
    print(f"  {status}: {task}")
Results:
Total tasks available: 167

Checking for needed tasks:
  ✗ MISSING: arc_challenge_test_bpb_5shot
  ✗ MISSING: arc_challenge_test_mc_5shot_fast
  ✗ MISSING: arc_easy_test_bpb_5shot
  ✗ MISSING: arc_easy_test_mc_5shot_fast
  ✗ MISSING: hellaswag_bpb_5shot
  ✗ MISSING: mmlu_humanities_test_bpb_5shot
  ✗ MISSING: mmlu_humanities_test_mc_5shot_fast
  ✗ MISSING: mmlu_other_test_bpb_5shot
  ✗ MISSING: mmlu_other_test_mc_5shot_fast
  ✗ MISSING: mmlu_social_sciences_test_bpb_5shot
  ✗ MISSING: mmlu_social_sciences_test_mc_5shot_fast
  ✗ MISSING: mmlu_stem_test_bpb_5shot
  ✗ MISSING: mmlu_stem_test_mc_5shot_fast
  ✗ MISSING: basic_skills_arithmetic_rc_5shot
  ✗ MISSING: basic_skills_coding_rc_5shot
  ✗ MISSING: basic_skills_common_knowledge_rc_5shot
  ✗ MISSING: basic_skills_logical_reasoning_rc_5shot
  ✗ MISSING: basic_skills_pattern_rc_5shot
  ✗ MISSING: basic_skills_string_operations_rc_5shot
  ✗ MISSING: codex_humaneval_gold_bpb_3shot
  ✗ MISSING: codex_mbpp_gold_bpb_3shot
  ✗ MISSING: minerva_math_500_gold_bpb_0shot
  ✗ MISSING: mt_mbpp_cpp_gold_bpb_3shot
  ✗ MISSING: mt_mbpp_java_gold_bpb_3shot
  ✗ MISSING: mt_mbpp_rust_gold_bpb_3shot
  ✗ MISSING: copycolors_10way_fast
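If the registry does contain some of these benchmarks under slightly different names, a fuzzy comparison against the 167 registered tasks could surface them. The following is a minimal sketch using difflib.get_close_matches from the standard library. suggest_matches is a hypothetical helper, and it assumes list_tasks() returns task-name strings, which the membership check above already relies on.

```python
import difflib
from typing import Dict, Iterable, List


def suggest_matches(
    needed: Iterable[str],
    available: Iterable[str],
    n: int = 3,
    cutoff: float = 0.6,
) -> Dict[str, List[str]]:
    """For each needed task that is not registered, list up to `n` similar registered names."""
    available = list(available)
    available_set = set(available)
    # Only tasks absent from the registry get an entry; the value is a
    # (possibly empty) list of close matches, best match first.
    return {
        task: difflib.get_close_matches(task, available, n=n, cutoff=cutoff)
        for task in needed
        if task not in available_set
    }
```

With olmo_eval installed, it could be called as suggest_matches(needed_tasks, list_tasks()). An empty list for a task means nothing in the registry scored above the 0.6 similarity cutoff, which would suggest the task is genuinely absent from this olmo_eval version rather than renamed.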
