feat(tasks): add LongProc benchmark (6 task types, 16 configs) #3544
Open
xiye17 wants to merge 2 commits into EleutherAI:main from
Add the LongProc long-form procedural generation benchmark from "LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation" (COLM 2025).

Dataset: PrincetonPli/LongProc on the Hugging Face Hub

Task types and configs:
- countdown (0.5k, 2k, 8k)
- path_traversal (0.5k, 2k, 8k)
- tom_tracking (0.5k, 2k, 8k)
- html_to_tsv (0.5k, 2k, 8k)
- pseudo_to_code (0.5k, 2k)
- travel_planning (2k, 8k)

Group hierarchy: longproc -> 6 sub-groups -> 16 individual tasks

Evaluation logic is ported from the original LongProc codebase into metrics.py, with task-specific process_results functions.

Optional dependencies:
- pandas (html_to_tsv evaluation)
- g++ with C++11 support (pseudo_to_code evaluation)
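As a quick sanity check on the hierarchy described in the PR, the Python sketch below builds a three-level longproc -> sub-group -> task tree from the listed configs and counts the leaves. It is not the PR's actual group YAML: the `longproc_{task}` and `longproc_{task}_{length}` naming scheme and the `build_hierarchy` helper are hypothetical, used only for illustration.

```python
# Hypothetical reconstruction of the LongProc group hierarchy from the PR description.
# Task-name scheme (longproc_{task}_{length}) is an assumption, not the PR's real names.
TASK_CONFIGS: dict[str, list[str]] = {
    "countdown": ["0.5k", "2k", "8k"],
    "path_traversal": ["0.5k", "2k", "8k"],
    "tom_tracking": ["0.5k", "2k", "8k"],
    "html_to_tsv": ["0.5k", "2k", "8k"],
    "pseudo_to_code": ["0.5k", "2k"],
    "travel_planning": ["2k", "8k"],
}


def build_hierarchy(configs: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Return {root group: {sub-group: [leaf task names]}}."""
    subgroups = {
        f"longproc_{task}": [f"longproc_{task}_{length}" for length in lengths]
        for task, lengths in configs.items()
    }
    return {"longproc": subgroups}


if __name__ == "__main__":
    tree = build_hierarchy(TASK_CONFIGS)
    leaves = [leaf for tasks in tree["longproc"].values() for leaf in tasks]
    print(len(tree["longproc"]), "sub-groups,", len(leaves), "tasks")  # 6 sub-groups, 16 tasks
```

The counts line up with the PR title: four task types with three lengths each plus two with two lengths each gives 3 × 4 + 2 × 2 = 16 leaf tasks.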
Contributor
Hi! Thanks for the PR! LGTM. Just one small addition: could you add
Author
Thanks a lot for the quick review! I've marked the two subtasks that require code execution. Let me know if I should add the flag at the group level instead.
Branch updated from a2e5e2c to 4a95009.
Author
Hi, I just marked the
Summary
Hi,
This is the author of LongProc, trying to add LongProc (paper, COLM 2025) as a new task suite.
Files added (29)
Optional dependencies
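Only two subtasks need the optional dependencies: pandas for html_to_tsv evaluation and g++ with C++11 support for pseudo_to_code evaluation. A user can check for both before running those subtasks. The Python sketch below shows one way to do that. The helper names and the compile-probe approach are hypothetical and not part of the PR. The probe only confirms that `g++ -std=c++11` can compile a trivial program; it does not exercise the PR's actual evaluation code.

```python
import importlib.util
import os
import shutil
import subprocess
import tempfile


def has_pandas() -> bool:
    """True if pandas is importable (hypothetical helper; html_to_tsv needs it)."""
    return importlib.util.find_spec("pandas") is not None


def has_gxx_cpp11(compiler: str = "g++") -> bool:
    """True if `compiler` is on PATH and compiles a tiny C++11 program (hypothetical helper)."""
    exe = shutil.which(compiler)
    if exe is None:
        return False
    # Trivial program using C++11 features: brace init, range-for, auto, nullptr.
    source = (
        "#include <vector>\n"
        "int main() { std::vector<int> v{1, 2, 3}; int s = 0;"
        " for (auto x : v) s += x; int* p = nullptr; (void)p; return s == 6 ? 0 : 1; }\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        out = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(source)
        try:
            # Compile only; a zero return code means C++11 is supported.
            result = subprocess.run(
                [exe, "-std=c++11", src, "-o", out],
                capture_output=True,
                timeout=60,
            )
        except (OSError, subprocess.TimeoutExpired):
            return False
        return result.returncode == 0


if __name__ == "__main__":
    print("pandas:", has_pandas())
    print("g++ (C++11):", has_gxx_cpp11())
```

If either check fails, the remaining LongProc subtasks should be unaffected, because the PR describes these dependencies as optional.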
Test plan