Evaluation on LiveCodeBench-Cpp #885
Merged
Changes from 35 commits
46 commits
7690e27 (wasiahmad) adding eval support for lcb-c++
1b8fe33 (wasiahmad) setting language default
4931f45 (wasiahmad) prefix for release version is only needed for python
07deb25 (wasiahmad) for c++, we need python interpretor for corresponding eval harness
7e6720a (wasiahmad) code eval docs updated
0c5ca9f (wasiahmad) Merge branch 'main' into livecodebench_cpp
ffdad7d (wasiahmad) minor eval docs update
ee346da (wasiahmad) minor eval docs update
84a8a0e (wasiahmad) debugging generate with sandbox
535c8af (wasiahmad) debugging generate with sandbox
e29d05d (wasiahmad) debugging generate with sandbox
d3f83ea (wasiahmad) debugging generate with sandbox
ac2dbda (wasiahmad) debugging generate with sandbox
a846042 (wasiahmad) debugging generate with sandbox
dc91b99 (wasiahmad) debugging generate with sandbox
05b94d4 (wasiahmad) getting back all changes
6547325 (wasiahmad) setting KEEP_MOUNTS_FOR_SANDBOX = True for lcb eval
e3cc020 (wasiahmad) Merge remote-tracking branch 'origin/main' into livecodebench_cpp
5db2f25 (wasiahmad) Merge branch 'main' into livecodebench_cpp
3bfa508 (wasiahmad) Merge branch 'main' into livecodebench_cpp
02b613f (i-vainn) Increase sandbox client timeouts and skip code re-execution on timeou…
8c4fbea (Kipok) Control max_concurrent_requests in subclasses with parallel generatio…
e6bdf63 (shtoshni) Token count for BFCL (#896)
96dfea3 (AlexGrinch) MT datasets FLORES200 and WMT24pp (#892)
3af8f58 (smahdavi4) Add responses api type (#889)
bf5be80 (smahdavi4) Add concurrent semaphore control to llm base class (#907)
8ed94f0 (wedu-nvidia) Add qos slurm parameter (#906)
2f355b2 (Kipok) Fix typo in default tools parameter for token count (#910)
4c1863e (wasiahmad) resolving conflicts
428be52 (i-vainn) Slurm tests for code execution timeouts (#905)
b1df71b (activatedgeek) Add copyright checks workflow (#912)
308907a (shtoshni) Context error recovery (#914)
a8a0952 (gwarmstrong) Fix MCP tests (#916)
03ff6a9 (shtoshni) Addding BFCL headers (#917)
9c1060b (wasiahmad) updated year in copyright message
1bc7c26 (wasiahmad) lcb-cpp eval without sandbox
f5ee7b6 (wasiahmad) minor doc update
d8db2c7 (wasiahmad) sanbodx use logic updated with retries
8b48105 (wasiahmad) resolving conflicts
f311171 (wasiahmad) minor bug fix
bd19311 (wasiahmad) removing unwanted print statement
bf31d28 (wasiahmad) Merge branch 'main' into livecodebench_cpp
02ad4b3 (wasiahmad) Merge branch 'main' into livecodebench_cpp
aea6c79 (wasiahmad) added a comment
54a2aae (wasiahmad) Merge branch 'main' into livecodebench_cpp
e4b3cb8 (wasiahmad) ojbench uses tries when using sandbox
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,21 @@` |
| | | `# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.` |
| | | `#` |
| | | `# Licensed under the Apache License, Version 2.0 (the "License");` |
| | | `# you may not use this file except in compliance with the License.` |
| | | `# You may obtain a copy of the License at` |
| | | `#` |
| | | `# http://www.apache.org/licenses/LICENSE-2.0` |
| | | `#` |
| | | `# Unless required by applicable law or agreed to in writing, software` |
| | | `# distributed under the License is distributed on an "AS IS" BASIS,` |
| | | `# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.` |
| | | `# See the License for the specific language governing permissions and` |
| | | `# limitations under the License.` |
| | | `name: Copyright check` |
| | | |
| | | `on:` |
| | | `  pull_request:` |
| | | |
| | | `jobs:` |
| | | `  copyright-check:` |
| | | `    uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_copyright_check.yml@v0.2.0` |
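
The workflow above delegates the actual check to a reusable workflow whose implementation is not shown in this diff. As a rough illustration only, the sketch below shows one way a copyright-header check could work, assuming it looks for the first header line of the license block in the opening lines of each file. The pattern and the helper names `has_copyright_header` and `files_missing_header` are hypothetical and are not taken from `_copyright_check.yml`.

```python
import re

# Hypothetical pattern modelled on the first header line of the workflow file above.
# The real reusable workflow may match something different.
COPYRIGHT_RE = re.compile(
    r"^# Copyright \(c\) \d{4}, NVIDIA CORPORATION\.\s+All rights reserved\.$"
)


def has_copyright_header(source: str, max_lines: int = 5) -> bool:
    """Return True if any of the first `max_lines` lines is a copyright line."""
    for line in source.splitlines()[:max_lines]:
        if COPYRIGHT_RE.match(line.strip()):
            return True
    return False


def files_missing_header(files: dict[str, str]) -> list[str]:
    """Given a mapping of file name to file text, list the names lacking a header."""
    return sorted(name for name, text in files.items() if not has_copyright_header(text))
```

Scanning only the first few lines tolerates a shebang or encoding line before the header while still rejecting files where the notice is buried deep in the body.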