eval: Add eval generator prompt#471
Open
noahlwest wants to merge 1 commit into GoogleCloudPlatform:main from
Conversation
janetkuo
reviewed
Aug 12, 2025
| #!/bin/bash | ||
| set -e | ||
| NAMESPACE={The exact same namespace as setup.sh} | ||
| kubectl delete namespace $NAMESPACE --wait=false |
Member
This assumes the test is namespace-scoped
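One possible way to address this, sketched under assumptions: if setup.sh labeled every cluster-scoped object it created, cleanup could delete those by label in addition to deleting the namespace. The `eval-id` label, the `cleanup_eval` function name, and the `KUBECTL` override (for dry runs) are hypothetical and not part of this PR.

```shell
#!/bin/bash
# Sketch only: `eval-id` is a hypothetical label that setup.sh would have to apply.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.

cleanup_eval() {
  local namespace="$1" eval_id="$2"
  local kubectl="${KUBECTL:-kubectl}"

  # Namespaced resources are removed together with the namespace.
  "$kubectl" delete namespace "$namespace" --wait=false --ignore-not-found

  # Cluster-scoped resources survive namespace deletion, so remove them by label.
  "$kubectl" delete clusterrole,clusterrolebinding,persistentvolume \
    -l "eval-id=${eval_id}" --wait=false --ignore-not-found
}

# Example call (commented out so that sourcing this file has no side effects):
# cleanup_eval finance-ns finance-app-crashloop
```

Running with `KUBECTL=echo` prints the two delete commands instead of executing them, which is a cheap way to review what a generated cleanup.sh would touch.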
prasad89
reviewed
Aug 13, 2025
| **THE TASK** | ||
|
|
||
| Now, using the role, criteria, format, and golden example above as your guide, generate a complete evaluation for the following user-provided task.\ | ||
| TASK: "{INSERT_EVALUATION_TOPIC_HERE}" |
Contributor
Everything else looks good; just a few questions:
- Can we enhance the Markdown structure further, given that LLMs can understand it better?
- Will evaluating only the topic be sufficient? Perhaps we could also include a description with do's and don'ts, especially if we don't plan to iterate further.
zvdy
reviewed
Aug 13, 2025
Comment on lines +72 to +79
| ```yaml | ||
| script: | ||
| - prompt: "Hey, I just deployed my 'finance-app' in the `finance-ns` namespace, but the pod seems to be stuck in a crash loop. Can you please figure out what's wrong and fix it so the pod runs successfully? | ||
| setup: "setup.sh" | ||
| verifier: "verify.sh" | ||
| cleanup: "cleanup.sh" | ||
| difficulty: "easy" | ||
| ``` |
Contributor
The `prompt` value is missing its closing `"`.
This makes the YAML example invalid and possibly unreadable for small models.
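A quick heuristic check could catch this kind of slip before the prompt is merged. The sketch below flags any line with an odd number of double quotes; `check_quotes` is a hypothetical helper, and the check is deliberately naive (it would misreport lines containing escaped quotes or multi-line strings), so it is no substitute for a real YAML parser.

```shell
#!/bin/bash
# Sketch only: a naive per-line quote-balance check, not a YAML validator.

check_quotes() {
  # Reads text on stdin; reports lines with an odd number of double quotes.
  # Exits non-zero if any such line is found.
  awk '{
    n = gsub(/"/, "&")   # count double quotes without changing the line
    if (n % 2) {
      printf "line %d: unbalanced quote\n", NR
      bad = 1
    }
  }
  END { exit bad + 0 }'
}

# Example call (commented out so that sourcing this file has no side effects):
# check_quotes < eval-prompt.md
```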
Adds a prompt that can be used to generate eval files, given a task description.
I'm also thinking of adding an eval scaffolding tool to make the boilerplate a little easier.
Looking for feedback on any improvements that could be made to the prompt @droot @zvdy @prasad89 @ShubyM @justinsb @janetkuo