Commit d271131

feat(jobs): add gemini and sonnet eval result (#24)
1 parent cf44623 commit d271131

8,058 files changed

+1900038
-950
lines changed

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
mkdir -p ~/.pochi && cat << EOF > ~/.pochi/config.jsonc
{
  "providers": {
    "deepinfra": {
      "kind": "openai",
      "baseURL": "https://api.deepinfra.com/v1/openai",
      "apiKey": "DEEPINFRA_API_KEY",
      "models": {
        "zai-org/GLM-4.7-Flash": {
          "name": "glm-4.7-flash"
        },
        "moonshotai/Kimi-K2.5": {
          "name": "kimi-K2.5"
        },
        "Qwen/Qwen3-Coder-480B-A35B-Instruct": {
          "name": "qwen3-coder-480b"
        }
      }
    },
    "anthropic": {
      "kind": "openai",
      "baseURL": "https://api.anthropic.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "models": {
        "claude-opus-4-6": {
          "name": "claude-opus-4-6"
        }
      }
    },
    "openai": {
      "kind": "openai",
      "baseURL": "https://api.openai.com/v1",
      "apiKey": "OPENAI_API_KEY",
      "models": {
        "gpt-5.4": {
          "name": "gpt-5.4"
        }
      }
    }
  }
}
EOF
sed -i 's/OPENAI_API_KEY/'$OPENAI_API_KEY'/g' ~/.pochi/config.jsonc && sed -i 's/DEEPINFRA_API_KEY/'$DEEPINFRA_API_KEY'/g' ~/.pochi/config.jsonc && sed -i 's/ANTHROPIC_API_KEY/'$ANTHROPIC_API_KEY'/g' ~/.pochi/config.jsonc && pochi --model anthropic/claude-4-6-sonnet --max-steps 200 --max-retries 10 --blobs-dir /logs/agent/pochi/blobs --stream-json /logs/agent/pochi/trajectory.jsonl > >(tee /logs/agent/pochi/stdout.txt) 2> >(tee /logs/agent/pochi/stderr.txt >&2) <<'EOF'
# Prune Stale WIP Revisions Using Revsets

## Background

You are an open-source maintainer working on a project called `mylib` hosted on GitHub. You use Jujutsu (`jj`) as your version control system over a Git-backed repository.

Over the past few days you have been experimenting locally and left behind several scratch/WIP (work-in-progress) revisions that were never promoted to a proper bookmark or given a meaningful description. You now want to clean up the repository by **abandoning** those stale WIP revisions before pushing your real work upstream.

The repository already has the following history set up for you:

```
root()
└── "Initial project scaffold" ← bookmark: main
    └── "Add core library module" ← bookmark: feature/core
        ├── "(no description set)" ← WIP scratch commit A (no bookmark)
        └── "(no description set)" ← WIP scratch commit B (no bookmark)
```

Both WIP commits are:
- Direct children of the `feature/core` bookmark commit
- Empty of any useful description (description is blank / empty string)
- Not pointed to by any local bookmark

Your task is to **abandon all mutable revisions that have an empty description AND are not ancestors of any local bookmark**, then verify the result.

## Starting Environment

- Working directory: `/home/user/mylib`
- The repository is already initialised as a `jj` git-backed repo
- User identity is already configured in jj config
- The commit history described above is already in place
- `jj` binary is available in `PATH`

## Your Task

Complete the following steps **inside `/home/user/mylib`**:

### Step 1 — Inspect the current log

Run `jj log` to see the current history and identify the WIP revisions:

```bash
cd /home/user/mylib
jj log
```

You should see at least four revisions: the root, `main`, `feature/core`, and two WIP commits that have no description and no bookmark.

### Step 2 — Abandon all WIP revisions using a revset

Use a single `jj abandon` command with a revset expression to abandon **all mutable revisions that have an empty description and are not ancestors of any local bookmark**.

The revset you need is:

```
mutable() & empty_description & ~::bookmarks()
```

where `empty_description` is expressed using the `description("")` function (which matches commits whose description is the empty string).

The exact command to run:

```bash
jj abandon 'mutable() & description("") & ~::bookmarks()'
```

This revset reads as:
- `mutable()` — only commits that are not immutable (i.e., not protected remote/tag heads)
- `description("")` — only commits whose description is the empty string
- `~::bookmarks()` — exclude commits that are ancestors-of (or equal to) any local bookmark target

**Important**: The `~::bookmarks()` part uses the `::` range operator to compute the set of all ancestors of bookmark targets (inclusive), and then negates it with `~`. This ensures you only abandon commits that are genuinely not reachable from any named bookmark.

### Step 3 — Verify the result

After abandoning, confirm the cleanup was successful:

```bash
jj log
```

You should now see **only** the revisions that have bookmarks (`main`, `feature/core`) and the working-copy commit. The two WIP scratch commits must no longer appear in the visible history.

Also verify with a revset query:

```bash
jj log -r 'mutable() & description("") & ~::bookmarks()'
```

This should output **nothing** (or an empty result), confirming all matching WIP revisions have been abandoned.

## Expected Final State

- The two WIP revisions (empty description, no bookmark) are **abandoned** (no longer visible)
- The `main` bookmark still points to the "Initial project scaffold" commit
- The `feature/core` bookmark still points to the "Add core library module" commit
- `jj log -r 'mutable() & description("") & ~::bookmarks()'` returns an empty result
- The file `/home/user/mylib/README.md` still exists (it was part of the scaffold)
- The file `/home/user/mylib/src/lib.rs` still exists (it was part of the core module)

## Verification Conditions

Automated tests will check:
1. `jj log -r 'mutable() & description("") & ~::bookmarks()'` in `/home/user/mylib` returns an empty result (exit 0 but no commits shown, or the message indicating no revisions)
2. `jj bookmark list` shows both `main` and `feature/core` bookmarks still present
3. The file `/home/user/mylib/README.md` exists
4. The file `/home/user/mylib/src/lib.rs` exists

EOF
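The key substitution in the command above splices each environment variable into a `sed` expression unquoted and uses `/` as the delimiter, so a key containing `/`, `&`, or whitespace could break or corrupt the replacement. A minimal sketch of a more defensive variant follows, assuming GNU `sed` (for `-i`) and keys without newlines. `fill_placeholder` is a hypothetical helper name introduced here; it is not part of Pochi or of this commit.

```shell
# Hypothetical helper: replace a placeholder with a secret via sed,
# escaping the characters that are special on the replacement side.
# Assumes GNU sed (-i without a suffix) and a value without newlines.
fill_placeholder() {
  local placeholder="$1" value="$2" file="$3" escaped
  # Prefix backslash, ampersand, and the '|' delimiter with a backslash.
  escaped="$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')"
  # Use '|' as the delimiter so '/' inside the value needs no escaping.
  sed -i "s|${placeholder}|${escaped}|g" "$file"
}
```

It could be called once per provider, for example `fill_placeholder OPENAI_API_KEY "$OPENAI_API_KEY" ~/.pochi/config.jsonc`.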
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
0
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
2026-03-19 07:20:44.447 DEBUG native:1 Pochi pochi v0.6.0
2026-03-19 07:20:44.604 DEBUG /$bunfs/root/pochi:503027 loadAgents Loaded 4 custom agents (4 valid, 0 invalid)
2026-03-19 07:20:44.605 DEBUG /$bunfs/root/pochi:503028 loadSkills Loaded 2 skills (2 valid, 0 invalid)
2026-03-19 07:20:44.827 DEBUG /$bunfs/root/pochi:439288 PochiVendor JWT is expiring or missing, fetching a new one
2026-03-19 07:20:46.330 DEBUG /$bunfs/root/pochi:415701 MCPHub MCP servers configuration changed via signal:
2026-03-19 07:20:46.331 DEBUG /$bunfs/root/pochi:415813 MCPHub Build MCPHub Status
2026-03-19 07:20:46.430 DEBUG /$bunfs/root/pochi:502453 TaskRunner Starting TaskRunner...
2026-03-19 07:20:46.518 DEBUG /$bunfs/root/pochi:454195 listFiles Listing workspace files from /home/user/mylib with maxItems 500
2026-03-19 07:20:46.735 DEBUG /$bunfs/root/pochi:496495 generateTaskTitle Generating task title, old: null, new: # Prune Stale WIP Revisions Using Revsets
2026-03-19 07:20:56.101 DEBUG /$bunfs/root/pochi:454195 listFiles Listing workspace files from /home/user/mylib with maxItems 500
2026-03-19 07:20:58.620 WARN  GitStatus readGitStatus timed out after 12000ms, returning undefined
2026-03-19 07:20:58.986 DEBUG /$bunfs/root/pochi:496495 generateTaskTitle Generating task title, old: # Prune Stale WIP Revisions Using Revsets, new: Abandon stale WIP commits with empty descriptions using a jj revset expression
2026-03-19 07:20:59.642 DEBUG /$bunfs/root/pochi:454195 listFiles Listing workspace files from /home/user/mylib with maxItems 500
2026-03-19 07:21:03.126 DEBUG /$bunfs/root/pochi:454195 listFiles Listing workspace files from /home/user/mylib with maxItems 500
2026-03-19 07:21:05.895 DEBUG /$bunfs/root/pochi:454195 listFiles Listing workspace files from /home/user/mylib with maxItems 500
2026-03-19 07:21:08.111 WARN  GitStatus readGitStatus timed out after 12000ms, returning undefined
2026-03-19 07:21:11.647 WARN  GitStatus readGitStatus timed out after 12000ms, returning undefined
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
- Initializing MCP connections...
You
# Prune Stale WIP Revisions Using Revsets ## Background You are an open-source maintainer working on a project called `mylib` hosted on GitHub. You use Jujutsu (`jj`) as your version control system over a Git-backed repository. Over the past few days you have been experimenting locally and left behind several scratch/WIP (work-in-progress) revisions that were never promoted to a proper bookmark or given a meaningful description. You now want to clean up the repository by **abandoning** those stale WIP revisions before pushing your real work upstream. The repository already has the following history set up for you: ``` root() └── "Initial project scaffold" ← bookmark: main └── "Add core library module" ← bookmark: feature/core ├── "(no description set)" ← WIP scratch commit A (no bookmark) └── "(no description set)" ← WIP scratch commit B (no bookmark) ``` Both WIP commits are: * Direct children of the `feature/core` bookmark commit * Empty of any useful description (description is blank / empty string) * Not pointed to by any local bookmark Your task is to **abandon all mutable revisions that have an empty description AND are not ancestors of any local bookmark**, then verify the result. ## Starting Environment * Working directory: `/home/user/mylib` * The repository is already initialised as a `jj` git-backed repo * User identity is already configured in jj config * The commit history described above is already in place * `jj` binary is available in `PATH` ## Your Task Complete the following steps **inside `/home/user/mylib`**: ### Step 1 — Inspect the current log Run `jj log` to see the current history and identify the WIP revisions: ```bash cd /home/user/mylib jj log ``` You should see at least four revisions: the root, `main`, `feature/core`, and two WIP commits that have no description and no bookmark. 
### Step 2 — Abandon all WIP revisions using a revset Use a single `jj abandon` command with a revset expression to abandon **all mutable revisions that have an empty description and are not ancestors of any local bookmark**. The revset you need is: ``` mutable() & empty_description & ~::bookmarks() ``` where `empty_description` is expressed using the `description("")` function (which matches commits whose description is the empty string). The exact command to run: ```bash jj abandon 'mutable() & description("") & ~::bookmarks()' ``` This revset reads as: * `mutable()` — only commits that are not immutable (i.e., not protected remote/tag heads) * `description("")` — only commits whose description is the empty string * `~::bookmarks()` — exclude commits that are ancestors-of (or equal to) any local bookmark target **Important**: The `~::bookmarks()` part uses the `::` range operator to compute the set of all ancestors of bookmark targets (inclusive), and then negates it with `~`. This ensures you only abandon commits that are genuinely not reachable from any named bookmark. ### Step 3 — Verify the result After abandoning, confirm the cleanup was successful: ```bash jj log ``` You should now see **only** the revisions that have bookmarks (`main`, `feature/core`) and the working-copy commit. The two WIP scratch commits must no longer appear in the visible history. Also verify with a revset query: ```bash jj log -r 'mutable() & description("") & ~::bookmarks()' ``` This should output **nothing** (or an empty result), confirming all matching WIP revisions have been abandoned. 
## Expected Final State * The two WIP revisions (empty description, no bookmark) are **abandoned** (no longer visible) * The `main` bookmark still points to the "Initial project scaffold" commit * The `feature/core` bookmark still points to the "Add core library module" commit * `jj log -r 'mutable() & description("") & ~::bookmarks()'` returns an empty result * The file `/home/user/mylib/README.md` still exists (it was part of the scaffold) * The file `/home/user/mylib/src/lib.rs` still exists (it was part of the core module) ## Verification Conditions Automated tests will check: 1. `jj log -r 'mutable() & description("") & ~::bookmarks()'` in `/home/user/mylib` returns an empty result (exit 0 but no commits shown, or the message indicating no revisions) 2. `jj bookmark list` shows both `main` and `feature/core` bookmarks still present 3. The file `/home/user/mylib/README.md` exists 4. The file `/home/user/mylib/src/lib.rs` exists

Pochi
💭 Thinking for 137 characters
💫 Executing cd /home/user/mylib && jj log ✔
The two WIP commits (`zrxxqmpv` and `xztlzoov`) are visible — no description, no bookmark. Now abandoning them with the revset:
💫 Executing cd /home/user/mylib && jj abandon 'mutable() & description("") & ~::bookmarks()' ✔
Both WIP commits abandoned. Now verifying:
💫 Executing cd /home/user/mylib && jj log && echo "---" && jj log -r 'mutable() & description("") & ~::bookmarks()' && echo "---" && jj bookmark list ✔
All verification checks pass:
💫 Executing ls /home/user/mylib/README.md /home/user/mylib/src/lib.rs ✔
🎉 Task Completed
All steps completed successfully:

1. **Inspected the log** — confirmed two WIP commits (`zrxxqmpv`, `xztlzoov`) with no description and no bookmark, both children of `feature/core`.

2. **Abandoned WIP revisions** using:
```
jj abandon 'mutable() & description("") & ~::bookmarks()'
```
Output confirmed: `Abandoned 2 commits`.

3. **Verified the result**:
- `jj log` now shows only `main`, `feature/core`, and the working-copy — no WIP commits.
- `jj log -r 'mutable() & description("") & ~::bookmarks()'` returns an empty result.
- `jj bookmark list` shows both `feature/core` and `main` intact.
- `README.md` and `src/lib.rs` both exist.
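Because `jj abandon` acts on whatever the revset matches at that moment, it can help to list the matches first and skip the abandon when there are none. The sketch below is one hedged way to do that with the revset from the task. `prune_wip` and the `JJ_BIN` override are hypothetical names introduced here for illustration, and the sketch assumes the `--no-graph` and `-T` options of `jj log`.

```shell
# Revset taken from the task description above.
REVSET='mutable() & description("") & ~::bookmarks()'

# Hypothetical helper: preview the revset, then abandon only if it matches.
# JJ_BIN is an assumed override for the jj binary (defaults to "jj").
prune_wip() {
  local jj="${JJ_BIN:-jj}" matches
  # One short change ID per line, no graph, so empty output means no matches.
  matches="$("$jj" log --no-graph -r "$REVSET" -T 'change_id.short() ++ "\n"')"
  if [ -z "$matches" ]; then
    echo "nothing to abandon"
    return 0
  fi
  printf 'would abandon:\n%s\n' "$matches"
  "$jj" abandon "$REVSET"
}
```

Run `prune_wip` from inside the repository. If the result is not what was intended, `jj undo` reverts the last operation.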
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
#!/bin/bash
set -euo pipefail

apt-get update
apt-get install -y curl ripgrep

# Install Pochi

echo "downloading latest pochi"
curl -fsSL https://getpochi.com/install.sh | bash


ln -s ~/.pochi/bin/pochi /usr/local/bin/pochi
mkdir -p /logs/agent/pochi

pochi --version
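Under `set -e`, the plain `ln -s` in the install script stops the run if the link already exists, for instance when the script is executed twice in the same container. A small sketch of an idempotent alternative follows; `link_tool` is a hypothetical helper name, and the behaviour relies on the `-f` and `-n` options of `ln`.

```shell
# Hypothetical helper: (re)create a symlink without failing if it already exists.
link_tool() {
  local target="$1" link="$2"
  # Make sure the directory that will hold the link exists.
  mkdir -p "$(dirname "$link")"
  # -f replaces an existing link; -n avoids following a link that points at a directory.
  ln -sfn "$target" "$link"
}
```

In the install script this would read `link_tool ~/.pochi/bin/pochi /usr/local/bin/pochi`.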
