# vmware_labbench.yaml
# Formerly called: anthropic_filemap.yaml
# This template is heavily inspired by Anthropic's computer use demo, but you can use
# it with any LM.
agent:
run_mode: swe_gui
max_trajectory_length: 200
use_workflow_memory: false
update_workflow_memory: false
templates:
system_template: |-
You are a helpful assistant that can interact with a computer to solve tasks. You are good at understanding scientific content, using software, and browsing the internet like a biologist.
Try to be as efficient as possible and give the final answer once you have enough information. Don't perform excessive double-checking.
When you are finished, use bash echo to print the following in your response:
The answer is A/B/C/D: [actual answer text]. **Task Completed**
instance_template: |-
<working_dir>
{{working_dir}}
</working_dir>
Consider the following problem statement:
<pr_description>
{{problem_statement}}
</pr_description>
Whenever you view an image, you must analyze it and write down your key observations in the subsequent thought before proceeding to the next step.
next_step_template: |-
OBSERVATION:
{{observation}}
next_step_no_output_template: |-
Your command ran successfully and did not produce any output.
tools:
type: vmware
execution_timeout: 1200
total_execution_timeout: 72000
env_variables:
PAGER: cat
MANPAGER: cat
LESS: -R
PIP_PROGRESS_BAR: 'off'
TQDM_DISABLE: '1'
GIT_PAGER: cat
LOCAL_DEV_PATH: "/home/user/Documents/WorkingDir"
PL_POLARS_FILE_CACHING: '0'
bundles:
- path: tools/registry
- path: tools/edit_anthropic
- path: tools/image_tools
# - path: tools/cellprofiler
# - path: tools/napari_viewer_tools
# - path: tools/pdf_tools
# - path: tools/jump
- path: tools/x11_utils
# - path: tools/module_search
# Enable subagents for GUI software automation
enable_subagents: true
subagent_bundles:
# - path: subagents/cellprofiler
- path: subagents/chromium
- path: subagents/image_viewer
registry_variables:
USE_FILEMAP: 'true'
SUBMIT_REVIEW_MESSAGES:
- |
Thank you for your work on this issue. Please carefully follow the steps below to help review your changes.
1. If you made any changes to your code after running the reproduction script, please run the reproduction script again.
If the reproduction script is failing, please revisit your changes and make sure they are correct.
If you have already removed your reproduction script, please ignore this step.
2. Remove your reproduction script (if you haven't done so already).
3. If you have modified any TEST files, please revert them to the state they had before you started fixing the issue.
You can do this with `git checkout -- /path/to/test/file.py`. Use the <diff> below to find the files you need to revert.
4. Run the submit command again to confirm.
Here is a list of all of your changes:
<diff>
{{diff}}
</diff>
enable_bash_tool: true
parse_function:
type: function_calling
history_processors:
- type: keep_important
last_n_messages: 50
last_n_tagged_messages: 3
# Desktop agent configuration for GUI automation
desktop_agent:
platform: "ubuntu"
model: "claude-sonnet-4-5-20250929" # alternative: "bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0"
max_tokens: 1500
top_p: 0.9
temperature: 0.5
action_space: "computer_13" # "computer_13" or "pyautogui"
observation_type: "screenshot"
max_trajectory_length: 5
a11y_tree_max_tokens: 10000
reflection_template: |
PREVIOUS ATTEMPT ANALYSIS - QuPath Learning
Based on the given trajectory, provide a structured reflection on:
1. WORKFLOW LEARNED: Summarize which QuPath workflows or techniques were learned
2. QUALITY ASSESSMENT: Give a detailed assessment of the quality of the learning attempt, including what was done well and what could be improved
3. NEXT STEPS: Specific improvements and things to learn for the next attempt
Provide a concise technical summary and reflection focusing on issues and improvements:
summary_template: |
Generate one single sentence in this format:
The answer is A/B/C/D: [actual answer text].
This summary should concisely capture the final answer to the problem statement based on all the information gathered during the trajectory.
subagent_summary_template: |
Based on the subagent interactions in the trajectory, provide a structured summary.
Include all files that have been generated so far and the observations from the GUI interactions.
Try to include all the details that would be useful for a biologist to understand what was observed. Include any knowledge-intensive observations for future reference.
workflow_template: |
WORKFLOW EXTRACTION - QuPath using GUI
Based on the successful trajectory, extract a structured workflow in this format:
Step 1: Description of step 1
Step 2: Description of step 2
...
evaluation:
type: cellprofiler
ground_truth_dir: "/Users/machang/Documents/research-work/CellMMAgent/metrics/ground_truth"
compare_dir: "/Users/machang/Documents/research-work/CellMMAgent/CellDev/measurements"
threshold: 0.5
# retry_loop:
# type: score
# max_attempts: 1
# accept_score: 0.8
# cost_limit: 20.0
# max_trajectory_length: 100
# model:
# name: claude-sonnet-4-20250514
# per_instance_cost_limit: 50.0
# temperature: 0.0
# max_output_tokens: 500
# reviewer_config:
# type: reviewer
# system_template: "You are a helpful assistant."
# instance_template: |
# This is a tutorial learning task, so the score is not important and acts as a placeholder.
# The goal is to provide a score between 0.0 and 1.0 based on the unit test results in format <score>{your score here}</score>.
# Unit test results:
# {{test}}
# Generate Review:
# traj_formatter:
# filter: []
# output_filter: []
# item_template: "Model: {{response}}\n\nObservation: {{observation}}"
# only_show_last_n_output: 0