Skip to content

Commit 802e6de

Browse files
authored
Changed strategy to define the plan tasks to be more like instructions. (#286)
* Changed strategy to define the plan tasks to be more like instructions. * Improved thinking on plan and execute agent. * Fixed unittets. * Fixed datetime values on prompts. * Improved prompts and logic of review addressor agent. * Fixed thinking of codebase chat. * Improved pipeline fixer. * Fixed images integrations. Fixed with_strutured_output method. * Added condition to relevant files.
1 parent 7300b63 commit 802e6de

53 files changed

Lines changed: 987 additions & 1057 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,10 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## Unreleased
99

10+
### Changed
11+
12+
- Increased `max_tokens` to `4096` for `Anthropic` models.
13+
- Removed fallback logic from all agents, as it's not needed with the new `OpenRouter` integration.
14+
- Improved `PlanAndExecuteAgent`:
15+
- Introduced `think` tool on both planning and execution phases to improve the reasoning capabilities of the agent, even when using models without that capability (https://www.anthropic.com/engineering/claude-think-tool).
16+
- Improved planning prompt to focus on the atual state of the codebase and design plans to be more self contained and actionable.
17+
- Improved execution prompt to focus on the execution to strictly follow the plan.
18+
- File paths included on the plan are now preloaded as messages on the execution to improve the speed of the execution and reduce the number of tool calls.
19+
- Improved `ReviewAddressorAgent`:
20+
- Improved planning prompt to focus on the atual diff hunk without losing the context on the comments from the reviewer.
21+
- Improved `PipelineFixerAgent`:
22+
- Improved troubleshooting prompt to focus on verifiable knowledge and to improve the quality of the remediation steps.
23+
1024
### Added
1125

1226
- Support for `OpenRouter` integration, unified API for LLM providers. **Breaking change: this will be the default provider from now on as it's more reliable and has more models available.**
1327

28+
### Fixed
29+
30+
- Command parameter `base_url` on `setup_webhooks` was declared required with a default value, forcing to pass it on every call. Now it's not required and will use the value of `DAIV_EXTERNAL_URL` if not provided.
31+
- Date and time provided to prompts were defined at compilation time, leading to a fixed date and time on all prompts. Now it's defined at runtime to provide the correct date and time for each execution.
32+
- CodebaseChat tool calls were being shown outside <thinking> tags on OpenWebUI, polluting the UI.
33+
1434
## [0.1.0-beta.3] - 2025-03-23
1535

1636
### Added

daiv/automation/agents/base.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -116,7 +116,7 @@ def get_model_kwargs(
116116
# As stated in docs: https://docs.anthropic.com/en/api/rate-limits#updated-rate-limits
117117
# the OTPM is calculated based on the max_tokens. We need to use a fair value to avoid rate limiting.
118118
# If needed, we can increase this value using the configurable field.
119-
_kwargs["max_tokens"] = 2_048
119+
_kwargs["max_tokens"] = 4_096
120120

121121
if _kwargs["model"].startswith("claude-3-7-sonnet"):
122122
# Enable token efficient tools to reduce the number of tokens used and
@@ -128,7 +128,7 @@ def get_model_kwargs(
128128
_kwargs["reasoning_effort"] = thinking_level
129129

130130
elif model_provider == ModelProvider.OPENROUTER:
131-
_kwargs["model"] = _kwargs["model"].split(":")[1]
131+
_kwargs["model"] = _kwargs["model"].split(":", 1)[1]
132132
# OpenRouter is OpenAI compatible, so we need to use the OpenAI model provider
133133
_kwargs["model_provider"] = ModelProvider.OPENAI
134134
_kwargs["model_kwargs"]["extra_headers"] = {
@@ -147,7 +147,7 @@ def get_model_kwargs(
147147

148148
elif _kwargs["model"].startswith("anthropic"):
149149
# Avoid rate limiting by setting a fair max_tokens value
150-
_kwargs["max_tokens"] = 2_048
150+
_kwargs["max_tokens"] = 4_096
151151

152152
return _kwargs
153153

@@ -156,7 +156,7 @@ def _get_anthropic_thinking_tokens(self, *, thinking_level: ThinkingLevel) -> tu
156156
Get the thinking tokens and max tokens for the model.
157157
"""
158158
if thinking_level == ThinkingLevel.LOW:
159-
return 4_096, {"type": "enabled", "budget_tokens": 2_048}
159+
return 8_192, {"type": "enabled", "budget_tokens": 4_096}
160160
elif thinking_level == ThinkingLevel.MEDIUM:
161161
return 32_768, {"type": "enabled", "budget_tokens": 25_600}
162162
elif thinking_level == ThinkingLevel.HIGH:
Lines changed: 45 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -1,61 +1,61 @@
11
from langchain_core.prompts import SystemMessagePromptTemplate
22

33
codebase_chat_system = SystemMessagePromptTemplate.from_template(
4-
"""You're DAIV, an helpful assistant specialized on software development and codebases knowledge. Your main task is to reply to user's queries that are aligned with software development or with knowledge about the codebases that you can collect using available tools.
5-
6-
IMPORTANT: You don't need to mention the knowledge base in your replies, just reply directly to the user's query. The user don't have access to the system context, only you have access to it, so NEVER refer to it in your response.
7-
8-
Current date: {{ current_date_time }}.
9-
10-
# Instructions
11-
1. Check if the user's query is related to software development or codebases. If not, just reply with a message indicating that you can only help with software development related queries. Otherwise, continue to the next step.
12-
2. Open a <thinking> tag and wrap you thinking process inside it. **IMPORTANT:** Don't close it until the end of your reply to the user's query.
13-
3. Analyse the user query using the rules "Query analysis rules".
14-
4. Call the `{{ search_code_snippets_name }}` tool following the rules "Tool usage rules" to ground your reply. Be specific about the code snippets you're searching for.
15-
5. Close the </thinking> tag. **IMPORTANT:** Only close it on the beginning of your reply to the user's query.
16-
6. Reply to the user's query.
17-
18-
# Tone and style
19-
- Communicate in the first person, as if speaking directly to the developer.
20-
- Use a tone of a senior software developer who is confident and experienced.
21-
- Don't reply with unnecessary preamble or postamble (such as explaining your query analysis or summarizing your action).
22-
23-
# Query analysis rules
24-
- Specific programming languages, frameworks, or technologies mentioned or implied, with an example of how each might be used in code.
25-
- Key search terms extracted from the query, prioritized based on relevance, with an example of how each might appear in code.
26-
- A prioritized list of key concepts or topics extracted from the query, with a brief explanation of why each is important.
27-
- Identification of multiple topics if present in the query, with an explanation of how they relate to each other. If multiple topics were identified in the query analysis, break down the plan for each topic.
28-
- References to specific files or repositories in the query, with an example of how each might be used in code.
29-
- Conversation history is important, as the user can follow-up queries. Use it to correlate the queries.
30-
31-
# Tool usage rules
32-
Use the `{{ search_code_snippets_name }}` tool to search for code snippets in the following repositories. If the user's query is not related to the repositories below, you should not use the `{{ search_code_snippets_name }}` tool.
33-
**IMPORTANT:** Make use of parallel tool calls if you intend to call the same tool multiple times.
34-
35-
# Output format
36-
Divide your reply to the user's query into two parts:
37-
- The first part is the reply to the user's query.
38-
- The second part is quoting repository files from the code snippets that are used as the basis for replying to the user. Use the `external_link` field from the <CodeSnippet> tags to create the links. If you didn't quote any code snippets, just don't include the second part.
39-
**IMPORTANT:** Only close the <thinking> tag on the beginning of your reply to the user's query.
40-
41-
Example output, the values in the [] are placeholders:
4+
"""You're DAIV, a helpful assistant tasked with answering user queries aligned with software development or knowledge of the repositories you have access to. You have tools available to help you inspect the repositories related to the user's request.
5+
6+
The current date and time is {{ current_date_time }}.
7+
8+
When queried about the repositories, do not rely on your internal or prior knowledge. Instead, base all conclusions and recommendations strictly on verifiable, factual information from the repositories.
9+
10+
<tone_and_style>
11+
When replying to the user, follow these guidelines:
12+
* Always reply to the user in the same language they are using.
13+
* You can use markdown formatting in your replies if helpful.
14+
* The user don't have access to the system context, only you have access to it, so NEVER refer to it in your replies.
15+
</tone_and_style>
16+
17+
<query_analysis_rules>
18+
Here are the rules to analyze the user's query before replying and searching the repositories:
19+
* Specific programming languages, frameworks, or technologies mentioned or implied, with an example of how each might be used in code.
20+
* Key search terms extracted from the query, prioritized based on relevance, with an example of how each might appear in code.
21+
* A prioritized list of key concepts or topics extracted from the query, with a brief explanation of why each is important.
22+
* Identification of multiple topics if present in the query, with an explanation of how they relate to each other. If multiple topics were identified in the query analysis, break down the plan for each topic.
23+
* References to specific files or repositories in the query, with an example of how each might be used in code.
24+
* Conversation history is important, as the user can follow-up queries. Use it to try to correlate the queries.
25+
</query_analysis_rules>
26+
27+
<tool_calling>
28+
You have tools at your disposal to search knowledge on the repositories. Follow these rules regarding tool calls:
29+
* ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.
30+
* Use the `{{ search_code_snippets_name }}` tool to search for code snippets in the repositories you have access to using the keywords extracted from the user's query. If the user's query is not related to the repositories you have access to, you should not use it.
31+
</tool_calling>
32+
33+
<reply_output_format>
34+
Divide your reply to the user's query into two sections:
35+
- The first section is the reply to the user's query;
36+
- The second section is the references to the repository files from the code snippets that are used as the basis for replying to the user. Use the `external_link` field from the <CodeSnippet> tags to create the links. If you didn't quote any code snippets, just don't include this section.
37+
38+
Example output format:
4239
```markdown
43-
<thinking>
44-
[thinking process]
45-
[tool calls]
46-
</thinking>
47-
4840
[reply to the user's query]
4941
5042
**References:**
5143
- [repository/path/to/file.py](https://github.com/user/repo/blob/branch/path/to/file.py)
5244
```
45+
</reply_output_format>
5346
54-
# Repositories
47+
<repositories>
5548
DAIV has access to the following repositories:
5649
{% for repository in repositories %}
5750
- {{ repository }}
5851
{%- endfor %}
59-
""", # noqa: E501
52+
</repositories>
53+
54+
<searching_and_replying>
55+
The user's query must be related to software development or repositories. If not, simply reply with a message stating that you can only help with software development related queries. Otherwise, go ahead and analyze the user's request and inspect the repositories with the tools available to support your answer.
56+
Finally, answer the user's question based on the information you have gathered from the repositories.
57+
</searching_and_replying>
58+
59+
Reply the user's query with grounded information.""", # noqa: E501
6060
"jinja2",
6161
)

daiv/automation/agents/codebase_search/agent.py

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -44,9 +44,7 @@ def compile(self) -> Runnable:
4444
llm=cast(
4545
"BaseChatModel",
4646
# this model shows better results for rephrasing
47-
self.get_model(model=settings.REPHRASE_MODEL_NAME).with_fallbacks([
48-
self.get_model(model=settings.REPHRASE_FALLBACK_MODEL_NAME)
49-
]),
47+
self.get_model(model=settings.REPHRASE_MODEL_NAME),
5048
),
5149
)
5250
else:
@@ -57,9 +55,7 @@ def compile(self) -> Runnable:
5755
llm=cast(
5856
"BaseChatModel",
5957
# this model shows better results for listwise reranking
60-
self.get_model(model=settings.RERANKING_MODEL_NAME).with_fallbacks([
61-
self.get_model(model=settings.RERANKING_FALLBACK_MODEL_NAME)
62-
]),
58+
self.get_model(model=settings.RERANKING_MODEL_NAME),
6359
),
6460
top_n=settings.TOP_N,
6561
),

daiv/automation/agents/codebase_search/conf.py

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -16,15 +16,9 @@ class CodebaseSearchSettings(BaseSettings):
1616
REPHRASE_MODEL_NAME: ModelName = Field(
1717
default=ModelName.GPT_4O_MINI, description="Model name to be used for codebase search."
1818
)
19-
REPHRASE_FALLBACK_MODEL_NAME: ModelName = Field(
20-
default=ModelName.CLAUDE_3_5_HAIKU, description="Fallback model name to be used for codebase search."
21-
)
2219
RERANKING_MODEL_NAME: ModelName = Field(
2320
default=ModelName.GPT_4O_MINI, description="Model name to be used for listwise reranking."
2421
)
25-
RERANKING_FALLBACK_MODEL_NAME: ModelName = Field(
26-
default=ModelName.CLAUDE_3_5_HAIKU, description="Fallback model name to be used for listwise reranking."
27-
)
2822

2923

3024
settings = CodebaseSearchSettings() # type: ignore

daiv/automation/agents/constants.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ class ModelName(StrEnum):
1313
CLAUDE_3_5_HAIKU = "openrouter:anthropic/claude-3-5-haiku"
1414
GPT_4O = "openrouter:openai/gpt-4o"
1515
GPT_4O_MINI = "openrouter:openai/gpt-4o-mini"
16-
O1 = "openrouter:openai/o1"
1716
O3_MINI = "openrouter:openai/o3-mini"
1817
GEMINI_2_0_FLASH = "openrouter:google/gemini-2.0-flash-001"
1918
GEMINI_2_0_FLASH_LITE = "openrouter:google/gemini-2.0-flash-lite-001"
19+
DEEPSEEK_CHAT_V3_0324 = "openrouter:deepseek/deepseek-chat-v3-0324"

daiv/automation/agents/image_url_extractor/agent.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,8 @@ def compile(self) -> Runnable:
4141
prompt = ChatPromptTemplate.from_messages([system, human])
4242
return (
4343
prompt
44-
| self.get_model(model=settings.MODEL_NAME).with_structured_output(ImageURLExtractorOutput)
44+
| self.get_model(model=settings.MODEL_NAME).with_structured_output(
45+
ImageURLExtractorOutput, method="function_calling"
46+
)
4547
| RunnableLambda(_post_process, name="post_process_extracted_images")
4648
).with_config({"run_name": settings.NAME})

daiv/automation/agents/image_url_extractor/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ class ImageURLExtractorSettings(BaseSettings):
1313

1414
NAME: str = Field(default="ImageURLExtractor", description="Name of the image URL extractor agent.")
1515
MODEL_NAME: ModelName = Field(
16-
default=ModelName.GPT_4O_MINI, description="Model name to be used for image URL extractor."
16+
default=ModelName.GEMINI_2_0_FLASH_LITE, description="Model name to be used for image URL extractor."
1717
)
1818

1919

daiv/automation/agents/image_url_extractor/schemas.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
from __future__ import annotations
22

3-
from typing import Literal
3+
from typing import Literal, cast
44
from urllib.parse import urlparse
55

66
from pydantic import BaseModel, Field
@@ -50,7 +50,9 @@ def from_images(
5050
and parsed_url.path.startswith("uploads/")
5151
):
5252
_repo_image_url = build_uri(f"{settings.GITLAB_URL}api/v4/projects/{project_id}/", image.url)
53-
image_url = url_to_data_url(_repo_image_url, headers={"PRIVATE-TOKEN": settings.GITLAB_AUTH_TOKEN})
53+
image_url = url_to_data_url(
54+
_repo_image_url, headers={"PRIVATE-TOKEN": cast("str", settings.GITLAB_AUTH_TOKEN)}
55+
)
5456

5557
if image_url:
5658
image_templates.append(

daiv/automation/agents/issue_addressor/agent.py

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -64,10 +64,8 @@ def assessment(self, state: OverallState) -> Command[Literal["prepare_data", "__
6464
prompt = ChatPromptTemplate.from_messages([issue_assessment_system, issue_assessment_human])
6565

6666
evaluator = prompt | self.get_model(model=settings.ASSESSMENT_MODEL_NAME).with_structured_output(
67-
IssueAssessment
68-
).with_fallbacks([
69-
self.get_model(model=settings.FALLBACK_ASSESSMENT_MODEL_NAME).with_structured_output(IssueAssessment)
70-
])
67+
IssueAssessment, method="function_calling"
68+
)
7169

7270
response = cast(
7371
"IssueAssessment",
@@ -105,13 +103,14 @@ def prepare_data(self, state: OverallState, config: RunnableConfig) -> Command[L
105103
return Command(
106104
goto="plan_and_execute",
107105
update={
106+
"image_templates": extracted_images,
108107
"messages": HumanMessagePromptTemplate.from_template(
109-
[issue_addressor_human, *extracted_images], "jinja2"
108+
[issue_addressor_human] + extracted_images, "jinja2"
110109
).format_messages(
111110
issue_title=state["issue_title"],
112111
issue_description=state["issue_description"],
113112
project_description=repo_config.repository_description,
114-
)
113+
),
115114
},
116115
)
117116

0 commit comments

Comments
 (0)