Commit b67019e: Add AI agent monitoring tutorial Part 1 (#3670)

Authored by Yusu-f, LinaLam, juliettech13, chitalian, and tinomarques.

8 files changed: +286 −0 lines
{
  "title": "Building and Monitoring AI Agents: A Step-by-Step Guide (Part 1)",
  "title1": "Building and Monitoring AI Agents (Part 1): A Step-by-Step Guide",
  "title2": "Building and Monitoring AI Agents (Part 1): A Step-by-Step Guide",
  "description": "Building reliable AI agents requires robust monitoring and observability. In this tutorial, we'll build a simple financial research assistant and optimize its performance using Helicone's agentic AI observability features.",
  "images": "/static/blog/ai-agent-monitoring-tutorial/building-and-monitoring-ai-agents.webp",
  "time": "10 minute read",
  "author": "Yusuf Ishola",
  "date": "May 2, 2025",
  "badge": "Best Practices"
}
**Time to complete: ~30 minutes**

Your AI agent worked perfectly in testing, but now in production it's making bizarre recommendations and you have no idea why. Sound familiar? As AI agents grow increasingly complex, the black-box problem is becoming the number one obstacle to reliable deployment.

![Building and Monitoring AI Agents](/static/blog/ai-agent-monitoring-tutorial/building-and-monitoring-ai-agents.webp)

In this first part of our **two-part series** on AI agent observability, we'll build a financial research assistant that demonstrates the key components of a modern AI agent. In part two, we'll explore how to monitor it effectively with Helicone's agentic AI observability features.

Let's get started!

## Table of Contents

## Prerequisites

Before we dive in, you'll need:

- **<a href="https://nodejs.org/" target="_blank" rel="noopener">Node.js 16+</a>** installed on your machine
- **<a href="https://platform.openai.com/api-keys" target="_blank" rel="noopener">OpenAI API key</a>**
- **<a href="https://www.alphavantage.co/support/#api-key" target="_blank" rel="noopener">Alpha Vantage API key</a>** (free tier available)
- **<a href="https://helicone.ai/signup" target="_blank" rel="noopener">Helicone API key</a>** (free tier available)

## Quick Start

Want to skip ahead and try the code immediately? Clone the GitHub repository and install the dependencies:

```bash
git clone https://github.com/Yusu-f/helicone-agent-tutorial.git
cd helicone-agent-tutorial
npm install
```

Create a `.env` file with your API keys:

```bash
OPENAI_API_KEY=your_openai_key_here
ALPHA_VANTAGE_API_KEY=your_alpha_vantage_key_here
HELICONE_API_KEY=your_helicone_key_here
```

Then run the assistant:

```bash
npm start
```

This gives you a version of the financial assistant with basic Helicone monitoring.
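Since all three keys are read from `.env`, a fail-fast check at startup makes missing-key errors obvious instead of surfacing later as cryptic API failures. A minimal sketch (the `missingEnvKeys` helper is illustrative and not part of the tutorial repo):

```javascript
// Illustrative startup check: fail fast if any required API key is absent.
const requiredEnvKeys = [
  "OPENAI_API_KEY",
  "ALPHA_VANTAGE_API_KEY",
  "HELICONE_API_KEY",
];

// Returns the subset of `keys` that are missing or empty in `env`.
function missingEnvKeys(env, keys) {
  return keys.filter((k) => !env[k] || env[k].trim() === "");
}

const missing = missingEnvKeys(process.env, requiredEnvKeys);
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```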
In part 2, we'll show you how to add comprehensive monitoring to your AI agent with Helicone's Sessions feature.

## How We'll Build Our Financial Assistant

Our financial assistant does two things:

1. Fetches real-time price information and news for specific tickers
2. Uses RAG to answer questions about financial concepts

The agent intelligently determines which approach to take for each query—a pattern applicable to many domains beyond finance, including customer support, healthcare, and legal applications.

## Key Components of Our AI Agent

### 1. Tools

Our agent uses OpenAI's function-calling (tools) API to determine how to handle different queries:
```javascript
// Define OpenAI tools for function calling
const tools = [
  {
    type: "function",
    function: {
      name: "getStockData",
      description:
        "Get current price and other information for a specific stock by ticker symbol",
      parameters: {
        type: "object",
        properties: {
          ticker: {
            type: "string",
            description: "The stock ticker symbol, e.g., AAPL for Apple Inc.",
          },
        },
        required: ["ticker"],
      },
    },
  },
  // ...getStockNews and searchFinancialTerms are declared the same way
];
```

This approach allows the model to decide which functions to call based on the user's query.
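Once the model returns `tool_calls`, each call names a declared function and carries JSON-encoded arguments, and our code has to dispatch it to a local implementation. A hedged sketch of that dispatch step (the stub implementations and `runToolCall` helper are illustrative; the tutorial repo's actual wiring may differ):

```javascript
// Map tool names (as declared in `tools`) to local implementations.
// These are stubs for illustration only.
const toolImplementations = {
  getStockData: async ({ ticker }) => `stub data for ${ticker}`,
  getStockNews: async ({ ticker }) => `stub news for ${ticker}`,
  searchFinancialTerms: async ({ query }) => `stub definition for ${query}`,
};

// Execute one tool call from the model's response and return the
// `role: "tool"` message to append back into the conversation.
async function runToolCall(toolCall) {
  const impl = toolImplementations[toolCall.function.name];
  if (!impl) {
    return {
      role: "tool",
      tool_call_id: toolCall.id,
      content: `Unknown tool: ${toolCall.function.name}`,
    };
  }
  const args = JSON.parse(toolCall.function.arguments || "{}");
  const result = await impl(args);
  return { role: "tool", tool_call_id: toolCall.id, content: String(result) };
}
```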
### 2. Basic Helicone Monitoring

The financial assistant uses Helicone's basic monitoring to track the cost, latency, and error rate of our LLM calls. You can create an account for free <a href="https://www.helicone.ai/signup" target="_blank" rel="noopener">here</a>.

```javascript
// Route all OpenAI calls through Helicone's proxy for monitoring
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
```
### 3. RAG & External API Access

For common financial-term queries, we use a vector store to retrieve relevant information, while for stock-specific queries, such as real-time price information or news, we connect to the Alpha Vantage API:

```javascript
async function searchFinancialTerms(query, vectorStore) {
  console.log("Searching for financial term definitions in knowledge base...");

  // Get relevant documents from vector store with similarity scores
  const resultsWithScores = await vectorStore.similaritySearchWithScore(query, 2);

  // Process results...
}
```
```javascript
async function getStockData(ticker) {
  try {
    const url = `https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=${ticker}&apikey=${ALPHA_VANTAGE_API_KEY}`;
    const response = await axios.get(url);

    // ...parse the quote out of response.data
  } catch (error) {
    // ...handle network and API errors
  }
}
```
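Alpha Vantage's GLOBAL_QUOTE endpoint nests its fields under a `"Global Quote"` key with numbered names like `"05. price"` (per Alpha Vantage's public documentation; treat the exact field names as something to verify against a live response). A small parsing sketch for the elided step above (`parseGlobalQuote` is a hypothetical helper, not code from the tutorial repo):

```javascript
// Extract the fields the assistant cares about from a GLOBAL_QUOTE payload.
// Returns null when the payload lacks the expected shape, e.g. an invalid
// ticker or a rate-limit notice returned in place of a quote.
function parseGlobalQuote(payload) {
  const quote = payload && payload["Global Quote"];
  if (!quote || !quote["05. price"]) return null;
  return {
    symbol: quote["01. symbol"],
    price: parseFloat(quote["05. price"]),
    changePercent: quote["10. change percent"],
  };
}
```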
The RAG implementation provides domain-specific knowledge to the agent. However, as we'll see later, **without proper monitoring**, pinpointing what causes the system to fail when it does can be difficult.
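One common source of such failures is passing low-relevance matches to the model anyway. A hedged sketch of score-based filtering on the `[document, score]` pairs returned by `similaritySearchWithScore` (the `filterByScore` helper and threshold are illustrative; also check your vector store's score semantics, since some stores return distances where lower is better):

```javascript
// Keep only retrieval results whose similarity score clears a threshold.
// Assumes higher score = more similar; invert the comparison for
// distance-based stores.
function filterByScore(resultsWithScores, minScore = 0.75) {
  return resultsWithScores
    .filter(([, score]) => score >= minScore)
    .map(([doc]) => doc);
}
```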
### 4. Minimal Agent Loop (Tool Calling)

We expose three tools to the LLM:

- `getStockData`: Retrieves current price and market information for a specific ticker
- `getStockNews`: Fetches the latest news articles related to a stock ticker
- `searchFinancialTerms`: Queries our vector database for information about financial concepts

The LLM may call a tool, receive its output as feedback, and then answer the user.

The loop allows our agent to call tools and process results for as long as needed to generate an appropriate response:
```javascript
async function processQuery(userQuery, vectorStore) {
  let messages = [
    {
      role: "system",
      content: `You're a financial assistant. Use tools when needed. If you have enough information to answer, reply normally.`,
    },
    { role: "user", content: userQuery },
  ];

  // Add chat history for context if available
  if (chatHistory.length > 0) {
    messages.splice(1, 0, ...chatHistory);
  }

  while (true) {
    console.log("Sending query to OpenAI...");
    const llmResp = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      tools,
      messages,
      temperature: 0.1,
    });

    const msg = llmResp.choices[0].message;

    if (msg.tool_calls && msg.tool_calls.length > 0) {
      // Execute the helper and push the messages into history...

      continue;
    }

    // No tool call → LLM has produced the final answer
    return msg.content;
  }
}
```
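One caveat with `while (true)`: if the model keeps requesting tools, the loop never exits. A bounded variant of the same loop can be sketched as follows (here `callModel` stands in for the OpenAI client and `executeToolCalls` for the tool-dispatch step; both are assumptions for this sketch, not code from the tutorial repo):

```javascript
// Bounded agent loop (illustrative). Appends the assistant's tool-calling
// turn plus one `role: "tool"` message per call, then asks the model again,
// giving up after `maxIterations` rounds.
async function runAgentLoop(callModel, executeToolCalls, messages, maxIterations = 5) {
  for (let i = 0; i < maxIterations; i++) {
    const msg = await callModel(messages);
    if (msg.tool_calls && msg.tool_calls.length > 0) {
      // Keep the assistant turn, then append the tool results.
      messages.push(msg);
      const toolMessages = await executeToolCalls(msg.tool_calls);
      messages.push(...toolMessages);
      continue;
    }
    return msg.content; // final answer
  }
  return "Sorry, I couldn't complete that request."; // iteration guard hit
}
```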
## Testing Our Financial Assistant

Now, let's take our financial assistant for a spin!

Run the following command to start the assistant:

```bash
npm start
```

We can view the results of our queries in the **Helicone dashboard**.

> Prompt: What is Tesla's stock price?

Result:

![Example of a financial research assistant responding to the prompt 'What is Tesla's stock price?' displaying the stock information.](/static/blog/ai-agent-monitoring-tutorial/tesla-prompt.webp)

> Prompt: What is GreenEnergy's profit margin?

Result:

![Example of an AI agent failing to retrieve information about GreenEnergy's profit margin despite having it in the knowledge base.](/static/blog/ai-agent-monitoring-tutorial/greenenergy-profit-margin.webp)

It looks like there's an issue with our agent! It can't find GreenEnergy's profit margin despite it being in our knowledge base.

Something is obviously wrong with our RAG implementation—but what?

This is where **observability** comes in!
## Debugging Our Financial Assistant

Looking at our implementation, there are several blind spots that could cause issues:

- **Hallucinations and retrieval issues**: Our agent failed to answer the query related to GreenEnergy despite having the requisite information—how do we pinpoint the problem?
- **Cost visibility**: How many tokens is each component of our agent consuming? Which queries are most expensive?
- **Latency issues**: If the agent becomes slow, which step is causing the bottleneck?
- **Error patterns**: Are certain types of queries consistently failing? Where in the pipeline do these failures occur?

In <a href="https://www.helicone.ai/blog/ai-agent-monitoring-tutorial-part-2" target="_blank" rel="noopener">Part 2 of this tutorial</a> on AI agent optimization, we'll add Helicone to our financial assistant to gain comprehensive visibility into every step of the process. Here's a preview of what you can see:

![Helicone dashboard showcasing the Sessions feature for debugging AI agents, demonstrating AI observability and agent monitoring capabilities.](/static/blog/ai-agent-monitoring-tutorial/sessions-ai-agent.webp)

We'll monitor each step of the agent's workflow, resolve bugs, and gain insights into useful metrics like cost, latency, and error rates.

Stay tuned!
<CallToAction
  title="Observe Your AI Agents with Helicone ⚡️"
  description="Stop building AI in the dark. Get complete visibility into every step of your AI workflows, track costs down to the penny, and debug complex issues in minutes instead of days."
  primaryButtonText="Start Monitoring for Free"
  primaryButtonLink="https://helicone.ai/signup"
  secondaryButtonText="See How Debugging Works"
  secondaryButtonLink="https://docs.helicone.ai/features/sessions"
/>

### You might also like:

- **<a href="https://www.helicone.ai/blog/ai-agent-monitoring-tutorial-part-2" target="_blank" rel="noopener">Part 2: Step-by-Step Guide to Building and Optimizing AI Agents</a>**
- **<a href="https://www.helicone.ai/blog/debugging-chatbots-and-ai-agents-with-sessions" target="_blank" rel="noopener">Debugging RAG Chatbots and AI Agents with Sessions</a>**
- **<a href="https://www.helicone.ai/blog/full-guide-to-improving-ai-agents" target="_blank" rel="noopener">The Full Developer's Guide to Building Effective AI Agents</a>**
- **<a href="https://www.helicone.ai/blog/agentic-rag-full-developer-guide" target="_blank" rel="noopener">Building Agentic RAG Systems: A Developer's Guide to Smarter Information Retrieval</a>**

<FAQ
  items={[
    {
      question: "Why do AI agents need specialized observability tools?",
      answer: "AI agents have unique monitoring challenges including non-deterministic execution paths, multi-step LLM calls, complex branching logic, and dependencies on external systems. Unlike traditional applications with fixed flows, agents' decision trees vary with each request. Standard monitoring tools can't track these dynamic workflows or evaluate response quality across interconnected steps, which is why specialized tools like Helicone's session-based tracing are essential for AI agent observability."
    },
    {
      question: "What are the biggest blind spots when deploying AI agents to production?",
      answer: "The most dangerous blind spots include: undetected hallucinations in responses, hidden cost escalations from inefficient prompts, silent failures in multi-step reasoning chains, data leakage in RAG implementations, inconsistent performance across different user segments, and degrading accuracy over time as data or usage patterns change. Without proper observability, these issues can persist for weeks before being discovered, potentially causing significant business impact."
    },
    {
      question: "What metrics should I monitor for any AI agent?",
      answer: "Critical metrics for all AI agents include: end-to-end latency of complete workflows, token usage per step and total cost per request, step completion rates showing where agents get stuck, retrieval quality for RAG implementations, routing accuracy between different processing pathways, error rates for external API calls, and user satisfaction with responses. Tracking these metrics helps identify bottlenecks, optimize costs, and ensure reliable agent performance."
    },
    {
      question: "How do I implement observability across different AI agent frameworks?",
      answer: "Helicone offers flexible integration options for all major AI frameworks. For LangChain, CrewAI, and LlamaIndex, direct integrations are available. For custom agents or other frameworks, you can typically use either Helicone's proxy approach (changing just the base URL) or the SDK integration. The Sessions feature works consistently across most major frameworks to trace multi-step agent workflows regardless of your technology choices, giving you a unified view of all AI operations."
    }
  ]}
/>

<Questions />

bifrost/app/blog/page.tsx (5 additions, 0 deletions):

```diff
@@ -234,6 +234,11 @@ export const BLOG_CONTENT: BlogStructure[] = [
       folderName: "the-complete-guide-to-LLM-observability-platforms",
     },
   },
+  {
+    dynmaicEntry: {
+      folderName: "ai-agent-monitoring-tutorial",
+    },
+  },
   {
     dynmaicEntry: {
       folderName: "implement-and-monitor-cag",
```