1- ## My Project
1+ # Observability Agent with Amazon Bedrock AgentCore
22
3- TODO: Fill this README out!
3+ [![License: MIT-0](https://img.shields.io/badge/License-MIT--0-yellow.svg)](https://opensource.org/licenses/MIT-0)
4+ [![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
5+ [![AWS](https://img.shields.io/badge/AWS-Bedrock%20AgentCore-orange?logo=amazonaws)](https://aws.amazon.com/bedrock/agentcore/)
6+ [![Strands SDK](https://img.shields.io/badge/Built%20with-Strands%20SDK-purple)](https://strandsagents.com/)
47
5- Be sure to:
8+ An AI-powered observability agent that helps Site Reliability Engineers (SREs) investigate incidents and reduce Mean Time to Resolution (MTTR). Built with Amazon Bedrock AgentCore and the Strands Agent SDK.
69
7- * Change the title in this README
8- * Edit your repository description on GitHub
10+ This sample implements the architecture from [Reduce Mean Time to Resolution with an observability agent](https://aws.amazon.com/blogs/big-data/reduce-mean-time-to-resolution-with-an-observability-agent/).
11+
12+ ## Architecture
13+
14+ ![Observability Agent Architecture](images/observability-agent-architecture.png)
15+
16+ The agent queries three data sources to investigate incidents:
17+ - **Logs** - Application logs stored in Amazon OpenSearch Serverless
18+ - **Traces** - Distributed traces stored in Amazon OpenSearch Serverless
19+ - **Metrics** - Infrastructure metrics stored in Amazon Managed Service for Prometheus
20+
21+ ## Choose Your Path
22+
23+ ### Option A: Quick Start (Build from Scratch)
24+
25+ Best for: Learning, POC, testing the agent with sample data.
26+
27+ This path creates all required AWS resources and populates them with test data simulating a payment service failure.
28+
29+ **Time:** ~15 minutes | **Cost:** ~$5/day for OpenSearch Serverless
30+
31+ [→ Quick Start Guide](#quick-start-build-from-scratch)
32+
33+ ### Option B: Integrate with Existing Infrastructure
34+
35+ Best for: Production use with your existing observability stack.
36+
37+ This path connects the agent to your existing OpenSearch and Prometheus deployments.
38+
39+ **Time:** ~10 minutes | **Prerequisites:** Existing OpenSearch + Prometheus
40+
41+ [→ Integration Guide](#integrate-with-existing-infrastructure)
42+
43+ ---
44+
45+ ## Quick Start (Build from Scratch)
46+
47+ ### Prerequisites
48+
49+ - AWS account with appropriate permissions
50+ - Python 3.11+
51+ - AWS CLI configured with credentials
52+
53+ ### Step 1: Clone and Setup
54+
55+ ``` bash
56+ git clone https://github.com/aws-samples/observability-agent-bedrock-agentcore.git
57+ cd observability-agent-bedrock-agentcore
58+
59+ python -m venv .venv
60+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
61+ pip install -r requirements.txt
62+ ```
63+
64+ ### Step 2: Create OpenSearch Serverless Collection
65+
66+ ``` bash
67+ # Set your AWS account ID
68+ export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
69+ export AWS_REGION=us-east-1
70+
71+ # Create encryption policy
72+ aws opensearchserverless create-security-policy \
73+ --name observability-enc \
74+ --type encryption \
75+ --policy '{"Rules":[{"ResourceType":"collection","Resource":["collection/observability-agent"]}],"AWSOwnedKey":true}'
76+
77+ # Create network policy
78+ aws opensearchserverless create-security-policy \
79+ --name observability-net \
80+ --type network \
81+ --policy '[{"Rules":[{"ResourceType":"collection","Resource":["collection/observability-agent"]},{"ResourceType":"dashboard","Resource":["collection/observability-agent"]}],"AllowFromPublic":true}]'
82+
83+ # Create data access policy
84+ aws opensearchserverless create-access-policy \
85+ --name observability-access \
86+ --type data \
87+ --policy "[{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection/observability-agent\"],\"Permission\":[\"aoss:*\"]},{\"ResourceType\":\"index\",\"Resource\":[\"index/observability-agent/*\"],\"Permission\":[\"aoss:*\"]}],\"Principal\":[\"arn:aws:iam::${AWS_ACCOUNT_ID}:root\"]}]"
88+
89+ # Create collection
90+ aws opensearchserverless create-collection \
91+ --name observability-agent \
92+ --type SEARCH
93+ ```
94+
95+ Wait for the collection to become ACTIVE (~2-3 minutes):
96+
97+ ``` bash
98+ aws opensearchserverless batch-get-collection --names observability-agent
99+ ```
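
Rather than re-running that command by hand, the wait can be scripted. The following is a minimal sketch, not part of this repository: `wait_for_active` and `collection_status` are hypothetical helper names, and `collection_status` assumes `boto3` is installed and AWS credentials are configured.

```python
import time
from typing import Callable


def wait_for_active(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll fetch_status() until it returns "ACTIVE" or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "ACTIVE":
            return
        if status == "FAILED":
            raise RuntimeError("collection creation failed")
        if waited >= timeout_s:
            raise TimeoutError(f"collection still {status} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


def collection_status(name: str = "observability-agent") -> str:
    """Return the collection status via boto3 (imported lazily)."""
    import boto3  # assumption: boto3 is available in the virtual environment

    client = boto3.client("opensearchserverless")
    details = client.batch_get_collection(names=[name])["collectionDetails"]
    # "NOT_FOUND" is a placeholder used by this sketch, not an AWS status value.
    return details[0]["status"] if details else "NOT_FOUND"


# Usage (requires AWS credentials): wait_for_active(collection_status)
```

Passing the status lookup in as a callable keeps the polling loop independent of AWS, so it can be exercised with a stub before pointing it at a real account.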
100+
101+ ### Step 3: Configure Environment
102+
103+ ``` bash
104+ # Get collection endpoint
105+ export OPENSEARCH_HOST=$(aws opensearchserverless batch-get-collection \
106+ --names observability-agent \
107+ --query 'collectionDetails[0].collectionEndpoint' \
108+ --output text | sed 's|https://||')
109+
110+ echo "OpenSearch Host: $OPENSEARCH_HOST"
111+ ```
112+
113+ ### Step 4: Generate Test Data
114+
115+ ``` bash
116+ python scripts/generate_test_data.py
117+ ```
118+
119+ This creates sample logs and traces simulating a payment service failure with a ~40% error rate.
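
To make the error-rate figure concrete, here is a sketch of how log records with roughly that failure ratio could be produced. It is an illustration only and not the contents of `scripts/generate_test_data.py`; the function name, field names, and messages are all assumptions.

```python
import random
from datetime import datetime, timedelta, timezone


def make_sample_logs(count: int, error_rate: float = 0.4, seed: int = 0) -> list[dict]:
    """Build synthetic log records where about error_rate of them are errors."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = []
    for i in range(count):
        failed = rng.random() < error_rate
        records.append({
            "timestamp": (start + timedelta(seconds=i)).isoformat(),
            "service": "payment-service",  # hypothetical service name
            "severity": "ERROR" if failed else "INFO",
            "message": "payment authorization failed" if failed else "payment processed",
        })
    return records
```

Seeding the generator keeps the sample data reproducible, which makes it easier to compare agent answers across runs.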
120+
121+ ### Step 5: Deploy the Agent
122+
123+ ``` bash
124+ pip install bedrock-agentcore-starter-toolkit
125+
126+ agentcore configure --entrypoint agent/main.py --non-interactive
127+ agentcore deploy
128+ ```
129+
130+ ### Step 6: Grant Permissions
131+
132+ Get the AgentCore role name from the deploy output, then:
133+
134+ ``` bash
135+ # Replace with your actual role name from deploy output
136+ export AGENTCORE_ROLE=AmazonBedrockAgentCoreSDKRuntime-us-east-1-XXXXXX
137+ export COLLECTION_ID=$(aws opensearchserverless batch-get-collection \
138+ --names observability-agent \
139+ --query 'collectionDetails[0].id' \
140+ --output text)
141+
142+ # Add OpenSearch permissions
143+ aws iam put-role-policy \
144+ --role-name $AGENTCORE_ROLE \
145+ --policy-name OpenSearchServerlessAccess \
146+ --policy-document "{
147+   \"Version\": \"2012-10-17\",
148+   \"Statement\": [{
149+     \"Effect\": \"Allow\",
150+     \"Action\": [\"aoss:APIAccessAll\"],
151+     \"Resource\": \"arn:aws:aoss:${AWS_REGION}:${AWS_ACCOUNT_ID}:collection/${COLLECTION_ID}\"
152+   }]
153+ }"
154+
155+ # Update OpenSearch data access policy
156+ POLICY_VERSION=$(aws opensearchserverless get-access-policy \
157+ --name observability-access --type data \
158+ --query 'accessPolicyDetail.policyVersion' --output text)
159+
160+ aws opensearchserverless update-access-policy \
161+ --name observability-access \
162+ --type data \
163+ --policy-version $POLICY_VERSION \
164+ --policy "[{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection/observability-agent\"],\"Permission\":[\"aoss:*\"]},{\"ResourceType\":\"index\",\"Resource\":[\"index/observability-agent/*\"],\"Permission\":[\"aoss:*\"]}],\"Principal\":[\"arn:aws:iam::${AWS_ACCOUNT_ID}:root\",\"arn:aws:iam::${AWS_ACCOUNT_ID}:role/${AGENTCORE_ROLE}\"]}]"
165+ ```
166+
167+ ### Step 7: Test the Agent
168+
169+ ``` bash
170+ # Wait for IAM propagation
171+ sleep 30
172+
173+ # Investigate errors
174+ agentcore invoke '{"prompt": "Are there any errors in my application?"}'
175+
176+ # Check specific service
177+ agentcore invoke '{"prompt": "What is wrong with the payment service?"}'
178+ ```
179+
180+ ---
181+
182+ ## Integrate with Existing Infrastructure
183+
184+ ### Prerequisites
185+
186+ - Existing Amazon OpenSearch Service or OpenSearch Serverless with logs/traces
187+ - (Optional) Amazon Managed Service for Prometheus with metrics
188+ - Python 3.11+
189+
190+ ### Step 1: Clone and Setup
191+
192+ ``` bash
193+ git clone https://github.com/aws-samples/observability-agent-bedrock-agentcore.git
194+ cd observability-agent-bedrock-agentcore
195+
196+ python -m venv .venv
197+ source .venv/bin/activate
198+ pip install -r requirements.txt
199+ ```
200+
201+ ### Step 2: Configure Your Endpoints
202+
203+ Edit `agent/main.py` or set environment variables:
204+
205+ ``` bash
206+ # For OpenSearch Serverless
207+ export OPENSEARCH_HOST=your-collection-id.us-east-1.aoss.amazonaws.com
208+
209+ # For Amazon Managed Prometheus (optional)
210+ export AMP_WORKSPACE_ID=ws-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
211+
212+ export AWS_REGION=us-east-1
213+ ```
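
For orientation, this sketch shows how those variables could be turned into connection settings. It is not the code in `agent/main.py`; `load_endpoints` and `amp_query_url` are hypothetical names, and the URL layout assumes the standard Amazon Managed Service for Prometheus query endpoint.

```python
import os
from typing import Mapping, Optional


def amp_query_url(workspace_id: str, region: str) -> str:
    """Build the PromQL instant-query URL for an AMP workspace."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/query"
    )


def load_endpoints(env: Mapping[str, str] = os.environ) -> dict:
    """Read endpoint settings from the environment variables set above."""
    host = env.get("OPENSEARCH_HOST", "").strip()
    if not host:
        raise ValueError("OPENSEARCH_HOST is required")
    # Accept a full URL too, mirroring the sed step in the Quick Start.
    host = host.removeprefix("https://")
    region = env.get("AWS_REGION", "us-east-1")
    workspace: Optional[str] = env.get("AMP_WORKSPACE_ID")
    return {
        "opensearch_host": host,
        "region": region,
        # Prometheus is optional, so the URL is None when no workspace is set.
        "amp_query_url": amp_query_url(workspace, region) if workspace else None,
    }
```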
214+
215+ ### Step 3: Customize Index Patterns (if needed)
216+
217+ If your indices use different naming conventions, update the index patterns in `agent/main.py`:
218+
219+ ``` python
220+ # Default patterns (OpenTelemetry standard)
221+ "otel-v1-apm-span-*" # For traces
222+ "otel-logs-*" # For logs
223+
224+ # Example: Custom patterns
225+ "your-traces-*"
226+ "your-logs-*"
227+ ```
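
Before changing a pattern, it can help to check which of your index names it would select. The sketch below approximates that locally with `fnmatch`, which is a reasonable stand-in only for patterns that use `*` as the sole wildcard. `matching_indices` is a hypothetical helper, and the sample index names are made up.

```python
from fnmatch import fnmatchcase

DEFAULT_TRACE_PATTERN = "otel-v1-apm-span-*"
DEFAULT_LOG_PATTERN = "otel-logs-*"


def matching_indices(pattern: str, index_names: list[str]) -> list[str]:
    """Return the index names selected by a '*'-style pattern, in input order."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Example index names (invented for illustration).
indices = ["otel-v1-apm-span-000001", "otel-logs-2024.01.01", "app-metrics-1"]
trace_hits = matching_indices(DEFAULT_TRACE_PATTERN, indices)
log_hits = matching_indices(DEFAULT_LOG_PATTERN, indices)
```

An empty result for your own index names is a quick signal that the pattern needs adjusting before deploying the agent.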
228+
229+ ### Step 4: Deploy and Configure Permissions
230+
231+ ``` bash
232+ pip install bedrock-agentcore-starter-toolkit
233+ agentcore configure --entrypoint agent/main.py --non-interactive
234+ agentcore deploy
235+ ```
236+
237+ Then add the AgentCore role to your OpenSearch data access policy (see Step 6 in Quick Start).
238+
239+ ### Step 5: Test
240+
241+ ``` bash
242+ agentcore invoke '{"prompt": "Show me the health of my services"}'
243+ ```
244+
245+ ---
246+
247+ ## Agent Tools
248+
249+ The agent exposes four tools for querying observability data:
250+
251+ | Tool | Data Source | Description |
252+ |------|-------------|-------------|
253+ | `get_red_metrics` | OpenSearch (traces) | Rate, Error, Duration metrics by service |
254+ | `search_logs` | OpenSearch (logs) | Search logs by service, severity |
255+ | `get_spans` | OpenSearch (traces) | Search distributed trace spans |
256+ | `query_metrics` | Prometheus | Query metrics using PromQL |
257+
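As a reminder of what RED (Rate, Errors, Duration) means in practice, the sketch below computes the three values per service from a list of spans held in memory. It illustrates the metric definitions only and is not how `get_red_metrics` is implemented; the span field names (`service`, `error`, `duration_ms`) are assumptions.

```python
from collections import defaultdict


def red_metrics(spans: list[dict], window_s: float) -> dict[str, dict[str, float]]:
    """Compute request rate, error ratio, and p95 duration per service."""
    by_service: dict[str, list[dict]] = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    result = {}
    for service, items in by_service.items():
        errors = sum(1 for s in items if s["error"])
        durations = sorted(s["duration_ms"] for s in items)
        # Simple rank-based estimate; the index is capped at the last element.
        p95 = durations[min(len(durations) - 1, int(0.95 * len(durations)))]
        result[service] = {
            "rate_per_s": len(items) / window_s,
            "error_ratio": errors / len(items),
            "p95_ms": p95,
        }
    return result
```
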
258+ ## Security
259+
260+ This sample follows AWS security best practices:
261+
262+ - **No hardcoded credentials** - Uses IAM roles for authentication
263+ - **TLS everywhere** - All connections use HTTPS with certificate verification
264+ - **Input validation** - All tool inputs are validated and sanitized
265+ - **Least privilege** - The agent role's IAM policy is scoped to a single collection. The Quick Start data access policy grants `aoss:*` and its network policy allows public access, so narrow both before production use
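
To illustrate the input validation point, this sketch shows one way tool arguments could be checked before they are placed in a query. It is an assumption about the general approach, not the validation code shipped in this sample; the function names, the allowed character set, and the severity list are all hypothetical.

```python
import re

# Hypothetical rules: 1-63 characters, starting with a letter or digit.
_SERVICE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,62}")
_SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def validate_service_name(value: str) -> str:
    """Return the service name unchanged, or raise if it has unexpected characters."""
    if not isinstance(value, str) or not _SERVICE_NAME.fullmatch(value):
        raise ValueError("invalid service name")
    return value


def validate_severity(value: str) -> str:
    """Normalize a severity to upper case, or raise if it is not a known level."""
    normalized = value.strip().upper()
    if normalized not in _SEVERITIES:
        raise ValueError(f"unknown severity: {value!r}")
    return normalized
```

Rejecting unexpected input with an allow-list keeps model-generated arguments from reaching the query layer in a form that was not anticipated.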
266+
267+ See [CONTRIBUTING.md](CONTRIBUTING.md) for security issue reporting.
268+
269+ ## Clean Up
270+
271+ To avoid ongoing charges, delete the resources when done:
272+
273+ ``` bash
274+ # Delete AgentCore resources
275+ agentcore destroy
276+
277+ # Delete OpenSearch Serverless (if created)
278+ aws opensearchserverless delete-collection --id YOUR_COLLECTION_ID
279+ aws opensearchserverless delete-security-policy --name observability-enc --type encryption
280+ aws opensearchserverless delete-security-policy --name observability-net --type network
281+ aws opensearchserverless delete-access-policy --name observability-access --type data
282+ ```
283+
284+ ## References
285+
286+ - [AWS Blog: Reduce MTTR with an Observability Agent](https://aws.amazon.com/blogs/big-data/reduce-mean-time-to-resolution-with-an-observability-agent/)
287+ - [Amazon Bedrock AgentCore Documentation](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/)
288+ - [Strands Agents SDK](https://strandsagents.com/latest/documentation/docs/)
289+ - [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless.html)
9290
10291## Security
11292
@@ -14,4 +295,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more inform
14295## License
15296
16297This library is licensed under the MIT-0 License. See the LICENSE file.
17-