Commit bf7934d

Initial commit
1 parent d0e9ee4 commit bf7934d

17 files changed: +1783 -6 lines changed

.github/workflows/pages.yml

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: Deploy GitHub Pages
+
+on:
+  push:
+    branches: ["main"]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: "pages"
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Pages
+        uses: actions/configure-pages@v5
+      - name: Build with Jekyll
+        uses: actions/jekyll-build-pages@v1
+        with:
+          source: ./docs
+          destination: ./_site
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4

.gitignore

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual environments
+.venv/
+venv/
+ENV/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Project specific
+generated-diagrams/
+*.log
+
+# AgentCore
+.agentcore/
+.bedrock_agentcore/
+.bedrock_agentcore.yaml
+
+# Local guides
+IMPLEMENTATION_GUIDE.md

README.md

Lines changed: 286 additions & 6 deletions
@@ -1,11 +1,292 @@
-## My Project
+# Observability Agent with Amazon Bedrock AgentCore
 
-TODO: Fill this README out!
+[![License: MIT-0](https://img.shields.io/badge/License-MIT--0-yellow.svg)](https://opensource.org/licenses/MIT-0)
+[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
+[![AWS](https://img.shields.io/badge/AWS-Bedrock%20AgentCore-orange?logo=amazonaws)](https://aws.amazon.com/bedrock/agentcore/)
+[![Strands SDK](https://img.shields.io/badge/Built%20with-Strands%20SDK-purple)](https://strandsagents.com/)
 
-Be sure to:
+An AI-powered observability agent that helps Site Reliability Engineers (SREs) investigate incidents and reduce Mean Time to Resolution (MTTR). Built with Amazon Bedrock AgentCore and the Strands Agent SDK.
 
-* Change the title in this README
-* Edit your repository description on GitHub
+This sample implements the architecture from [Reduce Mean Time to Resolution with an observability agent](https://aws.amazon.com/blogs/big-data/reduce-mean-time-to-resolution-with-an-observability-agent/).
+
+## Architecture
+
+![Observability Agent Architecture](images/observability-agent-architecture.png)
+
+The agent queries three data sources to investigate incidents:
+- **Logs** - Application logs stored in Amazon OpenSearch Serverless
+- **Traces** - Distributed traces stored in Amazon OpenSearch Serverless
+- **Metrics** - Infrastructure metrics stored in Amazon Managed Service for Prometheus
+
+## Choose Your Path
+
+### Option A: Quick Start (Build from Scratch)
+
+Best for: Learning, POC, testing the agent with sample data.
+
+This path creates all required AWS resources and populates them with test data simulating a payment service failure.
+
+**Time:** ~15 minutes | **Cost:** ~$5/day for OpenSearch Serverless
+
+[→ Quick Start Guide](#quick-start-build-from-scratch)
+
+### Option B: Integrate with Existing Infrastructure
+
+Best for: Production use with your existing observability stack.
+
+This path connects the agent to your existing OpenSearch and Prometheus deployments.
+
+**Time:** ~10 minutes | **Prerequisites:** Existing OpenSearch + Prometheus
+
+[→ Integration Guide](#integrate-with-existing-infrastructure)
+
+---
+
+## Quick Start (Build from Scratch)
+
+### Prerequisites
+
+- AWS account with appropriate permissions
+- Python 3.11+
+- AWS CLI configured with credentials
+
+### Step 1: Clone and Setup
+
+```bash
+git clone https://github.com/aws-samples/observability-agent-bedrock-agentcore.git
+cd observability-agent-bedrock-agentcore
+
+python -m venv .venv
+source .venv/bin/activate # On Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+
+### Step 2: Create OpenSearch Serverless Collection
+
+```bash
+# Set your AWS account ID
+export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+export AWS_REGION=us-east-1
+
+# Create encryption policy
+aws opensearchserverless create-security-policy \
+  --name observability-enc \
+  --type encryption \
+  --policy '{"Rules":[{"ResourceType":"collection","Resource":["collection/observability-agent"]}],"AWSOwnedKey":true}'
+
+# Create network policy
+aws opensearchserverless create-security-policy \
+  --name observability-net \
+  --type network \
+  --policy '[{"Rules":[{"ResourceType":"collection","Resource":["collection/observability-agent"]},{"ResourceType":"dashboard","Resource":["collection/observability-agent"]}],"AllowFromPublic":true}]'
+
+# Create data access policy
+aws opensearchserverless create-access-policy \
+  --name observability-access \
+  --type data \
+  --policy "[{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection/observability-agent\"],\"Permission\":[\"aoss:*\"]},{\"ResourceType\":\"index\",\"Resource\":[\"index/observability-agent/*\"],\"Permission\":[\"aoss:*\"]}],\"Principal\":[\"arn:aws:iam::${AWS_ACCOUNT_ID}:root\"]}]"
+
+# Create collection
+aws opensearchserverless create-collection \
+  --name observability-agent \
+  --type SEARCH
+```
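The escaped JSON passed to `--policy` for the data access policy is easy to mistype. A minimal Python sketch, assuming you would rather generate the document than hand-escape it, builds the same structure with `json.dumps`. The function name `build_data_access_policy` and the account ID `123456789012` are illustrative placeholders and are not part of this repository.

```python
import json


def build_data_access_policy(collection: str, principals: list[str]) -> str:
    """Return the JSON text of an OpenSearch Serverless data access policy.

    Mirrors the policy used in Step 2: full access to the collection and
    to every index inside it, granted to the given principals.
    """
    policy = [
        {
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection}"],
                    "Permission": ["aoss:*"],
                },
                {
                    "ResourceType": "index",
                    "Resource": [f"index/{collection}/*"],
                    "Permission": ["aoss:*"],
                },
            ],
            "Principal": principals,
        }
    ]
    return json.dumps(policy)


if __name__ == "__main__":
    # Placeholder account ID; substitute your own.
    print(build_data_access_policy(
        "observability-agent", ["arn:aws:iam::123456789012:root"]
    ))
```

The printed string could then be supplied as the `--policy` argument of `aws opensearchserverless create-access-policy`.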
+
+Wait for the collection to become ACTIVE (~2-3 minutes):
+
+```bash
+aws opensearchserverless batch-get-collection --names observability-agent
+```
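Rather than re-running the command by hand, the wait can be scripted. The sketch below is one possible polling loop, not code from this repository. It assumes a caller-supplied `get_status` callable (hypothetical) that returns the collection status string, for example by wrapping `batch-get-collection` with `--query 'collectionDetails[0].status' --output text`.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the collection is ACTIVE.

    Raises RuntimeError if the status becomes FAILED and TimeoutError if
    the collection is still not ACTIVE once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError("collection creation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"collection still {status} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `get_status` and `sleep` keeps the loop free of AWS dependencies, so it can be exercised without creating any resources.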
+
+### Step 3: Configure Environment
+
+```bash
+# Get collection endpoint
+export OPENSEARCH_HOST=$(aws opensearchserverless batch-get-collection \
+  --names observability-agent \
+  --query 'collectionDetails[0].collectionEndpoint' \
+  --output text | sed 's|https://||')
+
+echo "OpenSearch Host: $OPENSEARCH_HOST"
+```
+
+### Step 4: Generate Test Data
+
+```bash
+python scripts/generate_test_data.py
+```
+
+This creates sample logs and traces simulating a payment service failure with ~40% error rate.
+
+### Step 5: Deploy the Agent
+
+```bash
+pip install bedrock-agentcore-starter-toolkit
+
+agentcore configure --entrypoint agent/main.py --non-interactive
+agentcore deploy
+```
+
+### Step 6: Grant Permissions
+
+Get the AgentCore role name from the deploy output, then:
+
+```bash
+# Replace with your actual role name from deploy output
+export AGENTCORE_ROLE=AmazonBedrockAgentCoreSDKRuntime-us-east-1-XXXXXX
+export COLLECTION_ID=$(aws opensearchserverless batch-get-collection \
+  --names observability-agent \
+  --query 'collectionDetails[0].id' \
+  --output text)
+
+# Add OpenSearch permissions
+aws iam put-role-policy \
+  --role-name $AGENTCORE_ROLE \
+  --policy-name OpenSearchServerlessAccess \
+  --policy-document "{
+    \"Version\": \"2012-10-17\",
+    \"Statement\": [{
+      \"Effect\": \"Allow\",
+      \"Action\": [\"aoss:APIAccessAll\"],
+      \"Resource\": \"arn:aws:aoss:${AWS_REGION}:${AWS_ACCOUNT_ID}:collection/${COLLECTION_ID}\"
+    }]
+  }"
+
+# Update OpenSearch data access policy
+POLICY_VERSION=$(aws opensearchserverless get-access-policy \
+  --name observability-access --type data \
+  --query 'accessPolicyDetail.policyVersion' --output text)
+
+aws opensearchserverless update-access-policy \
+  --name observability-access \
+  --type data \
+  --policy-version $POLICY_VERSION \
+  --policy "[{\"Rules\":[{\"ResourceType\":\"collection\",\"Resource\":[\"collection/observability-agent\"],\"Permission\":[\"aoss:*\"]},{\"ResourceType\":\"index\",\"Resource\":[\"index/observability-agent/*\"],\"Permission\":[\"aoss:*\"]}],\"Principal\":[\"arn:aws:iam::${AWS_ACCOUNT_ID}:root\",\"arn:aws:iam::${AWS_ACCOUNT_ID}:role/${AGENTCORE_ROLE}\"]}]"
+```
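Both documents in this step can also be produced programmatically. The sketch below is an assumption-laden alternative to the escaped shell strings: `build_aoss_api_policy` renders the IAM policy, and `add_principal` appends a role ARN to an existing data access policy (passed in as JSON text) instead of retyping the whole document. Both function names, the account ID, and the role name in the example are hypothetical.

```python
import json


def build_aoss_api_policy(region: str, account_id: str, collection_id: str) -> str:
    """Return an IAM policy document allowing aoss:APIAccessAll on one collection."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["aoss:APIAccessAll"],
                "Resource": (
                    f"arn:aws:aoss:{region}:{account_id}:collection/{collection_id}"
                ),
            }
        ],
    }
    return json.dumps(document, indent=2)


def add_principal(policy_json: str, principal_arn: str) -> str:
    """Add principal_arn to every statement of a data access policy.

    The operation is idempotent: an ARN that is already listed is not
    added a second time.
    """
    policy = json.loads(policy_json)
    for statement in policy:
        principals = statement.setdefault("Principal", [])
        if principal_arn not in principals:
            principals.append(principal_arn)
    return json.dumps(policy)


if __name__ == "__main__":
    current = json.dumps(
        [{"Rules": [], "Principal": ["arn:aws:iam::123456789012:root"]}]
    )
    # ExampleAgentCoreRole is a placeholder role name.
    print(add_principal(current, "arn:aws:iam::123456789012:role/ExampleAgentCoreRole"))
```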
+
+### Step 7: Test the Agent
+
+```bash
+# Wait for IAM propagation
+sleep 30
+
+# Investigate errors
+agentcore invoke '{"prompt": "Are there any errors in my application?"}'
+
+# Check specific service
+agentcore invoke '{"prompt": "What is wrong with the payment service?"}'
+```
+
+---
+
+## Integrate with Existing Infrastructure
+
+### Prerequisites
+
+- Existing Amazon OpenSearch Service or OpenSearch Serverless with logs/traces
+- (Optional) Amazon Managed Service for Prometheus with metrics
+- Python 3.11+
+
+### Step 1: Clone and Setup
+
+```bash
+git clone https://github.com/aws-samples/observability-agent-bedrock-agentcore.git
+cd observability-agent-bedrock-agentcore
+
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+### Step 2: Configure Your Endpoints
+
+Edit `agent/main.py` or set environment variables:
+
+```bash
+# For OpenSearch Serverless
+export OPENSEARCH_HOST=your-collection-id.us-east-1.aoss.amazonaws.com
+
+# For Amazon Managed Prometheus (optional)
+export AMP_WORKSPACE_ID=ws-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+export AWS_REGION=us-east-1
+```
+
+### Step 3: Customize Index Patterns (if needed)
+
+If your indices use different naming conventions, update the index patterns in `agent/main.py`:
+
+```python
+# Default patterns (OpenTelemetry standard)
+"otel-v1-apm-span-*" # For traces
+"otel-logs-*" # For logs
+
+# Example: Custom patterns
+"your-traces-*"
+"your-logs-*"
+```
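How `agent/main.py` actually reads these settings is not shown in this README. One way the endpoint variables and index patterns could be gathered in a single place is sketched below. `OPENSEARCH_HOST`, `AMP_WORKSPACE_ID`, and `AWS_REGION` come from Step 2, while the `AgentConfig` class, the `load_config` function, and the `TRACE_INDEX_PATTERN` / `LOG_INDEX_PATTERN` variables are hypothetical names introduced only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    opensearch_host: str
    region: str
    amp_workspace_id: Optional[str]
    trace_index: str
    log_index: str


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Build the agent configuration from environment variables."""
    host = env.get("OPENSEARCH_HOST", "")
    if not host:
        raise ValueError("OPENSEARCH_HOST is required")
    # Accept a full endpoint URL as well as a bare host name.
    host = host.removeprefix("https://").rstrip("/")
    return AgentConfig(
        opensearch_host=host,
        region=env.get("AWS_REGION", "us-east-1"),
        amp_workspace_id=env.get("AMP_WORKSPACE_ID") or None,  # Prometheus is optional
        # Hypothetical override variables; defaults are the patterns listed above.
        trace_index=env.get("TRACE_INDEX_PATTERN", "otel-v1-apm-span-*"),
        log_index=env.get("LOG_INDEX_PATTERN", "otel-logs-*"),
    )
```

Keeping the patterns in configuration rather than in string literals means a custom naming convention would not require editing the agent code.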
+
+### Step 4: Deploy and Configure Permissions
+
+```bash
+pip install bedrock-agentcore-starter-toolkit
+agentcore configure --entrypoint agent/main.py --non-interactive
+agentcore deploy
+```
+
+Then add the AgentCore role to your OpenSearch data access policy (see Step 6 in Quick Start).
+
+### Step 5: Test
+
+```bash
+agentcore invoke '{"prompt": "Show me the health of my services"}'
+```
+
+---
247+
## Agent Tools
248+
249+
The agent exposes four tools for querying observability data:
250+
251+
| Tool | Data Source | Description |
252+
|------|-------------|-------------|
253+
| `get_red_metrics` | OpenSearch (traces) | Rate, Error, Duration metrics by service |
254+
| `search_logs` | OpenSearch (logs) | Search logs by service, severity |
255+
| `get_spans` | OpenSearch (traces) | Search distributed trace spans |
256+
| `query_metrics` | Prometheus | Query metrics using PromQL |
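To make the RED (Rate, Error, Duration) description in the table concrete, here is a small sketch that derives the three numbers per service from a list of spans held in memory. It is not the implementation of `get_red_metrics`, whose query is not shown in this README. The span fields `service`, `duration_ms`, and `error` are assumed names, and the 95th percentile uses the nearest-rank method.

```python
import math
from collections import defaultdict


def red_metrics(spans: list[dict], window_s: float) -> dict[str, dict[str, float]]:
    """Compute request rate, error rate, and p95 duration for each service.

    spans    -- dicts with "service", "duration_ms" and "error" (bool) keys
    window_s -- length in seconds of the time window the spans were taken from
    """
    by_service = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    result = {}
    for service, items in by_service.items():
        durations = sorted(span["duration_ms"] for span in items)
        errors = sum(1 for span in items if span["error"])
        # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
        p95_index = max(0, math.ceil(0.95 * len(durations)) - 1)
        result[service] = {
            "rate_per_s": len(items) / window_s,
            "error_rate": errors / len(items),
            "p95_ms": durations[p95_index],
        }
    return result
```

A service with five spans in a ten-second window, two of them failed, would report a rate of 0.5 requests per second and an error rate of 0.4.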
+
+## Security
+
+This sample follows AWS security best practices:
+
+- **No hardcoded credentials** - Uses IAM roles for authentication
+- **TLS everywhere** - All connections use HTTPS with certificate verification
+- **Input validation** - All tool inputs are validated and sanitized
+- **Least privilege** - IAM policies grant minimal required permissions
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for security issue reporting.
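The input validation point in the list above can be illustrated with a short sketch. The repository's own validation code is not shown here, so the rules below (a conservative service-name pattern, a fixed severity set, and a capped result limit) are assumptions chosen for the example, as is the function name `validate_log_query`.

```python
import re

# Assumed rules for this example, not taken from the repository.
SERVICE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,62}")
SEVERITIES = frozenset({"DEBUG", "INFO", "WARN", "ERROR", "FATAL"})
MAX_RESULTS = 100


def validate_log_query(service, severity=None, limit=20):
    """Validate and normalize the arguments of a log search before querying.

    Returns a dict of cleaned values or raises ValueError on bad input.
    """
    if not isinstance(service, str) or not SERVICE_NAME.fullmatch(service):
        raise ValueError("service must be 1-63 characters: letters, digits, '.', '_' or '-'")
    if severity is not None:
        severity = str(severity).upper()
        if severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_RESULTS:
        raise ValueError(f"limit must be an integer between 1 and {MAX_RESULTS}")
    return {"service": service, "severity": severity, "limit": limit}
```

Rejecting anything outside an allow-list, rather than trying to escape it, keeps model-generated arguments from reaching the query layer in an unexpected form.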
+
+## Clean Up
+
+To avoid ongoing charges, delete the resources when done:
+
+```bash
+# Delete AgentCore resources
+agentcore destroy
+
+# Delete OpenSearch Serverless (if created)
+aws opensearchserverless delete-collection --id YOUR_COLLECTION_ID
+aws opensearchserverless delete-security-policy --name observability-enc --type encryption
+aws opensearchserverless delete-security-policy --name observability-net --type network
+aws opensearchserverless delete-access-policy --name observability-access --type data
+```
+
+## References
+
+- [AWS Blog: Reduce MTTR with an Observability Agent](https://aws.amazon.com/blogs/big-data/reduce-mean-time-to-resolution-with-an-observability-agent/)
+- [Amazon Bedrock AgentCore Documentation](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/)
+- [Strands Agents SDK](https://strandsagents.com/latest/documentation/docs/)
+- [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless.html)
 
 ## Security
 
@@ -14,4 +295,3 @@ See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more inform
 ## License
 
 This library is licensed under the MIT-0 License. See the LICENSE file.
-