
Commit ff98a81

Authored by williazz, syed-ahsan-ishtiaque, claude, and mxiamxia
feat(rum): add RUM to Application Signals MCP server (#3092)
* feat(cloudwatch-applicationsignals-mcp-server): add CloudWatch RUM tools

  Add a suite of CloudWatch RUM MCP tools to the cloudwatch-applicationsignals server, exposing app-monitor discovery, session/page/event exploration, error and performance correlation with X-Ray, SLO status, and a scoped raw-query escape hatch.

  - rum_tools.py: action-dispatched tool surface with platform-aware web-schema guards, truncation signals, and normalized error shapes ({error, error_type}).
  - rum_queries.py: Logs Insights query builders with escaped user inputs, length caps, and pagination.
  - aws_clients.py: add RUM + CloudWatch Logs clients.
  - server.py: register the new tools.
  - tests/test_rum_tools.py: coverage for platform validation, limit parsing, correlate regression paths, and X-Ray partial-failure.

  Also includes supporting repo tooling:
  - .claude/commands/{review,revise,auto-revise}.md review loop commands.
  - CONTRIBUTING.md + PR template touch-ups.

* fix build

* fix(cloudwatch-applicationsignals-mcp-server): address review blockers in RUM tools

  Revise-auto loop applied 3 blocker findings in iteration 1 (re-review clean):
  - errors_query(group_by='page') produced a duplicate metadata.pageId column in the Logs Insights `by` clause, rejected by the service. Skip the group-by splice when the dimension is already present.
  - Dispatcher caught only TypeError, letting ValueError from _parse_time leak as a raw traceback. Broaden to (TypeError, ValueError) so bad ISO time strings return a structured bad_request error.
  - analyze_rum_log_group called _parse_time outside its try/except, so malformed times also leaked. Move the _parse_time calls into the existing ValueError guard.

  Deferred (user-requested, not applied this run):
  - .claude/commands/revise.md: --only / --skip flags declared but not wired into the Procedure section.
  - rum_tools.py: _list_anomalies_for returns truncated=False on exception (conflates clean completion with mid-pagination failure).
  - rum_tools.py: analyze_rum_log_group error branch omits truncated / page_cap keys, breaking the shape invariant asserted by tests.
  - rum_tools.py: _get_account_id STS failure propagates uncaught.
  - rum_tools.py: most actions do not wrap _run_logs_insights_query in try/except, leaking boto3 errors as raw tracebacks.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(cloudwatch-applicationsignals-mcp-server): raise RUM patch coverage above 90%

  Adds 26 tests closing the substantive gaps codecov flagged on the RUM diff (rum_tools.py coverage 87% -> 93%, patch target 92.59%):
  - Parametrized cw_log_disabled test across every action handler that calls _get_rum_app_info, covering the `except ValueError: return bad_request` defensive branch in health, errors, performance, sessions, page_views, timeseries, locations, http_requests, session_detail, resources, page_flows, crashes, app_launches, analyze, and correlate.
  - check_rum_data_access branch tests: missing telemetries (MEDIUM), sample_rate=0 (HIGH), low sample rate <0.1 (LOW), allowCookies=false (LOW).
  - Unknown-platform early-bail tests for sessions and session_detail.
  - Web Vitals `needs-improvement` bucket and malformed-p90 handling in get_rum_performance.
  - Partition resolution for us-gov-* and cn-* regions.
  - _get_rum_app_info_confident_cached ValueError when CwLogEnabled=True but CwLogGroup is missing.
  - _parse_time naive datetime -> UTC normalization.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(cloudwatch-applicationsignals-mcp-server): capitalize RUM test docstring

  Align with ruff D-rule to unblock pre-commit ruff-check on CI.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(cloudwatch-applicationsignals-mcp-server): add local ci-check script

  Add scripts/ci-check.sh mirroring .github/workflows/python.yml and .github/workflows/pre-commit.yml for this package. Supports --only, --skip, --list, and --no-fail-fast. Uses `uv run --frozen` for tools declared in the dev group (bandit, pyright, ruff, pytest, pre-commit) and the package venv's pre-commit for hook orchestration.

  Also fix a pyright error in tests/test_rum_tools.py that the script caught locally: dt.utcoffset() can return None per the datetime stubs.

  Ignore generated bandit/coverage/junit artifacts.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* bump pr diff coverage to 97%

* refactor(cloudwatch-applicationsignals-mcp-server): Rename rum tool to query_rum_events

  CodingAgent lazily loads MCP tool descriptions based largely on the tool name. The bare acronym `rum` was the odd one out among siblings like `query_service_metrics`, `list_slos`, and `audit_services`, which hurt discoverability for queries like "show me web session errors".

  Rename the public MCP tool from `rum` to `query_rum_events`:
  - `query_` matches the shape of `query_service_metrics` and `query_sampled_traces`.
  - `events` reflects RUM's underlying data model (sessions, page views, errors, crashes, and HTTP requests are all events queried via CloudWatch Logs Insights).
  - `rum` is retained because the acronym is unambiguous in this context.

  Also rename the internal `query_rum_events` helper (the handler for `action="query"`) to `run_rum_query` to resolve the name collision and match the sibling pattern (`get_rum_errors`, `audit_rum_health`).

  No behavior change. Function signature, action parameter, and dispatch logic are unchanged.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(cloudwatch-applicationsignals-mcp-server): scope PR to package directory

  Remove out-of-scope changes flagged by reviewer on PR #3092:
  - Delete .claude/ slash commands, root-level poe symlink, and package poe wrapper script
  - Delete scripts/ci-check.sh (local CI mirror)
  - Revert .github/pull_request_template.md and CONTRIBUTING.md to main
  - Drop poethepoet dev-dep and [tool.poe.tasks] from pyproject.toml
  - Fix typo in boto3-stubs extras: cloudwatchrum (not an extra, produced uv-lock warning) -> rum
  - Regenerate uv.lock (drops poethepoet + pastel, adds mypy-boto3-rum)

  All 868 tests pass.

* fix build

---------

Co-authored-by: Syed Ahsan Ishtiaque <176968742+syed-ahsan-ishtiaque@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Min Xia <mxiamxia@gmail.com>
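
The duplicate-column blocker in `errors_query(group_by='page')` comes down to not splicing a dimension into the `by` clause when it is already present. The sketch below illustrates that guard only. `build_by_clause` is a hypothetical helper, and `event_details.message` is an illustrative field name. Neither is taken from rum_queries.py.

```python
def build_by_clause(base_dimensions, group_by_field=None):
    """Return a Logs Insights `by ...` clause with no repeated fields.

    Hypothetical helper: the real query builder may be structured differently.
    """
    dimensions = list(base_dimensions)
    # Skip the splice when the requested dimension is already grouped on.
    if group_by_field and group_by_field not in dimensions:
        dimensions.append(group_by_field)
    return 'by ' + ', '.join(dimensions)


# Illustrative field names; only metadata.pageId appears in the commit message.
base = ['event_details.message', 'metadata.pageId']
clause_for_page = build_by_clause(base, 'metadata.pageId')
clause_for_browser = build_by_clause(base, 'metadata.browserName')
```

With this guard, grouping by a field that is already in the base list leaves the clause unchanged instead of repeating the column.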
1 parent 815c3a1 commit ff98a81
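
The commit message describes two related behaviors: naive datetimes are normalized to UTC, and `TypeError`/`ValueError` raised during dispatch are turned into a structured `{error, error_type}` response instead of a raw traceback. A minimal sketch of both is below. The names `parse_time`, `dispatch`, and `window_seconds` and the `'bad_request'` value are assumptions for illustration, not the actual rum_tools.py code.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse an ISO 8601 string; treat naive datetimes as UTC (assumed behavior)."""
    dt = datetime.fromisoformat(value.replace('Z', '+00:00'))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def window_seconds(start_time, end_time):
    """Toy handler standing in for a real action handler."""
    delta = parse_time(end_time) - parse_time(start_time)
    return {'seconds': delta.total_seconds()}


def dispatch(handler, **kwargs):
    """Call a handler and convert bad input into a structured error."""
    try:
        return handler(**kwargs)
    except (TypeError, ValueError) as exc:
        # TypeError covers missing or unexpected arguments;
        # ValueError covers malformed time strings from parse_time.
        return {'error': str(exc), 'error_type': 'bad_request'}


ok = dispatch(window_seconds, start_time='2026-03-18T00:00:00Z', end_time='2026-03-19T00:00:00')
bad_time = dispatch(window_seconds, start_time='yesterday', end_time='2026-03-19T00:00:00Z')
missing_arg = dispatch(window_seconds, start_time='2026-03-18T00:00:00Z')
```

Catching both exception types in one place keeps the error shape uniform across handlers, which is the invariant the tests in this commit are said to assert.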

9 files changed

Lines changed: 4572 additions & 43 deletions


src/cloudwatch-applicationsignals-mcp-server/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -38,6 +38,11 @@ env/
 .coverage
 htmlcov/
 .tox/
+junit.xml
+*-junit.xml
+coverage.xml
+*-coverage.xml
+bandit-report-*.html
 
 # UV
 .uv/

src/cloudwatch-applicationsignals-mcp-server/README.md

Lines changed: 74 additions & 0 deletions
@@ -382,6 +382,50 @@ This tool provides access to AWS Application Signals' change detection capabilit
 - Shows default values for each grouping attribute
 - Useful for understanding available groups
 
+### 🌐 CloudWatch RUM Tools
+
+Monitor real user experience across web and mobile applications using CloudWatch RUM data.
+
+> **Prerequisite:** Most RUM analytics actions require CloudWatch Logs to be enabled on the app monitor (`CwLogEnabled=true`). Use `check_data_access` to verify your setup.
+
+All RUM functionality is exposed through a single **`query_rum_events`** tool with an `action` parameter:
+
+```
+query_rum_events(action="<action_name>", app_monitor_name="my-app", ...)
+```
+
+#### Actions Reference
+
+| Action | Description | Required Params |
+|--------|-------------|-----------------|
+| **Discovery** | | |
+| `check_data_access` | Inspect app monitor config, find issues | `app_monitor_name` |
+| `list_monitors` | List all app monitors | *(none)* |
+| `get_monitor` | Get full app monitor config | `app_monitor_name` |
+| `list_tags` | List tags on an app monitor | `resource_arn` |
+| `get_policy` | Get resource-based policy | `app_monitor_name` |
+| **Analytics** *(require CW Logs)* | | |
+| `query` | Run custom Logs Insights query | `app_monitor_name`, `query_string`, `start_time`, `end_time` |
+| `health` | Quick health audit (errors, slow pages, sessions) | `app_monitor_name`, `start_time`, `end_time` |
+| `errors` | JS/HTTP errors by message and page | `app_monitor_name`, `start_time`, `end_time` |
+| `performance` | Page load + Core Web Vitals with good/needs-improvement/poor assessment | `app_monitor_name`, `start_time`, `end_time` |
+| `sessions` | Recent sessions with browser/OS/device | `app_monitor_name`, `start_time`, `end_time` |
+| `session_detail` | Full event timeline for a single session | `app_monitor_name`, `session_id`, `start_time`, `end_time` |
+| `page_views` | Top pages by view count | `app_monitor_name`, `start_time`, `end_time` |
+| `timeseries` | Time-bucketed trends (errors, performance, sessions) | `app_monitor_name`, `start_time`, `end_time` |
+| `locations` | Sessions and performance by country | `app_monitor_name`, `start_time`, `end_time` |
+| `http_requests` | Top HTTP requests with latency and error rates | `app_monitor_name`, `start_time`, `end_time` |
+| `resources` | Top resource requests by duration and size | `app_monitor_name`, `start_time`, `end_time` |
+| `page_flows` | Page-to-page navigation flows | `app_monitor_name`, `start_time`, `end_time` |
+| `crashes` | Mobile crashes + ANRs (Android validated, iOS experimental) | `app_monitor_name`, `start_time`, `end_time` |
+| `app_launches` | Mobile cold/warm/pre-warm launch times | `app_monitor_name`, `start_time`, `end_time` |
+| `analyze` | Anomaly detection + message patterns | `app_monitor_name`, `start_time`, `end_time` |
+| **Correlation & Metrics** | | |
+| `correlate` | Frontend-to-backend X-Ray trace correlation | `app_monitor_name`, `page_url`, `start_time`, `end_time` |
+| `metrics` | CloudWatch RUM namespace metrics | `app_monitor_name`, `metric_names` (JSON array), `start_time`, `end_time` |
+
+**Optional parameters** (action-dependent): `resource_arn`, `page_url`, `group_by`, `platform`, `max_results`, `max_traces`, `statistic`, `period`, `session_id`, `metric`, `bucket`, `compare_previous`
+
 ## Installation
 
 ### One-Click Installation
@@ -869,6 +913,28 @@ For detailed change history of specific problematic services, I can investigate
 Would you like me to investigate the change history for any specific service in detail?
 ```
 
+### Example 9: CloudWatch RUM — Real User Monitoring
+```
+User: "Are my users experiencing issues on the checkout page?"
+Assistant: I'll check your RUM data for user-facing issues on the checkout page.
+
+[Step 1: Verify the app monitor is configured correctly]
+query_rum_events(action="check_data_access", app_monitor_name="my-web-app")
+→ CW Logs enabled, X-Ray enabled, all telemetries active. Full analytics available.
+
+[Step 2: Quick health check]
+query_rum_events(action="health", app_monitor_name="my-web-app", start_time="2026-03-18T00:00:00Z", end_time="2026-03-19T00:00:00Z")
+→ Error rate is 3x higher than normal, concentrated on /checkout page, mostly Chrome users in Germany.
+
+[Step 3: Get error details]
+query_rum_events(action="errors", app_monitor_name="my-web-app", start_time="...", end_time="...", page_url="/checkout")
+→ Top error: "TypeError: Cannot read property 'total' of undefined" — 847 occurrences.
+
+[Step 4: Is it frontend or backend?]
+query_rum_events(action="correlate", app_monitor_name="my-web-app", page_url="/checkout", start_time="...", end_time="...")
+→ Backend payment-service is returning 500 errors with avg 5.2s response time. Root cause is in the backend.
+```
+
 ## Recommended Workflows
 
 ### 🎯 Primary Audit Workflow (Most Common)
@@ -937,6 +1003,13 @@ The server requires the following AWS IAM permissions:
 "synthetics:GetCanary",
 "synthetics:GetCanaryRuns",
 "synthetics:DescribeCanaries",
+"rum:GetAppMonitor",
+"rum:ListAppMonitors",
+"rum:ListTagsForResource",
+"rum:GetResourcePolicy",
+"logs:DescribeLogGroups",
+"logs:ListLogAnomalyDetectors",
+"logs:ListAnomalies",
 "s3:GetObject",
 "s3:ListBucket",
 "iam:GetRole",
@@ -956,6 +1029,7 @@ The server requires the following AWS IAM permissions:
 - `AWS_REGION` - AWS region (defaults to us-east-1)
 - `MCP_CLOUDWATCH_APPLICATION_SIGNALS_LOG_LEVEL` - Logging level (defaults to INFO)
 - `AUDITOR_LOG_PATH` - Path for audit log files (defaults to /tmp)
+- `MCP_RUM_ENDPOINT` - Override RUM API endpoint URL (for testing against non-production environments)
 
 ### AWS Credentials
 
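
The Actions Reference table in the README diff above pairs each `action` with its required parameters. As a rough illustration of how a caller could check arguments before invoking `query_rum_events`, the sketch below encodes a subset of that table. `REQUIRED_PARAMS` and `missing_params` are hypothetical names and are not part of the server's code. The remaining analytics actions share the `app_monitor_name`/`start_time`/`end_time` triple.

```python
# Shared requirement for most analytics actions, per the README table.
TIME_WINDOW = ('app_monitor_name', 'start_time', 'end_time')

# Subset of the Actions Reference table (hypothetical encoding).
REQUIRED_PARAMS = {
    'list_monitors': (),
    'check_data_access': ('app_monitor_name',),
    'get_monitor': ('app_monitor_name',),
    'list_tags': ('resource_arn',),
    'get_policy': ('app_monitor_name',),
    'health': TIME_WINDOW,
    'errors': TIME_WINDOW,
    'query': TIME_WINDOW + ('query_string',),
    'session_detail': TIME_WINDOW + ('session_id',),
    'correlate': TIME_WINDOW + ('page_url',),
    'metrics': TIME_WINDOW + ('metric_names',),
}


def missing_params(action, params):
    """Return the required parameter names that are absent or empty."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f'unknown action: {action}')
    return [name for name in REQUIRED_PARAMS[action] if not params.get(name)]


gaps = missing_params('errors', {'app_monitor_name': 'my-web-app'})
complete = missing_params('list_tags', {'resource_arn': 'arn:aws:rum:us-east-1:123456789012:appmonitor/my-web-app'})
```

The account ID and ARN in the usage lines are placeholders.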

src/cloudwatch-applicationsignals-mcp-server/awslabs/cloudwatch_applicationsignals_mcp_server/aws_clients.py

Lines changed: 7 additions & 2 deletions
@@ -42,6 +42,7 @@ def _initialize_aws_clients():
     cloudwatch_endpoint = os.environ.get('MCP_CLOUDWATCH_ENDPOINT')
     xray_endpoint = os.environ.get('MCP_XRAY_ENDPOINT')
     synthetics_endpoint = os.environ.get('MCP_SYNTHETICS_ENDPOINT')
+    rum_endpoint = os.environ.get('MCP_RUM_ENDPOINT')
 
     # Log endpoint overrides
     if applicationsignals_endpoint:
@@ -54,6 +55,8 @@ def _initialize_aws_clients():
         logger.debug(f'Using X-Ray endpoint override: {xray_endpoint}')
     if synthetics_endpoint:
         logger.debug(f'Using Synthetics endpoint override: {synthetics_endpoint}')
+    if rum_endpoint:
+        logger.debug(f'Using RUM endpoint override: {rum_endpoint}')
 
     # Check for AWS_PROFILE environment variable
     if aws_profile := os.environ.get('AWS_PROFILE'):
@@ -69,6 +72,7 @@ def _initialize_aws_clients():
         cloudwatch = session.client('cloudwatch', config=config, endpoint_url=cloudwatch_endpoint)
         xray = session.client('xray', config=config, endpoint_url=xray_endpoint)
         synthetics = session.client('synthetics', config=config, endpoint_url=synthetics_endpoint)
+        rum = session.client('rum', config=config, endpoint_url=rum_endpoint)
         s3 = session.client('s3', config=config)
         iam = session.client('iam', config=config)
         lambda_client = session.client('lambda', config=config)
@@ -89,17 +93,17 @@ def _initialize_aws_clients():
         xray = boto3.client(
             'xray', region_name=AWS_REGION, config=config, endpoint_url=xray_endpoint
         )
-        # Additional clients for canary functionality
         synthetics = boto3.client(
             'synthetics', region_name=AWS_REGION, config=config, endpoint_url=synthetics_endpoint
         )
+        rum = boto3.client('rum', region_name=AWS_REGION, config=config, endpoint_url=rum_endpoint)
         s3 = boto3.client('s3', region_name=AWS_REGION, config=config)
         iam = boto3.client('iam', region_name=AWS_REGION, config=config)
         lambda_client = boto3.client('lambda', region_name=AWS_REGION, config=config)
         sts = boto3.client('sts', region_name=AWS_REGION, config=config)
 
     logger.debug('AWS clients initialized successfully')
-    return logs, applicationsignals, cloudwatch, xray, synthetics, s3, iam, lambda_client, sts
+    return logs, applicationsignals, cloudwatch, xray, synthetics, s3, iam, lambda_client, sts, rum
 
 
 # Initialize clients at module level
@@ -114,6 +118,7 @@ def _initialize_aws_clients():
         iam_client,
         lambda_client,
         sts_client,
+        rum_client,
     ) = _initialize_aws_clients()
 except Exception as e:
     logger.error(f'Failed to initialize AWS clients: {str(e)}')
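
The aws_clients.py change follows one pattern per service: read an optional `MCP_*_ENDPOINT` variable and pass it as `endpoint_url` when constructing the client. The sketch below isolates that pattern without importing boto3. `ENDPOINT_ENV_VARS` and `client_kwargs` are hypothetical names. Unlike the diff, which passes `endpoint_url=None` when the variable is unset, this version omits the key in that case.

```python
import os

# Environment variable names as they appear in the diff above.
ENDPOINT_ENV_VARS = {
    'cloudwatch': 'MCP_CLOUDWATCH_ENDPOINT',
    'xray': 'MCP_XRAY_ENDPOINT',
    'synthetics': 'MCP_SYNTHETICS_ENDPOINT',
    'rum': 'MCP_RUM_ENDPOINT',
}


def client_kwargs(service, region, env=None):
    """Build keyword arguments for a client, adding endpoint_url only if overridden."""
    env = os.environ if env is None else env
    kwargs = {'region_name': region}
    endpoint = env.get(ENDPOINT_ENV_VARS.get(service, ''))
    if endpoint:
        kwargs['endpoint_url'] = endpoint
    return kwargs


# Passing an explicit mapping keeps the example independent of the real environment.
overridden = client_kwargs('rum', 'us-east-1', env={'MCP_RUM_ENDPOINT': 'https://rum.example.test'})
default = client_kwargs('rum', 'us-east-1', env={})
```

Assuming boto3 is installed, the result could be used as `boto3.client('rum', **client_kwargs('rum', 'us-east-1'))`. The example URL is a placeholder.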
