Conversation

@wchemz wchemz commented Nov 6, 2025

Adds new AWS DMS Troubleshooting MCP Server

Summary

Changes

This PR introduces a new MCP server for AWS Database Migration Service (DMS) troubleshooting and root cause analysis (RCA). The server provides post-migration diagnostics that help customers identify and resolve DMS replication issues.

Key Features:

  • 9 diagnostic tools for comprehensive DMS troubleshooting
  • Replication task management and status monitoring
  • CloudWatch logs analysis with error pattern identification
  • Source and target endpoint configuration validation
  • Network connectivity diagnostics (security groups, VPC routing, network ACLs)
  • Automated Root Cause Analysis for failed tasks
  • Context-aware troubleshooting recommendations based on AWS best practices
  • Full integration with AWS DMS, CloudWatch Logs, and EC2 APIs
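
As a rough illustration of what "error pattern identification" over task logs could look like, here is a minimal sketch. It is not the code from this PR: the pattern table, the advice strings, and the sample log lines are all hypothetical, and it assumes error-level lines carry a `]E:` severity marker.

```python
import re
from collections import Counter

# Hypothetical pattern table: regexes and advice are illustrative only,
# not the rules shipped in this PR.
ERROR_PATTERNS = {
    "connection_timeout": (
        re.compile(r"timed? ?out", re.IGNORECASE),
        "Check security groups, network ACLs, and VPC routing.",
    ),
    "authentication": (
        re.compile(r"access denied|login failed", re.IGNORECASE),
        "Verify the endpoint credentials and database grants.",
    ),
}


def classify_log_lines(lines):
    """Count error lines per known pattern; unmatched errors are 'unclassified'."""
    counts = Counter()
    for line in lines:
        # Assumption: error-level lines contain the "]E:" severity marker.
        if "]E:" not in line:
            continue
        for name, (regex, _advice) in ERROR_PATTERNS.items():
            if regex.search(line):
                counts[name] += 1
                break
        else:
            counts["unclassified"] += 1
    return counts


def recommendations(counts):
    """Return the advice string for every known pattern that was seen."""
    return [ERROR_PATTERNS[name][1] for name in counts if name in ERROR_PATTERNS]


# Made-up sample lines in the assumed format.
sample = [
    "2025-11-06T10:00:01 [SOURCE_CAPTURE ]I:  Capture started",
    "2025-11-06T10:00:05 [TARGET_LOAD ]E:  Connection timed out",
    "2025-11-06T10:00:09 [TARGET_LOAD ]E:  Access denied for user 'dms'",
    "2025-11-06T10:00:12 [TASK_MANAGER ]E:  Unexpected failure",
]
counts = classify_log_lines(sample)
```

Keeping the patterns in a single table means new error signatures can be added without touching the classification loop.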

Available Tools:

  1. list_replication_tasks - List all DMS replication tasks with status filtering
  2. get_replication_task_details - Get comprehensive task information and statistics
  3. get_task_cloudwatch_logs - Retrieve and filter CloudWatch logs
  4. analyze_endpoint - Validate endpoint configurations
  5. diagnose_replication_issue - Comprehensive RCA for failed tasks
  6. get_troubleshooting_recommendations - Pattern-based error recommendations
  7. analyze_security_groups - Security group rule analysis
  8. diagnose_network_connectivity - Network diagnostics for tasks
  9. check_vpc_configuration - VPC routing and connectivity analysis
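
To make the security group analysis more concrete, the sketch below checks whether a security group's inbound rules would admit traffic from a given source on a database port. It is an assumption-laden illustration rather than the implementation of `analyze_security_groups`: the function name `allows_ingress` is hypothetical, it only reads the `IpPermissions` / `IpRanges` fields found in EC2 `DescribeSecurityGroups` output, and it ignores IPv6 ranges, prefix lists, and rules that reference other security groups.

```python
import ipaddress


def allows_ingress(security_group, port, source_cidr, protocol="tcp"):
    """Return True if any inbound IPv4 rule covers the port and source CIDR."""
    source = ipaddress.ip_network(source_cidr)
    for rule in security_group.get("IpPermissions", []):
        proto = rule.get("IpProtocol")
        # "-1" means all protocols and all ports.
        if proto not in (protocol, "-1"):
            continue
        if proto != "-1" and not (rule["FromPort"] <= port <= rule["ToPort"]):
            continue
        for ip_range in rule.get("IpRanges", []):
            allowed = ipaddress.ip_network(ip_range["CidrIp"])
            if source.subnet_of(allowed):
                return True
    return False


# Hypothetical security group shaped like one entry of a
# DescribeSecurityGroups response.
sg = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {
            "IpProtocol": "tcp",
            "FromPort": 3306,
            "ToPort": 3306,
            "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
        }
    ],
}

mysql_open = allows_ingress(sg, 3306, "10.0.1.25/32")
postgres_open = allows_ingress(sg, 5432, "10.0.1.25/32")
```

In this example the replication instance address `10.0.1.25` is made up; in practice it would come from the DMS replication instance details.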

Package Details:

  • Package name: awslabs.aws-dms-troubleshoot-mcp-server
  • Python 3.10+ support
  • Uses FastMCP framework
  • Installable via uvx or pip
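
For readers wiring the server into an MCP client, the following sketch builds a client configuration entry that launches the package through `uvx`. The `mcpServers` layout, the `@latest` suffix, and the choice of environment variables are assumptions based on common MCP client setups, not something this PR specifies; `mcp_client_config` is a hypothetical helper name.

```python
import json


def mcp_client_config(region="us-east-1", profile="default"):
    """Build a hypothetical MCP client config entry for this server."""
    return {
        "mcpServers": {
            "awslabs.aws-dms-troubleshoot-mcp-server": {
                "command": "uvx",
                # Assumption: the package is run at its latest published version.
                "args": ["awslabs.aws-dms-troubleshoot-mcp-server@latest"],
                # Standard AWS SDK environment variables; whether the server
                # needs anything beyond these is not stated in the PR.
                "env": {"AWS_REGION": region, "AWS_PROFILE": profile},
            }
        }
    }


print(json.dumps(mcp_client_config(), indent=2))
```

The printed JSON can be pasted into a client's MCP settings file, adjusting the region and profile to match the account that hosts the DMS tasks.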

User experience

Before:

  • Customers troubleshooting DMS replication failures had to manually:
    • Navigate the AWS Console across multiple services (DMS, CloudWatch, EC2)
    • Analyze CloudWatch logs to identify error patterns
    • Check security group rules and network configurations
    • Cross-reference AWS documentation for common issues
    • Perform root cause analysis
  • This process was time-consuming and error-prone

After:

  • AI assistants can now automatically:
    • List and filter DMS tasks by status
    • Retrieve detailed task information and statistics
    • Analyze CloudWatch logs for error patterns
    • Diagnose network connectivity issues (security groups, VPC routing, network ACLs)
    • Validate endpoint configurations
    • Perform automated RCA with actionable recommendations
    • Provide context-aware troubleshooting steps based on AWS best practices
  • Intended to reduce mean time to resolution (MTTR) for DMS issues
  • Enables faster issue identification

Example Usage:

User: "My DMS replication task 'prod-migration' is failing. Can you help diagnose the issue?"

The assistant then:
1. Calls list_replication_tasks to find the task
2. Calls diagnose_replication_issue for comprehensive RCA
3. Calls get_task_cloudwatch_logs for error context
4. Calls diagnose_network_connectivity to check for network issues
5. Provides actionable recommendations with AWS documentation links
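
The sequence above can be sketched as a small orchestration function. This is only an illustration of the call order: `call_tool` is a hypothetical stand-in for whatever MCP client invokes the server, and the argument and field names (`task_id`, `name`, `id`) are assumptions, since the PR description does not list the tools' parameters.

```python
def diagnose_task(task_name, call_tool):
    """Run the four tool calls from the example in order and collect results.

    call_tool(name, arguments) is a hypothetical callable that invokes an
    MCP tool and returns its decoded result.
    """
    tasks = call_tool("list_replication_tasks", {})
    match = next((t for t in tasks if t.get("name") == task_name), None)
    if match is None:
        return {"error": f"task {task_name!r} not found"}

    args = {"task_id": match["id"]}  # hypothetical parameter name
    # Dict entries are evaluated top to bottom, so the calls run in this order.
    return {
        "task": match,
        "rca": call_tool("diagnose_replication_issue", args),
        "logs": call_tool("get_task_cloudwatch_logs", args),
        "network": call_tool("diagnose_network_connectivity", args),
    }


# A fake client that records which tools were called, for demonstration.
calls = []


def fake_call_tool(name, arguments):
    calls.append(name)
    if name == "list_replication_tasks":
        return [{"name": "prod-migration", "id": "task-1"}]
    return {"tool": name, "arguments": arguments}


report = diagnose_task("prod-migration", fake_call_tool)
```

Passing the tool invoker in as a parameter keeps the flow testable without AWS credentials or a running server.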

Checklist

If an item doesn't apply to your change, please leave it unchecked.

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change? (Y/N): N

RFC issue number: N/A (new server addition)

Checklist:

  • Migration process documented (N/A - new server)
  • Implement warnings (N/A - new server)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

codecov bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 92.53394% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.11%. Comparing base (8d4587c) to head (1f85263).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
.../awslabs/aws_dms_troubleshoot_mcp_server/server.py | 92.51% | 21 Missing and 12 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1675      +/-   ##
==========================================
+ Coverage   88.91%   90.11%   +1.19%     
==========================================
  Files         162      754     +592     
  Lines       10783    54848   +44065     
  Branches     1504     8756    +7252     
==========================================
+ Hits         9588    49426   +39838     
- Misses        871     3475    +2604     
- Partials      324     1947    +1623     

@github-advanced-security
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@wchemz wchemz changed the title from "Feat/aws dms troubleshoot mcp server" to "feat(aws-dms-troubleshoot-mcp-server): Add AWS dms troubleshoot mcp server" Nov 6, 2025
@scottschreckengaust scottschreckengaust self-assigned this Nov 6, 2025
@scottschreckengaust scottschreckengaust added the "hold-merging" (Signals to hold the PR from merging) and "new mcp server" (A new MCP server ideally linked to an issue) labels Nov 6, 2025
@scottschreckengaust scottschreckengaust moved this from "To triage" to "In progress" in awslabs/mcp Project Nov 6, 2025