
Amazon Redshift MCP Server

Model Context Protocol (MCP) server for Amazon Redshift.

This MCP server provides tools to discover, explore, and query Amazon Redshift clusters and serverless workgroups. It enables AI assistants to interact with Redshift resources safely and efficiently through a comprehensive set of discovery and query execution tools.

Features

Cluster Discovery: Automatically discover both provisioned Redshift clusters and serverless workgroups
Metadata Exploration: Browse databases, schemas, tables, and columns
Safe Query Execution: Execute SQL queries in READ ONLY mode (safe READ WRITE support is planned for a future version)
Multi-Cluster Support: Work with multiple clusters and workgroups simultaneously

Prerequisites

Installation Requirements

Install uv from Astral or the GitHub README
Install Python 3.10 or newer, for example with uv python install 3.10

AWS Client Requirements

Credentials: Configure AWS credentials via the AWS CLI or environment variables
Region: Configure AWS region using one of the following (in order of precedence):
- AWS_REGION environment variable (highest priority)
- AWS_DEFAULT_REGION environment variable
- Region specified in your AWS profile configuration
Permissions: Ensure your AWS credentials have the required permissions (see Permissions section)
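The region precedence above can be expressed as a short lookup. This is a minimal sketch, assuming the profile's region has already been read elsewhere (for example from ~/.aws/config); resolve_region is a hypothetical helper for illustration and is not part of the server.

```python
import os
from typing import Mapping, Optional


def resolve_region(
    env: Mapping[str, str] = os.environ,
    profile_region: Optional[str] = None,
) -> Optional[str]:
    """Return the region using the documented precedence (hypothetical helper).

    AWS_REGION wins, then AWS_DEFAULT_REGION, then the profile's region.
    Empty variables are treated as unset.
    """
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        value = env.get(var)
        if value:
            return value
    return profile_region


print(resolve_region({"AWS_REGION": "eu-west-1", "AWS_DEFAULT_REGION": "us-east-1"}, "us-west-2"))  # eu-west-1
print(resolve_region({"AWS_DEFAULT_REGION": "us-east-1"}, "us-west-2"))  # us-east-1
print(resolve_region({}, "us-west-2"))  # us-west-2
```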

Installation


Configure the MCP server in your MCP client configuration (e.g., for Kiro, edit ~/.kiro/settings/mcp.json):

{
  "mcpServers": {
    "awslabs.redshift-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.redshift-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "default",
        "AWS_DEFAULT_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "INFO"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
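If your client configuration already lists other servers, you may want to add this entry without overwriting them. A minimal sketch using Python's standard json module, assuming the Kiro path mentioned above; add_redshift_server and update_config_file are hypothetical helpers, and the entry simply mirrors the JSON shown.

```python
import copy
import json
from pathlib import Path

# Mirrors the configuration entry shown above.
SERVER_NAME = "awslabs.redshift-mcp-server"
SERVER_ENTRY = {
    "command": "uvx",
    "args": ["awslabs.redshift-mcp-server@latest"],
    "env": {
        "AWS_PROFILE": "default",
        "AWS_DEFAULT_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "INFO",
    },
    "disabled": False,
    "autoApprove": [],
}


def add_redshift_server(config: dict) -> dict:
    """Return a copy of config with the Redshift entry added; other servers are kept."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[SERVER_NAME] = copy.deepcopy(SERVER_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Read the MCP config at path (if it exists), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_redshift_server(config), indent=2) + "\n")


# Example for Kiro:
# update_config_file(Path.home() / ".kiro" / "settings" / "mcp.json")
```

An existing awslabs.redshift-mcp-server entry is replaced, so any local edits to that entry would be lost.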

Windows Installation

For Windows users, the MCP server configuration format is slightly different:

{
  "mcpServers": {
    "awslabs.redshift-mcp-server": {
      "disabled": false,
      "timeout": 60,
      "type": "stdio",
      "command": "uv",
      "args": [
        "tool",
        "run",
        "--from",
        "awslabs.redshift-mcp-server@latest",
        "awslabs.redshift-mcp-server.exe"
      ],
      "env": {
        "AWS_PROFILE": "your-aws-profile",
        "AWS_DEFAULT_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    }
  }
}

Alternatively, run the server with Docker after building the image with docker build -t awslabs/redshift-mcp-server:latest . from the directory containing the Dockerfile:

{
  "mcpServers": {
    "awslabs.redshift-mcp-server": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "--interactive",
        "--env", "AWS_ACCESS_KEY_ID=[your data]",
        "--env", "AWS_SECRET_ACCESS_KEY=[your data]",
        "--env", "AWS_DEFAULT_REGION=[your data]",
        "awslabs/redshift-mcp-server:latest"
      ]
    }
  }
}
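Putting credential values directly in the configuration file makes them easy to leak. Docker's --env flag also accepts a bare variable name, in which case Docker copies that variable's value from the environment docker run itself was started with (for example, the MCP client's environment). A minimal sketch that builds the args array this way; docker_args is a hypothetical helper, and AWS_SESSION_TOKEN is included on the assumption that you use temporary credentials.

```python
from typing import Iterable, List

# Variables forwarded by name; Docker reads each value from its own environment.
DEFAULT_PASSTHROUGH = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",  # assumption: only needed for temporary credentials
    "AWS_DEFAULT_REGION",
)


def docker_args(image: str = "awslabs/redshift-mcp-server:latest",
                passthrough: Iterable[str] = DEFAULT_PASSTHROUGH) -> List[str]:
    """Build the "args" array for the Docker-based MCP entry (hypothetical helper)."""
    args = ["run", "--rm", "--interactive"]
    for name in passthrough:
        # "--env NAME" without "=value" tells Docker to copy NAME from its environment.
        args += ["--env", name]
    args.append(image)
    return args


print(docker_args())
```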

Environment Variables

AWS_REGION: AWS region to use (overrides all other region settings)
AWS_DEFAULT_REGION: Default AWS region (used if AWS_REGION is not set; takes precedence over the region in your profile)
AWS_PROFILE: AWS profile to use (optional, uses default if not specified)
FASTMCP_LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)
LOG_FILE: Path to log file (optional, logs to stdout if not specified)

Basic Usage

Discovery Workflow

Discover Clusters: Find available Redshift resources
List Databases: Explore databases in a specific cluster
Browse Database Structures: Navigate through schemas, tables, columns
Query Data: Execute SQL queries safely with a natural language prompt
Analyze Performance: Get execution plans to optimize query performance
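The workflow above corresponds to a fixed order of tool calls, each one narrowing the scope with the arguments documented under Tools. A minimal sketch, assuming call_tool is whatever function your MCP client uses to invoke a tool by name with a dict of arguments (for example, a synchronous wrapper around the tool-call method of an MCP client library); explore_table is hypothetical.

```python
from typing import Any, Callable, Dict

# Assumed shape of your client's tool invoker: (tool_name, arguments) -> result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def explore_table(call_tool: CallTool, cluster: str, database: str,
                  schema: str, table: str) -> Dict[str, Any]:
    """Follow the discovery workflow down to one table, then sample it (hypothetical)."""
    results: Dict[str, Any] = {}
    results["clusters"] = call_tool("list_clusters", {})
    results["databases"] = call_tool("list_databases", {"cluster_identifier": cluster})
    results["schemas"] = call_tool("list_schemas", {
        "cluster_identifier": cluster,
        "schema_database_name": database,
    })
    results["tables"] = call_tool("list_tables", {
        "cluster_identifier": cluster,
        "table_database_name": database,
        "table_schema_name": schema,
    })
    results["columns"] = call_tool("list_columns", {
        "cluster_identifier": cluster,
        "column_database_name": database,
        "column_schema_name": schema,
        "column_table_name": table,
    })
    # Identifiers are interpolated as-is; only pass names obtained from discovery.
    results["sample"] = call_tool("execute_query", {
        "cluster_identifier": cluster,
        "database_name": database,
        "sql": f'SELECT * FROM "{schema}"."{table}" LIMIT 5;',
    })
    return results


# Usage with a stub that only records which tools were called:
calls = []
explore_table(lambda name, args: calls.append(name),
              "my-redshift-cluster", "dev", "public", "users")
print(calls)
# ['list_clusters', 'list_databases', 'list_schemas', 'list_tables', 'list_columns', 'execute_query']
```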

Simple Examples

"Show me all available Redshift clusters"
"List databases in cluster 'my-redshift-cluster'"
"What tables are in the 'public' schema of database 'analytics'?"
"Execute: SELECT COUNT(*) FROM users WHERE created_date > '2024-01-01'"
"Explain the execution plan for my query to identify performance issues"

Advanced Examples

Natural Language Data Request

> Show me the last 3 sales from the analytics Redshift cluster.
I'll help you retrieve the last 3 sales from your analytics Redshift cluster. Let me start by discovering your available Redshift clusters and then explore the database structure to find the sales data.

...skipping the auto-discovery logs...

Here are the last 3 sales from your analytics Redshift cluster:

Most Recent Sales (from tickit.sales table):

1. Sale ID 102373 - December 31, 2008 at 12:58:19
   • Price paid: $1,008.00
   • Seller ID: 26190, Buyer ID: 35491

2. Sale ID 75861 - December 31, 2008 at 12:50:02
   • Price paid: $372.00
   • Seller ID: 43402, Buyer ID: 39545

3. Sale ID 56302 - December 31, 2008 at 12:08:14
   • Price paid: $372.00
   • Seller ID: 5760, Buyer ID: 5797

The data shows ticket sales from the TICKIT sample dataset, which appears to be for event ticket transactions. All three sales occurred on December 31, 2008, with the most recent sale being for $1,008 for 4 tickets.

> Briefly describe your actions.
I performed the following actions to retrieve the last 3 sales from your analytics Redshift cluster:

1. Discovered clusters - Listed all available Redshift clusters and identified redshift-cluster-analytics as your analytics cluster
2. Explored databases - Found two databases: dev and sample_data_dev
3. Located schemas - Checked the sample_data_dev database and found the tickit schema containing sample data
4. Found sales table - Identified the sales table within the tickit schema
5. Analyzed table structure - Examined the columns in the sales table (salesid, pricepaid, qtysold, saletime, etc.)
6. Executed query - Ran a SQL query to get the 3 most recent sales ordered by saletime DESC

The process involved using the Redshift MCP tools to systematically navigate the cluster structure and retrieve the requested sales data from the TICKIT sample dataset.

> Show me the query text which you executed to get the sales.
Here's the exact SQL query I executed to get the last 3 sales:

SELECT salesid, listid, sellerid, buyerid, eventid, qtysold, pricepaid, commission, saletime
FROM tickit.sales
ORDER BY saletime DESC
LIMIT 3;

Cross-Cluster Data Comparison Request

> How off is the analytics sales data from the ETL sales data based on the recent records in Redshift?
I'll help you analyze sales data from your Redshift ETL pipeline. To get started, I need to discover your Redshift environment and understand the data structure.

...skipping auto-discovery logs...

# Sales Data Comparison: Analytics vs ETL

## Executive Summary
Sales data between Analytics cluster and ETL workgroup is perfectly synchronized with zero discrepancies found.

## Environment Overview
• **Analytics**: redshift-cluster-analytics (Provisioned, 2x dc2.large nodes)
• **ETL**: redshift-workgroup-etl (Serverless)
• **Database**: sample_data_dev.tickit.sales

## Data Comparison Results

### High-Level Metrics
| Metric          | Analytics Cluster | ETL Workgroup     | Status |
|-----------------|-------------------|-------------------|--------|
| Total Records   | 172,456           | 172,456           | Match  |
| Total Revenue   | $110,765,431      | $110,765,431      | Match  |
| Total Quantity  | 345,349           | 345,349           | Match  |
| Average Price   | $642.28           | $642.28           | Match  |
| Date Range      | 2008-01-01 to 2008-12-31 | 2008-01-01 to 2008-12-31 | Match |

### Recent Sales (Q4 2008)
| Month    | Analytics Revenue | ETL Revenue   | Analytics Txns | ETL Txns | Status |
|----------|-------------------|---------------|----------------|----------|--------|
| Dec 2008 | $4,521,806        | $4,521,806    | 7,059          | 7,059    | Match  |
| Nov 2008 | $9,044,905        | $9,044,905    | 14,261         | 14,261   | Match  |
| Oct 2008 | $10,726,368       | $10,726,368   | 16,741         | 16,741   | Match  |

## Key Findings
• Data Integrity: Perfect synchronization across all metrics
• Verification: Transaction-level details match exactly (verified 10 recent records)

> Briefly describe your actions.
I analyzed sales data from your Redshift environments by:

1. Discovered Infrastructure: Found 3 Redshift environments (analytics cluster, marketing cluster, ETL workgroup)
2. Explored Data Structure: Located sales data in sample_data_dev.tickit.sales table with 10 columns including salesid, pricepaid, qtysold, and saletime
3. Compared High-Level Metrics: Ran aggregate queries on both analytics cluster and ETL workgroup to compare total records (172,456), revenue ($110M), and date ranges
4. Analyzed Recent Data: Examined Q4 2008 monthly sales trends and verified the 10 most recent transactions from Dec 30-31, 2008
5. Performance Testing: Measured query execution times across different query types (6.7-19.5 seconds range)
6. Generated Report: Created a comprehensive comparison showing perfect data synchronization between environments with performance insights and recommendations

> Show me the text of the SQL queries you used.
Here are the SQL queries I executed during the analysis:

## 1. High-Level Metrics Query
SELECT
    COUNT(*) as total_records,
    MIN(saletime) as earliest_sale,
    MAX(saletime) as latest_sale,
    SUM(pricepaid) as total_revenue,
    SUM(qtysold) as total_quantity,
    AVG(pricepaid) as avg_price
FROM tickit.sales;

## 2. Recent Monthly Sales Analysis
SELECT
    DATE_TRUNC('month', saletime) as sale_month,
    COUNT(*) as transaction_count,
    SUM(pricepaid) as monthly_revenue,
    SUM(qtysold) as monthly_quantity,
    AVG(pricepaid) as avg_transaction_value
FROM tickit.sales
WHERE saletime >= '2008-10-01'
GROUP BY DATE_TRUNC('month', saletime)
ORDER BY sale_month DESC
LIMIT 10;

## 3. Recent Transaction Details
SELECT
    salesid,
    listid,
    sellerid,
    buyerid,
    eventid,
    qtysold,
    pricepaid,
    commission,
    saletime
FROM tickit.sales
WHERE saletime >= '2008-12-30'
ORDER BY saletime DESC, salesid DESC
LIMIT 10;
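The comparison itself boils down to running the same aggregate query (query 1 above) in both environments and diffing the two result rows. A minimal sketch, assuming each row has already been fetched into a dict keyed by column alias; compare_rows and its rel_tol parameter are hypothetical, with the tolerance there only to absorb floating-point noise in values such as averages.

```python
import math
from typing import Dict, Mapping, Union

Value = Union[int, float, str, None]


def compare_rows(left: Mapping[str, Value], right: Mapping[str, Value],
                 rel_tol: float = 0.0) -> Dict[str, str]:
    """Label each column "Match" or "Mismatch"; numbers may differ by rel_tol."""
    status: Dict[str, str] = {}
    for column in sorted(set(left) | set(right)):
        a, b = left.get(column), right.get(column)
        numeric = (isinstance(a, (int, float)) and isinstance(b, (int, float))
                   and not isinstance(a, bool) and not isinstance(b, bool))
        if numeric:
            same = math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)
        else:
            same = a == b  # strings, dates rendered as text, or a missing column
        status[column] = "Match" if same else "Mismatch"
    return status


analytics = {"total_records": 172456, "total_quantity": 345349, "avg_price": 642.28}
etl = {"total_records": 172456, "total_quantity": 345349, "avg_price": 642.28}
print(compare_rows(analytics, etl, rel_tol=1e-9))
# {'avg_price': 'Match', 'total_quantity': 'Match', 'total_records': 'Match'}
```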

Query Performance Analysis

> Explain the execution plan for: SELECT s.salesid, s.pricepaid, e.eventname
  FROM tickit.sales s JOIN tickit.event e ON s.eventid = e.eventid
  WHERE s.saletime > '2008-06-01' ORDER BY s.pricepaid DESC LIMIT 50

...auto-discovery of cluster (redshift-cluster-1), database (sample_data_dev), and tickit schema...

Here's the execution plan breakdown:

XN Limit  (rows=50)
└─ XN Merge  (rows=432,650)
   └─ XN Network  (Send to leader)
      └─ XN Sort  (Sort Key: s.pricepaid DESC)
         └─ XN Hash Join DS_BCAST_INNER  (rows=432,650)
            ├─ XN Seq Scan on sales s  (rows=209,317)
            │     Filter: saletime > '2008-06-01'
            └─ XN Hash
               └─ XN Seq Scan on event e  (rows=17,596)

Step-by-step execution (bottom-up):

1. Seq Scan on event (cost: 175.96) — Full table scan of the event table, reading all
   17,596 rows. No filter needed since every event row is a potential join match.

2. Hash (cost: 175.96) — Builds an in-memory hash table from the event rows, keyed on
   eventid. This is the "inner" side of the join.

3. Seq Scan on sales (cost: 4,311.40) — Full table scan of the sales table with a filter
   on saletime > '2008-06-01'. The planner estimates ~209,317 rows pass the filter out of
   344,912 total (~61%).

4. Hash Join DS_BCAST_INNER (cost: 5,630,735,660.65) — Joins sales and event on eventid.
   The DS_BCAST_INNER distribution means the event table (inner side) is broadcast to all
   compute nodes. This happens because the tables have different distribution keys (sales
   is distributed on listid, event on eventid).

5. Sort (cost: 1,005,630,776,162.84) — Each compute node sorts its local join results by
   pricepaid DESC. This is the most expensive step.

6. Network — Sends sorted results from compute nodes to the leader node.

7. Merge — Leader node merge-sorts the pre-sorted streams from each node.

8. Limit — Returns only the top 50 rows.

Table design details:

| Table         | Rows    | DISTKEY          | SORTKEY |
|---------------|---------|------------------|---------|
| tickit.sales  | 344,912 | listid (KEY)     | dateid  |
| tickit.event  | 17,596  | eventid (KEY)    | dateid  |

Column planner statistics (from pg_stats):

| Column            | Correlation | n_distinct | Encoding | SORTKEY |
|-------------------|-------------|------------|----------|---------|
| sales.pricepaid   | 0.005       | 4,485      | az64     | —       |
| sales.saletime    | 0.467       | ~49%       | az64     | —       |
| sales.dateid      | 0.469       | 367        | none     | 1       |
| sales.eventid     | 0.003       | 8,202      | az64     | —       |
| event.eventid     | 0.003       | ~48%       | az64     | —       |

Key observations:

- DS_BCAST_INNER on Hash Join — The event table is being broadcast to all nodes because
  sales.eventid doesn't match the sales DISTKEY (listid). With only 17K rows this is
  tolerable, but it's still unnecessary data movement.
- Sort is the dominant cost — Sorting 432K joined rows across nodes accounts for the vast
  majority of the query cost. The pricepaid column has very low correlation (0.005), so stored
  row order gives no head start and every joined row must go through the sort.
- No SORTKEY benefit for the WHERE filter — saletime is not a sort key on the sales table
  (the sort key is dateid), so Redshift can't skip blocks via zone maps.

Optimization recommendations:

1. Change event to DISTSTYLE ALL — Since event is a small dimension table (17K rows),
   replicating it to all nodes would eliminate the broadcast entirely.
2. Add saletime as a SORTKEY on sales — This would let Redshift use zone maps to skip
   blocks that don't match the WHERE saletime > '2008-06-01' filter.
3. Add compression to dateid columns — Both tables have dateid with ENCODE none. Using
   ENCODE AUTO or az64 would reduce storage and I/O.
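The DS_BCAST_INNER observation in this example can also be spotted mechanically by scanning EXPLAIN output for Redshift's join redistribution labels. A minimal sketch; the label meanings paraphrase the Amazon Redshift documentation on evaluating query plans, and find_redistribution is a hypothetical helper, not the server's own rule engine.

```python
import re
from typing import List, Tuple

# Join redistribution labels, paraphrased from the Amazon Redshift docs.
REDISTRIBUTION = {
    "DS_DIST_NONE": "no redistribution (collocated join)",
    "DS_DIST_ALL_NONE": "no redistribution; inner table uses DISTSTYLE ALL",
    "DS_DIST_INNER": "inner table is redistributed",
    "DS_DIST_OUTER": "outer table is redistributed",
    "DS_DIST_ALL_INNER": "entire inner table is redistributed to a single slice",
    "DS_DIST_BOTH": "both tables are redistributed",
    "DS_BCAST_INNER": "entire inner table is broadcast to all compute nodes",
}
# Labels that imply data movement between nodes at query time.
MOVES_DATA = {"DS_DIST_INNER", "DS_DIST_OUTER", "DS_DIST_ALL_INNER",
              "DS_DIST_BOTH", "DS_BCAST_INNER"}

LABEL = re.compile(r"\bDS_(?:DIST|BCAST)_[A-Z_]+")


def find_redistribution(plan_text: str) -> List[Tuple[str, str, bool]]:
    """Return (label, meaning, moves_data) for each label found in the plan text."""
    found = []
    for label in LABEL.findall(plan_text):
        meaning = REDISTRIBUTION.get(label, "unrecognized label")
        found.append((label, meaning, label in MOVES_DATA))
    return found


plan = "XN Hash Join DS_BCAST_INNER  (rows=432,650)"
for label, meaning, moves in find_redistribution(plan):
    print(f"{label}: {meaning}{' (data movement)' if moves else ''}")
```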

Tools

list_clusters

Discovers all available Amazon Redshift clusters and serverless workgroups.

list_clusters() -> list[RedshiftCluster]

Returns: List of cluster information including:

Cluster identifier and type (provisioned/serverless)
Status and connection details
Configuration information (node type, encryption, etc.)
Tags and metadata
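Discovery spans two AWS APIs, which is why the IAM policy under Permissions includes both redshift:DescribeClusters and redshift-serverless:ListWorkgroups. A minimal sketch of merging the two into one list, assuming boto3-style clients are passed in (for example boto3.client("redshift") and boto3.client("redshift-serverless")). It ignores pagination and reads only a few response fields; discover is a hypothetical helper, not the server's implementation.

```python
from typing import Any, Dict, List


def discover(redshift: Any, serverless: Any) -> List[Dict[str, Any]]:
    """Merge provisioned clusters and serverless workgroups into one list.

    Pagination (Marker / nextToken) is ignored to keep the sketch short.
    """
    resources: List[Dict[str, Any]] = []
    for cluster in redshift.describe_clusters().get("Clusters", []):
        resources.append({
            "identifier": cluster["ClusterIdentifier"],
            "type": "provisioned",
            "status": cluster.get("ClusterStatus"),
            "node_type": cluster.get("NodeType"),
        })
    for workgroup in serverless.list_workgroups().get("workgroups", []):
        resources.append({
            "identifier": workgroup["workgroupName"],
            "type": "serverless",
            "status": workgroup.get("status"),
            "node_type": None,  # serverless workgroups have no node type
        })
    return resources


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(discover(boto3.client("redshift"), boto3.client("redshift-serverless")))
```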

list_databases

Lists all databases in a specified Redshift cluster.

list_databases(cluster_identifier: str, database_name: str = "dev") -> list[RedshiftDatabase]

Parameters:

cluster_identifier: The cluster identifier from list_clusters
database_name: Database to connect to for querying (default: "dev")

Returns: List of database information including:

Database name and owner
Database type (local/shared)
Access control information
Isolation level

list_schemas

Lists all schemas in a specified database.

list_schemas(cluster_identifier: str, schema_database_name: str) -> list[RedshiftSchema]

Parameters:

cluster_identifier: The cluster identifier from list_clusters
schema_database_name: Database name to list schemas for

Returns: List of schema information including:

Schema name and owner
Schema type (local/external/shared)
Access permissions
External schema details (if applicable)

list_tables

Lists all tables in a specified schema.

list_tables(cluster_identifier: str, table_database_name: str, table_schema_name: str) -> list[RedshiftTable]

Parameters:

cluster_identifier: The cluster identifier from list_clusters
table_database_name: Database name containing the schema
table_schema_name: Schema name to list tables for

Returns: List of table information including:

Table name and type (TABLE/VIEW/EXTERNAL TABLE)
Access permissions and remarks
Redshift-specific metadata (from pg_catalog tables):
- Distribution style (KEY, EVEN, ALL, AUTO)
- Estimated row count
Table activity statistics (from pg_stat_user_tables):
- Sequential scan counts and tuples read
- DML activity (rows inserted, updated, deleted)
External table properties (for Spectrum tables):
- S3 location
- JSON parameters (partition columns, etc.)

list_columns

Lists all columns in a specified table.

list_columns(
    cluster_identifier: str,
    column_database_name: str,
    column_schema_name: str,
    column_table_name: str
) -> list[RedshiftColumn]

Parameters:

cluster_identifier: The cluster identifier from list_clusters
column_database_name: Database name containing the table
column_schema_name: Schema name containing the table
column_table_name: Table name to list columns for

Returns: List of column information including:

Column name and data type
Nullable status and default values
Numeric precision and scale
Character length limits
Ordinal position and remarks
Redshift-specific properties (encoding, distkey, sortkey position)
External table column properties (external type, partition key)

Note: Column planner statistics (n_distinct, null_frac, avg_width, correlation from pg_stats) are only populated when columns are returned as part of describe_execution_plan table designs.

execute_query

Executes a SQL query against a Redshift cluster or serverless workgroup with safety protections (queries run in READ ONLY mode).

execute_query(cluster_identifier: str, database_name: str, sql: str) -> QueryResult

Parameters:

cluster_identifier: The cluster identifier from list_clusters
database_name: Database to execute the query against
sql: SQL statement to execute (SELECT statements recommended)

Returns: Query result including:

Column names and data types
Result rows with proper type conversion
Row count and execution time
Query ID for reference
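The redshift-data permissions listed under Permissions correspond to the Redshift Data API's asynchronous flow: submit a statement, poll its status, then fetch the result. A minimal sketch of that flow, assuming a boto3 "redshift-data" client is passed in. It targets a provisioned cluster with IAM-based temporary credentials (for a serverless workgroup you would pass WorkgroupName instead of ClusterIdentifier), reads only the first result page, and does not reproduce the server's read-only safeguards; run_query is hypothetical.

```python
import time
from typing import Any, Dict

_VALUE_KEYS = ("stringValue", "longValue", "doubleValue", "booleanValue", "blobValue")


def _cell(field: Dict[str, Any]) -> Any:
    """Convert one Data API field, e.g. {"longValue": 3} or {"isNull": True}."""
    if field.get("isNull"):
        return None
    for key in _VALUE_KEYS:
        if key in field:
            return field[key]
    return None


def run_query(client: Any, cluster_id: str, database: str, sql: str,
              poll_seconds: float = 0.5, timeout_seconds: float = 300.0) -> Dict[str, Any]:
    """Submit sql, poll until it finishes, and return column names and rows."""
    # No DbUser or SecretArn: the Data API uses temporary credentials for the caller's IAM identity.
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, Sql=sql)["Id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(desc.get("Error") or status)
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)
    if not desc.get("HasResultSet"):
        return {"columns": [], "rows": []}
    # Only the first page is read; large results also return a NextToken.
    result = client.get_statement_result(Id=statement_id)
    columns = [col["name"] for col in result["ColumnMetadata"]]
    rows = [[_cell(field) for field in record] for record in result["Records"]]
    return {"columns": columns, "rows": rows}


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   run_query(boto3.client("redshift-data"), "my-redshift-cluster", "dev", "SELECT 1;")
```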

describe_execution_plan

Generates the query execution plan for a SQL statement without executing the query.

describe_execution_plan(cluster_identifier: str, database_name: str, sql: str) -> ExecutionPlan

Parameters:

cluster_identifier: The cluster identifier from list_clusters
database_name: Database the query would run against
sql: SQL statement to explain (do not include EXPLAIN keyword)

Returns: Execution plan including:

Structured plan nodes with costs, rows, and distribution info
Table design information (DISTKEY, SORTKEY, encoding) for referenced tables
Column planner statistics (n_distinct, null_frac, correlation, most_common_vals, histogram_bounds) for referenced columns
Human-readable plan text (summarized for large plans)
Rule-based performance optimization suggestions derived from plan analysis, table design, and column statistics

Permissions note: Column planner statistics from pg_stats require SELECT privilege on the table for the statistics to be visible. Table activity statistics from pg_stat_user_tables are visible to all users regardless of permissions.
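Because describe_execution_plan expects the bare statement, text copied from a SQL console that already starts with EXPLAIN should be trimmed first. A minimal client-side sketch; strip_explain is a hypothetical helper, and it only handles a plain leading EXPLAIN, optionally followed by VERBOSE.

```python
import re

# Leading "EXPLAIN" or "EXPLAIN VERBOSE" (Redshift's EXPLAIN syntax), any case.
_EXPLAIN_PREFIX = re.compile(r"^\s*EXPLAIN(?:\s+VERBOSE)?\s+", re.IGNORECASE)


def strip_explain(sql: str) -> str:
    """Return sql without a leading EXPLAIN [VERBOSE] keyword."""
    return _EXPLAIN_PREFIX.sub("", sql, count=1).strip()


print(strip_explain("EXPLAIN SELECT * FROM tickit.sales LIMIT 5"))  # SELECT * FROM tickit.sales LIMIT 5
```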

Permissions

AWS IAM Permissions

Your AWS credentials need the following IAM permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "redshift:DescribeClusters",
        "redshift-serverless:ListWorkgroups",
        "redshift-serverless:GetWorkgroup",
        "redshift-data:ExecuteStatement",
        "redshift-data:DescribeStatement",
        "redshift-data:GetStatementResult",
        "redshift-serverless:GetCredentials",
        "redshift:GetClusterCredentialsWithIAM",
        "redshift:GetClusterCredentials"
      ],
      "Resource": "*"
    }
  ]
}
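When a tool call fails with an access error, it can help to check an identity policy against this list. A minimal sketch that reports which of the actions above a policy document does not allow. It is deliberately simplified (Allow statements only, with no Deny, Condition, Resource, or NotAction handling), so the IAM policy simulator remains the authoritative check; missing_actions is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

REQUIRED_ACTIONS = [
    "redshift:DescribeClusters",
    "redshift-serverless:ListWorkgroups",
    "redshift-serverless:GetWorkgroup",
    "redshift-data:ExecuteStatement",
    "redshift-data:DescribeStatement",
    "redshift-data:GetStatementResult",
    "redshift-serverless:GetCredentials",
    "redshift:GetClusterCredentialsWithIAM",
    "redshift:GetClusterCredentials",
]


def missing_actions(policy: Dict[str, Any]) -> List[str]:
    """Return required actions not matched by any Allow statement (simplified)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    patterns: List[str] = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive and support * and ? wildcards.
        patterns += [a.lower() for a in actions]
    return [a for a in REQUIRED_ACTIONS
            if not any(fnmatchcase(a.lower(), p) for p in patterns)]
```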

Database Permissions

In addition to AWS IAM permissions, you need appropriate database-level permissions:

Read Access: SELECT permissions on tables/views you want to query
Schema Access: USAGE permissions on schemas you want to explore
Database Access: Connection permissions to databases you want to access
