Reserved words handling + Support for data store selection #80

jacknagz · 2025-06-03T20:35:05Z

Summary

Refactor reserved words implementation to follow official Snowflake documentation precisely
Add support for quoting reserved words inside function calls (e.g., COUNT(DISTINCT account))
Implement smart filtering to avoid inappropriately quoting SQL keywords
Add comprehensive test coverage including real-world VPC Flow logs scenarios

Test plan

All existing tests pass (105 passed, 4 skipped)
New comprehensive test suite covers success and error cases
Real-world AWS VPC Flow logs query example tested and working
Function call detection prevents quoting function names
CASE WHEN expressions work correctly
Forbidden keyword validation works as expected
Linting passes

🤖 Generated with Claude Code

- Refactor reserved words implementation to follow official Snowflake documentation - Add support for quoting reserved words inside function calls (e.g., COUNT(DISTINCT account)) - Implement smart filtering to avoid quoting SQL keywords inappropriately - Add comprehensive test coverage including real-world VPC Flow logs scenarios - Maintain backward compatibility while improving query reliability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

CLAassistant · 2025-06-03T20:35:11Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ panther-bot-automation
❌ jacknagz
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

src/mcp_panther/panther_mcp_core/tools/data_lake.py

darwayne

Looks mostly good just had a question

- Include complete lists of forbidden words in error messages to help LLMs learn - Add get_reserved_words_info() helper function for reference - Enhance docstring to document error message improvements - Maintain backward compatibility with existing error detection 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

jacknagz · 2025-06-06T18:34:20Z

Resolves #78

jacknagz · 2025-06-11T03:34:53Z

Ok, this is looking ready.

Summary of Reserved Word Testing
I've successfully executed 10 queries testing various Snowflake reserved words and baselining behaviors on the panther_audit logs.

Test Results:

Query 1 - Initially failed with CURRENT_DATE (correctly caught as reserved), then succeeded when using explicit date
Query 2 - Successfully used DATE, TYPE, GROUP BY, and ORDER BY
Query 3 - Successfully used TIMESTAMP functions and HOUR
Query 4 - Failed when trying to use ROWS in window functions (syntax error)
Query 5 - Successfully used ORDER BY with complex expressions
Query 6 - Successfully used TIME functions with EXTRACT and CASE
Query 7 - Successfully used CTEs (WITH clause) and nested queries
Query 8 - Successfully used SELECT with complex field references
Query 9 - Successfully used TABLE references in subqueries
Query 10 - Successfully combined multiple reserved words in a complex query

Key Findings:

The reserved word check is working - It correctly caught CURRENT_DATE and prevented its use
Most reserved words can be used as aliases - Words like USER, DATE, TYPE, COUNT, etc. work fine as column aliases
Context matters - Some reserved words like ROWS cause syntax errors in specific contexts (window functions)
The check appears to focus on specific problematic reserved words rather than all Snowflake reserved words

- Add PANTHER_DATASTORE_TYPE environment variable (defaults to snowflake) - Implement automatic schema reference formatting (.public removal for Redshift) - Add datastore-specific reserved words handling - Update all SQL generation functions to be datastore-aware - Add comprehensive test coverage for both datastores - Update documentation in README.md, src/README.md, and CLAUDE.md 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Add validation to require fully qualified table references (database.table format) - Implement automatic database reference conversion (.public removed for Redshift) - Add get_query_syntax_help() tool for datastore-specific guidance - Enhance SQL processing pipeline with conversion and validation steps - Add comprehensive test coverage for new functionality - Fix misleading documentation about automatic SQL syntax adaptation This enables users to write portable SQL queries that work across both Snowflake and Redshift datastores with automatic conversion handling schema differences. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Add list_data_lake_queries() with comprehensive filtering options (status, date ranges, contains, pagination) - Add cancel_data_lake_query() to terminate long-running queries and prevent system overload - Include user vs API token issuer details and query timing information - Add comprehensive test coverage with 9 new test functions covering success, error, and edge cases - Enable proactive query management to diagnose and resolve performance bottlenecks 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Replace Optional[T] with modern T | None syntax following fastmcp best practices - Add QueryStatus enum for type-safe status filtering - Add proper MCP tool annotations (readOnlyHint, destructiveHint) - Use regex patterns for ISO 8601 datetime validation - Simplify parameter annotations following metrics.py patterns - Remove redundant status validation (now handled by enum types) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

jacknagz · 2025-06-11T23:26:20Z

🚀 Additional Updates Since Initial PR

This PR has been significantly enhanced with additional improvements beyond the original data lake query management functionality:

📊 New Data Lake Query Management Tools

list_data_lake_queries() - Comprehensive query monitoring with filtering by status, date ranges, content search, and pagination
cancel_data_lake_query() - Resource management tool to terminate long-running queries and prevent system overload

🏗️ Enhanced Type Safety & Modern Annotations

Migrated from Optional[T] to modern T | None syntax following fastmcp best practices
Added QueryStatus(str, Enum) for type-safe status filtering
Implemented proper MCP tool annotations:
- readOnlyHint: true for monitoring operations
- destructiveHint: true for query cancellation
Added regex patterns for ISO 8601 datetime validation

🔍 Comprehensive Test Coverage

Added 9 new test functions covering success, error, and edge cases
Tests for both user-issued and API token-issued queries
Validation of filtering, pagination, and error handling
All tests passing with improved type safety

📋 GraphQL Schema Integration

Added LIST_DATA_LAKE_QUERIES query with complete field selection
Added CANCEL_DATA_LAKE_QUERY mutation for resource management
Proper handling of user vs API token issuer information

⚡ Key Benefits

Performance Monitoring: Identify and cancel resource-intensive queries
System Health: Prevent data lake overload during peak usage
Operational Visibility: Track query history, timing, and ownership
Type Safety: Enum-based validation prevents runtime errors
User Experience: Clear error messages and comprehensive filtering

🎯 Common Use Cases

# Find running queries that need attention
list_data_lake_queries(status=[QueryStatus.RUNNING])

# Monitor recent query activity
list_data_lake_queries(started_at_after="2024-01-01T00:00:00.000Z")

# Cancel problematic queries
cancel_data_lake_query("long-running-query-id")

The implementation follows all fastmcp best practices and provides essential tools for managing data lake performance and preventing system overload. 🎉

jacknagz requested a review from a team as a code owner June 3, 2025 20:35

style: format and fix code with ruff

39e23c7

jacknagz self-assigned this Jun 3, 2025

darwayne reviewed Jun 3, 2025

View reviewed changes

src/mcp_panther/panther_mcp_core/tools/data_lake.py Show resolved Hide resolved

darwayne reviewed Jun 3, 2025

View reviewed changes

darwayne approved these changes Jun 3, 2025

View reviewed changes

jacknagz added 2 commits June 3, 2025 15:59

update CLAUDE.md and Makefile

b508feb

latest uv.lock

00d8fe2

jacknagz and others added 4 commits June 11, 2025 14:31

jacknagz changed the title ~~Enhance Snowflake reserved words handling with function call support~~ Reserved words handling + Support for data store selection Jun 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reserved words handling + Support for data store selection #80

Reserved words handling + Support for data store selection #80

Uh oh!

jacknagz commented Jun 3, 2025

Uh oh!

CLAassistant commented Jun 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

darwayne left a comment

Uh oh!

jacknagz commented Jun 6, 2025

Uh oh!

jacknagz commented Jun 11, 2025

Uh oh!

jacknagz commented Jun 11, 2025

Uh oh!

Uh oh!

Reserved words handling + Support for data store selection #80

Are you sure you want to change the base?

Reserved words handling + Support for data store selection #80

Uh oh!

Conversation

jacknagz commented Jun 3, 2025

Summary

Test plan

Uh oh!

CLAassistant commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

darwayne left a comment

Choose a reason for hiding this comment

Uh oh!

jacknagz commented Jun 6, 2025

Uh oh!

jacknagz commented Jun 11, 2025

Uh oh!

jacknagz commented Jun 11, 2025

🚀 Additional Updates Since Initial PR

📊 New Data Lake Query Management Tools

🏗️ Enhanced Type Safety & Modern Annotations

🔍 Comprehensive Test Coverage

📋 GraphQL Schema Integration

⚡ Key Benefits

🎯 Common Use Cases

Uh oh!

Uh oh!

CLAassistant commented Jun 3, 2025 •

edited

Loading