Skip to content

Reserved words handling + Support for data store selection #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jacknagz
Copy link
Contributor

@jacknagz jacknagz commented Jun 3, 2025

Summary

  • Refactor reserved words implementation to follow official Snowflake documentation precisely
  • Add support for quoting reserved words inside function calls (e.g., COUNT(DISTINCT account))
  • Implement smart filtering to avoid inappropriately quoting SQL keywords
  • Add comprehensive test coverage including real-world VPC Flow logs scenarios

Test plan

  • All existing tests pass (105 passed, 4 skipped)
  • New comprehensive test suite covers success and error cases
  • Real-world AWS VPC Flow logs query example tested and working
  • Function call detection prevents quoting function names
  • CASE WHEN expressions work correctly
  • Forbidden keyword validation works as expected
  • Linting passes

🤖 Generated with Claude Code

- Refactor reserved words implementation to follow official Snowflake documentation
- Add support for quoting reserved words inside function calls (e.g., COUNT(DISTINCT account))
- Implement smart filtering to avoid quoting SQL keywords inappropriately
- Add comprehensive test coverage including real-world VPC Flow logs scenarios
- Maintain backward compatibility while improving query reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@jacknagz jacknagz requested a review from a team as a code owner June 3, 2025 20:35
@CLAassistant
Copy link

CLAassistant commented Jun 3, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ panther-bot-automation
❌ jacknagz
You have signed the CLA already but the status is still pending? Let us recheck it.

@jacknagz jacknagz self-assigned this Jun 3, 2025
Copy link
Collaborator

@darwayne darwayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good just had a question

- Include complete lists of forbidden words in error messages to help LLMs learn
- Add get_reserved_words_info() helper function for reference
- Enhance docstring to document error message improvements
- Maintain backward compatibility with existing error detection

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@jacknagz
Copy link
Contributor Author

jacknagz commented Jun 6, 2025

Resolves #78

@jacknagz
Copy link
Contributor Author

Ok, this is looking ready.

Summary of Reserved Word Testing
I've successfully executed 10 queries testing various Snowflake reserved words and baselining behaviors on the panther_audit logs.

Test Results:

Query 1 - Initially failed with CURRENT_DATE (correctly caught as reserved), then succeeded when using explicit date
Query 2 - Successfully used DATE, TYPE, GROUP BY, and ORDER BY
Query 3 - Successfully used TIMESTAMP functions and HOUR
Query 4 - Failed when trying to use ROWS in window functions (syntax error)
Query 5 - Successfully used ORDER BY with complex expressions
Query 6 - Successfully used TIME functions with EXTRACT and CASE
Query 7 - Successfully used CTEs (WITH clause) and nested queries
Query 8 - Successfully used SELECT with complex field references
Query 9 - Successfully used TABLE references in subqueries
Query 10 - Successfully combined multiple reserved words in a complex query

Key Findings:

The reserved word check is working - It correctly caught CURRENT_DATE and prevented its use
Most reserved words can be used as aliases - Words like USER, DATE, TYPE, COUNT, etc. work fine as column aliases
Context matters - Some reserved words like ROWS cause syntax errors in specific contexts (window functions)
The check appears to focus on specific problematic reserved words rather than all Snowflake reserved words

jacknagz and others added 4 commits June 11, 2025 14:31
- Add PANTHER_DATASTORE_TYPE environment variable (defaults to snowflake)
- Implement automatic schema reference formatting (.public removal for Redshift)
- Add datastore-specific reserved words handling
- Update all SQL generation functions to be datastore-aware
- Add comprehensive test coverage for both datastores
- Update documentation in README.md, src/README.md, and CLAUDE.md

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add validation to require fully qualified table references (database.table format)
- Implement automatic database reference conversion (.public removed for Redshift)
- Add get_query_syntax_help() tool for datastore-specific guidance
- Enhance SQL processing pipeline with conversion and validation steps
- Add comprehensive test coverage for new functionality
- Fix misleading documentation about automatic SQL syntax adaptation

This enables users to write portable SQL queries that work across both Snowflake
and Redshift datastores with automatic conversion handling schema differences.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add list_data_lake_queries() with comprehensive filtering options (status, date ranges, contains, pagination)
- Add cancel_data_lake_query() to terminate long-running queries and prevent system overload
- Include user vs API token issuer details and query timing information
- Add comprehensive test coverage with 9 new test functions covering success, error, and edge cases
- Enable proactive query management to diagnose and resolve performance bottlenecks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Replace Optional[T] with modern T | None syntax following fastmcp best practices
- Add QueryStatus enum for type-safe status filtering
- Add proper MCP tool annotations (readOnlyHint, destructiveHint)
- Use regex patterns for ISO 8601 datetime validation
- Simplify parameter annotations following metrics.py patterns
- Remove redundant status validation (now handled by enum types)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@jacknagz
Copy link
Contributor Author

🚀 Additional Updates Since Initial PR

This PR has been significantly enhanced with additional improvements beyond the original data lake query management functionality:

📊 New Data Lake Query Management Tools

  • list_data_lake_queries() - Comprehensive query monitoring with filtering by status, date ranges, content search, and pagination
  • cancel_data_lake_query() - Resource management tool to terminate long-running queries and prevent system overload

🏗️ Enhanced Type Safety & Modern Annotations

  • Migrated from Optional[T] to modern T | None syntax following fastmcp best practices
  • Added QueryStatus(str, Enum) for type-safe status filtering
  • Implemented proper MCP tool annotations:
    • readOnlyHint: true for monitoring operations
    • destructiveHint: true for query cancellation
  • Added regex patterns for ISO 8601 datetime validation

🔍 Comprehensive Test Coverage

  • Added 9 new test functions covering success, error, and edge cases
  • Tests for both user-issued and API token-issued queries
  • Validation of filtering, pagination, and error handling
  • All tests passing with improved type safety

📋 GraphQL Schema Integration

  • Added LIST_DATA_LAKE_QUERIES query with complete field selection
  • Added CANCEL_DATA_LAKE_QUERY mutation for resource management
  • Proper handling of user vs API token issuer information

Key Benefits

  • Performance Monitoring: Identify and cancel resource-intensive queries
  • System Health: Prevent data lake overload during peak usage
  • Operational Visibility: Track query history, timing, and ownership
  • Type Safety: Enum-based validation prevents runtime errors
  • User Experience: Clear error messages and comprehensive filtering

🎯 Common Use Cases

# Find running queries that need attention
list_data_lake_queries(status=[QueryStatus.RUNNING])

# Monitor recent query activity
list_data_lake_queries(started_at_after="2024-01-01T00:00:00.000Z")

# Cancel problematic queries
cancel_data_lake_query("long-running-query-id")

The implementation follows all fastmcp best practices and provides essential tools for managing data lake performance and preventing system overload. 🎉

@jacknagz jacknagz changed the title Enhance Snowflake reserved words handling with function call support Reserved words handling + Support for data store selection Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants