Description
We want to test the Spark History Server MCP's ability to handle large-scale Spark event logs from long-running jobs (4+ hours) with gigabytes of event data. This will validate the MCP server's performance and scalability when processing enterprise-scale Spark applications.
Test Objectives
- Validate MCP server performance with large event logs (GB-scale data)
- Test response times for complex queries on long-running Spark jobs (4+ hours)
- Identify potential bottlenecks in MCP server processing
- Ensure memory efficiency when handling large event datasets
Test Plan
Phase 1: Data Generation
- Use the Spark Benchmarking Kit to run an 8-hour benchmark job that writes its Spark event logs to S3
- Duplicate the resulting logs 50 to 100 times, giving each copy a different Spark application ID
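The duplication step could be scripted along these lines. This is a minimal sketch, not part of the MCP server or the benchmarking kit. It assumes the event log is a single uncompressed JSON-lines file, that the application ID is recorded in the "App ID" field of the SparkListenerApplicationStart event, and that the History Server lists each file as a separate application. It works on local paths, so copying to and from S3 is left out. The function names and the "app-loadtest" prefix are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def rewrite_app_id(line: str, new_app_id: str) -> str:
    """Return one event-log line with its application ID replaced, if it carries one."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return line  # leave blank or unparseable lines untouched
    # Assumption: the ID lives in the application-start event under "App ID".
    if isinstance(event, dict) and event.get("Event") == "SparkListenerApplicationStart":
        event["App ID"] = new_app_id
        return json.dumps(event)
    return line


def duplicate_event_log(source: Path, out_dir: Path, copies: int,
                        prefix: str = "app-loadtest") -> List[Path]:
    """Write `copies` copies of `source`, each named after a new application ID."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(1, copies + 1):
        new_id = f"{prefix}-{i:04d}"
        target = out_dir / new_id
        # Stream line by line so GB-scale logs are never held in memory.
        with source.open("r", encoding="utf-8") as src, \
                target.open("w", encoding="utf-8") as dst:
            for line in src:
                dst.write(rewrite_app_id(line.rstrip("\n"), new_id) + "\n")
        written.append(target)
    return written
```

Other places where the original ID may appear, such as Spark properties in the environment-update event, are not rewritten here. Whether that matters for the History Server should be checked against a real log before generating all the copies.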
Phase 2: Load Testing
- Test with a local MCP server pointed at a Spark History Server loaded with Spark events from 50+ jobs
- Simulate realistic query patterns:
  - Application overview requests
  - Stage-level performance analysis
  - Task-level bottleneck identification
  - Job comparison operations
- Measure response times and resource utilization
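Response times for each query pattern could be collected with a small timing harness like the one below. This is a sketch under assumptions: `measure_latency` is a hypothetical helper, and the `query` callable stands in for whatever client call issues a request to the MCP server, which is not specified here. It only measures client-side wall-clock latency. CPU and memory use of the server process would need to be sampled separately.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(query: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call `query` repeatedly and summarize wall-clock latency in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        query()  # placeholder for one request against the MCP server
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "runs": float(len(samples)),
        "mean_s": statistics.mean(samples),
        "p50_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "max_s": samples[-1],
    }
```

Running the harness once per query pattern and once per log-volume level (for example 10, 50, and 100 duplicated applications) would show how latency grows with the amount of event data.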