codingogre

…w if you want the code and if so do you want the pull requests broken up so you can apply them easier?

Key Changes
🚀 Performance Optimizations
Memory and Concurrency Tuning

Queue Memory: Increased from 25 MB to 1 GB to handle very large records
Concurrent Tasks: Reduced from 1000 to 200 to lower peak resource usage
Table Fetch Size: Increased from 50 to 100 records per API call
Table Batch Size: Reduced from 5 to 1 for more granular processing
Attachment Batch Size: Reduced from 10 to 5 to cope with instances that hold 10K+ attachments

Specialized sys_db_object Handling

New Constants: Added SYS_DB_OBJECT_FETCH_SIZE = 500 and SYS_DB_OBJECT_BATCH_SIZE = 10
Aggressive Fetching: Uses larger fetch sizes specifically for the sys_db_object table
Optimized Batching: Separate batch processing logic for this critical table
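
As a rough sketch of how the per-table sizes described above could be selected (the constant values are the ones listed in this description; the helper name `sizes_for` is hypothetical and not taken from the connector code):

```python
# Values quoted in this PR description.
TABLE_FETCH_SIZE = 100
TABLE_BATCH_SIZE = 1
SYS_DB_OBJECT_FETCH_SIZE = 500
SYS_DB_OBJECT_BATCH_SIZE = 10


def sizes_for(table_name):
    """Return (fetch_size, batch_size) for a table.

    Hypothetical helper: sys_db_object gets the larger sizes,
    every other table falls back to the general defaults.
    """
    if table_name == "sys_db_object":
        return SYS_DB_OBJECT_FETCH_SIZE, SYS_DB_OBJECT_BATCH_SIZE
    return TABLE_FETCH_SIZE, TABLE_BATCH_SIZE
```

Keeping the decision in one place means the URL preparation and the batch creation cannot disagree about which sizes apply to a table.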

🛡️ Enhanced Error Handling
JSON Processing Improvements

Data Type Validation: Added checks for expected list/dict types in _yield_table_data()
Individual Record Error Handling: Continues processing even if individual records fail
Detailed Logging: Better debugging information for batch processing
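
The validation and per-record handling could look roughly like the sketch below. It assumes the response body is a dict with a `result` list of record dicts; the function name `yield_valid_records` and the `transform` callback are illustrative and do not reflect the real `_yield_table_data()` signature.

```python
import logging

logger = logging.getLogger(__name__)


def yield_valid_records(payload, transform):
    """Yield transformed records, skipping anything malformed.

    Assumption: `payload` is a decoded JSON body shaped like
    {"result": [ {...}, {...} ]}.
    """
    if not isinstance(payload, dict):
        logger.warning("Expected dict payload, got %s", type(payload).__name__)
        return
    records = payload.get("result", [])
    if not isinstance(records, list):
        logger.warning("Expected list of records, got %s", type(records).__name__)
        return
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            logger.warning("Skipping record %d: not a dict", index)
            continue
        try:
            yield transform(record)
        except Exception:
            # One bad record should not abort the rest of the batch.
            logger.exception("Skipping record %d: processing failed", index)
```

With this shape, a payload of the wrong type produces a warning and no records, and a failure inside `transform` only drops the record that caused it.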

🔧 Method Signature Updates
Enhanced get_table_length()

Added custom_filter parameter: Supports advanced filtering when counting table records. Previously the count ignored advanced filters, so the number of API calls was derived from the unfiltered table size and the resulting batches were far larger than needed
Better Error Context: More informative error messages with filter information
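
To illustrate why the filtered count matters, here is a small sketch of how the record count drives the number of paged API calls. The function name `page_offsets` and the record counts are made up for illustration; only the fetch size of 100 comes from this description.

```python
import math


def page_offsets(record_count, fetch_size):
    """Return the starting offset of every page needed to read record_count rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    pages = math.ceil(record_count / fetch_size)
    return [page * fetch_size for page in range(pages)]


# Hypothetical table: 1,000,000 rows in total, 250 of which match the filter.
unfiltered_calls = page_offsets(1_000_000, 100)  # sized from the whole table
filtered_calls = page_offsets(250, 100)          # sized from the filtered count
```

In this made-up case the unfiltered count yields 10,000 offsets while the filtered count yields 3, which is the difference the `custom_filter` parameter is meant to capture.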

Improved URL Preparation

Table-Aware Processing: _prepare_url() now accepts table_name parameter
Dynamic Fetch Sizes: Uses appropriate fetch size based on table type
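
A minimal sketch of table-aware URL preparation, assuming the standard ServiceNow Table API query parameters (`sysparm_limit`, `sysparm_offset`, `sysparm_query`). The function name `prepare_table_url` and its signature are hypothetical; the real `_prepare_url()` may take different arguments and build a different path.

```python
from urllib.parse import urlencode

# Fetch sizes quoted in this description; tables not listed use the default.
FETCH_SIZES = {"sys_db_object": 500}
DEFAULT_FETCH_SIZE = 100


def prepare_table_url(table_name, offset, query=""):
    """Build a relative Table API URL whose page size depends on the table."""
    params = {
        "sysparm_limit": FETCH_SIZES.get(table_name, DEFAULT_FETCH_SIZE),
        "sysparm_offset": offset,
    }
    if query:
        params["sysparm_query"] = query
    return f"/api/now/table/{table_name}?{urlencode(params)}"
```

Passing the table name in is what lets the same function pick 500 for sys_db_object and 100 for everything else.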

Conditional Debug Logging

New skip_debug_logging parameter: Reduces verbose logging for sys_db_object operations
Performance Focused: Less logging overhead for high-volume operations

🔒 SSL Configuration Changes
Certificate Validation

Added SSL Parameter: aiohttp.TCPConnector is now created with ssl=False, which turns off certificate verification for connections made through that connector
Development and Firewall Friendly: Allows connections to ServiceNow instances with self-signed certificates or certificates that fail hostname verification
Flexible Deployment: Accepts whatever certificate the instance presents, at the cost of no longer validating it
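
Since this change touches TLS, one possible refinement is to make verification opt-out instead of unconditional. The sketch below only builds the value that would be passed as the `ssl` argument of `aiohttp.TCPConnector`; the helper name `build_ssl_option` and the `verify` / `ca_file` settings are assumptions and are not part of this PR.

```python
import ssl


def build_ssl_option(verify=True, ca_file=None):
    """Return a value suitable for aiohttp.TCPConnector(ssl=...).

    Hypothetical helper:
    - verify=False returns False, which disables certificate checks
      (the behaviour this PR applies unconditionally).
    - Otherwise returns a verifying SSLContext, optionally trusting
      a custom CA bundle for self-signed deployments.
    """
    if not verify:
        return False
    return ssl.create_default_context(cafile=ca_file)
```

A deployment with a self-signed certificate could then supply its CA bundle through `ca_file` and keep hostname and certificate checks enabled, while `verify=False` remains available for development setups.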

📊 Batch Processing Enhancements
🎯 Intelligent Batch Creation - Major Fix

Context-Aware Batching: Different strategies for different table types based on data volume and processing requirements
Resource-Optimized: Uses appropriate batch sizes (SYS_DB_OBJECT_BATCH_SIZE vs TABLE_BATCH_SIZE) depending on the table being processed
Memory Efficient: Prevents overwhelming system resources with overly large batches

🔧 Critical Advanced Rules Bug Fix

Previous Issue: The get_filter_apis() method did not batch API calls at all; it collected every API call into a single list
New Implementation: Yields API calls in chunks of TABLE_BATCH_SIZE instead of returning one unbatched list
Memory Management: Prevents out-of-memory issues when advanced rules produce thousands of API calls
Improved Flow: Advanced rules are now processed in manageable chunks rather than executing all API calls at once

The advanced rules fix was particularly critical as the original code would attempt to execute potentially thousands of API calls concurrently without any batching mechanism, leading to resource exhaustion and timeouts.
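
The batching change can be sketched as a generator that yields fixed-size chunks instead of returning one list. This is a generic illustration, not the actual body of `get_filter_apis()`; the name `batched_apis` is hypothetical.

```python
def batched_apis(apis, batch_size):
    """Yield lists of at most batch_size API calls, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for api in apis:
        batch.append(api)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch
```

Because the caller consumes one chunk at a time, only `batch_size` API calls need to be in flight at any moment, regardless of how many the advanced rules generate.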

Closes https://github.com/elastic/connectors-py/issues/###

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • For bugfixes: backport safely to all minor branches still receiving patch releases
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Changes Requiring Extra Attention

  • Security-related changes (encryption, TLS, SSRF, etc)
  • New external service dependencies added.

Related Pull Requests

Release Note

@seanstory
Member

buildkite test this

@seanstory seanstory marked this pull request as draft September 4, 2025 17:51