feat: add batching for internal links crawl detection to prevent Lambda timeouts #1874
+4,418
−181
This commit introduces batch processing for crawl-based broken internal link detection, preventing AWS Lambda 15-minute timeouts on sites with 100+ pages.
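One common way to keep a single invocation under the Lambda limit is to check the remaining execution time between units of work and stop early, returning a cursor for the next invocation. The sketch below illustrates that general technique; it is not taken from the PR, and `processWithTimeBudget`, `checkPage`, and the 60-second `SAFETY_MARGIN_MS` are hypothetical. In a real handler, `getRemainingMs` could wrap the Lambda context's `getRemainingTimeInMillis()` method.

```javascript
// Hypothetical sketch: keep checking pages while enough Lambda time remains,
// then return a cursor so the next invocation can resume where this one stopped.
// SAFETY_MARGIN_MS and the checkPage callback are illustrative assumptions.
const SAFETY_MARGIN_MS = 60_000; // leave one minute of headroom before the hard limit

async function processWithTimeBudget(pages, startIndex, checkPage, getRemainingMs) {
  const results = [];
  let index = startIndex;
  while (index < pages.length) {
    // Do not start another page once the remaining time drops below the margin.
    if (getRemainingMs() < SAFETY_MARGIN_MS) break;
    results.push(await checkPage(pages[index]));
    index += 1;
  }
  return { results, nextIndex: index, done: index >= pages.length };
}

// Possible wiring inside a Lambda handler (the event shape is hypothetical):
// const out = await processWithTimeBudget(
//   event.pages, event.startIndex ?? 0, checkPage,
//   () => context.getRemainingTimeInMillis(),
// );
```

A time-based cutoff adapts to unusually slow pages, while a fixed batch size is simpler to reason about; in either case, `nextIndex` is the value that would need to be persisted between invocations.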
Key Changes:
Architecture:
- batch-state.js: S3-based state management utilities
- crawl-detection.js: batch processing logic with URL caching
- handler.js: batch orchestration with SQS self-looping
- helpers.js: enhanced timeout detection for link validation

Testing:
This implementation ensures audits complete successfully even for large sites (500+ pages) by processing their pages in manageable batches across multiple Lambda invocations, with each invocation staying well under the 15-minute timeout limit.
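The multi-invocation flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's handler.js or batch-state.js code: `store` stands in for S3-backed state (here an in-memory map with JSON round-tripping), `queue.send` stands in for sending an SQS message that re-triggers the same Lambda, and `runBatch`, `createMemoryStore`, the state fields (`pages`, `cursor`, `linkCache`, `brokenLinks`), `getLinks`, `isBroken`, and the default batch size of 50 are all hypothetical.

```javascript
// Hypothetical sketch of the self-looping batch pattern. `store` plays the role
// of S3-backed state and `queue` the role of SQS; neither mirrors the PR's API.

// In-memory stand-in for S3: JSON round-tripping mimics storing state as an object.
function createMemoryStore(initial = {}) {
  const objects = new Map(
    Object.entries(initial).map(([key, value]) => [key, JSON.stringify(value)]),
  );
  return {
    load: async (key) => (objects.has(key) ? JSON.parse(objects.get(key)) : null),
    save: async (key, value) => {
      objects.set(key, JSON.stringify(value));
    },
  };
}

// Processes one batch of pages, persists progress, and either re-enqueues
// itself or reports the final result.
async function runBatch(auditId, { store, queue, getLinks, isBroken }, batchSize = 50) {
  const state = await store.load(auditId);
  if (!state) throw new Error(`No saved state for audit ${auditId}`);

  const end = Math.min(state.cursor + batchSize, state.pages.length);
  for (let i = state.cursor; i < end; i += 1) {
    const page = state.pages[i];
    for (const link of await getLinks(page)) {
      // URL cache: each distinct link is validated once, even across batches.
      if (!Object.prototype.hasOwnProperty.call(state.linkCache, link)) {
        state.linkCache[link] = await isBroken(link);
      }
      if (state.linkCache[link]) state.brokenLinks.push({ page, link });
    }
  }

  state.cursor = end;
  await store.save(auditId, state); // persist before handing off to the next invocation

  if (state.cursor < state.pages.length) {
    await queue.send({ auditId }); // self-loop: this message triggers the next batch
    return { done: false, cursor: state.cursor };
  }
  return { done: true, brokenLinks: state.brokenLinks };
}
```

Saving state before enqueueing the follow-up message means the next invocation always reads the latest cursor and cache, and the shared `linkCache` lets a link that appears on many pages be validated only once across all batches.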