Fix HTTP 413 errors via multipart upload and dynamic chunking in ImportStreamer #3663
Conversation
…k with the new Timesketch form memory limit introduced by Werkzeug.
/gemini review
Code Review
This pull request significantly enhances the Timesketch importer's ability to handle large data uploads by implementing dynamic payload splitting. The ImportStreamer class now defines a DEFAULT_MAX_PAYLOAD_SIZE and a PAYLOAD_SAFETY_BUFFER. The _upload_data_buffer and _upload_data_frame methods were refactored to serialize data, check its size against a calculated safe limit, and recursively split the data into smaller chunks for upload if the limit is exceeded. Data is now sent as a multipart/form-data field. A new set_max_payload_size method was added to allow configuration of the maximum payload size, which is exposed via a new --max-payload-size command-line argument in the timesketch_importer.py tool. Minor changes include import reordering and the removal of a time.sleep workaround.
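For illustration, here is a minimal sketch of the recursive splitting idea described in the review summary. Only the constant names are taken from the summary; their values, the `upload_rows` helper, and the `post_chunk` callback are assumptions for the example, not the PR's actual code.

```python
import json

# Names mirror the review summary; the values are assumed for the example.
DEFAULT_MAX_PAYLOAD_SIZE = 100 * 1024 * 1024  # assumed limit, in bytes
PAYLOAD_SAFETY_BUFFER = 0.9  # assumed: only fill 90% of the configured limit


def upload_rows(rows, post_chunk, max_payload_size=DEFAULT_MAX_PAYLOAD_SIZE):
    """Recursively split `rows` until each serialized chunk fits the safe limit.

    `rows` is a list of event dicts; `post_chunk` is a stand-in for the HTTP
    call that sends one multipart/form-data request.
    """
    safe_limit = int(max_payload_size * PAYLOAD_SAFETY_BUFFER)
    payload = "\n".join(json.dumps(row) for row in rows)

    # Small enough (or a single row that cannot be split further): send it.
    if len(payload.encode("utf-8")) <= safe_limit or len(rows) <= 1:
        post_chunk(payload)
        return

    # Too large: split the row list in half and handle each half recursively.
    middle = len(rows) // 2
    upload_rows(rows[:middle], post_chunk, max_payload_size)
    upload_rows(rows[middle:], post_chunk, max_payload_size)
```

A `set_max_payload_size`-style setter or the `--max-payload-size` CLI flag would, in this sketch, simply change the `max_payload_size` value used for the check.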
…t contain the value (backwards compatibility)
jaegeral
left a comment
Some comments / suggestions
Co-authored-by: Alexander J <[email protected]>
jaegeral
left a comment
LGTM
This PR updates the `timesketch-import-client` to resolve persistent `HTTP 413 Payload Too Large` errors when uploading large datasets (CSV, JSONL). This issue arose due to the recent enforcement of `MAX_FORM_MEMORY_SIZE` (default 200MB) in Werkzeug/Flask, combined with the data expansion caused by URL-encoding large JSON payloads.

Key Changes:

- Switch to `multipart/form-data`: `_upload_data_frame` and `_upload_data_buffer` now send data as `multipart/form-data` instead of `application/x-www-form-urlencoded` (see the sketch after this description).
- Dynamic Recursive Chunking: serialized payloads are checked against a calculated safe limit and recursively split into smaller chunks when they exceed it.
- Configurable Payload Limits: added a `--max-payload-size` argument to the CLI (`timesketch_importer`) and a corresponding setter in `ImportStreamer`.
- Optimization & Cleanup: removed the `time.sleep(2)` calls in the upload loop; retries and backoff are already handled robustly by the `urllib3` `HTTPAdapter` in the API client session. Added a `MAX_FORM_MEMORY_SIZE` value of 200MB to `app.py` to handle situations where the config is not updated.

Impact:
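As a rough illustration of the `multipart/form-data` switch listed under Key Changes above, the sketch below posts one chunk with `requests`. The endpoint URL, field names, and form values are assumptions for the example, not Timesketch's actual upload API.

```python
import requests


def post_chunk(payload: str, sketch_id: int = 1, name: str = "events.jsonl"):
    """Send one serialized chunk as a multipart/form-data request.

    Passing the data through `files=` makes `requests` build a
    multipart/form-data body, so the events travel as a file part instead
    of being URL-encoded (the source of the size blow-up behind HTTP 413).
    """
    response = requests.post(
        "https://timesketch.example.com/api/v1/upload/",  # assumed endpoint
        files={"file": (name, payload.encode("utf-8"), "application/octet-stream")},
        data={"sketch_id": sketch_id, "name": name},  # assumed form fields
        timeout=600,
    )
    response.raise_for_status()
    return response.json()
```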