feat: bridge bulk insert #5927
Others LGTM
8f69db5 to a113631
## Implement Bulk Insert and Update Dependencies

- **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`.
- **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
- **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
- **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
- **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
- **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity.
- **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.
Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`

- Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`.
- Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
- Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.
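The error-wrapping pattern described in this commit can be sketched roughly as follows. This is a std-only illustration, not the PR's code: the PR uses snafu's `context` with `ProstSnafu` and `FlightCodecSnafu`, whereas this sketch uses `map_err` so it compiles without extra crates. `ProstDecodeError`, `FlightCodecError`, `decode_schema`, `decode_payload`, `decode_bulk_insert`, and the toy byte layout are all hypothetical stand-ins; only the variant names `Prost` and `FlightCodec` come from the commit message.

```rust
use std::fmt;

/// Hypothetical stand-ins for the underlying decode failures.
#[derive(Debug)]
pub struct ProstDecodeError(pub String);
#[derive(Debug)]
pub struct FlightCodecError(pub String);

/// Variant names follow the commit message; the fields are assumptions.
#[derive(Debug)]
pub enum MetadataError {
    Prost { source: ProstDecodeError },
    FlightCodec { source: FlightCodecError },
}

impl fmt::Display for MetadataError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetadataError::Prost { source } => {
                write!(f, "failed to decode protobuf: {}", source.0)
            }
            MetadataError::FlightCodec { source } => {
                write!(f, "failed to decode flight data: {}", source.0)
            }
        }
    }
}

impl std::error::Error for MetadataError {}

/// Toy schema decoder: the first byte is the column count (hypothetical format).
fn decode_schema(bytes: &[u8]) -> Result<usize, ProstDecodeError> {
    match bytes.first() {
        Some(&n) if n > 0 => Ok(n as usize),
        _ => Err(ProstDecodeError("missing or zero column count".into())),
    }
}

/// Toy payload decoder: one byte per cell, so the length must divide evenly.
fn decode_payload(bytes: &[u8], cols: usize) -> Result<usize, FlightCodecError> {
    if bytes.len() % cols != 0 {
        return Err(FlightCodecError("payload is not a whole number of rows".into()));
    }
    Ok(bytes.len() / cols)
}

/// Each fallible step is tagged with the error variant for its stage,
/// so callers can tell a schema failure from a payload failure.
pub fn decode_bulk_insert(schema: &[u8], payload: &[u8]) -> Result<(usize, usize), MetadataError> {
    let cols = decode_schema(schema).map_err(|source| MetadataError::Prost { source })?;
    let rows = decode_payload(payload, cols)
        .map_err(|source| MetadataError::FlightCodec { source })?;
    Ok((cols, rows))
}
```

With snafu, each `map_err` closure would typically become a `.context(...)` call on the `Result`, which is the form the commit message refers to.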
- **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
- **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
- **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.
### Enhance Bulk Insert Handling and Metadata Management

- **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
- **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
- **`handle_bulk_insert.rs`**:
  - Added `handle_record_batch` function to streamline processing of bulk insert payloads.
  - Improved error handling and task management for bulk insert operations.
  - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access.
- **Refactor `handle_bulk_insert.rs`:**
  - Replaced `handle_record_batch` with `handle_payload` for handling payloads.
  - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.
- **Optimize `multi_dim.rs`:**
  - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`.
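A minimal sketch of the single-region fast path idea, under stated assumptions: rows are routed to regions by building one boolean selection mask per region, and when only one region exists the routing step is skipped entirely. `Vec<bool>` stands in for Arrow's `BooleanArray`, and `partition_rows`, `filter_rows`, `RegionId`, and the `route` closure are hypothetical names that do not reflect the actual signature of `MultiDimPartitionRule::partition_record_batch`.

```rust
use std::collections::HashMap;

/// Hypothetical region identifier type.
pub type RegionId = u32;

/// Builds one row-selection mask per region.
/// `route` maps a partition key to the region that owns it.
pub fn partition_rows(
    regions: &[RegionId],
    keys: &[i64],
    route: impl Fn(i64) -> RegionId,
) -> HashMap<RegionId, Vec<bool>> {
    // Fast path: with a single region every row belongs to it,
    // so no partition expression needs to be evaluated.
    if let [only] = regions {
        return HashMap::from([(*only, vec![true; keys.len()])]);
    }

    // General path: evaluate the routing function once per row.
    let mut masks: HashMap<RegionId, Vec<bool>> = HashMap::new();
    for (row, key) in keys.iter().enumerate() {
        let region = route(*key);
        masks
            .entry(region)
            .or_insert_with(|| vec![false; keys.len()])[row] = true;
    }
    masks
}

/// Keeps only the rows whose mask entry is `true`
/// (a plain-Rust analogue of filtering a record batch by a boolean array).
pub fn filter_rows<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| row.clone())
        .collect()
}
```

In this sketch the fast path also avoids allocating a mask per region; a real implementation could go further and forward the original batch without filtering at all.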
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
- **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
- **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling.
- **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
- **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.
**Add Metrics and Improve Error Handling**

- **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to monitor performance.
- **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
- **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
- **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.
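To illustrate how an elapsed-time histogram such as `HANDLE_BULK_INSERT_ELAPSED` is typically fed, here is a std-only sketch of a bucketed histogram with a timer guard that records on drop. It is not the `prometheus` crate and not the PR's code: `Histogram`, `Timer`, and the bucket bounds are hypothetical, and the buckets here are non-cumulative for simplicity, whereas Prometheus exposes cumulative buckets.

```rust
use std::sync::Mutex;
use std::time::Instant;

/// A tiny histogram: `bounds` are upper bucket limits in seconds,
/// plus one implicit overflow bucket at the end.
pub struct Histogram {
    bounds: Vec<f64>,
    counts: Mutex<Vec<u64>>,
}

impl Histogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let buckets = bounds.len() + 1;
        Self { bounds, counts: Mutex::new(vec![0; buckets]) }
    }

    /// Records one observation in the first bucket whose bound is >= value.
    pub fn observe(&self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|bound| value <= *bound)
            .unwrap_or(self.bounds.len());
        self.counts.lock().unwrap()[idx] += 1;
    }

    /// Snapshot of the per-bucket counts.
    pub fn counts(&self) -> Vec<u64> {
        self.counts.lock().unwrap().clone()
    }

    /// Starts a timer; the elapsed time is observed when the guard is dropped.
    pub fn start_timer(&self) -> Timer<'_> {
        Timer { hist: self, start: Instant::now() }
    }
}

/// Guard that records the elapsed seconds into its histogram on drop.
pub struct Timer<'a> {
    hist: &'a Histogram,
    start: Instant,
}

impl Drop for Timer<'_> {
    fn drop(&mut self) {
        self.hist.observe(self.start.elapsed().as_secs_f64());
    }
}
```

Binding the guard at the top of a handler, for example `let _timer = hist.start_timer();`, records the duration on every exit path, including early returns through `?`.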
6ff5bf7 to 30cfe62
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR bridges bulk insert requests from the frontend to the datanode.
PR Checklist