-
Notifications
You must be signed in to change notification settings - Fork 0
feat: support streaming processing of binary search tree #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rules for development, naming conventions, and performance optimization
…efactor type parameter of TypeErasedStreamableIndex
WalkthroughThis pull request updates configuration settings and documentation guidelines, notably changing the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Serializer
participant Writer
User->>CLI: Execute serialize command with bbox & ge flag
CLI->>Serializer: Call serialize(input, output, attr_index, bbox, ge)
Serializer->>Serializer: parse_bbox(bbox)
alt bbox provided
Serializer->>Serializer: Filter features by bounding box
alt ge flag true
Serializer->>Serializer: calculate_geospatial_extent(features)
end
else No bbox provided
Serializer->>Serializer: Process all features
end
Serializer->>Writer: Write FCB file with header options
Writer->>CLI: Return FCB output
CLI->>User: Display output file created message
sequenceDiagram
participant QueryClient
participant MultiIndex
participant Reader
QueryClient->>MultiIndex: Initiate stream_query(query, index_offsets)
MultiIndex->>Reader: Seek and read index data
Reader-->>MultiIndex: Return matching ValueOffsets
MultiIndex->>QueryClient: Send streaming query results
Poem
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Here's the code health analysis summary for commits Analysis Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (23)
.cursor/rules/memory/tdd-rust-guidelines.md (1)
60-62
: Add a language specifier to the fenced code block.The code block should include a language specifier for proper syntax highlighting.
-``` -.docs/tdd-rust-guidelines.md -``` +```markdown +.docs/tdd-rust-guidelines.md +```🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
60-60: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
.cursor/rules/memory/productContext.md (1)
277-279
: Consider using a more modern approach for asynchronous iteration.While the current implementation is functional, you might want to consider using async iterators or async generators for a more elegant approach to handling streaming data.
- let features = []; - let feature; - while ((feature = await iter.next()) !== null) { - features.push(feature); - } + // Using Array.from with an async iterator + const features = []; + for await (const feature of { + [Symbol.asyncIterator]() { + return { + async next() { + const value = await iter.next(); + return value === null ? { done: true } : { done: false, value }; + } + }; + } + }) { + features.push(feature); + }Alternatively, if your target environment supports it:
- let features = []; - let feature; - while ((feature = await iter.next()) !== null) { - features.push(feature); - } + const features = []; + let feature; + for await (feature of iter) { + features.push(feature); + }This assumes that
iter
implements the async iterable protocol or can be adapted to do so.src/rust/fcb_core/tests/http.rs (1)
39-69
: Consider removing commented-out codeThe original function has been commented out but preserved in the file. While this can be helpful during development for reference, it's generally better to remove commented-out code before merging to keep the codebase clean.
-// async fn read_http_file_bbox(path: &str) -> Result<(), Box<dyn Error>> { -// let http_reader = HttpFcbReader::open(path).await?; -// let minx = 84227.77; -// let miny = 445377.33; -// let maxx = 85323.23; -// let maxy = 446334.69; -// let mut iter = http_reader.select_bbox(minx, miny, maxx, maxy).await?; -// let header = iter.header(); -// let cj = to_cj_metadata(&header)?; -// -// // let mut writer = BufWriter::new(File::create("delft_http.city.jsonl")?); -// // writeln!(writer, "{}", serde_json::to_string(&cj)?)?; -// -// let mut feat_num = 0; -// let feat_count = header.features_count(); -// let mut features = Vec::new(); -// while let Some(feature) = iter.next().await? { -// let cj_feature = feature.cj_feature()?; -// features.push(cj_feature); -// // writeln!(writer, "{}", serde_json::to_string(&cj_feature)?)?; -// -// feat_num += 1; -// if feat_num >= feat_count { -// break; -// } -// } -// println!("cj: {:?}", cj); -// println!("features count: {:?}", features.len()); -// // TODO: add more tests -// Ok(()) -// }src/rust/fcb_core/benches/read_attr.rs (1)
183-192
: Consider keeping the baseline benchmark.The benchmark for reading without attribute index has been commented out. While focusing on the new implementations is important, retaining a baseline benchmark would provide valuable context for performance comparisons.
-// group.bench_with_input( -// BenchmarkId::new(format!("{} without", dataset), file_without), -// &file_without, -// |b, &path| { -// b.iter(|| { -// read_fcb_without_attr_index(path).unwrap(); -// }) -// }, -// ); +group.bench_with_input( + BenchmarkId::new(format!("{} without index (baseline)", dataset), file_without), + &file_without, + |b, &path| { + b.iter(|| { + read_fcb_without_attr_index(path).unwrap(); + }) + }, +);src/rust/fcb_core/src/writer/attribute.rs (1)
321-321
: Reevaluate the fallback epoch for invalid dates
Defaulting to Unix epoch when parsing fails can mask data problems. Consider skipping the attribute or storing a null-like value.src/rust/fcb_core/src/reader/attr_query.rs (4)
38-66
: Deserializing indexes withBufferedIndex
InvokingBufferedIndex::<T>::deserialize
for each type is straightforward. Keep an eye on potential large-memory overhead for very big indexes, and ensure error messages remain clear when deserialization fails.
70-71
: Skipping index logs
Printing directly to stdout can clutter output. Consider using a debug-level logger or structured logging for more controlled verbosity.
78-86
: Building the query conditions
ConvertingAttrQuery
intoQueryCondition
is concise. Watch for potential user input errors or unknown column names; the current approach assumes they are valid.
211-245
: Avoid duplicating query loading logic
select_attr_query_seq
shares much of the same flow asselect_attr_query
. Consider refactoring to a shared helper to keep the code DRY.src/rust/fcb_core/tests/attr_index.rs (1)
119-215
:test_attr_index_seq
for sequential queries
Reusing similar logic to verify sequential query capacities is ideal, though you could extract shared setup code to reduce duplication.src/rust/bst/src/byte_serializable.rs (1)
79-159
: Solid type-to-ID conversion with minor redundancy.
Theto_bytes
,from_bytes
, andfrom_type_id
methods comprehensively handle valid type IDs and produce meaningful errors for invalid ones. Consider storing these IDs in a table or using an associated constant to reduce boilerplate.src/rust/fcb_core/src/writer/attr_index.rs (1)
41-41
: Rename variable for clarity.
The identifiersorted_index
no longer aligns with itsBufferedIndex
type. Consider renaming it tobuffered_index
.- let mut sorted_index = BufferedIndex::new(); + let mut buffered_index = BufferedIndex::new();src/rust/bst/README.md (2)
112-112
: Refine grammar for better clarity.Static analysis suggests the phrase around this line may contain a grammatical inconsistency. Consider reviewing and adjusting the phrasing to ensure the noun's number matches the context. For instance, if you wrote "queries - Fully type-aware..." ensure the preceding or following noun and verbs maintain coherent number agreement.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~112-~112: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...queries - Fully type-aware with generic parameterT
- Implements bothSearchableIndex
...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
355-364
: Add a language specifier to the fenced code block.MarkdownLint (MD040) indicates that fenced code blocks should specify a language. Update it as follows:
-``` +[rust]🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
355-355: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
src/rust/cli/src/main.rs (2)
81-186
: Validate edge cases for bounding box filtering and ge flag.The code handles parsing errors and warns if no features are found, which is solid. As an optional enhancement, consider verifying if the ge flag is redundant when there are zero features (e.g., skipping computation) or caching repeated calls to
get_vertices_from_feature
for performance if needed. Otherwise, this is well-structured.
266-295
: Solid approach for computing geospatial extent.The function correctly defaults to [0.0; 6] if no vertices exist. Consider adding a log (or one-time warning) when no vertices are found, in case that indicates an unexpected empty dataset.
src/rust/bst/src/sorted_index.rs (2)
242-242
: Consider adjusting async trait usage.Pipeline warnings suggest that using
async fn
in public traits can cause bounds issues. If feasible, consider an alternative approach (e.g., returning a future type, adopting a dedicated async trait library, or using a separate async wrapper) to avoid friction with auto trait bounds.Also applies to: 252-252
🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 242-242: use of
async fn
in public traits is discouraged as auto trait bounds cannot be specified
564-570
: Resolve unused variables in HTTP streaming code.Variables such as
client
,index_offset
,key
,lower
, andupper
trigger warnings. Prefixing them with an underscore (e.g.,_client
) or removing them entirely if the stubs remain unimplemented will fix these warnings in the pipeline.Apply a diff similar to:
#[cfg(feature = "http")] async fn http_stream_query_exact<C: http_range_client::AsyncHttpRangeClient>( &self, - client: &mut http_range_client::AsyncBufferedHttpRangeClient<C>, - index_offset: usize, - key: &T, ) -> std::io::Result<Vec<ValueOffset>> { unimplemented!("Type-aware HTTP streaming not yet implemented") }Also applies to: 575-582
🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 564-564: unused variable:
client
[warning] 565-565: unused variable:
index_offset
[warning] 566-566: unused variable:
key
src/rust/bst/src/query.rs (3)
47-47
: Consider Memory Usage & Trait Object Overheads
Storing indices in aHashMap<String, Box<dyn SearchableIndex>>
is fine for flexibility. However, if you have many fields/indices, consider whether reference counting (Arc<dyn SearchableIndex>
) or a more concrete generic approach could reduce overhead or complexity.
175-202
: HTTP Streaming Placeholder
Thehttp_stream_query
function is currently a placeholder (todo!()
). Implementation or removal is recommended to avoid confusion. If there are near-future plans to implement, consider adding a clear comment or tracking issue.🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 191-191: unused variable:
client
[warning] 193-193: unused variable:
index_offsets
[warning] 194-194: unused variable:
feature_begin
466-467
: Follow Up on the "TODO: Fix me!!!!!!"
The comment indicates some intention to refine or rework the key comparison logic for DateTime or other types. Ensure the partial code for datetime comparison is finalized, especially if you have multi-regional or timezone nuances.Do you want me to help finalize the
compare_keys
method for robust date/time comparisons?src/rust/fcb_core/src/http_reader/mod.rs (2)
158-158
: Potential Edge Case for Attribute Index Size Overflow
UsingError::AttributeIndexSizeOverflow
here is good, but be mindful of user-friendly error reporting when the attribute index sections become unexpectedly large. You might also want to log or track the underlying cause for easier debugging.
191-191
: Fix Unused Variable Warnings
Rust CI warns about unused variables (client
,query
,index_offset
,feature_begin
) at these lines. If these variables will be used in future code, consider ignoring them with_unused_var
naming or removing them if not needed.- pub async fn http_stream_query<T: AsyncHttpRangeClient>( - &self, - client: &mut AsyncBufferedHttpRangeClient<T>, - query: &Query, - index_offsets: &HashMap<String, usize>, - feature_begin: usize, - ) -> std::io::Result<Vec<HttpSearchResultItem>> { - // If there are no conditions, return an empty result. - if query.conditions.is_empty() { - return Ok(Vec::new()); - } - todo!() - } + pub async fn http_stream_query<T: AsyncHttpRangeClient>( + &self, + _client: &mut AsyncBufferedHttpRangeClient<T>, + _query: &Query, + _index_offsets: &HashMap<String, usize>, + _feature_begin: usize, + ) -> std::io::Result<Vec<HttpSearchResultItem>> { + // If there are no conditions, return an empty result. + return Ok(Vec::new()); // or todo!() if not implemented + }Also applies to: 742-742, 743-743, 744-744, 745-745
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
src/ts/fcb_wasm_bg.wasm
is excluded by!**/*.wasm
📒 Files selected for processing (29)
.cursor/rules/general.mdc
(3 hunks).cursor/rules/memory/productContext.md
(2 hunks).cursor/rules/memory/tdd-rust-guidelines.md
(1 hunks).cursor/rules/mpc.md
(0 hunks).gitignore
(1 hunks)src/rust/bst/Cargo.toml
(2 hunks)src/rust/bst/README.md
(1 hunks)src/rust/bst/src/byte_serializable.rs
(17 hunks)src/rust/bst/src/error.rs
(1 hunks)src/rust/bst/src/lib.rs
(4 hunks)src/rust/bst/src/query.rs
(5 hunks)src/rust/bst/src/sorted_index.rs
(6 hunks)src/rust/cli/Cargo.toml
(1 hunks)src/rust/cli/src/main.rs
(7 hunks)src/rust/fcb_core/benches/read_attr.rs
(6 hunks)src/rust/fcb_core/src/bin/write.rs
(1 hunks)src/rust/fcb_core/src/error.rs
(2 hunks)src/rust/fcb_core/src/http_reader/mod.rs
(17 hunks)src/rust/fcb_core/src/reader/attr_query.rs
(4 hunks)src/rust/fcb_core/src/writer/attr_index.rs
(5 hunks)src/rust/fcb_core/src/writer/attribute.rs
(3 hunks)src/rust/fcb_core/src/writer/geom_encoder.rs
(0 hunks)src/rust/fcb_core/src/writer/header_writer.rs
(2 hunks)src/rust/fcb_core/src/writer/serializer.rs
(2 hunks)src/rust/fcb_core/tests/attr_index.rs
(1 hunks)src/rust/fcb_core/tests/e2e.rs
(1 hunks)src/rust/fcb_core/tests/http.rs
(3 hunks)src/rust/fcb_core/tests/read.rs
(2 hunks)src/rust/wasm/src/lib.rs
(2 hunks)
💤 Files with no reviewable changes (2)
- .cursor/rules/mpc.md
- src/rust/fcb_core/src/writer/geom_encoder.rs
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
.cursor/rules/memory/tdd-rust-guidelines.md
27-27: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
31-31: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
32-32: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
33-33: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
41-41: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
44-44: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
45-45: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
51-51: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
52-52: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
53-53: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
54-54: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
60-60: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
src/rust/bst/README.md
355-355: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 GitHub Actions: Rust CI
src/rust/bst/src/sorted_index.rs
[warning] 564-564: unused variable: client
[warning] 565-565: unused variable: index_offset
[warning] 566-566: unused variable: key
[warning] 575-575: unused variable: client
[warning] 576-576: unused variable: index_offset
[warning] 577-577: unused variable: lower
[warning] 578-578: unused variable: upper
[warning] 242-242: use of async fn
in public traits is discouraged as auto trait bounds cannot be specified
[warning] 252-252: use of async fn
in public traits is discouraged as auto trait bounds cannot be specified
[error] 342-342: can't find crate for std
[warning] 1-1: unsafe function's docs are missing a # Safety
section
src/rust/bst/src/query.rs
[warning] 191-191: unused variable: client
[warning] 193-193: unused variable: index_offsets
[warning] 194-194: unused variable: feature_begin
[warning] 742-742: unused variable: client
[warning] 743-743: unused variable: query
[warning] 744-744: unused variable: index_offset
[warning] 745-745: unused variable: feature_begin
🪛 LanguageTool
src/rust/bst/README.md
[uncategorized] ~112-~112: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...queries - Fully type-aware with generic parameter T
- Implements both SearchableIndex
...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
🔇 Additional comments (110)
.gitignore (1)
68-68
: Looks good: Ignoring.cursor/mcp.json
file.Adding
.cursor/mcp.json
to the gitignore list is appropriate as it appears to be a local configuration file that shouldn't be tracked in version control..cursor/rules/memory/tdd-rust-guidelines.md (1)
1-65
: Great addition of TDD guidelines for Rust development.The document provides a comprehensive overview of Test-Driven Development practices specific to Rust, covering the Red-Green-Refactor cycle, test naming conventions, and essential tools. This will be valuable for maintaining consistent development practices.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
27-27: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
31-31: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
32-32: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
33-33: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
41-41: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
44-44: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
45-45: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
51-51: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
52-52: Unordered list indentation
Expected: 2; Actual: 5(MD007, ul-indent)
53-53: Unordered list indentation
Expected: 2; Actual: 5(MD007, ul-indent)
54-54: Unordered list indentation
Expected: 2; Actual: 5(MD007, ul-indent)
60-60: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
.cursor/rules/memory/productContext.md (1)
248-280
: Updated HTTP Range Requests implementation with streaming feature support.The JavaScript implementation has been improved to support streaming processing of binary search tree features. Key improvements include:
- Added WasmAttrQuery to imports
- Added proper WASM module initialization
- Updated to use the new HttpFcbReader API
- Implemented streaming iteration over fetched features
These changes align with the PR objective to support streaming processing of binary search tree data.
.cursor/rules/general.mdc (4)
4-4
: Configuration change: alwaysApply set to true.Changing
alwaysApply
fromfalse
totrue
means these rules will now automatically be applied to all relevant files without requiring explicit activation.
14-14
: New documentation update guideline added.Good addition of a guideline to ensure documentation files are kept updated. This aligns well with the project's emphasis on maintaining comprehensive documentation.
27-41
: Added direct links to core documentation files.Adding explicit Markdown links to the core documentation files improves navigation and makes it easier to maintain the documentation structure.
41-43
: New TDD guidelines reference added to core files.The addition of the TDD guidelines to the core files section is appropriate and consistent with the project's documentation structure. This ensures that development practices are standardized across the project.
src/rust/cli/Cargo.toml (1)
8-8
: LGTM: Adding cjseq dependencyAdding the
cjseq
dependency with workspace configuration aligns with the PR's goal of supporting streaming processing for binary search trees. This will provide necessary functionality for the CLI tool's new spatial filtering and geospatial extent calculation features.src/rust/fcb_core/tests/e2e.rs (1)
55-55
: LGTM: Adding geographical_extent field to HeaderWriterOptionsThe addition of the
geographical_extent: None
field is consistent with other changes across the codebase and properly extends the HeaderWriterOptions struct to support the new geospatial functionality.src/rust/fcb_core/src/bin/write.rs (1)
33-33
: LGTM: Adding geographical_extent field to HeaderWriterOptionsThe addition of the
geographical_extent: None
field is consistent with the changes in other files and properly extends the structure to support geospatial extent functionality.src/rust/fcb_core/tests/read.rs (2)
42-42
: LGTM: Adding geographical_extent field to HeaderWriterOptionsThe addition of the
geographical_extent: None
field is consistent with the changes in other files and properly extends the data structure to support geospatial functionality in the read_bbox test.
147-147
: LGTM: Adding geographical_extent field to HeaderWriterOptionsThe addition of the
geographical_extent: None
field is consistent with other instances in the codebase and properly extends the data structure to support geospatial functionality in the read_bbox_nonseekable test.src/rust/fcb_core/src/writer/header_writer.rs (2)
32-33
: Good addition of geographical extent supportThe addition of the
geographical_extent
field to theHeaderWriterOptions
struct is a good enhancement that will support geospatial functionality. The typeOption<[f64; 6]>
is appropriate for representing a 3D bounding box with min/max coordinates.
43-43
: Default implementation properly updatedThe default implementation has been correctly updated to initialize the new
geographical_extent
field toNone
.src/rust/fcb_core/src/error.rs (2)
55-60
: Good error variants for index and query operationsThe addition of
IndexCreationError
andQueryExecutionError
variants improves error handling granularity for the binary search tree streaming functionality. These specific error types will help with better error reporting and handling.
97-103
: Useful helper method for error classificationThe
is_index_error
method is a helpful addition that consolidates the different index-related errors, making it easier to determine if an error is related to indexing or querying operations.src/rust/fcb_core/tests/http.rs (2)
9-13
: Updated bounding box coordinates for testingThe bounding box coordinates have been updated, likely to better test the new streaming functionality with more appropriate test data.
138-139
:✅ Verification successful
Updated test data URL
The test is now using a different dataset (
3dbag_100k.fcb
instead ofdelft_attr.fcb
). Make sure this dataset is consistently available for CI/CD.
🏁 Script executed:
#!/bin/bash # Verify that the test URL is accessible curl -s -I "https://storage.googleapis.com/flatcitybuf/3dbag_100k.fcb" | grep "HTTP"Length of output: 94
Test Data URL Verified
The updated test now references the correct dataset (
3dbag_100k.fcb
), and we've confirmed via a curl check that the URL returns an HTTP 200 status. Please ensure that this dataset remains consistently available in CI/CD environments.src/rust/bst/src/error.rs (2)
1-1
: Added necessary import for new error variantsThe addition of the
ByteSerializableType
import is appropriate as it's used in the new error variants.
8-15
: Enhanced error handling for binary search tree operationsThe new error variants significantly improve type safety and error reporting for the BST module. The
TypeMismatch
variant is particularly useful as it captures both the expected and actual types, making debugging easier.src/rust/fcb_core/src/writer/serializer.rs (3)
67-72
: Good enhancement for geographical extent prioritization.This addition properly extracts the geographical extent from the header options, adding flexibility to the FCB header creation process.
79-84
: Well-implemented fallback pattern.The code now prioritizes the geographical extent from header options before falling back to metadata, following a clean pattern of "use provided value or fallback to default."
161-161
: Consistent handling in the metadata-absent case.This change ensures that geographical extent from options is properly used even when metadata is not present, maintaining consistency with the updated logic.
src/rust/wasm/src/lib.rs (1)
72-76
: Simplified logger initialization.The logger initialization logic has been streamlined into a single conditional, which is more concise and easier to read.
src/rust/bst/Cargo.toml (3)
8-8
: Updated dependencies for streaming support.Good refactoring of the
http
feature to use more appropriate dependencies for HTTP streaming functionality.
18-20
: New dependencies for async HTTP operations.The addition of these optional dependencies properly supports the streaming processing mentioned in the PR objectives.
22-26
: Dev dependencies for testing async code.The added dev dependencies will help with testing the new streaming functionality, especially with the Tokio runtime for async tests.
src/rust/fcb_core/benches/read_attr.rs (4)
51-108
: Renamed and updated seekable reader benchmark.The function has been appropriately renamed to clarify it uses a seekable reader implementation and now tests attribute queries with numeric range conditions rather than exact string matches.
110-167
: Added non-seekable reader benchmark.Good addition of a benchmark for the non-seekable implementation, which helps compare performance between the two approaches.
193-216
: Clear benchmark naming for the two implementations.The benchmark names clearly distinguish between the seekable (streamable) and non-seekable (sequential) implementations, which will make the performance results easier to interpret.
223-224
: Improved results output formatting.The updated output formatting provides clearer information about the benchmark methods being compared.
src/rust/fcb_core/src/writer/attribute.rs (2)
3-3
: Use ofDateTime<Utc>
Switching fromNaiveDateTime
toDateTime<Utc>
is a robust way to handle time zones and avoid ambiguity.
316-316
: UTC conversion for parsedDateTime
Usingdt.to_utc()
to convert fromFixedOffset
toUtc
is correct and consistent with the rest of the code.src/rust/fcb_core/src/reader/attr_query.rs (9)
1-2
: New imports forResult
andHashMap
Importinganyhow::Result
andstd::collections::HashMap
is standard and aligns with the error-handling approach.
6-10
: Use ofBufferedIndex
and re-exports
AdoptingBufferedIndex
is a logical transition for in-memory indexing, and re-exporting relevant items ensures a clear public API.
99-101
: Error on empty attribute index
Explicitly returning an error if no attribute index entries exist avoids silent failures during queries.
103-104
: Sorting attribute index entries
Sorting by column index ensures predictable traversal for subsequent steps.
145-155
: CreatingStreamableMultiIndex
The error handling and fallback viaIndexCreationError
is clean, ensuring that users see a clear failure mode when something goes wrong.
157-166
: Streaming query result handling
Catching and returning errors fromstream_query
helps maintain robust error propagation.
170-173
: Sorting query results
Sorting the feature IDs before returning them can be beneficial for consistent output ordering.
182-183
: Track total features in result
Usingresult_vec.len()
to settotal_feat_count
accurately reflects the number of matched features.
109-145
:❓ Verification inconclusive
Index offset calculations and R-Tree skipping
Populatingindex_offsets
by iterating over sorted entries is sensible. Confirm that every declared attribute index can be matched to a column without triggeringAttributeIndexNotFound
.
🏁 Script executed:
#!/bin/bash # Searches for columns vs. index usage to ensure alignment. ast-grep --pattern $'Column($_)'Length of output: 33
Manual Verification Needed: Confirm Attribute Index Mapping
The logic for computing
index_offsets
appears sound. However, our initial verification using an AST search forColumn($_)
returned no output, so it remains unclear whether the declared attribute indices align with the corresponding columns. Please manually verify that every attribute index found inattr_index_entries
has a matching column entry incolumns
—otherwise, theError::AttributeIndexNotFound
may be triggered at runtime.
- Double-check that the codebase contains the expected formal declarations or usages of
Column
(or an equivalent identifier) to ensure that the.find(|col| col.index() == column_idx)
lookup is reliable.- If these identifiers aren’t present or are named differently, update the verification query or add tests to assert the correct mapping.
src/rust/fcb_core/tests/attr_index.rs (3)
17-21
: Test module extraction
Encapsulating tests inmod tests
with relevant imports clarifies scope and keeps test code organized.
22-117
:test_attr_index
integration test
This test checks whether the attribute schema and query logic work end-to-end, verifying expected attributes. The coverage looks good.
217-425
: Multiple query test coverage
The table-driven approach here increases confidence in your indexing and querying. Good thoroughness in checking various conditions.src/rust/bst/src/byte_serializable.rs (23)
4-5
: Import usage confirmed.
The added import forcrate::error
is correctly utilized in the subsequent code.
16-17
: New method well-documented.
Addingfn value_type(&self) -> ByteSerializableType;
and its doc comment helps clarify the variant’s type.
39-59
: Collectiveto_bytes
match looks good.
Each variant delegates to its specificto_bytes
implementation properly. Consider ensuring test coverage for every enum variant.
61-77
: Enum for type representation is appropriate.
DefiningByteSerializableType
explicitly is a clear way to handle type IDs.
171-173
: Correct type indicator for i64.
Matches theByteSerializableType::I64
variant correctly.
185-187
: Correct type indicator for i32.
Matches theByteSerializableType::I32
variant correctly.
199-201
: Correct type indicator for i16.
Matches theByteSerializableType::I16
variant correctly.
211-213
: Correct type indicator for i8.
Matches theByteSerializableType::I8
variant correctly.
225-226
: Correct type indicator for u64.
Matches theByteSerializableType::U64
variant correctly.
238-239
: Correct type indicator for u32.
Matches theByteSerializableType::U32
variant correctly.
252-253
: Correct type indicator for u16.
Matches theByteSerializableType::U16
variant correctly.
264-265
: Correct type indicator for u8.
Matches theByteSerializableType::U8
variant correctly.
276-277
: Correct type indicator for String.
Matches theByteSerializableType::String
variant correctly.
290-291
: Correct type indicator for f64.
Matches theByteSerializableType::F64
variant correctly.
301-305
: Verify default return for empty f32 slices.
Returning0.0
may obscure invalid data. Consider returning an error or clarifying why 0.0 is a safe fallback.Would you like a script to search for all callsites and confirm whether empty slices are handled properly?
311-313
: Correct type indicator for f32.
Matches theByteSerializableType::F32
variant.
328-330
: Float type indicator is correct.
Successfully aligns withByteSerializableType::F64
.
341-345
: Verify default return for empty Float slices.
ReturningOrderedFloat(0.0)
may mask incorrect scenarios. Consider returning an error or clarifying use cases.Would you like a script to verify usage of
from_bytes
to ensure no unintended conversions?
352-353
: Correct type indicator for Float.
MatchesByteSerializableType::F32
appropriately.
365-367
: Correct type indicator for bool.
Maps toByteSerializableType::Bool
.
392-394
: Correct type indicator for NaiveDateTime.
MatchesByteSerializableType::NaiveDateTime
.
427-429
: Correct type indicator for NaiveDate.
MatchesByteSerializableType::NaiveDate
.
444-446
: Correct type indicator for DateTime.
MatchesByteSerializableType::DateTime
accurately.src/rust/fcb_core/src/writer/attr_index.rs (9)
3-4
: Imports match updated indexing strategy.
Switching toBufferedIndex
is consistent with the rest of the codebase.
21-21
: Lifetime requirement is sensible.
Enforcing'static
ensures safe usage ofT
in index storage.
63-75
: Boolean index logic looks good.
Matches schema index and extracts the correct value.
169-169
: UTC-based DateTime.
Swapping outNaiveDateTime
forDateTime<Utc>
is valid if the timezone context is needed.
181-193
: Short (i16) indexing is consistent.
Builds and registers the type properly.
195-207
: UShort (u16) indexing is consistent.
Implementation aligns well withbuild_index_generic
.
208-219
: Verify signed vs unsigned “Byte” logic.
Current code usesu8
forByte
, which potentially conflicts with the usual assumption of signedness.Are we certain that “Byte” is always unsigned in your domain? Otherwise, consider using
i8
.
221-232
: Verify duplication between Byte and UByte.
Both map to the sameu8
type. This may create ambiguity in usage.Shall we unify them under one variant or differentiate them further?
234-245
: Indexing JSON as String is acceptable.
Storing JSON text is fine for textual queries. Make sure it aligns with your search needs.src/rust/bst/src/lib.rs (6)
13-13
: Import withBufferedIndex
is aligned.
Removing references toSortedIndex
fosters consistency.
128-134
: Switch fromSortedIndex
toBufferedIndex
is correct.
Index creation calls reflect the renamed indexing abstraction.
288-297
: Additional query for< 30.6
is valid.
Expands the test coverage for the bounding condition.
298-307
: Additional query for<= 30.6
flows correctly.
Checks inclusive boundary and helps ensure thorough coverage.
355-357
: Serialization test indices updated toBufferedIndex
.
Ensures test alignment with the new index abstraction.
365-367
: Deserialization calls matchBufferedIndex
.
ReplacingSortedIndex
methods closes the loop on the rename.src/rust/bst/README.md (1)
1-432
: Overall documentation clarity is excellent.The multi-layered explanation with mermaid diagrams and structured headings provides a thorough overview. The architecture, data flows, and future enhancements are well-documented. Great job!
🧰 Tools
🪛 LanguageTool
[uncategorized] ~112-~112: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...queries - Fully type-aware with generic parameterT
- Implements bothSearchableIndex
...(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
🪛 markdownlint-cli2 (0.17.2)
355-355: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
src/rust/cli/src/main.rs (6)
39-45
: Excellent addition of bounding box (bbox) and geospatial extent (ge) parameters.This new interface is flexible for users who only want to filter data or also compute geospatial extents. Good parameter naming and usage of Option types to handle the presence or absence of arguments.
188-214
: Parse function looks robust.
parse_bbox
carefully checks the component count, numeric parsing, and range validity. This is well done. No further issues spotted.
216-234
: Efficient transformation of vertex data.Using the transform’s scale and translate arrays ensures correctness. Ensure the upstream logic consistently sets
vertex
with at least three components. If there's a possibility of incomplete vertex data, you might add additional safeguards or checks.
235-247
: Bounding box intersection check is straightforward.
feature_intersects_bbox()
delegates logic effectively. Be sure that the transform always produces valid 3D coordinates for all city objects. If uncertain, consider verifying the dimension consistency of the CityJSON data.
249-259
: City object intersection logic is concise.Looping through vertices and checking with
point_in_bbox_2d()
is a clear approach. Looks good.
261-264
: Lightweight check for 2D bounding box.This conditional is straightforward and performs as expected.
src/rust/bst/src/sorted_index.rs (14)
1-1
: Address pipeline warnings about missing# Safety
section and crate references.The pipeline logs indicate “unsafe function's docs are missing a
# Safety
section,” referencing line 1. Even though this file doesn't appear to contain an explicit unsafe function, double-check if there's an external or macro-generated unsafe context. Also ensure your Rust toolchain or environment is configured so thestd
crate is located. The “can't find crate forstd
” error may indicate a toolchain misconfiguration.🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 1-1: unsafe function's docs are missing a
# Safety
section
16-19
: Good renaming to reflect memory-based indexing.Renaming
SortedIndex
toBufferedIndex
clarifies that data is fully loaded in memory for smaller datasets or faster queries.
30-43
: Efficient build_index approach.Sorting
data
once and assigning it toself.entries
ensures an efficient build process. This is well done.
45-56
: Object-safe design forSearchableIndex
.Using byte-based queries is an elegant solution for dynamic dispatch scenarios. The boundary between typed/untyped queries is cleanly separated. Looks good.
58-74
: TypedSearchableIndex trait for strongly-typed queries.A well-structured separation of typed vs. serialized approaches. This fosters clarity and ensures each environment has the appropriate abstraction.
76-121
: Implementation details for typed queries.Binary searching the sorted
entries
is correct. This code is straightforward and consistent with the typed trait approach.
123-139
: Bridging typed logic to the byte-based interface.
query_exact_bytes
andquery_range_bytes
effectively convert between bytes and typed calls. No issues spotted.
144-150
: Serialization and deserialization adjustments.These methods correctly handle type identifiers, lengths, and offsets. The usage of custom
error::Error
fosters consistent error reporting.Also applies to: 152-213
215-259
: TypedStreamableIndex trait and stubs for HTTP usage.Streaming and partial reading approaches are well-structured. Implementing placeholders for future HTTP functionality is fine, just keep pipeline warnings in mind for the unused variables.
🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 242-242: use of
async fn
in public traits is discouraged as auto trait bounds cannot be specified
[warning] 252-252: use of
async fn
in public traits is discouraged as auto trait bounds cannot be specified
261-299
: IndexMeta structure and from_reader method.Retaining only metadata plus a method to read the type ID and entry count is a good design for streaming scenarios. Great job.
301-344
: Entry-level seeking logic.Ensures that the index format can be traversed incrementally. The “out of bounds” error is also well-handled.
🧰 Tools
🪛 GitHub Actions: Rust CI
[error] 342-342: can't find crate for
std
346-390
: Binary-search-based bounds scanning.These methods systematically handle seeking and reading key bytes to compare. The approach is correct for large indexes beyond memory size.
Also applies to: 392-435, 437-469
471-583
: Comprehensive streaming queries.Automatic handling for both exact and range queries with straightforward fallbacks is well integrated. Consider either removing or marking the unimplemented sections for the HTTP case to eliminate warnings.
🧰 Tools
🪛 GitHub Actions: Rust CI
[warning] 564-564: unused variable:
client
[warning] 565-565: unused variable:
index_offset
[warning] 566-566: unused variable:
key
[warning] 575-575: unused variable:
client
[warning] 576-576: unused variable:
index_offset
[warning] 577-577: unused variable:
lower
[warning] 578-578: unused variable:
upper
585-1019
: Test suite is thorough and well-documented.Your coverage of exact and range queries across multiple data types demonstrates reliability. The performance comparison test is a nice touch for measuring in-memory vs. streaming overhead.
src/rust/bst/src/query.rs (3)
15-15
: Good Use ofPartialEq
andEq
Derives on theOperator
Enum
Allowing equality checks onOperator
is especially helpful for comparing operators in multiple places, such as query-building logic.
137-173
: Efficient Streaming Query Implementation
Thestream_query
method provides a nice pattern for reading only the required indices from disk. The approach of filteringindex_offsets
and constructing aStreamableMultiIndex
on demand is well-structured.
765-989
: Impressive Test Coverage
The test module thoroughly checks scenarios (e.g. exact matches, range queries, combined queries). This coverage is excellent for guaranteeing correctness of the streaming multi-index logic. Keep it up!src/rust/fcb_core/src/http_reader/mod.rs (3)
60-60
: Improved Error Handling on Initialization
These changes use your customError
type instead of a generic error, which adds clarity and consistency. Great job aligning initialization with more specific errors likeMissingMagicBytes
andIllegalHeaderSize
.Also applies to: 68-68, 72-72
197-202
: Graceful Bounds Checking Withselect_bbox
Falling back toErr(Error::NoIndex)
when no R-Tree is present orfeatures_count
is 0 is clear. Confirm that all callers handle theNoIndex
scenario gracefully, especially if older data might lack an index.Do you want a quick script to locate all
select_bbox
calls to ensure they handleNoIndex
?
337-337
: Consistent Result Handling in Async Methods
Bothnext
andcur_cj_feature
returningResult<..., Error>
unify the error model. This is valuable for robust error flows. No further concerns here!Also applies to: 353-353
### 1. **Static Analysis & Linting** | ||
- Run `cargo check` for type checking and borrow checking. | ||
- Use `cargo clippy` to detect potential issues and enforce best practices. | ||
|
||
### 2. **Dead Code Detection & Removal** | ||
- Run `cargo deadlinks` to check for dead documentation links. | ||
- Use `cargo udeps` to find unused dependencies. | ||
- Run `cargo rustc -- -W dead_code` to detect unused functions. | ||
|
||
### 3. **Code Coverage Analysis** | ||
- Install `cargo-tarpaulin` for test coverage measurement: | ||
```bash | ||
cargo install cargo-tarpaulin | ||
cargo tarpaulin --out html | ||
``` | ||
- Open the generated HTML report to review coverage. | ||
|
||
### 4. **Version Control with Git** | ||
- Commit after each phase (test creation → implementation → refactoring). | ||
- Review changes before committing: | ||
```bash | ||
git status # Check modified files | ||
git add <relevant files> | ||
git commit -m "<appropriate commit message>" | ||
``` | ||
- Use commit prefixes for clarity: | ||
- `test:` - Adding or modifying tests | ||
- `feat:` - Implementing new features | ||
- `refactor:` - Code refactoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix Markdown list indentation issues.
The current indentation of unordered lists doesn't follow standard Markdown formatting conventions, which is causing linting errors.
-### 1. **Static Analysis & Linting**
- - Run `cargo check` for type checking and borrow checking.
- - Use `cargo clippy` to detect potential issues and enforce best practices.
+### 1. **Static Analysis & Linting**
+- Run `cargo check` for type checking and borrow checking.
+- Use `cargo clippy` to detect potential issues and enforce best practices.
-### 2. **Dead Code Detection & Removal**
- - Run `cargo deadlinks` to check for dead documentation links.
- - Use `cargo udeps` to find unused dependencies.
- - Run `cargo rustc -- -W dead_code` to detect unused functions.
+### 2. **Dead Code Detection & Removal**
+- Run `cargo deadlinks` to check for dead documentation links.
+- Use `cargo udeps` to find unused dependencies.
+- Run `cargo rustc -- -W dead_code` to detect unused functions.
-### 3. **Code Coverage Analysis**
- - Install `cargo-tarpaulin` for test coverage measurement:
+### 3. **Code Coverage Analysis**
+- Install `cargo-tarpaulin` for test coverage measurement:
-### 4. **Version Control with Git**
- - Commit after each phase (test creation → implementation → refactoring).
- - Review changes before committing:
+### 4. **Version Control with Git**
+- Commit after each phase (test creation → implementation → refactoring).
+- Review changes before committing:
- - Use commit prefixes for clarity:
- - `test:` - Adding or modifying tests
- - `feat:` - Implementing new features
- - `refactor:` - Code refactoring
+- Use commit prefixes for clarity:
+ - `test:` - Adding or modifying tests
+ - `feat:` - Implementing new features
+ - `refactor:` - Code refactoring
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
### 1. **Static Analysis & Linting** | |
- Run `cargo check` for type checking and borrow checking. | |
- Use `cargo clippy` to detect potential issues and enforce best practices. | |
### 2. **Dead Code Detection & Removal** | |
- Run `cargo deadlinks` to check for dead documentation links. | |
- Use `cargo udeps` to find unused dependencies. | |
- Run `cargo rustc -- -W dead_code` to detect unused functions. | |
### 3. **Code Coverage Analysis** | |
- Install `cargo-tarpaulin` for test coverage measurement: | |
```bash | |
cargo install cargo-tarpaulin | |
cargo tarpaulin --out html | |
``` | |
- Open the generated HTML report to review coverage. | |
### 4. **Version Control with Git** | |
- Commit after each phase (test creation → implementation → refactoring). | |
- Review changes before committing: | |
```bash | |
git status # Check modified files | |
git add <relevant files> | |
git commit -m "<appropriate commit message>" | |
``` | |
- Use commit prefixes for clarity: | |
- `test:` - Adding or modifying tests | |
- `feat:` - Implementing new features | |
- `refactor:` - Code refactoring | |
### 1. **Static Analysis & Linting** | |
- Run `cargo check` for type checking and borrow checking. | |
- Use `cargo clippy` to detect potential issues and enforce best practices. | |
### 2. **Dead Code Detection & Removal** | |
- Run `cargo deadlinks` to check for dead documentation links. | |
- Use `cargo udeps` to find unused dependencies. | |
- Run `cargo rustc -- -W dead_code` to detect unused functions. | |
### 3. **Code Coverage Analysis** | |
- Install `cargo-tarpaulin` for test coverage measurement: |
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
27-27: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
31-31: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
32-32: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
33-33: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
41-41: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
44-44: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
45-45: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
51-51: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
52-52: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
53-53: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
54-54: Unordered list indentation
Expected: 2; Actual: 5
(MD007, ul-indent)
// let result = bst::stream_query(&multi_index, query, feature_begin) | ||
// .await | ||
// .map_err(|e| JsValue::from_str(&e.to_string()))?; | ||
// TODO: remove this | ||
let result: Vec<HttpSearchResultItem> = vec![]; | ||
|
||
let count = result.len(); | ||
let combine_request_threshold = 256 * 1024; | ||
|
||
let http_ranges: Vec<HttpRange> = result.into_iter().map(|item| item.range).collect(); | ||
// let http_ranges: Vec<HttpRange> = result.into_iter().map(|item| item.range).collect(); | ||
// TODO: remove this | ||
let http_ranges: Vec<HttpRange> = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete implementation for streaming queries.
The code contains commented-out sections with TODOs that will need to be properly addressed. The current implementation replaces actual functionality with empty vectors, which will not provide the expected behavior for attribute queries.
Consider either:
- Implementing the proper streaming functionality, or
- Removing these placeholders and adding a clear status comment indicating the feature is not yet supported
-// let result = bst::stream_query(&multi_index, query, feature_begin)
-// .await
-// .map_err(|e| JsValue::from_str(&e.to_string()))?;
-// TODO: remove this
-let result: Vec<HttpSearchResultItem> = vec![];
+// Stream query functionality not yet implemented in this version
+let result = bst::query(&multi_index, query)
+ .map_err(|e| JsValue::from_str(&e.to_string()))?;
+let http_ranges: Vec<HttpRange> = result.into_iter()
+ .map(|item| HttpRange::Range(feature_begin + item.pos as usize..feature_begin + item.pos as usize + item.len as usize))
+ .collect();
DateTime { index: u16, val: DateTime<Utc> }, | ||
Short { index: u16, val: i16 }, | ||
UShort { index: u16, val: u16 }, | ||
Byte { index: u16, val: u8 }, | ||
UByte { index: u16, val: u8 }, | ||
Json { index: u16, val: String }, | ||
Binary { index: u16, val: String }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential mismatch for Byte
vs UByte
.
Currently, both Byte
and UByte
variants use u8
, which might be confusing if one is meant to store signed 8-bit values. Verify whether Byte { val: i8 }
reflects your intended design.
- Byte { index: u16, val: u8 },
+ Byte { index: u16, val: i8 },
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
DateTime { index: u16, val: DateTime<Utc> }, | |
Short { index: u16, val: i16 }, | |
UShort { index: u16, val: u16 }, | |
Byte { index: u16, val: u8 }, | |
UByte { index: u16, val: u8 }, | |
Json { index: u16, val: String }, | |
Binary { index: u16, val: String }, | |
DateTime { index: u16, val: DateTime<Utc> }, | |
Short { index: u16, val: i16 }, | |
UShort { index: u16, val: u16 }, | |
Byte { index: u16, val: i8 }, | |
UByte { index: u16, val: u8 }, | |
Json { index: u16, val: String }, | |
Binary { index: u16, val: String }, |
build_index_generic::<String, _>(*schema_index, attribute_entries, |entry| { | ||
if let AttributeIndexEntry::Binary { index, val } = entry { | ||
if *index == *schema_index { | ||
Some(val.clone()) | ||
} else { | ||
None | ||
} | ||
} else { | ||
None | ||
} | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Binary stored as String is risky.
Binary data can contain non-UTF-8 bytes, causing decoding errors. Storing in String
may be a design flaw.
Consider using a Vec<u8>
for robust handling of arbitrary binary content.
pub async fn select_attr_query( | ||
mut self, | ||
query: &AttrQuery, | ||
) -> Result<AsyncFeatureIter<T>, Error> { | ||
trace!("starting: select_attr_query via http reader"); | ||
|
||
let header = self.fbs.header(); | ||
let header_len = self.header_len(); | ||
// Assume the header provides rtree and attribute index sizes. | ||
let rtree_index_size = self.rtree_index_size() as usize; | ||
let attr_index_size = self.attr_index_size() as usize; | ||
let attr_index_offset = header_len + rtree_index_size; | ||
let feature_begin = header_len + rtree_index_size + attr_index_size; | ||
|
||
// Fetch the attribute index block via HTTP range request. | ||
let mut attr_index_bytes = self | ||
.client | ||
.get_range(attr_index_offset, attr_index_size) | ||
.await?; | ||
|
||
let attr_index_entries = header | ||
.attribute_index() | ||
.ok_or_else(|| anyhow!("attribute index not found"))?; | ||
let columns: Vec<Column> = header | ||
.columns() | ||
.ok_or_else(|| anyhow!("no columns found in header"))? | ||
.iter() | ||
.collect(); | ||
|
||
let mut multi_index = MultiIndex::new(); | ||
|
||
for attr_info in attr_index_entries.iter() { | ||
process_attr_index_entry( | ||
&mut attr_index_bytes, | ||
&mut multi_index, | ||
&columns, | ||
&query, | ||
attr_info, | ||
)?; | ||
} | ||
|
||
let query = build_query(&query); | ||
|
||
let result = bst::stream_query(&multi_index, query, feature_begin).await?; | ||
|
||
let count = result.len(); | ||
let combine_request_threshold = 256 * 1024; | ||
|
||
let http_ranges: Vec<HttpRange> = result.into_iter().map(|item| item.range).collect(); | ||
|
||
trace!( | ||
"completed: select_attr_query via http reader, matched features: {}", | ||
count | ||
); | ||
Ok(AsyncFeatureIter { | ||
client: self.client, | ||
fbs: self.fbs, | ||
selection: FeatureSelection::SelectAttr(SelectAttr { | ||
ranges: http_ranges, | ||
range_pos: 0, | ||
}), | ||
count, | ||
}) | ||
unimplemented!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Unimplemented select_attr_query
The function is left unimplemented!()
, so it will panic at runtime if called. If you’re not ready to implement it soon, consider returning an error variant (e.g. Error::FeatureUnimplemented("attr_query")
) to fail more gracefully.
This change is
Summary by CodeRabbit
New Features
Documentation
Refactor