
Commit c3139e1

HideBa, deepsource-autofix[bot], and Copilot authored
feat: support CityJSON's extension (#20)
* manage test data with git
* added test data
* define extension schema
* fix fb schema
* impl encoding of extension
* decode
* add e2e for extension
* style: format code with Rustfmt

  This commit fixes the style issues introduced in 805d9c8 according to the output from Rustfmt. Details: #20
* docs: document about extension
* Update src/rust/fcb_core/src/writer/serializer.rs

  Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
Co-authored-by: Copilot <[email protected]>
1 parent 423744d commit c3139e1

26 files changed, +2377 -151 lines changed

.cursor/rules/memory/progress.md

+46-3
Original file line number | Diff line number | Diff line change
@@ -9,21 +9,25 @@ While optimized data formats such as PMTiles, FlatBuffers, Mapbox Vector Tiles,
99
## Problems to be Solved
1010

1111
### Lack of Efficient 3D City Model Data Formats
12+
1213
- Existing formats like CityJSON and CityJSONSeq are not optimized for large-scale cloud processing.
1314
- Limited support for spatial indexing and efficient querying.
1415
- Inefficiencies in downloading and processing large 3D city model datasets.
1516

1617
### Scalability Issues in Cloud-Native Environments
18+
1719
- High storage and processing costs for unoptimized 3D city models.
1820
- Challenges in handling arbitrary extents of urban data dynamically.
1921
- Lack of standardized methods for fetching, sorting, and filtering large-scale 3D datasets.
2022

2123
### Limited Adoption of Optimized Binary Formats
24+
2225
- Current 3D data formats do not leverage modern binary serialization techniques.
2326
- Need for improved compression, indexing, and partial fetching for cloud and web applications.
2427
- Performance limitations in current file formats for visualization and analysis.
2528

2629
### Research Gaps
30+
2731
- Lack of specialized approaches for cloud-native processing of 3D city models.
2832
- Existing research has focused on text-based formats rather than optimized binary encoding.
2933
- Limited studies evaluating real-world performance benefits of FlatBuffers in geospatial applications.
@@ -32,11 +36,13 @@ While optimized data formats such as PMTiles, FlatBuffers, Mapbox Vector Tiles,
3236
## How It Should Work
3337

3438
### Implementation of FlatBuffers for CityJSON
39+
3540
- Integrate FlatBuffers as an optimized binary format for CityJSON.
3641
- Support for spatial indexing to enhance data retrieval performance.
3742
- Implement spatial sorting and partial fetching via HTTP Range requests.
3843

3944
### Optimization Methodology
45+
4046
1. **Comprehensive Review**: Evaluate existing optimized formats (e.g., PMTiles, Cloud Optimized GeoTIFF).
4147
2. **Format Adaptation**: Modify CityJSON to incorporate efficient binary storage and indexing.
4248
3. **Benchmarking**: Compare performance with traditional CityJSON and assess scalability in cloud environments.
@@ -45,6 +51,7 @@ While optimized data formats such as PMTiles, FlatBuffers, Mapbox Vector Tiles,
4551
6. **Web-Based Query Optimization**: Enhance interactive applications through HTTP Range requests and on-the-fly decoding.
4652

4753
### Cloud-Native Processing Enhancements
54+
4855
- Enable single-file containment of entire urban areas.
4956
- Reduce cloud storage and computation costs through efficient serialization.
5057
- Improve web-based access and real-time querying capabilities.
@@ -90,7 +97,6 @@ While optimized data formats such as PMTiles, FlatBuffers, Mapbox Vector Tiles,
9097
- Implement intelligent batching of nearby features based on spatial proximity.
9198
- Add client-side caching to avoid redundant requests for previously fetched data.
9299

93-
94100
3. **Performance Benchmarking for Large Datasets**
95101
- Evaluate large-scale data retrieval and identify performance bottlenecks.
96102
- Develop standardized benchmark suite for comparing with other formats.
@@ -174,37 +180,74 @@ While optimized data formats such as PMTiles, FlatBuffers, Mapbox Vector Tiles,
174180
## Next Milestones
175181

176182
### Milestone 1: Core Optimization
183+
177184
- Optimize Attribute Index for streaming with progressive loading.
178185
- Implement intelligent batching for HTTP Range Requests.
179186
- Complete comprehensive benchmarking suite for large datasets.
180187
- Address critical performance bottlenecks identified in profiling.
181188
- Enhance documentation with performance optimization guidelines.
182189

183190
### Milestone 2: Format Extensions
191+
184192
- Evaluate and implement support for Arrow and Parquet encoding.
185193
- Develop compression strategies for geometry and attribute data.
186194
- Create adapters for seamless format conversion.
187195
- Implement advanced spatial indexing techniques.
188196
- Enhance CI/CD pipeline with automated performance testing.
189197

190198
### Milestone 3: Language Support
199+
191200
- Release Python implementation.
192201
- Develop C++ implementation.
193202
- Create JavaScript/TypeScript SDK for browser environments.
194203
- Ensure cross-language test compatibility.
195204
- Publish packages to language-specific repositories.
196205

197206
### Milestone 4: Visualization & Integration
207+
198208
- Implement Three.js-based Web Viewer with LOD support.
199209
- Develop browser-based conversion tools for common 3D formats.
200210
- Create plugins for QGIS, ArcGIS, and other GIS software.
201211
- Implement texture and material rendering in web environments.
202212
- Release comprehensive integration examples for third-party tools.
203213

204-
205214
## Recent Updates
215+
206216
- Integrated spatial indexing and binary search tree.
207217
- Added WebAssembly support for FlatCityBuf.
208218
- Improved texture handling in CityJSON encoding.
209219
- Completed initial benchmarking against CityJSON and CityJSONSeq.
210-
- Created preliminary documentation for the file format specification.
220+
- Created preliminary documentation for the file format specification.
221+
222+
## progress status
223+
224+
- [x] basic flatbuffers schema for cityjson - completed
225+
- [x] spatial indexing implementation (hilbert r-tree) - completed
226+
- [x] encoding of geoms with shared vertices - completed
227+
- [x] encoding of materials - completed
228+
- [x] encoding of textures - completed
229+
- [x] encoding of appearance - completed
230+
- [x] encoding of semantics - completed
231+
- [x] encoding of attributes - completed
232+
- [x] extension support - completed
233+
- [ ] js/wasm query engine - in progress
234+
- [ ] python wrapper - in progress
235+
- [ ] web-based query optimizer - in progress
236+
- [ ] partial geom retrieval - planned
237+
- [ ] versioning support - planned
238+
239+
## what's next
240+
241+
- **query engine refinement:** optimize the query engine for more complex spatial and attribute queries
242+
- **python wrapper:** complete the python interface for broader ecosystem integration
243+
- **web-based query optimizer:** finish the visualization tool for query plan optimization
244+
- **partial geometry retrieval:** implement efficient retrieval of partial geometries for large objects
245+
- **extension documentation:** create more comprehensive documentation and examples for utilizing extensions
246+
- **benchmarking extensions:** measure performance impact of different extension usage patterns
247+
248+
## known issues
249+
250+
- performance bottlenecks in attribute indexing with large datasets
251+
- memory usage spikes during encoding of complex geometries
252+
- limitations in the current implementation of lod switching
253+
- extension attributes may impact query performance for complex filter operations

.cursor/rules/memory/specification.md

+112
@@ -510,3 +510,115 @@ graph TD
510510
```
511511

512512
this structure allows for efficient querying and retrieval of city objects based on both spatial and attribute criteria.
513+
514+
## extension support
515+
516+
flatcitybuf implements full support for the cityjson extension mechanism, allowing customization of the data model while maintaining compatibility with standard tools.
517+
518+
### extension mechanism background
519+
520+
cityjson extensions enable users to:
521+
522+
1. **add new attributes** to existing cityobjects
523+
2. **create new cityobject types** beyond the standard types
524+
3. **add new properties** at the root level of a cityjson file
525+
4. **define new semantic surface types**
526+
527+
extensions are identified using a "+" prefix (e.g., "+noise") and are defined in json schema files hosted at urls referenced in the cityjson file.
528+
529+
### extension schema implementation
530+
531+
flatcitybuf supports extensions through specialized schema components in four main areas:
532+
533+
#### 1. extension definition in `extension.fbs`
534+
535+
```flatbuffers
536+
table Extension {
537+
name: string; // Extension name (e.g., "+Noise")
538+
description: string; // Description of the extension
539+
url: string; // URL to the extension schema
540+
version: string; // Extension version
541+
version_cityjson: string; // Compatible CityJSON version
542+
extra_attributes: string; // Stringified JSON schema for attributes
543+
extra_city_objects: string; // Stringified JSON schema for city objects
544+
extra_root_properties: string; // Stringified JSON schema for root properties
545+
extra_semantic_surfaces: string; // Stringified JSON schema for semantic surfaces
546+
}
547+
```
548+
549+
#### 2. extended cityobjects in `feature.fbs`
550+
551+
```flatbuffers
552+
enum CityObjectType:ubyte {
553+
// ... standard types ...
554+
ExtensionObject
555+
}
556+
557+
table CityObject {
558+
type: CityObjectType;
559+
extension_type: string; // e.g. "+NoiseCityFurnitureSegment"
560+
// ... other fields ...
561+
}
562+
```
563+
564+
#### 3. extended semantic surfaces in `geometry.fbs`
565+
566+
```flatbuffers
567+
enum SemanticSurfaceType:ubyte {
568+
// ... standard types ...
569+
ExtraSemanticSurface
570+
}
571+
572+
table SemanticObject {
573+
type: SemanticSurfaceType;
574+
extension_type: string; // e.g. "+ThermalSurface"
575+
// ... other fields ...
576+
}
577+
```
578+
579+
#### 4. extension references in header
580+
581+
extensions are referenced in the header, allowing applications to understand which extensions are used in the file:
582+
583+
```flatbuffers
584+
table Header {
585+
// ... other fields ...
586+
extensions: [Extension]; // List of extensions used
587+
// ... other fields ...
588+
}
589+
```
590+
591+
### encoding and decoding strategy
592+
593+
the encoding and decoding of extensions follows these principles:
594+
595+
1. **self-contained extensions**: extension schemas are embedded directly in the file as stringified json, making the file self-contained and usable without external references.
596+
597+
2. **enum with extension marker**: special enum values (`ExtensionObject`, `ExtraSemanticSurface`) combined with a string field (`extension_type`) handle extended types. this approach keeps the compact enum representation for standard types while supporting an unlimited number of extension types.
598+
599+
3. **unified attribute storage**: extension attributes are treated the same as core attributes, both encoded in the attributes byte array. this simplifies implementation and maintains query performance.
600+
601+
4. **root properties**: extension properties at the root level are stored in the header's attributes field.
602+
603+
when encoding:
604+
605+
- if a cityobject type starts with "+", it's encoded as `ExtensionObject` with the full type name stored in the `extension_type` field
606+
- if a semantic surface type starts with "+", it's encoded as `ExtraSemanticSurface` with the full type name stored in the `extension_type` field
607+
- extension-specific attributes are encoded as regular attributes in the binary attribute array
608+
609+
when decoding:
610+
611+
- if a cityobject has type `ExtensionObject` and a non-null `extension_type`, the extension type name is used
612+
- if a semantic surface has type `ExtraSemanticSurface` and a non-null `extension_type`, the extension type name is used
613+
- all attributes are decoded, regardless of whether they belong to the core schema or extensions
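
the encoding and decoding rules above can be sketched in plain rust. this is a minimal illustration, not the code in `fcb_core`: the `CityObjectType` enum below is a hypothetical three-variant stand-in for the generated flatbuffers enum, and `encode_type` / `decode_type` are made-up helper names. semantic surfaces would follow the same pattern with `ExtraSemanticSurface`.

```rust
/// Hypothetical stand-in for the generated `CityObjectType` enum
/// (only a few variants are listed for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CityObjectType {
    Building,
    WaterBody,
    ExtensionObject,
}

/// Map a CityJSON type name to the enum value plus an optional
/// `extension_type` string. Names starting with "+" are treated as
/// extension types and keep their full name in the string field.
/// Returns `None` for names this sketch does not know.
pub fn encode_type(name: &str) -> Option<(CityObjectType, Option<String>)> {
    if name.starts_with('+') {
        return Some((CityObjectType::ExtensionObject, Some(name.to_string())));
    }
    let ty = match name {
        "Building" => CityObjectType::Building,
        "WaterBody" => CityObjectType::WaterBody,
        _ => return None,
    };
    Some((ty, None))
}

/// Recover the CityJSON type name. The extension name is used only when
/// the enum value is `ExtensionObject` and `extension_type` is present;
/// otherwise the enum variant name is returned.
pub fn decode_type(ty: CityObjectType, extension_type: Option<&str>) -> String {
    match (ty, extension_type) {
        (CityObjectType::ExtensionObject, Some(ext)) => ext.to_string(),
        (other, _) => format!("{:?}", other),
    }
}
```

with these helpers, `encode_type("+NoiseCityFurnitureSegment")` yields `ExtensionObject` together with the full name, and passing both values to `decode_type` returns the original string, so standard types stay a single byte while extended types round-trip through `extension_type`.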
614+
615+
### benefits of this approach
616+
617+
this implementation balances several key factors:
618+
619+
- **performance**: maintains fast access to core data structures
620+
- **flexibility**: supports any cityjson extension
621+
- **self-containment**: makes files usable without external references
622+
- **simplicity**: uses consistent patterns for extension handling
623+
624+
the schema is agnostic about the validity of extension data, focusing instead on accurately representing the structure of extended cityjson files in an efficient binary format.

.cursor/rules/rust.mdc

+2-1
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
---
22
description: Coding rules for Rust implementation in FlatCityBuf
33
globs: src/rust/**
4+
alwaysApply: false
45
---
56
# Rust Coding Guidelines for Library Development
67

@@ -73,7 +74,7 @@ flatcitybuf/
7374
---
7475

7576
## Error Handling
76-
- Use `anyhow` for application code and `thiserror` for package-level errors.
77+
- Use `thiserror` for package-level errors.
7778
- Avoid panics in library code; return errors instead.
7879
- Handle errors and edge cases early, returning errors where appropriate.
7980

.gitignore

+2-2
@@ -59,8 +59,8 @@ __pycache__/
5959
temp/
6060

6161
benchmark_data/
62-
src/rust/fcb_core/tests/data/
63-
!src/rust/fcb_core/tests/data/.gitkeep
62+
src/rust/fcb_core/tests/data/*.fcb
63+
src/rust/fcb_core/tests/data/*.json
6464
.cursorrules
6565

6666
*.fcb

src/fbs/extension.fbs

+17
@@ -0,0 +1,17 @@
1+
2+
3+
// Extension is a table that holds the schema of a CityJSON extension.
4+
// To keep the FlatBuffers schema simple, the extension's JSON schema is stored as stringified JSON.
5+
// This Extension can be derived from ExtensionMeta's url.
6+
table Extension {
7+
// "type": "CityJSONExtension" isn't written in the file as it's obvious
8+
name: string; // name of the extension. It's the same as ExtensionMeta's name.
9+
description: string; // description of the extension. It's the same as ExtensionMeta's description.
10+
url: string; // url of the extension. It's the same as ExtensionMeta's url.
11+
version: string; // version of the extension. It's the same as ExtensionMeta's version.
12+
version_cityjson: string; // CityJSON version this extension is compatible with.
13+
extra_attributes: string; // extra attributes of the extension. stringified JSON object.
14+
extra_city_objects: string; // extra city objects of the extension. stringified JSON object.
15+
extra_root_properties: string; // extra root properties of the extension. stringified JSON object.
16+
extra_semantic_surfaces: string; // extra semantic surfaces of the extension. stringified JSON object.
17+
}

src/fbs/feature.fbs

+5-1
@@ -46,7 +46,10 @@ enum CityObjectType:ubyte {
4646
TunnelHollowSpace,
4747
TunnelFurniture,
4848

49-
WaterBody
49+
WaterBody,
50+
51+
// Extension objects. Since extended city object types can't be known in advance, they are all marked as "ExtensionObject".
52+
ExtensionObject
5053
}
5154

5255
struct Vertex {
@@ -64,6 +67,7 @@ table CityFeature {
6467

6568
table CityObject {
6669
type:CityObjectType;
70+
extension_type: string; // extension type of the city object. e.g. +NoiseCityFurnitureSegment
6771
id:string (key, required);
6872
geographical_extent:GeographicalExtent;
6973
geometry:[Geometry];

src/fbs/geometry.fbs

+5-1
@@ -21,7 +21,10 @@ enum SemanticSurfaceType:ubyte {
2121
TrafficArea,
2222
AuxiliaryTrafficArea,
2323
TransportationMarking,
24-
TransportationHole
24+
TransportationHole,
25+
26+
// Extension objects. In the JSON data they are written like "+ThermalSurface". Since extended semantic surface types can't be known in advance, they are all marked as "ExtraSemanticSurface".
27+
ExtraSemanticSurface
2528
}
2629

2730
enum GeometryType:ubyte {
@@ -85,6 +88,7 @@ table SemanticObject {
8588
attributes:[ubyte];
8689
children:[uint];
8790
parent:uint = null; // default is null, important to be able to check if this field is set
91+
extension_type: string; // extension type of the semantic object. e.g. "+ThermalSurface"
8892
}
8993

9094
struct TransformationMatrix {

src/fbs/header.fbs

+4
@@ -4,6 +4,7 @@
44
// Reference: https://github.com/flatgeobuf/flatgeobuf/blob/master/src/fbs/header.fbs
55

66
include "geometry.fbs";
7+
include "extension.fbs";
78

89
enum ColumnType: ubyte {
910
Byte, // Signed 8-bit integer
@@ -138,8 +139,11 @@ table Header {
138139
identifier: string; // Dataset identifier
139140
reference_date: string; // Reference date
140141
title: string; // Dataset title
142+
// geometry templates
141143
templates: [Geometry];
142144
templates_vertices: [DoubleVertex];
145+
// extensions
146+
extensions: [Extension];
143147
// Point of contact
144148
poc_contact_name: string; // Point of contact name
145149
poc_contact_type: string; // Point of contact type

src/rust/fcb_core/Cargo.toml

+2-1
@@ -15,12 +15,13 @@ wasm = [
1515
"js-sys",
1616
"getrandom",
1717
]
18+
extension = ["cjseq/extension"]
1819

1920
[dependencies]
2021
bytes = { workspace = true, optional = true }
2122
flatbuffers = { workspace = true }
2223
byteorder = { workspace = true }
23-
cjseq = { workspace = true }
24+
cjseq = { workspace = true, features = [] }
2425
tempfile = { workspace = true }
2526
serde_json = { workspace = true }
2627
anyhow = { workspace = true }

src/rust/fcb_core/src/error.rs

+9
@@ -1,3 +1,4 @@
1+
use cjseq::error::CjseqError;
12
use flatbuffers::InvalidFlatbuffer;
23
use serde_json;
34
use thiserror::Error;
@@ -70,6 +71,12 @@ pub enum Error {
7071
#[from]
7172
source: crate::cjerror::CjError,
7273
},
74+
75+
#[error("Cjseq error: {source}")]
76+
CjseqError {
77+
#[from]
78+
source: CjseqError,
79+
},
7380
}
7481

7582
impl Error {
@@ -102,3 +109,5 @@ impl Error {
102109
)
103110
}
104111
}
112+
113+
pub type Result<T> = std::result::Result<T, Error>;
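
The new `CjseqError` variant with `#[from]` and the `Result<T>` alias work together: a function returning the crate's `Result` can apply `?` to a call that fails with a `CjseqError`, and the error is converted automatically. The following std-only sketch illustrates that mechanism under stated assumptions: the `From` and `Display` impls are written by hand in place of the `thiserror` derive, `CjseqError` is a hypothetical stand-in for `cjseq::error::CjseqError`, and `parse_feature` / `feature_len` are made-up functions. Everything is wrapped in a module named `error_sketch` purely for illustration.

```rust
pub mod error_sketch {
    use std::fmt;

    /// Hypothetical stand-in for `cjseq::error::CjseqError`.
    #[derive(Debug)]
    pub struct CjseqError(pub String);

    impl fmt::Display for CjseqError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{}", self.0)
        }
    }

    impl std::error::Error for CjseqError {}

    /// Reduced version of the crate-level error enum with one variant.
    #[derive(Debug)]
    pub enum Error {
        CjseqError { source: CjseqError },
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::CjseqError { source } => write!(f, "Cjseq error: {}", source),
            }
        }
    }

    impl std::error::Error for Error {}

    /// Hand-written equivalent of what `#[from]` generates.
    impl From<CjseqError> for Error {
        fn from(source: CjseqError) -> Self {
            Error::CjseqError { source }
        }
    }

    /// Same shape as the alias added in this commit.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Made-up function that fails with the upstream error type.
    fn parse_feature(input: &str) -> std::result::Result<usize, CjseqError> {
        if input.is_empty() {
            Err(CjseqError("empty input".to_string()))
        } else {
            Ok(input.len())
        }
    }

    /// `?` converts `CjseqError` into `Error` through the `From` impl.
    pub fn feature_len(input: &str) -> Result<usize> {
        let n = parse_feature(input)?;
        Ok(n)
    }
}
```

Callers then only deal with one error type; for example, `error_sketch::feature_len("")` returns `Err(Error::CjseqError { .. })` without any explicit `map_err`.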
