-
-
Notifications
You must be signed in to change notification settings - Fork 932
fix: Don't serialize GeoJSON vector tiles from the worker to support undefined properties #6803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1c98c72 to
db722ad
Compare
src/source/worker_tile.ts
Outdated
|
|
||
| abort: AbortController; | ||
| vectorTile: VectorTileLike; | ||
| geoJsonFeatures?: Feature[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be inferred from the vectorTile? It is mandatory to add this field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so - with the update to vt-pbf I can likely simplify things further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look it is being used in the parse method, so I wonder if this can be removed from WorkerTile class...
| } | ||
|
|
||
| workerTile.vectorTile = response.vectorTile; | ||
| workerTile.geoJsonFeatures = response.isGeojson ? (response.vectorTile as GeoJSONWrapper).features : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this can be done better considering we have inheritance here with geojson and vector tile...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is reflecting the type of an object here more or less the same, performance-wise? No issues with changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less about reflection, more about OOP... if you are doing as GeoJSONWrapper in a class that doesn't suppose to know about GeoJSONWrapper then it seems weird...
| 'type': 'Point', | ||
| 'coordinates': [0, 0] | ||
| }, | ||
| 'properties': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to split this out to a different test...
|
I can't say I'm loving this, seems to heckish, but I can't place my finger on a specific issue, what other alternatives do we have to solve the bug? |
| return { | ||
| vectorTile: geojsonWrapper, | ||
| isGeojson: true | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we convert the GeoJSON data to a vector tile here, the data is encoded as a UInt8Array. This can be transferred to the main thread with zero overhead via the WebWorker transfer API.
The plain GeoJSON data cannot take advantage of the transfer API and will be much slower.
See https://developer.chrome.com/blog/transferable-objects-lightning-fast/ for more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested against my performance benchmark (add 100k points, move one, https://jsbin.com/bubejanoha/edit?html,output) this branch is ~100% slower than main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running your tool, I'm seeing about a 15% regression on my system over an average of 40 samples in Firefox. Could you add this test to the benchmarks? I think it'd be really valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gotta step out for the afternoon but I'll try to get back to these questions in the next ~12 hrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running your tool, I'm seeing about a 15% regression on my system over an average of 40 samples in Firefox.
Are you sure you're running npm run build-dist on this branch and on main? The dist build is not generated automatically as part of the dev server. I'm certain that you'll see a measurable and noticeable difference.
Could you add this test to the benchmarks? I think it'd be really valuable.
I added a GeoJSONDiff benchmark that measures perf on a relatively small (10k features) source. You can manually edit the bench to test a large (100k features) source but the runtime is prohibitively long to include in the standard suite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Prior to merging main my local branch was behind enough to not include your GeoJSONDiff benchmark. Not many great options here seeing this regression. Indirectly looking up properties after querying the feature index might be the best approach if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Prior to merging main my local branch was behind enough to not include your GeoJSONDiff benchmark.
If you pull in the latest changes from main you can compare this branch to main with one command. It takes a minute to run. Good time to get a glass of water, bother your pet, etc.
npm run benchmark -- GeoJSONSourceUpdateData && npm run benchmark -- GeoJSONSourceSetData
Indirectly looking up properties after querying the feature index might be the best approach if it works.
I think we can make it work! The one big hurdle to navigate will be supporting GeoJSON sources with string feature IDs or no feature IDs. I'm happy to chat more, help brainstorm or prototype but I respect that you're the primary issue owner here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a couple other performance solutions worth considering...
- pin to the previous
vector-tile-jsversion - fork
vector-tile-jsto maintain support fornull - accept the breaking change (how many users are really differentiating between
nullandundefinedinqueryRenderedFeaturesresults?) - use a "sentinel" value to represent
nullin vector tiles (like the string"__MAPLIBRE_NULL_VALUE__")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 & 4 are certainly worth considering but I'm hesitant about using an older or modified version of vector-tile-js. For better or worse the current in-memory tile representation used is coupled to MVT, trying to ignore the specification (even if a minor deviation) doesn't sound too appealing to me.
While I question the usefulness of undefined/null GeoJSON properties, that specification allows for them so if possible I'd prefer a solution that complies with both formats. Other options like custom serialization, a forked vector tile deserializer or tile feature culling on the main thread would all add more complexity compared to a simple sentinel, so I'm currently leaning more towards that approach. An indirect lookup from the feature index would be the runner up, with the perk of not needing a special reserved property value.
@HarelM What are your thoughts on the possible approaches here? If we're fine introducing another reserved string then this whole PR becomes a lot more simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that specification allows for them so if possible I'd prefer a solution that complies with both formats
queryRenderedFeatures has never supported the GeoJSON specification. This project's stance has consistently been that GeoJSON gets converted to a vector tile internally and that's what comes out of qRF. For example, it's never supported arrays or nested objects. Support for the null value was a quirk and does not indicate a broader commitment to full support of the GeoJSON specification.
|
If it's the case that copying the raw feature data is slower than serializing the tile to protobuf, then the remaining options to fix this issue are a bit more involved:
I'm not certain what the best approach here is, we can't get away with just serializing to MVT anymore unless we pin an old version and stay there or get the spec changed. |
|
Probably not a small lift, but can we serialize to MLT now? |
|
No, we don't have a js derializer for mlt at the moment... :-/ |
|
I'm just getting up to speed on this issue, so I apologize if this is already answered somewhere. Why can't we simply filter out any GeoJSON property keys that have |
|
While filtering would avoid serializing the now unsupported properties, that would result in those values not making it into the feature index. Since the GeoJSON spec explicitly supports null/undefined properties, we want to see those keys persist in |
|
Because these are valid geojson definitions... |
|
I'm guessing mapbox should have the same issue when I think about it. |
|
As a side note, performance is less important than "error handling", i.e. if we need to solve this and the only way is by degrading performance, then we'll need to bite the bullet. |
|
Undefined properties were previously supported. vt-pbf explicitly handles |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6803 +/- ##
==========================================
- Coverage 92.42% 92.39% -0.03%
==========================================
Files 288 288
Lines 23807 23817 +10
Branches 5056 5064 +8
==========================================
+ Hits 22003 22006 +3
- Misses 1804 1811 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Are there any immediate consequences to doing this? I can help implement a longer term solution if this is an acceptable temporary bandaid. I've spent about 100 hours optimizing |
|
A user who needs this fixed can simply rollback to previous msplibre version, so we can take the time and fix this properly. Is it possible to use JSON.strinfigy with TextEncoder to pass the geojson data to the main thread instead of coverting it to MVT? There are other limitations of MVT that do no apply to geojson (like objects in properties) that I'll be happy to solve too. |
|
If we are willing to entertain other formats, we can have a look at geobuf and geoarrow. |
|
@HarelM We've been benefiting enormously from the performance question on GeoJSON sources that @lucaswoj has been working on over the last couple of months. We have 10-100k points updated on a regular basis as we have live first responder locations for all of Colorado, as such we are likely more sensitive to performance loss as we've been working hard to try to get things as smooth as possible. As such if the |
|
Another thought: the |
| ? new VectorTile(new Protobuf(this.rawTileData)).layers | ||
| : new MLTVectorTile(this.rawTileData).layers; | ||
| if (this.geoJsonFeatureData) { | ||
| this.vtLayers = new GeoJSONWrapper(this.geoJsonFeatureData, {version: 2, extent: EXTENT}).layers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does constructing GeoJSONWrapper take for a source with 100k points? We used to do this step on the worker thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only a wrapper, it doesn't take right...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. I thought this was actually constructing some kind of vector tiles.
|
I think we came to the conclusion this is not the case for data set as irl, didn't we? |
It wouldn't be very hard to send the full GeoJSON data to the main thread in just that case |
|
EDIT: moved to thread #6803 (comment) |
|
Fetching the features on the main thread would still need to slice them up to tiles for querying which is also heavy. |
Could we keep using |
Just created #6811 to provide the GeoJSON data back to the main thread when it's loaded from a URL |
Launch Checklist
This fixes #6730
By returning raw feature data from the web worker instead of serializing, we can regain support for undefined GeoJSON properties and get a nice performance boost.
CHANGELOG.mdunder the## mainsection.