Beginning
Materialite started simply:
Take a source collection, set up an incremental computation pipeline against it, materialize that into a view.
There was:
No built-in pagination in the library. The view you get out is an incrementally maintained collection of all your data. This scaled fine even for 4 million rows of data in the final view (see the videos below, which demonstrate a computation over 4 million rows being maintained as items change).
No support for custom sources. Your source of data needed to be a Set or Map.
4million.webm
no-cpu-churn.webm
Because it could maintain computations over massive (for the browser, anyway) datasets, Materialite gained some paying interest pretty quickly, which has been driving the next round of revisions.
Revisions
In the original implementation of Materialite, scaling to 4 million rows (or even 100,000, if you need to make many updates to inner rows) required a custom data structure: in this case, a persistent treap.
This is because inserting into the middle of a very large JS array is expensive (O(n)). On top of that, if you're integrating with React, your state should be immutable, and copying large JS arrays is also expensive. If you're copying the materialized view on every keystroke just to update a single record... well, that is a problem. This is why Materialite used a custom data structure for views. We can update that structure, even create a new version of it without modifying the old one, in O(log n) time. In English: creating a new version of a 4,000,000-item view takes only ~22 comparisons, 23 for 8 million, 24 for 16 million, etc.
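The persistent-tree idea can be sketched as a persistent treap (a hypothetical standalone sketch, not Materialite's actual implementation): insert copies only the O(log n) nodes along the search path, so every older version survives unchanged.

```typescript
// Hypothetical sketch of a persistent treap over numbers. `insert` returns a
// new root and copies only the O(log n) nodes on the search path, so older
// versions remain intact and can be shared (e.g. as immutable React state).
type Node = {
  value: number;
  priority: number; // random heap priority keeps the tree balanced in expectation
  left: Node | null;
  right: Node | null;
};

function rotateRight(n: Node): Node {
  const l = n.left!;
  return { ...l, right: { ...n, left: l.right } };
}

function rotateLeft(n: Node): Node {
  const r = n.right!;
  return { ...r, left: { ...n, right: r.left } };
}

// Returns a NEW root; the old tree is never mutated.
function insert(root: Node | null, value: number): Node {
  if (root === null) {
    return { value, priority: Math.random(), left: null, right: null };
  }
  if (value < root.value) {
    const next = { ...root, left: insert(root.left, value) };
    // Restore the max-heap property on priorities.
    return next.left!.priority > next.priority ? rotateRight(next) : next;
  }
  const next = { ...root, right: insert(root.right, value) };
  return next.right!.priority > next.priority ? rotateLeft(next) : next;
}

// In-order traversal yields the values in sorted order.
function toArray(n: Node | null, out: number[] = []): number[] {
  if (n === null) return out;
  toArray(n.left, out);
  out.push(n.value);
  toArray(n.right, out);
  return out;
}

const v1 = [3, 1, 2].reduce(insert, null as Node | null);
const v2 = insert(v1, 0); // a brand-new version; v1 is untouched
```

Both `v1` and `v2` remain valid after the insert, which is what makes the "new version per keystroke" pattern cheap.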
Pagination
It was pointed out that it'd be ideal if Materialite could somehow use plain old JS arrays for the views. We certainly can when the final view is constrained to a reasonable size.
You can see where this is going. Materialite is getting support for after(cursor) and take(limit) operators. These operators constrain the size of the output view so, when users desire, they can materialize into plain old JS arrays with little to no cost.
In other words, pagination will become a part of the library. A page of a view can be incrementally maintained on any changes to data that falls within the page.
Later pages can be fetched by updating after.
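A rough sketch of the after(cursor)/take(limit) semantics over a sorted in-memory source (pageAfter is an illustrative name, not Materialite's API): seek to the first row strictly greater than the cursor, then take one page worth of rows.

```typescript
// Illustrative sketch of after/take over a sorted array: binary-search to the
// first row strictly greater than the cursor, then slice off `limit` rows.
// `pageAfter` is a hypothetical name, not Materialite's actual API.
function pageAfter(sorted: number[], cursor: number | null, limit: number): number[] {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cursor !== null && sorted[mid] <= cursor) lo = mid + 1;
    else hi = mid;
  }
  return sorted.slice(lo, lo + limit);
}

const source = [10, 20, 30, 40, 50];
const page1 = pageAfter(source, null, 2);     // first page
const page2 = pageAfter(source, page1[1], 2); // next page, via an updated cursor
```

Fetching the next page is just re-running the query with the cursor set to the last row of the previous page.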
Pagination also strikes a balance between two competing concerns:
Fast query results for visible data
Fast incremental results for all data
When materializing all pages into the view (e.g., all 4 million rows), handling changes in the data is quick and easy -- we just process the rows that were changed. Scrolling is also easy: all the records are there in memory and sorted. We simply jump to them. Updating filters, however, can be problematic. Changing filters can, in the worst case, require re-constructing a view that is the size of your source dataset. Re-creating a tree of 4 million items isn't cheap.
If the view is paginated then changing filters only requires constructing one page of the view.
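The "process only the rows that were changed" idea, restricted to a page, might look something like this (a minimal sketch with hypothetical names; Materialite's real views use the treap described above rather than arrays):

```typescript
// Hypothetical sketch: applying a single inserted row to a materialized page.
// A change that lands past the end of a full page can be ignored entirely;
// one that lands inside the page updates just that page.
type Page = { rows: number[]; limit: number };

function applyInsert(page: Page, row: number): Page {
  const last = page.rows[page.rows.length - 1];
  if (page.rows.length >= page.limit && last !== undefined && row > last) {
    return page; // outside the page's bounds: nothing to do
  }
  const rows = [...page.rows, row].sort((a, b) => a - b).slice(0, page.limit);
  return { rows, limit: page.limit };
}

const page = { rows: [10, 20, 30], limit: 3 };
const hit = applyInsert(page, 15);  // lands inside the page, so the page changes
const miss = applyInsert(page, 99); // past the page, so the page is reused as-is
```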
There is a caveat to pagination, however. If the source data is not sorted the same as the view, we'll have to re-scan the entire source collection to find the contents of the next page.
There are two options in this case:
Opt into the old behavior and materialize all the pages
Provide a sorted source
Hoisting
Providing a sorted source means that a view is able to jump to the next page of input. It jumps to after(cursor) and resumes processing for the next page there.
Being able to jump to the right position in the source is a property of the source. We have to "hoist" the after expression and pass it to the provided data source. Similar to this old post: https://tantaman.com/2022-05-26-query-planning.html
Note that after could occur after a filter but still needs to be passed down to the source:
source.filter(...).after(cursor)
A nice thing about after and hoisting it to a source + combining it with take is that we're no longer constrained to keeping everything in-memory. We can use sources persisted to disk, start reading them from after(cursor) and stop reading them at take(limit).
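Under these assumptions, a minimal sketch of hoisting might look like this (a sorted array stands in for a disk-backed source; all names are hypothetical, not Materialite's API). The source seeks to the cursor, filter still runs per row downstream, and take stops the scan early:

```typescript
// Illustrative sketch: the `after` cursor is hoisted past `filter` to the
// source, which starts iteration at the cursor instead of scanning from
// the beginning; `take` stops reading once the page is full.
type Pipeline = {
  filter?: (row: number) => boolean;
  after?: number; // hoisted down to the source
  take?: number;
};

function* scanFrom(sorted: number[], after: number | undefined) {
  // The source, not the pipeline, knows how to seek: binary-search to the cursor.
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (after !== undefined && sorted[mid] <= after) lo = mid + 1;
    else hi = mid;
  }
  for (let i = lo; i < sorted.length; i++) yield sorted[i];
}

function run(sorted: number[], p: Pipeline): number[] {
  const out: number[] = [];
  for (const row of scanFrom(sorted, p.after)) {
    if (p.filter && !p.filter(row)) continue; // filter still applies per row
    out.push(row);
    if (p.take !== undefined && out.length >= p.take) break; // stop reading early
  }
  return out;
}

const result = run([1, 2, 3, 4, 5, 6], { filter: (n) => n % 2 === 0, after: 2, take: 2 });
```

Because reading both starts at after and stops at take, the same shape works whether the rows live in memory or on disk.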
You can see where this is going too... SQLite, IndexedDB or other things as data sources for incremental computation.
If we're going to hoist after, we might as well make other operators hoistable.
Hoist:
filter
take
join
etc.
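One plausible shape for deciding what gets hoisted (purely illustrative, not Materialite's planner): hand the source the longest prefix of operators it supports, since pushing an operator below one that must stay in memory (e.g. a take below an unhoisted filter) would change the results.

```typescript
// Illustrative sketch of operator hoisting: the source absorbs the longest
// prefix of operators it supports; everything from the first unsupported
// operator onward stays in the in-memory pipeline. Hypothetical names.
type Op = { kind: 'filter' | 'after' | 'take' | 'join'; arg?: unknown };

function hoist(ops: Op[], supported: Set<Op['kind']>): { pushed: Op[]; remaining: Op[] } {
  let i = 0;
  while (i < ops.length && supported.has(ops[i].kind)) i++;
  return { pushed: ops.slice(0, i), remaining: ops.slice(i) };
}

const supported = new Set<Op['kind']>(['filter', 'after']);
const plan = hoist([{ kind: 'filter' }, { kind: 'after' }, { kind: 'take' }], supported);
// plan.pushed goes to the source (e.g. as a SQL WHERE + cursor);
// plan.remaining (`take` here) keeps running in memory.
```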
Where this leaves us
So that's Materialite as it originally existed: the simple version described in the first paragraphs. This Materialite is mostly ready, save for some optimization and additional testing of join, count, reduce. Then the more advanced features.