Beginning
Materialite started simply:
Take a source collection, set up an incremental computation pipeline against it, materialize that into a view.
There was:
No built-in pagination in the library. The view you get out is an incrementally maintained collection of all your data. This scaled fine even for 4 million rows of data in the final view (see the videos below, which demonstrate a computation over 4 million rows being maintained as items change).
No support for custom sources. Your source of data needed to be a Set or Map.
4million.webm
no-cpu-churn.webm
Because it could maintain computations over massive (for the browser, anyway) datasets, Materialite gained some paying interest pretty quickly, which has been driving the next round of revisions.
Revisions
In the original implementation of Materialite, scaling to 4 million rows (or even 100,000, if you need to make many updates to inner rows) required a custom data structure: in this case, a persistent treap.
This is because inserting into the middle of a very large JS array is expensive (O(n)). On top of that, if you're integrating with React, your state should be immutable, and copying large JS arrays is also expensive. If you're copying the materialized view on every keystroke just to update a single record... well, that is a problem. This is why Materialite used a custom data structure for views. We can update that structure, even create a new version of it without modifying the old one, in O(log n) time. In English: creating a new version of a 4,000,000-item view takes only ~22 comparisons, 23 for 8 million, 24 for 16 million, etc.
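The persistent-tree idea can be sketched as a persistent treap (a hypothetical standalone sketch, not Materialite's actual implementation): insert copies only the O(log n) nodes along the search path, so every older version survives unchanged.

```typescript
// Hypothetical sketch of a persistent treap over numbers. `insert` returns a
// new root and copies only the O(log n) nodes on the search path, so older
// versions remain intact and can be shared (e.g. as immutable React state).
type Node = {
  value: number;
  priority: number; // random heap priority keeps the tree balanced in expectation
  left: Node | null;
  right: Node | null;
};

function rotateRight(n: Node): Node {
  const l = n.left!;
  return { ...l, right: { ...n, left: l.right } };
}

function rotateLeft(n: Node): Node {
  const r = n.right!;
  return { ...r, left: { ...n, right: r.left } };
}

// Returns a NEW root; the old tree is never mutated.
function insert(root: Node | null, value: number): Node {
  if (root === null) {
    return { value, priority: Math.random(), left: null, right: null };
  }
  if (value < root.value) {
    const next = { ...root, left: insert(root.left, value) };
    // Restore the max-heap property on priorities.
    return next.left!.priority > next.priority ? rotateRight(next) : next;
  }
  const next = { ...root, right: insert(root.right, value) };
  return next.right!.priority > next.priority ? rotateLeft(next) : next;
}

// In-order traversal yields the values in sorted order.
function toArray(n: Node | null, out: number[] = []): number[] {
  if (n === null) return out;
  toArray(n.left, out);
  out.push(n.value);
  toArray(n.right, out);
  return out;
}

const v1 = [3, 1, 2].reduce(insert, null as Node | null);
const v2 = insert(v1, 0); // a brand-new version; v1 is untouched
```

Both `v1` and `v2` remain valid after the insert, which is what makes the "new version per keystroke" pattern cheap.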
Pagination
It was pointed out that it'd be ideal if Materialite could somehow use plain old JS arrays for the views. We certainly can when the final view is constrained to a reasonable size.
You can see where this is going. Materialite is getting support for after(cursor) and take(limit) operators. These operators constrain the size of the output view so, when users desire, they can materialize into plain old JS arrays with little to no cost.
In other words, pagination will become a part of the library. A page of a view can be incrementally maintained on any changes to data that falls within the page.
Later pages can be fetched by updating after.
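A rough sketch of the after(cursor)/take(limit) semantics over a sorted in-memory source (pageAfter is an illustrative name, not Materialite's API): seek to the first row strictly greater than the cursor, then take one page worth of rows.

```typescript
// Illustrative sketch of after/take over a sorted array: binary-search to the
// first row strictly greater than the cursor, then slice off `limit` rows.
// `pageAfter` is a hypothetical name, not Materialite's actual API.
function pageAfter(sorted: number[], cursor: number | null, limit: number): number[] {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (cursor !== null && sorted[mid] <= cursor) lo = mid + 1;
    else hi = mid;
  }
  return sorted.slice(lo, lo + limit);
}

const source = [10, 20, 30, 40, 50];
const page1 = pageAfter(source, null, 2);     // first page
const page2 = pageAfter(source, page1[1], 2); // next page, via an updated cursor
```

Fetching the next page is just re-running the query with the cursor set to the last row of the previous page.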
Pagination also strikes a balance between two competing concerns:
Fast query results for visible data
Fast incremental results for all data
When materializing all pages into the view (e.g., all 4 million rows), handling changes in the data is quick and easy -- we just process the rows that were changed. Scrolling is also easy: all the records are there in memory and sorted. We simply jump to them. Updating filters, however, can be problematic. Changing filters can, in the worst case, require re-constructing a view that is the size of your source dataset. Re-creating a tree of 4 million items isn't cheap.
If the view is paginated then changing filters only requires constructing one page of the view.
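The "process only the rows that were changed" idea, restricted to a page, might look something like this (a minimal sketch with hypothetical names; Materialite's real views use the treap described above rather than arrays):

```typescript
// Hypothetical sketch: applying a single inserted row to a materialized page.
// A change that lands past the end of a full page can be ignored entirely;
// one that lands inside the page updates just that page.
type Page = { rows: number[]; limit: number };

function applyInsert(page: Page, row: number): Page {
  const last = page.rows[page.rows.length - 1];
  if (page.rows.length >= page.limit && last !== undefined && row > last) {
    return page; // outside the page's bounds: nothing to do
  }
  const rows = [...page.rows, row].sort((a, b) => a - b).slice(0, page.limit);
  return { rows, limit: page.limit };
}

const page = { rows: [10, 20, 30], limit: 3 };
const hit = applyInsert(page, 15);  // lands inside the page, so the page changes
const miss = applyInsert(page, 99); // past the page, so the page is reused as-is
```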
There is a caveat to pagination, however. If the source data is not sorted the same as the view, we'll have to re-scan the entire source collection to find the contents of the next page.
There are two options in this case:
Opt into the old behavior and materialize all the pages
Provide a sorted source
Hoisting
Providing a sorted source means that a view is able to jump to the next page of input. It jumps to after(cursor) and resumes processing for the next page there.
Being able to jump to the right position in the source is a property of the source. We have to "hoist" the after expression and pass it to the provided data source. Similar to this old post: https://tantaman.com/2022-05-26-query-planning.html
Note that after could occur after a filter but still needs to be passed down to the source:
source.filter(...).after(cursor)
A nice thing about after and hoisting it to a source + combining it with take is that we're no longer constrained to keeping everything in-memory. We can use sources persisted to disk, start reading them from after(cursor) and stop reading them at take(limit).
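Under these assumptions, a minimal sketch of hoisting might look like this (a sorted array stands in for a disk-backed source; all names are hypothetical, not Materialite's API). The source seeks to the cursor, filter still runs per row downstream, and take stops the scan early:

```typescript
// Illustrative sketch: the `after` cursor is hoisted past `filter` to the
// source, which starts iteration at the cursor instead of scanning from
// the beginning; `take` stops reading once the page is full.
type Pipeline = {
  filter?: (row: number) => boolean;
  after?: number; // hoisted down to the source
  take?: number;
};

function* scanFrom(sorted: number[], after: number | undefined) {
  // The source, not the pipeline, knows how to seek: binary-search to the cursor.
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (after !== undefined && sorted[mid] <= after) lo = mid + 1;
    else hi = mid;
  }
  for (let i = lo; i < sorted.length; i++) yield sorted[i];
}

function run(sorted: number[], p: Pipeline): number[] {
  const out: number[] = [];
  for (const row of scanFrom(sorted, p.after)) {
    if (p.filter && !p.filter(row)) continue; // filter still applies per row
    out.push(row);
    if (p.take !== undefined && out.length >= p.take) break; // stop reading early
  }
  return out;
}

const result = run([1, 2, 3, 4, 5, 6], { filter: (n) => n % 2 === 0, after: 2, take: 2 });
```

Because reading both starts at after and stops at take, the same shape works whether the rows live in memory or on disk.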
You can see where this is going too... SQLite, IndexedDB or other things as data sources for incremental computation.
If we're going to hoist after, we might as well make other operators hoistable.
Hoist:
filter
take
join
etc.
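One plausible shape for deciding what gets hoisted (purely illustrative, not Materialite's planner): hand the source the longest prefix of operators it supports, since pushing an operator below one that must stay in memory (e.g. a take below an unhoisted filter) would change the results.

```typescript
// Illustrative sketch of operator hoisting: the source absorbs the longest
// prefix of operators it supports; everything from the first unsupported
// operator onward stays in the in-memory pipeline. Hypothetical names.
type Op = { kind: 'filter' | 'after' | 'take' | 'join'; arg?: unknown };

function hoist(ops: Op[], supported: Set<Op['kind']>): { pushed: Op[]; remaining: Op[] } {
  let i = 0;
  while (i < ops.length && supported.has(ops[i].kind)) i++;
  return { pushed: ops.slice(0, i), remaining: ops.slice(i) };
}

const supported = new Set<Op['kind']>(['filter', 'after']);
const plan = hoist([{ kind: 'filter' }, { kind: 'after' }, { kind: 'take' }], supported);
// plan.pushed goes to the source (e.g. as a SQL WHERE + cursor);
// plan.remaining (`take` here) keeps running in memory.
```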
Where this leaves us
So that's Materialite as it originally existed: the simple version described in the first paragraphs. This Materialite is mostly ready, save for some optimization and additional testing of join, count, reduce. Then the more advanced features.