Conversation

andymatuschak commented Jan 18, 2026

The existing implementation is O(n^2) in the number of nodes. That's fine for drawings, but it takes minutes to process big text documents because the reMarkable creates a node for each word.

This patch implements [Kahn's algorithm](https://en.wikipedia.org/wiki/Topological_sorting#Kahn's_algorithm). That would ordinarily be O(n), but to maintain deterministic ordering for concurrent edits (as the current algorithm does), we use a heap instead of a set, making the runtime O(n log n).

My text-heavy notebook now processes in about a second instead of minutes. I've verified that the tests pass as before.
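For context, here is a minimal sketch of heap-based Kahn's algorithm in Python. This is illustrative only, not the actual patch; it assumes node IDs are mutually comparable (e.g. ints or tuples), so the heap gives a deterministic tie-break among nodes whose order the edges don't constrain.

```python
import heapq
from collections import defaultdict

def toposort(nodes, edges):
    """nodes: iterable of comparable node IDs; edges: iterable of (parent, child) pairs."""
    children = defaultdict(list)
    in_degree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1

    # Seed the ready pool with every node that has no incoming edges.
    ready = [n for n, d in in_degree.items() if d == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)  # O(log n) pop; always the smallest ready node
        order.append(node)
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(in_degree):
        raise ValueError("cycle detected")
    return order
```

Each node is pushed and popped at most once, so the heap operations are the only source of the extra log n factor over a plain set-based Kahn's.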
andymatuschak marked this pull request as ready for review January 18, 2026 18:16
Azeirah (Contributor) commented Jan 18, 2026

This is amazing! There are two primary bottlenecks I've encountered in processing rM documents to PDF. Toposort is the first, PDF generation the second.

This makes the former negligible, leaving only PDF generation as a slow part of the pipeline!

Azeirah (Contributor) commented Jan 21, 2026

I'm not sure if @ricklupton is still actively maintaining rmscene and rmc. I've been maintaining forks at https://github.com/scrybbling-together/rmscene.git. I aim to review the code myself when I have time, hopefully this weekend.

andymatuschak (Author) commented

I'm not deep enough in this library to know, but it's possible we could get away with using sets rather than heaps—I don't know if it's actually important to maintain deterministic ordering for CRDT nodes which aren't properly ordered by their edges. I just maintained that to match the existing behavior. Probably the extra log(N) factor isn't all that meaningful—haven't tested.
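For concreteness, the set-based variant being floated here would differ from the sketch above only in how the ready pool is maintained. This is hypothetical and untested, as noted:

```python
# Hypothetical set-based variant of the main loop: O(1) pops instead of
# O(log n), but the order of unconstrained (concurrent) nodes becomes arbitrary.
ready = {n for n, d in in_degree.items() if d == 0}
order = []
while ready:
    node = ready.pop()  # arbitrary element; no deterministic tie-break
    order.append(node)
    for child in children[node]:
        in_degree[child] -= 1
        if in_degree[child] == 0:
            ready.add(child)
```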

Azeirah (Contributor) commented Jan 25, 2026

I have a dataset of real-world .rm files, and I sampled a bunch of them to get data. The O(n^2) labels are the original algorithm, and the O(n log n) labels are yours.

[plot: toposort runtime for the old O(n²) and new O(n log n) implementations across the sampled files]

The impact is mostly at the tail, so for larger files, processing is much faster.

I also ran a simulation sampling 10,000 files randomly, which is what the bottom two graphs show.

======================================================================
PERCENTILE COMPARISON
======================================================================
Percentile   Nodes     Old (O(n²))    New (O(n log n))   Speedup
----------------------------------------------------------------------
p50          30        7.2ms          7.0ms              1.0x
p75          241       47.3ms         39.7ms             1.2x
p90          568       120.1ms        93.3ms             1.3x
p95          906       199.9ms        145.0ms            1.4x
p99          2152      594.3ms        354.9ms            1.7x
p99.9        7621      3.86s          1.14s              3.4x
max          58155     1.8min         6.38s              16.8x

I do think optimizing further to O(n) would be only mildly meaningful here.

Note: this was a dataset of real files and there were 0 errors, so that's a good sign. In my opinion this is OK to merge.
