Skip to content

refactor: Use real iterators over nodes#375

Open
Peeja wants to merge 2 commits intofeat/faster-index-generationfrom
refactor/use-iterators
Open

refactor: Use real iterators over nodes#375
Peeja wants to merge 2 commits intofeat/faster-index-generationfrom
refactor/use-iterators

Conversation

@Peeja
Copy link
Member

@Peeja Peeja commented Mar 4, 2026

In response to #354 (comment)

PR Dependency Tree

This tree was auto-generated by Charcoal

@Peeja Peeja force-pushed the feat/faster-index-generation branch from 0be118d to 67683a6 Compare March 4, 2026 22:44
@Peeja Peeja force-pushed the refactor/use-iterators branch from d4ff53d to f3ae0c5 Compare March 4, 2026 22:44
Copy link
Member

@frrist frrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@volmedo volmedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

NodesByShard(ctx context.Context, shardID id.ShardID, startOffset uint64) ([]dagsmodel.Node, error)
// ForEachNodeInShard iterates over all the nodes for a given shard, in the
// order they should appear in the shard.
ForEachNodeInShard(ctx context.Context, shardID id.ShardID, startOffset uint64) iter.Seq2[NodeInShard, error]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, likely subjective preference: I'd find the name of iterator functions to be more idiomatic if the "ForEach" was dropped and the plural was used. I'd call them just NodesInShard and NodesInIndex. Personally, the resulting "for node = range for each node..." feels a bit awkward and redundant 🙂.

Comment on lines +365 to +366
yield(blobs.NodeInIndex{}, fmt.Errorf("failed to prepare statement: %w", err))
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: iterator functions should check what yield returns and only return if yield == false. That lets the caller decide whether the loop should end in case of error.

In this specific case it's not a huge deal because these errors will likely happen for every element, but I think checking the result of yield would be nice from a correctness standpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Peeja added a commit that referenced this pull request Mar 9, 2026
Especially for indexes with many shards, this is dramatically faster. We
were finding the shards in the index, and then for each shard, looping
over the nodes/slices. Now we find all of the nodes in a single query.
It doesn't even matter what order they come in in, as long as we see
them all.











#### PR Dependency Tree


* **PR #360** 👈
  * **PR #361**
    * **PR #362**
      * **PR #363**
        * **PR #364**
          * **PR #365**
            * **PR #366**
  * **PR #375**

This tree was auto-generated by
[Charcoal](https://github.com/danerwilliams/charcoal)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants