Minimal Steps by Segment Index PR by ashwinsawant17 · Pull Request #239 · cucapra/pollen

ashwinsawant17 · 2025-11-24T12:55:48Z

I created the initialization of the index, and some basic get functions for the slice of StepRefs and its length (given by the len() function for Spans).

sampsyo

Hi, @ashwinsawant17! To bring this to the next step, can you please do a couple of minor "homework" tasks?

Split this into a separate file, probably index.rs or something. It would be great to make sure this stays distinct from the core FlatGFA data structures.
Run cargo fmt. Doing this is always a good idea when opening/updating a PR so your reviewers don't get distracted by formatting details.

sampsyo

Oops, I hit "submit" before including granular code-level comments. Here are some suggestions on the work so far!

sampsyo · 2026-01-07T21:47:38Z

flatgfa/src/flatgfa.rs

+
+
+        /// helper to extract the segment index from the stepref
+        fn segment_of_step(fgfa: &FlatGFA, step: &StepRef) -> usize {


You can move this function to the top level, because it doesn't seem to reference the context.

Moved to the top level in index.rs in the latest commit.

sampsyo · 2026-01-07T21:50:32Z

flatgfa/src/flatgfa.rs

+
+
+        /// helper to extract the segment index from the stepref
+        fn segment_of_step(fgfa: &FlatGFA, step: &StepRef) -> usize {


Maybe this would be a little clearer/more type-safe if it returned an Id<Segment> instead of a plain usize?
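To illustrate the type-safety point, here is a minimal sketch of a typed ID. The `Id`, `Segment`, and `Path` definitions and the `depth_of` helper are hypothetical stand-ins written for this example; the real FlatGFA types may well differ. Only the `Id::new(...)` and `.index()` call shapes are taken from the diff in this PR.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; the real FlatGFA types may differ.
#[allow(dead_code)]
struct Segment;
#[allow(dead_code)]
struct Path;

/// A typed index into a pool of `T` values.
struct Id<T> {
    index: u32,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    fn new(index: usize) -> Self {
        Id { index: index as u32, _marker: PhantomData }
    }

    fn index(&self) -> usize {
        self.index as usize
    }
}

/// Look up a per-segment value; only an `Id<Segment>` is accepted here.
fn depth_of(depths: &[usize], seg: Id<Segment>) -> usize {
    depths[seg.index()]
}

fn main() {
    let depths: Vec<usize> = vec![3, 0, 7];
    let seg: Id<Segment> = Id::new(2);
    println!("depth = {}", depth_of(&depths, seg));

    // With a plain usize, a path index could be passed by mistake.
    // With typed IDs, the equivalent mistake is a compile error:
    // let path: Id<Path> = Id::new(2);
    // depth_of(&depths, path); // expected `Id<Segment>`, found `Id<Path>`
}
```

The numeric index is still available through `.index()` when it is needed for array access, so the typed return value costs nothing at the call sites that only want a number.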

sampsyo · 2026-01-07T21:54:03Z

flatgfa/src/flatgfa.rs

+
+        // organize by the index of the segment in the segment pool
+        all_steps.sort_by_key(|a| {
+            segment_of_step(fgfa, a)


It occurs to me that we could maybe make this a little more efficient by preserving the segment ID that we already had available in the previous stanza. That is, when we do the .enumerate() iteration, we know the segment ID at that point, so we could simply store that in the array. The array would then store (Id<Segment>, StepRef) pairs, which we could then sort conventionally without needing a custom sort key (which entails another lookup per element).
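A rough sketch of that idea, under simplifying assumptions: `SegId` and this two-field `StepRef` are hypothetical stand-ins for the real types, and paths are modeled as plain vectors of segment IDs. The point is only that the segment ID is recorded while iterating, so the sort needs no lookup.

```rust
// Hypothetical, simplified stand-ins for the FlatGFA types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SegId(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct StepRef {
    path: u32,
    offset: u32,
}

/// Collect one (segment, step) pair per step, then sort the pairs.
fn sorted_steps(paths: &[Vec<SegId>]) -> Vec<(SegId, StepRef)> {
    let mut all = Vec::new();
    for (path_idx, steps) in paths.iter().enumerate() {
        for (offset, &seg) in steps.iter().enumerate() {
            // The segment ID is known here, so store it next to the step.
            let step = StepRef { path: path_idx as u32, offset: offset as u32 };
            all.push((seg, step));
        }
    }
    // Tuples compare lexicographically: by segment first, then by step.
    all.sort();
    all
}

fn main() {
    let paths = vec![
        vec![SegId(2), SegId(0)],
        vec![SegId(0), SegId(1)],
    ];
    for (seg, step) in sorted_steps(&paths) {
        println!("{:?} -> {:?}", seg, step);
    }
}
```

If only the segment should participate in the ordering, `sort_by_key(|pair| pair.0)` on the stored ID works too, and still avoids going back to the graph for each element.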

sampsyo · 2026-01-07T21:55:38Z

flatgfa/src/flatgfa.rs

+        all_steps.sort_by_key(|a| {
+            segment_of_step(fgfa, a)
+        });
+


This could perhaps use a longer comment here describing the strategy for the rest of the function. The idea is that, now that we've sorted stuff, we need to identify the "runs" of StepRefs that are for the same segment; those "runs" become the spans that go in segment_steps. But a little explanation of how that works would go a long way…
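For readers following along, the run-finding strategy can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `Span` here is a simplified half-open range, and the sorted array is reduced to just the segment index of each step.

```rust
/// Half-open range [start, end) into the sorted step array.
/// A simplified, hypothetical stand-in for the real `Span` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// `sorted_segs[i]` is the segment index of the i-th step after sorting.
/// Returns one span per segment; segments with no steps keep an empty span.
fn spans_from_sorted(sorted_segs: &[usize], num_segs: usize) -> Vec<Span> {
    let mut spans = vec![Span { start: 0, end: 0 }; num_segs];
    let mut run_start = 0;
    for i in 1..=sorted_segs.len() {
        // A run ends at the end of the array or where the segment changes.
        if i == sorted_segs.len() || sorted_segs[i] != sorted_segs[run_start] {
            spans[sorted_segs[run_start]] = Span { start: run_start, end: i };
            run_start = i;
        }
    }
    spans
}

fn main() {
    // Segment 2 never appears, so its span stays empty.
    let sorted_segs = [0, 0, 1, 3, 3, 3];
    for (seg, span) in spans_from_sorted(&sorted_segs, 4).iter().enumerate() {
        println!("segment {}: {:?}", seg, span);
    }
}
```

Because the input is sorted, each segment's steps are contiguous, so a single linear pass is enough to find every run.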

sampsyo

Here are a few initial comments!

sampsyo · 2026-02-23T16:54:44Z

flatgfa/src/ops/depth.rs

-                // The first traversal of this path over this segment.
-                uniq_depths[seg_id] += 1;
-                seen.set(seg_id, true);
+    if use_index {


Because these two routes have such completely different implementations, I say let's just put them in separate functions. It will make each one easier to read.
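As a rough illustration of that split (not the actual code in depth.rs), the two routes could each live in their own function, with a thin dispatcher choosing between them. All names here are hypothetical, paths are modeled as vectors of segment indices, and the index is reduced to one `(start, end)` pair per segment.

```rust
/// Route 1: walk every path and count the steps that land on each segment.
fn depth_by_scan(paths: &[Vec<usize>], num_segs: usize) -> Vec<usize> {
    let mut depths = vec![0; num_segs];
    for path in paths {
        for &seg in path {
            depths[seg] += 1;
        }
    }
    depths
}

/// Route 2: read depths off a prebuilt steps-by-segment index, modeled
/// here as one half-open (start, end) range per segment.
fn depth_by_index(spans: &[(usize, usize)]) -> Vec<usize> {
    spans.iter().map(|&(start, end)| end - start).collect()
}

/// Thin dispatcher: the flag only selects which function runs.
fn depth(paths: &[Vec<usize>], spans: &[(usize, usize)], use_index: bool) -> Vec<usize> {
    if use_index {
        depth_by_index(spans)
    } else {
        depth_by_scan(paths, spans.len())
    }
}

fn main() {
    let paths = vec![vec![0, 1, 0], vec![1, 3]];
    // Spans for the sorted step array [0, 0, 1, 1, 3]; segment 2 is empty.
    let spans = vec![(0, 2), (2, 4), (0, 0), (4, 5)];
    println!("{:?}", depth(&paths, &spans, false));
    println!("{:?}", depth(&paths, &spans, true));
}
```

Keeping the branch in one small function also makes it easy to check that both routes agree on the same input.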

sampsyo · 2026-02-23T16:57:40Z

flatgfa/src/index.rs

+
+        // sort the steprefs by the index of the segment in the segment pool
+        // by extracting the actual numeric index from the Id<Segment>
+        all_steps.sort_by_key(|a| a.0.index());


You mentioned you weren't sure whether this addressed my comment about sorting stuff. It does! This is exactly what I was thinking.

sampsyo · 2026-02-23T16:59:14Z

flatgfa/src/index.rs

+impl StepsBySegIndex {
+    pub fn new(fgfa: &FlatGFA) -> Self {
+        // will be our `steps` vector that contains all steprefs
+        let mut all_steps = Vec::new();


This would be a bit clearer and more efficient using .collect() into a Vec, which avoids the mutability and the pushes in a loop. Here's how that would look:

let all_steps: Vec<_> = fgfa.paths.items().map(|(path_id, path)| {
    // your loop body here
    (seg, step)
}).collect();
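One detail worth spelling out: if each path contributes several steps, the closure has to yield many pairs per path, which points to `flat_map` rather than `map`. The sketch below assumes that shape; `StepRef` and the vector-of-vectors path model are hypothetical simplifications, not the FlatGFA API.

```rust
// Hypothetical, simplified stand-in for the real step reference type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StepRef {
    path: usize,
    offset: usize,
}

/// Build the (segment, step) array with iterator adapters instead of
/// a mutable vector and pushes inside nested loops.
fn collect_steps(paths: &[Vec<usize>]) -> Vec<(usize, StepRef)> {
    paths
        .iter()
        .enumerate()
        .flat_map(|(path, steps)| {
            // One pair per step; `move` copies `path` into the inner closure.
            steps
                .iter()
                .enumerate()
                .map(move |(offset, &seg)| (seg, StepRef { path, offset }))
        })
        .collect()
}

fn main() {
    let paths = vec![vec![5, 6], vec![7]];
    for (seg, step) in collect_steps(&paths) {
        println!("segment {} <- {:?}", seg, step);
    }
}
```

The result still needs to be declared `mut` (or rebound) before sorting, but the construction itself no longer mutates anything.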

sampsyo

And a couple more comments about building up the vector of spans.

sampsyo · 2026-02-23T17:01:39Z

flatgfa/src/index.rs

+        for _ in 0..fgfa.segs.len() {
+            segment_steps.push(Span::new_empty());
+        }


If you do want to initialize a big array, there is a short syntax for that too: vec![initial; n]. So something like vec![Span::new_empty(); fgfa.segs.len()].

This was my initial thought! But I ran into errors with Spans not being cloneable (or something similar, I'll update this comment with the exact error I got).

On further inspection, I think I would just need to make StepRef cloneable. I think that's the reason I was running into issues earlier.

Yeah, we could make it cloneable!
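A possible explanation for the error, sketched under assumptions: `vec![value; n]` requires the element type to be `Clone`, and `#[derive(Clone)]` on a generic struct adds a `T: Clone` bound. If the real `Span<T>` derives `Clone` this way (not confirmed in this thread), then `Span<StepRef>` is only cloneable once `StepRef` is. The types below are hypothetical stand-ins, not the FlatGFA definitions.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins; the real FlatGFA definitions may differ.
// Removing `Clone` here makes `empty_spans` below fail to compile.
#[derive(Debug, Clone, PartialEq)]
struct StepRef;

// `#[derive(Clone)]` on a generic struct adds a `T: Clone` bound,
// so `Span<StepRef>` is `Clone` only when `StepRef` is too.
#[derive(Debug, Clone, PartialEq)]
struct Span<T> {
    start: u32,
    end: u32,
    _marker: PhantomData<T>,
}

impl<T> Span<T> {
    fn new_empty() -> Self {
        Span { start: 0, end: 0, _marker: PhantomData }
    }
}

/// `vec![value; count]` clones `value`, so the element must be `Clone`.
fn empty_spans(n: usize) -> Vec<Span<StepRef>> {
    vec![Span::new_empty(); n]
}

/// Alternative with no `Clone` requirement: construct each element anew.
fn empty_spans_no_clone(n: usize) -> Vec<Span<StepRef>> {
    std::iter::repeat_with(Span::<StepRef>::new_empty).take(n).collect()
}

fn main() {
    assert_eq!(empty_spans(4), empty_spans_no_clone(4));
    println!("{} empty spans", empty_spans(4).len());
}
```

Either route produces the same vector; `repeat_with` is the fallback when adding a `Clone` impl is not desirable.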

sampsyo · 2026-02-23T17:02:58Z

flatgfa/src/index.rs

+        // TODO: we definitely don't need to do another iteration to fill this with empty spans
+        // It's likely more efficient to push empty spans as needed


I think it would be worth measuring the cost. You could do a little benchmarking with Hyperfine to see what happens if you make these uninitialized.

sampsyo · 2026-02-23T17:04:24Z

flatgfa/src/index.rs

+                let new_span: Span<StepRef> = Span::new(Id::new(span_start), Id::new(i));
+
+                // assign the span to the index in segment_steps that maps to the index of the segment in the FlatGFA segment pool
+                segment_steps[seg_ind.index()] = new_span;


Because you are starting with an initialized array of spans, you could just mutate it in place instead of replacing the old one. It is worth measuring whether this makes a difference too…

Building segment_steps, and some basic get fns

4cec0f0

ashwinsawant17 requested a review from sampsyo November 24, 2025 12:55

sampsyo requested changes Jan 7, 2026

WinashAntsaw and others added 5 commits February 2, 2026 16:01

moved stepsegindex to new index.rs, added to lib.rs, more thoroughly …

0eb4415

…commented

removing code in index.rs from flatgfa.rs

523f690

added -n or --index flag to depth command, pushed bug fix that didnt …

26fb424

…get pushed earlier

removed debug print statement from index.rs

5ed77de

fixed bug with 0 depth segments, added unique depth functionality

887d72a

sampsyo requested changes Feb 23, 2026

3 participants