Skip to content

Minimal Steps by Segment Index PR#239

Open
ashwinsawant17 wants to merge 6 commits intomainfrom
ashwin/step-seg-index
Open

Minimal Steps by Segment Index PR#239
ashwinsawant17 wants to merge 6 commits intomainfrom
ashwin/step-seg-index

Conversation

@ashwinsawant17
Copy link

I created the initialization of the index, and some basic get functions for the slice of StepRefs and its length (given by the len() function for Spans).

Copy link
Collaborator

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @ashwinsawant17! To bring this to the next step, can you please do a couple of minor "homework" tasks?

  1. Split this into a separate file, probably index.rs or something. It would be great to make sure this stays distinct from the core FlatGFA data structures.
  2. Run cargo fmt. Doing this is always a good idea when opening/updating a PR so your reviewers don't get distracted by formatting details.

Copy link
Collaborator

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I hit "submit" before including granular code-level comments. Here are some suggestions on the work so far!



/// helper to extract the segment index from the stepref
fn segment_of_step(fgfa: &FlatGFA, step: &StepRef) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this function to the top level, because it doesn't seem to reference the context.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the top level in index.rs in the latest commit.



/// helper to extract the segment index from the stepref
fn segment_of_step(fgfa: &FlatGFA, step: &StepRef) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this would be a little clearer/more type-safe if it returned an Id<Segment> instead of a plain u32?


// organize by the index of the segment in the segment pool
all_steps.sort_by_key(|a| {
segment_of_step(fgfa, a)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that we could maybe make this a little more efficient by preserving the segment ID that we already had available in the previous stanza. That is, when we do the .enumerate() iteration, we know the segment ID at that point—so we could simply store that in the array. The array would then store (Id<Segment>, PathRef) pairs, which we could then sort conventionally without needing a custom sort key (that entails another lookup per element).

all_steps.sort_by_key(|a| {
segment_of_step(fgfa, a)
});

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could perhaps use a little bit more of a long comment here describing the strategy for the rest of the function. The idea is that, now that we've sorted stuff, we now need to identify the "runs" of PathRefs that are for the same segment; those "runs" become the spans that go in segment_steps. But a little explanation of how that works would go a long way…

Copy link
Collaborator

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few initial comments!

// The first traversal of this path over this segment.
uniq_depths[seg_id] += 1;
seen.set(seg_id, true);
if use_index {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these two routes have such completely different implementations, I say let's just put them in separate functions. It will make each one easier to read.


// sort the steprefs by the index of the segment in the segment pool
// by extracting the actual numeric index from the Id<Segment>
all_steps.sort_by_key(|a| a.0.index());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned you weren't sure whether this addressed my comment about sorting stuff. It does! This is exactly what I was thinking.

impl StepsBySegIndex {
pub fn new(fgfa: &FlatGFA) -> Self {
// will be our `steps` vector that contains all steprefs
let mut all_steps = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a bit clearer and more efficient using a Vec::collect() to avoid the mutability and the pushes in a loop. Here's how that would look:

let all_steps: Vec<_> = fgfa.paths.items().map(|(path_id, path)| {
  // your loop body here
  (seg, step)
}).collect();

Copy link
Collaborator

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a couple more comments about building up the vector of spans.

Comment on lines +78 to +80
for _ in 0..fgfa.segs.len() {
segment_steps.push(Span::new_empty());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do want to initialize a big array, there is a short syntax for that too: vec![n; initial]. So something like vec![fgfa.segs.len(); Span::new_empty()].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my initial thought! But I ran into errors with Spans not being cloneable (or something similar, I'll update this comment with the exact error I got).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further inspection, I think I would just need to make sure that StepRef needs to be cloneable. I think that's the reason I was running into issues earlier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could make it cloneable!

Comment on lines +76 to +77
// TODO: we definitely don't need to do another iteration to fill this with empty spans
// It's likely more efficient to push empty spans as needed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth measuring the cost. You could do a little benchmarking with Hyperfine to see what happens if you make these uninitialized.

Comment on lines +88 to +91
let new_span: Span<StepRef> = Span::new(Id::new(span_start), Id::new(i));

// assign the span to the index in segment_steps that maps to the index of the segment in the FlatGFA segment pool
segment_steps[seg_ind.index()] = new_span;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you are starting with an initialized array of spans, you could just mutate it in place instead of replacing the old one. It is also worth measuring whether this makes a difference too…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants