Skip to content

Add make stitch operation#148

Merged
dkhofer merged 7 commits intomainfrom
make-stitch
Mar 5, 2025
Merged

Add make stitch operation#148
dkhofer merged 7 commits intomainfrom
make-stitch

Conversation

@dkhofer
Copy link
Contributor

@dkhofer dkhofer commented Feb 15, 2025

Add "make stitch" operation.

cargo run -- init
cargo run -- import --fasta fixtures/simple.fa
# Create two example integration points
cargo run -- update --fasta fixtures/aa.fa --start 3 --end 5 --region-name m123 --new-sample test1
cargo run -- update --fasta fixtures/aa.fa --start 15 --end 20 --region-name m123 --sample test1 --new-sample test2
# Derive chunks with breakpoints in between the edits
cargo run -- derive-chunks --new-sample splits --region m123 --breakpoints 1,8,25 --sample test2
# View one of the new chunks
cargo run -- view -s splits m123.2
# Stitch two chunks together
cargo run -- make-stitch --sample splits --new-sample stitched --regions m123.2,m123.3 --new-region m123.2-3

.collect::<Vec<_>>();
let source_edges = Edge::bulk_load(conn, &subgraph_edge_ids);

// Instead of reusing the existing edges, we create copies of the nodes and edges. This
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like the easiest solution. From talking with Bob about it today, one problem is propagating annotations. If nodes change from one sample to a child sample, but represent the same part of a sequence, it's harder to pass annotations along. I'm open to other ideas for what to do

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the challenges hit when reusing nodes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave this comment here for future-me?

TODO: Annotation support. We do not reuse node ids because nodes control the graph topology. If we duplicated nodes via these operations, it would leave to unintentional cycles. A node cannot exist in 2 places. This will be non-intuitive for annotation propagation and needs to be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, done

@dkhofer dkhofer requested a review from Chris7 February 26, 2025 21:30
let new_end_edge = child_block_group_edges
.iter()
.find(|e| {
e.edge.target_node_id == PATH_END_NODE_ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the Node::is_start_node/etc. methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yep, done

new_path_edge_ids.push(concatenated_path_edges[concatenated_path_edges.len() - 1].id);

// Part 5: Create bg edges for the new edges
let mut chromosome_index_counter = edges_to_reuse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're adding +1 here, it seems like we're trying to get the first unused chromosome_index. If this is true, shouldn't this by max_by?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 No idea why I had min_by there, fixed

Copy link
Contributor

@Chris7 Chris7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info on reusing node ids. non-blocking comments added.

@dkhofer dkhofer merged commit 505f1c8 into main Mar 5, 2025
1 check passed
@dkhofer dkhofer deleted the make-stitch branch March 5, 2025 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants