@ESultanik
Collaborator

Summary

  • Replace the repeated linear search (O(n²) overall) with a heap-based O(n log n) algorithm
  • Improves performance for diffs with many overlapping edit bounds

Changes

In bounds.py:make_distinct(), the previous implementation searched all intervals linearly on each iteration to find the biggest interval. This made the overall complexity O(n²).
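
For illustration, the quadratic pattern described above looks roughly like the sketch below. This is not the code from bounds.py; the function name and the name-to-size mapping are hypothetical stand-ins for the real interval objects.

```python
def largest_first_linear(sizes):
    """Visit items largest-first by rescanning all remaining items each time.

    `sizes` is a hypothetical mapping of interval name -> interval size.
    """
    remaining = dict(sizes)
    order = []
    while remaining:
        # O(n) scan on every one of O(n) iterations, so O(n^2) overall
        biggest = max(remaining, key=remaining.get)
        order.append(biggest)
        del remaining[biggest]
    return order


linear_order = largest_first_linear({"a": 3, "b": 7, "c": 5})
```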

Now we use a max-heap (via heapq with negative sizes) to find the biggest interval in O(log n) amortized time:

  • Track valid intervals with a set for O(1) membership checks
  • Re-add intervals to heap with updated sizes after tightening
  • Handle stale heap entries by verifying sizes match before use
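
A minimal sketch of the technique in the list above, assuming intervals can be represented by a hashable key and an integer size. The class and method names (BiggestFirst, update, pop_biggest) are hypothetical and do not come from bounds.py; only the heapq calls are real library API.

```python
import heapq


class BiggestFirst:
    """Max-heap of (size, key) entries with lazy invalidation of stale ones."""

    def __init__(self, sizes):
        self.sizes = dict(sizes)      # current size of each key
        self.valid = set(self.sizes)  # keys not yet processed, O(1) lookup
        # heapq is a min-heap, so store negated sizes to pop the largest first
        self.heap = [(-size, key) for key, size in self.sizes.items()]
        heapq.heapify(self.heap)

    def update(self, key, new_size):
        """Record a tightened size; the old heap entry stays behind as stale."""
        if key in self.valid:
            self.sizes[key] = new_size
            heapq.heappush(self.heap, (-new_size, key))

    def pop_biggest(self):
        """Return the valid key with the largest current size, or None."""
        while self.heap:
            neg_size, key = heapq.heappop(self.heap)
            # Skip entries for processed keys or for outdated sizes
            if key in self.valid and self.sizes[key] == -neg_size:
                self.valid.remove(key)
                return key
        return None


queue = BiggestFirst({"a": 3, "b": 7, "c": 5})
queue.update("b", 2)  # "b" was tightened, so its (-7, "b") entry is now stale
heap_order = [queue.pop_biggest() for _ in range(3)]
```

Stale entries are never removed from the middle of the heap. They are discarded when they reach the top, which keeps each push and pop at O(log n) at the cost of some extra heap entries.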

Complexity Analysis

  • Before: O(n²) - an O(n) linear search on each of O(n) iterations
  • After: O(n log n) - O(log n) heap operations on each iteration

Test plan

  • All 66 tests pass

🤖 Generated with Claude Code

Replace O(n²) linear search for biggest interval with heap-based
approach. The previous implementation searched all intervals linearly
on each iteration of the outer loop. Now we use a max-heap to find
the biggest interval in O(log n) amortized time.

Key changes:
- Use heapq with negative sizes for max-heap behavior
- Track valid intervals with a set for O(1) membership checks
- Re-add intervals to heap with updated sizes after tightening
- Handle stale heap entries by verifying sizes before use

This improves performance for diffs with many overlapping edit bounds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>