Add support for restoring indexes from HTTP tar snapshots #143
Implements async index restoration via restore_from parameter in CreateIndexRequest. Indexes can now be created from remote tar snapshots accessible via HTTP/HTTPS URLs.
Key changes:
- Added restore_from field to CreateIndexRequest and IndexOptions
- New being_restored flag and restoration_complete condition variable
- Background restoration using Scheduler.runOnce() for async execution
- waitForIndexReady() method for clients to wait on restoration
- Safety checks to prevent operations on restoring indexes
- Cluster mode explicitly rejects restore (not compatible with NATS sync)
- Refactored snapshot.zig to work with pre-initialized Index instances
HTTP API returns ready: false when restoration is in progress. Clients must either poll or use waitForIndexReady() until complete.
Walkthrough
Adds restore-from support for single-node indices, surfaces restoration state in index lifecycle, refactors snapshot restore to operate on pre-initialized Index objects, extends the create-index API with
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/ClusterMultiIndex.zig (1 hunks)
- src/MultiIndex.zig (11 hunks)
- src/api.zig (1 hunks)
- src/server.zig (3 hunks)
- src/snapshot.zig (9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/MultiIndex.zig
📄 CodeRabbit inference engine (CLAUDE.md)
Support managing multiple named indexes via the service
Files:
src/MultiIndex.zig
🧠 Learnings (3)
📚 Learning: 2025-08-17T10:37:12.955Z
Learnt from: CR
PR: acoustid/acoustid-index#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T10:37:12.955Z
Learning: Applies to src/MultiIndex.zig : Support managing multiple named indexes via the service
Applied to files:
- src/ClusterMultiIndex.zig
- src/MultiIndex.zig
📚 Learning: 2025-08-22T19:57:04.144Z
Learnt from: lalinsky
PR: acoustid/acoustid-index#114
File: src/server.zig:520-536
Timestamp: 2025-08-22T19:57:04.144Z
Learning: In src/server.zig handlePutIndex: PUT /index without restore options should be idempotent (not fail if index exists), but PUT /index with restore options should fail if index already exists (return 409). The fail_if_exists flag should only be set when restore options are provided.
Applied to files:
- src/ClusterMultiIndex.zig
- src/server.zig
- src/MultiIndex.zig
📚 Learning: 2025-09-14T10:39:11.033Z
Learnt from: lalinsky
PR: acoustid/acoustid-index#134
File: src/ClusterMultiIndex.zig:245-261
Timestamp: 2025-09-14T10:39:11.033Z
Learning: In src/ClusterMultiIndex.zig, the stopping flag effectively prevents map mutations during graceful shutdown - handleMetaMessage and handleUpdateMessage both check stopping.load(.acquire) early and return, preventing processMetaOperation calls that would add/remove updaters from index_updaters map, making iterator-across-unlock pattern safe.
Applied to files:
src/ClusterMultiIndex.zig
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: fpindex
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/MultiIndex.zig (1)
982-1013: Consider exposing restoration state in list.
listIndexes() currently returns deleted status but not being_restored. Since CreateIndexResponse exposes ready: false for restoring indexes, exposing restoration state here would help clients track restore progress across the fleet without polling individual index info endpoints.
Add a being_restored field to IndexInfo and populate it:
 pub const IndexInfo = struct {
     name: []const u8,
     generation: u64,
     deleted: bool,
+    being_restored: bool = false,
 };
Then in listIndexes():
 try result.append(IndexInfo{
     .name = name,
     .generation = generation,
     .deleted = deleted,
+    .being_restored = index_ref.being_restored,
 });
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/MultiIndex.zig(11 hunks)
🔇 Additional comments (7)
src/MultiIndex.zig (7)
34-34: LGTM! Clean addition of the restore_from parameter to IndexOptions.
63-66: LGTM! Restoration state fields are well-chosen for coordinating async restoration. The condition variable enables blocking waits via waitForIndexReady().
572-572: LGTM! The assertion ensures no code path can borrow an index that's being restored, which would violate the restoration invariant.
589-595: LGTM! Proper guards prevent operations on indexes being restored. The check after createNewIndex at lines 628-630 correctly handles the restore-from case where the index isn't ready to borrow yet. Also applies to: 625-630
639-681: LGTM! Solid implementation of a blocking wait with timeout. The re-check at lines 669-676 after each wakeup correctly handles concurrent index deletion or recreation, preventing confusion from identically named but different IndexRefs.
709-711: LGTM! Correct guard prevents deletion of indexes undergoing restoration.
897-918: Guard against concurrent restore correctly implemented. The past review comment flagged the risk of overwriting a live IndexRef during restore. Lines 902-910 now correctly prevent this by checking if the index exists and is live before calling createNewIndex, returning IndexBeingRestored for ongoing restores or IndexAlreadyExists for completed indexes. This aligns with the learning that PUT with restore_from should not be idempotent. Based on learnings
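For readers without the diff open, the guard described in the 897-918 comment can be sketched roughly as follows. This is an illustration only, not the code under review: the IndexRef here is a hypothetical two-field stand-in, and checkRestoreAllowed is an invented helper name. Only the two error names come from the comment above.

```zig
const std = @import("std");

// Hypothetical, simplified stand-in for the PR's IndexRef; only the state
// needed to illustrate the guard is modelled.
const IndexRef = struct {
    has_index: bool = false, // a live, fully created index is attached
    being_restored: bool = false, // a background restore is in flight
};

const CreateError = error{ IndexBeingRestored, IndexAlreadyExists };

// Decide whether a create-with-restore request may proceed.
// `existing` is null when no entry with that name is present.
// (Invented helper name, for illustration only.)
fn checkRestoreAllowed(existing: ?IndexRef) CreateError!void {
    const ref = existing orelse return; // no entry: safe to create
    if (ref.being_restored) return error.IndexBeingRestored;
    if (ref.has_index) return error.IndexAlreadyExists;
    // An entry exists but holds no live index (for example, it was
    // deleted), so creating a fresh one does not overwrite anything.
}
```

Checking being_restored first means a second PUT that arrives mid-restore is reported as an ongoing restore rather than as an existing index, which matches the distinction the comment draws.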
Validates the invariant that index_ref.index.has_value is true before dereferencing index_ref.index.value when being_restored is true.
Ensures being_restored is cleared if index name/URL duplication or scheduler.runOnce fails during index creation with restore_from.
Ensures all in-progress restore operations finish before cleaning up indexes to prevent use-after-free or incomplete restore state.
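The second note above (clearing being_restored when a later creation step fails) maps naturally onto Zig's errdefer. The following is a minimal sketch of that pattern, not necessarily how the PR implements it: IndexRef is reduced to a single field, and scheduleRestore is a hypothetical stand-in for the name/URL duplication and Scheduler.runOnce() steps, with a `fail` parameter added purely to simulate an error.

```zig
const std = @import("std");

// Hypothetical stand-in for the per-index state touched during creation.
const IndexRef = struct {
    being_restored: bool = false,
};

// Hypothetical scheduling step. `fail` simulates an allocation or
// scheduler error so the cleanup path can be exercised.
fn scheduleRestore(fail: bool) error{ScheduleFailed}!void {
    if (fail) return error.ScheduleFailed;
}

// Mark the index as restoring, then hand off to the background task.
fn beginRestore(ref: *IndexRef, fail: bool) error{ScheduleFailed}!void {
    ref.being_restored = true;
    // Runs only if a later step in this function returns an error,
    // so a failed setup never leaves the flag stuck at true.
    errdefer ref.being_restored = false;

    try scheduleRestore(fail);
}
```

On the success path the flag stays set and is expected to be cleared later by the restore task itself; on the error path the caller sees the error and an index that is no longer marked as restoring.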
Summary
Implements async index restoration from remote tar snapshots via HTTP/HTTPS URLs.
- restore_from parameter to CreateIndexRequest
- being_restored during async download/restore
- Scheduler.runOnce() for fire-and-forget execution
- ready: false in API response when restoration in progress
Implementation Details
MultiIndex.zig:
- being_restored flag and restoration_complete condition variable
- restoreIndexTask() handles async download, restore, and error cleanup
- waitForIndexReady() allows blocking wait with timeout
- borrowIndex() prevents race conditions
snapshot.zig:
- downloadAndRestoreSnapshot() downloads and extracts into existing index
HTTP API:
- {ready: false} for restore operations
- IndexBeingRestored error mapped to 404 (GET) or 409 (DELETE)
Test Coverage
All existing unit tests pass. Integration tests for the full restore flow would be beneficial but are not included in this PR.
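As a starting point for such tests, the blocking wait with timeout listed under MultiIndex.zig can be modelled in isolation. The sketch below is an assumption-laden reduction, not the PR's waitForIndexReady(): RestoreState, markReady and waitReady are hypothetical names, the re-check for concurrent index deletion or recreation is omitted, and it assumes a Zig release where std.Thread.Condition.timedWait and std.time.Timer are available (for example 0.13 or 0.14).

```zig
const std = @import("std");

// Hypothetical, reduced model of the restoration state. The field names
// being_restored and restoration_complete follow the PR description;
// everything else is invented for illustration.
const RestoreState = struct {
    mutex: std.Thread.Mutex = .{},
    restoration_complete: std.Thread.Condition = .{},
    being_restored: bool = true,

    // Called by the background task when the restore finishes.
    fn markReady(self: *RestoreState) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.being_restored = false;
        self.restoration_complete.broadcast();
    }

    // Block until the restore finishes or `timeout_ns` has elapsed.
    fn waitReady(self: *RestoreState, timeout_ns: u64) error{Timeout}!void {
        // Treat a missing monotonic clock as a timeout to keep the
        // error set small in this sketch.
        var timer = std.time.Timer.start() catch return error.Timeout;
        self.mutex.lock();
        defer self.mutex.unlock();
        // Loop to tolerate spurious wakeups: re-check the flag and wait
        // only for the time that remains.
        while (self.being_restored) {
            const elapsed = timer.read();
            if (elapsed >= timeout_ns) return error.Timeout;
            try self.restoration_complete.timedWait(&self.mutex, timeout_ns - elapsed);
        }
    }
};
```

A test built on this shape would start a restore, call the wait with a generous timeout, and assert both the ready path and the timeout path; against the real service the same two outcomes would be observed through the ready field of the HTTP response.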