fix: retry cloudflared quick tunnel startup#66
Conversation
📝 WalkthroughWalkthroughThis pull request refactors CloudflaredManager to introduce dependency injection, configurable retry logic, and enhanced error handling. The manager now supports customizable startup timeouts, maximum retry attempts, and retry delays, with a robust start flow that gracefully handles process failures and implements a graceful stop mechanism. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Manager as CloudflaredManager
participant SpawnProcess as spawnProcess (injected)
participant Streams as Process Streams
Caller->>Manager: start(port)
loop Retry Loop (1 to maxAttempts)
Manager->>Manager: startOnce(port)
Manager->>SpawnProcess: spawn cloudflared with port
SpawnProcess-->>Streams: create stdout/stderr
par Stream Handlers
Streams->>Manager: emit stderr data
Manager->>Manager: accumulate stderr & check for errors
and
Streams->>Manager: emit stdout data
Manager->>Manager: extract URL via regex
end
alt URL Found
Manager-->>Manager: resolve promise with CloudflaredResult
else Error Detected
Manager->>Manager: stop() - kill process
Manager->>Manager: delay(retryDelayMs)
Note over Manager: Continue to next retry attempt
else Timeout/Exit
Manager->>Manager: descriptive error from stderr
Manager->>Manager: attempt next retry or throw
end
end
Manager-->>Caller: CloudflaredResult | Error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip Migrating from UI to YAML configuration.Use the |
There was a problem hiding this comment.
Pull request overview
This PR improves the robustness of Cloudflare cloudflared quick tunnel startup by retrying startup failures and by waiting for an actual trycloudflare.com URL instead of failing on early stderr “error” output.
Changes:
- Add retry support and configurable startup timing (timeout / attempts / delay) to
CloudflaredManager. - Change startup parsing to resolve only when a
trycloudflare.comURL is detected and improve error messages by summarizing stderr. - Add Vitest regression coverage for noisy stderr and transient first-attempt failures.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/backend/cloudflared/manager.ts |
Implements retry loop, injectable spawn/exec for testing, and improved startup output handling. |
tests/backend/cloudflared-manager.test.ts |
Adds regression tests for noisy stderr and transient startup failures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| timeout = setTimeout(() => { | ||
| if (this.process === child) { | ||
| this.process = undefined; | ||
| } | ||
| finish(() => reject(buildError("Timed out waiting for cloudflared URL"))); | ||
| }, this.startupTimeoutMs); |
| clearTimeout(timeout); | ||
| reject(new Error(`cloudflared error: ${text.trim()}`)); | ||
| } | ||
| const onStderr = (chunk: Buffer | string): void => { |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 039595b8-5f18-4685-ae9d-ae9b267598fa
📒 Files selected for processing (2)
src/backend/cloudflared/manager.tstests/backend/cloudflared-manager.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (7)
tests/backend/cloudflared-manager.test.ts (1)
1-77: Test suite looks well-designed for the retry logic.The dependency injection approach with
FakeCloudflaredProcessand mockedspawnProcessprovides good isolation. The two tests cover the core scenarios from the PR objectives: handling noisy stderr and retrying after transient failures.Consider adding a test for the exhausted-retries path (all attempts fail) to verify the final error message includes attempt count and last error.
[approve_code_changes, suggest_optional_refactor]src/backend/cloudflared/manager.ts (6)
8-38: LGTM!The dependency injection types and default constants are well-defined. The
SpawnedProcessinterface properly abstracts the child process for testability while matching the essential Node.js ChildProcess shape.
55-62: LGTM!Clean helper for truncating stderr in error messages. The math is correct (397 + 3 = 400 max chars).
64-82: LGTM!The constructor properly implements dependency injection with sensible defaults. The double cast on line 78 is necessary for type compatibility with the simplified
SpawnedProcessinterface.
112-131: LGTM!The retry loop is well-structured: catches errors, cleans up with
stop(), delays between attempts, and provides a clear final error message with attempt count.
133-138: LGTM!The
stop()method correctly guards against double-kill and uses SIGTERM for graceful termination.
220-256: LGTM!The
isExecutableAvailable,tryInstall, anddelaymethods properly use the injectedexecFileImplfor testability. Thedelayhelper correctly short-circuits for zero/negative values, which the tests rely on.
| timeout = setTimeout(() => { | ||
| if (this.process === child) { | ||
| this.process = undefined; | ||
| } | ||
| finish(() => reject(buildError("Timed out waiting for cloudflared URL"))); | ||
| }, this.startupTimeoutMs); |
There was a problem hiding this comment.
Resource leak: child process not killed on timeout.
When the timeout fires, this.process is cleared before rejecting, but the actual child process is never killed. When the retry loop catches the error and calls this.stop(), this.process is already undefined, so the orphaned process continues running.
🐛 Proposed fix: kill the child before clearing the reference
timeout = setTimeout(() => {
+ if (!child.killed) {
+ child.kill("SIGTERM");
+ }
if (this.process === child) {
this.process = undefined;
}
finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
}, this.startupTimeoutMs);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| timeout = setTimeout(() => { | |
| if (this.process === child) { | |
| this.process = undefined; | |
| } | |
| finish(() => reject(buildError("Timed out waiting for cloudflared URL"))); | |
| }, this.startupTimeoutMs); | |
| timeout = setTimeout(() => { | |
| if (!child.killed) { | |
| child.kill("SIGTERM"); | |
| } | |
| if (this.process === child) { | |
| this.process = undefined; | |
| } | |
| finish(() => reject(buildError("Timed out waiting for cloudflared URL"))); | |
| }, this.startupTimeoutMs); |
|
Thanks for the contribution! Can you provide some background on what you saw happen? I've personally never seen cloudflared fail to start up |
Summary
Testing