Skip to content

fix: retry cloudflared quick tunnel startup#66

Open
cmungall wants to merge 1 commit into
DagsHub:mainfrom
cmungall:feat/issue-0-cloudflared-retry
Open

fix: retry cloudflared quick tunnel startup#66
cmungall wants to merge 1 commit into
DagsHub:mainfrom
cmungall:feat/issue-0-cloudflared-retry

Conversation

@cmungall

Copy link
Copy Markdown

Summary

  • retry Cloudflare quick tunnel startup before surfacing an error
  • wait for the actual trycloudflare URL instead of failing on any early stderr line containing "error"
  • add regression tests for noisy stderr and transient first-attempt failures

Testing

  • npm test
  • npm run typecheck

Copilot AI review requested due to automatic review settings March 18, 2026 00:08
@coderabbitai

coderabbitai Bot commented Mar 18, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This pull request refactors CloudflaredManager to introduce dependency injection, configurable retry logic, and enhanced error handling. The manager now supports customizable startup timeouts, maximum retry attempts, and retry delays, with a robust start flow that gracefully handles process failures and implements a graceful stop mechanism.

Changes

Cohort / File(s) Summary
Manager Refactoring
src/backend/cloudflared/manager.ts
Introduced dependency injection via CloudflaredManagerOptions constructor, replacing hardcoded spawn behavior. Added retryable start lifecycle with startOnce() encapsulation, configurable timeouts/attempts, and graceful stop(). Enhanced error handling with stderr accumulation, URL extraction via regex, and descriptive error messages. Made isExecutableAvailable and tryInstall injectable for testability.
Test Suite
tests/backend/cloudflared-manager.test.ts
Added comprehensive test suite with FakeCloudflaredProcess mock. Tests validate successful start with error-to-success stderr/stdout flow and retry scenario where first attempt fails and second succeeds, verifying process invocation counts and URL extraction.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Manager as CloudflaredManager
    participant SpawnProcess as spawnProcess (injected)
    participant Streams as Process Streams

    Caller->>Manager: start(port)
    
    loop Retry Loop (1 to maxAttempts)
        Manager->>Manager: startOnce(port)
        Manager->>SpawnProcess: spawn cloudflared with port
        SpawnProcess-->>Streams: create stdout/stderr
        
        par Stream Handlers
            Streams->>Manager: emit stderr data
            Manager->>Manager: accumulate stderr & check for errors
        and
            Streams->>Manager: emit stdout data
            Manager->>Manager: extract URL via regex
        end
        
        alt URL Found
            Manager-->>Manager: resolve promise with CloudflaredResult
        else Error Detected
            Manager->>Manager: stop() - kill process
            Manager->>Manager: delay(retryDelayMs)
            Note over Manager: Continue to next retry attempt
        else Timeout/Exit
            Manager->>Manager: descriptive error from stderr
            Manager->>Manager: attempt next retry or throw
        end
    end
    
    Manager-->>Caller: CloudflaredResult | Error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 A tunnel we build with care and retry,
When clouds flutter down, we spring back high!
With streams that flow and URLs we seek,
Our manager now stands strong, not weak! ✨

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding retry logic to cloudflared tunnel startup, which aligns with the primary objective of the changeset.
Description check ✅ Passed The description is directly related to the changeset, providing specific details about the changes: retry logic, URL waiting behavior, and new tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Migrating from UI to YAML configuration.

Use the @coderabbitai configuration command in a PR comment to get a dump of all your UI settings in YAML format. You can then edit this YAML file and upload it to the root of your repository to configure CodeRabbit programmatically.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the robustness of Cloudflare cloudflared quick tunnel startup by retrying startup failures and by waiting for an actual trycloudflare.com URL instead of failing on early stderr “error” output.

Changes:

  • Add retry support and configurable startup timing (timeout / attempts / delay) to CloudflaredManager.
  • Change startup parsing to resolve only when a trycloudflare.com URL is detected and improve error messages by summarizing stderr.
  • Add Vitest regression coverage for noisy stderr and transient first-attempt failures.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/backend/cloudflared/manager.ts Implements retry loop, injectable spawn/exec for testing, and improved startup output handling.
tests/backend/cloudflared-manager.test.ts Adds regression tests for noisy stderr and transient startup failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +211 to +216
timeout = setTimeout(() => {
if (this.process === child) {
this.process = undefined;
}
finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
}, this.startupTimeoutMs);
clearTimeout(timeout);
reject(new Error(`cloudflared error: ${text.trim()}`));
}
const onStderr = (chunk: Buffer | string): void => {

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 039595b8-5f18-4685-ae9d-ae9b267598fa

📥 Commits

Reviewing files that changed from the base of the PR and between 493c404 and a4074b0.

📒 Files selected for processing (2)
  • src/backend/cloudflared/manager.ts
  • tests/backend/cloudflared-manager.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (7)
tests/backend/cloudflared-manager.test.ts (1)

1-77: Test suite looks well-designed for the retry logic.

The dependency injection approach with FakeCloudflaredProcess and mocked spawnProcess provides good isolation. The two tests cover the core scenarios from the PR objectives: handling noisy stderr and retrying after transient failures.

Consider adding a test for the exhausted-retries path (all attempts fail) to verify the final error message includes attempt count and last error.
[approve_code_changes, suggest_optional_refactor]

src/backend/cloudflared/manager.ts (6)

8-38: LGTM!

The dependency injection types and default constants are well-defined. The SpawnedProcess interface properly abstracts the child process for testability while matching the essential Node.js ChildProcess shape.


55-62: LGTM!

Clean helper for truncating stderr in error messages. The math is correct (397 + 3 = 400 max chars).


64-82: LGTM!

The constructor properly implements dependency injection with sensible defaults. The double cast on line 78 is necessary for type compatibility with the simplified SpawnedProcess interface.


112-131: LGTM!

The retry loop is well-structured: catches errors, cleans up with stop(), delays between attempts, and provides a clear final error message with attempt count.


133-138: LGTM!

The stop() method correctly guards against double-kill and uses SIGTERM for graceful termination.


220-256: LGTM!

The isExecutableAvailable, tryInstall, and delay methods properly use the injected execFileImpl for testability. The delay helper correctly short-circuits for zero/negative values, which the tests rely on.

Comment on lines +211 to +216
timeout = setTimeout(() => {
if (this.process === child) {
this.process = undefined;
}
finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
}, this.startupTimeoutMs);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resource leak: child process not killed on timeout.

When the timeout fires, this.process is cleared before rejecting, but the actual child process is never killed. When the retry loop catches the error and calls this.stop(), this.process is already undefined, so the orphaned process continues running.

🐛 Proposed fix: kill the child before clearing the reference
       timeout = setTimeout(() => {
+        if (!child.killed) {
+          child.kill("SIGTERM");
+        }
         if (this.process === child) {
           this.process = undefined;
         }
         finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
       }, this.startupTimeoutMs);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
timeout = setTimeout(() => {
if (this.process === child) {
this.process = undefined;
}
finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
}, this.startupTimeoutMs);
timeout = setTimeout(() => {
if (!child.killed) {
child.kill("SIGTERM");
}
if (this.process === child) {
this.process = undefined;
}
finish(() => reject(buildError("Timed out waiting for cloudflared URL")));
}, this.startupTimeoutMs);

@guysmoilov guysmoilov added area:backend Backend/server code area:network Connections and client networking labels Mar 18, 2026
@guysmoilov guysmoilov self-requested a review March 18, 2026 08:39
@guysmoilov

Copy link
Copy Markdown
Member

Thanks for the contribution!
Until I get to reviewing it myself, please check out the comments from copilot and coderabbit.

Can you provide some background on what you saw happen? I've personally never seen cloudflared fail to start up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:backend Backend/server code area:network Connections and client networking

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants