Skip to content

fix: revert to per-resource watcher + increase FD limit at startup#238

Merged
johnlanni merged 1 commit into
mainfrom
fix/centralized-watcher-dispatcher
Feb 1, 2026
Merged

fix: revert to per-resource watcher + increase FD limit at startup#238
johnlanni merged 1 commit into
mainfrom
fix/centralized-watcher-dispatcher

Conversation

@johnlanni
Copy link
Copy Markdown
Contributor

Problem

PR #236's shared watcher approach causes runtime failure:

command failed err="failed to create shared file watcher: too many open files"

Root Cause Analysis

The shared watcher design has fundamental architectural flaws:

  1. Event channel conflict: Each fileREST instance starts a goroutine reading from the same watcher.Events channel

    • Multiple goroutines competing for events
    • Events are distributed randomly (only one goroutine receives each event)
    • Resources miss their own file change events
  2. Directory watch conflicts: Each resource calls dirWatcher.Add(objRootPath) on different subdirectories

    • Potentially causes fsnotify internal issues
  3. Still hits FD limit: Even with shared watcher, other initialization steps may consume FDs before watcher creation

Solution

Revert to per-resource watcher (simpler & correct) + increase system FD limit early:

Changes

  1. main.go: Add init() to increase FD limit to 65535 before any initialization

    • Runs before any watcher/file operations
    • Logs current and updated limits for debugging
    • Gracefully handles permission errors
  2. file_rest.go: Restore per-resource watcher creation

    • Each resource gets isolated event stream (correct behavior)
    • Add resource name to error messages for debugging
    • Properly close watcher in Destroy()
  3. apiserver.go: Remove shared watcher logic

    • Simplify code
    • Remove fsnotify import

Why This Works

  • FD limit increased early: Before any file operations
  • Proper event isolation: Each resource processes only its own events
  • Simpler architecture: Less complexity = fewer bugs
  • Typical deployments have <50 resources: Even at 1 watcher/resource, well under 65535 limit

Testing

  • Compiles successfully
  • FD limit increased at startup (logged)
  • Each resource independently watches its directory
  • No event distribution conflicts

Alternative Considered

A truly shared watcher would require:

  • Single goroutine reading events
  • Event dispatcher routing to correct resource based on file path
  • Complex synchronization

This adds significant complexity for minimal benefit given FD limits are easily increased.


Supersedes #236 and #237

The shared watcher approach has fundamental issues:
- Each fileREST instance starts a goroutine reading from the same Events channel
- This causes event distribution conflicts between resources
- Multiple resources adding the same directory causes issues

Solution:
1. Revert to each resource having its own watcher
2. Increase file descriptor limit to 65535 in main.go init()
3. Add better error messages when watcher creation fails

This ensures proper event isolation while providing enough FD headroom.
@johnlanni johnlanni requested a review from CH3CHO as a code owner February 1, 2026 15:36
@lingma-agents
Copy link
Copy Markdown

lingma-agents Bot commented Feb 1, 2026

🔍 代码审查进行中

⏳ 正在审查

⏰️ 剩余时间:约需数分钟

🔄 分支流向: fix/centralized-watcher-dispatchermain

📦 提交: 审查当前PR从a67f01187d6aae的提交。


📒 文件清单 (3 个文件)
📝 变更: 3 个文件

📝 变更文件:

  • src/apiserver/main.go
  • src/apiserver/pkg/apiserver/apiserver.go
  • src/apiserver/pkg/registry/file_rest.go

@johnlanni johnlanni merged commit 9b8c790 into main Feb 1, 2026
3 of 4 checks passed
@lingma-agents
Copy link
Copy Markdown

lingma-agents Bot commented Feb 1, 2026

CodeReview流程已终止

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant