fix: use shared fsnotify watcher to avoid 'too many open files' error #236

Merged
johnlanni merged 1 commit into main from fix/shared-watcher on Feb 1, 2026

@johnlanni
Contributor

Problem

The API server crashes with panic: too many open files when creating REST storage for resources.

Root cause: Each resource type (ConfigMap, Secret, Ingress, etc.) creates its own fsnotify.Watcher instance in NewFileREST(). Every watcher consumes file descriptors (on Linux, each one also opens its own inotify instance), so when the number of watchers pushes the process past the system limit, the application crashes.

Solution

Use a shared fsnotify.Watcher across all fileREST instances:

  1. Create one global watcher in completedConfig.New() for File storage mode
  2. Pass the shared watcher to all NewFileREST() calls
  3. Update Destroy() to not close the watcher (managed by server lifecycle)
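
The three steps above can be sketched as follows. This is a minimal illustration of the ownership pattern, not the code from this PR: pathWatcher, countingWatcher, newFileREST, and registerAll are hypothetical names, and countingWatcher is a stand-in so the sketch runs with the standard library only. The real *fsnotify.Watcher has Add and Close methods with the same signatures, so it would satisfy the pathWatcher interface.

```go
package main

import "fmt"

// pathWatcher is the subset of *fsnotify.Watcher that this sketch relies on.
type pathWatcher interface {
	Add(name string) error
	Close() error
}

// countingWatcher is a hypothetical stand-in for the real watcher.
// It only records which paths were added and whether it was closed.
type countingWatcher struct {
	paths  []string
	closed bool
}

func (w *countingWatcher) Add(name string) error {
	if w.closed {
		return fmt.Errorf("watcher is closed")
	}
	w.paths = append(w.paths, name)
	return nil
}

func (w *countingWatcher) Close() error {
	w.closed = true
	return nil
}

// fileREST borrows the shared watcher; it does not own it.
type fileREST struct {
	dir     string
	watcher pathWatcher
}

// newFileREST registers its directory on the shared watcher
// instead of creating a watcher of its own (step 2).
func newFileREST(dir string, shared pathWatcher) (*fileREST, error) {
	if err := shared.Add(dir); err != nil {
		return nil, err
	}
	return &fileREST{dir: dir, watcher: shared}, nil
}

// Destroy deliberately leaves the shared watcher open (step 3);
// whoever created the watcher is responsible for closing it.
func (f *fileREST) Destroy() {}

// registerAll builds one storage per directory on a single watcher.
func registerAll(shared pathWatcher, dirs []string) ([]*fileREST, error) {
	stores := make([]*fileREST, 0, len(dirs))
	for _, dir := range dirs {
		s, err := newFileREST(dir, shared)
		if err != nil {
			return nil, err
		}
		stores = append(stores, s)
	}
	return stores, nil
}

func main() {
	shared := &countingWatcher{} // step 1: one watcher for the whole server
	defer shared.Close()         // closed once, by the server lifecycle

	stores, err := registerAll(shared, []string{"configmaps", "secrets", "ingresses"})
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	for _, s := range stores {
		s.Destroy()
	}
	fmt.Println(len(shared.paths), shared.closed) // prints "3 false"
}
```

With this shape, the number of watchers stays at one regardless of how many resource types are registered, and destroying a single storage cannot break event delivery for the others.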

Changes

  • src/apiserver/pkg/apiserver/apiserver.go: Create and pass shared watcher
  • src/apiserver/pkg/registry/file_rest.go: Accept shared watcher parameter, remove per-instance watcher creation

Testing

Before: The application crashes with "too many open files" when registering multiple resources.
After: All resources share one watcher, so watcher file descriptor usage no longer grows with the number of resource types.

Fixes the panic reported in production environments.

@johnlanni johnlanni requested a review from CH3CHO as a code owner February 1, 2026 14:37

lingma-agents Bot commented Feb 1, 2026

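🔍 Code review in progress

⏳ Reviewing

⏰️ Time remaining: a few minutes

🔄 Branch flow: fix/shared-watcher → main

📦 Commits: Reviewing the current PR's commits from 5d222ac to 64986c6.


📒 File list (2 files)
📝 Changed: 2 files

📝 Changed files: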

  • src/apiserver/pkg/apiserver/apiserver.go
  • src/apiserver/pkg/registry/file_rest.go

@johnlanni johnlanni merged commit 7948cb4 into main Feb 1, 2026
3 of 4 checks passed

lingma-agents Bot commented Feb 1, 2026

Code review process terminated

johnlanni added a commit to higress-group/higress that referenced this pull request Feb 2, 2026
Add troubleshooting section for 'too many open files' error caused by
insufficient fs.inotify.max_user_instances limit. This commonly occurs
on systems with many Docker containers.

Solution: Increase the limit to 8192 via sysctl.

Related: higress-group/higress-standalone#236
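
A quick way to see whether a host is affected is to read the current limit before changing it. The sketch below is an assumption-laden illustration rather than part of the referenced commit: parseLimit and belowRecommended are hypothetical helpers, the path is the standard Linux procfs location for fs.inotify.max_user_instances, and 8192 is the value named in the commit message above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// recommended is the limit suggested in the commit message.
const recommended = 8192

// parseLimit converts the raw procfs content (e.g. "128\n") to an int.
func parseLimit(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// belowRecommended reports whether the limit is under the suggested value.
func belowRecommended(limit int) bool {
	return limit < recommended
}

func main() {
	// Linux-only path; on other systems the read simply fails.
	raw, err := os.ReadFile("/proc/sys/fs/inotify/max_user_instances")
	if err != nil {
		fmt.Println("could not read inotify limit:", err)
		return
	}
	limit, err := parseLimit(string(raw))
	if err != nil {
		fmt.Println("unexpected limit format:", err)
		return
	}
	if belowRecommended(limit) {
		fmt.Printf("max_user_instances is %d; consider raising it to %d via sysctl\n", limit, recommended)
		return
	}
	fmt.Printf("max_user_instances is %d; no change needed\n", limit)
}
```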
johnlanni added a commit to higress-group/higress that referenced this pull request Feb 2, 2026
Move detailed troubleshooting content from SKILL.md to
references/TROUBLESHOOTING.md to keep the main skill file concise.

Changes:
- Create references/TROUBLESHOOTING.md with all troubleshooting content
- Replace Troubleshooting section in SKILL.md with summary + link
- Add inotify max_user_instances issue and solution

Related: higress-group/higress-standalone#236