Add Critical Logging for Low Disk Space Condition #9617

Open: wants to merge 2 commits into base: master
26 changes: 26 additions & 0 deletions healthcheck/fdcheck.go
@@ -0,0 +1,26 @@
package healthcheck

import (
"errors"
"fmt"
"os"
"syscall"
)

// CheckFileDescriptors checks if there are any free file descriptors available.
// It returns an error if no free file descriptors are available or if an unexpected error occurs.
func CheckFileDescriptors() error {

Review comment on roughly lines 12-26:

This check only fires once file descriptors are completely exhausted, which may be too late for preventive action. Consider accepting a threshold parameter (similar to how DiskCheck uses RequiredRemaining) so that the check warns when file descriptors are running low but not yet exhausted.
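
A minimal sketch of the threshold idea from the comment above, assuming a Linux host where /proc/self/fd lists the process's open descriptors. The names fdHeadroom, checkFDHeadroom, openFDs, softFDLimit and the requiredFree parameter are hypothetical and not part of this PR; only os.ReadDir and syscall.Getrlimit are standard library calls.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fdHeadroom returns the fraction of the descriptor limit that is still
// unused, given the number of open descriptors and the soft limit.
func fdHeadroom(open, limit uint64) float64 {
	if limit == 0 || open >= limit {
		return 0
	}
	return float64(limit-open) / float64(limit)
}

// checkFDHeadroom fails when the free fraction drops below requiredFree.
// requiredFree is a hypothetical parameter, analogous in spirit to
// DiskCheck's RequiredRemaining.
func checkFDHeadroom(open, limit uint64, requiredFree float64) error {
	free := fdHeadroom(open, limit)
	if free < requiredFree {
		return fmt.Errorf("require: %.2f of file descriptors free, "+
			"got: %.2f (%d of %d in use)", requiredFree, free,
			open, limit)
	}
	return nil
}

// openFDs counts this process's open descriptors. Assumption: Linux only,
// since it relies on /proc/self/fd. Reading the directory itself needs a
// descriptor, so this call fails once descriptors are fully exhausted.
func openFDs() (uint64, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return uint64(len(entries)), nil
}

// softFDLimit returns the soft RLIMIT_NOFILE value for this process.
func softFDLimit() (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, err
	}
	return uint64(lim.Cur), nil
}

func main() {
	open, err := openFDs()
	if err != nil {
		fmt.Println("unable to count descriptors:", err)
		return
	}
	limit, err := softFDLimit()
	if err != nil {
		fmt.Println("unable to read descriptor limit:", err)
		return
	}
	// Warn when less than 10% of the limit remains (illustrative value).
	if err := checkFDHeadroom(open, limit, 0.10); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Printf("ok: %d of %d descriptors in use\n", open, limit)
}
```

Keeping the ratio calculation separate from the platform-specific counting makes the threshold logic testable without exhausting real descriptors. The existing open-/dev/null probe could remain as a fallback on platforms without /proc.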

// Attempt to open /dev/null to test for available file descriptors
fd, err := os.OpenFile(os.DevNull, os.O_RDONLY, 0)
if err != nil {
// Check if the error is due to "too many open files"
if errors.Is(err, syscall.EMFILE) {
return fmt.Errorf("no free file descriptors available")

Consider enhancing this error message to provide more context about the implications:

Suggested change
return fmt.Errorf("no free file descriptors available")
return fmt.Errorf("no free file descriptors available; node operations requiring new connections may fail")

}

return fmt.Errorf("error checking file descriptors: %w", err)
}

fd.Close()
return nil
}
6 changes: 6 additions & 0 deletions lncfg/healthcheck.go
@@ -29,6 +29,8 @@ type HealthCheckConfig struct {

DiskCheck *DiskCheckConfig `group:"diskspace" namespace:"diskspace"`

FileDescriptorCheck *DiskCheckConfig `group:"file_descriptor" namespace:"file_descriptor"`

This new configuration option needs to be documented in the CLI help text and sample configurations. Ensure users understand:

  1. What this check monitors
  2. How to configure it
  3. What actions to take if the check fails
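
One way the documentation request above could be approached is sketched below, assuming the project derives its CLI help from `long` and `description` struct tags. FileDescriptorCheckConfig, helpLines, the description strings, and the `healthcheck.file_descriptor` option prefix are all hypothetical; the PR itself reuses DiskCheckConfig and does not define any of them. The sketch uses only the standard library to print the tags, so it does not depend on any particular flag parser.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// FileDescriptorCheckConfig is a hypothetical dedicated config type whose
// struct tags carry the user-facing help text for each option.
type FileDescriptorCheckConfig struct {
	Interval time.Duration `long:"interval" description:"How often to probe for free file descriptors."`
	Attempts int           `long:"attempts" description:"Probes to try before the check fails; 0 disables the check."`
	Timeout  time.Duration `long:"timeout" description:"Time allowed for a single probe."`
	Backoff  time.Duration `long:"backoff" description:"Wait between failed probes."`
}

// helpLines renders one help line per field from its struct tags.
// cfg must be a struct value (not a pointer).
func helpLines(prefix string, cfg interface{}) []string {
	t := reflect.TypeOf(cfg)
	lines := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		lines = append(lines, fmt.Sprintf("--%s.%s: %s", prefix,
			f.Tag.Get("long"), f.Tag.Get("description")))
	}
	return lines
}

func main() {
	// The prefix is an assumption based on the namespace tag in the diff.
	cfg := FileDescriptorCheckConfig{}
	for _, l := range helpLines("healthcheck.file_descriptor", cfg) {
		fmt.Println(l)
	}
}
```

A dedicated type would also avoid exposing a disk-specific setting such as RequiredRemaining under the file descriptor namespace.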


TLSCheck *CheckConfig `group:"tls" namespace:"tls"`

TorConnection *CheckConfig `group:"torconnection" namespace:"torconnection"`
@@ -48,6 +50,10 @@ func (h *HealthCheckConfig) Validate() error {
return err
}

if err := h.FileDescriptorCheck.validate("file descriptor"); err != nil {
return err
}

if err := h.TLSCheck.validate("tls"); err != nil {
return err
}
26 changes: 25 additions & 1 deletion server.go
@@ -2022,6 +2022,14 @@ func (s *server) createLivenessMonitor(cfg *Config, cc *chainreg.ChainControl,
return nil
}

// Define the critical threshold (e.g., 5%)
const criticalThreshold = 0.05

// If free space is lesser than critical value, log a warning
if free < criticalThreshold {
srvrLog.Errorf("Disk space low: %.1f%% free space remaining", free*100)
}

return fmt.Errorf("require: %v free space, got: %v",
cfg.HealthChecks.DiskCheck.RequiredRemaining,
free)
@@ -2032,6 +2040,22 @@ func (s *server) createLivenessMonitor(cfg *Config, cc *chainreg.ChainControl,
cfg.HealthChecks.DiskCheck.Attempts,
)

// Add file descriptor check
fdCheck := healthcheck.NewObservation(
"file descriptors",
func() error {
if err := healthcheck.CheckFileDescriptors(); err != nil {
srvrLog.Criticalf("CRITICAL: No free file descriptors available: %v", err)
return err
}
return nil
},
cfg.HealthChecks.FileDescriptorCheck.Interval,
cfg.HealthChecks.FileDescriptorCheck.Timeout,
cfg.HealthChecks.FileDescriptorCheck.Backoff,
cfg.HealthChecks.FileDescriptorCheck.Attempts,
)

tlsHealthCheck := healthcheck.NewObservation(
"tls",
func() error {
@@ -2057,7 +2081,7 @@ func (s *server) createLivenessMonitor(cfg *Config, cc *chainreg.ChainControl,
)

checks := []*healthcheck.Observation{
chainHealthCheck, diskCheck, tlsHealthCheck,
chainHealthCheck, diskCheck, fdCheck, tlsHealthCheck,
}

// If Tor is enabled, add the healthcheck for tor connection.