Description
🐞 Bug Report
Affected Services [REQUIRED]
App Function SDK
Is this a regression?
No.
Description and Minimal Reproduction [REQUIRED]
🔥 Exception or Error
We are experiencing an error with the Run()
and Stop()
functions. From this error, we have found a second bug that causes a panic.
Error 1: Call to Stop() fails:
Having an application runnung:
err := app.service.Run()
if err != nil {
app.lc.Errorf("Run returned error: %s", err.Error())
return -1
}
When Stop()
is called either via a signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
or directly calling the Stop()
function, the defered functions from the Run()
return an error:
level=INFO ts=2025-03-26T08:55:37.572349939Z app=neurona-app-dlm source=service.go:262 msg="Terminating: svc.Stop called"
level=INFO ts=2025-03-26T08:55:37.572388486Z app=neurona-app-dlm source=messaging.go:107 msg="Exiting waiting for MessageBus 'edgex/events/#' topic messages"
level=INFO ts=2025-03-26T08:55:37.572408793Z app=neurona-app-dlm source=messaging.go:123 msg="Exiting waiting for MessageBus errors and background publishing"
level=INFO ts=2025-03-26T08:55:37.572420665Z app=neurona-app-dlm source=configupdates.go:55 msg="Exiting waiting for App Service configuration updates"
level=INFO ts=2025-03-26T08:55:37.572436479Z app=neurona-app-dlm source=config.go:852 msg="Watching for 'Writable' configuration changes has stopped"
level=INFO ts=2025-03-26T08:55:37.572443286Z app=neurona-app-dlm source=manager.go:110 msg="Exited Metrics Manager Run..."
level=INFO ts=2025-03-26T08:55:37.572442284Z app=neurona-app-dlm source=config.go:852 msg="Watching for 'Writable' configuration changes has stopped"
level=INFO ts=2025-03-26T08:55:37.572452654Z app=neurona-app-dlm source=config.go:790 msg="Watching for 'Writable' configuration changes has stopped"
level=INFO ts=2025-03-26T08:55:37.57252818Z app=neurona-app-dlm source=messaging.go:95 msg="Disconnected from MessageBus"
level=INFO ts=2025-03-26T08:55:37.57254277Z app=neurona-app-dlm source=bootstrap.go:146 msg="Un-Registering service from the Registry"
level=INFO ts=2025-03-26T08:55:37.572823053Z app=neurona-app-dlm source=messaging.go:155 msg="Disconnecting from the message bus"
level=ERROR ts=2025-03-26T08:55:37.572836888Z app=neurona-app-dlm source=messaging.go:158 msg="Unable to disconnect from the message bus: Unable to disconnect client(s): Unable to disconnect publish client: redis: client is closed"
Error 2: Call to Stop() and calling again to Run() (to restart) causes a panic:
In addition, we are trying to run and stop the application service several times. We do this as we have an arquitecture loosely coupled to EdgeX. Our service makes use of the EdgeX application service and SDK only when needed. When we need to receive events from devices we run it to obtain the events via the message bus, otherwise it is stopped to save resources.
Considering that when the Stop()
is done, we receive the error mentioned above, calling Run()
again fails.
level=WARN ts=2025-03-26T10:22:20.605621125Z app=neurona-app-dlm source=triggermessageprocessor.go:88 msg="MessagesReceived metric failed to register and will not be reported: duplicate metric: MessagesReceived"
level=WARN ts=2025-03-26T10:22:20.60564425Z app=neurona-app-dlm source=triggermessageprocessor.go:94 msg="InvalidMessagesReceived metric failed to register and will not be reported: duplicate metric: InvalidMessagesReceived"
level=INFO ts=2025-03-26T10:22:20.605654636Z app=neurona-app-dlm source=triggerfactory.go:51 msg="EdgeX MessageBus trigger selected"
level=INFO ts=2025-03-26T10:22:20.605664827Z app=neurona-app-dlm source=messaging.go:64 msg="Initializing EdgeX Message Bus Trigger for 'redis'"
level=INFO ts=2025-03-26T10:22:20.605678609Z app=neurona-app-dlm source=messaging.go:84 msg="Subscribing to topic: edgex/events/#"
level=INFO ts=2025-03-26T10:22:20.605687555Z app=neurona-app-dlm source=messaging.go:92 msg="Publishing to topic: edgex/neurona-app-dlm"
level=ERROR ts=2025-03-26T10:22:20.605700379Z app=neurona-app-dlm source=service.go:231 msg="failed to subscribe to topic(s) '': subscription for 'edgex/events/#' topic already exists, must be unique"
level=INFO ts=2025-03-26T10:22:20.605708829Z app=neurona-app-dlm source=server.go:168 msg="Starting HTTP Web Server on address neurona-app-dlm:48002"
level=INFO ts=2025-03-26T10:22:20.605718955Z app=neurona-app-dlm source=messaging.go:123 msg="Exiting waiting for MessageBus errors and background publishing"
level=INFO ts=2025-03-26T10:22:20.605734228Z app=neurona-app-dlm source=messaging.go:102 msg="Waiting for messages from the MessageBus on the 'edgex/events/#' topic"
level=INFO ts=2025-03-26T10:22:20.605744213Z app=neurona-app-dlm source=messaging.go:107 msg="Exiting waiting for MessageBus 'edgex/events/#' topic messages"
panic: send on closed channel
goroutine 164 [running]:
github.com/edgexfoundry/app-functions-sdk-go/v3/internal/webserver.(*WebServer).listenAndServe(0xc0004b1830, 0x12a05f200, 0xc0004d48a0)
github.com/edgexfoundry/app-functions-sdk-go/[email protected]/internal/webserver/server.go:169 +0x34d
github.com/edgexfoundry/app-functions-sdk-go/v3/internal/webserver.(*WebServer).StartWebServer.func1()
github.com/edgexfoundry/app-functions-sdk-go/[email protected]/internal/webserver/server.go:104 +0xa6
created by github.com/edgexfoundry/app-functions-sdk-go/v3/internal/webserver.(*WebServer).StartWebServer in goroutine 261
github.com/edgexfoundry/app-functions-sdk-go/[email protected]/internal/webserver/server.go:100 +0x67
The error is:
level=ERROR ts=2025-03-26T10:22:20.605700379Z app=neurona-app-dlm source=service.go:231 msg="failed to subscribe to topic(s) '': subscription for 'edgex/events/#' topic already exists, must be unique"
Seems to be due the Stop()
don't cleaning properly all the resources. On the other hand, the panic seems to be a bug due to the order of execution (no the error directly). If we check the Run()
implementation:
func (svc *Service) Run() error {
config := container.ConfigurationFrom(svc.dic.Get)
err := initializeStoreClient(config, svc)
if err != nil {
return err
}
runCtx, stop := context.WithCancel(context.Background())
svc.ctx.stop = stop
httpErrors := make(chan error)
defer close(httpErrors)
svc.webserver.StartWebServer(httpErrors)
// determine input type and create trigger for it
t := svc.setupTrigger(svc.config)
if t == nil {
return errors.New("failed to create Trigger")
}
// Initialize the trigger (i.e. start a web server, or connect to message bus)
deferred, err := t.Initialize(svc.ctx.appWg, svc.ctx.appCtx, svc.backgroundPublishChannel)
if err != nil {
svc.lc.Error(err.Error())
return errors.New("failed to initialize Trigger")
}
// deferred is a function that needs to be called when services exits.
svc.addDeferred(deferred)
if svc.config.GetWritableInfo().StoreAndForward.Enabled {
svc.startStoreForward()
} else {
svc.lc.Info("StoreAndForward disabled. Not running retry loop.")
}
svc.lc.Info(svc.config.Service.StartupMsg)
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
select {
case httpError := <-httpErrors:
svc.lc.Info("Http error received: ", httpError.Error())
err = httpError
case signalReceived := <-signals:
svc.lc.Info("Terminating signal received: " + signalReceived.String())
case <-runCtx.Done():
svc.lc.Info("Terminating: svc.Stop called")
}
svc.ctx.stop = nil
if svc.config.GetWritableInfo().StoreAndForward.Enabled {
svc.ctx.storeForwardCancelCtx()
svc.ctx.storeForwardWg.Wait()
}
svc.ctx.appCancelCtx() // Cancel all long-running go funcs
svc.ctx.appWg.Wait()
// Call all the deferred funcs that need to happen when exiting.
// These are things like un-register from the Registry, disconnect from the Message Bus, etc
for _, deferredFunc := range svc.deferredFunctions {
deferredFunc()
}
return err
}
The origin of the panic are these lines:
httpErrors := make(chan error)
defer close(httpErrors)
svc.webserver.StartWebServer(httpErrors)
The server (inside StartWebServer
) is started in a different Go Routine, and it uses the channel to report errors to Run()
. However, Run()
exits and closes the channel. While the server routine tries to send an error via the channel that is closed (causing the panic).
What causes the Run()
to return early is this line (called after the server is started).
deferred, err := t.Initialize(svc.ctx.appWg, svc.ctx.appCtx, svc.backgroundPublishChannel)
if err != nil {
svc.lc.Error(err.Error())
return errors.New("failed to initialize Trigger")
}
We have found that launching the server right before the select
statement doesn't cause the panic (after all the possible early returns)
🌍 Your Environment
Deployment Environment: Ubuntu 20.04
EdgeX Version [REQUIRED]: 3.1
Anything else relevant?
We haven't tried if this was fixed on version 4.0, however, we have checked the code and seen no changes.
Thank you!
Metadata
Metadata
Assignees
Labels
Type
Projects
Status