Scheduler kills the whole cluster with 10 instances #1

@philipgiuliani

Hi,
we have a cluster of 10 instances that gets killed when we use quantum-swarm. With only 2 instances, it worked fine.

08:12:56.435 [info] [swarm on A] [tracker:ensure_swarm_started_on_remote_node] nodeup B
08:12:56.435 [info] [swarm on A] [tracker:handle_topology_change] topology change complete
08:13:18.898 [info] GenStage consumer MyProject.Scheduler.ExecutorSupervisor is stopping after receiving cancel from producer #PID<61039.8804.0> with reason: :shutdown
08:13:18.898 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094427137.33179>, :process, #PID<61039.8804.0>, :shutdown}
08:13:18.899 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094427137.33175>, :process, #PID<61039.8802.0>, :shutdown}
08:13:18.901 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094427137.33171>, :process, #PID<61039.8801.0>, :shutdown}
08:13:18.901 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094427137.33167>, :process, #PID<61039.8799.0>, :shutdown}
08:13:18.902 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094427137.33163>, :process, #PID<61039.8797.0>, :shutdown}
08:13:18.903 [warn] [swarm on [email protected]] [tracker:handle_replica_event] received track event for MyProject.Scheduler.NodeSelectorBroadcaster, mismatched pids, local clock conflicts with remote clock, event unhandled
08:13:18.906 [warn] [swarm on [email protected]] [tracker:handle_replica_event] received track event for MyProject.Scheduler.JobBroadcaster, mismatched pids, local clock conflicts with remote clock, event unhandled
08:13:18.907 [warn] [swarm on [email protected]] [tracker:handle_replica_event] received track event for MyProject.Scheduler.ExecutionBroadcaster, mismatched pids, local clock conflicts with remote clock, event unhandled
08:13:18.911 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094164994.181688>, :process, #PID<61061.8615.0>, :noproc}
08:13:18.911 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094164994.181719>, :process, #PID<61043.8615.0>, :noproc}
08:13:18.911 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094164994.181711>, :process, #PID<61061.8621.0>, :shutdown}
08:13:18.911 [info] GenStage consumer MyProject.Scheduler.ExecutorSupervisor is stopping after receiving cancel from producer #PID<61043.8615.0> with reason: :noproc
08:13:18.912 [error] GenServer MyProject.Scheduler.ExecutorSupervisor terminating
** (stop) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
Last message: {:DOWN, #Reference<0.981856171.4094164994.181723>, :process, #PID<61043.8615.0>, :noproc}
08:13:18.912 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094164994.181702>, :process, #PID<61061.8619.0>, :shutdown}
08:13:18.912 [error] Supervisor received unexpected message: {:DOWN, #Reference<0.981856171.4094164994.181698>, :process, #PID<61061.8617.0>, :shutdown}
08:13:18.913 [warn] [swarm on [email protected]] [tracker:handle_replica_event] received track event for MyProject.Scheduler.TaskRegistry, mismatched pids, local clock conflicts with remote clock, event unhandled
08:13:18.914 [warn] [swarm on [email protected]] [tracker:handle_replica_event] received track event for MyProject.Scheduler.NodeSelectorBroadcaster, mismatched pids, local clock conflicts with remote clock, event unhandled
08:13:18.919 [error] GenServer #PID<0.8135.0> terminating
** (stop) 'stopping because dependent process <0.8127.0> died: shutdown'
Last message: {:EXIT, #PID<0.8127.0>, :shutdown}
08:13:18.919 [error] GenServer #PID<0.8138.0> terminating
** (stop) 'stopping because dependent process <0.8128.0> died: shutdown'
Last message: {:EXIT, #PID<0.8128.0>, :shutdown}
08:13:18.919 [error] GenServer #PID<0.8131.0> terminating
** (stop) 'stopping because dependent process <0.8126.0> died: shutdown'
Last message: {:EXIT, #PID<0.8126.0>, :shutdown}
08:13:18.927 [info] Application my_project exited: shutdown
"Kernel pid terminated (application_controller) ({application_terminated,my_project,shutdown})
"
"{"Kernel pid terminated",application_controller,"{application_terminated,my_project,shutdown}"}
"

Crash dump is being written to: erl_crash.dump...done

I am not sure what other information I could supply that would help.