This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

weave-npc blocking connections with valid network policy after a period of time (2.6.0) #3764

Closed
@naemono

Description

What you expected to happen?

Similar to #3761, we are seeing traffic blocked by weave-npc, but in our case we are using network policies. I would expect weave-npc not to block traffic that a valid network policy allows.

What happened?

We now see this consistently, about once every 1-2 weeks: traffic gets blocked between pods within a namespace where it was working fine earlier. When we debugged the issue, we found that the ipsets on the host did not have valid entries for the pods. After we restart weave on that host, the ipsets are repopulated and traffic flows again.
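For anyone trying to confirm the same symptom, the ipset check described above could be scripted roughly as follows. This is only a sketch: the function name `count_weave_ipset_members` is hypothetical, and it assumes that the sets managed by weave-npc have names starting with `weave-` and that `ipset list` prints a `Name:` line and a `Members:` line for each set.

```shell
# Hypothetical helper: reads `ipset list` output on stdin and prints
# "<set name> <member count>" for every set whose name starts with "weave-".
count_weave_ipset_members() {
  awk '
    /^Name: /   { name = $2; in_members = 0; next }      # start of a new set
    /^Members:/ { in_members = 1; count[name] = 0        # member lines follow
                  order[++n] = name; next }
    /^$/        { in_members = 0; next }                 # blank line ends a set
    in_members  { count[name]++ }                        # one line per member
    END {
      for (i = 1; i <= n; i++)
        if (order[i] ~ /^weave-/) print order[i], count[order[i]]
    }
  '
}
```

On an affected host this would be run as `ipset list | count_weave_ipset_members` (as root). A count of 0 is not proof of a fault by itself, since a set can be legitimately empty when no pods match its selector, so the output is best compared against a healthy node or against the same node after the restart.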

How to reproduce it?

I wish we had an easy way to reproduce this consistently, but we do not. We are now seeing it nearly every week within one specific cluster.

Anything else we need to know?

Cloud provider: AWS
Custom-built cluster using in-house automation.

Versions:

# ./weave version
weave script 2.6.0
# docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
# uname -a
Linux ip-10-0-173-150 4.15.0-1056-aws #58-Ubuntu SMP Tue Nov 26 15:14:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T16:54:35Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.6", GitCommit:"7015f71e75f670eb9e7ebd4b5749639d42e20079", GitTreeState:"clean", BuildDate:"2019-11-13T11:11:50Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Logs:

Unfortunately, these logs do not include the weave logs from before the restart. When we run into this issue again (in a week or so), we will collect those logs and update this issue.

https://gist.github.com/naemono/31df744c7ee6b48dba7b554e06553f4b

When this issue is happening, we see a spike in weavenpc_blocked_connections_total in Prometheus:
[image: Prometheus graph of the spike]
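Without a Prometheus server at hand, the same counter can be read directly from a metrics scrape. The sketch below sums every `weavenpc_blocked_connections_total` series found in Prometheus text-format output on stdin. The function name `sum_blocked_connections` is hypothetical, and the label names in the comments are illustrative only.

```shell
# Hypothetical helper: sums all weavenpc_blocked_connections_total samples
# from Prometheus text-format metrics read on stdin.
sum_blocked_connections() {
  awk '
    # Comment lines (# HELP, # TYPE) have "#" as their first field and are skipped.
    $1 ~ /^weavenpc_blocked_connections_total/ { sum += $NF }  # last field is the value
    END { print sum + 0 }                                      # prints 0 if no series found
  '
}
```

Assuming the weave-npc metrics endpoint is reachable on the node at `localhost:6781` (an assumption about the deployment, adjust as needed), usage would look like `curl -s http://localhost:6781/metrics | sum_blocked_connections`. Running it twice a minute apart and comparing the totals gives a rough view of whether connections are currently being blocked.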
