Panics when OPA is running as a server #7117

Open
@alam-chime

Description

Short description

We're running the OPA server as a sidecar in Kubernetes. At the time of the issue, both memory and CPU usage were well below the configured limits. Most requests return the expected decisions, but a few fail with HTTP 502s and 504s from OPA, and there is no difference between the inputs of the failing requests and those that succeed.

  • OPA version: 0.66.0 (we see the same behavior with 0.69.0)
  • Policy bundle created using (shell equivalents of this and the server command are sketched after the error logs below):
        "opa",
        "build",
        "--bundle", "./source",
        "--optimize", "2",
        "--output", "policy-bundle.tar.gz",
  • Server command:
        "opa",
        "run",
        "--server",
        "--addr", "localhost:8181",
        "--diagnostic-addr", ":8282",
        "--log-level", "error",
        "--config-file", "/config.yaml",
        "--authorization=basic",
        "--bundle", "policy-bundle.tar.gz",
  • Server error logs:
2024/10/12 15:21:01 http: panic serving 127.0.0.1:46936: runtime error: invalid memory address or nil pointer dereference
goroutine 415141 [running]:
net/http.(*conn).serve.func1()
/usr/local/go/src/net/http/server.go:1898 +0xbe
panic({0x114a0e0?, 0x2102610?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc000ed5cb0})
/src/internal/deepcopy/deepcopy.go:19 +0x173
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc0007529f0})
/src/internal/deepcopy/deepcopy.go:19 +0x14e
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc000b0b230})
/src/internal/deepcopy/deepcopy.go:19 +0x14e
github.com/open-policy-agent/opa/internal/deepcopy.Map(...)
/src/internal/deepcopy/deepcopy.go:28
github.com/open-policy-agent/opa/internal/deepcopy.DeepCopy({0x1135cc0?, 0xc000b0b1d0})
/src/internal/deepcopy/deepcopy.go:19 +0x14e
github.com/open-policy-agent/opa/plugins/logs.maskRuleSet.Mask({0xc00096ab20, {0xc0007284e0, 0x3, 0x4}, 0x0}, 0xc001136668)
/src/plugins/logs/mask.go:339 +0x11d
github.com/open-policy-agent/opa/plugins/logs.(*Plugin).maskEvent(0xc0001d1ce0, {0x1804978, 0xc000de66f0}, {0x17fa720, 0xc000de6ae0}, {0x1807280, 0xc001250000}, 0xc001136668)
/src/plugins/logs/plugin.go:1039 +0x25b
github.com/open-policy-agent/opa/plugins/logs.(*Plugin).Log(0xc0001d1ce0, {0x1804978, 0xc000de66f0}, 0xc000158c60)
/src/plugins/logs/plugin.go:704 +0x4b8
github.com/open-policy-agent/opa/runtime.(*Runtime).decisionLogger(0x17f4710?, {0x1804978, 0xc000de66f0}, 0xc000158c60)
/src/runtime/runtime.go:789 +0x6a
github.com/open-policy-agent/opa/server.decisionLogger.Log({0xc000de6b40?, {0x0?, 0xf?}, 0xc0002e1510?}, {0x1804978, 0xc000de66f0}, {0x17fa720, 0xc000de6ae0}, {0xc00036803e, 0xd}, ...)
/src/server/server.go:2992 +0x7b0
github.com/open-policy-agent/opa/server.(*Server).v1DataPost(0xc0001cc6c8, {0x1802e40, 0xc0001ecbd0}, 0xc000ba30e0)
/src/server/server.go:1796 +0xf0f
net/http.HandlerFunc.ServeHTTP(0x0?, {0x1802e40?, 0xc0001ecbd0?}, 0xc000def1c0?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/open-policy-agent/opa/internal/prometheus.(*Provider).InstrumentHandler.func1({0x7f21a71208a0, 0xc000de6600}, 0xc000ba30e0)
/src/internal/prometheus/prometheus.go:89 +0x136
net/http.HandlerFunc.ServeHTTP(0x1802e40?, {0x7f21a71208a0?, 0xc000de6600?}, 0xc000def2c0?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func1({0x1802e40, 0xc0001ec018}, 0xc000ba30e0)
/src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:97 +0xfd
net/http.HandlerFunc.ServeHTTP(0xc000ba2fc0?, {0x1802e40?, 0xc0001ec018?}, 0x0?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000000000, {0x1802e40, 0xc0001ec018}, 0xc000ba2480)
/src/vendor/github.com/gorilla/mux/mux.go:212 +0x1e2
github.com/open-policy-agent/opa/server/authorizer.(*Basic).ServeHTTP(0xc000789620, {0x1802e40, 0xc0001ec018}, 0xc0001ec018?)
/src/server/authorizer/authorizer.go:129 +0x5a4
net/http.HandlerFunc.ServeHTTP(0x0?, {0x1802e40?, 0xc0001ec018?}, 0xc0006a3690?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/open-policy-agent/opa/internal/prometheus.(*Provider).InstrumentHandler.func1({0x7f21a71208a0, 0xc000b941e0}, 0xc000ba2000)
/src/internal/prometheus/prometheus.go:89 +0x136
net/http.HandlerFunc.ServeHTTP(0x1802150?, {0x7f21a71208a0?, 0xc000b941e0?}, 0x41af91?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func1({0x1802150, 0xc000b9a0a0}, 0xc000ba2000)
/src/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:97 +0xfd
net/http.HandlerFunc.ServeHTTP(0xc000b94150?, {0x1802150?, 0xc000b9a0a0?}, 0x12de5b5?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/open-policy-agent/opa/server.(*Server).initHandlerCompression.CompressHandler.func1({0x1802150, 0xc000b9a0a0}, 0xc000ba2000)
/src/server/handlers/compress.go:41 +0x175
net/http.HandlerFunc.ServeHTTP(0xc0006a3ad8?, {0x1802150?, 0xc000b9a0a0?}, 0x11?)
/usr/local/go/src/net/http/server.go:2166 +0x29
github.com/open-policy-agent/opa/runtime.(*LoggingHandler).ServeHTTP(0xc0005ebd10, {0x1801be0, 0xc00078e0e0}, 0xc000ba2000)
/src/runtime/logging.go:116 +0xad2
net/http.serverHandler.ServeHTTP({0x17fe1f0?}, {0x1801be0?, 0xc00078e0e0?}, 0x6?)
/usr/local/go/src/net/http/server.go:3137 +0x8e
net/http.(*conn).serve(0xc000b96000, {0x1804978, 0xc0003ae0f0})
/usr/local/go/src/net/http/server.go:2039 +0x5e8
created by net/http.(*Server).Serve in goroutine 111
/usr/local/go/src/net/http/server.go:3285 +0x4b4
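
For readability, the build and server command fragments above correspond to roughly the following shell invocations (same flags as listed; the working directory, bundle location, and the contents of /config.yaml are specific to our setup):

    opa build \
        --bundle ./source \
        --optimize 2 \
        --output policy-bundle.tar.gz

    opa run \
        --server \
        --addr localhost:8181 \
        --diagnostic-addr :8282 \
        --log-level error \
        --config-file /config.yaml \
        --authorization=basic \
        --bundle policy-bundle.tar.gz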

Steps To Reproduce

We haven't been able to reproduce this issue locally, but we'll provide an update if we're successful.
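
For context, the trace shows the failing requests going through the Data API POST handler (server.(*Server).v1DataPost), so on our side they are ordinary decision queries roughly like the one below; the policy path and input are placeholders, not our real ones. (Since the server runs with --authorization=basic, real requests also have to be allowed by our system.authz policy.)

    curl -sS -X POST http://localhost:8181/v1/data/example/allow \
        -H 'Content-Type: application/json' \
        -d '{"input": {"user": "alice", "action": "read"}}'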

Expected behavior

There should be no panic; the OPA server should respond with the decision.

Additional context

This issue happens randomly, with no difference in input between the requests that panic and the ones that succeed. The policy and data files are too big to share here, but I can put together a smaller example if needed. Hopefully the error logs above are useful in the meantime.
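
In case it helps narrow things down: the panic goes through the decision-log masking path (plugins/logs mask calling internal/deepcopy), so a stripped-down setup that exercises the same code path would look roughly like the sketch below. The config snippet and mask rules are placeholders, not our actual /config.yaml or policy.

    # /config.yaml (sketch): enable decision logging so the mask policy runs
    decision_logs:
      console: true

    # mask policy included in the bundle (package and rule name are the
    # defaults OPA looks up at data.system.log.mask; paths/values are made up)
    package system.log

    # remove a field from the logged input
    mask["/input/password"]

    # upsert (replace) a field's value when it is present in the event
    mask[{"op": "upsert", "path": "/input/ssn", "value": "<redacted>"}] {
        input.input.ssn
    }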
