Commit 0fdf105

Fix webhook error when metrics service address uses env var expansion (#3531)
* Use a naive approach to parse port before env var expansion
* Refactor how service metrics endpoint parsing works

  Now, when there would be an error it gets logged and the default values are returned. With this refactor the method encapsulates all defaulting logic that was slightly spread around different places.
* Add more tests to `Service.MetricsEndpoint` and fix them
* Remove unused code
* Make Service.MetricsEndpoint fail when can't parse port
* Update documentation regarding examination of the collector config file
* Fix documentation regarding configured receivers and their ports
* Remove unrelated/confusion doc line
* Handle review feedback
1 parent c344c2b commit 0fdf105

8 files changed: +231 -62 lines changed
@@ -0,0 +1,18 @@
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: bug_fix
+
+# The name of the component, or a single word describing the area of concern, (e.g. collector, target allocator, auto-instrumentation, opamp, github action)
+component: operator
+
+# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
+note: Fix the admission webhook failing when the metrics service address host uses env var expansion
+
+# One or more tracking issues related to the change
+issues: [3513]
+
+# (Optional) One or more lines of additional information to render under the primary note.
+# These lines will be padded with 2 spaces and then inserted directly into the document.
+# Use pipe (|) for multiline entries.
+subtext: |
+  This should allow the metrics service address to have the host portion expanded from an environment variable,
+  like `${env:POD_IP}` instead of using `0.0.0.0`, as [recommended by the Collector](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks).
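
To illustrate why an unexpanded address trips up a plain host/port split, here is a minimal sketch using only the standard library's `net.SplitHostPort`. It is not the operator's code; the helper name `splitErr` is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"net"
)

// splitErr is a hypothetical helper: it returns the error text produced by
// net.SplitHostPort for addr, or "" when the split succeeds.
func splitErr(addr string) string {
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// The second address still contains the colon from "env:", so the
	// unbracketed host part is rejected before any expansion happens.
	for _, addr := range []string{"0.0.0.0:8888", "${env:POD_IP}:8888"} {
		fmt.Printf("%q -> %q\n", addr, splitErr(addr))
	}
}
```

The plain address splits cleanly, while the address with the `${env:...}` host returns an error because of the extra colon inside the expansion syntax.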

README.md

+6 -2
@@ -72,12 +72,16 @@ This will create an OpenTelemetry Collector instance named `simplest`, exposing

 The `config` node holds the `YAML` that should be passed down as-is to the underlying OpenTelemetry Collector instances. Refer to the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) documentation for a reference of the possible entries.

-> 🚨 **NOTE:** At this point, the Operator does _not_ validate the contents of the configuration file: if the configuration is invalid, the instance will still be created but the underlying OpenTelemetry Collector might crash.
+> 🚨 **NOTE:** At this point, the Operator does _not_ validate the whole contents of the configuration file: if the configuration is invalid, the instance might still be created but the underlying OpenTelemetry Collector might crash.

 > 🚨 **Note:** For private GKE clusters, you will need to either add a firewall rule that allows master nodes access to port `9443/tcp` on worker nodes, or change the existing rule that allows access to port `80/tcp`, `443/tcp` and `10254/tcp` to also allow access to port `9443/tcp`. More information can be found in the [Official GCP Documentation](https://cloud.google.com/load-balancing/docs/tcp/setting-up-tcp#config-hc-firewall). See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules) on adding rules and the [Kubernetes issue](https://github.com/kubernetes/kubernetes/issues/79739) for more detail.

-The Operator does examine the configuration file to discover configured receivers and their ports. If it finds receivers with ports, it creates a pair of kubernetes services, one headless, exposing those ports within the cluster. The headless service contains a `service.beta.openshift.io/serving-cert-secret-name` annotation that will cause OpenShift to create a secret containing a certificate and key. This secret can be mounted as a volume and the certificate and key used in those receivers' TLS configurations.
+The Operator does examine the configuration file for a few purposes:

+- To discover configured receivers and their ports. If it finds receivers with ports, it creates a pair of kubernetes services, one headless, exposing those ports within the cluster. If the port is using environment variable expansion or cannot be parsed, an error will be returned. The headless service contains a `service.beta.openshift.io/serving-cert-secret-name` annotation that will cause OpenShift to create a secret containing a certificate and key. This secret can be mounted as a volume and the certificate and key used in those receivers' TLS configurations.
+
+- To check if Collector observability is enabled (controlled by `spec.observability.metrics.enableMetrics`). In this case, a Service and ServiceMonitor/PodMonitor are created for the Collector instance. As a consequence, if the metrics service address contains an invalid port or uses environment variable expansion for the port, an error will be returned. A workaround for the environment variable case is to set `enableMetrics` to `false` and manually create the previously mentioned objects with the correct port if you need them.
+
 ### Upgrades

 As noted above, the OpenTelemetry Collector format is continuing to evolve. However, a best-effort attempt is made to upgrade all managed `OpenTelemetryCollector` resources.

apis/v1beta1/collector_webhook_test.go

+12 -2
@@ -555,7 +555,7 @@ func TestCollectorDefaultingWebhook(t *testing.T) {
 			ctx := context.Background()
 			err := cvw.Default(ctx, &test.otelcol)
 			if test.expected.Spec.Config.Service.Telemetry == nil {
-				assert.NoError(t, test.expected.Spec.Config.Service.ApplyDefaults(), "could not apply defaults")
+				assert.NoError(t, test.expected.Spec.Config.Service.ApplyDefaults(logr.Discard()), "could not apply defaults")
 			}
 			assert.NoError(t, err)
 			assert.Equal(t, test.expected, test.otelcol)
@@ -588,7 +588,17 @@ func TestOTELColValidatingWebhook(t *testing.T) {
 	five := int32(5)
 	maxInt := int32(math.MaxInt32)

-	cfg := v1beta1.Config{}
+	cfg := v1beta1.Config{
+		Service: v1beta1.Service{
+			Telemetry: &v1beta1.AnyConfig{
+				Object: map[string]interface{}{
+					"metrics": map[string]interface{}{
+						"address": "${env:POD_ID}:8888",
+					},
+				},
+			},
+		},
+	}
 	err := yaml.Unmarshal([]byte(cfgYaml), &cfg)
 	require.NoError(t, err)

apis/v1beta1/config.go

+46 -23
@@ -18,8 +18,8 @@ import (
 	"bytes"
 	"encoding/json"
 	"fmt"
-	"net"
 	"reflect"
+	"regexp"
 	"sort"
 	"strconv"
 	"strings"
@@ -269,7 +269,7 @@ func (c *Config) getEnvironmentVariablesForComponentKinds(logger logr.Logger, co

 // applyDefaultForComponentKinds applies defaults to the endpoints for the given ComponentKind(s).
 func (c *Config) applyDefaultForComponentKinds(logger logr.Logger, componentKinds ...ComponentKind) error {
-	if err := c.Service.ApplyDefaults(); err != nil {
+	if err := c.Service.ApplyDefaults(logger); err != nil {
 		return err
 	}
 	enabledComponents := c.GetEnabledComponents()
@@ -427,37 +427,60 @@ type Service struct {
 	Pipelines map[string]*Pipeline `json:"pipelines" yaml:"pipelines"`
 }

-// MetricsEndpoint gets the port number and host address for the metrics endpoint from the collector config if it has been set.
-func (s *Service) MetricsEndpoint() (string, int32, error) {
-	defaultAddr := "0.0.0.0"
-	if s.GetTelemetry() == nil {
-		// telemetry isn't set, use the default
-		return defaultAddr, 8888, nil
-	}
-	host, port, netErr := net.SplitHostPort(s.GetTelemetry().Metrics.Address)
-	if netErr != nil && strings.Contains(netErr.Error(), "missing port in address") {
-		return defaultAddr, 8888, nil
-	} else if netErr != nil {
-		return "", 0, netErr
-	}
-	i64, err := strconv.ParseInt(port, 10, 32)
+const (
+	defaultServicePort int32 = 8888
+	defaultServiceHost = "0.0.0.0"
+)
+
+// MetricsEndpoint attempts to get the host and port number from the metrics address without doing any validation
+// of the address itself.
+// It works even before env var expansion happens, when a simple `net.SplitHostPort` would fail because of the extra colon
+// from the env var, i.e. the address looks like "${env:POD_IP}:4317", "${env:POD_IP}", or "${POD_IP}".
+// In cases where the port itself is a variable, i.e. "${env:POD_IP}:${env:PORT}", this returns an error. This happens
+// because the port is used to generate Service objects and mappings.
+func (s *Service) MetricsEndpoint(logger logr.Logger) (string, int32, error) {
+	telemetry := s.GetTelemetry()
+	if telemetry == nil || telemetry.Metrics.Address == "" {
+		return defaultServiceHost, defaultServicePort, nil
+	}
+
+	// The regex below matches on strings that end with a colon followed by the environment variable expansion syntax.
+	// So it should match on strings ending with: ":${env:POD_IP}" or ":${POD_IP}".
+	const portEnvVarRegex = `:\${(?:env:)?.*}$`
+	isPortEnvVar := regexp.MustCompile(portEnvVarRegex).MatchString(telemetry.Metrics.Address)
+	if isPortEnvVar {
+		errMsg := fmt.Sprintf("couldn't determine metrics port from configuration: %s",
+			telemetry.Metrics.Address)
+		logger.Info(errMsg)
+		return "", 0, fmt.Errorf("%s", errMsg)
+	}
+
+	// The regex below matches on strings that end with a colon followed by 1 or more digits (representing the port).
+	const explicitPortRegex = `:(\d+$)`
+	explicitPortMatches := regexp.MustCompile(explicitPortRegex).FindStringSubmatch(telemetry.Metrics.Address)
+	if len(explicitPortMatches) <= 1 {
+		return telemetry.Metrics.Address, defaultServicePort, nil
+	}
+
+	port, err := strconv.ParseInt(explicitPortMatches[1], 10, 32)
 	if err != nil {
+		errMsg := fmt.Sprintf("couldn't determine metrics port from configuration: %s",
+			telemetry.Metrics.Address)
+		logger.Info(errMsg, "error", err)
 		return "", 0, err
 	}

-	if host == "" {
-		host = defaultAddr
-	}
-
-	return host, int32(i64), nil
+	host, _, _ := strings.Cut(telemetry.Metrics.Address, explicitPortMatches[0])
+	return host, int32(port), nil
 }

 // ApplyDefaults inserts configuration defaults if it has not been set.
-func (s *Service) ApplyDefaults() error {
-	telemetryAddr, telemetryPort, err := s.MetricsEndpoint()
+func (s *Service) ApplyDefaults(logger logr.Logger) error {
+	telemetryAddr, telemetryPort, err := s.MetricsEndpoint(logger)
 	if err != nil {
 		return err
 	}
+
 	tm := &AnyConfig{
 		Object: map[string]interface{}{
 			"metrics": map[string]interface{}{
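
The parsing strategy in this hunk can be tried outside the operator with a simplified, standalone sketch. It is an approximation under stated assumptions, not the operator's implementation: the names `parseMetricsAddress`, `portIsEnvVar`, and `explicitPort` are hypothetical, logging is omitted, and the defaults mirror the `0.0.0.0` and `8888` values from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

const (
	defaultHost       = "0.0.0.0"
	defaultPort int32 = 8888
)

var (
	// portIsEnvVar matches addresses ending in ":${...}", e.g. "localhost:${env:PORT}".
	portIsEnvVar = regexp.MustCompile(`:\$\{.*\}$`)
	// explicitPort matches a trailing ":<digits>" and captures the digits.
	explicitPort = regexp.MustCompile(`:(\d+)$`)
)

// parseMetricsAddress (hypothetical name) splits addr into host and port
// without expanding or validating the host part.
func parseMetricsAddress(addr string) (string, int32, error) {
	if addr == "" {
		return defaultHost, defaultPort, nil
	}
	if portIsEnvVar.MatchString(addr) {
		return "", 0, errors.New("port uses env var expansion: " + addr)
	}
	m := explicitPort.FindStringSubmatch(addr)
	if m == nil {
		// No explicit port: keep the address as the host, use the default port.
		return addr, defaultPort, nil
	}
	port, err := strconv.ParseInt(m[1], 10, 32)
	if err != nil {
		return "", 0, err
	}
	return strings.TrimSuffix(addr, m[0]), int32(port), nil
}

func main() {
	for _, addr := range []string{"", "localhost", "${env:POD_IP}:8888", "[::]:9090", "localhost:${env:PORT}"} {
		host, port, err := parseMetricsAddress(addr)
		fmt.Printf("%q -> host=%q port=%d err=%v\n", addr, host, port, err)
	}
}
```

Because only the tail of the string is inspected, colons inside the host part (as in `${env:POD_IP}`) do not interfere. One consequence of this approach is that an unbracketed IPv6 literal such as `::1` would be read as a host followed by port `1`, so IPv6 hosts need brackets.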

apis/v1beta1/config_test.go

+144 -30
@@ -216,47 +216,157 @@ func TestGetTelemetryFromYAMLIsNil(t *testing.T) {
 	assert.Nil(t, cfg.Service.GetTelemetry())
 }

-func TestConfigToMetricsPort(t *testing.T) {
-
+func TestConfigMetricsEndpoint(t *testing.T) {
 	for _, tt := range []struct {
 		desc string
 		expectedAddr string
 		expectedPort int32
+		expectedErr bool
 		config Service
 	}{
 		{
-			"custom port",
-			"0.0.0.0",
-			9090,
-			Service{
+			desc: "custom port",
+			expectedAddr: "localhost",
+			expectedPort: 9090,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "localhost:9090",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "custom port ipv6",
+			expectedAddr: "[::]",
+			expectedPort: 9090,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "[::]:9090",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "missing port",
+			expectedAddr: "localhost",
+			expectedPort: 8888,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "localhost",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "missing port ipv6",
+			expectedAddr: "[::]",
+			expectedPort: 8888,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "[::]",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "env var and missing port",
+			expectedAddr: "${env:POD_IP}",
+			expectedPort: 8888,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "${env:POD_IP}",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "env var and missing port ipv6",
+			expectedAddr: "[${env:POD_IP}]",
+			expectedPort: 8888,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "[${env:POD_IP}]",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "env var and with port",
+			expectedAddr: "${POD_IP}",
+			expectedPort: 1234,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "${POD_IP}:1234",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "env var and with port ipv6",
+			expectedAddr: "[${POD_IP}]",
+			expectedPort: 1234,
+			config: Service{
 				Telemetry: &AnyConfig{
 					Object: map[string]interface{}{
 						"metrics": map[string]interface{}{
-							"address": "0.0.0.0:9090",
+							"address": "[${POD_IP}]:1234",
 						},
 					},
 				},
 			},
 		},
 		{
-			"bad address",
-			"0.0.0.0",
-			8888,
-			Service{
+			desc: "port is env var",
+			expectedErr: true,
+			config: Service{
 				Telemetry: &AnyConfig{
 					Object: map[string]interface{}{
 						"metrics": map[string]interface{}{
-							"address": "0.0.0.0",
+							"address": "localhost:${env:POD_PORT}",
 						},
 					},
 				},
 			},
 		},
 		{
-			"missing address",
-			"0.0.0.0",
-			8888,
-			Service{
+			desc: "port is env var ipv6",
+			expectedErr: true,
+			config: Service{
+				Telemetry: &AnyConfig{
+					Object: map[string]interface{}{
+						"metrics": map[string]interface{}{
+							"address": "[::]:${env:POD_PORT}",
+						},
+					},
+				},
+			},
+		},
+		{
+			desc: "missing address",
+			expectedAddr: "0.0.0.0",
+			expectedPort: 8888,
+			config: Service{
 				Telemetry: &AnyConfig{
 					Object: map[string]interface{}{
 						"metrics": map[string]interface{}{
@@ -267,24 +377,23 @@ func TestConfigToMetricsPort(t *testing.T) {
 			},
 		},
 		{
-			"missing metrics",
-			"0.0.0.0",
-			8888,
-			Service{
+			desc: "missing metrics",
+			expectedAddr: "0.0.0.0",
+			expectedPort: 8888,
+			config: Service{
 				Telemetry: &AnyConfig{},
 			},
 		},
 		{
-			"missing telemetry",
-			"0.0.0.0",
-			8888,
-			Service{},
+			desc: "missing telemetry",
+			expectedAddr: "0.0.0.0",
+			expectedPort: 8888,
 		},
 		{
-			"configured telemetry",
-			"1.2.3.4",
-			4567,
-			Service{
+			desc: "configured telemetry",
+			expectedAddr: "1.2.3.4",
+			expectedPort: 4567,
+			config: Service{
 				Telemetry: &AnyConfig{
 					Object: map[string]interface{}{
 						"metrics": map[string]interface{}{
@@ -296,9 +405,14 @@ func TestConfigToMetricsPort(t *testing.T) {
 		},
 	} {
 		t.Run(tt.desc, func(t *testing.T) {
+			logger := logr.Discard()
 			// these are acceptable failures, we return to the collector's default metric port
-			addr, port, err := tt.config.MetricsEndpoint()
-			assert.NoError(t, err)
+			addr, port, err := tt.config.MetricsEndpoint(logger)
+			if tt.expectedErr {
+				assert.Error(t, err)
+			} else {
+				assert.NoError(t, err)
+			}
 			assert.Equal(t, tt.expectedAddr, addr)
 			assert.Equal(t, tt.expectedPort, port)
 		})
