Skip to content

[Bug]: Operation based sampling strategies are not being returned for ratelimiting service strategies #5270

Open
@kuujis

Description

@kuujis

Edit by @yurishkuro

  1. (bug) When service has rate limiting strategy at the service level, and defines no per-operation strategies, the global per-operation strategies were also ignored
  2. (change) When service has probabilistic service-level strategy, and defines no per-operation strategies, the global per-operation strategies were used, but the default sampling rate was taken from global settings, whereas arguably service-level setting should take priority.

Example: given this config file where services A and B do not define their own per-operation strategies:

{ 
      "service_strategies": [ 
        { 
          "service": "ServiceA", 
          "type": "probabilistic", 
          "param": 1.0 
        }, 
        { 
         "service": "ServiceB", 
         "type": "ratelimiting", 
         "param": 3 
       } 
      ], 
      "default_strategy": { 
        "type": "probabilistic", 
        "param": 0.2, 
        "operation_strategies": [ 
          { 
            "operation": "/health", 
            "type": "probabilistic", 
            "param": 0.1
          } 
        ] 
      } 
    } 

We expect ServiceA to have the following strategy:

  • it inherits per-operation strategies from global config
  • except for defaultSamplingProbability which is kept the same 1.0 as at the service-level
{
  "probabilisticSampling": {
    "samplingRate": 1
  },
  "operationSampling": {
    "defaultSamplingProbability": 1,
    "perOperationStrategies": [
      {
        "operation": "/health",
        "probabilisticSampling": {
          "samplingRate": 0.1
        }
      }
    ]
  }
}

We expect ServiceB to have the following strategy:

  • it inherits per-operation strategies from global config as is
{
  "strategyType": 1,
  "rateLimitingSampling": {
    "maxTracesPerSecond": 3
  },
  "operationSampling": {
    "defaultSamplingProbability": 0.2,
    "perOperationStrategies": [
      {
        "operation": "/health",
        "probabilisticSampling": {
          "samplingRate": 0.1
        }
      }
    ]
  }
}

Original ticket below:

What happened?

While investigating open-telemetry/opentelemetry-java#5504 I was not able to identify any issues with handling of the responses received from jaeger for sampling strategies related to a given service name. However - I found that if service sampling strategy is "ratelimiting" then default operation sampling strategies were not being returned.

Steps to reproduce

  1. Run plugin\sampling\strategystore\static\strategy_store_test.go\TestServiceNoPerOperationStrategies unit test after replacing its code with the snippet below.
  2. The strategy store file used implies that ServiceB should have probablistic operation sampling of 0.0 for operation "/health", which is not the case, as there are no associated operation sampling strategies for ServiceB.

Expected behavior

As implied in https://www.jaegertracing.io/docs/1.28/sampling/#collector-sampling-configuration I would expect the following changes to plugin\sampling\strategystore\static\strategy_store_test.go\TestServiceNoPerOperationStrategies unit test case should result in a passing test:

func TestServiceNoPerOperationStrategies(t *testing.T) {
	store, err := NewStrategyStore(Options{StrategiesFile: "fixtures/service_no_per_operation.json"}, zap.NewNop())
	require.NoError(t, err)

	s, err := store.GetSamplingStrategy(context.Background(), "ServiceA")
	require.NoError(t, err)
	require.NotNil(t, s.OperationSampling)
	os := s.OperationSampling
	assert.EqualValues(t, 1, os.DefaultSamplingProbability)
	require.Len(t, os.PerOperationStrategies, 1)
	assert.Equal(t, "/health", os.PerOperationStrategies[0].Operation)
	assert.EqualValues(t, 0.0, os.PerOperationStrategies[0].ProbabilisticSampling.SamplingRate)
	expected := makeResponse(api_v2.SamplingStrategyType_PROBABILISTIC, 1.0)
	assert.Equal(t, *expected.ProbabilisticSampling, *s.ProbabilisticSampling)

	s, err = store.GetSamplingStrategy(context.Background(), "ServiceB")
	require.NoError(t, err)
	require.NotNil(t, s.OperationSampling)
	os = s.OperationSampling
	assert.EqualValues(t, 0.2, os.DefaultSamplingProbability)
	require.Len(t, os.PerOperationStrategies, 1)
	assert.Equal(t, "/health", os.PerOperationStrategies[0].Operation)
	assert.EqualValues(t, 0.0, os.PerOperationStrategies[0].ProbabilisticSampling.SamplingRate)
	expected = makeResponse(api_v2.SamplingStrategyType_RATE_LIMITING, 3)
	assert.Equal(t, *expected.RateLimitingSampling, *s.RateLimitingSampling)
}

Relevant log output

N/A

Screenshot

N/A

Additional context

I believe this behaviour was introduced as part of #2230 as an outcome of #2171 discussion.
this issue is created as per contribution guidelines.
I'm working on setting up a test to verify the fix I made (kuujis#1) with some basic service setup and work through other requirements (first time contributing).

I would also appreciate feedback on the solution itself, since depending on how one looks at it - it might be considered breaking contract, especially since the code was not changed for a while.
The alternative I was considering, but decided against was to call jaeger twice from OTEL library in order to get both "service" specific and the "default" configurations and then merge them, but it felt wrong, since I felt this is incorrect behavior on Jager side.

Jaeger backend version

v1.35.0

SDK

io.opentelemetry:opentelemetry-bom:1.35.0

Pipeline

OTEL SDK -> Jaeger.GetSamplingStrategy -> exporter to console

Stogage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugchangelog:breaking-changeChange that is breaking public APIs or established behaviordeprecationRemoval of some public-facing featuresgood first issueGood for beginnershelp wantedFeatures that maintainers are willing to accept but do not have cycles to implement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions