
[TT-11125] Add trace_id to traffic logs #899

Closed

lghiur wants to merge 1 commit into master from TT-11125-traffic-logs-trace-id

Conversation

Collaborator

lghiur commented Oct 2, 2025

TT-11125

Description

Added trace_id to the AnalyticsRecord struct so that Tyk Pump stores it in the database after picking the record up from Redis.

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). If pulling from your own
    fork, don't request your master!
  • Make sure you are making a pull request against the master branch (left side). Also, you should start
    your branch off our latest master.
  • My change requires a change to the documentation.
    • If you've changed APIs, describe what needs to be updated in the documentation.
  • I have updated the documentation accordingly.
  • Modules and vendor dependencies have been updated; run go mod tidy && go mod vendor
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Check your code additions will not fail linting checks:
    • gofmt -s
    • go vet

@probelabs
Contributor

probelabs Bot commented Oct 2, 2025

🔍 Code Analysis Results

Pull Request Analysis: TT-11125 Add trace_id to traffic logs

This analysis provides a comprehensive overview of the pull request, focusing on its impact, technical changes, and architectural implications.

1. Change Impact Analysis

What this PR accomplishes

This pull request enhances the observability of API traffic by embedding a distributed tracing identifier (trace_id) into the core analytics data. By adding TraceID to the AnalyticsRecord struct, Tyk Pump can now collect, process, and store this identifier alongside other request metrics. This allows developers and operators to correlate a specific API request log with a complete distributed trace, making it significantly easier to debug issues, monitor performance, and understand the end-to-end journey of a request across multiple services.

Key Technical Changes

  1. Data Model Extension: The AnalyticsRecord struct in analytics/analytics.go has been updated with a new TraceID field. This field is tagged for proper serialization to JSON (json:"trace_id") and for persistence in SQL databases via GORM (gorm:"column:traceid").
  2. Data Export Integration: The GetFieldNames() and GetLineValues() methods have been updated to include the new TraceID field. This ensures that pumps exporting data to formats like CSV will automatically include the trace identifier in their output.
  3. Comprehensive Testing: New unit tests were added in analytics/analytics_test.go to validate the functionality of the TraceID field. These tests cover:
    • Storing and retrieving the TraceID.
    • Inclusion in field name and value lists for data export.
    • Correct handling by the RemoveIgnoredFields function, ensuring it can be selectively excluded from the record if needed.

Affected System Components

  • Data Ingestion: Any part of the system that creates or deserializes AnalyticsRecord objects will now handle the TraceID.
  • Data Persistence (Pumps): All persistence-related pumps are affected. The TraceID will be automatically stored in any backend that relies on the fields of the AnalyticsRecord struct. This includes:
    • SQL Databases (e.g., PostgreSQL): The gorm tag ensures the traceid column is populated.
    • NoSQL/Document Databases (e.g., MongoDB, Elasticsearch): The json tag ensures the trace_id field is included in the stored document.
    • File-based Pumps (e.g., CSV): The changes to GetFieldNames and GetLineValues ensure the TraceID is written to files.
  • Testing Framework: The test suite has been expanded to ensure the new field is correctly handled.

2. Architecture Visualization

The AnalyticsRecord, now enriched with TraceID, flows from the gateway through the Tyk Pump ecosystem to every configured data sink; the change ensures that tracing information captured at the gateway is propagated end to end.

Powered by Visor from Probelabs

Last updated: 2025-10-02T15:14:43.003Z | Triggered by: opened | Commit: 8941d7f

@probelabs
Contributor

probelabs Bot commented Oct 2, 2025

🔍 Code Analysis Results

Security Issues (1)

🟢 Info — analytics/analytics.go:78
The `TraceID` field is added to the `AnalyticsRecord` struct and populated from an external source (Redis). While existing data sinks like GORM prevent SQL injection, the raw `TraceID` is not validated for format or content. As a defense-in-depth measure, it's best practice to validate all external input upon receipt.
💡 Suggestion: Consider adding validation to ensure the `TraceID` conforms to the expected OpenTelemetry format (e.g., a 32-character hexadecimal string) when the `AnalyticsRecord` is created. This would prevent malformed data from propagating through the system and provide an early defense against potential injection attacks in other consumers of this data that may not be as secure as GORM.

Performance Issues (3)

🟢 Info — analytics/analytics.go:78
The addition of the `TraceID` string field to the `AnalyticsRecord` struct increases its memory footprint. While the per-record increase is small, the cumulative effect can be significant in a high-throughput environment, leading to higher overall memory consumption for the application.
💡 Suggestion: This is an acceptable trade-off for the new functionality, but the impact on memory usage should be considered for capacity planning in high-volume deployments.

🟢 Info — analytics/analytics_test.go:179-185
The new test for `RemoveIgnoredFields` highlights a pre-existing performance issue. This function uses reflection in a nested loop for every analytics record, which is inefficient and can cause CPU overhead on a hot path.
💡 Suggestion: Consider optimizing `RemoveIgnoredFields` by building a map of JSON tags to struct fields at pump initialization. This would change the lookup from a slow O(N) iteration to a fast O(1) map lookup per field, reducing CPU usage for users of the field-ignoring feature.

🟡 Warning — analytics/analytics.go:78
The new `TraceID` field, mapped to the `traceid` database column, is a high-cardinality field that will likely be used for filtering. Queries against this column without an index will cause full table scans, leading to significant performance degradation on large datasets.
💡 Suggestion: Add a database index to the `traceid` column for all supported SQL databases. This should be noted in documentation and included in any database migration scripts.

Quality Issues (3)

🟢 Info — analytics/analytics.go:78
The addition of the `TraceID` string field increases the memory footprint of each `AnalyticsRecord`. In a high-throughput environment like Tyk Pump, this small increase per record can lead to a significant rise in overall memory consumption.
💡 Suggestion: This is an acceptable trade-off for the added functionality. However, the impact on memory should be documented and considered for capacity planning in production environments.

🟢 Info — analytics/analytics_test.go:173-185
The new test for `RemoveIgnoredFields` covers a function that uses an inefficient, reflection-based nested loop to remove fields from records. This function runs on a hot path for every record when field exclusion is configured, which can create significant CPU overhead.
💡 Suggestion: While this is a pre-existing issue, it's worth noting. To optimize, consider caching a map of JSON tags to struct fields at pump initialization. This would change the lookup from a slow O(N) iteration to a fast O(1) map lookup per field, significantly improving performance for users of this feature.

🟡 Warning — analytics/analytics.go:78
The new `TraceID` field is mapped to a `traceid` database column. Trace IDs are commonly used for lookups, and querying this column without an index will lead to slow, full-table-scan queries, severely degrading performance on large datasets.
💡 Suggestion: Ensure that a database index is created for the `traceid` column in all supported SQL databases. This should be documented as a requirement for this feature, and migration scripts should be provided if applicable.

Style Issues (2)

🟢 Info — analytics/analytics.go:78
The GORM column name `traceid` is inconsistent with the JSON tag `trace_id` and the Go field name `TraceID`. While not a functional issue, using a consistent naming convention like snake_case (`trace_id`) for database columns improves readability and aligns with common database practices.
💡 Suggestion: For consistency with the JSON tag and common database naming conventions, change the GORM column tag from `column:traceid` to `column:trace_id`.
🔧 Suggested fix:

```go
TraceID string `json:"trace_id" gorm:"column:trace_id"`
```

🟢 Info — analytics/analytics_test.go:185
The test case for `RemoveIgnoredFields` uses the JSON field name `trace_id` to test its functionality. This works correctly but relies on the internal implementation detail that `RemoveIgnoredFields` maps JSON tags to struct fields. The test would be more robust and clearer without this implicit dependency.
💡 Suggestion: While the test is functionally correct, consider adding a comment clarifying that `RemoveIgnoredFields` operates on JSON tags, explaining why `"trace_id"` is used instead of the Go field name `"TraceID"`. This improves maintainability for future developers.


@buger
Member

buger commented Oct 2, 2025

/visor what if we try to implement it as a tag with some unique value instead of a new field? Does Pump have protection against high-cardinality tags? Do we already have examples of this? I mean, our aggregate analytics probably aggregate by tags?

@probelabs
Contributor

probelabs Bot commented Oct 2, 2025

🔍 Code Analysis Results

The search results confirm my initial hypothesis. Here's a summary of the findings:
  1. Tags are used for aggregation and metrics:
    • dogstatsd.go: Explicitly warns against using unbounded sources (like request IDs) for tags because it can "infinitely increase the number of metrics" and affect billing. This is a strong piece of evidence.
    • statsd.go, influx.go, influx2.go: These pumps use tags to create metric dimensions. Adding a unique trace_id here would create a new time series for every single request, which is exactly the high-cardinality problem.
    • mongo_aggregate.go, sql_aggregate.go, hybrid.go: These pumps use analytics.AggregateData, which likely aggregates data based on dimensions that can include tags.
  2. Some protection/awareness of high cardinality exists:
    • mongo_aggregate.go: Has a ThresholdLenTagList and a printAlert function that warns the user if a document has too many tag entries, stating it "may cause performance issues with aggregate logs." It also has an IgnoreTagPrefixList to filter out problematic tags. This shows the developers are aware of the performance implications of having too many tags.
    • splunk.go: Also has a FilterTags function using IgnoreTagPrefixList.
  3. Tags are passed as-is to logging/storage pumps:
    • elasticsearch.go, kafka.go, kinesis.go: These pumps seem to pass the tags field directly into the record being stored. For these systems, adding trace_id as a tag might be less problematic than for aggregation systems, but it's still not ideal as it mixes dimensions with unique identifiers.

Conclusion:
The evidence is clear. The codebase is designed with the assumption that tags are low-cardinality dimensions used for aggregation. There are explicit warnings and even some mitigation mechanisms (IgnoreTagPrefixList, alerts) for cases where tag cardinality might become an issue.

Adding trace_id as a tag would go directly against the intended use of tags in this system and would cause significant performance and cost issues, especially in the aggregation and metrics pumps (dogstatsd, statsd, mongo_aggregate, etc.).

Therefore, the original author's implementation of adding trace_id as a separate, dedicated field is the correct architectural choice. It avoids polluting the tag namespace with a high-cardinality identifier and allows storage systems to index it separately for efficient lookups without breaking metric aggregation.

Hello @lghiur, thank you for your contribution!

@buger raises an excellent point about the implementation strategy for trace_id. Let's explore the suggestion of using a tag versus a dedicated field based on how Tyk Pump currently operates.

Analysis of Tags vs. a Dedicated Field

After reviewing the codebase, it's clear that implementing trace_id as a dedicated field is the correct approach. Here’s why:

  1. Tags Are Used for Aggregation and Metrics:
    Pumps like dogstatsd, statsd, influx, mongo_aggregate, and sql_aggregate use tags as dimensions for grouping and aggregating data. For example, in metrics systems, tags become labels that define a time series.

  2. The High Cardinality Problem:
    A trace_id is a unique identifier for each request, meaning it has extremely high cardinality. If we were to add trace_id as a tag, it would cause a "cardinality explosion" in any aggregation pump:

    • Performance Degradation: Each request would create a unique metric series, making aggregation queries slow and computationally expensive.
    • Increased Costs: Systems that bill based on the number of unique time series (like Datadog) would see a massive cost increase.
    • System Instability: Monitoring systems like Prometheus are not designed to handle high-cardinality labels and can become unstable.
  3. Existing Code Acknowledges This Risk:
    The codebase already contains warnings and safeguards against this exact problem. The dogstatsd pump includes a comment explicitly advising against using unbounded sources like request IDs for tags:

    // pumps/dogstatsd.go:195
    /*
     * Tags shouldn’t originate from unbounded sources, such as EPOCH timestamps, user IDs, or request IDs. Doing
     * so may infinitely increase the number of metrics for your organization and impact your billing.
     */

    Furthermore, the mongo_aggregate pump has a configurable threshold (ThresholdLenTagList) and an alerting mechanism to warn users when documents have too many tags, as this can cause performance issues.

Recommendation

Given that trace_id is a high-cardinality identifier, adding it as a tag would violate the intended use of tags within Tyk Pump and lead to significant performance, cost, and stability issues.

The current implementation of adding TraceID as a separate, dedicated field is the best practice. It correctly isolates this unique identifier from the low-cardinality dimensions used for aggregation, allowing pumps to handle it appropriately—indexing it for fast lookups in storage backends (like Elasticsearch or SQL) without breaking metric aggregation.

Great discussion! This highlights a critical architectural consideration. The current approach in the PR is sound, and I recommend proceeding with it.



@buger
Member

buger commented Oct 2, 2025

/visor does the IgnoreTagPrefixList have any "defaults", or do we add anything else there?

@probelabs
Contributor

probelabs Bot commented Oct 2, 2025

🔍 Code Analysis Results

Hello @buger! That's a great question about the IgnoreTagPrefixList.

Based on my analysis of the codebase, here’s how it works:

  1. Configuration: The IgnoreTagPrefixList is a user-configurable list that defaults to an empty slice ([]). You can add any string prefixes to this list in your pump configuration to filter out specific tags.

  2. Hardcoded Default Behavior: Interestingly, there is also a hardcoded rule that is always applied, regardless of your configuration. The system automatically ignores any tag that begins with the prefix "key-".

This logic is implemented in the ignoreTag function within analytics/aggregate.go:

// analytics/aggregate.go:602-615
func ignoreTag(tag string, ignoreTagPrefixList []string) bool {
	// ignore tag added for key by gateway
	if strings.HasPrefix(tag, "key-") {
		return true
	}

	for _, prefix := range ignoreTagPrefixList {
		if strings.HasPrefix(tag, prefix) {
			return true
		}
	}

	return false
}

In summary:

  • Yes, there is a default: The prefix "key-" is always ignored by default.
  • It's extensible: You can add any other prefixes you need to the ignore_tag_prefix_list in your configuration to filter out additional tags. This is especially useful for preventing high-cardinality tags from causing performance issues in aggregation pumps.

I hope this clarifies things! Let me know if you have any other questions.



lghiur closed this Oct 6, 2025