TT-16809 Pump generate MCP analytics #954

Open
lghiur wants to merge 26 commits into master from TT-16809-Pump-generate-mcp-analytics

Conversation

@lghiur (Collaborator) commented Mar 18, 2026

Extends pump support to handle MCP (Model Context Protocol) analytics records across all major storage backends.

Changes

  • MongoDB (mcp_mongo.go, mcp_mongo_aggregate.go): Fixed MCP record writes and aggregate handling
  • PostgreSQL/SQL (mcp_sql.go, mcp_sql_aggregate.go): Fixed MCP record and aggregate writes
  • Hybrid (hybrid.go): Added MCP record passthrough support
  • Prometheus (prometheus.go): Extended metrics to include MCP-specific labels/counters
  • Elasticsearch (elasticsearch.go): Added MCP record indexing support
  • Protobuf serializer (serializer/protobuf.go): Extended to handle MCP analytics records

@lghiur lghiur requested a review from a team as a code owner March 18, 2026 13:17
probelabs bot (Contributor) commented Mar 18, 2026

This PR introduces comprehensive support for MCP (Model Context Protocol) analytics records within Tyk Pump. It enables the processing, storage, and aggregation of analytics for MCP-based APIs by adding new data structures, a dedicated aggregation pipeline, and support across all major storage backends.

Files Changed Analysis

This is a significant feature addition, reflected in the 30 files changed with 4,168 additions and only 147 deletions. The changes are predominantly additive, indicating new functionality.

  • New Functionality (12 files): A dozen new files implement the core MCP logic. This includes dedicated pumps for MongoDB and SQL (pumps/mcp_mongo.go, pumps/mcp_sql.go, and their aggregate counterparts), new analytics logic (analytics/aggregate_mcp.go, analytics/mcp_record.go), and extensive corresponding tests.
  • Modifications to Existing Systems: Key existing pumps (pumps/elasticsearch.go, pumps/hybrid.go, pumps/mongo.go) and the Protobuf serializer have been modified to recognize and handle MCP records. Standard pumps like mongo and mongo_aggregate are updated to explicitly ignore MCP records, delegating them to the new dedicated pumps.
  • Core Analytics Changes: The main data structures in analytics/analytics.go and the Protobuf definition in analytics/analytics.proto have been extended to include MCP-specific fields.
  • Refactoring: A new OpenGormDB function was added to pumps/common.go to centralize GORM database connection logic, reducing boilerplate in the SQL-based pumps.

Architecture & Impact Assessment

What this PR accomplishes

This PR enables Tyk Pump to process a new type of analytics record generated by MCP-based APIs. Previously, the pump was limited to REST and GraphQL analytics. This change provides feature parity for analytics collection for this new API protocol, allowing operators to monitor MCP API traffic, performance, and errors.

Key technical changes introduced

  1. New Data Structures: The core analytics.AnalyticsRecord is extended with a new MCPStats struct. A dedicated MCPRecord is introduced for specialized storage in SQL/Mongo to allow for efficient querying on MCP-specific fields (JSONRPCMethod, PrimitiveType, PrimitiveName).
  2. Segregated Aggregation: A new, separate aggregation pipeline (analytics.AggregateMCPData) is created exclusively for MCP records. The existing analytics.AggregateData function is modified to explicitly ignore MCP records, ensuring a clean separation between MCP and REST/GraphQL analytics.
  3. Dedicated Pumps: Four new pumps are introduced to handle storing raw and aggregated MCP data in MongoDB and SQL backends: pumps/mcp_mongo.go, pumps/mcp_mongo_aggregate.go, pumps/mcp_sql.go, and pumps/mcp_sql_aggregate.go.
  4. Enhancements to Existing Pumps:
    • Elasticsearch: Can now route MCP records to a separate index (mcp_index_name) and includes MCP-specific fields in documents.
    • Hybrid: The pump now calls a new RPC endpoint (PurgeAnalyticsDataMCPAggregated) to send aggregated MCP data to MDCB.
  5. Serialization: The Protobuf schema and generated code are updated to handle MCPStats.
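The routing predicate and the new fields described above can be sketched as follows. This is a minimal illustration: only JSONRPCMethod, PrimitiveType, and PrimitiveName are named in the PR, and IsMCPRecord is assumed here to simply test for the presence of MCP stats — the actual tyk-pump definitions are richer.

```go
package main

import "fmt"

// Hypothetical, simplified shapes; only the three MCP-specific field
// names below appear in the PR description. Everything else is
// illustrative, not the actual tyk-pump definitions.
type MCPStats struct {
	JSONRPCMethod string
	PrimitiveType string
	PrimitiveName string
}

type AnalyticsRecord struct {
	APIID    string
	MCPStats *MCPStats
}

// IsMCPRecord mirrors the routing predicate the PR describes: a record
// is treated as MCP when it carries MCP-specific stats.
func (r *AnalyticsRecord) IsMCPRecord() bool {
	return r.MCPStats != nil
}

func main() {
	rest := AnalyticsRecord{APIID: "api-1"}
	mcp := AnalyticsRecord{
		APIID:    "api-2",
		MCPStats: &MCPStats{JSONRPCMethod: "tools/call", PrimitiveType: "tool", PrimitiveName: "search"},
	}
	fmt.Println(rest.IsMCPRecord(), mcp.IsMCPRecord()) // false true
}
```

This is the branch point the component-interaction diagram below hinges on: records failing the predicate stay on the existing REST/GraphQL pipeline untouched.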

Affected system components

  • Analytics Core (/analytics): The fundamental data structures and aggregation logic are significantly expanded.
  • Data Pumps (/pumps): This is the most impacted area, with new pumps added and major existing pumps (Elasticsearch, Hybrid, Mongo) updated.
  • Serialization (analytics/proto): The Protobuf serializer is updated to support the new data fields.

Component Interaction Flow

```mermaid
graph TD
    subgraph Tyk Gateway
        A[Gateway generates AnalyticsRecord]
    end

    subgraph Tyk Pump
        A --> B{Pump receives records}
        B --> C{"record.IsMCPRecord()?"}
        C -- Yes --> D[MCP Analytics Pipeline]
        C -- No --> E[Standard REST/GraphQL Pipeline]

        subgraph D [MCP Analytics Pipeline]
            direction LR
            D_Agg(AggregateMCPData) --> D_MongoAgg[MCPMongoAggregatePump]
            D_Agg --> D_SQLAgg[MCPSQLAggregatePump]
            D_Agg --> D_Hybrid[HybridPump]

            D_Raw(Store Raw MCP Data) --> D_Mongo[MCPMongoPump]
            D_Raw --> D_SQL[MCPSQLPump]
            D_Raw --> D_ES[ElasticsearchPump]
        end
    end

    subgraph Storage Backends
        D_MongoAgg & D_Mongo --> DB1[MongoDB]
        D_SQLAgg & D_SQL --> DB2[SQL Database]
        D_ES --> DB3[Elasticsearch]
        D_Hybrid --> DB4[MDCB]
    end
```

Scope Discovery & Context Expansion

  • The changes are systemic within the tyk-pump repository, touching the entire data pipeline from ingestion to storage for the new MCP record type.
  • The modification of analytics/analytics.proto is a critical detail. It implies that this work is part of a larger, cross-repository feature. Upstream components, such as the Tyk Gateway, will need corresponding changes to generate and serialize the new MCPStats data. Without those upstream changes, this new functionality in the pump will remain unused.
  • The implementation for Mongo and SQL creates new, dedicated pump files, duplicating a significant amount of connection and batching logic from the standard pumps. This contrasts with the approach for Elasticsearch, where the new logic is integrated into existing files. This architectural decision isolates the new functionality but increases code duplication and potential maintenance overhead.
Metadata
  • Review Effort: 5 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-04-17T08:03:33.319Z | Triggered by: pr_updated | Commit: 98ca874


probelabs bot (Contributor) commented Mar 18, 2026

Security Issues (1)

Severity Location Issue
🟡 Warning analytics/aggregate_mcp.go:153-163
The MCP analytics aggregation logic uses fields like `JSONRPCMethod`, `PrimitiveType`, and `PrimitiveName` from incoming analytics records as keys for in-memory maps. These records originate from an upstream source (e.g., Tyk Gateway). If an attacker can craft requests that generate a high number of unique values for these fields (high cardinality), it can lead to unbounded growth of these maps. This can cause excessive memory consumption in the Tyk Pump, leading to performance degradation or a denial-of-service (DoS) crash.
💡 SuggestionTo mitigate this, consider implementing a limit on the cardinality of each dimension within a single aggregation window. A configurable threshold could be introduced for the maximum number of unique methods, primitives, and names to be tracked per API. Once the threshold is reached, subsequent new values could be ignored, logged, or grouped into a generic "other" category. This would prevent uncontrolled memory growth and protect the pump from resource exhaustion attacks.
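The suggested cardinality cap could look roughly like the sketch below. The function name, the threshold handling, and the "other" bucket are illustrative — they follow the reviewer's suggestion and are not part of the PR as written.

```go
package main

import "fmt"

// capDimension returns the key to aggregate under, collapsing unseen keys
// into "other" once the map already tracks maxUnique distinct values.
// This bounds memory growth for attacker-controlled dimensions such as
// JSONRPCMethod. Names here are hypothetical, not tyk-pump identifiers.
func capDimension(counts map[string]int, key string, maxUnique int) string {
	if _, seen := counts[key]; !seen && len(counts) >= maxUnique {
		key = "other"
	}
	counts[key]++
	return key
}

func main() {
	counts := map[string]int{}
	for _, m := range []string{"tools/list", "tools/call", "resources/read", "prompts/get"} {
		fmt.Println(capDimension(counts, m, 2))
	}
	// With maxUnique=2, the first two methods are tracked individually
	// and the rest fall into "other".
}
```

A production version would make maxUnique configurable per dimension and log (or count) how many values were collapsed, so operators can detect when the cap is being hit.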

Security Issues (1)

Severity Location Issue
🟡 Warning analytics/aggregate_mcp.go:153-163
The MCP analytics aggregation logic uses fields like `JSONRPCMethod`, `PrimitiveType`, and `PrimitiveName` from incoming analytics records as keys for in-memory maps. These records originate from an upstream source (e.g., Tyk Gateway). If an attacker can craft requests that generate a high number of unique values for these fields (high cardinality), it can lead to unbounded growth of these maps. This can cause excessive memory consumption in the Tyk Pump, leading to performance degradation or a denial-of-service (DoS) crash.
💡 SuggestionTo mitigate this, consider implementing a limit on the cardinality of each dimension within a single aggregation window. A configurable threshold could be introduced for the maximum number of unique methods, primitives, and names to be tracked per API. Once the threshold is reached, subsequent new values could be ignored, logged, or grouped into a generic "other" category. This would prevent uncontrolled memory growth and protect the pump from resource exhaustion attacks.
Architecture Issues (2)
Severity Location Issue
🟠 Error pumps/mcp_mongo.go:1-178
The introduction of dedicated MCP pumps for MongoDB and SQL has resulted in significant code duplication. The new files (`mcp_mongo.go`, `mcp_mongo_aggregate.go`, `mcp_sql.go`, `mcp_sql_aggregate.go`) are near-copies of their existing non-MCP counterparts, duplicating boilerplate logic for configuration, connection management, data batching, and writing. This approach violates the DRY principle and increases the long-term maintenance burden, as bug fixes or improvements will need to be applied in multiple places.
💡 SuggestionA more maintainable architecture would involve refactoring the common logic from the base pumps into shared components. For example, a base pump struct could handle the generic mechanics, and be configured with specific "processor" logic (e.g., a filter function and a data transformation function) for each data type (standard, GraphQL, MCP). This is inconsistent with the approach taken for the Elasticsearch pump (`pumps/elasticsearch.go`), where the new logic was integrated into the existing file rather than duplicated. A consistent architectural pattern should be applied across all pumps.
🟡 Warning analytics/aggregate_mcp.go:120-142
The `AggregateMCPData` function duplicates the high-level structure and initialization logic found in the existing `AggregateData` function in `analytics/aggregate.go`. While the core `incrementAggregate` function is correctly reused, the surrounding boilerplate for iterating data, managing the aggregate map, and initializing new aggregate records is repeated.
💡 SuggestionRefactor the common aggregation workflow into a single, generic function. This function could accept a strategy or configuration object that defines the type-specific logic, such as how to filter records, how to initialize a new aggregate struct, and how to increment type-specific dimensions. This would reduce code duplication and make the aggregation logic easier to extend in the future.
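The strategy-based refactor suggested above might be sketched like this. aggregateWith and the injected callbacks are hypothetical names, and the real aggregate records carry far more state than a simple counter — this only shows the shape of the shared workflow.

```go
package main

import "fmt"

// record stands in for analytics.AnalyticsRecord; only the fields needed
// for the sketch are included.
type record struct {
	orgID string
	isMCP bool
	dim   string
}

// aggregateWith runs the common workflow — iterate, filter, ensure a
// bucket exists, increment — with the type-specific parts injected as
// callbacks. AggregateData and AggregateMCPData could then both be thin
// wrappers over this one function.
func aggregateWith(records []record, keep func(record) bool, dimension func(record) string) map[string]int {
	agg := map[string]int{}
	for _, r := range records {
		if !keep(r) {
			continue // type-specific filter, e.g. IsMCPRecord()
		}
		agg[dimension(r)]++ // type-specific dimension extraction
	}
	return agg
}

func main() {
	recs := []record{
		{orgID: "org1", isMCP: true, dim: "tools/call"},
		{orgID: "org1", isMCP: false, dim: "/users"},
		{orgID: "org1", isMCP: true, dim: "tools/call"},
	}
	mcpAgg := aggregateWith(recs,
		func(r record) bool { return r.isMCP },
		func(r record) string { return r.dim })
	fmt.Println(mcpAgg["tools/call"])
}
```

The same keep/dimension pair of callbacks could be reused to de-duplicate the new Mongo/SQL pumps as well, since their filtering step is identical.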

Performance Issues (2)

Severity Location Issue
🟡 Warning pumps/mcp_mongo.go:161-175
The data processing pipeline involves multiple iterations over the analytics data and the creation of several intermediate slices, which increases memory allocations and garbage collector pressure. The `WriteData` function first calls `filterMCPData` (creating slice 1), then `AccumulateSet` (creating batch slices), and finally `insertMCPDataSet` calls `convertToMCPObjects` (creating slice 3). This pattern can degrade performance in a high-throughput environment.
💡 SuggestionRefactor the `WriteData` function to use a single-pass approach. Iterate over the input data once, filtering MCP records, converting them, and adding them directly to size-aware batches. This will reduce memory churn and improve overall throughput.
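A single-pass version of that pipeline might look like the sketch below; the types, field names, and batch handling are illustrative rather than the actual WriteData implementation.

```go
package main

import "fmt"

// rawRecord and mcpObject are stand-ins for the pump's input and storage
// types.
type rawRecord struct {
	isMCP  bool
	method string
}

type mcpObject struct{ Method string }

// writeSinglePass filters, converts, and batches in one loop, avoiding
// the three intermediate slices the review describes.
func writeSinglePass(data []rawRecord, batchSize int) [][]mcpObject {
	var batches [][]mcpObject
	batch := make([]mcpObject, 0, batchSize)
	for _, r := range data {
		if !r.isMCP { // filter (was filterMCPData)
			continue
		}
		batch = append(batch, mcpObject{Method: r.method}) // convert (was convertToMCPObjects)
		if len(batch) == batchSize {                       // flush a full batch (was AccumulateSet)
			batches = append(batches, batch)
			batch = make([]mcpObject, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		batches = append(batches, batch)
	}
	return batches
}

func main() {
	data := []rawRecord{{true, "a"}, {false, "x"}, {true, "b"}, {true, "c"}}
	fmt.Println(len(writeSinglePass(data, 2))) // 2 batches: [a b] and [c]
}
```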
🟡 Warning pumps/mcp_sql.go:152-173
The logic for writing to sharded SQL tables iterates through the record list to find date boundaries. This implementation is sensitive to the order of the input data. If records are not perfectly sorted by date (e.g., `[day1_rec1, day2_rec1, day1_rec2]`), it can result in numerous small, inefficient write batches to the database, increasing transaction overhead.
💡 SuggestionTo make the batching more robust and performant, first group records by their target shard (date) using a map (e.g., `map[string][]*analytics.MCPRecord`). Then, iterate over the map and write the records for each shard in a single, efficient batch. This ensures optimal batching regardless of the input data order.
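The map-based grouping that suggestion describes can be sketched as follows (the record shape and date handling are simplified; the real shard key is derived from the record timestamp):

```go
package main

import "fmt"

// mcpRecord stands in for analytics.MCPRecord; Day stands in for the
// date-derived shard table name.
type mcpRecord struct {
	Day    string // e.g. "2026-03-18"
	Method string
}

// groupByShard buckets records by target shard so each shard receives a
// single batched write, regardless of input order.
func groupByShard(records []mcpRecord) map[string][]mcpRecord {
	shards := map[string][]mcpRecord{}
	for _, r := range records {
		shards[r.Day] = append(shards[r.Day], r)
	}
	return shards
}

func main() {
	// Out-of-order input that boundary-scanning would split into three
	// tiny writes; grouping yields exactly two batched writes.
	recs := []mcpRecord{
		{"2026-03-18", "tools/call"},
		{"2026-03-19", "tools/list"},
		{"2026-03-18", "resources/read"},
	}
	shards := groupByShard(recs)
	fmt.Println(len(shards), len(shards["2026-03-18"])) // 2 2
}
```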

Quality Issues (1)

Severity Location Issue
🟡 Warning pumps/mcp_mongo.go:1-178
The new MCP-specific pumps for MongoDB and SQL (`mcp_mongo.go`, `mcp_mongo_aggregate.go`, `mcp_sql.go`, `mcp_sql_aggregate.go`) introduce significant code duplication from their existing counterparts (`mongo.go`, `mongo_aggregate.go`, etc.). Core logic for database connection, batching, sharding, and data insertion/upserting is largely copied. This architectural approach increases the maintenance burden, as future bug fixes or enhancements to this common logic will need to be applied in multiple places.
💡 SuggestionConsider refactoring the pumps to be more data-type agnostic. The existing pumps could be extended to handle different analytics record types (REST, GraphQL, MCP) by using interfaces and abstracting the type-specific logic (e.g., aggregation, data model conversion). The implementation for the Elasticsearch pump in this same PR (`pumps/elasticsearch.go`) serves as a good example, where MCP handling was integrated into the existing pump with minimal duplication.

Last updated: 2026-04-17T08:02:49.324Z | Triggered by: pr_updated | Commit: 98ca874

@lghiur lghiur force-pushed the TT-16809-Pump-generate-mcp-analytics branch 2 times, most recently from e094776 to f1ccfcb on March 18, 2026 14:07
@lghiur lghiur force-pushed the TT-16809-Pump-generate-mcp-analytics branch from f1ccfcb to 1ec8a10 on March 18, 2026 14:12
Comment thread pumps/prometheus.go Outdated
mcpOnly: true,
}

p.allMetrics = append(p.allMetrics, totalStatusMetric, pathStatusMetrics, keyStatusMetrics, oauthStatusMetrics, totalLatencyMetrics, mcpCallsMetric, mcpLatencyMetric)
Contributor:
I wouldn't add this as a default metric

Contributor:
so what you're saying is to only append to slice if the mcp logic is enabled?

Contributor:
What I'm saying is not to add this as a default Prometheus metric (https://github.com/TykTechnologies/tyk-pump?tab=readme-ov-file#prometheus), but to document and expose McpOnly in the configuration so users can set their own custom metrics around it (https://github.com/TykTechnologies/tyk-pump?tab=readme-ov-file#custom-prometheus-metrics).

The current problem with default metrics is that users cannot modify them and need extra steps (e.g. TYK_PMP_PUMPS_PROMETHEUS_META_DISABLEDMETRICS) to disable them if not needed.

I'd keep it simple here: expose and document the option with an example.

Contributor:
done

Comment thread pumps/prometheus.go Outdated

// mcpOnly marks a metric as MCP-specific: it is only processed for records where IsMCPRecord() is true.
// This is an internal field and is not user-configurable.
mcpOnly bool
Contributor:
Maybe this should be a public config option so users can set their own custom metrics around MCP.

Contributor:
Can you please explain what the end goal is here?

Contributor:

done

Comment thread pumps/hybrid.go
@github-actions

🚨 Jira Linter Failed

Commit: e5ac48a
Failed at: 2026-03-26 14:29:29 UTC

The Jira linter failed to validate your PR. Please check the error details below:

Error details:
failed to validate branch and PR title rules: PR title must contain the Jira ticket ID 'TT-16809'

Next Steps

  • Ensure your branch name contains a valid Jira ticket ID (e.g., ABC-123)
  • Verify your PR title matches the branch's Jira ticket ID
  • Check that the Jira ticket exists and is accessible

This comment will be automatically deleted once the linter passes.

MCPSQLAggregatePump was only calling AutoMigrate without creating
the (dimension, timestamp, org_id, dimension_value) composite index
that SQLAggregatePump creates for tyk_aggregated. Without this index
all MCP analytics SQL queries perform full table scans.

Adds ensureTable and ensureIndex methods matching the regular
aggregate pump, including PostgreSQL CONCURRENTLY support and the
omit_index_creation config flag.
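Under the naming assumptions below (the actual table and index names in tyk-pump may differ), the composite index statement this commit adds would look something like:

```go
package main

import "fmt"

// buildIndexSQL sketches the CREATE INDEX statement for a sharded
// aggregate table. CONCURRENTLY is PostgreSQL-specific (it avoids
// locking the table during creation); the idx_ naming is illustrative.
func buildIndexSQL(table string, concurrently bool) string {
	kw := ""
	if concurrently {
		kw = "CONCURRENTLY "
	}
	return fmt.Sprintf(
		"CREATE INDEX %sIF NOT EXISTS idx_%s_dim ON %s (dimension, timestamp, org_id, dimension_value)",
		kw, table, table,
	)
}

func main() {
	fmt.Println(buildIndexSQL("tyk_mcp_aggregated", true))
}
```

Queries that filter on dimension plus a time range then hit the index instead of scanning the whole shard.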

MCPSQLAggregatePump was not migrating existing sharded tables on
startup, unlike SQLAggregatePump, which calls HandleTableMigration
and MigrateAllShardedTables. If the schema changes, existing shards
would not be updated.
@sonarqubecloud

Quality Gate failed

Failed conditions
72.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
