
feat(keeper): migrate from main to 3.3.8 and add metrics v2 support #34698

Open
sheyanjie-qq wants to merge 2 commits into 3.3.8 from feat/6622579928-338

sheyanjie-qq (Contributor) commented Mar 6, 2026

Description

  • Add metrics v2 support with in-memory cache mode
  • Add prometheus config for cache management
  • Refactor config into modular files
  • Keep audit implementation from 3.3.8 for compatibility
  • Update dependencies and fix bugs

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?


Close: https://project.feishu.cn/taosdata_td/feature/detail/6622579928
Copilot AI review requested due to automatic review settings March 6, 2026 07:25
gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request upgrades the keeper's metrics infrastructure with a new Prometheus v2 endpoint backed by an in-memory cache. Frequently scraped metrics can then be served from memory instead of through direct database queries. It also reorganizes the configuration system into modular files, upgrades dependencies, fixes several bugs, and adds HTTPS and bearer-token support.

Highlights

  • Metrics V2 Support: Introduced a new Prometheus v2 API endpoint (/metrics/v2). It serves metrics from an in-memory cache that is filled as metric reports arrive, rather than through the traditional database-backed path.
  • In-Memory Metric Caching: Implemented an in-memory store with a configurable time-to-live (TTL) for metrics, plus a Gin middleware that intercepts incoming metric reports and parses them into the cache.
  • Modular Configuration Refactoring: The application's configuration has been modularized into separate files (e.g., env.go, log.go, metrics.go, prometheus.go, ssl.go, tdengine.go) for better organization and maintainability.
  • Enhanced Database Connectivity: Added support for bearer tokens in database connections and improved handling of special characters in user credentials, enhancing security and compatibility.
  • Dependency Updates and Bug Fixes: Updated the Go version to 1.24.13 and several Go module dependencies. Also fixed minor bugs, such as quoting the role column name in SQL statements to avoid reserved-keyword conflicts.
  • HTTPS Support for Keeper Server: Added the capability to run the keeper server with HTTPS, configurable via SSL certificate and key files, improving secure communication.


Changelog
  • tools/keeper/api/adapter2_test.go
    • Updated metrics configuration type from MetricsConfig to Metrics.
  • tools/keeper/api/exporter_test.go
    • Modified NewNodeExporter instantiation to include memoryStore and reporter parameters.
    • Initialized and deferred closing of memoryStore.
    • Adjusted logger message for clarity.
  • tools/keeper/api/metric_middleware.go
    • Added new Gin middleware MetricCacheMiddleware to intercept POST requests for specific metric paths.
    • Implemented logic to read and re-buffer request bodies for synchronous parsing.
    • Defined shouldCachePath to filter paths for caching.
  • tools/keeper/api/metric_middleware_test.go
    • Added unit tests for MetricCacheMiddleware covering path caching, non-POST request skipping, non-matching path skipping, and request body preservation.
  • tools/keeper/api/metric_parser.go
    • Added MetricParser to handle parsing and storing various metric types (general, cluster basic, adapter reports) into the in-memory store.
    • Implemented isSupportedTable logic to filter tables based on prefixes and configurable exclusions.
    • Included specific parsing functions for different metric endpoints (parseGeneralMetric, parseClusterBasic, parseAdapterReport).
    • Added parseSlowSqlDetail to explicitly skip caching slow SQL details.
  • tools/keeper/api/metric_parser_test.go
    • Added comprehensive unit tests for MetricParser covering general metrics, cluster basic info, adapter reports, slow SQL detail skipping, unsupported table filtering, invalid JSON handling, and configurable write metrics caching.
  • tools/keeper/api/nodeexporter.go
    • Updated NodeExporter struct to include memoryStore and reporter.
    • Modified NewNodeExporter constructor to accept memoryStore and reporter.
    • Introduced a new /metrics/v2 endpoint for memory-cached metrics and real-time performance schema data.
    • Implemented serveMetricsV2 to register MemoryStoreCollector and PerformanceCollector and handle max_age query parameter.
  • tools/keeper/api/performance_collector.go
    • Added PerformanceCollector to collect real-time slow query metrics from performance_schema.perf_queries.
    • Implemented Describe and Collect methods for Prometheus integration.
    • Included functions to query total slow queries and very slow queries (exec_time > 300ms).
    • Added toInt64 helper function for type conversion.
  • tools/keeper/api/performance_collector_test.go
    • Added unit tests for PerformanceCollector's Describe method and the toInt64 helper function.
  • tools/keeper/api/report.go
    • Added tryGetConn method to attempt database connection without infinite retries, suitable for metrics collection.
  • tools/keeper/api/tables.go
    • Quoted the role column name in CreateMnodeSql to prevent SQL keyword conflicts.
  • tools/keeper/api/v2_integration_test.go
    • Added end-to-end integration tests for the /metrics/v2 endpoint, verifying data flow from submission to caching and retrieval.
    • Included tests for default exclusion and explicit inclusion of tables like taosd_write_metrics.
    • Verified that expired data is not returned by the collector based on max_age.
  • tools/keeper/cmd/command.go
    • Quoted the role column name in SQL queries for TransferTaosdMnodesInfo and TransferTaosdVnodesInfo.
  • tools/keeper/config/taoskeeper.toml
    • Added a new [prometheus] section for configuring metrics v2, including includeTables and cacheTTL.
  • tools/keeper/config/taoskeeper_enterprise.toml
    • Added a new [prometheus] section for configuring metrics v2, including includeTables and cacheTTL.
  • tools/keeper/db/connector.go
    • Introduced NewConnectorWithDbAndToken to support bearer token authentication for database connections.
    • Modified NewConnectorWithDb to call NewConnectorWithDbAndToken.
    • Added NewConnectorWithDbAndTokenWithRetryForever for token-based connections with retry logic.
  • tools/keeper/db/connector_test.go
    • Added tests for NewConnectorWithDbAndToken to verify token-based authentication and handling of special characters in user passwords.
  • tools/keeper/go.mod
    • Updated Go version to 1.24.13.
    • Added github.com/agiledragon/gomonkey/v2 module.
    • Updated github.com/kardianos/service to v1.2.4.
    • Updated github.com/sirupsen/logrus to v1.8.3.
    • Updated github.com/taosdata/driver-go/v3 to v3.7.8-0.20251226061849-bfa42e87e9da.
  • tools/keeper/go.sum
    • Updated module checksums to reflect changes in go.mod.
  • tools/keeper/infrastructure/config/config.go
    • Updated Config struct to use new modular configuration types (Metrics, Prometheus, SSL).
    • Refactored init() function to call dedicated initialization functions for each configuration section (initTDengine, initMetrics, initPrometheus, initEnvironment, initSSL, initLog, initAudit).
    • Removed inline definitions of TDengineRestful, MetricsConfig, Environment structs, moving them to their respective new files.
  • tools/keeper/infrastructure/config/env.go
    • Added new file defining the Environment struct and initEnvironment function for environment-related configurations.
  • tools/keeper/infrastructure/config/log.go
    • Moved the initLog function and Log struct definition from config.go to this dedicated file.
  • tools/keeper/infrastructure/config/metric_test.go
    • Updated comments for clarity.
  • tools/keeper/infrastructure/config/metrics.go
    • Moved the Metrics struct and initMetrics function from config.go to this dedicated file.
    • Removed unused TaosAdapter and Metric structs.
  • tools/keeper/infrastructure/config/prometheus.go
    • Added new file defining the Prometheus struct and initPrometheus function for Prometheus v2 specific configurations like includeTables and cacheTTL.
  • tools/keeper/infrastructure/config/ssl.go
    • Added new file defining the SSL struct and initSSL function for SSL-related configurations (enable, cert file, key file).
  • tools/keeper/infrastructure/config/ssl_test.go
    • Added unit tests for SSL configuration parsing from TOML and environment variables.
  • tools/keeper/infrastructure/config/tdengine.go
    • Added new file defining the TDengineRestful struct and initTDengine function for TDengine connection configurations.
  • tools/keeper/main.go
    • Renamed the main server variable from r to server for improved readability.
  • tools/keeper/process/builder.go
    • Updated ExpandMetricsFromConfig function signature to accept *config.Metrics instead of *config.MetricsConfig.
  • tools/keeper/process/builder_test.go
    • Updated cfg initialization in tests to use the new config.Metrics type.
  • tools/keeper/process/memory_collector.go
    • Added MemoryStoreCollector to implement the Prometheus Collector interface for the in-memory store.
    • Implemented Collect method to expose metrics from the MemoryStore, filtering by maxAge.
  • tools/keeper/process/memory_collector_test.go
    • Added unit tests for MemoryStoreCollector covering metric collection, maxAge filtering, and empty store scenarios.
  • tools/keeper/process/memory_store.go
    • Added MemoryStore for in-memory caching of metric data with a configurable TTL.
    • Introduced MetricData struct to hold table name, tags, metrics, and timestamp.
    • Implemented cleanupLoop for periodic removal of expired data.
    • Provided GetAllFiltered to retrieve metrics within a specified time range.
    • Added SetWithTimestamp to store or update metric data with an explicit timestamp.
    • Included GetStats for store diagnostics and Close for graceful shutdown.
  • tools/keeper/process/memory_store_test.go
    • Added comprehensive unit tests for MemoryStore covering data setting, retrieval, updates, filtering, key building, statistics, concurrent access, and TTL expiration.
  • tools/keeper/system/program.go
    • Declared a global memoryStore variable for managing the in-memory cache.
    • Initialized memoryStore and metricParser during system initialization.
    • Integrated api.MetricCacheMiddleware into the Gin router.
    • Updated api.NewNodeExporter call to pass the memoryStore and reporter.
    • Modified the server startup logic to support both HTTP and HTTPS based on SSL configuration.
    • Ensured memoryStore.Close() is called during program shutdown to prevent goroutine leaks.
    • Removed the enableGzip parameter from CreateRouter.
  • tools/keeper/system/program_test.go
    • Cleaned up database drop statements in TestInit.
    • Added Test_program_Start_HTTP and Test_program_Start_HTTPS to verify server startup with different protocols.
    • Included a generateTestCert helper function for creating self-signed certificates for HTTPS tests.
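The table-filtering rules described above can be sketched as follows. They come from isSupportedTable and the v2 integration tests: tables are matched by prefix, taosd_write_metrics is excluded by default, and it can be re-enabled through includeTables. The prefix list and the names tableFilter, newTableFilter and supported are hypothetical. Only the table name taosd_write_metrics and the includeTables option come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// tableFilter is a hypothetical sketch: a table is cached when it matches a
// prefix, unless it is on the default exclusion list and has not been
// re-enabled through includeTables.
type tableFilter struct {
	prefixes []string
	excluded map[string]bool
	included map[string]bool
}

// toSet builds a lookup set from a slice.
func toSet(xs []string) map[string]bool {
	m := make(map[string]bool, len(xs))
	for _, x := range xs {
		m[x] = true
	}
	return m
}

func newTableFilter(prefixes, defaultExcluded, includeTables []string) *tableFilter {
	return &tableFilter{prefixes: prefixes, excluded: toSet(defaultExcluded), included: toSet(includeTables)}
}

// supported reports whether metrics for table should be cached.
func (f *tableFilter) supported(table string) bool {
	for _, p := range f.prefixes {
		if strings.HasPrefix(table, p) {
			return !f.excluded[table] || f.included[table]
		}
	}
	return false
}

func main() {
	prefixes := []string{"taosd_", "adapter_"} // assumed prefixes
	excluded := []string{"taosd_write_metrics"}
	def := newTableFilter(prefixes, excluded, nil)
	opt := newTableFilter(prefixes, excluded, []string{"taosd_write_metrics"})
	// prints: false true false
	fmt.Println(def.supported("taosd_write_metrics"), opt.supported("taosd_write_metrics"), def.supported("other_table"))
}
```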

Copilot AI left a comment


Pull request overview

This PR migrates taoskeeper toward the 3.3.8 baseline while adding Prometheus metrics v2 support via an in-memory cache (with TTL-based cleanup), plus modularized configuration (Prometheus/SSL/TDengine/env/log).

Changes:

  • Add in-memory metric cache pipeline (middleware + parser + store) and expose /metrics/v2 with optional max_age.
  • Introduce Prometheus/SSL/TDengine/env modular config + TOML updates.
  • Extend DB connector to support bearer token authentication and update related tests/deps.

Reviewed changes

Copilot reviewed 35 out of 37 changed files in this pull request and generated 10 comments.

Summary per file:

  • tools/keeper/system/program_test.go: Adds HTTP/HTTPS start tests and adjusts init cleanup behavior.
  • tools/keeper/system/program.go: Wires v2 memory cache + middleware, adds HTTPS support, and shuts down cache on stop.
  • tools/keeper/process/memory_store_test.go: Adds unit tests for in-memory TTL cache behavior and concurrency.
  • tools/keeper/process/memory_store.go: Implements in-memory metric store with TTL cleanup loop and stats.
  • tools/keeper/process/memory_collector_test.go: Adds tests for Prometheus collector emitting cached metrics.
  • tools/keeper/process/memory_collector.go: Prometheus collector for exporting cached metrics with max-age filtering.
  • tools/keeper/process/builder_test.go: Updates config type rename in tests.
  • tools/keeper/process/builder.go: Updates config type rename for metrics expansion.
  • tools/keeper/main.go: Minor variable rename for clarity.
  • tools/keeper/infrastructure/config/tdengine.go: Extracts TDengine config initialization into a module.
  • tools/keeper/infrastructure/config/ssl_test.go: Adds SSL config parsing/env tests.
  • tools/keeper/infrastructure/config/ssl.go: Adds SSL config module for HTTPS serving.
  • tools/keeper/infrastructure/config/prometheus.go: Adds Prometheus v2 cache config (includeTables/cacheTTL).
  • tools/keeper/infrastructure/config/metrics.go: Refactors metrics config and registers flags/env defaults.
  • tools/keeper/infrastructure/config/metric_test.go: Minor comment formatting changes in test helper.
  • tools/keeper/infrastructure/config/log.go: Refactors log config initialization into a module.
  • tools/keeper/infrastructure/config/env.go: Extracts environment config initialization into a module.
  • tools/keeper/infrastructure/config/config.go: Integrates new modular config structs (Metrics/Prometheus/SSL) and init calls.
  • tools/keeper/go.sum: Updates dependency checksums for new/updated modules.
  • tools/keeper/go.mod: Updates Go version and dependencies (service/logrus/driver-go, adds gomonkey).
  • tools/keeper/db/connector_test.go: Adds enterprise token-connector tests and reorganizes special-char tests.
  • tools/keeper/db/connector.go: Adds NewConnectorWithDbAndToken* and bearerToken DSN parameter.
  • tools/keeper/config/taoskeeper_enterprise.toml: Adds [prometheus] v2 cache configuration block.
  • tools/keeper/config/taoskeeper.toml: Adds [prometheus] v2 cache configuration block.
  • tools/keeper/cmd/command.go: Quotes role field in SQL to avoid reserved-word conflicts.
  • tools/keeper/api/v2_integration_test.go: Adds end-to-end v2 cache → collector → HTTP output tests.
  • tools/keeper/api/tables.go: Quotes role column in table DDL.
  • tools/keeper/api/report.go: Adds non-forever-retry DB connect helper for collectors.
  • tools/keeper/api/performance_collector_test.go: Adds unit tests for performance collector and toInt64.
  • tools/keeper/api/performance_collector.go: Adds performance_schema real-time Prometheus collector.
  • tools/keeper/api/nodeexporter.go: Adds /metrics/v2 endpoint and registers memory+performance collectors.
  • tools/keeper/api/metric_parser_test.go: Adds tests for parsing/caching metrics for v2 mode.
  • tools/keeper/api/metric_parser.go: Implements parser for caching selected endpoints into MemoryStore.
  • tools/keeper/api/metric_middleware_test.go: Adds middleware tests ensuring caching + body preservation.
  • tools/keeper/api/metric_middleware.go: Adds middleware to intercept metric POST bodies and cache synchronously.
  • tools/keeper/api/exporter_test.go: Updates NodeExporter construction to include memory store + reporter.
  • tools/keeper/api/adapter2_test.go: Updates metrics config type rename in adapter tests.


gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant new feature: metrics v2 with an in-memory cache, which improves performance by avoiding database queries for metrics. However, the implementation contains several Denial of Service (DoS) vulnerabilities. Because metric names used in Prometheus descriptors are not validated, an invalid name can panic the process. On unauthenticated endpoints, there are also memory-exhaustion risks, because request bodies are read without a size limit and the in-memory store is unbounded. The configuration refactoring and dependency updates are positive. Beyond the security concerns, the review also suggests ways to improve the robustness and maintainability of the new components.
