Releases: pganalyze/collector

v0.70.2

07 May 03:29
a66ce25

  • Aurora: Improve disk size reporting
    • Use the VolumeBytesUsed CloudWatch metric at the cluster level
    • Previously, the collector reported OS-level filesystem values, which did
      not reflect the actual cluster storage in use
  • AlloyDB/Cloud SQL: Fix IAM authentication by removing prepared statements
    • Previously, explicit prepared statements broke when AlloyDB/Cloud SQL was
      configured with IAM authentication if the proxy was not configured for
      prepared statements
    • This removes all prepared statement usage from collector queries
      independent of the provider in use, allowing proxy use without changes
  • Docker image: Raise alpine/musl stack size from 2MB to 8MB
    • This matches what glibc would typically use, and fixes a crash on a very
      complex query making heavy use of UNION
  • Log processing: Support log_error_verbosity = verbose
    • Previously, Log Insights was not supported when log_error_verbosity was
      set to "verbose", because log messages then have the SQLSTATE inlined,
      which confused the parser. This case is now handled explicitly, and the
      checks for the setting have been removed
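
To illustrate the log_error_verbosity case: in verbose mode Postgres places the five-character SQLSTATE code between the severity and the message text, e.g. ERROR:  42P01: relation "foo" does not exist. The Python sketch below shows one way a parser could accept both forms. The function name and the regular expression are illustrative assumptions, not the collector's actual (Go) implementation.

```python
import re

# Severity, then an optional five-character SQLSTATE code, then the message.
# Assumption: SQLSTATE codes consist of digits and upper-case letters only.
# Caveat: a non-verbose message that itself starts with five such characters
# followed by ": " would be misread as carrying a SQLSTATE.
LINE_RE = re.compile(
    r"^(?P<severity>[A-Z0-9]+):\s+"
    r"(?:(?P<sqlstate>[0-9A-Z]{5}): )?"
    r"(?P<message>.*)$",
    re.DOTALL,
)


def split_log_content(content):
    """Return (severity, sqlstate, message); sqlstate is None if not inlined."""
    match = LINE_RE.match(content)
    if match is None:
        # Not a "SEVERITY:  message" line; hand the content back unchanged.
        return None, None, content
    return match.group("severity"), match.group("sqlstate"), match.group("message")
```

Making the SQLSTATE group optional means the same pattern accepts both verbose and non-verbose output, so the parser does not need to know the server setting in advance.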

v0.70.1

16 Apr 02:00
134c380

  • Fix compatibility with existing pganalyze Enterprise Server releases
    • Version 0.70.0 added tracking of nested query statistics using the toplevel field,
      which broke compatibility with Enterprise Server when using pg_stat_statements.track = all.
      This change detects whether the pganalyze server supports nested query statistics
      so that new collector versions can be used without upgrading the server.

v0.70.0

04 Apr 01:53
18a06b4

  • Track nested query statistics separately based on toplevel field
    • This matters when pg_stat_statements.track is set to "all", and allows pganalyze
      to consider query activity from inside functions, or other nested cases (e.g. EXPLAIN)
      separately from top level activity (direct query execution)
  • Allow resetting pg_stat_statements when nearly full
    • When full (i.e. when the number of entries hits pg_stat_statements.max),
      pg_stat_statements deallocates the 5% least used queries. With certain
      workloads this causes a high rate of <query text unavailable> in pganalyze,
      because very old queries with high call counts take priority over more
      recent query activity.
    • In such situations, a recurring pg_stat_statements_reset() call can avoid the
      situation by clearing 100% of entries, so that there is more space for fresh entries
    • This reworks the existing reset mechanism to reset pg_stat_statements when
      (1) it has utilized most of its entries, and a dealloc is likely occurring soon
      (2) the returned query text exceeds 250MB
    • Resets are optional and turned off by default. When the helper function exists,
      and the reset interval is configured through pganalyze, it is now taken as the
      highest permitted reset frequency, i.e. with this change resets will likely occur
      less often than before (previously it was a fixed interval that would always reset)
  • Keep per-query information on whether a query / query sample was normalized
    • This lets the pganalyze application be informed whether PII filtering was
      applied to a particular snapshot being submitted.
  • Allow multiplexed use of OpenTelemetry logs server
    • Multiple servers can now share the same db_log_otel_server configuration
    • This is safe to do in certain circumstances, specifically when a
      Kubernetes pod or label filter is in place, and the log message has sufficient
      details (i.e. is annotated with Kubernetes metadata)
    • If configured without a pod/label filter the collector will emit a warning,
      and send received logs to each server sharing the same db_log_otel_server
  • Kubernetes label matching: Ensure selected labels are present for equality
    • Previously a label specified in the selector, but not actually present
      in the data would lead to that part of the selector being skipped, not
      counting it as a mismatch. Instead count it as a mismatch for equality,
      but a match for inequality.
  • Diff pg_stat_statements_info dealloc counter correctly on initial collector start
  • Rework OpenTelemetry logs handler to be OTLP spec compliant
  • Crunchy Bridge: Don't crash when metrics retrieval fails
  • AlloyDB/Cloud SQL:
    • Allow IAM authentication to use private service connect endpoint
      • This is enabled by setting the new gcp_use_psc / GCP_USE_PSC to true
    • Avoid accidental use of prepared statements when IAM authentication is in use
  • AlloyDB: Add back support for follower statistics
    • This revises the change added in 0.69.0 to more specifically skip
      the problematic function on a replica
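
The Kubernetes label matching rule above (a label missing from the data counts as a mismatch for equality, but as a match for inequality) can be sketched as follows. This is a minimal Python illustration, assuming a selector has already been parsed into (key, operator, value) tuples; the function name and data shapes are hypothetical and do not mirror the collector's Go code.

```python
def selector_matches(requirements, labels):
    """Check a list of (key, op, value) requirements against a labels dict.

    op is "=" or "!=". A key that is absent from labels fails an equality
    requirement and satisfies an inequality requirement.
    """
    for key, op, value in requirements:
        actual = labels.get(key)  # None when the label is not present
        if op == "=":
            if actual != value:
                return False  # absent or different value: mismatch
        elif op == "!=":
            if actual == value:
                return False  # present with the excluded value: mismatch
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return True
```

For example, selector_matches([("app", "=", "db")], {}) is False, whereas the previous behavior described above would have skipped that requirement and reported a match.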

v0.69.0

26 Feb 10:09
c499797

  • Improve query text collection to reduce missing query texts
    • Add a fingerprint cache that retains known query IDs across collection
      cycles, increasing memory use by up to about 9 MB per server whilst reducing
      CPU overhead
    • For data from pg_stat_activity (Connections page in pganalyze), this reduces
      the occurrence of <truncated query>
    • For data from pg_stat_statements (Query Performance page in pganalyze), this
      reduces the likelihood of "query text unavailable" by re-using previous
      query texts already stored on the server side
  • PlanetScale: Paginate log fetching to ensure freshness
    • Previously, only a single page of up to 1,000 log entries was fetched,
      which proved insufficient for high-traffic servers. The collector now
      paginates through all available entries
  • AlloyDB: Skip collecting replication stats
  • Helm chart: Allow setting pod labels and pulling images by digest
  • Syslog handler: Add UDP listener
  • Add packages for Debian 13 (Trixie)
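
As a rough illustration of the PlanetScale pagination change above, the Python sketch below keeps requesting pages until the source reports no further page. The fetch_page callable, its cursor argument, and its return shape are hypothetical stand-ins; the actual PlanetScale log API and the collector's Go code are not shown in these notes.

```python
def fetch_all_entries(fetch_page, page_size=1000):
    """Collect entries from every page instead of only the first one.

    fetch_page(cursor, page_size) is a hypothetical callable returning
    (entries, next_cursor), where next_cursor is None on the last page.
    """
    entries = []
    cursor = None  # None requests the first page
    while True:
        page, cursor = fetch_page(cursor, page_size)
        entries.extend(page)
        if cursor is None:
            break  # no more pages available
    return entries
```

A production version would likely also bound the number of pages fetched per collection cycle; that limit is left out here to keep the sketch short.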

v0.68.2

03 Feb 17:06
42b6df1

  • Allow explicitly setting Cluster ID
    • The new api_cluster_id / API_CLUSTER_ID setting enables overriding the
      automatically detected cluster ID, or setting it on platforms that don't
      have built-in detection.
    • This value is stored by pganalyze for each server, and used for grouping
      servers together. Alternate mechanisms to create server groups in pganalyze
      will be offered in a future release.

v0.68.1

28 Jan 09:56
1bb74d7

  • Fix infrequent crash in query normalization when encountering certain utility statements
  • Do not fail snapshot collection when pg_stat_statements is missing with Postgres 14+
    • In such situations we now report a snapshot with an error state, as was the case
      before release 0.65.0
  • Add missing mappings for wait event names

v0.68.0

21 Jan 06:58
8bcab47

  • Add support for PlanetScale databases
    • This adds PlanetScale as a new supported server type, including log fetching
      from PlanetScale's log services
    • New configuration settings: planetscale_org / PLANETSCALE_ORG,
      planetscale_database / PLANETSCALE_DATABASE, and
      planetscale_branch / PLANETSCALE_BRANCH for identifying instances
  • Update wait event name list to support Postgres 18
    • Previously most LWLock-type wait events were not appearing correctly
  • Improve collector log analysis
    • Add matching for autovacuum messages produced by newer AlloyDB versions
      and Postgres 18
    • Handle terminology changes in Postgres 14+ where "misses" became "reads"
      in buffer usage reporting
    • Capture SLRU and LSN information from checkpoint completion messages
    • Add new log event type for connection authenticated messages
    • Expand patterns for connection failure scenarios
  • Add Vector OTLP export configuration example to contrib/vector
    • This supports using Vector instead of Fluentbit for forwarding log events
      via OpenTelemetry Protocol

v0.67.0

09 Dec 02:53
f2578d9

  • Add support for capturing plan statistics with pg_stat_plans
  • Avoid skipping query stats in case of slow query text/schema collection
    • This reworks how pg_stat_statements data is retrieved, to ensure query
      statistics are continuously collected at 1 minute intervals, even if the
      full snapshot collection is slow (e.g. due to query text file size or many
      tables/indexes)
  • Improve the correctness of replication/server/backend count metrics
    • Previously these metrics were collected after potentially slow operations in
      the full snapshot, causing different collection times instead of staying
      close to the intended 10 minute interval
  • GCP Pub/Sub: Allow multiple collectors subscribing to the same topic
    • This adds the new gcp_pubsub_max_age / GCP_PUBSUB_MAX_AGE setting,
      default 0, maximum 24 hours. Keeping the age limit low is recommended,
      10 minutes (gcp_pubsub_max_age = 10m) is a good value to use
    • If set to a non-zero value, the collector returns a GCP PubSub message
      that's not for a configured server to the topic (by sending a "Nack"
      message), allowing a different collector to pick it up. This enables an
      architecture where one topic can be shared across multiple collectors
    • However, if the message is older than the max age limit, it is always
      acknowledged (discarded)
  • Improve buffer cache connection handling to avoid connection leak
    • Previously this may have blocked on an unconsumed channel, which could have
      caused the database connection to stay open longer than planned
  • Disable buffer cache collection for Aurora Serverless
    • We've had a report of a Postgres segfault caused by scanning the buffer
      cache during an Aurora Serverless scaling event, which was confirmed by AWS
      to be a current Aurora bug. For now collection of pg_buffercache is turned
      off until a bugfix is available
  • Helper: Don't require "locate" to determine the pg_controldata path
    • This fixes cluster ID detection (via the Postgres system identifier)
      for a typical PGDG-based Debian/Ubuntu install
  • Prune stray temporary files on collector start
    • Previously an out of memory crash, typically due to systemd limits, could
      cause temporary files to stay around and never be deleted
    • Now the collector will remove such files when it starts back up, avoiding
      potential disk space issues with the temporary file directory
    • Collector temporary files are now prefixed with "pganalyze_collector_" to
      support this functionality, and make identification easier
  • Introduce new "api_require_websocket" / "API_REQUIRE_WEBSOCKET" setting
    • This setting (currently defaulting to off) can be used to require that
      the more efficient WebSocket connection method is used for submitting
      snapshots to the pganalyze server
    • We intend to migrate fully to WebSockets in the future, and a future
      collector release will change the default of this setting to true, i.e.
      require their use by default (with this setting remaining for a while
      after that as a fallback, until it is fully required)
  • OTel Log Server: Support receiving JSON log format in addition to Protobuf
    • This supports using Vector instead of Fluentbit for forwarding log events
      from a CloudNativePG (CNPG) installation on Kubernetes
    • With thanks to Rauan Mayemir for the initial contribution of this fix
  • Log event redaction: Fix off-by-one error when redacting full lines
    • This caused log lines where the full line should be redacted (as is
      typically the case with STATEMENT lines when using log filtering) to not
      be redacted correctly.
  • Fix nap logic of "--text-explain" to actually wait the configured time
  • Include database name in warning when pg_stat_statements is out of date
  • Handle test cancellation correctly during log tests (avoid hangs)
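
The GCP Pub/Sub behavior described above can be summarized as a small decision function. This Python sketch is an illustration only: the function name, arguments, and return values are hypothetical, and it assumes that with the default of 0 a message for an unknown server is simply acknowledged, which the notes imply but do not state outright.

```python
from datetime import datetime, timedelta, timezone


def decide_message_action(published_at, now, is_for_configured_server, max_age):
    """Return "process", "nack", or "ack" for a received Pub/Sub message.

    max_age is a timedelta; timedelta(0) stands for the default setting of 0.
    """
    if is_for_configured_server:
        return "process"  # the message belongs to a server this collector monitors
    if max_age <= timedelta(0):
        return "ack"  # assumption: sharing disabled, so discard the message
    if now - published_at > max_age:
        return "ack"  # older than the age limit: always discarded
    return "nack"  # return it to the topic so another collector can pick it up
```

With gcp_pubsub_max_age = 10m, a five-minute-old message for an unknown server is returned to the topic, while a fifteen-minute-old one is discarded. The age limit keeps messages that no collector claims from being redelivered indefinitely.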

v0.66.3

14 Aug 08:26
3f04974

  • Log Insights: Support for multiple OpenTelemetry log receiving endpoints
    • Previously, only a single HTTP server (port) could receive logs. With this
      change, an OTEL HTTP server can be specified per monitored server section
      using db_log_otel_server / LOG_OTEL_SERVER
    • The Helm chart has been updated to support this as well. Within the service
      section, multiple ports can be specified using the new ports value
  • Add --very-verbose flag
    • Enables very verbose logging (also implicitly enables verbose logging)
    • When used with the Log Insights integration for syslog, OpenTelemetry, Azure,
      or Google Cloud, this flag will output incoming logs from these services
  • Show error when duplicate server definitions are found in the collector config
    during test run
  • Install script: Add Oracle Linux support

v0.66.2

25 Jun 03:11
b3ccc14

  • Amazon Aurora: Add support for Postgres 17
    • The collector now supports Amazon Aurora with Postgres 17
    • Previously, Amazon Aurora users on Postgres 17 who also had plan statistics
      enabled were unable to collect query statistics, due to the error
      column "blk_read_time" does not exist
  • Log Insights: Improve parsing Heroku auto_explain logs using JSON format
    • Support newlines in the middle of the EXPLAIN query with the JSON format