
Commit 7ad0609

Update documentation and add new diagnostic scripts for cache hit rates and job errors
1 parent cf5b591 commit 7ad0609

7 files changed

Lines changed: 158 additions & 45 deletions


CONTRIBUTING.md

Lines changed: 13 additions & 15 deletions
````diff
@@ -1,30 +1,28 @@
 # Contributing to pgtools
 
-We welcome contributions from the PostgreSQL community! This document provides comprehensive guidelines for contributing to the pgtools project.
+We welcome contributions from the PostgreSQL and TimescaleDB communities! This document provides guidelines for contributing to the `pgtools` "First Responder" Toolbelt.
 
-## 📋 Quick Start
+The goal of this project is to create a safe, reliable, and easy-to-use set of diagnostic scripts for support engineers and DBAs to use when triaging production database issues.
 
-### Prerequisites
-- PostgreSQL knowledge (administration, performance tuning, or development)
-- Basic understanding of SQL and shell scripting
-- Familiarity with Git and GitHub workflows
-- Access to PostgreSQL test environment for script validation
+## Core Principles
+
+All contributions must adhere to a strict **"Zero-Harm"** policy.
+
+1. **Read-Only**: Scripts must *never* perform write operations. No `CREATE TABLE`, `ALTER`, `UPDATE`, `DELETE`, or `TRUNCATE`. Temporary tables should be avoided unless absolutely necessary and lightweight.
+2. **Lightweight**: Queries must be efficient and avoid expensive operations that could impact a heavily loaded customer system.
+3. **No Heavy Dependencies**: We stick to `bash` and `psql`. No Python, Go, or other languages that require complex installation.
+4. **Safety First**: All queries are executed via the `pgtools.sh` wrapper, which enforces a short `statement_timeout` and `lock_timeout`.
+
+## Getting Started
 
-### Getting Started
 ```bash
 # Fork the repository on GitHub
 # Clone your fork
 git clone https://github.com/your-username/pgtools.git
 cd pgtools
 
-# Create development branch
+# Create a feature branch
 git checkout -b feature/your-feature-name
-
-# Test current scripts in your environment
-./automation/test_pgtools.sh --database your_test_db
-
-# Optional: run the full local pre-commit bundle
-./scripts/precommit_checks.sh --database your_test_db
 ```
 
 ## Types of Contributions
````
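The "Safety First" principle says the wrapper enforces a short `statement_timeout` and `lock_timeout`. The actual `pgtools.sh` code for this is not shown in this commit, so the following is only a minimal sketch of one way a wrapper could do it: it assumes settings are passed through libpq's `PGOPTIONS` environment variable. The helper names `build_pgoptions` and `run_sql_safely`, the timeout values, and the extra `default_transaction_read_only=on` backstop for the Read-Only rule are all hypothetical choices.

```sh
#!/bin/sh
# Hypothetical sketch: run one diagnostic SQL file with safety settings applied.
# Helper names and timeout values are assumptions, not taken from pgtools.sh.

# Build the libpq PGOPTIONS string that the server applies at session start.
build_pgoptions() {
  # $1 = statement_timeout, $2 = lock_timeout (e.g. 30s, 5s)
  printf '%s' "-c statement_timeout=$1 -c lock_timeout=$2 -c default_transaction_read_only=on"
}

# Run a SQL file against a connection string (or "service=<name>").
run_sql_safely() {
  # $1 = path to .sql file, $2 = connection string
  PGOPTIONS="$(build_pgoptions 30s 5s)" \
    psql --no-psqlrc --set ON_ERROR_STOP=1 --file "$1" "$2"
}
```

For example, `run_sql_safely sql/diagnostics/cache-hit.sql "service=prod"` (with `prod` as a hypothetical service name) would cancel the report after 30 seconds. Because `default_transaction_read_only` is only a session default that a script could override, it backstops the Read-Only rule rather than replacing review of the SQL itself.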

GETTING-STARTED.md

Lines changed: 12 additions & 29 deletions
````diff
@@ -1,46 +1,29 @@
-# Getting Started with pgtools
+# Getting Started with pgtools: The First Responder Toolbelt
 
-Welcome to **pgtools** - the comprehensive PostgreSQL administration toolkit! This guide will help you get up and running quickly, whether you're a database administrator, developer, or DevOps engineer working with PostgreSQL.
+Welcome to **pgtools**, the "First Responder" Support Toolbelt for safely diagnosing PostgreSQL and TimescaleDB databases. This guide will help you get up and running quickly.
 
 ## 📋 Table of Contents
 
 - [Prerequisites](#prerequisites)
-- [Quick Installation](#quick-installation)
-- [First Steps](#first-steps)
-- [Essential Scripts](#essential-scripts)
-- [Common Workflows](#common-workflows)
-- [Setting Up Automation](#setting-up-automation)
-- [Troubleshooting](#troubleshooting)
-- [Next Steps](#next-steps)
+- [Installation](#installation)
+- [Core Usage](#core-usage)
+- [Example Workflows](#example-workflows)
+- [Available Commands](#available-commands)
 
 ## Prerequisites
 
-### System Requirements
-- **PostgreSQL**: Version 14 or higher (tested against 14, 15, 16, 17, and 18)
-- **Operating System**: Linux, macOS, or Windows with appropriate shell environment
-- **Shell**: Bash, Zsh, or compatible shell for automation scripts
-- **Tools**: `psql` command-line client, Git (for installation)
+- **Shell**: A `bash`-compatible shell.
+- **Tools**: `psql` (the PostgreSQL command-line client) and `git`.
+- **Database Access**: A valid PostgreSQL connection string or service name to connect to the target database. Most scripts require privileges equivalent to the `pg_monitor` role.
 
-### Database Access
-- **Privileges**: Most scripts require `pg_monitor` role or superuser privileges
-- **Connection**: Ability to connect to your PostgreSQL database(s)
-- **Extensions**: Some scripts benefit from `pg_stat_statements` and `pg_buffercache`
-
-### Knowledge Level
-- **Basic SQL**: Understanding of PostgreSQL queries and administration
-- **Command Line**: Comfort with terminal/command prompt usage
-- **PostgreSQL Concepts**: Familiarity with databases, tables, indexes, and basic administration
-
-## Quick Installation
-
-### Method 1: Git Clone (Recommended)
+## Installation
 ```bash
 # Clone the repository
 git clone https://github.com/thepostgresguy/pgtools.git
 cd pgtools
 
-# Make scripts executable
-chmod +x automation/*.sh maintenance/*.sh integration/*.sh configuration/*.sh
+# Make the main script executable
+chmod +x pgtools.sh
 ```
 
 ### Method 2: Download ZIP
````
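The new prerequisites (a shell, `psql`, `git`, and `pg_monitor`-level access) can be checked before a triage session. This is a minimal sketch assuming a POSIX shell; `check_commands` and `check_pg_monitor` are hypothetical helpers, not part of `pgtools`. The privilege check relies on PostgreSQL's built-in `pg_has_role()` function, which also returns true for superusers.

```sh
#!/bin/sh
# Hypothetical preflight helpers (not part of pgtools) for the prerequisites.

# Return non-zero, naming each culprit on stderr, if a required command is missing.
check_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Print "t" if the connecting role has the privileges of pg_monitor, else "f".
# $1 = connection string or "service=<name>"
check_pg_monitor() {
  psql --no-psqlrc --tuples-only --no-align \
    --command "SELECT pg_has_role('pg_monitor', 'USAGE')" "$1"
}
```

A typical call would be `check_commands bash psql git && check_pg_monitor "service=prod"`, where `prod` is a placeholder service name.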

cache-hit.sql

Lines changed: 51 additions & 0 deletions
```sql
/*
 * Script: sql/diagnostics/cache-hit.sql
 *
 * Purpose: Calculates and displays table and index cache hit rates.
 *
 * Description:
 * This query reports a key performance metric: the percentage of block
 * requests served from the PostgreSQL buffer cache (`shared_buffers`)
 * rather than read from the OS (page cache or disk). High hit rates
 * (typically > 90-95%) indicate efficient memory use. Counters are
 * cumulative since the last statistics reset.
 *
 * Red Flags:
 * - `hit_rate_pct` consistently below 90-95% for a busy object:
 *   Indicates significant I/O pressure, potentially due to insufficient
 *   `shared_buffers`, inefficient queries, or missing/bad indexes.
 * - `total_reads` is very high for an object with a low hit rate:
 *   This object is frequently accessed but rarely found in cache.
 *
 * Interpretation:
 * - `schema_name`, `object_name`, `object_type`: The TABLE or INDEX measured.
 * - `hit_rate_pct`: Percentage of the object's blocks found in cache.
 * - `total_reads`: Total block requests (cache hits + reads from the OS).
 * - `total_hits`: Number of blocks found in cache.
 *
 * Safety:
 * This script is read-only. It queries `pg_statio_user_tables` and
 * `pg_statio_user_indexes`, standard PostgreSQL statistics views. The
 * `statement_timeout` set by `pgtools.sh` provides an additional safeguard.
 */
SELECT
    schemaname AS schema_name,
    relname AS object_name,
    'TABLE' AS object_type,
    heap_blks_read + heap_blks_hit AS total_reads,
    heap_blks_hit AS total_hits,
    ROUND(heap_blks_hit * 100.0 / NULLIF(heap_blks_read + heap_blks_hit, 0), 2) AS hit_rate_pct
FROM pg_catalog.pg_statio_user_tables
WHERE heap_blks_read + heap_blks_hit > 0  -- Only show tables that have been accessed
UNION ALL
SELECT
    schemaname AS schema_name,
    indexrelname AS object_name,
    'INDEX' AS object_type,
    idx_blks_read + idx_blks_hit AS total_reads,
    idx_blks_hit AS total_hits,
    ROUND(idx_blks_hit * 100.0 / NULLIF(idx_blks_read + idx_blks_hit, 0), 2) AS hit_rate_pct
FROM pg_catalog.pg_statio_user_indexes
WHERE idx_blks_read + idx_blks_hit > 0  -- Only show indexes that have been accessed
ORDER BY hit_rate_pct ASC, total_reads DESC
LIMIT 50;
```
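The `hit_rate_pct` column follows the formula below, where *hits* are blocks found in the buffer cache and *reads* are blocks that had to be requested from the operating system. The two sample objects are illustrative numbers, not measurements: 9,900 hits with 100 reads scores 99.00%, while 600 hits with 400 reads scores 60.00%, well below the 90-95% threshold named in the script header.

```latex
$$
\text{hit\_rate\_pct} = \operatorname{round}\left( \frac{100 \cdot \text{hits}}{\text{hits} + \text{reads}},\ 2 \right)
$$

$$
\frac{100 \cdot 9900}{9900 + 100} = 99.00 \qquad \frac{100 \cdot 600}{600 + 400} = 60.00
$$
```

The `NULLIF(..., 0)` guard in the query keeps the denominator from being zero for objects that have never been read.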

job-errors.sql

Lines changed: 34 additions & 0 deletions
```sql
/*
 * Script: sql/timescale/job-errors.sql
 *
 * Purpose: Shows recent errors from TimescaleDB background jobs.
 *
 * Description:
 * This query retrieves the most recent errors logged by the TimescaleDB
 * background worker scheduler. It is the primary tool for debugging why a
 * policy (compression, CAGG refresh, retention) is failing.
 *
 * Red Flags:
 * - Any rows returned, especially with a recent `start_time`, are a red flag.
 * - Repeated `err_message` values for the same `job_id`: A persistent problem.
 *
 * Interpretation:
 * - `proc_name`: The type of job that failed (e.g., 'policy_compression').
 * - `sqlerrcode`: The SQLSTATE code (e.g., '42501' = insufficient privilege).
 * - `err_message`: The detailed error message, often explaining the root cause
 *   (e.g., permission denied, constraint violation).
 *
 * Safety:
 * This script is read-only. It queries the `timescaledb_information.job_errors`
 * view, which is designed for efficient diagnostic use.
 */
SELECT
    job_id,
    proc_name,
    start_time,
    finish_time,
    sqlerrcode,
    err_message
FROM timescaledb_information.job_errors
ORDER BY start_time DESC
LIMIT 50;
```
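To spot the "repeated `err_message` for the same `job_id`" red flag quickly, the output can be tallied per job. This is a minimal sketch, not part of `pgtools`: the helper `summarize_job_errors` is hypothetical, and it assumes rows were exported with `psql -X -A -t -F '|'` so each line is `job_id|proc_name|start_time|finish_time|sqlerrcode|err_message`.

```sh
#!/bin/sh
# Hypothetical helper (not part of pgtools): count repeated errors per job.
# Assumes input from: psql -X -A -t -F '|' -f sql/timescale/job-errors.sql <conn>
# Column order: job_id|proc_name|start_time|finish_time|sqlerrcode|err_message
summarize_job_errors() {
  tab="$(printf '\t')"
  awk -F'|' '
    {
      # err_message is the last column; rejoin it in case it contains "|".
      msg = $6
      for (i = 7; i <= NF; i++) msg = msg "|" $i
      count[$1 "\t" msg]++
    }
    END {
      # Emit: occurrences <TAB> job_id <TAB> err_message
      for (key in count) print count[key] "\t" key
    }
  ' | sort -t "$tab" -k1,1nr -k2,2n
}
```

Piping the report through it, for example `psql -X -A -t -F '|' -f sql/timescale/job-errors.sql "service=prod" | summarize_job_errors`, lists the most frequently repeated error first. Because the query stops at `LIMIT 50`, the counts only cover the 50 most recent errors.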

ownership.sql

Lines changed: 44 additions & 0 deletions
```sql
/*
 * Script: sql/admin/ownership.sql
 *
 * Purpose: Displays owners of tables, views, and other user-defined relations.
 *
 * Description:
 * This query lists all user-defined tables, partitioned tables, views,
 * materialized views, and foreign tables, along with their owners. It's a key
 * diagnostic for "permission denied" errors or when auditing for orphaned
 * objects after role changes.
 *
 * Red Flags:
 * - `owner` shown as `unknown (OID=...)`: The owning role no longer exists.
 * - Critical application tables owned by a superuser or an unexpected role.
 * - Inconsistent ownership patterns across schemas or applications.
 *
 * Interpretation:
 * - `schema_name`: The schema containing the object.
 * - `object_name`: The name of the table, view, etc.
 * - `owner`: The role that owns the object.
 * - `object_type`: TABLE, VIEW, MATERIALIZED VIEW, FOREIGN TABLE, etc.
 *
 * Safety:
 * This script is read-only. It only reads the system catalogs `pg_class` and
 * `pg_namespace`. The `statement_timeout` set by `pgtools.sh` provides an
 * additional safeguard.
 */
SELECT
    n.nspname AS schema_name,
    c.relname AS object_name,
    pg_catalog.pg_get_userbyid(c.relowner) AS owner,
    CASE c.relkind
        WHEN 'r' THEN 'TABLE'
        WHEN 'v' THEN 'VIEW'
        WHEN 'm' THEN 'MATERIALIZED VIEW'
        WHEN 'f' THEN 'FOREIGN TABLE'
        WHEN 'p' THEN 'PARTITIONED TABLE'
    END AS object_type
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'v', 'm', 'f', 'p')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname !~ '^pg_toast'
ORDER BY n.nspname, c.relname;
```
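The "unexpected role" red flag is easier to check against an allow-list than by eye. This is a minimal sketch, not part of `pgtools`: `flag_unexpected_owners` is a hypothetical helper, and it assumes rows were exported with `psql -X -A -t -F '|'` in the query's column order (`schema_name|object_name|owner|object_type`). PostgreSQL's `pg_get_userbyid()` renders a missing role as `unknown (OID=...)`, so such rows are never on the allow-list and always get flagged.

```sh
#!/bin/sh
# Hypothetical helper (not part of pgtools): flag relations with unexpected owners.
# Assumes input from: psql -X -A -t -F '|' -f sql/admin/ownership.sql <conn>
# Column order: schema_name|object_name|owner|object_type
# $1 = comma-separated list of roles that are expected to own objects.
flag_unexpected_owners() {
  awk -F'|' -v allowed="$1" '
    BEGIN {
      n = split(allowed, roles, ",")
      for (i = 1; i <= n; i++) ok[roles[i]] = 1
    }
    # Anything not allow-listed is reported, including "unknown (OID=...)" owners.
    !($3 in ok) { print $1 "." $2 " (" $4 ") owned by " $3 }
  '
}
```

For example, `psql -X -A -t -F '|' -f sql/admin/ownership.sql "service=prod" | flag_unexpected_owners app_owner,migrator` prints only relations owned by some other role; `app_owner` and `migrator` are placeholder role names.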

pgtools.sh

Lines changed: 4 additions & 1 deletion
```diff
@@ -67,11 +67,14 @@ Commands:
 bloat Identify table and index bloat. (sql/diagnostics/bloat.sql)
 replication Monitor replication lag and status. (sql/diagnostics/replication.sql)
 disk-usage Show disk usage by table and index. (sql/diagnostics/disk-usage.sql)
+cache-hit Show table and index cache hit rates. (sql/diagnostics/cache-hit.sql)
 
 # TimescaleDB Diagnostics
 chunk-stats Show chunk count and size per hypertable. (sql/timescale/chunk-stats.sql)
 compression-stats Show compression ratio and job status per hypertable. (sql/timescale/compression-stats.sql)
 cagg-stats Show continuous aggregate health and refresh policy status. (sql/timescale/cagg-stats.sql)
+job-errors Show recent errors from background jobs. (sql/timescale/job-errors.sql)
+uncompressed-chunks Show chunks that are old but not compressed. (sql/timescale/uncompressed-chunks.sql)
 
 # Administration
 permissions Audit user and role permissions. (sql/admin/permissions.sql)
@@ -123,7 +126,7 @@ main() {
 local connection_string="$2"
 
 case "${command}" in
-locks|activity|top-queries|bloat|replication|disk-usage|chunk-stats|compression-stats|cagg-stats|permissions|ownership)
+locks|activity|top-queries|bloat|replication|disk-usage|cache-hit|chunk-stats|compression-stats|cagg-stats|job-errors|uncompressed-chunks|permissions|ownership)
 run_sql "${command}" "${connection_string}"
 ;;
 *)
```
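Each new command is registered twice in `pgtools.sh`: once in the help text and once in the `case` allow-list in `main()`. The body of `run_sql` is not part of this diff, so the following is only a hypothetical sketch of how a command name could be resolved to its script, using the directory layout printed in the help text. `sql_path_for` is an invented name; `locks`, `activity`, and `top-queries` are omitted because their paths are not shown here.

```sh
#!/bin/sh
# Hypothetical sketch (not the actual run_sql): map a pgtools command to its
# SQL file, mirroring the paths printed in the help text. Every command name
# matches its file basename, so only the directory needs to be chosen.
sql_path_for() {
  case "$1" in
    bloat|replication|disk-usage|cache-hit)
      echo "sql/diagnostics/$1.sql" ;;
    chunk-stats|compression-stats|cagg-stats|job-errors|uncompressed-chunks)
      echo "sql/timescale/$1.sql" ;;
    permissions|ownership)
      echo "sql/admin/$1.sql" ;;
    *)
      # Reject anything not explicitly allow-listed.
      echo "unknown command: $1" >&2
      return 1 ;;
  esac
}
```

Keeping the allow-list explicit, rather than building a path from whatever argument was passed, means a typo or a crafted argument such as `../../etc/passwd` is rejected instead of reaching `psql`.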

uncompressed-chunks.sql

Whitespace-only changes.
