1 | | -# pgtools |
2 | | - |
3 | | -A collection of SQL scripts and utilities for monitoring, troubleshooting, and maintaining PostgreSQL databases. |
4 | | - |
5 | | -## 👋 New to pgtools? |
6 | | - |
7 | | -**[👉 Get Started Here - Complete Beginner's Guide](GETTING-STARTED.md)** |
8 | | - |
9 | | -Perfect for new users! This comprehensive guide walks you through installation, first steps, essential workflows, and automation setup. |
10 | | - |
11 | | -## 📋 Table of Contents |
12 | | - |
13 | | -- [Overview](#overview) |
14 | | -- [Script Categories](#script-categories) |
15 | | -- [Usage Examples](#usage-examples) |
16 | | -- [Contributing](#contributing) |
17 | | -- [License](#license) |
18 | | - |
19 | | -### Quick Links |
20 | | - |
21 | | -This toolkit provides battle-tested SQL scripts for PostgreSQL database administrators and developers to: |
22 | | -- Monitor database health and performance |
23 | | -- Troubleshoot common issues |
24 | | -- Maintain database integrity |
25 | | -- Optimize query performance |
26 | | -- Manage replication and WAL files |
27 | | -## Script Categories |
28 | | -### 🔍 Monitoring Scripts |
29 | | -**bloating.sql** |
30 | | -- Detects table and index bloat |
31 | | -- Shows dead tuples and wasted space |
32 | | -- Helps identify tables needing VACUUM |
33 | | - |
34 | | -**buffer_troubleshoot.sql** |
35 | | -- Analyzes shared buffer usage |
36 | | -- Shows buffer cache hit ratios |
37 | | -- Identifies tables with poor caching |
38 | | - |
39 | | -**locks.sql** |
40 | | -- Lists current locks in the database |
41 | | -- Shows lock types and waiting queries |
42 | | -- Essential for deadlock investigation |
43 | | - |
44 | | -**postgres_locking_blocking.sql** |
45 | | -- Advanced lock analysis |
46 | | -- Shows blocking and blocked queries |
47 | | -- Includes query details and wait times |
48 | | - |
49 | | -**replication.sql** |
50 | | -- Monitors replication lag |
51 | | -- Shows replication slot status |
52 | | -- Checks standby server health |
53 | | - |
54 | | -**txid.sql** |
55 | | -- Displays transaction ID information |
56 | | -- Monitors transaction wraparound risk |
57 | | -- Shows age of databases and tables |
58 | | - |
59 | | -**connection_pools.sql** |
60 | | -- Monitors connection pooling health and efficiency |
61 | | -- Analyzes connection patterns and potential leaks |
62 | | -- Provides connection pool optimization recommendations |
63 | | -- Works with PgBouncer, Pgpool-II, and native connections |
64 | | - |
65 | | -### 🔧 Maintenance Scripts |
66 | | -**switch_pg_wal_file.sql** |
67 | | -- Forces WAL file switching |
68 | | -- Useful for archiving and backup operations |
69 | | -- Requires superuser privileges |
70 | | - |
71 | | -**walfile_in_use.sql** |
72 | | -- Shows currently active WAL files |
73 | | -- Displays WAL file location and size |
74 | | -- Helps troubleshoot disk space issues |
75 | | - |
76 | | -**Transaction Wraparound** |
77 | | -- Scripts for monitoring and preventing transaction ID wraparound |
78 | | -- Critical for database availability |
79 | | - |
80 | | -### 🤖 Maintenance Automation |
81 | | -**auto_maintenance.sh** |
82 | | -- Comprehensive automated maintenance operations (VACUUM, ANALYZE, REINDEX) |
83 | | -- Intelligent threshold-based maintenance with configurable parameters |
84 | | -- Parallel processing with safety controls and dry-run mode |
85 | | -- Large table detection and resource management |
86 | | - |
87 | | -**maintenance_scheduler.sql** |
88 | | -- Analysis and scheduling recommendations for maintenance operations |
89 | | -- VACUUM/ANALYZE candidate identification with workload estimation |
90 | | -- Index bloat analysis and autovacuum effectiveness assessment |
91 | | -- Maintenance planning and resource optimization |
92 | | - |
93 | | -**statistics_collector.sql** |
94 | | -- Table and index statistics analysis and optimization |
95 | | -- Statistics quality assessment and freshness analysis |
96 | | -- Column distribution analysis with optimization recommendations |
97 | | -- Extended statistics support for PostgreSQL 15+ |
98 | | - |
99 | | -### 👤 Administration Scripts |
100 | | -**extensions.sql** |
101 | | -- Lists installed PostgreSQL extensions |
102 | | -- Shows extension versions and schemas |
103 | | -- Helps audit database capabilities |
104 | | - |
105 | | -**table_ownership.sql** |
106 | | -- Shows table ownership information |
107 | | -- Useful for permission audits |
108 | | -- Helps with database migrations |
109 | | - |
110 | | -**ForeignConst.sql** |
111 | | -- Lists foreign key constraints |
112 | | -- Shows constraint details and relationships |
113 | | -- Aids in schema documentation |
114 | | - |
115 | | -**NonHypertables.sql** |
116 | | -- Identifies non-hypertables (TimescaleDB specific) |
117 | | -- Useful for TimescaleDB users |
118 | | -- Helps in migration planning |
119 | | - |
120 | | -**partition_management.sql** |
121 | | -- Monitors partition health and performance |
122 | | -- Analyzes partition size distribution and balance |
123 | | -- Provides partition maintenance recommendations |
124 | | -- Supports automated partition management strategies |
125 | | - |
126 | | -### ⚡ Optimization Scripts |
127 | | -**hot_update_optimization_checklist.sql** |
128 | | -- Checks HOT (Heap-Only Tuple) update optimization |
129 | | -- Identifies inefficient table structures |
130 | | -- Suggests fillfactor adjustments |
131 | | - |
132 | | -**missing_indexes.sql** |
133 | | -- Identifies potentially beneficial indexes based on query patterns |
134 | | -- Analyzes sequential scan activity and unused indexes |
135 | | -- Detects foreign key columns missing indexes |
136 | | -- Provides index optimization recommendations |
137 | | - |
138 | | -### 📦 Backup & Recovery Scripts |
139 | | -**backup_validation.sql** |
140 | | -- Validates backup completeness and integrity |
141 | | -- Checks WAL archiving status and health |
142 | | -- Analyzes backup readiness and configuration |
143 | | -- Provides backup strategy recommendations |
144 | | - |
145 | | -### 🔒 Security Scripts |
146 | | -**permission_audit.sql** |
147 | | -- Comprehensive security audit of roles and permissions |
148 | | -- Identifies overprivileged accounts and security risks |
149 | | -- Analyzes database, schema, and table-level access |
150 | | -- Reviews authentication and Row Level Security (RLS) |
151 | | - |
152 | | -### ⚡ Performance Analysis Scripts |
153 | | -**wait_event_analysis.sql** |
154 | | -- Comprehensive analysis of PostgreSQL wait events and performance bottlenecks |
155 | | -- Identifies I/O, locking, and resource contention issues |
156 | | -- Provides detailed wait event categorization and recommendations |
157 | | -- Analyzes connection pooling and background worker efficiency |
158 | | - |
159 | | -**query_performance_profiler.sql** |
160 | | -- Detailed query performance analysis using pg_stat_statements |
161 | | -- Identifies slow queries, I/O intensive operations, and resource usage |
162 | | -- Analyzes query variance and performance degradation patterns |
163 | | -- Provides optimization recommendations for query tuning |
164 | | - |
165 | | -**resource_monitoring.sql** |
166 | | -- Comprehensive system resource utilization monitoring |
167 | | -- Analyzes memory, I/O, connection, and storage usage patterns |
168 | | -- Monitors autovacuum activity and maintenance requirements |
169 | | -- Provides resource optimization recommendations |
170 | | - |
171 | | -### ⚙️ Configuration Management Scripts |
172 | | -**configuration_analysis.sql** |
173 | | -- Comprehensive PostgreSQL configuration analysis and recommendations |
174 | | -- Reviews memory, connection, WAL, and security settings |
175 | | -- Analyzes current parameters against best practices |
176 | | -- Provides workload-specific tuning suggestions |
177 | | - |
178 | | -**parameter_tuner.sh** (automation/configuration/) |
179 | | -- Interactive PostgreSQL parameter tuning assistant |
180 | | -- Generates optimized configurations for different workload types (OLTP, OLAP, Web) |
181 | | -- Provides memory and performance setting recommendations |
182 | | -- Supports configuration validation and analysis modes |
183 | | - |
184 | | -### 🔗 Integration Tools |
185 | | -**grafana_dashboard_generator.sh** (integration/) |
186 | | -- Generates comprehensive Grafana dashboards for PostgreSQL monitoring |
187 | | -- Supports multiple dashboard types: comprehensive, performance, security, connections |
188 | | -- Provides direct Grafana API integration for automatic dashboard deployment |
189 | | -- Creates customizable monitoring visualizations |
190 | | - |
191 | | -**prometheus_exporter.sh** (integration/) |
192 | | -- Custom PostgreSQL metrics exporter for Prometheus |
193 | | -- Exports database statistics, connection metrics, and performance data |
194 | | -- Supports daemon mode for continuous metrics collection |
195 | | -- Provides HTTP endpoint for Prometheus scraping |
196 | | - |
197 | | -### 🩺 Troubleshooting Scripts |
198 | | -**postgres_troubleshooting_queries.sql** |
199 | | -- Collection of diagnostic queries |
200 | | -- Quick health checks |
201 | | -- Performance analysis queries |
202 | | - |
203 | | -**postgres_troubleshooting_query_pack_01.sql** |
204 | | -- First pack of troubleshooting queries |
205 | | -- Focuses on basic diagnostics |
206 | | - |
207 | | -**postgres_troubleshooting_query_pack_02.sql** |
208 | | -- Second pack of troubleshooting queries |
209 | | -- Intermediate level diagnostics |
210 | | - |
211 | | -**postgres_troubleshooting_query_pack_03.sql** |
212 | | -- Third pack of troubleshooting queries |
213 | | -- Advanced diagnostics |
214 | | - |
215 | | -**postgres_troubleshooting_cheat_sheet.txt** |
216 | | -- Quick reference guide |
217 | | -- Common commands and queries |
218 | | -- Best practices and tips |
219 | | - |
220 | | -## Usage Examples |
221 | | -### Check for blocking queries |
222 | | -```bash |
223 | | -psql -U postgres -d mydb -f monitoring/postgres_locking_blocking.sql |
224 | | -``` |
225 | | -### Monitor replication lag |
226 | | -```bash |
227 | | -psql -U postgres -d mydb -f monitoring/replication.sql |
228 | | -``` |
229 | | -### Identify bloated tables |
| 1 | +# pgtools: The First Responder's Toolbelt for PostgreSQL |
| 2 | + |
| 3 | +`pgtools` is a curated collection of safe, read-only diagnostic scripts for PostgreSQL and TimescaleDB, wrapped in a simple command-line interface. It is designed for Support Engineers, DBAs, and developers who need to triage production database issues quickly and without causing harm. |
| 4 | + |
| 5 | +Every script runs under strict, short timeouts to limit the impact of diagnostic queries on a heavily loaded system. |
| 6 | + |
| 7 | +## Core Principles |
| 8 | + |
| 9 | +- **Zero-Harm Policy**: Every script is read-only and executed with a 5-second `statement_timeout` and 3-second `lock_timeout`. |
| 10 | +- **No Dependencies**: The toolbelt relies only on `bash` and `psql`. No Python, Go, or other complex dependencies are required. |
| 11 | +- **Ticket-Ready Output**: All output is formatted for easy copy-pasting into Zendesk, Jira, or Markdown documents. |
| 12 | +- **Community-Driven**: Built for general PostgreSQL users, with specialized diagnostics for TimescaleDB. |
| 13 | + |
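The README does not show how the wrapper applies these limits. As a minimal sketch, assuming the limits are set per session through libpq's `PGOPTIONS` environment variable, a read-only run of one SQL file could look like this. The function names `build_pgoptions` and `run_readonly` are hypothetical, and `default_transaction_read_only` is an extra safeguard not mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a zero-harm runner; NOT the actual pgtools.sh code.
set -euo pipefail

# Print the PGOPTIONS value that applies the session limits.
# Defaults mirror the 5 s / 3 s limits described above.
build_pgoptions() {
  local stmt_timeout="${1:-5s}" lock_timeout="${2:-3s}"
  # default_transaction_read_only is an extra safeguard assumed for this sketch.
  printf '%s' "-c statement_timeout=${stmt_timeout} -c lock_timeout=${lock_timeout} -c default_transaction_read_only=on"
}

# run_readonly <connection_string> <sql_file>
# Runs one SQL file without reading ~/.psqlrc, stopping at the first error.
run_readonly() {
  local conn="$1" sql_file="$2"
  PGOPTIONS="$(build_pgoptions)" psql --no-psqlrc --set=ON_ERROR_STOP=1 \
    --dbname="$conn" --file="$sql_file"
}
```

For example, `run_readonly "service=my_customer_db" some_check.sql` (the file name is a placeholder). Note that `default_transaction_read_only=on` only changes the default, so a transaction can still opt back into writes; it is defense in depth on top of read-only scripts, not a hard guarantee.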
| 14 | +## Getting Started |
| 15 | + |
| 16 | +### Installation |
| 17 | + |
230 | 18 | ```bash |
231 | | -psql -U postgres -d mydb -f monitoring/bloating.sql |
| 19 | +# 1. Clone the repository |
| 20 | +git clone https://github.com/thepostgresguy/pgtools.git |
| 21 | +cd pgtools |
| 22 | + |
| 23 | +# 2. Make the wrapper script executable |
| 24 | +chmod +x pgtools.sh |
232 | 25 | ``` |
233 | | -### Check transaction wraparound risk |
| 26 | + |
| 27 | +### Usage |
| 28 | + |
| 29 | +All commands are run through the `pgtools.sh` wrapper. |
| 30 | + |
234 | 31 | ```bash |
235 | | -psql -U postgres -d mydb -f monitoring/txid.sql |
| 32 | +./pgtools.sh <command> "<connection_string>" |
236 | 33 | ``` |
237 | | -### Validate backup readiness |
| 34 | + |
| 35 | +**Example: Check for blocking locks** |
238 | 36 | ```bash |
239 | | -psql -U postgres -d mydb -f backup/backup_validation.sql |
| 37 | +./pgtools.sh locks "postgresql://user:pass@host:port/dbname" |
240 | 38 | ``` |
241 | | -### Analyze connection pooling efficiency |
| 39 | + |
| 40 | +**Example: Check TimescaleDB chunk stats using a service name** |
242 | 41 | ```bash |
243 | | -psql -U postgres -d mydb -f monitoring/connection_pools.sql |
| 42 | +./pgtools.sh chunk-stats "service=my_customer_db" |
244 | 43 | ``` |
245 | 44 |
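A connection string such as `service=my_customer_db` is resolved by libpq from a connection service file: `~/.pg_service.conf` by default, or the file named by the `PGSERVICEFILE` environment variable. The sketch below writes a throwaway service file; every host, database, and user value in it is a placeholder assumption.

```bash
#!/usr/bin/env bash
# Sketch: define the "my_customer_db" service used in the example above.
# Every connection value is a placeholder; replace it with your own.
set -euo pipefail

service_file="$(mktemp)"
cat > "$service_file" <<'EOF'
[my_customer_db]
host=db.example.com
port=5432
dbname=customer
user=support_readonly
sslmode=require
EOF

# libpq reads this file instead of ~/.pg_service.conf while PGSERVICEFILE is set.
export PGSERVICEFILE="$service_file"
```

Source the snippet (or export `PGSERVICEFILE` yourself) in the same shell before running `./pgtools.sh chunk-stats "service=my_customer_db"`. For a permanent setup, put the same section in `~/.pg_service.conf`, and keep passwords in `~/.pgpass` rather than in the service file.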
|
246 | | -### Automation / HOT report verification |
247 | | -```bash |
248 | | -# Quick automation sanity check (connection, syntax, permissions) |
249 | | -./automation/test_pgtools.sh --fast |
| 45 | +## Available Commands |
250 | 46 |
|
251 | | -# Full automation suite with integration tests |
252 | | -./automation/test_pgtools.sh --full --verbose |
| 47 | +Run `./pgtools.sh` with no arguments to see the full list of commands. |
253 | 48 |
|
254 | | -# HOT checklist JSON validation |
255 | | -./automation/run_hot_update_report.sh --format json --database my_database --stdout |
| 49 | +### General Diagnostics |
256 | 50 |
|
257 | | -# HOT checklist text validation |
258 | | -./automation/run_hot_update_report.sh --format text --database my_database --stdout |
| 51 | +* `locks`: Show current lock contention and blocking queries. |
| 52 | +* `activity`: Display current query activity from `pg_stat_activity`. |
| 53 | +* `top-queries`: Show the most time-consuming queries (requires `pg_stat_statements`). |
| 54 | +* `bloat`: Identify table and index bloat. |
| 55 | +* `replication`: Monitor replication lag and status. |
| 56 | +* `disk-usage`: Show disk usage by table and index. |
| 57 | +* `cache-hit`: Show table and index cache hit rates. |
259 | 58 |
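The query behind `cache-hit` is not shown here. A common way to compute a database-wide hit rate is `blks_hit / (blks_hit + blks_read)` from `pg_stat_database`; per-table and per-index figures come from `pg_statio_user_tables` (`heap_blks_hit`, `heap_blks_read`) and `pg_statio_user_indexes` (`idx_blks_hit`, `idx_blks_read`). The functions below (hypothetical names) sketch the database-wide version with `psql` and `awk`.

```bash
#!/usr/bin/env bash
# Sketch: database-wide cache hit ratio; NOT the pgtools `cache-hit` query.
set -uo pipefail

# hit_ratio <blks_hit> <blks_read>
# Prints the percentage of block requests served from shared buffers.
hit_ratio() {
  awk -v h="$1" -v r="$2" 'BEGIN {
    total = h + r
    if (total == 0) { print "n/a"; exit }
    printf "%.2f\n", 100 * h / total
  }'
}

# cache_hit <connection_string>
# Reads the cumulative counters for the connected database and prints the ratio.
cache_hit() {
  local conn="$1" row
  row="$(psql --no-psqlrc --tuples-only --no-align --field-separator=' ' \
    --dbname="$conn" \
    --command="SELECT blks_hit, blks_read FROM pg_stat_database WHERE datname = current_database()")" \
    || return 1
  # Intentional word splitting: "$row" holds two space-separated numbers.
  # shellcheck disable=SC2086
  hit_ratio $row
}
```

The counters are cumulative since the last statistics reset, and a block counted in `blks_read` may still have been served from the operating system's page cache, so treat the result as a rough long-run indicator rather than a live reading of disk I/O.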
|
260 | | -# Full local pre-commit bundle |
261 | | -./scripts/precommit_checks.sh --database my_database |
262 | | -``` |
| 59 | +### TimescaleDB Diagnostics |
| 60 | + |
| 61 | +* `chunk-stats`: Show chunk count and size per hypertable. |
| 62 | +* `compression-stats`: Show compression ratio and job status per hypertable. |
| 63 | +* `cagg-stats`: Show continuous aggregate health and refresh policy status. |
| 64 | +* `job-errors`: Show recent errors from background jobs. |
| 65 | +* `uncompressed-chunks`: Show chunks that are old but not compressed. |
| 66 | + |
| 67 | +### Administration |
263 | 68 |
|
264 | | -## Script Categories |
| 69 | +* `permissions`: Audit user and role permissions. |
| 70 | +* `ownership`: Display table and object ownership. |
265 | 71 |
|
266 | | -- **Monitoring** - Database health, locks, replication, bloating |
267 | | -- **Maintenance** - VACUUM, ANALYZE, statistics collection |
268 | | -- **Automation** - Health checks, scheduling, alerting |
269 | | -- **Administration** - Extensions, ownership, constraints, partitions |
270 | | -- **Optimization** - Index recommendations, HOT updates, missing indexes |
271 | | -- **Performance** - Query profiling, wait events, resource monitoring |
272 | | -- **Security** - Permission audits, compliance checks |
273 | | -- **Troubleshooting** - Diagnostic queries and cheat sheets |
274 | | -- **Backup & Recovery** - Backup validation and integrity checks |
275 | | -- **Configuration** - Parameter tuning and analysis |
276 | | -- **Integration** - Grafana dashboards, Prometheus exporters |
| 72 | +## Incident Response Workflow Example |
| 73 | + |
| 74 | +A customer reports "the database is slow." Here's a typical triage flow using `pgtools`: |
| 75 | + |
| 76 | +1. **Check for blocking locks.** This is one of the most common causes of a sudden slowdown. |
| 77 | + ```bash |
| 78 | + ./pgtools.sh locks "<conn_string>" |
| 79 | + ``` |
| 80 | + |
| 81 | +2. **Check current activity.** See what queries are actively running or waiting. |
| 82 | + ```bash |
| 83 | + ./pgtools.sh activity "<conn_string>" |
| 84 | + ``` |
| 85 | + |
| 86 | +3. **Check top queries.** If `pg_stat_statements` is enabled, find the queries that have consumed the most database time since its statistics were last reset. |
| 87 | + ```bash |
| 88 | + ./pgtools.sh top-queries "<conn_string>" |
| 89 | + ``` |
| 90 | + |
| 91 | +4. **Check cache hit rate.** A low hit rate can point to an I/O bottleneck. |
| 92 | + ```bash |
| 93 | + ./pgtools.sh cache-hit "<conn_string>" |
| 94 | + ``` |
277 | 95 |
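When a ticket needs all four results together, the steps can be chained. Below is a minimal sketch that runs them in order and appends everything to one file; the `PGTOOLS` variable, the `triage` function, and the output file name are illustrative assumptions, not features of pgtools.

```bash
#!/usr/bin/env bash
# Sketch: run the four triage steps and collect their output in one file.
# PGTOOLS and triage() are illustrative assumptions, not part of pgtools.
set -uo pipefail

PGTOOLS="${PGTOOLS:-./pgtools.sh}"

# triage <connection_string> <output_file>
triage() {
  local conn="$1" out="$2" cmd
  : > "$out"  # start with an empty file
  for cmd in locks activity top-queries cache-hit; do
    {
      printf '### %s\n\n' "$cmd"
      # Keep going if one step fails (e.g. pg_stat_statements is not enabled).
      "$PGTOOLS" "$cmd" "$conn" 2>&1 \
        || printf '(%s failed with exit code %d)\n' "$cmd" "$?"
      printf '\n'
    } >> "$out"
  done
}
```

Run it from the repository root, for example `triage "service=my_customer_db" "triage-$(date +%Y%m%d-%H%M%S).md"`, then paste the file into the ticket. A failing step is recorded in the file instead of aborting the remaining checks.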
|
278 | 96 | ## Contributing |
279 | 97 |
|
280 | | -Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines. |
| 98 | +Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on how to add new diagnostic scripts. |
281 | 99 |
|
282 | 100 | ## License |
283 | 101 |
|
284 | | -See [LICENSE](LICENSE) file for details. |
285 | | - |
286 | | -## Support |
| 102 | +This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. |
287 | 103 |
|
288 | | -For issues, questions, or contributions, please open an issue in the repository. |