Commit 455f098

mfvanek and claude authored
Add detailed information for each check (#860)
* Complete russian doc for unused_indexes check
  Add reproduction script and "how to fix" section for the unused_indexes check (russian version only).
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add english doc for unused_indexes check
  Translate the russian unused_indexes doc to english.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Complete russian doc for tables_without_primary_key check
  Add reproduction script and "how to fix" section for the tables_without_primary_key check (russian version only).
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add "How to fix" section for intersected_indexes check
  Fill in the Russian "Как исправить" section and add the English doc/eng/intersected_indexes.md translation.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add english doc for invalid_indexes check
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add reproduction script and docs for not_valid_constraints check
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add How to fix section and english doc for objects_not_following_naming_convention check
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Complete docs for possible_object_name_overflow check
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Complete docs for primary_keys_that_most_likely_natural_keys check
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add english translations for remaining check docs
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent d72e3e7 commit 455f098

31 files changed

Lines changed: 2283 additions & 3 deletions

doc/eng/bloated_tables.md

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
# Check for table bloat

## Why tables become bloated

Frequent UPDATE and DELETE operations can cause a noticeable increase in table size,
because old row versions [are not removed immediately](https://www.postgresql.org/docs/17/routine-vacuuming.html).
A regular (non-blocking) vacuum marks the space occupied by these obsolete versions as free, so it can later be reused for new rows,
but the physical space is returned to the operating system only if the freed pages are at the end of the table.

## Why you should keep an eye on table bloat

Although obsolete row versions are gradually processed by the autovacuum daemon, the table can remain larger than necessary and sparsely filled.
This degrades performance, because scanning the table has to read more pages.
Therefore, it is important to track sharp changes in table size if the data is updated frequently.
Unusually rapid growth of a table may also indicate that autovacuum is configured incorrectly and its settings need to be changed.
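
As a rough way to track these signals, the sketch below lists table sizes together with dead-tuple counters and the time of the last autovacuum run. It relies only on the standard `pg_stat_user_tables` view and the `pg_total_relation_size` function; the `demo` schema name is an assumption borrowed from the reproduction script in this document.

```sql
-- Tables in the (assumed) demo schema, largest first,
-- with dead-tuple counters and the last autovacuum time.
select
    st.schemaname,
    st.relname,
    pg_size_pretty(pg_total_relation_size(st.relid)) as total_size,
    st.n_live_tup,
    st.n_dead_tup,
    st.last_autovacuum
from pg_catalog.pg_stat_user_tables st
where st.schemaname = 'demo'
order by pg_total_relation_size(st.relid) desc;
```

A table whose size keeps growing while `n_dead_tup` stays high and `last_autovacuum` is old or empty is a natural candidate for a closer look.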

## SQL query

- [bloated_tables.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/bloated_tables.sql)

## Check type

- **runtime** (requires accumulated statistics)

## Support for partitioned tables

Partitioned tables are supported. The bloat percentage is calculated for each partition separately.

## How this check works

To run the query, the user needs read permissions on the tables being checked.

### Principle of operation

The SQL query reads the tables of the `pg_catalog` system schema, which hold statistical information about the main database objects:
tables, indexes, and columns.

First, the query gathers data about each table and checks whether statistics are available for it.

Then, based on this data, it determines the size of a single tuple and the total number of pages used by the table.
Next, it estimates the number of pages that the table should occupy and compares it with the actual number of pages.
Finally, it calculates the table's bloat in bytes (the difference in pages multiplied by the block size) and as a percentage.
If the bloat percentage exceeds the specified threshold (10% by default), the table is considered bloated.
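
The final arithmetic step can be illustrated with a minimal sketch. The page counts are made-up numbers rather than output of the real query, the column names are hypothetical, and taking the percentage relative to the actual page count is an assumption of this sketch; the actual `bloated_tables.sql` derives the expected page count from column statistics in a more involved way.

```sql
-- Hypothetical numbers: the table occupies 120 pages,
-- while an estimated 100 pages would be enough for its rows.
with stats as (
    select
        120::bigint as actual_pages,   -- assumed value, e.g. taken from pg_class.relpages
        100::bigint as expected_pages, -- assumed estimate
        current_setting('block_size')::bigint as block_size
)
select
    (actual_pages - expected_pages) * block_size as bloat_size_bytes,
    round(100.0 * (actual_pages - expected_pages) / actual_pages, 2) as bloat_percentage
from stats;
```

With the default 8 kB block size, the 20 extra pages amount to 163840 bytes, or 16.67% of the table, which is above the default 10% threshold.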

## Reproduction script

```sql
create schema if not exists demo;

-- For ordinary (non-partitioned) tables

create table if not exists demo.orders(
    id bigint primary key generated always as identity,
    user_id bigint not null,
    shop_id bigint not null,
    status int not null,
    created_at timestamptz not null default current_timestamp
);

create table if not exists demo.order_item(
    id bigint primary key generated always as identity,
    order_id bigint not null references demo.orders (id),
    price decimal(22, 2) not null default 0,
    amount int not null default 0,
    sku varchar(255) not null,
    warehouse_id int
);

create index if not exists idx_order_item_order_id
    on demo.order_item (order_id);

create index if not exists idx_order_item_warehouse_id_without_nulls
    on demo.order_item (warehouse_id) where warehouse_id is not null;

-- Populating with data

insert into demo.orders (user_id, shop_id, status)
select
    (ids.id % 10) + 1 as user_id,
    (ids.id % 4) + 1 as shop_id,
    1 as status
from generate_series(1, 10000) ids (id);

insert into demo.order_item (order_id, price, amount, sku)
select
    id as order_id,
    (random() + 1) * 1000.0 as price,
    (random() * 10) + 1 as amount,
    md5(random()::text) as sku
from demo.orders;

insert into demo.order_item (order_id, price, amount, sku)
select
    id as order_id,
    (random() + 1) * 2000.0 as price,
    (random() * 5) + 1 as amount,
    md5((random() + 1)::text) as sku
from demo.orders where id % 2 = 0;

-- collect statistics
vacuum analyze demo.orders, demo.order_item;

-- update the status of several orders
update demo.orders
set status = 2 -- paid order
where
    status = 1 -- new order
    and id in (
        select id from demo.orders where id % 4 = 0 order by id limit 10000);

update demo.order_item
set warehouse_id = case when order_id % 8 = 0 then 1 else 2 end
where
    warehouse_id is null
    and order_id in (
        select id from demo.orders
        where
            status = 2
            and created_at >= current_timestamp - interval '1 day');

-- collect statistics
vacuum analyze demo.orders, demo.order_item;

-- For partitioned tables

create table if not exists demo.orders_partitioned(
    id bigint not null generated always as identity,
    user_id bigint not null,
    shop_id bigint not null,
    status int not null,
    created_at timestamptz not null default current_timestamp,
    primary key (id, created_at)
) partition by range (created_at);

create table if not exists demo.orders_default
    partition of demo.orders_partitioned default;

create table if not exists demo.order_item_partitioned(
    id bigint generated always as identity,
    order_id bigint not null,
    created_at timestamptz not null,
    price decimal(22, 2) not null default 0,
    amount int not null default 0,
    sku varchar(255) not null,
    warehouse_id int,
    primary key (id, created_at),
    constraint fk_order_item_order_id foreign key (order_id, created_at)
        references demo.orders_partitioned (id, created_at)
) partition by range (created_at);

create index if not exists idx_order_item_partitioned_order_id
    on demo.order_item_partitioned (order_id);

create index if not exists idx_order_item_partitioned_warehouse_id_without_nulls
    on demo.order_item_partitioned (warehouse_id) where warehouse_id is not null;

create table if not exists demo.order_item_default
    partition of demo.order_item_partitioned default;

-- Populating with data

insert into demo.orders_partitioned (user_id, shop_id, status)
select (ids.id % 10) + 1 as user_id,
    (ids.id % 4) + 1 as shop_id,
    1 as status
from generate_series(1, 10000) ids (id);

insert into demo.order_item_partitioned (order_id, created_at, price, amount, sku)
select id as order_id, created_at,
    (random() + 1) * 1000.0 as price,
    (random() * 10) + 1 as amount,
    md5(random()::text) as sku
from demo.orders_partitioned;

insert into demo.order_item_partitioned (order_id, created_at, price, amount, sku)
select id as order_id, created_at,
    (random() + 1) * 2000.0 as price,
    (random() * 5) + 1 as amount,
    md5((random() + 1)::text) as sku
from demo.orders_partitioned
where id % 2 = 0;

-- collect statistics
vacuum analyze demo.orders_partitioned, demo.order_item_partitioned;

-- update the status of several orders
update demo.orders_partitioned
set status = 2 -- paid order
where status = 1 -- new order
    and id in (select id
               from demo.orders_partitioned
               where id % 4 = 0
               order by id
               limit 10000);

update demo.order_item_partitioned
set warehouse_id = case when order_id % 8 = 0 then 1 else 2 end
where warehouse_id is null
    and order_id in (select id
                     from demo.orders_partitioned
                     where status = 2
                         and created_at >= current_timestamp - interval '1 day');

-- collect statistics
vacuum analyze demo.orders_partitioned, demo.order_item_partitioned;
```

## How to fix

1. Run vacuum regularly. Make sure autovacuum is working and its parameters are configured properly.
   Vacuum frees space within pages so that it can be reused for new row versions.
2. If the table can be locked for a long time (from several minutes to several hours, depending on the table size),
   it is acceptable to rebuild it completely with the [vacuum full](https://www.postgresql.org/docs/18/sql-vacuum.html) command.
   This removes the bloat completely and returns the disk space to the operating system.
3. If a long lock and the associated downtime are not acceptable,
   consider using the [pg_repack](https://github.com/reorg/pg_repack) extension.
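
A minimal sketch of the first two options, using the `demo.orders` table from the reproduction script. The value `0.05` is only an illustrative assumption, not a recommendation from the original text; `autovacuum_vacuum_scale_factor` is a standard per-table storage parameter.

```sql
-- Option 1: non-blocking cleanup plus fresh statistics.
vacuum (verbose, analyze) demo.orders;

-- Make autovacuum trigger earlier for this table
-- (0.05 is an assumed, illustrative value).
alter table demo.orders set (autovacuum_vacuum_scale_factor = 0.05);

-- Option 2: full rebuild. Takes an ACCESS EXCLUSIVE lock,
-- so reads and writes are blocked until it finishes.
vacuum full demo.orders;
```

The third option, pg_repack, is driven by its own command-line client and is not shown here.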
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Check for b-tree indexes on columns containing an array of values

## How a b-tree index works on columns with an array of values

A b-tree index on such a column is efficient only when arrays are compared as a whole,
since it [works with equality conditions](https://www.postgresql.org/docs/17/gin.html).
It cannot be used to check whether an array contains particular elements.

## Why a GIN index is a better fit

Internally, a GIN index is a B-tree built over keys, where each key is an element of the indexed arrays; [see the documentation for details](https://www.postgresql.org/docs/17/gin.html).
Therefore, it is suitable for searching by individual elements in columns of an array type.

## SQL query

- [btree_indexes_on_array_columns.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/btree_indexes_on_array_columns.sql)

## Check type

- **static** (can be performed on an empty database in component/integration tests)

## Support for partitioned tables

Partitioned tables are supported.
The check is performed on the partitioned (parent) table itself. Individual partitions (children) are ignored.

## Reproduction script

```sql
create schema if not exists demo;

create table if not exists demo."table_with_b-tree_index_on_array"(
    id bigint not null,
    login text,
    roles text[]
);

create index if not exists roles_btree_idx
    on demo."table_with_b-tree_index_on_array"(roles) where roles is not null;

create index if not exists login_roles_btree_idx
    on demo."table_with_b-tree_index_on_array"(login, roles);

create table if not exists demo."table_with_b-tree_index_on_array_partitioned"(
    id bigint not null,
    login text,
    roles text[]
) partition by hash (login);

create index if not exists roles_btree_partitioned_idx
    on demo."table_with_b-tree_index_on_array_partitioned"(roles) where roles is not null;

create index if not exists login_roles_btree_partitioned_idx
    on demo."table_with_b-tree_index_on_array_partitioned"(login, roles);

create table if not exists demo."table_with_b-tree_index_on_array_hash_p0"
    partition of demo."table_with_b-tree_index_on_array_partitioned"
    for values with (modulus 4, remainder 0);
```

## How to fix

Consider replacing the b-tree index with a GIN index if queries search for individual array elements.
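
As a sketch of what that could look like for the first table from the reproduction script: the index name `roles_gin_idx` and the sample query are hypothetical, while `@>` is the standard array containment operator, which GIN supports for array columns.

```sql
-- Hypothetical replacement for roles_btree_idx.
create index if not exists roles_gin_idx
    on demo."table_with_b-tree_index_on_array" using gin (roles);

drop index if exists demo.roles_btree_idx;

-- A containment search that a GIN index can serve:
-- rows whose roles array includes 'admin'.
select id, login
from demo."table_with_b-tree_index_on_array"
where roles @> array['admin'];
```

For the composite `(login, roles)` index, one possible split is a b-tree index on `login` plus a GIN index on `roles`; whether that fits depends on the actual queries.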
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# Check for columns whose names do not follow the naming convention

The check finds columns whose names must be enclosed in double quotes in SQL queries.

## Why you should pay attention to this

- [Naming convention](https://www.postgresql.org/docs/17/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS)

You should avoid column names that require double quotes.
Such names are inconvenient to use and can lead to [non-obvious errors](https://lerner.co.il/2013/11/30/quoting-postgresql/).
See also [wiki.postgresql.org](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_upper_case_table_or_column_names).
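
One such non-obvious error can be sketched with a hypothetical table `demo.quoting_example`, which is not part of the original scripts. PostgreSQL folds unquoted identifiers to lower case, so a column created with a quoted mixed-case name can only be referenced with exactly the same quoting.

```sql
create schema if not exists demo;

-- Hypothetical table used only for this illustration.
create table if not exists demo.quoting_example(
    "userId" bigint not null
);

-- Works: the quoted name matches the stored name exactly.
select "userId" from demo.quoting_example;

-- Fails: unquoted userId is folded to userid,
-- and no column with that name exists.
select userId from demo.quoting_example;
```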

## SQL query

- [columns_not_following_naming_convention.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/columns_not_following_naming_convention.sql)

## Check type

- **static** (can be performed on an empty database in component/integration tests)

## Support for partitioned tables

Partitioned tables are supported.
The check is performed on the partitioned (parent) table itself. Individual partitions (children) are ignored.

## Reproduction script

```sql
create schema if not exists "bad-demo";

create table if not exists "bad-demo"."bad-table"(
    "bad-id" serial not null primary key
);

create table if not exists "bad-demo"."bad-table-two"(
    "bad-ref-id" int not null primary key,
    description text
);

create table if not exists "bad-demo"."one-partitioned"(
    "bad-id" bigserial not null primary key
) partition by range ("bad-id");

create table if not exists "bad-demo"."one-default"
    partition of "bad-demo"."one-partitioned" default;
```

## How to fix

Carefully rename the columns so that their names follow the naming convention.
If your database serves live traffic and downtime is not acceptable,
then instead of renaming a column in place, use the [approach of creating a new column](https://habr.com/ru/companies/karuna/articles/568240/)
and gradually switch over to it.
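
A minimal sketch of the direct rename for the second table from the reproduction script. The new name `ref_id` is a hypothetical choice; note that the schema and table names in that script also require quoting and would need the same treatment.

```sql
-- Rename the column to a name that needs no quoting
-- (ref_id is an assumed, illustrative name).
alter table "bad-demo"."bad-table-two"
    rename column "bad-ref-id" to ref_id;

-- The column can now be referenced without double quotes.
select ref_id, description
from "bad-demo"."bad-table-two";
```

Any query that still uses the old name starts failing as soon as the rename is committed, which is why a gradual switch to a new column is the safer route for systems that cannot tolerate downtime.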
