mfvanek
diff --git a/doc/eng/bloated_tables.md b/doc/eng/bloated_tables.md
Lines changed: 216 additions & 0 deletions
diff --git a/doc/eng/btree_indexes_on_array_columns.md b/doc/eng/btree_indexes_on_array_columns.md
Lines changed: 63 additions & 0 deletions
diff --git a/doc/eng/columns_not_following_naming_convention.md b/doc/eng/columns_not_following_naming_convention.md
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,216 @@
+# Check for table bloat
+
+## Why tables become bloated
+
+Frequent `UPDATE` and `DELETE` operations can noticeably increase table size,
+because old row versions [are not removed immediately](https://www.postgresql.org/docs/17/routine-vacuuming.html).
+A plain (non-blocking) `VACUUM` marks the space occupied by these obsolete versions as reusable for new rows,
+but returns it to the operating system only when the freed pages are at the end of the table.
+
+## Why you should keep an eye on table bloat
+
+Although the autovacuum daemon gradually processes obsolete row versions, the table can remain oversized and sparsely filled.
+This degrades performance, because scanning the table becomes slower.
+Therefore, it is important to watch for sharp changes in table size when the data is updated frequently.
+Unusually rapid growth of a table may also indicate that autovacuum is misconfigured and its settings need to be adjusted.
+
+## SQL query
+
+- [bloated_tables.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/bloated_tables.sql)
+
+## Check type
+
+- **runtime** (requires accumulated statistics)
+
+## Support for partitioned tables
+
+Partitioned tables are supported. The bloat percentage is calculated for each partition separately.
+
+## How this check works
+
+To run the query, the user needs read permissions on the tables being checked.
+
+### Principle of operation
+
+The SQL query reads the tables of the `pg_catalog` system schema, which hold statistical information about the main database objects:
+tables, indexes, and columns.
+
+First, the query gathers data about the tables and checks whether statistics are available for each of them.
+
+Then, based on this data, it determines the size of a single tuple and the total number of pages used by the table.
+Next, it estimates the number of pages that the table should use, and compares it with the actual number of pages.
+Finally, it calculates the table's bloat in bytes (the difference in pages multiplied by the block size) and as a percentage.
+If the percentage exceeds the specified threshold (10% by default), the table is considered bloated.
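+
+The estimation steps above can be illustrated with a simplified sketch. It is not the actual `bloated_tables.sql` query:
+the constants (roughly 28 bytes of per-tuple overhead and a 24-byte page header) are assumptions,
+and alignment padding, fillfactor, and TOAST storage are ignored, so the numbers are only a rough approximation.
+
+```sql
+-- Simplified illustration only; assumes the "demo" schema from the reproduction script.
+with table_stats as (
+    select
+        pc.oid as table_oid,
+        pc.reltuples::numeric as row_count,
+        pc.relpages::numeric as actual_pages,
+        current_setting('block_size')::numeric as block_size,
+        -- assumed overhead: ~24-byte tuple header plus 4-byte line pointer
+        28 + coalesce(sum((1 - ps.null_frac) * ps.avg_width), 0)::numeric as tuple_size
+    from
+        pg_catalog.pg_class pc
+        inner join pg_catalog.pg_namespace pn on pn.oid = pc.relnamespace
+        left join pg_catalog.pg_stats ps
+            on ps.schemaname = pn.nspname and ps.tablename = pc.relname and not ps.inherited
+    where
+        pc.relkind = 'r' and
+        pn.nspname = 'demo' and
+        pc.relpages > 0 and
+        pc.reltuples > 0 -- skip tables without collected statistics
+    group by pc.oid, pc.reltuples, pc.relpages
+),
+
+estimates as (
+    select
+        table_oid,
+        actual_pages,
+        block_size,
+        -- pages the table should need if every page were filled completely
+        ceil(row_count * tuple_size / (block_size - 24)) as estimated_pages
+    from table_stats
+)
+
+select
+    table_oid::regclass as table_name,
+    greatest(actual_pages - estimated_pages, 0) * block_size as bloat_size_bytes,
+    round(100 * greatest(actual_pages - estimated_pages, 0) / actual_pages, 2) as bloat_percentage
+from estimates
+order by bloat_percentage desc;
+```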
+
+## Reproduction script
+
+```sql
+create schema if not exists demo;
+
+-- For ordinary (non-partitioned) tables
+
+create table if not exists demo.orders(
+    id bigint primary key generated always as identity,
+    user_id bigint not null,
+    shop_id bigint not null,
+    status int not null,
+    created_at timestamptz not null default current_timestamp
+);
+
+create table if not exists demo.order_item(
+    id bigint primary key generated always as identity,
+    order_id bigint not null references demo.orders (id),
+    price decimal(22, 2) not null default 0,
+    amount int not null default 0,
+    sku varchar(255) not null,
+    warehouse_id int
+);
+
+create index if not exists idx_order_item_order_id
+    on demo.order_item (order_id);
+
+create index if not exists idx_order_item_warehouse_id_without_nulls
+    on demo.order_item (warehouse_id) where warehouse_id is not null;
+
+-- Populating with data
+
+insert into demo.orders (user_id, shop_id, status)
+select
+    (ids.id % 10) + 1 as user_id,
+    (ids.id % 4) + 1 as shop_id,
+    1 as status
+from generate_series(1, 10000) ids (id);
+
+insert into demo.order_item (order_id, price, amount, sku)
+select
+    id as order_id,
+    (random() + 1) * 1000.0 as price,
+    (random() * 10) + 1 as amount,
+    md5(random()::text) as sku
+from demo.orders;
+
+insert into demo.order_item (order_id, price, amount, sku)
+select
+    id as order_id,
+    (random() + 1) * 2000.0 as price,
+    (random() * 5) + 1 as amount,
+    md5((random() + 1)::text) as sku
+from demo.orders where id % 2 = 0;
+
+-- collect statistics
+vacuum analyze demo.orders, demo.order_item;
+
+-- update the status of several orders
+update demo.orders
+set status = 2 -- paid order
+where
+    status = 1 -- new order
+  and id in (
+    select id from demo.orders where id % 4 = 0 order by id limit 10000);
+
+update demo.order_item
+set warehouse_id = case when order_id % 8 = 0 then 1 else 2 end
+where
+    warehouse_id is null
+  and order_id in (
+    select id from demo.orders
+    where
+        status = 2
+      and created_at >= current_timestamp - interval '1 day');
+
+-- collect statistics
+vacuum analyze demo.orders, demo.order_item;
+
+-- For partitioned tables
+
+create table if not exists demo.orders_partitioned(
+    id         bigint not null generated always as identity,
+    user_id    bigint      not null,
+    shop_id    bigint      not null,
+    status     int         not null,
+    created_at timestamptz not null default current_timestamp,
+    primary key (id, created_at)
+) partition by range (created_at);
+
+create table if not exists demo.orders_default
+    partition of demo.orders_partitioned default;
+
+create table if not exists demo.order_item_partitioned(
+    id           bigint generated always as identity,
+    order_id     bigint         not null,
+    created_at   timestamptz    not null,
+    price        decimal(22, 2) not null default 0,
+    amount       int            not null default 0,
+    sku          varchar(255)   not null,
+    warehouse_id int,
+    primary key (id, created_at),
+    constraint fk_order_item_order_id foreign key (order_id, created_at)
+        references demo.orders_partitioned (id, created_at)
+) partition by range (created_at);
+
+create index if not exists idx_order_item_partitioned_order_id
+    on demo.order_item_partitioned (order_id);
+
+create index if not exists idx_order_item_partitioned_warehouse_id_without_nulls
+    on demo.order_item_partitioned (warehouse_id) where warehouse_id is not null;
+
+create table if not exists demo.order_item_default
+    partition of demo.order_item_partitioned default;
+
+-- Populating with data
+
+insert into demo.orders_partitioned (user_id, shop_id, status)
+select (ids.id % 10) + 1 as user_id,
+       (ids.id % 4) + 1  as shop_id,
+       1                 as status
+from generate_series(1, 10000) ids (id);
+
+insert into demo.order_item_partitioned (order_id, created_at, price, amount, sku)
+select id as order_id, created_at,
+       (random() + 1) * 1000.0 as price,
+       (random() * 10) + 1     as amount,
+       md5(random()::text)     as sku
+from demo.orders_partitioned;
+
+insert into demo.order_item_partitioned (order_id, created_at, price, amount, sku)
+select id as order_id, created_at,
+       (random() + 1) * 2000.0   as price,
+       (random() * 5) + 1        as amount,
+       md5((random() + 1)::text) as sku
+from demo.orders_partitioned
+where id % 2 = 0;
+
+-- collect statistics
+vacuum analyze demo.orders_partitioned, demo.order_item_partitioned;
+
+-- update the status of several orders
+update demo.orders_partitioned
+set status = 2 -- paid order
+where status = 1 -- new order
+  and id in (select id
+             from demo.orders_partitioned
+             where id % 4 = 0
+             order by id
+             limit 10000);
+
+update demo.order_item_partitioned
+set warehouse_id = case when order_id % 8 = 0 then 1 else 2 end
+where warehouse_id is null
+  and order_id in (select id
+                   from demo.orders_partitioned
+                   where status = 2
+                     and created_at >= current_timestamp - interval '1 day');
+
+-- collect statistics
+vacuum analyze demo.orders_partitioned, demo.order_item_partitioned;
+```
+
+## How to fix
+
+1. Regularly run cleanup (vacuum). Make sure autovacuum is working and its parameters are configured properly.
+   Cleanup allows space on pages to be freed efficiently and reused for new row versions.
+2. If a particular table in the database can be locked for a long time (from several minutes to several hours depending on the table size),
+   then it is acceptable to fully rebuild the table with the [vacuum full](https://www.postgresql.org/docs/18/sql-vacuum.html) command.
+   This completely removes the bloat and frees up disk space.
+3. If a long lock and the associated downtime are not acceptable,
+   then consider using the [pg_repack](https://github.com/reorg/pg_repack) extension.
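+
+The first two options above might look like the following sketch against the `demo.orders` table from the reproduction script.
+The autovacuum values are illustrative assumptions, not recommendations; choose them based on your own workload.
+
+```sql
+-- Make autovacuum trigger earlier for a frequently updated table (illustrative values)
+alter table demo.orders set (
+    autovacuum_vacuum_scale_factor = 0.05,
+    autovacuum_vacuum_threshold = 1000
+);
+
+-- Check when the tables were last vacuumed and how many dead tuples remain
+select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
+from pg_catalog.pg_stat_user_tables
+where schemaname = 'demo';
+
+-- Full rebuild: holds an exclusive lock on the table until it completes
+vacuum full demo.orders;
+```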
@@ -0,0 +1,63 @@
+# Check for b-tree indexes on columns containing an array of values
+
+## How a b-tree index works on columns with an array of values
+
+A b-tree index on such a column is efficient when you need to compare arrays as a whole,
+since it [works with equality conditions](https://www.postgresql.org/docs/17/gin.html).
+It cannot speed up checks for whether an array contains particular elements.
+
+## Why a GIN index is a better fit
+
+A GIN index is implemented as a B-tree built over keys, which here are the individual elements of the array; [see the documentation for details](https://www.postgresql.org/docs/17/gin.html).
+Therefore, it suits queries on array columns that search by individual elements, such as containment or overlap checks.
+
+## SQL query
+
+- [btree_indexes_on_array_columns.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/btree_indexes_on_array_columns.sql)
+
+## Check type
+
+- **static** (can be performed on an empty database in component/integration tests)
+
+## Support for partitioned tables
+
+Partitioned tables are supported.
+The check is performed on the partitioned (parent) table itself. Individual partitions (children) are ignored.
+
+## Reproduction script
+
+```sql
+create schema if not exists demo;
+
+create table if not exists demo."table_with_b-tree_index_on_array"(
+    id bigint not null,
+    login text,
+    roles text[]
+);
+
+create index if not exists roles_btree_idx
+    on demo."table_with_b-tree_index_on_array"(roles) where roles is not null;
+
+create index if not exists login_roles_btree_idx
+    on demo."table_with_b-tree_index_on_array"(login, roles);
+
+create table if not exists demo."table_with_b-tree_index_on_array_partitioned"(
+    id bigint not null,
+    login text,
+    roles text[]
+) partition by hash (login);
+
+create index if not exists roles_btree_partitioned_idx
+    on demo."table_with_b-tree_index_on_array_partitioned"(roles) where roles is not null;
+
+create index if not exists login_roles_btree_partitioned_idx
+    on demo."table_with_b-tree_index_on_array_partitioned"(login, roles);
+
+create table if not exists demo."table_with_b-tree_index_on_array_hash_p0"
+    partition of demo."table_with_b-tree_index_on_array_partitioned"
+    for values with (modulus 4, remainder 0);
+```
+
+## How to fix
+
+Consider using a GIN index.
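+
+As a minimal sketch, a GIN index on the `roles` column from the reproduction script could be created and used like this.
+The index name `roles_gin_idx` and the `'admin'` value are hypothetical.
+
+```sql
+-- GIN index over the individual elements of the array
+create index if not exists roles_gin_idx
+    on demo."table_with_b-tree_index_on_array" using gin (roles);
+
+-- Containment check that a GIN index can serve: rows whose roles include 'admin'
+select id, login
+from demo."table_with_b-tree_index_on_array"
+where roles @> array['admin'];
+```
+
+Whether the planner actually uses the index depends on the table size and statistics, so verify with `explain` on realistic data.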
@@ -0,0 +1,53 @@
+# Check for columns whose names do not follow the naming convention
+
+The check finds table columns whose names must be wrapped in double quotes in SQL queries.
+
+## Why you should pay attention to this
+
+- [Naming convention](https://www.postgresql.org/docs/17/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS)
+
+You should avoid column names that require wrapping in double quotes.
+This is inconvenient and can lead to [non-obvious errors](https://lerner.co.il/2013/11/30/quoting-postgresql/).  
+See also [wiki.postgresql.org](https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_upper_case_table_or_column_names).
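+
+A short illustration of the problem, using the `"bad-demo"."bad-table"` table from the reproduction script below:
+
+```sql
+-- Works: the name is quoted exactly as it was created
+select "bad-id" from "bad-demo"."bad-table";
+
+-- Fails: without quotes the hyphen is parsed as the minus operator (bad - id),
+-- and the table has no column named "bad"
+-- select bad-id from "bad-demo"."bad-table";
+```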
+
+## SQL query
+
+- [columns_not_following_naming_convention.sql](https://github.com/mfvanek/pg-index-health-sql/blob/master/sql/columns_not_following_naming_convention.sql)
+
+## Check type
+
+- **static** (can be performed on an empty database in component/integration tests)
+
+## Support for partitioned tables
+
+Partitioned tables are supported.
+The check is performed on the partitioned (parent) table itself. Individual partitions (children) are ignored.
+
+## Reproduction script
+
+```sql
+create schema if not exists "bad-demo";
+
+create table if not exists "bad-demo"."bad-table"(
+    "bad-id" serial not null primary key
+);
+
+create table if not exists "bad-demo"."bad-table-two"(
+    "bad-ref-id" int not null primary key,
+    description  text
+);
+
+create table if not exists "bad-demo"."one-partitioned"(
+    "bad-id" bigserial not null primary key
+) partition by range ("bad-id");
+
+create table if not exists "bad-demo"."one-default"
+    partition of "bad-demo"."one-partitioned" default;
+```
+
+## How to fix
+
+Carefully rename the columns and bring their names into line with the naming convention.  
+If the database serves live traffic and downtime is not acceptable,
+then instead of renaming a column use the [approach of creating a new column](https://habr.com/ru/companies/karuna/articles/568240/)
+and gradually switching over to using it.
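+
+Both options can be sketched against the tables from the reproduction script.
+The new names `id` and `ref_id` are hypothetical, and the second part is a generic outline under stated assumptions,
+not necessarily the exact procedure from the linked article.
+
+```sql
+-- Option 1: direct rename; all queries using the old name must be updated at the same time.
+-- Renaming a column on the partitioned table also renames it in its partitions.
+alter table "bad-demo"."one-partitioned"
+    rename column "bad-id" to id;
+
+-- Option 2: add a new column and switch over gradually
+alter table "bad-demo"."bad-table-two"
+    add column if not exists ref_id int;
+
+-- Backfill the new column (on large tables, do this in batches)
+update "bad-demo"."bad-table-two"
+set ref_id = "bad-ref-id"
+where ref_id is null;
+
+-- Afterwards: write to both columns, move readers to ref_id,
+-- then recreate constraints on the new column and drop the old one.
+```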