This repository was archived by the owner on Jun 14, 2026. It is now read-only.

Commit 340f1ac

Initial commit
0 parents  commit 340f1ac

419 files changed

Lines changed: 61289 additions & 0 deletions

.act.secrets.example

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
GITHUB_TOKEN=
2+
CLOUDFLARE_API_TOKEN=

.actrc

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
--container-architecture linux/amd64
2+
-P ubuntu-slim=ghcr.io/catthehacker/ubuntu:act-24.04
3+
-P ubuntu-24.04-arm=ghcr.io/catthehacker/ubuntu:act-24.04
4+
-P ubuntu-latest=ghcr.io/catthehacker/ubuntu:act-24.04
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
---
2+
name: code-review
3+
description: You **MUST** use this after the implementation changes are validated and ready for review.
4+
---
5+
6+
**CRITICAL:** **Must** be executed by the `code-reviewer` subagent.
7+
8+
You **must** follow the steps below:
9+
10+
1. Review the code very carefully and look for issues without nit-picking.
11+
2. Ensure the changes follow existing coding practices and directory structures.
12+
3. Ensure perfect unit and integration test coverage for all possible scenarios.
13+
4. Ensure no security risks are introduced.
14+
5. Ensure no performance bottlenecks are introduced.
15+
6. Ensure no unnecessary code complexity is introduced.

.agents/skills/design-doc/SKILL.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
name: design-doc
3+
description: You **MUST** use this before implementing any new feature or making significant changes to the codebase. Not needed for small refactors, bug fixes, or minor tweaks.
4+
---
5+
6+
Should be executed by the `design-lead` subagent.
7+
8+
You **must** follow the steps below:
9+
10+
1. Always start by reading the original product brief
11+
`docs/brief/2026_01_31_tokenoverflow.md` and @README.md
12+
2. If provided, read the PRD carefully to understand the requirements.
13+
3. Run `source scripts/src/includes.sh` and
14+
`create_doc design <feature_name>` to create the design document.
15+
4. Find your design document under `docs/design/`.
16+
5. Read the template and understand the structure.
17+
6. Ask clarifying questions if you are not sure about anything.
18+
7. Before starting the design, identify what research you need to do and use
19+
subagents to do deep research online.
20+
8. Fill the design document section by section.
21+
9. Always provide multiple alternatives with clearly defined trade-offs in a
22+
table format. Also give examples of what they would look like in practice.
23+
10. After each section, ask for approval before moving to the next one.
24+
11. Make sure all PRD requirements are satisfied.
25+
12. Once done, ask for a review and keep repeating until you get approval.
26+
27+
List of guidelines you **must** follow:
28+
29+
- Prevent scope creep by sticking to the original requirements.
30+
- Never try to re-invent the wheel. Research best practices using subagents.
31+
- When introducing new libraries, always check if they are well-maintained.
32+
- When iterating, do not mention the changes made to previous iterations.
33+
- Ensure every design follows industry best practices without taking any
34+
shortcuts, or reaching for hacks.
35+
- Ensure the design respects the current architecture and coding standards of
36+
the codebase.
37+
- Use the latest version of dependencies unless there is a strong reason not to.
38+
- Do not edit historical design documents.
39+
40+
**CRITICAL:** Your work is not complete until you fully fill the design document
41+
on disk and save your changes. Do not leave an empty template.
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
---
2+
name: implement-design
3+
description: You **must** use this when implementing the code for an approved design document.
4+
---
5+
6+
**Must** be executed by the `engineer` subagent.
7+
8+
You are required to follow these guidelines:
9+
10+
- Stick to the approved design document requirements. Do **not** deviate.
11+
- Every line of code is a liability and **must** be justified.
12+
- You **must** always use TDD. Have a failing test first before writing code!
13+
- You **must** use three-tier test architecture:
14+
- `unit/`: Pure business logic tests with zero external dependencies
15+
- `integration/`: In-process integration tests with external dependencies
16+
- `e2e/`: Black-box testing of the whole system based on user stories
17+
- Never mix tests with source code.
18+
- Test directories should mirror the structure of the source code directories.
19+
- You **must** never use shortcuts/hacks just to get your current task working.
20+
- Avoid writing single big files; prefer splitting into multiple.
21+
- Follow FCIS architectural pattern when implementing services:
22+
- Functional core: Unit testable business logic
23+
- Imperative shell: External dependencies like I/O, use integration tests
24+
- Always finish your work by using the `validate-changes` skill.
25+
- **NEVER** change the code coverage threshold!
26+
- Comment the why, not the what.
27+
- Do not introduce code duplication.
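The FCIS guideline in the list above can be sketched as follows. This is a minimal illustration, not code from this repository: `Order`, `total_cents`, and `handle_checkout` are hypothetical names, and the injected `load_order` / `save_total` callables stand in for whatever real I/O the service performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical domain value used only for this illustration."""
    subtotal_cents: int
    coupon_percent: int


def total_cents(order: Order) -> int:
    """Functional core: pure and deterministic, so it belongs under unit/."""
    discount = order.subtotal_cents * order.coupon_percent // 100
    return order.subtotal_cents - discount


def handle_checkout(load_order, save_total, order_id: str) -> int:
    """Imperative shell: performs the I/O and delegates every decision to the core."""
    order = load_order(order_id)      # external read (database, API, ...)
    total = total_cents(order)        # pure business logic
    save_total(order_id, total)       # external write
    return total
```

Under this split, `total_cents` would be covered by `unit/` tests, while `handle_checkout` would be exercised in `integration/` with real dependencies.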
28+
29+
Once you're done, commit your changes if asked to. Otherwise, you should run:
30+
31+
```shell
32+
git add --all
33+
prek run
34+
```
35+
36+
**CRITICAL:** Your work is not done until every single pre-commit hook passes!
37+
E2E tests require the full Docker environment to be running. DON'T skip them!

.agents/skills/postgres/SKILL.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
---
2+
name: postgres
3+
description: PostgreSQL best practices, query optimization, connection troubleshooting, and performance improvement. Load when working with Postgres databases.
4+
license: MIT
5+
metadata:
6+
author: planetscale
7+
version: "1.0.0"
8+
---
9+
10+
# Postgres
11+
12+
| Topic | Reference | Use for |
13+
|------------------------|--------------------------------------------------------------------------------|--------------------------------------------------------------|
14+
| Schema Design | [references/schema-design.md](./references/schema-design.md) | Tables, primary keys, data types, foreign keys |
15+
| Indexing | [references/indexing.md](./references/indexing.md) | Index types, composite indexes, performance |
16+
| Index Optimization | [references/index-optimization.md](./references/index-optimization.md) | Unused/duplicate index queries, index audit |
17+
| Partitioning | [references/partitioning.md](./references/partitioning.md) | Large tables, time-series, data retention |
18+
| Query Patterns | [references/query-patterns.md](./references/query-patterns.md) | SQL anti-patterns, JOINs, pagination, batch queries |
19+
| Optimization Checklist | [references/optimization-checklist.md](./references/optimization-checklist.md) | Pre-optimization audit, cleanup, readiness checks |
20+
| MVCC and VACUUM | [references/mvcc-vacuum.md](./references/mvcc-vacuum.md) | Dead tuples, long transactions, xid wraparound prevention |
21+
| Process Architecture | [references/process-architecture.md](./references/process-architecture.md) | Multi-process model, connection pooling, auxiliary processes |
22+
| Memory Architecture | [references/memory-management-ops.md](./references/memory-management-ops.md) | Shared/private memory layout, OS page cache, OOM prevention |
23+
| MVCC Transactions | [references/mvcc-transactions.md](./references/mvcc-transactions.md) | Isolation levels, XID wraparound, serialization errors |
24+
| WAL and Checkpoints | [references/wal-operations.md](./references/wal-operations.md) | WAL internals, checkpoint tuning, durability, crash recovery |
25+
| Replication | [references/replication.md](./references/replication.md) | Streaming replication, slots, sync commit, failover |
26+
| Storage Layout | [references/storage-layout.md](./references/storage-layout.md) | PGDATA structure, TOAST, fillfactor, tablespaces, disk mgmt |
27+
| Monitoring | [references/monitoring.md](./references/monitoring.md) | pg_stat views, logging, pg_stat_statements, host metrics |
28+
| Backup and Recovery | [references/backup-recovery.md](./references/backup-recovery.md) | pg_dump, pg_basebackup, PITR, WAL archiving, backup tools |
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
---
2+
title: Backup and Recovery
3+
description: Logical/physical backups, PITR, WAL archiving, backup tools, and recovery strategies
4+
tags: postgres, backup, recovery, pitr, pg_dump, pg_basebackup, wal-archiving, operations
5+
---
6+
7+
# Backup and Recovery
8+
9+
**FUNDAMENTAL RULE: Backups are useless until you've successfully tested recovery.**
10+
11+
## Logical Backups (pg_dump)
12+
Exports as SQL or custom format; portable across PG versions and architectures. Formats: `-Fp` (plain SQL), `-Fc` (custom compressed, selective restore), `-Fd` (directory, parallel with `-j`), `-Ft` (tar, avoid). Use `-Fd -j 4` for large DBs. Restore: `pg_restore -d dbname file.dump`; add `-j` for parallel restore. Selective table restore: `pg_restore -t tablename`. Slow for large DBs; RPO = backup frequency (typically 24h).
13+
14+
## Physical Backups (pg_basebackup)
15+
Copies raw PGDATA; same major version and platform required; cross-architecture works if same endianness (e.g., x86_64 ↔ ARM64). Faster for large clusters; includes all databases. Flags: `-Ft -z -P` for compressed tar with progress. Manual alternative: `pg_backup_start()` → copy PGDATA → `pg_backup_stop()` (complex; must write returned `backup_label`).
16+
17+
## PITR (Point-in-Time Recovery)
18+
Requires base backup + continuous WAL archiving. Restores to any timestamp, transaction, or named restore point. Without PITR: restore only to backup time (potentially lose hours). With PITR: RPO = minutes. `archive_command` must return 0 ONLY when file is safely stored—premature 0 = data loss risk. `wal_level` must be `replica` or `logical` (not `minimal`).
19+
20+
## WAL Archiving
21+
`archive_mode=on`, `archive_command='test ! -f /archive/%f && cp %p /archive/%f'`. **Test archive command as postgres user** (not root) since permission issues are common. Monitor `pg_stat_archiver` for `failed_count`, `last_archived_time`. Archive failures prevent WAL recycling → disk fills.
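The contract described above (never overwrite an archived segment, and report success only once the file is safely stored) can be sketched in Python. This is an illustration under assumptions, not a replacement for a tested tool such as pgBackRest: `archive_wal` is a hypothetical helper, and the directory `fsync` assumes a POSIX system.

```python
import os
import shutil


def archive_wal(src_path: str, archive_dir: str) -> int:
    """Return 0 only once the segment is durably stored; non-zero otherwise."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        return 1  # mirrors `test ! -f`: refuse to overwrite an archived segment
    tmp = dest + ".tmp"
    try:
        with open(src_path, "rb") as src, open(tmp, "wb") as out:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())    # force file contents to stable storage
        os.rename(tmp, dest)          # atomic publish under the final name
        dir_fd = os.open(archive_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)          # make the rename itself durable
        finally:
            os.close(dir_fd)
    except OSError:
        return 1                      # any failure must surface as non-zero
    return 0
```

A wrapper script would pass the return value to `sys.exit()` so that PostgreSQL retries the segment on any non-zero status.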
22+
23+
## Tool Comparison
24+
| Tool | Use case |
25+
|------|----------|
26+
| pg_dump | Small DBs, migrations, selective restore |
27+
| pg_basebackup | Basic PITR, built-in |
28+
| pgBackRest | Production—parallel, incremental, S3/GCS/Azure, retention |
29+
| Barman | Enterprise PITR, retention policies |
30+
| WAL-G | Cloud-native, S3/GCS/Azure |
31+
32+
## RPO/RTO
33+
Logical only: RPO = backup interval (hours); RTO = hours. PITR: RPO = minutes; RTO = hours. Synchronous replication: RPO = 0; RTO = seconds to minutes (failover).
34+
35+
## Operational Rules
36+
- Verify integrity with `pg_verifybackup` (PG 13+)
37+
- Test recovery / PITR regularly
38+
- Take backups from standby to avoid impacting primary
39+
- Retention: 7 daily, 4 weekly, 12 monthly
40+
- Monitor archive growth and backup age
41+
- **Never assume backups work without testing**
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
---
2+
title: Index Optimization Queries
3+
description: Index audit queries
4+
tags: postgres, indexes, unused-indexes, duplicate-indexes, optimization
5+
---
6+
7+
# Index Optimization
8+
9+
## Identify Unused Indexes
10+
11+
Query to find unused indexes:
12+
13+
```sql
14+
-- indexes with 0 scans (check pg_stat_reset / pg_postmaster_start_time first)
15+
SELECT
16+
s.schemaname,
17+
s.relname AS table_name,
18+
s.indexrelname AS index_name,
19+
pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
20+
FROM pg_catalog.pg_stat_user_indexes s
21+
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
22+
WHERE s.idx_scan = 0
23+
AND 0 <> ALL (i.indkey) -- exclude expression indexes
24+
AND NOT i.indisunique -- exclude UNIQUE indexes
25+
AND NOT EXISTS ( -- exclude constraint-backing indexes
26+
SELECT 1 FROM pg_catalog.pg_constraint c
27+
WHERE c.conindid = s.indexrelid
28+
)
29+
ORDER BY pg_relation_size(s.indexrelid) DESC;
30+
```
31+
32+
## Indexes Per Table Guidelines
33+
34+
- **< 5**: Normal
35+
- **5-10**: Monitor (Verify necessity)
36+
- **> 10**: Audit required (High write overhead)
37+
38+
```sql
39+
SELECT relname AS table, count(*) as index_count
40+
FROM pg_stat_user_indexes
41+
GROUP BY relname
42+
ORDER BY count(*) DESC;
43+
```
44+
45+
## Identify Duplicate Indexes
46+
47+
Indexes with identical definitions (after normalizing names) on the same table are duplicates:
48+
49+
```sql
50+
SELECT
51+
schemaname || '.' || tablename AS table,
52+
array_agg(indexname) AS duplicate_indexes,
53+
pg_size_pretty(sum(pg_relation_size((quote_ident(schemaname) || '.' || quote_ident(indexname))::regclass))) AS total_size
54+
FROM pg_indexes
55+
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
56+
GROUP BY schemaname, tablename,
57+
regexp_replace(indexdef, 'INDEX \S+ ON ', 'INDEX ON ')
58+
HAVING count(*) > 1;
59+
```
60+
61+
**Always confirm with a human before dropping or removing any indexes identified by the queries above.** Even indexes with 0 scans may be needed for infrequent but critical queries, and stats may have been reset recently.
62+
63+
## Per-table Index Count Guidelines
64+
65+
| Index Count | Recommendation |
66+
| ----------- | ------------------------------------------- |
67+
| <5 | Normal |
68+
| 5-10 | Review for unused/duplicates |
69+
| >10 | Audit required - significant write overhead |
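The thresholds in the table above can be applied to the output of the per-table count query. A minimal sketch, assuming the counts have already been fetched; `classify_index_count` and the sample table names are hypothetical.

```python
def classify_index_count(index_count: int) -> str:
    """Map a per-table index count to the recommendation in the table above."""
    if index_count < 5:
        return "normal"
    if index_count <= 10:
        return "review for unused/duplicates"
    return "audit required"


# Hypothetical (table, index_count) rows, shaped like the count query's output.
rows = [("event", 3), ("customer", 7), ("order", 12)]
for table, count in rows:
    print(f"{table}: {count} indexes -> {classify_index_count(count)}")
```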
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
---
2+
title: Indexing Best Practices
3+
description: Index design guide
4+
tags: postgres, indexes, composite, partial, covering, gin, brin
5+
---
6+
7+
# Indexing Best Practices
8+
9+
## Core Rules
10+
11+
1. **Always index foreign key columns** — PostgreSQL does not auto-create these
12+
2. **Index columns in WHERE, JOIN, and ORDER BY** clauses
13+
3. **Don't over-index** — each index slows writes and uses storage
14+
4. **Verify with EXPLAIN ANALYZE** — confirm indexes are actually used
15+
16+
## Composite Indexes
17+
18+
Put equality columns first, then range/sort columns:
19+
20+
```sql
21+
-- WHERE status = 'active' AND created_at > '2026-01-01'
22+
CREATE INDEX order_status_created_idx ON "order" (status, created_at); -- "order" is a reserved word and must be quoted
23+
```
24+
25+
A composite index on `(a, b)` supports queries on `a` + `b` and `a` alone, but not `b` alone.
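The leading-column rule follows from how entries are ordered. The sketch below models the index as a sorted list of `(status, created_at)` tuples, a deliberate simplification of a B-tree; the rows and helper names are hypothetical. A lookup on the leading column can binary-search to a contiguous range, while a filter on the second column alone has to visit every entry.

```python
import bisect

# Hypothetical index entries on (status, created_at), kept in sorted order.
index = sorted([
    ("active", "2026-01-05"),
    ("active", "2026-02-10"),
    ("closed", "2025-12-01"),
    ("closed", "2026-01-20"),
])


def seek_by_status(entries, status):
    """Leading column: binary search finds one contiguous range."""
    lo = bisect.bisect_left(entries, (status,))
    hi = bisect.bisect_left(entries, (status, "\uffff"))
    return entries[lo:hi]


def scan_by_created_after(entries, cutoff):
    """Second column alone: matches are scattered, so every entry is checked."""
    return [e for e in entries if e[1] > cutoff]


print(seek_by_status(index, "active"))
print(scan_by_created_after(index, "2026-01-01"))
```

PostgreSQL can still choose a full index scan for a `b`-only predicate, but it loses the ability to seek directly to the matching range.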
26+
27+
## Partial Indexes
28+
29+
Reduce index size by filtering to common query patterns.
30+
Only use if index size is problematic but the index is needed for performance.
31+
32+
```sql
33+
CREATE INDEX order_active_idx ON "order" (customer_id)
34+
WHERE status = 'active';
35+
```
36+
37+
## Covering Indexes
38+
39+
Consider creating covering indexes for commonly executed query patterns that return only 1 or a small number of columns.
40+
41+
## Index Types
42+
43+
| Type | Use Case | Example |
44+
| --- | --- | --- |
45+
| B-tree (default) | Equality, range, sorting | `WHERE id = 1`, `ORDER BY date` |
46+
| GIN | Arrays, JSONB, full-text | `WHERE tags @> ARRAY['x']` |
47+
| GiST | Geometric, range types, full-text | PostGIS, `tsrange`, `tsvector` |
48+
| BRIN | Large sequential/time-series | Append-only logs, events (requires physical row order correlation) |
49+
50+
```sql
51+
CREATE INDEX metadata_idx ON "order" USING GIN (metadata); -- JSONB
52+
CREATE INDEX event_created_idx ON event USING BRIN (created_at); -- time-series
53+
```
54+
55+
## Guidelines
56+
57+
- Name indexes consistently: `{table}_{column}_idx`
58+
- Review for unused indexes periodically
59+
- **Always confirm with a human before removing or dropping any indexes** — even unused ones may serve a purpose not reflected in recent stats
60+
- Use partial indexes for frequently filtered subsets
61+
- Use covering indexes on hot read paths
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
title: Memory Architecture and OOM Prevention
3+
description: PostgreSQL shared/private memory layout, OS page cache interaction, and OOM avoidance strategies
4+
tags: postgres, memory, shared_buffers, work_mem, oom, architecture, operations
5+
---
6+
7+
# Memory Architecture and OOM Prevention
8+
9+
## Memory Areas
10+
11+
- **Shared memory**: `shared_buffers` — main data cache, all processes, requires restart to change.
12+
- **Private per backend**: `work_mem` (sorts/hashes/joins, per-operation); `maintenance_work_mem` (VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY); `temp_buffers` (8MB default).
13+
- **Planner hint only**: `effective_cache_size` is NOT allocated — set to ~50–75% of total RAM.
14+
- **Hash multiplier**: `hash_mem_multiplier` (default 2.0) means hash ops use up to 2× `work_mem`.
15+
16+
## Memory Multiplication Danger
17+
18+
Maximum potential: `work_mem × operations_per_query × (parallel_workers + 1) × connections` (leader participates by default via `parallel_leader_participation = on`; hash operations use up to `hash_mem_multiplier × work_mem`, default 2.0). Example: 128MB work_mem, 3 ops (2 sorts + 1 hash join), 2 parallel workers, 100 connections → 2 sorts at 128MB = 256MB, 1 hash join at 128MB × 2.0 = 256MB, per process = 512MB, × 3 processes (2 workers + leader) = 1536MB/query, × 100 connections = **~150GB** worst case. This case is rare.
19+
Not all queries hit limits at once, but high concurrency + large datasets approach it. This is a common cause of OOM in containerized/Kubernetes deployments. Plan capacity with a 1.5–2× safety margin.
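The worst-case arithmetic above can be reproduced with a short calculation. This is a sketch for capacity planning only: `worst_case_work_mem_mb` is a hypothetical helper, and it assumes every operation in every process hits its limit at the same time.

```python
def worst_case_work_mem_mb(work_mem_mb, sorts, hashes, parallel_workers,
                           connections, hash_mem_multiplier=2.0):
    """Upper bound on work_mem-driven memory, in MB, for the scenario described."""
    # Sorts are capped at work_mem; hash operations at hash_mem_multiplier * work_mem.
    per_process = sorts * work_mem_mb + hashes * work_mem_mb * hash_mem_multiplier
    # The leader participates alongside the parallel workers by default.
    processes = parallel_workers + 1
    return per_process * processes * connections


total_mb = worst_case_work_mem_mb(128, sorts=2, hashes=1,
                                  parallel_workers=2, connections=100)
print(f"{total_mb / 1024:.0f} GB")  # prints "150 GB"
```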
20+
21+
## OS Page Cache (Double Buffering)
22+
23+
Data exists in both `shared_buffers` and OS page cache. A miss in shared_buffers can still hit OS cache (avoiding disk I/O). Extremely large shared_buffers can hurt performance: less OS cache, slower startup, heavier checkpoints. Optimal split depends on workload (OLTP vs OLAP).
24+
25+
## OOM Prevention
26+
27+
- Implement connection pooling to reduce total backend count.
28+
- Reduce `work_mem` globally; use per-session overrides for heavy queries only.
29+
- Lower `max_parallel_workers_per_gather` in high-concurrency systems.
30+
- Set `statement_timeout` to kill runaway queries.
31+
- Monitor: `dmesg -T | grep -i "killed process"` (the kernel logs `Killed process` with a capital K) and `temp_blks_written` in pg_stat_statements.
32+
33+
## Operational Rules
34+
35+
- Tune per-session first, global last.
36+
- Suspect OOM when memory spikes during high concurrency, dashboards, or large batch jobs.
37+
- Increase memory only after confirming spill behavior (`temp_blks_written > 0`).
38+
- `maintenance_work_mem` can be set much higher (1–2GB) — fewer processes use it. Cap autovacuum with `autovacuum_work_mem` to avoid `autovacuum_max_workers × maintenance_work_mem` memory spikes.
39+
- `shared_buffers` change requires full restart; `work_mem` is per-session changeable.
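The autovacuum spike mentioned in the rules above is a simple product. A minimal sketch, assuming the default of 3 for `autovacuum_max_workers` and using `-1` to mean "fall back to `maintenance_work_mem`", which is how `autovacuum_work_mem` behaves by default; `autovacuum_peak_mb` is a hypothetical helper and the result is an upper bound, not a measurement.

```python
def autovacuum_peak_mb(max_workers, maintenance_work_mem_mb, autovacuum_work_mem_mb=-1):
    """Upper bound, in MB, on memory that concurrent autovacuum workers may claim."""
    if autovacuum_work_mem_mb == -1:
        per_worker = maintenance_work_mem_mb   # no cap set: inherit the large value
    else:
        per_worker = autovacuum_work_mem_mb    # explicit cap applies per worker
    return max_workers * per_worker


print(autovacuum_peak_mb(3, 2048))        # uncapped: 6144
print(autovacuum_peak_mb(3, 2048, 256))   # capped:   768
```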
