
Add schema-agnostic PostgreSQL full-text search layer#220

Open
arup-chauhan wants to merge 3 commits into
saayam-for-all:search_implementation_dblayer from
arup-chauhan:search_implementation_dblayer

arup-chauhan commented May 19, 2026

Description

This PR adds the database-side search implementation layer for PostgreSQL-backed search across the Saayam database schemas. It references issue #184 and is part of #207.

The goal is to provide the Phase 1 search foundation using native PostgreSQL full-text search and pg_trgm fuzzy matching, while keeping the scripts schema-agnostic so that they can run against both the Virginia and Ireland database schemas through search_path.

The fuzzy search scope is intentionally limited to the MVP search entities: requests, users, and organizations. It is not applied globally to every table.

Context

This work follows the Saayam Search Implementation Plan and the database-side search requirements discussed for the search implementation layer.

Current scope:

  • PostgreSQL full-text search
  • fuzzy matching using pg_trgm
  • ranked search results
  • request, user, and organization search functions
  • local validation against Virginia and Ireland-style schema clones

Fuzzy Search Scope

Fuzzy search is supported for the following entities and fields:

Requests:

  • request.req_subj
  • request.req_desc
  • request.req_loc
  • help_categories.cat_name (via the request-category join)

Users:

  • users.full_name
  • users.primary_email_address

Organizations:

  • organizations.org_name
  • organizations.city_name

Notes:

  • req_loc is text-based location matching, not geo-distance filtering.
  • Fuzzy search is intentionally not added to unrelated tables.
  • Authorization-aware filtering is applied inside the search functions, so results are scoped to the caller's context.

Changes

Added search migration scripts under ddl/Search/codes/:

  • 01_enable_fuzzy_search.sql
  • 02_add_request_search.sql
  • 03_add_user_and_volunteer_search.sql
  • 04_add_category_and_advanced_search.sql

Added request search support:

  • weighted search_vector on request
  • GIN full-text index
  • trigram indexes for fuzzy matching on request subject, description, and location text
  • category-backed request search through help_categories.cat_name
  • request location text matching through req_loc
  • search_requests(...) function
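A weighted search_vector lets a hit in one field count for more than a hit in another. The Python sketch below is a deliberately simplified model of that idea, not PostgreSQL's ts_rank formula, which also accounts for term frequency, proximity, and normalization options. It uses PostgreSQL's documented default label weights (A = 1.0, B = 0.4, C = 0.2, D = 0.1), but the mapping of req_subj, req_desc, and req_loc to labels A, B, and C is a hypothetical assumption: this PR does not state the weights it assigns. Tokenization is also a rough stand-in for to_tsvector (no stemming or stop words).

```python
import re

# PostgreSQL's default ts_rank weights for labels A, B, C, D.
LABEL_WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}

# Hypothetical label assignment; the real migration may weight fields differently.
FIELD_LABELS = {"req_subj": "A", "req_desc": "B", "req_loc": "C"}


def tokens(text: str) -> set[str]:
    """Rough stand-in for to_tsvector: lowercase alphanumeric words only."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))


def weighted_score(request: dict[str, str], query: str) -> float:
    """For each query term, add the highest label weight of any field containing it."""
    field_tokens = {f: tokens(request.get(f, "")) for f in FIELD_LABELS}
    score = 0.0
    for term in tokens(query):
        weights = [
            LABEL_WEIGHTS[FIELD_LABELS[f]]
            for f, toks in field_tokens.items()
            if term in toks
        ]
        score += max(weights, default=0.0)
    return score


def rank(requests: list[dict[str, str]], query: str) -> list[dict[str, str]]:
    """Return requests with a non-zero score, best first (ties keep input order)."""
    scored = [(weighted_score(r, query), r) for r in requests]
    return [r for s, r in sorted(scored, key=lambda pair: -pair[0]) if s > 0]
```

In this model, a request whose subject contains the search term outranks one that mentions it only in the description, which is the ordering a weighted search_vector is intended to produce.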

Added user search support:

  • weighted search_vector on users
  • GIN full-text index
  • trigram indexes for fuzzy matching on full name and email
  • exact email index
  • search_users(...) function

Added organization search support:

  • trigram index for fuzzy matching on organization name
  • trigram index for fuzzy matching on city name
  • search_organizations(...) function

Added schema-agnostic execution:

  • production scripts no longer hardcode a schema name
  • region targeting is handled through search_path
  • Virginia runner sets search_path to virginia_dev_saayam_rdbms
  • Ireland runner sets search_path to proposed_saayam
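The schema-agnostic approach works because each runner sets search_path before including the shared scripts, so unqualified table names in the migrations resolve to the regional schema. The Python sketch below generates what such a runner might contain. The two schema names and the four script paths come from this PR; the use of psql `\i` includes, the `\set ON_ERROR_STOP on` line, and keeping public on the path (assuming pg_trgm is installed there) are illustrative assumptions, not a description of the actual runner files.

```python
# Schema names and script paths are taken from the PR; the runner layout is assumed.
REGION_SCHEMAS = {
    "virginia": "virginia_dev_saayam_rdbms",
    "ireland": "proposed_saayam",
}

MIGRATIONS = [
    "ddl/Search/codes/01_enable_fuzzy_search.sql",
    "ddl/Search/codes/02_add_request_search.sql",
    "ddl/Search/codes/03_add_user_and_volunteer_search.sql",
    "ddl/Search/codes/04_add_category_and_advanced_search.sql",
]


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by wrapping it and doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_runner(region: str) -> str:
    """Return psql script text that pins search_path, then includes each shared migration."""
    schema = REGION_SCHEMAS[region]
    lines = [
        r"\set ON_ERROR_STOP on",
        # Unqualified names in the shared scripts now resolve to the regional schema.
        f"SET search_path TO {quote_ident(schema)}, public;",
    ]
    lines += [rf"\i {path}" for path in MIGRATIONS]
    return "\n".join(lines) + "\n"
```

Only the SET line differs between regions, which is why the migration scripts themselves do not need per-region copies.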

Added local validation setup under ddl/Search/tests/:

  • test clones under test_clones/
  • migration and validation runners under runners/
  • Virginia validation checks under virginia_validation/
  • Ireland validation checks under ireland_validation/
  • smoke tests
  • index checks
  • function checks
  • clone validation checks
  • QA validation notes

Behavior / Safety

No application or API behavior is changed in this PR.

The database search functions are additive: they are new, callable DB-side search entry points.

Compatibility boundary:

  • Existing tables are extended with generated search columns where needed.
  • Existing data remains unchanged.
  • Indexes are added for search performance.
  • Search scripts are idempotent where practical.
  • Region-specific behavior is handled through search_path, not duplicated scripts.

Authorization-aware filtering is included at the function level:

  • admin-level callers can receive broader results
  • non-admin callers must supply a self scope or an allowed-ID scope
  • a non-admin call with no scope returns zero rows
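The three rules above can be modeled as a small Python filter. This is only a model of the described behavior applied to already-matched rows; the PR implements the filtering inside the SQL search functions, and the CallerContext fields and the owner_id key are hypothetical names, not the functions' real parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallerContext:
    """Hypothetical caller context; the real function parameters are not shown in this PR."""
    is_admin: bool = False
    self_id: Optional[int] = None
    allowed_ids: frozenset[int] = frozenset()


def scope_rows(rows: list[dict], caller: CallerContext, owner_key: str = "owner_id") -> list[dict]:
    """Apply the admin / self-scope / allowed-ID rules to search results."""
    if caller.is_admin:
        return rows  # admin-level callers receive the broader result set
    visible = set(caller.allowed_ids)
    if caller.self_id is not None:
        visible.add(caller.self_id)
    if not visible:
        return []  # non-admin caller with no scope: fail closed with zero rows
    return [row for row in rows if row[owner_key] in visible]
```

The key property is the last branch: an empty non-admin scope fails closed instead of falling back to unfiltered results.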

This is DB-side authorization-aware filtering, not the final production RBAC/session model. Final RBAC/session enforcement remains a backend and DevSecOps follow-up.

Risk is moderate-low at the local-validation stage:

  • migrations are additive
  • no destructive production data changes are included
  • QA/RDS validation is still required before production rollout

Validation

The following commands ran and passed locally against a fresh Virginia-style test clone:

psql -d <db> -f ddl/Search/tests/test_clones/virginia_search_instance_clone.sql
psql -d <db> -f ddl/Search/tests/runners/run_virginia_search_migrations.sql
psql -d <db> -v ON_ERROR_STOP=1 -f ddl/Search/tests/runners/run_virginia_search_validation.sql

Result:

- smoke test passed
- index check passed
- function check passed
- instance clone check passed

The following commands ran and passed locally against a fresh Ireland-style test clone:

psql -d <db> -f ddl/Search/tests/test_clones/ireland_search_instance_clone.sql
psql -d <db> -f ddl/Search/tests/runners/run_ireland_search_migrations.sql
psql -d <db> -v ON_ERROR_STOP=1 -f ddl/Search/tests/runners/run_ireland_search_validation.sql

Result:

- smoke test passed
- index check passed
- function check passed
- migrated clone check passed
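Because both regions follow the same clone, migrate, validate sequence, the steps can be scripted. The Python sketch below is a hypothetical helper, not part of this PR. It reuses the exact paths above and differs in one respect: it passes `-v ON_ERROR_STOP=1` to every step, including the clone and migration steps. Without that variable, psql exits with status 0 even when a statement in a `-f` script fails, so `check=True` would not catch the error.

```python
import subprocess

TESTS = "ddl/Search/tests"


def validation_commands(region: str, db: str) -> list[list[str]]:
    """Build the clone, migrate, and validate psql commands for one region."""
    steps = [
        f"{TESTS}/test_clones/{region}_search_instance_clone.sql",
        f"{TESTS}/runners/run_{region}_search_migrations.sql",
        f"{TESTS}/runners/run_{region}_search_validation.sql",
    ]
    # Every step stops on its first SQL error so psql reports a non-zero exit status.
    return [["psql", "-d", db, "-v", "ON_ERROR_STOP=1", "-f", path] for path in steps]


def run_validation(region: str, db: str) -> None:
    """Run each step in order; subprocess.run raises CalledProcessError on failure."""
    for cmd in validation_commands(region, db):
        subprocess.run(cmd, check=True)
```

For example, `run_validation("virginia", "<db>")` would run the three Virginia commands in order and stop at the first failing step.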

Local validation databases used for the final run (after the test-folder reorganization):

- Virginia: saayam_virginia_reorg_e2e_1779302873
- Ireland: saayam_ireland_reorg_e2e_1779302883

Follow-ups

Planned next steps after this PR:

- run the same validation against QA/RDS once credentials are available
- capture EXPLAIN / EXPLAIN ANALYZE plans on real QA data
- confirm index usage and latency targets in QA
- finalize backend-to-DB authorization context with DevSecOps
- evaluate row-level security (RLS), SET LOCAL app.* session context, and IAM database authentication for production hardening

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>
arup-chauhan (Author) commented:

Hello everyone,

This PR adds the database-side search implementation layer using PostgreSQL full-text search and pg_trgm fuzzy matching.

Scope is the DB search foundation: request search, user search, organization search, schema-agnostic migration execution, and local validation against both Virginia and Ireland-style schema clones.

Fuzzy search is limited to the MVP search entities and fields: requests, users, and organizations. It is not applied globally to every table.

The production scripts are shared across schemas through search_path, so we avoid maintaining separate Virginia/Ireland copies.

This PR also includes DB-side authorization-aware filtering in the search functions.
Final RBAC/session enforcement remains a backend and DevSecOps follow-up.

QA/RDS validation is still pending because it requires QA database access, but the local validation runners pass for both schemas.

Signed-off-by: Arup Chauhan <arupchauhan.connect@gmail.com>