38 changes: 30 additions & 8 deletions README.md
@@ -179,7 +179,7 @@ Similarity search is for vectorized annotation text. Use `prtags search similar`

`ghreplica` remains the source of truth for mirrored repositories, pull requests, and issues. `PRtags` resolves repo and object identity through `ghreplica`, uses stable GitHub-backed identifiers so renames do not break object identity, and derives write permissions from GitHub repo access. That means `PRtags` is not trying to become a second GitHub mirror. It is a curation layer over a mirror that already exists.

This split is important operationally too. `PRtags` owns its own database, jobs, search documents, and embeddings. It should not share a database with `ghreplica`, and it should not copy full PR or issue content unless it is maintaining a small explicit projection for display or indexing purposes.
This split is important operationally too. `PRtags` owns its own schema, jobs, search documents, and embeddings. It shares a Postgres database with `ghreplica` only so the two systems can join mirrored GitHub data with curation data; it should not copy full PR or issue content unless it is maintaining a small explicit projection for display or indexing purposes.

For group reads, `PRtags` returns refs by default. When metadata is requested, `PRtags` enriches member references from cached target projections. If a projection is missing or stale, `PRtags` returns the cached result it already has, marks the freshness state explicitly, and queues a background refresh from `ghreplica`. `group list` keeps the lighter default shape and returns `member_count` plus `member_counts` by type. The CLI keeps calling only `PRtags`.

@@ -242,7 +242,10 @@ The CLI resolves auth in this order:

## Local Development

The local development loop is straightforward. Start a Postgres instance, point `PRtags` at a running `ghreplica`, and run the API:
The local development loop needs a Postgres database that contains both the
`ghreplica` mirror schema and the `PRtags` schema. For repo-scoped commands, the
mirror schema must already have the repositories, issues, and pull requests you
want to reference.

```bash
docker run --rm --name prtags-postgres \
@@ -251,18 +254,27 @@ docker run --rm --name prtags-postgres \
-p 55432:5432 \
pgvector/pgvector:pg16

docker exec prtags-postgres \
psql -U postgres -d prtags -c 'CREATE SCHEMA prtags;'

export DATABASE_URL='postgres://postgres:prtags@127.0.0.1:55432/prtags?sslmode=disable'
export DB_MAX_OPEN_CONNS=5
export DB_MAX_IDLE_CONNS=2
export DB_CONN_MAX_IDLE_TIME=5m
export DB_CONN_MAX_LIFETIME=30m
export GHREPLICA_BASE_URL='https://ghreplica.dutiful.dev'
export PRTAGS_SCHEMA=prtags
export GHREPLICA_SCHEMA=public
export ALLOW_UNAUTH_WRITES=true
go run ./cmd/prtags serve
```

By default the server listens on `:8081`, runs migrations on startup, and starts the background indexing worker.

A fresh database like the one above is enough to test process startup and basic
health checks. To run the repo examples below, first run or restore `ghreplica`
against the same database so the configured mirror schema contains matching
GitHub data.

If you want to test outbound group comments locally, also set `GITHUB_APP_ID`, `GITHUB_APP_INSTALLATION_ID`, and either `GITHUB_APP_PRIVATE_KEY_PEM` or `GITHUB_APP_PRIVATE_KEY_PATH`. In production, prefer the mounted private key path and keep the containing directory readable by the container user only.

Once the server is up, these are the most useful manual operations:
@@ -328,17 +340,27 @@ This is the simplest local install path when you only need the client.

If you want to run `PRtags` yourself, think of deployment as standing up a second service next to `ghreplica`, not as extending the `ghreplica` process directly.

The deployment uses one shared Postgres database with separate
schemas:

- configured mirror schema for mirrored GitHub data
- `prtags` schema for groups, annotations, projections, and jobs

That shared-database topology is what allows normal SQL joins between `PRtags`
groups and `ghreplica` mirror tables. It deprecates the separate `PRtags`
database deployment shape, but not the `PRtags` tables or data model.

At minimum you need:

- a separate Postgres database for `PRtags`
- network access to a running `ghreplica` instance
- a shared Postgres database containing the configured `ghreplica` mirror schema and the `prtags` schema
- a running `ghreplica` service writing mirror data into that database
- GitHub-authenticated requests for write operations if you want real permission enforcement
- a decision about the embedding provider and model you want to use beyond local development defaults

The basic shape is:

1. create the `PRtags` database
2. point `PRtags` at `ghreplica`
1. create the shared database schemas
2. point `PRtags` at the shared database
3. run migrations
4. start the API
5. verify health, readiness, and a few repo-scoped operations
@@ -347,7 +369,7 @@ The clean deployment boundary is:

- separate repo
- separate service
- separate database
- separate schema in the shared database
- same VM is fine at first
- separate domain is preferred once you expose it publicly

11 changes: 9 additions & 2 deletions cmd/prtags/access_cli.go
@@ -222,7 +222,11 @@ func openOpsService() (*core.Service, func(), error) {
return nil, nil, err
}

db, err := database.OpenWithPool(cfg.DatabaseURL, database.PoolConfig{
databaseURL, err := databaseURLWithSearchPath(cfg.DatabaseURL, cfg.PRTagsSchema)
if err != nil {
return nil, nil, err
}
db, err := database.OpenWithPool(databaseURL, database.PoolConfig{
MaxOpenConns: cfg.DBMaxOpenConns,
MaxIdleConns: cfg.DBMaxIdleConns,
ConnMaxIdleTime: cfg.DBConnMaxIdleTime,
@@ -231,6 +235,9 @@ func openOpsService() (*core.Service, func(), error) {
if err != nil {
return nil, nil, err
}
if err := ensureConfiguredSchema(context.Background(), db, cfg.PRTagsSchema); err != nil {
return nil, nil, err
}
if err := database.RunMigrations(db); err != nil {
return nil, nil, err
}
@@ -245,6 +252,6 @@ func openOpsService() (*core.Service, func(), error) {
cleanup := func() {
_ = sqlDB.Close()
}
service := core.NewService(db, ghreplica.NewClient(cfg.GHReplicaBaseURL), permissions.AllowAllChecker{}, nil)
service := core.NewService(db, ghreplica.NewSchemaClient(db, cfg.GHReplicaSchema), permissions.AllowAllChecker{}, nil)
return service, cleanup, nil
}
1 change: 0 additions & 1 deletion cmd/prtags/access_cli_test.go
@@ -76,7 +76,6 @@ func TestAccessGrantCommandsReachServiceOpen(t *testing.T) {
UserLogin: "dutifulbob",
UserID: 7937614,
})
t.Setenv("GHREPLICA_BASE_URL", "https://ghreplica.example")
t.Setenv("DATABASE_URL", "")

_, _, err := runCLI(t, "https://prtags.dutiful.dev", "access", "grant", "list", "-R", "acme/widgets")
69 changes: 67 additions & 2 deletions cmd/prtags/main.go
@@ -4,6 +4,7 @@ import (
"context"
"encoding/json"
"fmt"
"net/url"
"os"
"os/signal"
"strconv"
@@ -273,7 +274,7 @@ func openServeRuntime() (config.Config, serveRuntime, error) {
if !cfg.AllowUnauthWrites {
checker = permissions.NewGitHubChecker(0)
}
ghClient := ghreplica.NewClient(cfg.GHReplicaBaseURL)
ghClient := ghreplica.NewSchemaClient(db, cfg.GHReplicaSchema)
indexer := core.NewIndexer(db, ghClient, embedding.NewLocalHashProvider(cfg.EmbeddingModel, database.EmbeddingDimensions))
service := core.NewService(db, ghClient, checker, indexer)
commentSync := buildCommentSyncService(db, cfg)
@@ -294,7 +295,11 @@ func openServeRuntime() (config.Config, serveRuntime, error) {
}

func openConfiguredDatabase(cfg config.Config) (*gorm.DB, error) {
db, err := database.OpenWithPool(cfg.DatabaseURL, database.PoolConfig{
databaseURL, err := databaseURLWithSearchPath(cfg.DatabaseURL, cfg.PRTagsSchema)
if err != nil {
return nil, err
}
db, err := database.OpenWithPool(databaseURL, database.PoolConfig{
MaxOpenConns: cfg.DBMaxOpenConns,
MaxIdleConns: cfg.DBMaxIdleConns,
ConnMaxIdleTime: cfg.DBConnMaxIdleTime,
@@ -303,6 +308,9 @@ func openConfiguredDatabase(cfg config.Config) (*gorm.DB, error) {
if err != nil {
return nil, err
}
if err := ensureConfiguredSchema(context.Background(), db, cfg.PRTagsSchema); err != nil {
return nil, err
}
if err := database.RunMigrations(db); err != nil {
return nil, err
}
@@ -312,6 +320,63 @@ func openConfiguredDatabase(cfg config.Config) (*gorm.DB, error) {
return db, nil
}

func ensureConfiguredSchema(ctx context.Context, db *gorm.DB, schema string) error {
if schema == "public" {
return nil
}
sqlDB, err := db.DB()
if err != nil {
return err
}
var exists bool
if err := sqlDB.QueryRowContext(ctx, `
SELECT EXISTS (
SELECT 1
FROM pg_namespace
WHERE nspname = $1
)
`, schema).Scan(&exists); err != nil {
return err
}
if !exists {
return fmt.Errorf("PRTAGS_SCHEMA %q does not exist", schema)
}
return nil
}

func databaseURLWithSearchPath(databaseURL, schema string) (string, error) {
searchPath := schema
if schema != "public" {
searchPath += ",public"
}
trimmedURL := strings.TrimSpace(databaseURL)
if !strings.Contains(trimmedURL, "://") && isPostgresKeywordValueDSN(trimmedURL) {
return trimmedURL + " search_path=" + postgresKeywordValue(searchPath), nil
}

parsed, err := url.Parse(trimmedURL)
if err != nil {
return "", err
}
if parsed.Scheme == "" {
return "", fmt.Errorf("DATABASE_URL must be a URL or PostgreSQL keyword/value DSN")
}
query := parsed.Query()
query.Set("search_path", searchPath)
parsed.RawQuery = query.Encode()
return parsed.String(), nil
}

func isPostgresKeywordValueDSN(databaseURL string) bool {
fields := strings.Fields(databaseURL)
return len(fields) > 0 && strings.Contains(fields[0], "=")
}

func postgresKeywordValue(value string) string {
escaped := strings.NewReplacer(`\`, `\\`, `'`, `\'`).Replace(value)
return "'" + escaped + "'"
}

func buildCommentSyncService(db *gorm.DB, cfg config.Config) *core.CommentSyncService {
if !cfg.HasGitHubApp() {
return nil
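
To make the URL branch of `databaseURLWithSearchPath` easier to follow, here is a standalone sketch of the same idea using only `net/url`. The `withSearchPath` name is hypothetical and the keyword/value branch is left out; it assumes the driver honours a `search_path` connection parameter, which is not verified here.

```go
package main

import (
	"fmt"
	"net/url"
)

// withSearchPath sets search_path on a URL-form Postgres DSN, replacing any
// value already present. Non-public schemas get "public" appended as a
// fallback, mirroring the behaviour in the diff above.
func withSearchPath(dsn, schema string) (string, error) {
	searchPath := schema
	if schema != "public" {
		searchPath += ",public"
	}
	parsed, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	if parsed.Scheme == "" {
		return "", fmt.Errorf("expected a URL-form DSN, got %q", dsn)
	}
	query := parsed.Query()
	query.Set("search_path", searchPath) // Set overwrites, so no duplicates
	parsed.RawQuery = query.Encode()     // Encode sorts keys and escapes the comma
	return parsed.String(), nil
}

func main() {
	out, err := withSearchPath(
		"postgres://user:pass@127.0.0.1:5432/ghreplica?sslmode=disable&search_path=other",
		"prtags",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because `url.Values.Encode` sorts by key, the resulting query string is deterministic, which is why the tests in this change can compare against an exact string.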
73 changes: 71 additions & 2 deletions cmd/prtags/main_more_test.go
@@ -8,14 +8,17 @@ import (
"net/http/httptest"
"os"
"path/filepath"
"regexp"
"testing"
"time"

"github.com/DATA-DOG/go-sqlmock"
"github.com/dutifuldev/prtags/internal/auth"
"github.com/dutifuldev/prtags/internal/config"
"github.com/dutifuldev/prtags/internal/jsend"
"github.com/spf13/cobra"
"github.com/stretchr/testify/require"
"gorm.io/driver/postgres"
"gorm.io/driver/sqlite"
"gorm.io/gorm"
"gorm.io/gorm/logger"
@@ -134,7 +137,8 @@ func TestAuthAndRuntimeHelpers(t *testing.T) {
DBMaxIdleConns: 1,
DBConnMaxIdleTime: time.Minute,
DBConnMaxLifetime: time.Minute,
GHReplicaBaseURL: "https://ghreplica.example",
PRTagsSchema: "public",
GHReplicaSchema: "public",
WorkerPollInterval: time.Second,
EmbeddingModel: "local-hash@1",
})
@@ -189,7 +193,6 @@ func TestFieldAndAccessHelpers(t *testing.T) {
func TestOpenOpsServiceWithSQLite(t *testing.T) {
tempDir := t.TempDir()
t.Setenv("DATABASE_URL", "sqlite://"+filepath.Join(tempDir, "ops.db"))
t.Setenv("GHREPLICA_BASE_URL", "https://ghreplica.example")
t.Setenv("DB_MAX_OPEN_CONNS", "1")
t.Setenv("DB_MAX_IDLE_CONNS", "1")
t.Setenv("DB_CONN_MAX_IDLE_TIME", "1m")
@@ -200,3 +203,69 @@ func TestOpenOpsServiceWithSQLite(t *testing.T) {
require.Error(t, err)
require.Nil(t, cleanup)
}

func TestEnsureConfiguredSchemaRejectsMissingSchema(t *testing.T) {
db, mock, cleanup := newMockPostgresDB(t)
defer cleanup()

mock.ExpectQuery(regexp.QuoteMeta("SELECT EXISTS (\n\t\t\tSELECT 1\n\t\t\tFROM pg_namespace\n\t\t\tWHERE nspname = $1\n\t\t)")).
WithArgs("prtags").
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(false))

err := ensureConfiguredSchema(context.Background(), db, "prtags")
require.ErrorContains(t, err, `PRTAGS_SCHEMA "prtags" does not exist`)
require.NoError(t, mock.ExpectationsWereMet())
}

func TestEnsureConfiguredSchemaAllowsExistingSchema(t *testing.T) {
db, mock, cleanup := newMockPostgresDB(t)
defer cleanup()

mock.ExpectQuery(regexp.QuoteMeta("SELECT EXISTS (\n\t\t\tSELECT 1\n\t\t\tFROM pg_namespace\n\t\t\tWHERE nspname = $1\n\t\t)")).
WithArgs("prtags").
WillReturnRows(sqlmock.NewRows([]string{"exists"}).AddRow(true))

require.NoError(t, ensureConfiguredSchema(context.Background(), db, "prtags"))
require.NoError(t, mock.ExpectationsWereMet())
}

func TestEnsureConfiguredSchemaSkipsPublic(t *testing.T) {
db, mock, cleanup := newMockPostgresDB(t)
defer cleanup()

require.NoError(t, ensureConfiguredSchema(context.Background(), db, "public"))
require.NoError(t, mock.ExpectationsWereMet())
}

func TestDatabaseURLWithSearchPathPreservesURLDSN(t *testing.T) {
out, err := databaseURLWithSearchPath("postgres://user:pass@127.0.0.1:5432/ghreplica?sslmode=disable", "prtags")
require.NoError(t, err)
require.Equal(t, "postgres://user:pass@127.0.0.1:5432/ghreplica?search_path=prtags%2Cpublic&sslmode=disable", out)
}

func TestDatabaseURLWithSearchPathPreservesKeywordValueDSN(t *testing.T) {
out, err := databaseURLWithSearchPath("host=/cloudsql/project:region:instance user=bob dbname=ghreplica sslmode=disable", "prtags")
require.NoError(t, err)
require.Equal(t, "host=/cloudsql/project:region:instance user=bob dbname=ghreplica sslmode=disable search_path='prtags,public'", out)
}

func TestDatabaseURLWithSearchPathRejectsUnsupportedDSN(t *testing.T) {
_, err := databaseURLWithSearchPath("not a dsn", "prtags")
require.ErrorContains(t, err, "DATABASE_URL must be a URL or PostgreSQL keyword/value DSN")
}

func newMockPostgresDB(t *testing.T) (*gorm.DB, sqlmock.Sqlmock, func()) {
t.Helper()

sqlDB, mock, err := sqlmock.New()
require.NoError(t, err)
db, err := gorm.Open(postgres.New(postgres.Config{
Conn: sqlDB,
PreferSimpleProtocol: true,
}), &gorm.Config{Logger: logger.Default.LogMode(logger.Silent)})
require.NoError(t, err)

return db, mock, func() {
_ = sqlDB.Close()
}
}
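
The tests above cover the plain `prtags,public` value for keyword/value DSNs. The sketch below shows how the same quoting rule behaves when a value contains a single quote or backslash. It is a standalone illustration with hypothetical helper names (`quoteKeywordValue`, `appendSearchPath`), not an addition to the test file.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteKeywordValue single-quotes a value for a PostgreSQL keyword/value
// DSN, escaping backslashes and single quotes with a backslash.
func quoteKeywordValue(value string) string {
	escaped := strings.NewReplacer(`\`, `\\`, `'`, `\'`).Replace(value)
	return "'" + escaped + "'"
}

// appendSearchPath appends a quoted search_path setting to a keyword/value DSN.
func appendSearchPath(dsn, searchPath string) string {
	return strings.TrimSpace(dsn) + " search_path=" + quoteKeywordValue(searchPath)
}

func main() {
	fmt.Println(appendSearchPath("host=localhost dbname=ghreplica", "prtags,public"))
	fmt.Println(quoteKeywordValue(`odd'name\x`))
}
```

Quoting matters here because the comma-separated value would otherwise be ambiguous if it ever contained spaces or quote characters.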
3 changes: 3 additions & 0 deletions deploy/gcp/SETUP.md
@@ -20,6 +20,9 @@ GITHUB_APP_PRIVATE_KEY_PATH=/home/bob/prtags/secrets/github-app.private-key.pem
The shared Cloud SQL instance is connection-limited, so keep the default pool settings conservative unless you have intentionally split workers or moved the database:

```env
DB_NAME=ghreplica
PRTAGS_SCHEMA=prtags
GHREPLICA_SCHEMA=public
DB_MAX_OPEN_CONNS=5
DB_MAX_IDLE_CONNS=2
DB_CONN_MAX_IDLE_TIME=5m
3 changes: 2 additions & 1 deletion deploy/gcp/docker-compose.yml
@@ -12,11 +12,12 @@ services:
environment:
LISTEN_ADDR: ":8081"
DATABASE_URL: "postgres://${DB_IAM_USER_URLENCODED}@cloudsql-proxy:5432/${DB_NAME}?sslmode=disable"
PRTAGS_SCHEMA: "${PRTAGS_SCHEMA:-public}"
GHREPLICA_SCHEMA: "${GHREPLICA_SCHEMA:-public}"
DB_MAX_OPEN_CONNS: "${DB_MAX_OPEN_CONNS:-5}"
DB_MAX_IDLE_CONNS: "${DB_MAX_IDLE_CONNS:-2}"
DB_CONN_MAX_IDLE_TIME: "${DB_CONN_MAX_IDLE_TIME:-5m}"
DB_CONN_MAX_LIFETIME: "${DB_CONN_MAX_LIFETIME:-30m}"
GHREPLICA_BASE_URL: "${GHREPLICA_BASE_URL:-https://ghreplica.dutiful.dev}"
GITHUB_BASE_URL: "${GITHUB_BASE_URL:-https://api.github.com}"
GITHUB_APP_ID: "${GITHUB_APP_ID:-}"
GITHUB_APP_INSTALLATION_ID: "${GITHUB_APP_INSTALLATION_ID:-}"
11 changes: 6 additions & 5 deletions deploy/gcp/prtags.env.example
@@ -8,8 +8,12 @@
# bob-gcloud%40dutiful-20260414.iam
DB_IAM_USER_URLENCODED=

# Database name already provisioned for PRtags.
DB_NAME=prtags
# Shared database name. PRtags tables live under PRTAGS_SCHEMA.
DB_NAME=ghreplica
PRTAGS_SCHEMA=prtags

# Schema that holds the ghreplica mirror tables (currently public).
GHREPLICA_SCHEMA=public

# Keep these conservative on the shared Cloud SQL instance unless you have
# intentionally moved workers onto a separate process or database.
@@ -18,9 +22,6 @@ DB_MAX_IDLE_CONNS=2
DB_CONN_MAX_IDLE_TIME=5m
DB_CONN_MAX_LIFETIME=30m

# Base URL for the live ghreplica instance that PRtags reads from.
GHREPLICA_BASE_URL=https://ghreplica.dutiful.dev

# Leave blank to use api.github.com for comment sync.
GITHUB_BASE_URL=
