Conversation storage: schema extension and per-request scoping #83

Description

@turisanapo

Problem

The ConversationStorage interface currently has a fixed schema (id, created_at, metadata) and no awareness of per-request context. Consumers that need tenant isolation (e.g., scoping conversations to an organization_id) are forced into workarounds:

  • Stuffing system fields into metadata — pollutes the end-user-facing metadata with internal concerns (organization_id is not consumer data)
  • Wrapping storage with AsyncLocalStorage — adds complexity and a non-standard pattern
  • Hacking requests in onRequest — reconstructing Request objects to inject/strip query params and body fields

The core issue is twofold:

  1. No schema extension — consumers can't add custom columns to conversation tables
  2. No per-request context in storage — the storage is a singleton with no access to the request's state bag

Use Case

A multi-tenant platform where each organization has its own conversations stored in the same GreptimeDB instance. Every write must include organization_id, every read must filter by it. This isolation must be enforced at the storage boundary — not at the HTTP level — so it's impossible to forget.

Future consumers may need additional scoping fields (e.g., project_id, environment), so the mechanism should be general-purpose rather than hardcoded to a single field.


Proposed Approaches

Four approaches were evaluated. All share the same additionalFields mechanism for schema extension (DDL) but differ in how runtime behavior is handled.

additionalFields (shared across all approaches)

Following better-auth's pattern, additionalFields declares the schema — what the field is, not how it's stored. Database-level concerns (primary keys, indexes, partitioning) are handled internally by each dialect's migration logic.

additionalFields: {
  conversations: {
    organization_id: {
      type: "string",
      required: true,
    },
  },
}

On migrate(), the dialect adds the column to the DDL. For GreptimeDB, the dialect can automatically include required additional fields in the primary key to optimise partition/search performance — the consumer doesn't need to specify this.
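
As a rough illustration of the migrate() step, the sketch below derives column definitions and a primary key from an additionalFields declaration. The function name buildConversationDdl, the type mapping, the base column types, and the generated SQL text are all assumptions made for illustration and have not been checked against GreptimeDB; the proposal does not specify what the dialect actually emits.

```typescript
// Hypothetical sketch: none of these names come from the gateway's real API.
type FieldType = "string" | "number" | "boolean";

interface AdditionalField {
  type: FieldType;
  required?: boolean;
}

type AdditionalFields = Record<string, AdditionalField>;

// Assumed mapping from declared field types to SQL column types.
const SQL_TYPES: Record<FieldType, string> = {
  string: "STRING",
  number: "DOUBLE",
  boolean: "BOOLEAN",
};

function buildConversationDdl(fields: AdditionalFields): string {
  // Base schema from the issue: id, created_at, metadata (column types are assumed).
  const columns = [
    "id STRING",
    "created_at TIMESTAMP TIME INDEX",
    "metadata JSON",
  ];
  const primaryKey = ["id"];

  for (const [fieldName, field] of Object.entries(fields)) {
    const nullability = field.required ? " NOT NULL" : "";
    columns.push(`${fieldName} ${SQL_TYPES[field.type]}${nullability}`);
    // Required fields are promoted to the front of the primary key.
    if (field.required) primaryKey.unshift(fieldName);
  }

  return [
    "CREATE TABLE IF NOT EXISTS conversations (",
    ...columns.map((column) => `  ${column},`),
    `  PRIMARY KEY (${primaryKey.join(", ")})`,
    ")",
  ].join("\n");
}

const conversationDdl = buildConversationDdl({
  organization_id: { type: "string", required: true },
});
```

The consumer only declares the field; whether a required field is promoted into the key stays a dialect decision, which is the separation the better-auth pattern aims for.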


Approach 1: Declarative Fields with resolve

Extend additionalFields with a resolve function that maps state to a value. The gateway auto-injects on writes and auto-filters on reads.

gateway({
  storage: {
    dialect: GrepTimeDialect(client),
    additionalFields: {
      conversations: {
        organization_id: {
          type: "string",
          required: true,
          resolve: (state) => state.organizationId as string,
        },
      },
    },
  },
});

The gateway handles everything internally:

  • DDL: Adds the column on migrate()
  • Writes (create, update): Calls resolve(state) and injects the value
  • Reads and deletes (list, get, delete): Calls resolve(state) and adds a WHERE clause
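
A minimal sketch of what the gateway might do internally with resolve, assuming plain-object data and where arguments. The helper names (resolveFields, injectOnWrite, filterOnRead) and the fail-closed behavior on a missing required value are assumptions, not part of the proposal.

```typescript
// Hypothetical sketch; RequestState, ResolvableField and the helpers are illustrative names.
type RequestState = Record<string, unknown>;
type Row = Record<string, unknown>;

interface ResolvableField {
  type: "string";
  required: boolean;
  resolve: (state: RequestState) => unknown;
}

type ResolvableFields = Record<string, ResolvableField>;

// Resolve every declared field against the current request state.
function resolveFields(fields: ResolvableFields, state: RequestState): Row {
  const resolved: Row = {};
  for (const [fieldName, field] of Object.entries(fields)) {
    const value = field.resolve(state);
    // Fail closed: a missing required value must never widen a query.
    if (field.required && (value === undefined || value === null)) {
      throw new Error(`Missing required storage field: ${fieldName}`);
    }
    resolved[fieldName] = value;
  }
  return resolved;
}

// Writes: resolved values overwrite anything the caller supplied.
function injectOnWrite(data: Row, fields: ResolvableFields, state: RequestState): Row {
  return { ...data, ...resolveFields(fields, state) };
}

// Reads and deletes: resolved values are merged into the WHERE clause.
function filterOnRead(where: Row, fields: ResolvableFields, state: RequestState): Row {
  return { ...where, ...resolveFields(fields, state) };
}

const declaredFields: ResolvableFields = {
  organization_id: {
    type: "string",
    required: true,
    resolve: (state) => state["organizationId"],
  },
};

const writtenRow = injectOnWrite({ id: "conv_1" }, declaredFields, { organizationId: "org_1" });
const readFilter = filterOnRead({ id: "conv_1" }, declaredFields, { organizationId: "org_1" });
```

Spreading the resolved values last means a caller-supplied organization_id cannot override the value taken from state.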

Pros:

  • Simplest consumer API — declare once, isolation is automatic everywhere
  • Impossible to forget a filter — enforced on every operation by the gateway
  • Schema and binding are co-located in one declaration
  • Zero boilerplate

Cons:

  • Rigid — every field follows the same "inject on write, filter on read" pattern
  • No conditional logic per operation (e.g., different behavior for create vs. update)
  • No post-processing or after-read transformation
  • The gateway takes on more responsibility internally

Approach 2: additionalFields + Storage Hooks (by phase)

Schema extension for DDL, with explicit named hooks split by read/write phase.

gateway({
  storage: {
    dialect: GrepTimeDialect(client),
    additionalFields: {
      conversations: {
        organization_id: { type: "string", required: true },
      },
    },
    hooks: {
      onBeforeWrite: ({ operation, resource, data, state }) => {
        return { ...data, organization_id: state.organizationId };
      },
      onBeforeRead: ({ operation, resource, query, state }) => {
        return { ...query, organization_id: state.organizationId };
      },
      onAfterRead: ({ operation, resource, result, state }) => {
        // strip internal fields, enrich, audit, etc.
        return result;
      },
    },
  },
});

Pros:

  • Clean separation — additionalFields handles DDL, hooks handle behavior
  • Flexible — conditional logic per operation/resource, post-processing via onAfterRead
  • Typed, named hooks are easy to reason about

Cons:

  • Consumer writes explicit inject/filter logic — more boilerplate than Approach 1
  • Not auto-enforced — consumer can forget to handle an operation/resource combination
  • Two hooks (onBeforeWrite + onBeforeRead) must stay in sync — source of drift
  • Single hook per slot (not composable like middleware)
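
One way a consumer could limit drift between the two hooks is to derive both from a single helper. The sketch below assumes the hook argument shapes shown in the example above; tenantScope and the Row/RequestState types are hypothetical names.

```typescript
// Hypothetical sketch; only the hook names come from the proposal.
type RequestState = Record<string, unknown>;
type Row = Record<string, unknown>;

// Single source of truth for the tenant scope used by both hooks.
function tenantScope(state: RequestState): Row {
  const organizationId = state["organizationId"];
  if (typeof organizationId !== "string") {
    throw new Error("organizationId is missing from request state");
  }
  return { organization_id: organizationId };
}

const phaseHooks = {
  // Inject the scope into every write payload.
  onBeforeWrite: ({ data, state }: { data: Row; state: RequestState }): Row => ({
    ...data,
    ...tenantScope(state),
  }),
  // Add the same scope to every read query.
  onBeforeRead: ({ query, state }: { query: Row; state: RequestState }): Row => ({
    ...query,
    ...tenantScope(state),
  }),
};
```

This keeps the field name and the state lookup in one place, though it still relies on the consumer wiring both hooks.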

Approach 3: additionalFields + Storage Middleware

Schema extension for DDL, with a generic middleware chain that wraps every storage operation.

gateway({
  storage: {
    dialect: GrepTimeDialect(client),
    additionalFields: {
      conversations: {
        organization_id: { type: "string", required: true },
      },
    },
    middleware: async ({ operation, resource, args, state, next }) => {
      const orgId = state.organizationId as string;

      if (resource === "conversation") {
        if (operation === "create" || operation === "update") {
          args.data = { ...args.data, organization_id: orgId };
        }
        if (operation === "list" || operation === "get" || operation === "delete") {
          args.where = { ...args.where, organization_id: orgId };
        }
      }

      return next();
    },
  },
});

Multiple middlewares can be composed as an array:

middleware: [tenantIsolation, auditLog, rateLimiter]
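
How such an array might be chained can be sketched as follows. This is a generic onion-style compose function, not the gateway's implementation; the StorageCall shape, the compose name, and the guard against calling next() twice are assumptions.

```typescript
// Hypothetical sketch of onion-style composition; types and names are assumptions.
interface StorageCall {
  operation: "create" | "update" | "delete" | "list" | "get";
  resource: "conversation" | "item";
  args: Record<string, unknown>;
  state: Record<string, unknown>;
}

type Next = () => Promise<unknown>;
type Middleware = (call: StorageCall & { next: Next }) => Promise<unknown>;

// Build one function that runs the middlewares in order, then the storage operation.
function compose(
  middlewares: Middleware[],
  terminal: (call: StorageCall) => Promise<unknown>,
) {
  return (call: StorageCall): Promise<unknown> => {
    const dispatch = (index: number): Promise<unknown> => {
      const current = middlewares[index];
      if (current === undefined) return terminal(call);
      let called = false;
      return current({
        ...call,
        next: () => {
          if (called) return Promise.reject(new Error("next() called more than once"));
          called = true;
          return dispatch(index + 1);
        },
      });
    };
    return dispatch(0);
  };
}

const trace: string[] = [];

const tenantIsolation: Middleware = async ({ args, state, next }) => {
  // args is shared by reference, so this change is visible to later layers.
  args["where"] = { organization_id: state["organizationId"] };
  trace.push("tenant:before");
  const result = await next();
  trace.push("tenant:after");
  return result;
};

const auditLog: Middleware = async ({ operation, next }) => {
  trace.push(`audit:${operation}`);
  return next();
};

const runStorageCall = compose([tenantIsolation, auditLog], async (call) => {
  trace.push("storage");
  return call.args;
});
```

Because each middleware awaits next(), code before the call runs on the way in and code after it runs on the way out, which is why array order matters.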

Pros:

  • Maximum flexibility — full control over every operation with before/after semantics
  • Composable — multiple middlewares chain via next() (onion model)
  • Single extension point for all behaviors

Cons:

  • Most complex to implement in the gateway
  • args is untyped per-operation — consumer must know each operation's arg shape
  • Most boilerplate for simple cases
  • Middleware ordering adds cognitive overhead
  • Not auto-enforced — same risk of forgetting a filter as Approach 2

Approach 4: additionalFields + Operation Hooks (Prisma-style) ⭐

Schema extension for DDL, with one hook per storage operation (create, update, delete, list, get). Each hook receives operation-specific args (data, id, params), a query function that calls the underlying storage operation (the name follows Prisma), and a nested context object carrying request-level data such as state. Because query returns the result, a hook can modify inputs before the call and transform outputs after it.

gateway({
  storage: {
    dialect: GrepTimeDialect(client),
    additionalFields: {
      conversations: {
        organization_id: { type: "string", required: true },
      },
    },
    hooks: {
      create: ({ data, context, query }) => {
        return query({ ...data, organization_id: context.state.organizationId });
      },
      update: ({ id, data, context, query }) => {
        return query(id, { ...data, organization_id: context.state.organizationId });
      },
      delete: ({ id, context, query }) => {
        return query(id, { where: { organization_id: context.state.organizationId } });
      },
      list: ({ params, context, query }) => {
        return query({
          ...params,
          where: { ...params.where, organization_id: context.state.organizationId },
        });
      },
      get: async ({ id, context, query }) => {
        const result = await query(id, {
          where: { organization_id: context.state.organizationId },
        });
        // can sanitize/strip internal fields before returning
        return result;
      },
    },
  },
});

Each hook receives:

  • resource: "conversation" or "item"
  • context: nested request-level context (request, state, etc.), kept separate from storage-level args (better-auth pattern)
  • query: executes the underlying storage operation; enables post-processing of results
  • Operation-specific args: data for writes, id for single-entity operations, params for listing
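
To make the role of query concrete, here is a minimal sketch of how the gateway might wrap a single operation (create) with its hook. The withCreateHook helper, the in-memory rows array, and the simplified signatures are assumptions for illustration only.

```typescript
// Hypothetical sketch; the storage shape and hook signature are simplified assumptions.
type Row = Record<string, unknown>;

interface HookContext {
  state: Record<string, unknown>;
}

type CreateQuery = (data: Row) => Promise<Row>;
type CreateHook = (input: {
  data: Row;
  context: HookContext;
  query: CreateQuery;
}) => Promise<Row>;

// Wrap the underlying create so the hook decides when and how query runs.
function withCreateHook(create: CreateQuery, hook: CreateHook | undefined) {
  return (data: Row, context: HookContext): Promise<Row> =>
    hook ? hook({ data, context, query: create }) : create(data);
}

// In-memory stand-in for the real storage operation.
const rows: Row[] = [];
const rawCreate: CreateQuery = async (data) => {
  rows.push(data);
  return data;
};

const createConversation = withCreateHook(rawCreate, async ({ data, context, query }) => {
  // Before: inject the tenant column into the write.
  const saved = await query({
    ...data,
    organization_id: context.state["organizationId"],
  });
  // After: hide the internal column from the caller.
  const { organization_id: _internal, ...visible } = saved;
  return visible;
});
```

The hook sees the input before the call and the result after it, so injection and stripping sit in the same function.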

Pros:

  • One hook per operation with typed args — create gets data, delete gets id, list gets params, get gets id
  • Each hook wraps the full operation — modify inputs and transform outputs in one place
  • No branching on operation type — each hook handles exactly one operation
  • Familiar pattern (Prisma $extends({ query }))

Cons:

  • Not auto-enforced — consumer can forget to implement the hook for an operation
  • Not composable (single hook per slot, not a chain)
  • Slightly more boilerplate than Approach 1 for simple cases

Comparison

| | 1: Declarative | 2: Phase Hooks | 3: Middleware | 4: Operation Hooks |
|---|---|---|---|---|
| Consumer effort | None | Low-Medium | Medium | Low |
| Safety (can't leak) | Highest | Medium | Medium | Medium |
| Flexibility | Low-Medium | High | Highest | High |
| Typed per-operation | N/A | No (branches) | No (branches) | Yes |
| Post-processing | No | Yes (onAfterRead) | Yes | Yes (after query()) |
| Before + after in one place | N/A | No (separate hooks) | Yes | Yes |
| Composability | N/A | No | Yes (chain) | No |
| Implementation effort | Medium | Medium | High | Medium |
| Closest analogy | Prisma auto-inject | better-auth hooks | Express middleware | Prisma $extends({ query }) |

Shared Prerequisite

All four approaches require the same foundational change: threading the per-request state from gw.handler(request, state) into the storage layer. Today, storage is a context-free singleton. The state bag already flows through hooks — it needs to also reach storage operations.
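
One possible shape for this change is a request-scoped view over the singleton storage, created once per call to gw.handler(request, state). The sketch below is an assumption about how that could look; BaseStorage, ScopedStorage, forRequest and the in-memory store are illustrative names, and the operations are synchronous only to keep the example short.

```typescript
// Hypothetical sketch; none of these interfaces exist in the gateway today.
type RequestState = Record<string, unknown>;
type Row = Record<string, unknown>;

// The singleton storage, extended so every operation accepts the request state.
interface BaseStorage {
  create(data: Row, state: RequestState): Row;
  list(where: Row, state: RequestState): Row[];
}

// What request-handling code sees: the state is already bound.
interface ScopedStorage {
  create(data: Row): Row;
  list(where: Row): Row[];
}

// Bind the per-request state once, so downstream code cannot omit it.
function forRequest(storage: BaseStorage, state: RequestState): ScopedStorage {
  return {
    create: (data) => storage.create(data, state),
    list: (where) => storage.list(where, state),
  };
}

// In-memory stand-in that scopes rows by organization_id.
const store: Row[] = [];
const baseStorage: BaseStorage = {
  create(data, state) {
    const row = { ...data, organization_id: state["organizationId"] };
    store.push(row);
    return row;
  },
  list(where, state) {
    const scope: Row = { ...where, organization_id: state["organizationId"] };
    return store.filter((row) =>
      Object.entries(scope).every(([key, value]) => row[key] === value),
    );
  },
};

const orgA = forRequest(baseStorage, { organizationId: "org_a" });
const orgB = forRequest(baseStorage, { organizationId: "org_b" });
orgA.create({ id: "conv_1" });
orgB.create({ id: "conv_2" });
```

Any of the four approaches could then run against the bound state without the singleton itself holding request data.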

Open Questions

  1. Should additionalFields also apply to conversation_items, or only conversations?
