Skip to content

Bug: Subset - polymorphic scope predicates missing OR NOT bypass guard - multi-type refs produce ~0 rows #396

@herrbuerger

Description

@herrbuerger

Summary

When virtual_references defines multiple polymorphic references on the same table (e.g., comments.commentable_idposts and videos with polymorphic_exprs), the generated subset query silently drops nearly all rows. Only rows where the FK column is NULL survive.

Configuration

dump:
  virtual_references:
    - schema: "public"
      name: "comments"
      references:
        - schema: "public"
          name: "posts"
          polymorphic_exprs:
            - "public.comments.commentable_type = 'post'"
          columns:
            - name: "commentable_id"
        - schema: "public"
          name: "videos"
          polymorphic_exprs:
            - "public.comments.commentable_type = 'video'"
          columns:
            - name: "commentable_id"

  transformation:
    - schema: "public"
      name: "posts"
      subset_conds:
        - "public.posts.id in (1)"

Expected Behavior

SELECT "public"."comments".* FROM "public"."comments"
  LEFT JOIN "public"."posts"
    ON "public"."comments"."commentable_id" = "public"."posts"."id"
    AND public.comments.commentable_type = 'post'
  LEFT JOIN "public"."videos"
    ON "public"."comments"."commentable_id" = "public"."videos"."id"
    AND public.comments.commentable_type = 'video'
WHERE
  (("public"."comments"."commentable_id" IS NULL
    OR "public"."posts"."id" IS NOT NULL
    OR NOT (public.comments.commentable_type = 'post')))
  AND
  (("public"."comments"."commentable_id" IS NULL
    OR "public"."videos"."id" IS NOT NULL
    OR NOT (public.comments.commentable_type = 'video')))

The OR NOT(type = 'post') bypass ensures a type = 'video' row passes through the post-guard (and vice versa). AND-combining these guards is correct.

Actual Behavior

In certain traversal paths through the subset graph, the generated scope predicates lack the OR NOT bypass guard:

WHERE
  (("public"."comments"."commentable_id" IS NULL)
    OR ("public"."comments"."commentable_type" = 'post'
        AND "public"."comments"."commentable_id" IN (SELECT ...)))
  AND
  (("public"."comments"."commentable_id" IS NULL)
    OR ("public"."comments"."commentable_type" = 'video'
        AND "public"."comments"."commentable_id" IN (SELECT ...)))

Table for non-NULL commentable_id:

commentable_type Post guard Video guard AND
post type='post' matches type='video' fails
video type='post' fails type='video' matches

Result: Every non-NULL FK row is filtered out. Only NULL FK rows (if any) survive.

Root Cause

generateIntegrityChecksForNullableEdges() in https://github.com/greenmaskio/greenmask/tree/main/internal/db/postgres/subset/graph.go#L0-L32 correctly generates the OR NOT(polymorphic_expr) bypass:

// graph.go ~line 995
polymorphicExpr = fmt.Sprintf(" OR NOT (%s)", strings.Join(e.from.polymorphicExprs, " AND "))
k = fmt.Sprintf(`(%s IS NULL OR %s IS NOT NULL%s)`, leftFK, rightPK, polymorphicExpr)

However, the scope-edge predicate wrapping in generateQueryForTables() (lines 591–600) does not include this bypass:

query = fmt.Sprintf("((%s) IN (%s))", strings.Join(leftTableConds, ", "), query)
if len(exprs) > 0 {
    query = fmt.Sprintf(
        "((%s) AND (%s) IN (%s))",
        strings.Join(leftTableConds, ", "), // FK columns
        strings.Join(exprs, "AND"),         // polymorphic_exprs — no OR NOT bypass
        query,
    )
}

When generateQueriesDfs() AND-combines scopes ("%s AND %s", currentScopeQuery, strings.Join(subQueries, " AND ")), the missing bypass causes mutually exclusive polymorphic_expr conditions to reject every row.

Possible Fix

Either:

  1. Add OR NOT(polymorphic_expr) to each scope predicate (matching the pattern in generateIntegrityChecksForNullableEdges), or
  2. Group polymorphic refs sharing the same FK column into a single OR-union within that scope.

Workaround

Replace multi-polymorphic virtual references with a single, manually composed subset_conds string that OR-combines the type conditions:

transformation:
  - schema: "public"
    name: "comments"
    subset_conds:
      - >-
        (commentable_type = 'post' AND commentable_id IN (SELECT id FROM posts WHERE ...))
        OR
        (commentable_type = 'video' AND commentable_id IN (SELECT id FROM videos WHERE ...))

This bypasses the scope graph system entirely.

Environment

  • greenmask v0.2.15 (ea5784e)
  • PostgreSQL 17

Metadata

Metadata

Assignees

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions