Summary
When virtual_references defines multiple polymorphic references on the same table (e.g., comments.commentable_id → posts and videos with polymorphic_exprs), the generated subset query silently drops nearly all rows. Only rows where the FK column is NULL survive.
Configuration
dump:
virtual_references:
- schema: "public"
name: "comments"
references:
- schema: "public"
name: "posts"
polymorphic_exprs:
- "public.comments.commentable_type = 'post'"
columns:
- name: "commentable_id"
- schema: "public"
name: "videos"
polymorphic_exprs:
- "public.comments.commentable_type = 'video'"
columns:
- name: "commentable_id"
transformation:
- schema: "public"
name: "posts"
subset_conds:
- "public.posts.id in (1)"
Expected Behavior
SELECT "public"."comments".* FROM "public"."comments"
LEFT JOIN "public"."posts"
ON "public"."comments"."commentable_id" = "public"."posts"."id"
AND public.comments.commentable_type = 'post'
LEFT JOIN "public"."videos"
ON "public"."comments"."commentable_id" = "public"."videos"."id"
AND public.comments.commentable_type = 'video'
WHERE
(("public"."comments"."commentable_id" IS NULL
OR "public"."posts"."id" IS NOT NULL
OR NOT (public.comments.commentable_type = 'post')))
AND
(("public"."comments"."commentable_id" IS NULL
OR "public"."videos"."id" IS NOT NULL
OR NOT (public.comments.commentable_type = 'video')))
The OR NOT(type = 'post') bypass ensures a type = 'video' row passes through the post-guard (and vice versa). AND-combining these guards is correct.
Actual Behavior
In certain traversal paths through the subset graph, the generated scope predicates lack the OR NOT bypass guard:
WHERE
(("public"."comments"."commentable_id" IS NULL)
OR ("public"."comments"."commentable_type" = 'post'
AND "public"."comments"."commentable_id" IN (SELECT ...)))
AND
(("public"."comments"."commentable_id" IS NULL)
OR ("public"."comments"."commentable_type" = 'video'
AND "public"."comments"."commentable_id" IN (SELECT ...)))
Table for non-NULL commentable_id:
commentable_type |
Post guard |
Video guard |
AND |
post |
type='post' matches |
type='video' fails |
❌ |
video |
type='post' fails |
type='video' matches |
❌ |
Result: Every non-NULL FK row is filtered out. Only NULL FK rows (if any) survive.
Root Cause
generateIntegrityChecksForNullableEdges() in https://github.com/greenmaskio/greenmask/tree/main/internal/db/postgres/subset/graph.go#L0-L32 correctly generates the OR NOT(polymorphic_expr) bypass:
// graph.go ~line 995
polymorphicExpr = fmt.Sprintf(" OR NOT (%s)", strings.Join(e.from.polymorphicExprs, " AND "))
k = fmt.Sprintf(`(%s IS NULL OR %s IS NOT NULL%s)`, leftFK, rightPK, polymorphicExpr)
However, the scope-edge predicate wrapping in generateQueryForTables() (lines 591–600) does not include this bypass:
query = fmt.Sprintf("((%s) IN (%s))", strings.Join(leftTableConds, ", "), query)
if len(exprs) > 0 {
query = fmt.Sprintf(
"((%s) AND (%s) IN (%s))",
strings.Join(leftTableConds, ", "), // FK columns
strings.Join(exprs, "AND"), // polymorphic_exprs — no OR NOT bypass
query,
)
}
When generateQueriesDfs() AND-combines scopes ("%s AND %s", currentScopeQuery, strings.Join(subQueries, " AND ")), the missing bypass causes mutually exclusive polymorphic_expr conditions to reject every row.
Possible Fix
Either:
- Add
OR NOT(polymorphic_expr) to each scope predicate (matching the pattern in generateIntegrityChecksForNullableEdges), or
- Group polymorphic refs sharing the same FK column into a single OR-union within that scope.
Workaround
Replace multi-polymorphic virtual references with a single, manually composed subset_conds string that OR-combines the type conditions:
transformation:
- schema: "public"
name: "comments"
subset_conds:
- >-
(commentable_type = 'post' AND commentable_id IN (SELECT id FROM posts WHERE ...))
OR
(commentable_type = 'video' AND commentable_id IN (SELECT id FROM videos WHERE ...))
This bypasses the scope graph system entirely.
Environment
- greenmask v0.2.15 (
ea5784e)
- PostgreSQL 17
Summary
When
virtual_referencesdefines multiple polymorphic references on the same table (e.g.,comments.commentable_id→postsandvideoswithpolymorphic_exprs), the generated subset query silently drops nearly all rows. Only rows where the FK column isNULLsurvive.Configuration
Expected Behavior
The
OR NOT(type = 'post')bypass ensures atype = 'video'row passes through the post-guard (and vice versa). AND-combining these guards is correct.Actual Behavior
In certain traversal paths through the subset graph, the generated scope predicates lack the
OR NOTbypass guard:Table for non-NULL
commentable_id:commentable_typeposttype='post'matchestype='video'failsvideotype='post'failstype='video'matchesResult: Every non-NULL FK row is filtered out. Only NULL FK rows (if any) survive.
Root Cause
generateIntegrityChecksForNullableEdges()in https://github.com/greenmaskio/greenmask/tree/main/internal/db/postgres/subset/graph.go#L0-L32 correctly generates theOR NOT(polymorphic_expr)bypass:However, the scope-edge predicate wrapping in
generateQueryForTables()(lines 591–600) does not include this bypass:When
generateQueriesDfs()AND-combines scopes("%s AND %s", currentScopeQuery, strings.Join(subQueries, " AND ")), the missing bypass causes mutually exclusivepolymorphic_exprconditions to reject every row.Possible Fix
Either:
OR NOT(polymorphic_expr)to each scope predicate (matching the pattern ingenerateIntegrityChecksForNullableEdges), orWorkaround
Replace multi-polymorphic virtual references with a single, manually composed
subset_condsstring that OR-combines the type conditions:This bypasses the scope graph system entirely.
Environment
ea5784e)