feat(replay-vision): API validation + lens_result row column#58685
feat(replay-vision): API validation + lens_result row column#58685TueHaulund wants to merge 1 commit into
Conversation
🔍 Migration Risk AnalysisWe've analyzed your migrations for potential risks. Summary: 0 Safe | 1 Needs Review | 1 Blocked ❌ BlockedCauses locks or breaks compatibility
|
Migration SQL ChangesHey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:
|
|
Size Change: -5.16 kB (0%) Total Size: 117 MB 📦 View Changed
ℹ️ View Unchanged
|
79b4022 to
ea712bf
Compare
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
products/replay_vision/backend/temporal/workflow.py:153-156
**`signals_count` is always 0 — field never populated**
`LensResult(model_output=call_output.model_output)` omits `signals_count`, so it always persists as `0` in the database. The field is documented as "Number of PostHog Signals emitted from this observation", but the workflow has no code that counts or passes a non-zero value. Per simplicity rule 4 ("no superfluous parts"), either the counting logic should be wired in here, or the field (and its `LensResultSerializer` counterpart) should be deferred to a future PR to avoid carrying dead state on every row.
### Issue 2 of 2
products/replay_vision/backend/temporal/activities/call_lens_provider.py:39-45
`_strip_lens_type` mutates the dict it receives rather than working on a copy. `model_json_schema()` returns a fresh dict today, but the mutation is invisible at the call site and would silently corrupt a cached schema if the call convention ever changed. Using `dict(schema)` makes the intent explicit and eliminates the hazard.
```suggestion
def _strip_lens_type(schema: dict[str, Any]) -> dict[str, Any]:
"""Hide the `lens_type` discriminator from the LLM — set server-side via the output class default."""
schema = dict(schema)
if "properties" in schema:
schema["properties"] = {k: v for k, v in schema["properties"].items() if k != "lens_type"}
if "required" in schema:
schema["required"] = [r for r in schema["required"] if r != "lens_type"]
return schema
```
Reviews (1): Last reviewed commit: "feat(replay-vision): API validation + le..." | Re-trigger Greptile |
|
Reviews (2): Last reviewed commit: "chore(replay-vision): remove remaining d..." | Re-trigger Greptile |
4f86052 to
d02bc21
Compare
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
products/replay_vision/backend/api/lenses.py:178-183
`to_representation` discards the validated `RecordingsQuery` object and then re-calls `.model_validate` only to throw the result away — but the bigger issue is that the check `if data.get("query")` skips the defensive re-validation entirely when the stored query is an empty dict `{}` (falsy). A stored `{}` is a valid query, but if the schema later gains a required field, that row silently skips the guard and is served raw. Using `is not None` avoids the inconsistency.
```suggestion
if data.get("query") is not None:
try:
RecordingsQuery.model_validate(data["query"])
except PydanticValidationError:
logger.exception("replay_vision.lens.malformed_query", lens_id=str(instance.id))
data["query"] = None
```
Reviews (3): Last reviewed commit: "feat(replay-vision): API validation + le..." | Re-trigger Greptile |
Validation:
- `lens_config` is validated against the per-`lens_type` Pydantic discriminated
union (`AnyLens`) in the serializer — a classifier without `tags`, a scorer
with an inverted scale, or a non-dict config now 400s on save instead of
failing every observation at runtime.
- `query` is validated against `posthog.schema.RecordingsQuery` (strict,
`extra="forbid"`) and typed for OpenAPI via `@extend_schema_field`.
`date_from`/`date_to` are stripped on save — the schedule controls time,
not the user. The persisted blob is re-validated on read via the lens
serializer's `to_representation` (log + null on failure).
- `lens_snapshot` and `lens_result` are also validated on read via Pydantic
at the API serializer boundary (`SerializerMethodField` + `model_validate`),
with graceful degradation (log + null) on failure.
`lens_result` JSONField on `ReplayObservation`:
- Populated by `mark_observation_succeeded_activity`. Typed via a new
`LensResult { model_output: AnyLensOutput, signals_count: int }` Pydantic
model. `AnyLensOutput` is a discriminated union over the five output
classes; each output class gets `lens_type: Literal[LensType.X]` as the
discriminator. The literal constraint forces the LLM's structured output
to emit the correct value.
- Adds `BaseLensOutput.to_event_properties()` for the event flattening
(excludes `lens_type`, which is already a top-level event property via
the snapshot).
- API exposes `lens_result` as null until succeeded.
Cleanup: removes the dead `backend/presentation/` bootstrap scaffold + the
`backend/logic/` placeholder; the product is not isolated and doesn't use
the facade pattern.
d02bc21 to
9a003c8
Compare
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
products/replay_vision/backend/api/lenses.py:169-174
The validated `RecordingsQuery` object is discarded after validation. The stored value is the raw user-supplied dict, which means Pydantic coercions (type normalization, alias resolution, etc.) are never persisted. This is consistent with the explicit intent — "persist exactly what the user sent" — but it also means the read-path re-validation in `to_representation` can encounter values that differ from what a fresh `model_dump` would produce. If you ever add coercing validators to `RecordingsQuery`, the write path and read path will silently diverge. Consider storing the normalized form (`validated.model_dump(mode="json")`) instead, which would make write and read paths equivalent.
```suggestion
try:
validated_query = RecordingsQuery.model_validate(attrs["query"])
except PydanticValidationError as exc:
raise serializers.ValidationError({"query": str(exc)})
# Persist the Pydantic-normalized form, minus the date keys the schedule controls.
attrs["query"] = {
k: v
for k, v in validated_query.model_dump(mode="json").items()
if k not in _QUERY_FIELDS_TO_STRIP
}
```
Reviews (4): Last reviewed commit: "feat(replay-vision): API validation + le..." | Re-trigger Greptile |
Problem
Two API gaps surfaced as workflow-runtime errors instead of 400s, and FE rendering required a CH query for output that should live on the row.
Changes
Validation:
lens_configis validated against the per-lens_typePydantic discriminated union (AnyLens) — a classifier withouttags, a scorer with an inverted scale, or a non-dict config now 400s on save.queryis validated againstRecordingsQuery(strict,extra="forbid") and typed for OpenAPI via@extend_schema_field;date_from/date_toare stripped on save. All three persisted blobs (query,lens_snapshot,lens_result) are re-validated on read at the API serializer boundary — graceful degradation (log + null) on validation failure.lens_resultJSONField onReplayObservation: populated bymark_observation_succeeded_activity. Typed via a newLensResult { model_output: AnyLensOutput, signals_count: int }Pydantic model.AnyLensOutputis a discriminated union over the five output classes; each output class declareslens_type: Literal[LensType.X] = LensType.Xas the discriminator. Gemini's structured-output mode sees the literal constraint and is forced to emit the correct value; Pydantic validation catches any deviation. AddsBaseLensOutput.to_event_properties()helper.Cleanup: removes the dead
backend/presentation/andbackend/logic/bootstrap scaffolds (product is not isolated, no facade pattern).How did you test this code?
Agent-authored. Automated tests only.
160 tests pass under
hogli test products/replay_vision/backend/tests/. Coverage includes valid/invalidlens_configper lens type, lens-type PATCH validation,queryvalidation +date_from/date_tostripping,lens_resultround-trip per type,to_event_propertiesflattening +lens_typeexclusion.Publish to changelog?
no
🤖 Agent context
Authored by Claude (Opus 4.7) via Claude Code. Stacked on
tue/replay-vision-call-provider(PR #58607).