feat(data-warehouse): add multi-schema support for Postgres sources#58694
feat(data-warehouse): add multi-schema support for Postgres sources#58694RishiiGamer2201 wants to merge 4 commits into
Conversation
- Add include_all_schemas field to PostgresSourceConfig - Add SourceFieldSwitchConfig for toggle UI in frontend - Update backend to use all schemas when flag is enabled - Update validation to allow no schema when include_all_schemas is true Closes PostHog#58643
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new "switch" field type for source configs and uses it in the Postgres source to expose an "Include all schemas" toggle. When enabled, schema discovery scans all non-system schemas.
Changes:
- New
SourceFieldSwitchConfigschema/type and frontend renderer usingLemonSwitch. - Postgres source gains an
include_all_schemasboolean config that bypasses the single-schema filter during discovery. - Credential validation relaxed to allow missing
schemawheninclude_all_schemasis true.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| posthog/schema.py | Adds SourceFieldSwitchConfig pydantic model. |
| posthog/temporal/data_imports/sources/common/base.py | Includes switch config in source field type union. |
| posthog/temporal/data_imports/sources/generated_configs.py | Adds include_all_schemas to PostgresSourceConfig. |
| posthog/temporal/data_imports/sources/postgres/source.py | Wires up new switch field and applies it to schema discovery and validation. |
| products/.../forms/SourceForm.tsx | Renders the new switch field type via LemonField + LemonSwitch. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| > | ||
| {({ value, onChange }) => ( | ||
| <LemonSwitch | ||
| checked={value || lastValue?.[field.name] || false} |
| if not schema and not schema_name and not config.include_all_schemas: | ||
| return False, "Schema is required for warehouse imports unless 'Include all schemas' is enabled." |
| effective_schema = None if config.include_all_schemas else config.schema | ||
|
|
| ) | ||
| label: str | ||
| name: str | ||
| type: Literal["switch"] = "switch" |
| SourceFieldSwitchConfig( | ||
| name="include_all_schemas", | ||
| label="Include all schemas", | ||
| caption="Enable to discover and sync tables from all non-system schemas in the database", | ||
| ), |
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
posthog/temporal/data_imports/sources/generated_configs.py:631
Using raw `bool` as the converter is incorrect here. Every other boolean config field in this file uses `config.str_to_bool`. The problem is that `bool("false")` returns `True` — any non-empty string is truthy in Python — so if the config value arrives as the string `"false"`, the flag will be incorrectly treated as enabled and all schemas will always be fetched.
```suggestion
include_all_schemas: bool = config.value(converter=config.str_to_bool, default=False)
```
### Issue 2 of 3
products/data_warehouse/frontend/shared/components/forms/SourceForm.tsx:286-289
The `||` fallback is incorrect for a boolean switch. When the user explicitly turns the toggle **off** (`value === false`), the expression `value || lastValue?.[field.name]` is falsy and falls through to `lastValue`, which may be `true` — causing the UI to show the toggle as on even though the user just switched it off. Use `??` (nullish coalescing) so that only `undefined` / `null` trigger the fallback, not a legitimate `false`.
```suggestion
<LemonSwitch
checked={value ?? lastValue?.[field.name] ?? false}
onChange={(checked) => onChange(checked)}
/>
```
### Issue 3 of 3
posthog/schema.py:4624-4633
`posthog/schema.py` is autogenerated by `pnpm run schema:build` and must not be modified by hand. Changes made directly here will be overwritten the next time the schema is regenerated. The `SourceFieldSwitchConfig` class should instead be defined in the source schema (e.g. `frontend/src/queries/schema/`) and regenerated via the build command.
Reviews (1): Last reviewed commit: "feat(data-warehouse): add multi-schema s..." | Re-trigger Greptile |
| class SourceFieldSwitchConfig(BaseModel): | ||
| model_config = ConfigDict( | ||
| extra="forbid", | ||
| ) | ||
| label: str | ||
| name: str | ||
| type: Literal["switch"] = "switch" | ||
| caption: str | None = None | ||
|
|
||
|
|
There was a problem hiding this comment.
posthog/schema.py is autogenerated by pnpm run schema:build and must not be modified by hand. Changes made directly here will be overwritten the next time the schema is regenerated. The SourceFieldSwitchConfig class should instead be defined in the source schema (e.g. frontend/src/queries/schema/) and regenerated via the build command.
Rule Used: Do not manually modify the posthog/schema.py fil... (source)
Learned From
PostHog/posthog#32620
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/schema.py
Line: 4624-4633
Comment:
`posthog/schema.py` is autogenerated by `pnpm run schema:build` and must not be modified by hand. Changes made directly here will be overwritten the next time the schema is regenerated. The `SourceFieldSwitchConfig` class should instead be defined in the source schema (e.g. `frontend/src/queries/schema/`) and regenerated via the build command.
**Rule Used:** Do not manually modify the `posthog/schema.py` fil... ([source](https://app.greptile.com/review/custom-context?memory=192de143-59d7-412f-bd71-fe5cb8f161dd))
**Learned From**
[PostHog/posthog#32620](https://github.com/PostHog/posthog/pull/32620)
How can I resolve this? If you propose a fix, please make it concise.- Use config.str_to_bool instead of raw bool converter - Use nullish coalescing (??) instead of || for boolean fallback - Add SourceFieldSwitchConfig to TypeScript schema
|
- Use source_schema when include_all_schemas is enabled in source_for_pipeline - Remove accidentally committed draft files (pr-body.md, issue-comment.md)
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
posthog/temporal/data_imports/sources/postgres/source.py:519
**`schema=None` passed to `postgres_source()` when metadata is incomplete**
When `include_all_schemas=True` but `source_schema` is `None` (e.g., an existing `ExternalDataSchema` that was created before this feature was enabled, on a source where a user later toggles the flag on), `schema=source_schema` evaluates to `None`. `postgres_source()` is typed `schema: str` and passes the value directly into `_get_table(cursor, schema, ...)`, which builds SQL via `sql.Identifier(schema)` — passing `None` there will raise a runtime error.
Unlike the `include_all_schemas=False` branch, there is no `or "public"` fallback, so tables with missing `source_schema` metadata silently crash instead of degrading gracefully. Consider `(source_schema or "public") if config.include_all_schemas else (config.schema or source_schema or "public")` as a minimal guard.
Reviews (3): Last reviewed commit: "fix: address remaining bot review commen..." | Re-trigger Greptile |
When include_all_schemas=True but source_schema is None (legacy metadata), fall back to 'public' to avoid passing None to postgres_source().
|
Reviews (4): Last reviewed commit: "fix: add fallback to public when source_..." | Re-trigger Greptile |
| label={field.label} | ||
| help={field.caption ? <LemonMarkdown className="text-xs">{field.caption}</LemonMarkdown> : undefined} | ||
| > | ||
| {({ value, onChange }) => ( | ||
| <LemonSwitch | ||
| checked={value ?? lastValue?.[field.name] ?? false} |
| # When include_all_schemas is enabled, use source_schema from metadata (per-table). | ||
| # Fall back to "public" if source_schema is missing (legacy metadata). | ||
| # Otherwise config.schema wins so warehouse-mode renames flow through without | ||
| # rewriting schema_metadata. Falls back to source_schema for direct mode. | ||
| schema=(source_schema or "public") if config.include_all_schemas else (config.schema or source_schema or "public"), |
| @@ -415,8 +423,8 @@ def validate_credentials_for_access_method( | |||
| ) -> tuple[bool, str | None]: | |||
| if access_method != "direct": | |||
| schema = config.schema.strip() if isinstance(config.schema, str) else "" | |||
| port: int = config.value(converter=int) | ||
| connection_string: str | None = None | ||
| schema: str | None = None | ||
| include_all_schemas: bool = config.value(converter=config.str_to_bool, default=False) |
| SourceFieldSwitchConfig( | ||
| name="include_all_schemas", | ||
| label="Include all schemas", | ||
| caption="Enable to discover and sync tables from all non-system schemas in the database", | ||
| ), |
Summary
Adds support for discovering and syncing tables from multiple schemas in PostgreSQL databases.
Changes
Backend (
posthog/temporal/data_imports/sources/):include_all_schemas: boolfield toPostgresSourceConfigSourceFieldSwitchConfigschema type for toggle UIget_schemas()to fetch all non-system schemas when flag is enabledFrontend (
products/data_warehouse/frontend/):SourceForm.tsxto render the toggleHow it works
When users enable the "Include all schemas" toggle in the PostgreSQL source configuration, the backend passes
Noneto the schema parameter, which fetches all non-system schemas (existing behavior). When disabled, it works as before with a single schema.Testing
To test:
Closes #58643