feat(embed): add query runner #39517

oliverb123 · 2025-10-11T01:36:14Z

First step toward accessing the general embeddings table from HogQL queries, which unlocks fuzzy search and similarity work across products. There are still details to be worked out:

- Exposing product, document_type, document_id, timestamp and distance for user-facing/dynamic use in WHERE and ORDER BY clauses. I think this means a new category of taxonomic property filter, and I simply didn't want to take on that headache right now, so there's a hacky half-solution in place for now. Callers from inside the Django app can of course add whatever filters they like directly to the AST prior to joining/calculating.
- Add a k8s service to the embedding worker deployment, so talking to it works in prod.
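
For anyone trying to picture what filtering on product and ordering by distance amounts to, here's a minimal pure-Python sketch. It is not the runner's actual query: the row shape, the function names (`cosine_distance`, `nearest_documents`), the sample data, and the choice of cosine distance are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_documents(query_embedding, rows, product=None, limit=10):
    """Hypothetical helper: filter rows by product, then order by distance."""
    # Rough equivalent of a WHERE clause on product.
    candidates = [r for r in rows if product is None or r["product"] == product]
    # Attach a distance to each candidate row.
    scored = [
        {**r, "distance": cosine_distance(query_embedding, r["embedding"])}
        for r in candidates
    ]
    # Rough equivalent of ORDER BY distance ASC LIMIT n.
    return sorted(scored, key=lambda r: r["distance"])[:limit]


# Made-up rows; the column names mirror the ones listed above.
ROWS = [
    {"product": "error_tracking", "document_type": "issue", "document_id": "a", "embedding": [1.0, 0.0]},
    {"product": "error_tracking", "document_type": "issue", "document_id": "b", "embedding": [0.0, 1.0]},
    {"product": "session_replay", "document_type": "summary", "document_id": "c", "embedding": [1.0, 1.0]},
]

closest = nearest_documents([1.0, 0.0], ROWS, limit=2)
print([r["document_id"] for r in closest])
```

In the real thing the filtering and ordering would happen in the database rather than in Python; the sketch is only meant to show the shape of the result a caller gets back.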

Also it lets users use us as a vector DB if we want (we'd need to give them a way to write to it tho):

ok maybe we're cooking

github-actions · 2025-10-11T01:40:23Z

Size Change: +53 B (0%)

Total Size: 3.05 MB

| Filename | Size | Change |
| --- | --- | --- |
| `frontend/dist/toolbar.js` | 3.05 MB | +53 B (0%) |

compressed-size-action

greptile-apps

9 files reviewed, 2 comments

posthog/hogql_queries/document_embeddings_query_runner.py

daibhin

Left a couple of comments. I'll review the query runner properly on Monday. Might be worth getting the DW folks to look at the HogQL changes

frontend/src/queries/schema/schema-general.ts

rust/cymbal/src/issue_resolution.rs

posthog/hogql/functions/posthog.py

rust/embedding-worker/src/main.rs

posthog/hogql/printer.py

Gilbert09

Just update the table name - the rest of the hogql looks good!

posthog/hogql/database/database.py

posthog/hogql/printer.py

daibhin

Left a few comments, mostly around explaining how the query works given its relative complexity. The API is pretty intuitive but the query itself is harder to understand. Maybe tests would be the best description.

posthog/hogql_queries/document_embeddings_query_runner.py

posthog/hogql/database/database.py

posthog/hogql/database/schema/document_embeddings.py

posthog/hogql/functions/embed_text.py

oliverb123 requested review from a team, ablaszkiewicz, daibhin, hpouillot, ioannisj and marandaneto and removed request for ablaszkiewicz, daibhin, hpouillot, ioannisj and marandaneto October 11, 2025 01:36

oliverb123 requested a review from a team as a code owner October 11, 2025 01:36

posthog-bot requested a review from a team October 11, 2025 01:37

greptile-apps bot reviewed Oct 11, 2025

posthog/hogql_queries/document_embeddings_query_runner.py (outdated, resolved)

posthog/hogql_queries/document_embeddings_query_runner.py (outdated, resolved)

daibhin reviewed Oct 11, 2025

oliverb123 commented Oct 11, 2025

posthog/hogql/printer.py (resolved)

Gilbert09 reviewed Oct 11, 2025

posthog/hogql/database/database.py (outdated, resolved)

posthog/hogql/printer.py (resolved)

daibhin reviewed Oct 13, 2025

daibhin approved these changes Oct 13, 2025

Gilbert09 reviewed Oct 13, 2025

posthog/hogql/database/database.py (outdated, resolved)

Gilbert09 reviewed Oct 13, 2025

posthog/hogql/database/schema/document_embeddings.py (resolved)

Gilbert09 reviewed Oct 13, 2025

posthog/hogql/functions/embed_text.py (outdated, resolved)

Gilbert09 approved these changes Oct 13, 2025

oliverb123 and others added 6 commits October 15, 2025 16:04

first pass

7f6cc30

fix worker, fix runner

6db15ad

add ad-hoc endpoint to runner

35bc78a

embed_text() kinda works

a60450c

satisfy clippy

57a99c0

Update query snapshots

96d5677

oliverb123 added 8 commits October 15, 2025 16:05

fixes

b57fd3f

may as well autostart embedding worker

e06b1d7

fixes

a777185

appease linter

93f366d

appease linter

083284f

linter

8505e27

filter the universe

a14346d

drop posthog

44baa96

oliverb123 force-pushed the embed/add-query-runner branch from 9ab5751 to 44baa96 Compare October 15, 2025 13:06

fix embed_text arguments

2fbd888

Gilbert09 approved these changes Oct 15, 2025

oliverb123 and others added 5 commits October 16, 2025 01:50

tests and fixes

0d3451d

Update query snapshots

017ce45

appease linter

bf6c7ab

Merge branch 'master' into embed/add-query-runner

73233be

look idk man i said team = Team and mypy said 'no it don't'

f9ad874

oliverb123 merged commit 9471397 into master Oct 16, 2025
254 of 260 checks passed

oliverb123 deleted the embed/add-query-runner branch October 16, 2025 10:20

oliverb123 restored the embed/add-query-runner branch October 16, 2025 11:11

3 participants
