[draft] feat: add embeddings field for hybrid search #78
Code Review
This pull request introduces support for dense-vector embeddings across the airesponses, articles, and replies schemas to facilitate hybrid search. The changes include updates to Zod schemas and Elasticsearch mappings, along with version increments. Feedback suggests extracting the duplicated embeddings schema into a shared utility and strengthening Zod validation—specifically enforcing a vector length of 768 and ensuring offsets are non-negative integers—to maintain consistency with the Elasticsearch mapping requirements.
  embeddings: z
    .array(
      z
        .object({
          vector: z.array(z.number()),
          startOffsetSec: z.number().optional(),
          endOffsetSec: z.number().optional(),
        })
        .strict()
    )
    .optional(),
The embeddings schema is duplicated across airesponses.ts, articles.ts, and replies.ts. To improve maintainability and ensure consistency, consider extracting this definition into a shared schema in util/sharedSchema.ts. Additionally, the validation should be strengthened to match the Elasticsearch mapping requirements:
1. Use .length(768) on the vector array to prevent indexing errors (the dense_vector mapping specifies dims: 768).
2. Use .int().nonnegative() for startOffsetSec and endOffsetSec to align with the integer type in Elasticsearch.
Suggested change
- embeddings: z
-   .array(
-     z
-       .object({
-         vector: z.array(z.number()),
-         startOffsetSec: z.number().optional(),
-         endOffsetSec: z.number().optional(),
-       })
-       .strict()
-   )
-   .optional(),
+ embeddings: z
+   .array(
+     z
+       .object({
+         vector: z.array(z.number()).length(768),
+         startOffsetSec: z.number().int().nonnegative().optional(),
+         endOffsetSec: z.number().int().nonnegative().optional(),
+       })
+       .strict()
+   )
+   .optional(),
  embeddings: z
    .array(
      z
        .object({
          vector: z.array(z.number()),
          startOffsetSec: z.number().optional(),
          endOffsetSec: z.number().optional(),
        })
        .strict()
    )
    .optional(),
This embeddings schema is identical to the one in airesponses.ts and replies.ts. It is recommended to extract it to a shared utility. Also, adding length validation for the vector (768) and integer validation for the offsets will ensure the data conforms to the Elasticsearch mapping.
Suggested change
- embeddings: z
-   .array(
-     z
-       .object({
-         vector: z.array(z.number()),
-         startOffsetSec: z.number().optional(),
-         endOffsetSec: z.number().optional(),
-       })
-       .strict()
-   )
-   .optional(),
+ embeddings: z
+   .array(
+     z
+       .object({
+         vector: z.array(z.number()).length(768),
+         startOffsetSec: z.number().int().nonnegative().optional(),
+         endOffsetSec: z.number().int().nonnegative().optional(),
+       })
+       .strict()
+   )
+   .optional(),
  embeddings: z
    .array(
      z
        .object({
          vector: z.array(z.number()),
          startOffsetSec: z.number().optional(),
          endOffsetSec: z.number().optional(),
        })
        .strict()
    )
    .optional(),
This embeddings schema is identical to the one in airesponses.ts and articles.ts. It is recommended to extract it to a shared utility. Also, adding length validation for the vector (768) and integer validation for the offsets will ensure the data conforms to the Elasticsearch mapping.
Suggested change
- embeddings: z
-   .array(
-     z
-       .object({
-         vector: z.array(z.number()),
-         startOffsetSec: z.number().optional(),
-         endOffsetSec: z.number().optional(),
-       })
-       .strict()
-   )
-   .optional(),
+ embeddings: z
+   .array(
+     z
+       .object({
+         vector: z.array(z.number()).length(768),
+         startOffsetSec: z.number().int().nonnegative().optional(),
+         endOffsetSec: z.number().int().nonnegative().optional(),
+       })
+       .strict()
+   )
+   .optional(),
No description provided.