Add new notebook for hugging face integration #301
Found 1 changed notebook. Review the changes at https://gitnotebooks.com/elastic/elasticsearch-labs/pull/301
# Conflicts: # notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb
8621441 to 54dc233
Commented on notebook notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb
Cell 24 Line 9
    index="hf-semantic-text-index",
    mappings={
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": "my_hf_endpoint_object",
            },
            "text_field": {"type": "text", "copy_to": "infer_field"},
Curious: why have the additional text_field that copies into infer_field?
That's required for semantic_text. The strings go into the text_field, and the embeddings get stored in the infer_field.
You can just send it directly to infer_field instead. semantic_text doesn't require a copy_to field.
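A minimal sketch of that suggestion, reusing the index name and inference endpoint from the notebook cell: the mapping keeps only the semantic_text field, and each document writes its string straight into infer_field. The helper names `build_mapping` and `build_document` are hypothetical; the returned dicts are what you would pass as `mappings=` to `es.indices.create` and `document=` to `es.index` in elasticsearch-py 8.x.

```python
# Hypothetical helpers: a semantic_text mapping with no copy_to source field.
INDEX_NAME = "hf-semantic-text-index"  # index name from the notebook cell
INFERENCE_ID = "my_hf_endpoint_object"  # inference endpoint from the notebook cell


def build_mapping(inference_id: str = INFERENCE_ID) -> dict:
    """Mapping with a single semantic_text field; no separate text_field."""
    return {
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": inference_id,
            }
        }
    }


def build_document(text: str) -> dict:
    """Document that sends the raw string directly to the semantic_text field."""
    return {"infer_field": text}


# With a live cluster (not run here), assuming `es` is an Elasticsearch client:
#   es.indices.create(index=INDEX_NAME, mappings=build_mapping())
#   es.index(index=INDEX_NAME, document=build_document("some passage"))
```

Dropping text_field means each string is stored once, under the field that is actually queried.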
Hmm, I guess so. But if we need access to both the strings and the embeddings, don't we need the text_field? From what I've seen, this copy_to field is used all over our semantic_text notebooks.
You access all of that through the semantic field. Behind the scenes, the semantic field stores the full text in a keyword field and stores the chunks in a separate data structure.
The only time you would need two fields (text and semantic) is if you were doing hybrid search, since the semantic field stores the text in a keyword field rather than an analyzed text field.
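For the hybrid case, a rough sketch of what the two-field setup could look like: text_field stays an analyzed text field for lexical matching and copies into infer_field, and a query fuses both with the `rrf` retriever. The helper names are hypothetical, and the sketch assumes an Elasticsearch version that supports retrievers and the `semantic` query, plus an elasticsearch-py client that accepts a `retriever=` argument to `search`.

```python
# Hypothetical helpers for hybrid search: lexical text field + semantic_text field,
# combined with reciprocal rank fusion (RRF).
INFERENCE_ID = "my_hf_endpoint_object"  # inference endpoint from the notebook cell


def build_hybrid_mapping(inference_id: str = INFERENCE_ID) -> dict:
    """Keep an analyzed text field for lexical search and copy it into semantic_text."""
    return {
        "properties": {
            "text_field": {"type": "text", "copy_to": "infer_field"},
            "infer_field": {"type": "semantic_text", "inference_id": inference_id},
        }
    }


def build_hybrid_retriever(query_text: str, rank_window_size: int = 50) -> dict:
    """Fuse a lexical match query and a semantic query with the rrf retriever."""
    lexical = {"standard": {"query": {"match": {"text_field": query_text}}}}
    semantic = {
        "standard": {
            "query": {"semantic": {"field": "infer_field", "query": query_text}}
        }
    }
    return {
        "rrf": {
            "retrievers": [lexical, semantic],
            "rank_window_size": rank_window_size,
        }
    }


# With a live cluster (not run here), assuming `es` is an Elasticsearch client:
#   es.search(index="hf-semantic-text-index", retriever=build_hybrid_retriever("query"))
```

Here copy_to earns its place: one source string feeds both the lexical and the semantic side.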
Huh, we should probably update our notebooks then.
It's an interesting take, because I agree: the examples with copy_to are confusing. We built semantic_text to operate like a normal text field with a match query clause; it just does semantic search instead of lexical search.
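To make the "normal text field" point concrete, here is a sketch of two query bodies against infer_field. The function names are hypothetical. The first assumes an Elasticsearch version in which a plain `match` clause accepts a semantic_text field; on versions without that support, the dedicated `semantic` query is the alternative. Either dict would be passed as `query=` to `es.search`.

```python
# Hypothetical helpers contrasting two ways to run semantic search on infer_field.
def build_match_query(query_text: str, field: str = "infer_field") -> dict:
    """Plain match clause; assumes a version where match on semantic_text
    performs semantic rather than lexical search."""
    return {"match": {field: {"query": query_text}}}


def build_semantic_query(query_text: str, field: str = "infer_field") -> dict:
    """Dedicated semantic query, usable where match does not accept semantic_text."""
    return {"semantic": {"field": field, "query": query_text}}


# With a live cluster (not run here), assuming `es` is an Elasticsearch client:
#   es.search(index="hf-semantic-text-index", query=build_match_query("query"))
```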
Ready for final review