Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new notebook for hugging face integration #301

Merged
merged 9 commits into from
Sep 12, 2024

Conversation

maxhniebergall
Copy link
Contributor

@maxhniebergall maxhniebergall commented Jul 24, 2024

Ready for final review

Copy link

gitnotebooks bot commented Jul 24, 2024

Found 1 changed notebook. Review the changes at https://gitnotebooks.com/elastic/elasticsearch-labs/pull/301

@maxhniebergall maxhniebergall changed the base branch from main to spacetime September 11, 2024 19:45
@maxhniebergall maxhniebergall changed the base branch from spacetime to main September 11, 2024 19:45
maxhniebergall and others added 4 commits September 11, 2024 16:01
# Conflicts:
#	notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb
@@ -0,0 +1,810 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on notebook notebooks/integrations/hugging-face/huggingface-integration-millions-of-documents-with-cohere-reranking.ipynb Cell 24 Line 9

    index="hf-semantic-text-index",
    mappings={
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": "my_hf_endpoint_object",
            },
            "text_field": {"type": "text", "copy_to": "infer_field"},

curious - why have the additional text_field that used to copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats required for semantic text. The strings go into the text_field, and the embeddings get stored in the infer_field

Copy link
Member

@joemcelroy joemcelroy Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just send it directly to infer_field instead. semantic_text doesn't require a copy_to field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I guess so. But if we need access to the strings and the embeddings, don't we need the text_field? This copy_to field is used all over our semantic_text notebooks from what I've seen

Copy link
Member

@joemcelroy joemcelroy Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you access that all through semantic field. Behind scenes, semantic field stores the full text into a keyword field and does the chunking into another data structure.

The only time you would need two fields (text and semantic) is if you were doing hybrid search, as the text is stored in a keyword field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, we should probably update our notebooks then.

Copy link
Member

@joemcelroy joemcelroy Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its an interesting take because i agree, the examples with copy_to is confusing. We built semantic to operate like a normal text field and match query clause, just does semantic search instead of lexical.

@maxhniebergall maxhniebergall merged commit 83da04a into main Sep 12, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants