What units to use for threshold amount? #26171

Open
2 tasks done
mvirag2000 opened this issue Sep 6, 2024 · 1 comment · May be fixed by #26398
Labels
🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder

Comments

@mvirag2000

URL

https://python.langchain.com/v0.2/docs/how_to/semantic-chunker/

Checklist

  • I added a very descriptive title to this issue.
  • I included a link to the documentation page I am referring to (if applicable).

Issue with current documentation:

It seems that the threshold amount for the "percentile" threshold type is expressed out of a hundred, i.e., 85.0 rather than 0.85. The docs do not state this, and the units are equally unclear for the other threshold types, "gradient" and "interquartile."
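
To illustrate why the scale matters, here is a minimal pure-Python sketch. It assumes the threshold amount is read as a percentile on a 0-100 scale, as observed above. The `percentile` helper and the sample distances are hypothetical and are not taken from LangChain's code.

```python
def percentile(values, q):
    """Return the q-th percentile of values, with q on a 0-100 scale.

    Uses linear interpolation between the two nearest ranks.
    """
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Hypothetical distances between consecutive sentences (made-up numbers).
distances = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.45, 0.60, 0.80, 0.90]

# Intended setting: split only where the distance is in the top 15%.
threshold_85 = percentile(distances, 85)
# Mistaken setting: 0.85 means the 0.85th percentile, close to the minimum.
threshold_0_85 = percentile(distances, 0.85)

# A breakpoint is any position whose distance exceeds the threshold.
breaks_85 = [i for i, d in enumerate(distances) if d > threshold_85]
breaks_0_85 = [i for i, d in enumerate(distances) if d > threshold_0_85]

print(len(breaks_85), len(breaks_0_85))
```

With these made-up numbers, passing 85 yields only a couple of breakpoints, while passing 0.85 splits at almost every sentence, which would be one way to end up with very small chunks.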

Idea or request for content:

Also, Semantic Chunker really needs a min and max chunk size. I am getting chunks of a single word, and chunks that exceed the OpenAI limit. Thanks for all the great work on LangChain.

@dosubot dosubot bot added the 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder label Sep 6, 2024
@tibor-reiss
Contributor

@mvirag2000 What do you think about the linked PR?
Re your idea/request: I only introduced min_chunk_size, because the max size of chunks can be adjusted by tuning breakpoint_threshold_amount to a reasonable value.
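
For readers who need a minimum size before such a parameter is available, one possible workaround is to post-process the splitter's output. The sketch below is not the linked PR's implementation: `merge_small_chunks` is a hypothetical helper, and measuring size in characters is an assumption.

```python
def merge_small_chunks(chunks, min_chunk_size):
    """Merge consecutive chunks until each has at least min_chunk_size characters."""
    merged = []
    buffer = ""
    for chunk in chunks:
        # Accumulate text until the buffer is long enough to stand alone.
        buffer = f"{buffer} {chunk}" if buffer else chunk
        if len(buffer) >= min_chunk_size:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing remainder is attached to the previous chunk, if any.
        if merged:
            merged[-1] = f"{merged[-1]} {buffer}"
        else:
            merged.append(buffer)
    return merged


# Made-up chunks, including the single-word case described in the issue.
chunks = [
    "Hello.",
    "This is a longer sentence about chunking.",
    "Ok.",
    "Another sentence that is long enough.",
    "Bye.",
]
result = merge_small_chunks(chunks, min_chunk_size=20)
for chunk in result:
    print(repr(chunk))
```

This only enforces a lower bound. If the whole input is shorter than the minimum, it comes back as a single short chunk.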
