--- a/docs/getting-started/cli.md
+++ b/docs/getting-started/cli.md
1 addition & 1 deletion
@@ -69,7 +69,7 @@ The `chunk` command is where the real magic happens! It's your versatile tool fo
 |`--max-tokens`| Maximum number of tokens per chunk. Applies to all chunking strategies. (Must be >= 12) | None |
 |`--max-sentences`| Maximum number of sentences per chunk. Applies to DocumentChunker. (Must be >= 1) | None |
 |`--max-section-breaks`| Maximum number of section breaks per chunk. Section breaks include Markdown headings (# to ######), horizontal rules (---, ***, ___), and <details> tags. Applies to DocumentChunker. (Must be >= 1) | None |
-|`--overlap-percent`| Percentage of overlap between chunks (0-85). Applies to DocumentChunker. | 20.0 |
+|`--overlap-percent`| Percentage of overlap between chunks (0-75). Applies to DocumentChunker. | 20.0 |
 |`--offset`| Starting sentence offset for chunking. Applies to DocumentChunker. | 0 |
 |`--lang`| Language of the text (e.g., 'en', 'fr', 'auto'). | auto |
 |`--metadata`| Include rich metadata (source, span, chunk num, etc.) in the output. If `--destination` is a directory, metadata is saved as separate `.json` files; otherwise, it's included inline in the output. | False |
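The numeric limits in the table above, including the corrected 0-75 overlap range, can be sketched as a small pre-flight check. This is only an illustration: `validate_chunk_options` is a hypothetical helper, not part of the library, and it assumes the documented bounds are inclusive.

```python
from typing import Optional


def validate_chunk_options(
    max_tokens: Optional[int] = None,
    max_sentences: Optional[int] = None,
    max_section_breaks: Optional[int] = None,
    overlap_percent: float = 20.0,
) -> None:
    """Hypothetical helper: raise ValueError if a value is outside the documented range."""
    if max_tokens is not None and max_tokens < 12:
        raise ValueError("--max-tokens must be >= 12")
    if max_sentences is not None and max_sentences < 1:
        raise ValueError("--max-sentences must be >= 1")
    if max_section_breaks is not None and max_section_breaks < 1:
        raise ValueError("--max-section-breaks must be >= 1")
    # Assumption: both ends of the 0-75 range are allowed.
    if not 0 <= overlap_percent <= 75:
        raise ValueError("--overlap-percent must be between 0 and 75")
```

With these rules, a value such as `overlap_percent=85`, which the old wording suggested was valid, is rejected.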
--- a/docs/getting-started/programmatic/document_chunker.md
+++ b/docs/getting-started/programmatic/document_chunker.md
1 addition & 1 deletion
@@ -411,7 +411,7 @@ for i, chunk in enumerate(chunks):
 !!! note "Special Handling for Streaming Processors"
 Some processors work differently due to their streaming nature - they yield content page by page or in blocks rather than all at once. This means they require special care:
 
-**Streaming processors** (PDF, EPUB, DOCX, ODT): These beauties process content as they go, so they're designed for `chunk_files` method. Using them with `chunk_file` will throw a [`FileProcessingError`](../../exceptions-and-warnings.md#fileprocessingerror) since `chunk_file` expects all content upfront.
+**Streaming processors** (PDF, EPUB, DOCX, ODT): These beauties process content as they go, so they're designed for `chunk_files` method. Using them with `chunk_file` will throw an [`UnsupportedFileTypeError`](../../exceptions-and-warnings.md#unsupportedfiletypeerror) since `chunk_file` expects all content upfront.
 
 **Regular processors** work fine with both `chunk_file` and `chunk_files` methods.
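One way a caller might avoid the `UnsupportedFileTypeError` described in this note is to pick the method from the file extension before calling the chunker. The sketch below is an assumption-laden illustration: `STREAMING_EXTENSIONS` and `pick_chunk_method` are hypothetical names, the extension set is inferred from the formats listed in the note, and the library itself may detect file types differently.

```python
from pathlib import Path

# Assumption: extensions inferred from the formats named in the note
# (PDF, EPUB, DOCX, ODT); not taken from the library's source.
STREAMING_EXTENSIONS = {".pdf", ".epub", ".docx", ".odt"}


def pick_chunk_method(path: str) -> str:
    """Hypothetical helper: return the name of the method that suits this file."""
    suffix = Path(path).suffix.lower()
    if suffix in STREAMING_EXTENSIONS:
        # Streaming processors yield content incrementally, so use chunk_files.
        return "chunk_files"
    # Regular processors work with either method; chunk_file is the simpler call.
    return "chunk_file"
```

A caller could then branch on the returned name, passing streaming formats to `chunk_files` and everything else to `chunk_file`.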