Replies: 1 comment
You can overlap the chunks in cases where the document exceeds the context size of the embedding model, so neighbouring chunks still share some context. P.S.: if you want the full conversation on this, refer to our Discord (it's meant for other developers).
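As a rough illustration of what overlapping chunks could look like, here is a minimal sketch over a flat list of token ids. The `overlapping_chunks` helper and the `chunk_size`/`overlap` values are illustrative, not the project's actual defaults.

```python
def overlapping_chunks(token_ids, chunk_size=1000, overlap=200):
    """Yield chunks of `chunk_size` tokens, each sharing `overlap` tokens
    with the previous chunk so context is not lost at chunk boundaries.

    Illustrative sketch only; sizes are hypothetical, not library defaults.
    """
    stride = chunk_size - overlap
    for start in range(0, len(token_ids), stride):
        chunk = token_ids[start:start + chunk_size]
        if chunk:
            yield chunk
        # Stop once the final chunk has reached the end of the document,
        # so we don't emit trailing chunks fully contained in the previous one.
        if start + chunk_size >= len(token_ids):
            break
```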
I was just wondering exactly how the late chunking method handles documents that are larger than the batch size. Say I have a 16K-token document and it gets split into sixteen 1K-token chunks. If I'm using an embedding model with an 8192-token context size, the default 1000-token chunk size, and a batch size of 8, how does the embedding proceed?
If it simply embeds the first 8 chunks and then pools the token embeddings into the eight chunk vectors, then the later chunks in that batch will lack context from later in the document. Or is there a sliding window happening here, where the vectors are only pooled and stored for the center chunks? Maybe including more context from earlier in the document is actually preferred, given how humans tend to write sequentially? I don't know what the best method is, but I'd like to know what the current method is, lol
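For concreteness, here is a minimal sketch of the first reading described above: each window of `batch_size * chunk_size` tokens (8 * 1000 = 8000, which fits an 8192-token context) is encoded in one pass, and the token embeddings are mean-pooled per chunk, so chunks only share context with other chunks in the same window. The `encode_tokens` helper and all sizes are hypothetical, not the library's confirmed implementation.

```python
import numpy as np

def late_chunk_per_window(token_ids, encode_tokens, chunk_size=1000, batch_size=8):
    """Encode a long document one window at a time and mean-pool token
    embeddings into chunk vectors.

    `encode_tokens` is assumed to return one embedding per input token,
    shape (num_tokens, dim). Chunks inside a window see each other's
    context, but nothing from later windows.
    """
    window = chunk_size * batch_size
    chunk_vectors = []
    for w_start in range(0, len(token_ids), window):
        window_ids = token_ids[w_start:w_start + window]
        token_embs = encode_tokens(window_ids)          # (len(window_ids), dim)
        for c_start in range(0, len(window_ids), chunk_size):
            span = token_embs[c_start:c_start + chunk_size]
            chunk_vectors.append(span.mean(axis=0))     # one vector per 1K-token chunk
    return np.stack(chunk_vectors)
```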