fix: Implement correct batch handling for Ollama embedding and improve progress updates #13815

Open
danobot wants to merge 3 commits into infiniflow:main from danobot:danobot-patch-1

danobot commented Mar 26, 2026

What problem does this PR solve?

Implements proper batch handling for OllamaEmbed so that EMBEDDING_BATCH_SIZE is respected for Ollama and the correct API is called with each batch of chunks passed as a list.
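For readers without the diff open, here is a minimal sketch of the batching idea, not the code in this PR. `embed_in_batches` and `embed_batch` are hypothetical names; `embed_batch` stands in for whatever sends one list of texts to Ollama in a single request (for example, a wrapper around the Ollama Python client's `embed(model=..., input=batch)`, which is assumed here and not shown in this conversation).

```python
from typing import Callable, List, Sequence, Tuple


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,
) -> Tuple[List[List[float]], int]:
    """Embed texts in slices of at most batch_size, one backend call per slice.

    Returns the vectors in input order plus the number of backend calls made.
    embed_batch is a hypothetical stand-in for the real Ollama request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    vectors: List[List[float]] = []
    calls = 0
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed_batch(batch)  # one request carrying the whole list
        if len(result) != len(batch):
            raise RuntimeError("backend returned a different number of vectors")
        vectors.extend(result)
        calls += 1
    return vectors, calls
```

With five chunks and a batch size of 2, this issues three calls instead of five, which is the behaviour the description attributes to honouring EMBEDDING_BATCH_SIZE.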

Enhances progress reporting when running Embed pipelines on datasets:
image
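The progress change can be pictured with a small sketch, assuming a callback that takes a fraction and a message. `embed_with_progress`, `embed_batch`, and the `callback(progress, message)` signature are illustrative assumptions, not the pipeline's actual interface.

```python
from typing import Callable, List, Sequence


def embed_with_progress(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    callback: Callable[[float, str], None],
    batch_size: int = 16,
) -> List[List[float]]:
    """Embed chunks batch by batch, reporting the total first and then each step."""
    total = len(chunks)
    # Report the total up front so the UI can show how much work is queued.
    callback(0.0, f"Embedding {total} chunks")

    vectors: List[List[float]] = []
    for start in range(0, total, batch_size):
        batch = list(chunks[start:start + batch_size])
        vectors.extend(embed_batch(batch))
        done = min(start + batch_size, total)
        # The loop body only runs when total > 0, so the division is safe.
        callback(done / total, f"Embedded {done}/{total} chunks")
    return vectors
```

Reporting after every batch rather than once at the end keeps the progress bar moving on large datasets.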

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

danobot added 2 commits March 26, 2026 19:23
Refactor encode method to process texts in batches and handle special tokens more efficiently.
Add callback to report total chunks to embed and update progress logging.
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 26, 2026
@danobot danobot changed the title from "Implement Batch Embedding for OllamaEmbed class and improve embedding progress" to "fix: Implement correct batch handling for Ollama embedding and improve progress updates" Mar 26, 2026
danobot (Author) commented Mar 26, 2026

@claude What is the purpose of tks_num in OllamaEmbed?

@yingfeng yingfeng added the ci Continue Integration label Mar 26, 2026
@yingfeng yingfeng marked this pull request as draft March 26, 2026 13:00
@yingfeng yingfeng marked this pull request as ready for review March 26, 2026 13:00
@dosubot dosubot bot added the 🐞 bug Something isn't working, pull request that fix bug. label Mar 26, 2026
codecov bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.72%. Comparing base (e705ac6) to head (68f3971).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #13815   +/-   ##
=======================================
  Coverage   96.72%   96.72%           
=======================================
  Files          10       10           
  Lines         702      702           
  Branches      112      112           
=======================================
  Hits          679      679           
  Misses          5        5           
  Partials       18       18           


Labels

🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration size:M This PR changes 30-99 lines, ignoring generated files.


2 participants