Releases: datalab-to/marker
Minor fixes
What's Changed
- Fix typo in superscript/subscript condition check by @Tenkeboks in #897
- Enable passing arbitrary config by @VikParuchuri in #901
- Dev by @VikParuchuri in #902
New Contributors
- @Tenkeboks made their first contribution in #897
Full Changelog: v1.10.0...v1.10.1
New Layout Model + Misc Updates
Model Update
- Upgrade to a new layout model through surya, which gives a major performance boost.
Misc Updates
- README updates
- Added a new flag, --html_tables_in_markdown. When output_format is set to markdown, this renders tables using HTML tags instead of the default markdown syntax.
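To illustrate the difference between the two table renderings, here is a minimal sketch. It is not marker's implementation: the helper names render_markdown_table and render_html_table are hypothetical, and marker's actual output may differ in details such as attributes and escaping.

```python
from html import escape


def render_markdown_table(header, rows):
    """Render a table with pipe-delimited markdown syntax (hypothetical helper)."""
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


def render_html_table(header, rows):
    """Render the same table with HTML tags (hypothetical helper)."""
    head = "".join(f"<th>{escape(cell)}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Example data, made up for illustration only.
header = ["Model", "Score"]
rows = [["A", "0.91"], ["B", "0.87"]]
print(render_markdown_table(header, rows))
print(render_html_table(header, rows))
```

HTML tables can express merged cells through rowspan and colspan, which pipe-delimited markdown tables cannot, so the HTML form can be useful for documents with complex tables.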
What's Changed
- small copy + readme update by @u-ashish in #879
- Dev by @u-ashish in #880
- Layout release by @tarun-menta in #892
- Dev by @tarun-menta in #893
Full Changelog: v1.9.3...v1.10.0
Enable metadata storage
What's Changed
- Update README by @u-ashish in #872
- Add Modal example for marker deployment by @u-ashish in #850
- Dev by @u-ashish in #873
- Vik/quality by @VikParuchuri in #876
- Dev by @VikParuchuri in #877
Full Changelog: v1.9.2...v1.9.3
v1.9.2
Misc Updates
- Allow LLM processor to loop to improve tables further
- Detect and fix cases where table cells cut text lines to reduce hallucinations
- Update commercial terms
What's Changed
- Table Hotfixes by @tarun-menta in #865
- Vik/table loop by @VikParuchuri in #864
- Updated commercial language by @sandy0kwon in #843
- Add image format to img_to_base64 in BaseService by @EdmondChuiHW in #869
- Dev by @VikParuchuri in #868
New Contributors
- @EdmondChuiHW made their first contribution in #869
Full Changelog: v1.9.1...v1.9.2
Fix Blank Table Cells
What's Changed
- fix: make sure rounded poly == blank if all same coords by @zanussbaum in #857
- fix: blank table cells by @zanussbaum in #861
Full Changelog: v1.9.0...v1.9.1
Marker Block Mode
Moving marker to block mode inference. OCR is done at the block level now, instead of the line level. While this is a bit slower, it boosts accuracy.
What's Changed
- Marker Block Mode by @tarun-menta in #831
- Dev by @tarun-menta in #856
Full Changelog: v1.8.5...v1.9.0
Gemini JSON fix
What's Changed
- update license and README to reflect OpenRAIL license change by @sandy0kwon in #844
- Dev by @VikParuchuri in #848
New Contributors
- @sandy0kwon made their first contribution in #844
Full Changelog: v1.8.4...v1.8.5
Misc fixes
What's Changed
- fix: increase max tokens for equation processor by @zanussbaum in #828
- Add block ids to html renderer by @VikParuchuri in #840
- Fix: retry on invalid JSON from Gemini by @runarmod in #829
- Add disable_ocr_math to table processor (2) by @ArnoKlein in #826
- Fix: show tqdm total iteration count by @runarmod in #798
- Optional block ids by @VikParuchuri in #842
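The retry-on-invalid-JSON fix listed above follows a common pattern that can be sketched as below. This is a generic illustration under stated assumptions, not the code from #829: parse_with_retry and call_model are hypothetical names, and the real fix may use different retry limits or error handling.

```python
import json


def parse_with_retry(call_model, max_retries=3):
    """Call `call_model` until it returns valid JSON or retries run out.

    `call_model` is a hypothetical zero-argument callable that returns
    the raw response text from an LLM service.
    """
    last_error = None
    for _ in range(max_retries):
        raw = call_model()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # invalid JSON: ask the model again
    raise ValueError(f"no valid JSON after {max_retries} attempts") from last_error


# Usage with a stand-in that returns invalid JSON once, then a valid reply.
responses = iter(["{not json", '{"ok": true}'])
print(parse_with_retry(lambda: next(responses)))
```

Capping the number of attempts keeps a persistently malformed response from stalling the whole conversion.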
New Contributors
- @ArnoKlein made their first contribution in #826
Full Changelog: v1.8.3...v1.8.4
New OCR model; better OCR heuristics
- New OCR model that is better all-around, but particularly at math
- Improved OCR heuristics, which now prioritize accuracy over speed
- Drop format_lines, since force_ocr is generally a bit more accurate and less error-prone
What's Changed
- feat: allow option to keep tables split across pages by @zanussbaum in #813
- New OCR Model by @tarun-menta in #820
- Bump surya version by @VikParuchuri in #821
Full Changelog: v1.8.2...v1.8.3
Chunk renderer fixes
Minor bugfixes:
- Add images to chunk renderer output
- Fix batch size on MPS
What's Changed
- Fix: collect image HTML & data from child blocks in the chunk renderer. by @voberoi in #796
- Reduce MPS batch size - Breaking with new model by @tarun-menta in #805
- Dev by @VikParuchuri in #807
Full Changelog: v1.8.1...v1.8.2