mtmd: Add DeepSeekOCR Support #17400
Merged
+1,569
−27
143 commits
43a130b
mtmd: llama.cpp DeepSeekOCR support
sfallah b6b9f02
loading sam tensors
sfallah 85c7cda
mtmd: fix vision model processing
bluebread 578c8d7
Merge pull request #1 from bluebread/sf/deepseek-ocr
sfallah 2aab52e
deepseek-ocr clip-vit model impl
sfallah eab28ed
mtmd: add DeepSeek-OCR LM support with standard attention
bluebread 7630587
mtmd: successfully runs DeepSeek-OCR LM in llama-cli
bluebread 2de3436
mtmd: Fix RoPE type for DeepSeek-OCR LM.
bluebread e8b2610
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 97e0907
loading LM
sfallah 13dc6fb
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr
sfallah b32bb5e
Merge pull request #2 from bluebread/sf/deepseek-ocr
sfallah 790bbb9
sam warmup working
sfallah cec9a5c
sam erroneous return corrected
sfallah 8b3d319
clip-vit: corrected cls_embd concat
sfallah 1e08157
clip-vit: model convert qkv_proj split
sfallah 331cea8
corrected combining of image encoders' results
sfallah 6c0715b
fix: update callback for ffn_moe_weighted and add callback for attn_o…
bluebread a65ddf5
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 63a042f
concat image_newline and image_seperator tokens
sfallah 89afda8
visual_model warmup (technically) works
sfallah 88032f4
window partitioning using standard ggml ops
sfallah 1268dc3
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 68b206b
sam implementation without using CPU only ops
sfallah 8bce66d
clip: fixed warnings
bluebread 5e6cf3c
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 7e9fbec
mtmd: fix get_rel_pos
bluebread 0f5587d
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 7b8d735
mtmd: fixed the wrong scaler for get_rel_pos
bluebread 86f111f
image encoding technically works but the output can't be checked sing…
sfallah effe669
mtmd: minor changed
bluebread f8f66a1
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread 3fcfc3a
Merge pull request #3 from bluebread/sf/deepseek-ocr
sfallah ee8a148
mtmd: add native resolution support
bluebread 4cfa15f
- image encoding debugged
sfallah 3f71188
mtmd: correct token order
bluebread a594990
Merge pull request #5 from bluebread/dsocr-debug
sfallah 6dfda99
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr
sfallah 7941f5d
Merge pull request #4 from bluebread/sf/deepseek-ocr
sfallah 206f8ab
- dynamic resizing
sfallah 40e7e6e
mtmd: quick fix token order
bluebread 81533e4
mtmd: fix danling pointer
bluebread 8810940
Merge pull request #6 from bluebread/sf/deepseek-ocr
sfallah a488b49
mtmd: SAM numerically works
bluebread ccb2f23
mtmd: debug CLIP-L (vit_pre_ln)
bluebread 841a4a8
mtmd: debug CLIP-L & first working DeepSeek-OCR model
bluebread ed3b7f1
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 5543094
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread c5f4c64
mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution cont…
bluebread 95239f9
mtmd: simplify SAM patch embedding
bluebread 6b0e7cd
Merge pull request #7 from bluebread/sf/deepseek-ocr
sfallah 6634166
Merge branch 'master' into sf/deepseek-ocr
sfallah c914e05
mtmd: adapt Pillow image resizing function
bluebread e20857b
mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing
bluebread 43dfc0c
Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s…
bluebread b696c54
mtmd: remove --dsocr-mode argument
bluebread b26b507
mtmd: refactor code & remove unused helper functions
bluebread 7451b84
mtmd: fix tensor names for image newlines and view separator
bluebread 386ba47
clean up
sfallah c73748a
Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr-cleanup
sfallah a661c52
reverting automatically removed spaces
sfallah 0399ddf
reverting automatically removed spaces
sfallah c89171c
mtmd: fixed bad ocr check in Deepseek2 (LM)
bluebread 2dd9924
Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cp…
bluebread fc3f625
mtmd: support combined QKV projection in buid_vit
bluebread 4d7d994
Merge pull request #8 from sfallah/sf/deepseek-ocr-cleanup
sfallah 5381b9c
using common build_attn in sam
sfallah 076138a
corrected code-branch when flash-attn disabled
sfallah d0c08e3
mtmd: minor fix
bluebread f5bd310
minor formatting and style
sfallah 6687b4e
Merge pull request #9 from sfallah/sf/deepseek-ocr-attn
sfallah 5f2ee1a
Merge branch 'ggml-org:master' into sf/deepseek-ocr
sfallah 1c88647
fixed flake8 lint issues
sfallah d981f19
minor editorconfig-check fixes
sfallah 705394c
minor editorconfig-check fixes
sfallah 15f2ada
mtmd: simplify get_rel_pos
bluebread 2d918b3
mtmd: make sam hparams configurable
bluebread 5dfcc5a
mtmd: add detailed comments for resize_bicubic_pillow
bluebread 53273f8
mtmd: fixed wrong input setting
bluebread 48c6cf2
mtmd: convert model in FP16
bluebread 5174a1e
mtmd: minor fix
bluebread 0161406
mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template
bluebread ed944cd
fix: test-1.jpg ORC issue with small (640) resolution
sfallah aaf2fd1
minor: editconfig-check fix
sfallah 33fabf0
Merge branch 'master' into sf/deepseek-ocr-merge-test
sfallah d70f171
merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
sfallah 4cbbe8a
minor: editconfig-check fix
sfallah 47f0fee
testing deepseek-ocr
sfallah e0e69fd
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-me…
sfallah f95a6fe
quick and (potential) dirty merge with https://github.com/ggml-org/ll…
sfallah f7736f2
refactoring, one single builder function and static helpers
sfallah fb3bb6a
added deepseek-ocr test to tests.sh
sfallah 1b38ccf
Merge pull request #11 from sfallah/sf/deepseek-ocr-merge_#17965
sfallah 6c36c03
minor formatting fixes
sfallah dc2066e
check with fixed expected resutls
sfallah 3fc61d4
Merge pull request #10 from sfallah/sf/deepseek-ocr-test-script
sfallah 7f8621c
minor formatting
sfallah b3bf8cb
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 8ad98ee
editorconfig-check fix
sfallah 4a4f829
Merge branch 'ggml-org:master' into sf/deepseek-ocr
sfallah 51c3de6
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 512b2c8
merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042
sfallah 00d2357
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 87e4a00
minor
sfallah f629d02
convert: minor fix
bluebread 5a741fd
mtmd: format code
bluebread 616f009
convert: quick fix
bluebread e5d426b
convert: quick fix
bluebread c739cf2
minor python formatting
sfallah 9a05e1d
Merge branch 'master' into sf/deepseek-ocr
sfallah 4d91711
fixed merge build issue
sfallah ded9207
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah a94c241
merge resolved
sfallah 6978c37
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 05789f5
minor fix
sfallah 7e47aa8
Merge branch 'ggml-org:master' into sf/deepseek-ocr
sfallah 7ffa23c
Merge branch 'ggml-org:master' into sf/deepseek-ocr
sfallah f41d323
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 9b1a1b9
minor
sfallah 52fcb13
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 0031b41
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 5f2283b
Update convert_hf_to_gguf.py
sfallah 7856e24
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 50c1e15
- removed clip_is_deepseekocr
sfallah 3e221cf
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah e037b95
- cleaning commented out code
sfallah 0b61c6a
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 7a53e7e
fixing instabilities issues reintroducing resize_bicubic_pillow
sfallah c2e6701
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 49f3ca5
- use f16 model for deepseek-ocr test
sfallah 21243f3
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah a493dc1
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 754061e
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 7725399
Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr
sfallah 3754c32
rename fc_w --> mm_fc_w
ngxson d88b88e
Merge branch 'master' into sf/deepseek-ocr
ngxson 0ea5fa4
add links to OCR discussion
ngxson edf020d
cleaner loading code
ngxson 8099869
add missing .weight to some tensors
ngxson 1d90094
add default jinja template (to be used by server)
ngxson 6faf264
move test model to ggml-org
ngxson 8dabfe3
rolling back upscale change
sfallah 95cc566
Update convert_hf_to_gguf.py
ngxson