Hi,
I am encountering an issue (possibly similar to bug 907) while running marker-pdf, installed with pipx, with Ollama as the LLM service. It looks as if marker-pdf is not processing Ollama's answer properly. The details of the problem are below.
Here is the marker_single call:
marker_single --disable_image_extraction --paginate_output \
--output_format markdown \
--output_dir "$(pwd)" \
--use_llm \
--llm_service "marker.services.ollama.OllamaService" \
--OllamaService_ollama_model "qwen3-vl:latest" \
--OllamaService_ollama_base_url "http://0.0.0.0:11434" \
"${1}"
Here is the marker_single output:
Recognizing Layout: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.34s/it]
Running OCR Error Detection: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 30.50it/s]
Detecting bboxes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.27s/it]
Recognizing Text: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 27/27 [13:57<00:00, 31.01s/it]
Detecting bboxes: 0it [00:00, ?it/s]
LLM processors running: 0it [00:00, ?it/s]
Running LLMSectionHeaderProcessor: 0%| | 0/1 [00:00<?, ?it/s]
2026-02-22 07:25:17,262 [WARNING] marker: Ollama inference failed: Expecting value: line 1 column 1 (char 0)
2026-02-22 07:25:17,262 [WARNING] marker: LLM did not return a valid response
Running LLMSectionHeaderProcessor: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [05:28<00:00, 328.05s/it]
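For reference, `Expecting value: line 1 column 1 (char 0)` is the exact message Python's standard `json` module produces when it is asked to decode an empty string, or text whose first character cannot start a JSON value. That suggests (but does not prove) that the reply marker tried to parse was empty or not JSON at all. A minimal sketch, assuming marker decodes the Ollama reply with something like `json.loads`; `parse_llm_reply` is a hypothetical helper for diagnosis, not part of marker:

```python
import json
from typing import Optional


def parse_llm_reply(raw: str) -> Optional[dict]:
    """Decode an LLM reply as JSON and print the raw text if that fails.

    Hypothetical diagnostic helper; it is not part of marker's API.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Printing repr(raw) distinguishes an empty reply ('')
        # from free text that simply is not JSON.
        print(f"JSON decode failed: {exc}; raw reply was {raw!r}")
        return None


# Decoding an empty string reproduces the message from the marker warning.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```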
Here is the Ollama output:
time=2026-02-22T07:19:49.544-05:00 level=INFO source=server.go:245 msg="enabling flash attention"
time=2026-02-22T07:19:49.546-05:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --model /home/soundwave/.ollama/models/blobs/sha256-ed12a4674d727a74ac4816c906094ea9d3119fbea46ca93288c3ce4ffbe38c55 --port 45293"
time=2026-02-22T07:19:49.546-05:00 level=INFO source=sched.go:452 msg="system memory" total="62.5 GiB" free="13.6 GiB" free_swap="58.4 GiB"
time=2026-02-22T07:19:49.546-05:00 level=INFO source=server.go:755 msg="loading model" "model layers"=37 requested=-1
time=2026-02-22T07:19:49.580-05:00 level=INFO source=runner.go:1405 msg="starting ollama engine"
time=2026-02-22T07:19:49.580-05:00 level=INFO source=runner.go:1440 msg="Server listening on 127.0.0.1:45293"
time=2026-02-22T07:19:49.590-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType:q8_0 NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-22T07:19:49.615-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=858 num_key_values=40
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-alderlake.so
time=2026-02-22T07:19:49.642-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2026-02-22T07:19:50.036-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType:q8_0 NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=runner.go:1278 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType:q8_0 NumThreads:2 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="5.7 GiB"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="306.0 MiB"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="427.4 MiB"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=device.go:272 msg="total memory" size="6.4 GiB"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=sched.go:526 msg="loaded runners" count=1
time=2026-02-22T07:19:50.602-05:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2026-02-22T07:19:50.602-05:00 level=INFO source=server.go:1347 msg="waiting for llama runner to start responding"
time=2026-02-22T07:19:50.603-05:00 level=INFO source=server.go:1381 msg="waiting for server to become available" status="llm server loading model"
time=2026-02-22T07:20:00.644-05:00 level=INFO source=server.go:1385 msg="llama runner started in 11.10 seconds"
[GIN] 2026/02/22 - 07:25:17 | 200 | 5m28s | 127.0.0.1 | POST "/api/generate"
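The Ollama log shows the `/api/generate` request completing with HTTP 200 after 5m28s, so the server did answer. The open question is what the body contained. Below is a minimal sketch for checking that outside marker, using only the standard library and Ollama's `/api/generate` endpoint. Assumptions: a non-streaming request (`"stream": false`) with `"format": "json"`. The exact payload marker sends (prompt, images, schema) is not shown in this report, and `build_payload` / `query_ollama` are hypothetical names:

```python
import json
import urllib.request

# Base URL and model taken from the marker_single call above.
OLLAMA_URL = "http://0.0.0.0:11434/api/generate"
MODEL = "qwen3-vl:latest"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    # Hypothetical payload: "stream": False returns one JSON object,
    # and "format": "json" asks Ollama to constrain the output to JSON.
    # This is an assumption, not necessarily what marker sends.
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def query_ollama(prompt: str, url: str = OLLAMA_URL, timeout: float = 600.0) -> dict:
    # POST the payload and decode Ollama's JSON envelope. The model's
    # text is expected in the envelope's "response" field.
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `reply = query_ollama('Return {"status": "ok"} as JSON.')`, then `print(repr(reply.get("response")), sorted(reply))`. An empty `response` string would be consistent with the JSON warning from marker. The long timeout is deliberate: this setup runs on CPU only (0/37 layers offloaded to GPU), and a single request above took more than five minutes.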
⚙️ Environment
- marker-pdf 1.10.2
- Python 3.10.5
- torch 2.10.0
- Ubuntu 22.04
- Linux 6.8.0-100-generic, x86_64