- **Does it work for scanned PDF documents?**

  Yes, Vision Parse is specifically designed to handle scanned PDF documents effectively. It uses advanced Vision LLMs to extract text, tables, images, and LaTeX equations from both regular and scanned PDF documents with high precision.
- **I am facing latency issues while running llama3.2-vision locally. How can I improve the performance of locally hosted vision models?**

  This is a known limitation of locally hosted Ollama models. Here are some solutions:
  - Use API-based Models: For better performance, consider using API-based models like OpenAI or Gemini, which are significantly faster and more accurate.
  - Enable Concurrency: Set `enable_concurrency` to `True` so that multiple pages are processed in parallel, reducing latency. You can also increase the value of `OLLAMA_NUM_PARALLEL` to maximize the number of pages that can be processed in parallel.
  - Disable Detailed Extraction: Disable the `detailed_extraction` parameter for simpler PDF documents, which can improve latency.
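A minimal sketch of these latency settings (the keyword names mirror the parameters above, but the exact constructor signature is an assumption, not the library's definitive API):

```python
import os

# Allow the Ollama server to handle multiple requests in parallel.
# This must be set before the Ollama server starts and reads its environment.
os.environ["OLLAMA_NUM_PARALLEL"] = "4"

# Hypothetical keyword arguments for the parser; the names come from the
# answer above, but how they are passed to the parser is an assumption.
parser_kwargs = {
    "model_name": "llama3.2-vision:11b",
    "enable_concurrency": True,    # process PDF pages in parallel
    "detailed_extraction": False,  # skip detailed extraction for simple PDFs
}
```

You would then forward these settings to the parser constructor, e.g. something like `VisionParser(**parser_kwargs)` if the library exposes such a class.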
- **The llama3.2-vision:11b model was hallucinating and unable to extract content accurately from the PDF document. How can I improve the extraction accuracy of locally hosted vision models?**

  To improve extraction accuracy with the llama3.2-vision:11b model:
  - Adjust Model Parameters: Lower the `temperature` and `top_p` for more deterministic outputs and to reduce hallucinations.
  - Define Custom Prompts: By defining custom prompts according to your document structure, you can guide the model to extract content more accurately.
  - Enable Detailed Extraction: Enabling `detailed_extraction` will help the Vision LLM detect the presence of images, LaTeX equations, and structured and semi-structured tables, and then extract them with high accuracy.
  - Consider Using Alternative Models: Try API-based models like gpt-4o or gemini-1.5-pro for better accuracy and performance. Avoid smaller models that are prone to hallucination.
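The accuracy tips above can be sketched as follows. The parameter names are taken from this answer, but the `custom_prompt` key, the sample prompt text, and the overall dictionary layout are illustrative assumptions, not the library's exact API:

```python
# Hypothetical settings for more deterministic, accurate extraction.
accuracy_kwargs = {
    "model_name": "llama3.2-vision:11b",
    "temperature": 0.7,           # lowered from the default to reduce hallucinations
    "top_p": 0.5,                 # lower top_p makes sampling more conservative
    "detailed_extraction": True,  # detect images, LaTeX equations, and tables
    # An illustrative custom prompt tailored to a specific document structure:
    "custom_prompt": (
        "Extract every table and LaTeX equation from this page verbatim, "
        "and preserve the original reading order of the text."
    ),
}
```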
- **What are the recommended values for model parameters such as temperature, top_p, etc., to improve extraction accuracy?**

  Here are the recommended values for model parameters to improve extraction accuracy:
  - Set `temperature` to 0.7 and `top_p` to 0.5.
  - For Ollama models, increase `num_ctx` to 16384 and `num_predict` to 8092 (depending on the model size), and set `repeat_penalty` to 1.3.
  - For OpenAI models, increase `max_tokens` to 8192 (depending on the model size) and set `frequency_penalty` to 0.3.

  Note: The recommended values are generic and may need to be adjusted based on your document structure and the model's capabilities.
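Collected into per-provider settings, the recommendations above look roughly like this (the dictionary layout and variable names are illustrative, not the library's exact configuration API):

```python
# Recommended sampling parameters, common to all providers.
common = {"temperature": 0.7, "top_p": 0.5}

# Ollama-hosted models (values depend on the model size).
ollama_config = {
    **common,
    "num_ctx": 16384,       # context window length
    "num_predict": 8092,    # maximum tokens to generate
    "repeat_penalty": 1.3,  # penalize repetition to curb hallucination loops
}

# OpenAI models (values depend on the model size).
openai_config = {
    **common,
    "max_tokens": 8192,
    "frequency_penalty": 0.3,
}
```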