A ComfyUI custom node extension that runs DeepSeek AI's Janus-Pro-7B vision-language model on your local computer, enabling image understanding and multi-turn conversations about images.
- 🖼️ Advanced Image Analysis: Leverages Janus-Pro-7B's capabilities for detailed image understanding and description
- 💬 Multi-turn Chat: Supports interactive conversations about images with context awareness
- 🔄 Dual Image Support: Can analyze relationships between two images simultaneously
- 🚀 Automatic Model Download: Downloads model files automatically on first use
- ⚙️ Flexible Configuration: Customizable parameters for generation and image processing
- 🎯 ComfyUI Integration: Seamless integration with ComfyUI workflow
- Clone this repository into your ComfyUI custom nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-Janus_pro_vision.git
- Install required dependencies:
pip install requests
pip install tqdm
pip install attrdict
- The model files are downloaded automatically from DeepSeek's Hugging Face repository on first use.
- If the automatic download fails, download the files manually into the models\Janus-Pro folder:
git clone https://huggingface.co/deepseek-ai/Janus-Pro-7B
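After a manual download, it can help to confirm that the folder is actually populated before starting ComfyUI. The following is a minimal sketch and not part of the extension: the helper name `model_folder_ready` is hypothetical, and the path `ComfyUI/models/Janus-Pro` is an assumption based on the folder named above, so adjust it to your installation. It only checks that the folder contains at least one file; it does not validate the model files themselves.

```python
from pathlib import Path


def model_folder_ready(model_dir):
    """Return True if model_dir exists and contains at least one file.

    Hypothetical helper: a non-empty folder is treated as "ready".
    The contents of the files are not checked.
    """
    path = Path(model_dir)
    if not path.is_dir():
        return False
    # rglob also finds files inside sub-folders created by `git clone`.
    return any(p.is_file() for p in path.rglob("*"))


if __name__ == "__main__":
    # Assumed location; adjust to your ComfyUI installation.
    folder = Path("ComfyUI") / "models" / "Janus-Pro"
    status = "ready" if model_folder_ready(folder) else "missing or empty"
    print(f"{folder}: {status}")
```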
Handles model loading and management.
- Input: None (uses default model path)
- Output: JANUS_MODEL (model object for use in analyzer)
Main analysis node with chat capabilities.
Inputs:
- janus_model: Model object from the loader node
- image_a: Primary image for analysis
- image_b: (Optional) Secondary image for comparison
- prompt: Text prompt/question about the image(s)
- chat_mode: Enable/disable chat functionality
- seed: Random seed for generation
- temperature: Generation temperature (0.0 - 2.0)
- top_p: Top-p sampling parameter (0.0 - 1.0)
- max_tokens: Maximum generation length
- image_size: Target image size for processing (512-2048)
- frame_size: Border thickness for image display (1-10)
- reset_chat: Clear chat history

Outputs:
- response: Model's response text
- chat_history: Formatted chat history (in chat mode)
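For readers unfamiliar with how ComfyUI nodes declare their sockets, the inputs and outputs listed above could be expressed roughly as follows. This is an illustrative sketch, not the extension's actual source: the class name `JanusAnalyzerSketch`, the `CATEGORY` value, the seed range, the step values, and the defaults for `chat_mode`, `temperature`, and `max_tokens` are assumptions. Only the ranges and defaults stated in this README are taken from it.

```python
class JanusAnalyzerSketch:
    """Illustrative node declaration following ComfyUI's custom-node convention."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "janus_model": ("JANUS_MODEL",),  # output type of the loader node
                "image_a": ("IMAGE",),
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                "chat_mode": ("BOOLEAN", {"default": False}),  # assumed default
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFF}),  # assumed range
                "temperature": ("FLOAT", {"default": 0.7, "min": 0.0, "max": 2.0, "step": 0.1}),
                "top_p": ("FLOAT", {"default": 0.95, "min": 0.0, "max": 1.0, "step": 0.05}),
                "max_tokens": ("INT", {"default": 512, "min": 1, "max": 4096}),  # assumed values
                "image_size": ("INT", {"default": 1024, "min": 512, "max": 2048, "step": 64}),
                "frame_size": ("INT", {"default": 2, "min": 1, "max": 10}),
                "reset_chat": ("BOOLEAN", {"default": False}),
            },
            "optional": {
                "image_b": ("IMAGE",),  # secondary image for comparison
            },
        }

    RETURN_TYPES = ("STRING", "STRING")
    RETURN_NAMES = ("response", "chat_history")
    FUNCTION = "analyze"
    CATEGORY = "Janus-Pro"  # assumed menu category

    def analyze(self, janus_model, image_a, prompt, image_b=None, **params):
        # Placeholder body: a real node would run the model here.
        return ("", "")
```

Declaring `image_b` under `"optional"` is what lets the node run with a single image connected.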
- image_size: Controls the maximum dimension while maintaining aspect ratio (default: 1024)
  - Range: 512 to 2048 pixels
  - Step: 64 pixels
  - Example: for a 2000x1000 px image with image_size=1024:
    - Width is scaled to 1024
    - Height is scaled proportionally to 512
- frame_size: Border thickness for visual separation (default: 2)
  - Range: 1 to 10 pixels
  - Example values:
    - frame_size=1: Thin border
    - frame_size=2: Standard border
    - frame_size=5: Thick border
    - frame_size=10: Very thick border
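The image_size arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the extension's code: both function names are hypothetical, and the sketch assumes the longest side is always scaled to exactly image_size (the README does not say whether smaller images are upscaled).

```python
def snap_image_size(value, minimum=512, maximum=2048, step=64):
    """Clamp value to [minimum, maximum] and snap it to the nearest step."""
    clamped = max(minimum, min(maximum, value))
    return minimum + round((clamped - minimum) / step) * step


def scale_to_image_size(width, height, image_size):
    """Scale (width, height) so the longest side equals image_size.

    The aspect ratio is preserved; results are rounded to whole pixels.
    """
    scale = image_size / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    size = snap_image_size(1000)                  # nearest valid setting
    print(scale_to_image_size(2000, 1000, size))  # (1024, 512)
```

A portrait image works the same way: the height becomes the limiting side and the width shrinks in proportion.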
- temperature: Controls response randomness
  - 0.1: More focused and deterministic
  - 0.7: More creative and varied
- top_p: Nucleus sampling parameter (0.95 recommended)
- max_tokens: Maximum length of the generated response
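To make the effect of temperature and top_p concrete, here is a small pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a list of logits. It illustrates the general technique only. The function name `sample_token` is hypothetical, and the model's own generation code may differ in detail.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_p=0.95, seed=None):
    """Pick a token index using temperature scaling and top-p filtering."""
    rng = random.Random(seed)
    if temperature <= 0:
        # Treat a zero temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens, weighted by their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With a fixed seed the same inputs always yield the same index, which is why the node exposes a seed alongside the sampling parameters.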
This extension uses the Janus-Pro-7B model from DeepSeek AI, which offers:
- Strong image understanding capabilities
- Multi-turn conversation support
- High-quality natural language generation
- Support for image comparison and analysis
- ComfyUI
- Python 3.8+
- PyTorch
- Transformers library
- requests
- tqdm
This project is MIT licensed. The Janus-Pro-7B model has its own license from DeepSeek AI.
- DeepSeek AI for the Janus-Pro-7B model
- ComfyUI community for the framework and support
Contributions are welcome! Please feel free to submit a Pull Request.