A simple real-time webcam demo using OpenAI's CLIP model for zero-shot classification.
🖼️ Captures live frames from your webcam
🧾 Takes a list of class names you provide (e.g. "cat", "mug", "face")
🤖 Overlays the predicted class probabilities on each frame
- Python 3.8+
- torch
- transformers
- Pillow
- opencv-python
Install dependencies with:
    pip install -r requirements.txt

Then start the demo:

    python webcam_clip.py --classes "cat" "mug" "person"

Press q to quit the webcam window.
- Loads a pretrained CLIP model (ViT-B/32).
- Encodes input class names to CLIP text features.
- Captures webcam frames and encodes them into image features.
- Computes similarity between image and text features.
- Displays the frame with class probabilities on-screen.
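
The similarity step in the list above can be illustrated without a webcam or a model download. The following is a minimal sketch, not the code from `webcam_clip.py`: the function name `class_probabilities`, the toy 3-dimensional features, and the fixed `logit_scale` of 100 (chosen to approximate CLIP's learned temperature) are all assumptions made for illustration. It normalizes the features, takes cosine similarities, and applies a softmax to obtain one probability per class.

```python
import math


def class_probabilities(image_feat, text_feats, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each class.

    Hypothetical helper: logit_scale=100.0 is an assumed stand-in for
    CLIP's learned temperature, not a value read from the demo script.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = normalize(image_feat)
    # Cosine similarity is the dot product of unit-length vectors.
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_feats
    ]
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy features standing in for real CLIP embeddings (assumed values).
classes = ["cat", "mug", "person"]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_feat = [0.9, 0.1, 0.0]

probs = class_probabilities(image_feat, text_feats)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

In a real run the features would come from the CLIP image and text encoders as tensors, and the same arithmetic would typically be done with torch operations rather than Python lists.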
    python webcam_clip.py --classes "banana" "keyboard" "mouse"

You’ll see the probabilities for each class live on-screen.
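
The `--classes` flag takes several names in one invocation. One common way to parse such a flag is `argparse` with `nargs="+"`; the sketch below assumes that approach and is not taken from `webcam_clip.py`. The `parse_args` helper and the help strings are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a space-separated list of class names (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Zero-shot webcam classification with CLIP"
    )
    # nargs="+" collects one or more values into a list of strings.
    parser.add_argument(
        "--classes",
        nargs="+",
        required=True,
        help="Class names to score each frame against",
    )
    return parser.parse_args(argv)


# Passing argv explicitly mimics the command line shown above.
args = parse_args(["--classes", "banana", "keyboard", "mouse"])
print(args.classes)
```

Quoting each name on the command line, as in the examples above, keeps multi-word classes such as "coffee mug" together as a single entry.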
Created for teaching purposes by Mario Koddenbrock
