PaliGemma is a project created from scratch, based on a YouTube guide, as a learning exercise in building a multimodal model. The project follows the development approaches and practices shown in the original tutorial.

vlvink/PaliGemma-from-scratch

PaliGemma: Multimodal Vision Language Model

About the project

This repository contains an implementation of PaliGemma, a multimodal (Vision) language model written from scratch in PyTorch. The code is based on the tutorial video ‘Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation’.

PaliGemma VLM

The main goal of the project is to deepen the understanding of how multimodal models work internally and to improve programming skills in PyTorch.

How to start

Prerequisites

Before you start, make sure Python and Poetry are installed. The project has the following requirements:

- python = ^3.11
- torch = ^2.5.1
- numpy = ^2.2.1
- pillow = ^11.1.0
- fire = ^0.7.0
- transformers = ^4.48.0

1. Clone the repository:
   git clone https://github.com/vlvink/PaliGemma-from-scratch.git
   cd PaliGemma-from-scratch
2. Install the dependencies:
   poetry install
3. Activate the Poetry environment:
   poetry shell
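
To confirm that the environment matches the versions listed under Prerequisites, a small check along the following lines can be run inside the Poetry environment. This is a minimal sketch and not part of the repository: the names `REQUIRED`, `parse_version`, and `check_environment` are hypothetical, and the check only tests the lower bound of each caret constraint (for example, `^2.5.1` also implies `<3.0.0`, which is not verified here).

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above (lower bounds only).
REQUIRED = {
    "torch": (2, 5, 1),
    "numpy": (2, 2, 1),
    "pillow": (11, 1, 0),
    "fire": (0, 7, 0),
    "transformers": (4, 48, 0),
}


def parse_version(text):
    """Turn a version string such as '2.5.1+cu121' into (2, 5, 1).

    Local build tags after '+' and non-numeric suffixes such as 'rc1'
    are ignored; missing components are filled with 0.
    """
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(required=REQUIRED, min_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < min_python:
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"python: need >= {wanted}")
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: found {installed}, need >= {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    if issues:
        print("\n".join(issues))
    else:
        print("All requirements satisfy the minimum versions.")
```

Saved under a name of your choice (for instance `check_env.py`, a hypothetical file name), the script can be started with `poetry run python check_env.py`, so that it inspects the packages Poetry installed and not the system-wide ones.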

Running the Code

./launch_inference.sh
