This project is a custom implementation of the EfficientViT architecture, designed for high-accuracy image classification on resource-constrained devices. Unlike traditional Vision Transformers (ViTs), whose self-attention cost grows quadratically with the number of tokens, this model uses Cascaded Group Attention (CGA) to keep computation efficient without sacrificing spatial resolution.
- Framework: PyTorch 2.x
- Dataset: CIFAR-100
- Tools: Google Colab, thop (for FLOPs counting), Matplotlib.
- Patch Embedding Layer: Custom 2D convolutional projection for tokenization (first sketch after this list).
- Cascaded Group Attention (CGA): Hierarchical attention mechanism that reduces FLOPs (second sketch below).
- Efficiency Metrics: ~1.5 GFLOPs per forward pass with only ~21M parameters, measured with thop (third sketch below).
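A minimal sketch of a convolutional patch-embedding layer of the kind described above; the patch size (4) and embedding dimension (192) are illustrative assumptions, not the repository's actual hyperparameters.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Tokenizes an image by projecting non-overlapping patches with a 2D convolution."""
    def __init__(self, in_channels=3, embed_dim=192, patch_size=4):
        super().__init__()
        # Stride equal to kernel size splits the image into non-overlapping patches.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # (B, C, H, W) -> (B, embed_dim, H/ps, W/ps) -> (B, num_patches, embed_dim)
        return self.proj(x).flatten(2).transpose(1, 2)

# CIFAR-100 images are 32x32, so patch size 4 yields an 8x8 grid of 64 tokens.
tokens = PatchEmbedding()(torch.randn(1, 3, 32, 32))
print(tokens.shape)  # torch.Size([1, 64, 192])
```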
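The cascade idea can be sketched as follows: the feature channels are split across heads, and each head's output is added to the next head's input, so later heads refine what earlier heads computed. This is a simplified toy version; the published EfficientViT design additionally applies depthwise convolutions to the queries and learned attention biases, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Splits channels across heads; each head's output is added to the
    next head's input, cascading refinement across heads."""
    def __init__(self, dim=192, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5
        # One small QKV projection per head, applied to that head's channel split.
        self.qkvs = nn.ModuleList(
            nn.Linear(head_dim, head_dim * 3) for _ in range(num_heads)
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, tokens, dim)
        splits = x.chunk(self.num_heads, dim=-1)
        outs = []
        feat = splits[0]
        for i in range(self.num_heads):
            if i > 0:
                # Cascade: feed the previous head's output into this head's split.
                feat = splits[i] + outs[-1]
            q, k, v = self.qkvs[i](feat).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            outs.append(attn.softmax(dim=-1) @ v)
        return self.proj(torch.cat(outs, dim=-1))

tokens = torch.randn(2, 64, 192)
print(CascadedGroupAttention()(tokens).shape)  # torch.Size([2, 64, 192])
```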
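The FLOPs and parameter counts were obtained with thop. A typical invocation looks like this; the tiny stand-in network below is only a placeholder for the project's actual model.

```python
import torch
import torch.nn as nn
from thop import profile

# Stand-in network for illustration; substitute the project's EfficientViT model.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 100),
)
dummy = torch.randn(1, 3, 32, 32)  # one CIFAR-100-sized input
macs, params = profile(model, inputs=(dummy,))
# thop counts multiply-accumulate operations (MACs), which are commonly
# reported as FLOPs; the parameter count comes back alongside.
print(f"GMACs: {macs / 1e9:.3f}, Params (M): {params / 1e6:.2f}")
```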
The model was trained from scratch on CIFAR-100 and evaluated using ROC curves and attention-map visualizations.
- Macro-Average AUC: 0.85, demonstrating strong class discrimination (computation sketched below).
- Inference Speed: ~1651 images/sec on CPU (see the throughput benchmark below).
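For reference, a macro-averaged AUC of this kind can be computed with scikit-learn's one-vs-rest mode; the labels and scores below are synthetic placeholders for the model's test-set outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder data: in practice y_true holds the CIFAR-100 test labels and
# y_score the model's per-class softmax probabilities, shape (N, 100).
rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(100), 10)            # 10 samples per class
y_score = rng.dirichlet(np.ones(100), size=1000)  # each row sums to 1

macro_auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
print(f"Macro-average AUC: {macro_auc:.2f}")
```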
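The throughput figure can be reproduced with a simple timing loop such as the following; the batch size and iteration counts are arbitrary benchmarking choices, and the stand-in model again substitutes for the trained network.

```python
import time
import torch
import torch.nn as nn

# Stand-in model; replace with the trained EfficientViT before benchmarking.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 100),
).eval()
batch = torch.randn(64, 3, 32, 32)

with torch.no_grad():
    for _ in range(5):          # warm-up passes to stabilize timings
        model(batch)
    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"Throughput: {iters * batch.shape[0] / elapsed:.0f} images/sec")
```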
Clone the repository, install the dependencies listed above, and run the project code.