PyTorch implementation of the paper *CoSwin: Convolution Enhanced Hierarchical Shifted Window Attention For Small-Scale Vision*.
- We have released the source code and model weights.
Clone the CoSwin GitHub repository:

```shell
git clone https://github.com/puskal-khadka/coswin
cd coswin
```
Install the required dependencies:

```shell
pip install -r requirements.txt
```
To train the CoSwin model on a benchmark dataset:

```shell
python train.py --model coswin --dataset <dataset-name>
```

where `<dataset-name>` is the name of the dataset, e.g. CIFAR-10, CIFAR-100, MNIST, SVHN, or TINY-IMAGENET.
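The command-line surface used above can be sketched with `argparse`; this is a hypothetical re-creation of the flags shown in this README (`--model`, `--dataset`), not the actual contents of `train.py`, which may define additional options:

```python
import argparse

# Hypothetical sketch of the CLI flags this README invokes; the real
# train.py may accept more options (epochs, batch size, etc.).
parser = argparse.ArgumentParser(description="Train CoSwin on a benchmark dataset")
parser.add_argument("--model", default="coswin", help="model architecture name")
parser.add_argument("--dataset", required=True,
                    help="dataset name, e.g. CIFAR-10 or TINY-IMAGENET")

# Parse a sample invocation (mirrors the command shown above).
args = parser.parse_args(["--model", "coswin", "--dataset", "TINY-IMAGENET"])
print(args.model, args.dataset)
```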
To perform Grad-CAM visualization, place the images you want to visualize in a separate directory, then run:

```shell
python gradcam.py --model coswin --dataset TINY-IMAGENET --model_path /path/to/your/model
```
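As background, the core computation Grad-CAM performs can be sketched in plain Python. This is an illustrative, list-based sketch of the standard Grad-CAM formula (channel weights from global-average-pooled gradients, followed by a weighted sum and ReLU); the function name and data layout are ours, and the real `gradcam.py` operates on the trained CoSwin model's feature maps:

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM core on plain nested lists.

    activations, gradients: [channels][h][w] floats from a target layer.
    Returns an h x w class-activation map.
    """
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for A_k, g_k in zip(activations, gradients):
        # Channel weight = global average pool of that channel's gradient map.
        alpha = sum(sum(row) for row in g_k) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * A_k[i][j]
    # ReLU: keep only features with a positive influence on the target class.
    return [[max(0.0, v) for v in row] for row in cam]
```

The resulting map is typically upsampled to the input resolution and overlaid on the image as a heatmap.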
Results of CoSwin on small-scale benchmarks:

| Dataset | Training Size | Resolution | Accuracy@1 (%) | Model Weight |
|---|---|---|---|---|
| CIFAR-10 | 50,000 | 32x32 | 96.63 | ckpt |
| CIFAR-100 | 50,000 | 32x32 | 81.64 | ckpt |
| MNIST | 60,000 | 28x28 | 99.60 | ckpt |
| SVHN | 73,257 | 32x32 | 98.07 | ckpt |
| TINY-IMAGENET | 100,000 | 64x64 | 65.06 | ckpt |
CoSwin demonstrates strong performance on small dataset challenges compared to existing state-of-the-art models.
If you find this work useful, please cite:

```bibtex
@article{khadka2025coswin,
  title={CoSwin: Convolution Enhanced Hierarchical Shifted Window Attention For Small-Scale Vision},
  author={Puskal Khadka and Rodrigue Rizk and Longwei Wang and KC Santosh},
  journal={arXiv preprint arXiv:2509.08959},
  year={2025}
}
```

This repo is based on Timm and Swin Transformer. Thanks for their amazing work.
