
oliatsm/cnn-cifar10


GPU-Accelerated Convolutional Neural Network in C

A C-based Convolutional Neural Network (CNN) implementation to classify the CIFAR-10 dataset.

This project was developed as part of my undergraduate thesis in Electrical and Computer Engineering at the University of Patras.
The implementation includes a serial version and a GPU-accelerated version using OpenACC. The goal was to evaluate inference performance and explore parallelization strategies on GPUs. Multiple code versions are provided to compare performance, including an optimized version using zero-padding before convolution.
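As a rough illustration of the OpenACC approach, the sketch below offloads an element-wise ReLU loop with a single directive. The function name `relu_inplace` and the choice of data clause are assumptions made for this example and are not taken from the repository. A compiler without OpenACC support ignores the pragma and runs the same loop serially.

```c
/* In-place ReLU over n activations.
 * With an OpenACC compiler (for example nvc -acc) the loop can be offloaded
 * to the GPU; copy(x[0:n]) moves the array to the device and back.
 * NOTE: relu_inplace is a hypothetical name used only for this sketch. */
void relu_inplace(float *x, int n)
{
    #pragma acc parallel loop copy(x[0:n])
    for (int i = 0; i < n; ++i) {
        if (x[i] < 0.0f) {
            x[i] = 0.0f;
        }
    }
}
```

Copying the array in and out on every call is the simplest option. A real pipeline would more likely keep activations resident on the device between layers.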


Thesis Information

Title: "Implementation of Convolutional Neural Networks Using GPUs"
(Greek: Υλοποίηση συνελικτικών νευρωνικών δικτύων σε κάρτες γραφικών)
University of Patras Institutional Repository link: https://hdl.handle.net/10889/28866


Project Structure

There are three versions of the code, each in its own directory, plus a directory with the pre-trained weights:

  • serial/ – Basic serial implementation of the CNN
  • openacc/ – GPU-accelerated version using OpenACC
  • padding/ – Optimized version that zero-pads the input before convolution to simplify the convolution loops
  • weights/ – Pre-trained weights of the network
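To show why padding ahead of time can simplify convolution, here is a minimal single-channel sketch. The input plane is copied once into a zero-filled buffer, so the inner loops need no boundary checks. The names `pad_plane` and `conv_padded`, the row-major layout, and the stride of 1 are assumptions for illustration and may differ from the code in padding/.

```c
#include <stddef.h>
#include <string.h>

/* Copy an h x w plane into the centre of a zero-filled (h+2p) x (w+2p) plane.
 * dst must hold (h + 2*p) * (w + 2*p) floats. Hypothetical helper. */
void pad_plane(const float *src, float *dst, int h, int w, int p)
{
    int pw = w + 2 * p; /* padded row length */
    memset(dst, 0, sizeof(float) * (size_t)(h + 2 * p) * (size_t)pw);
    for (int y = 0; y < h; ++y) {
        memcpy(dst + (size_t)(y + p) * pw + p,
               src + (size_t)y * w,
               sizeof(float) * (size_t)w);
    }
}

/* k x k convolution (CNN-style cross-correlation, stride 1, k odd) over a
 * plane that was pre-padded with k/2 zeros per side. The output is h x w.
 * Because the padding is already in memory, no bounds checks are needed. */
void conv_padded(const float *padded, const float *kernel, float *out,
                 int h, int w, int k)
{
    int pw = w + 2 * (k / 2);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    acc += padded[(y + ky) * pw + (x + kx)] * kernel[ky * k + kx];
                }
            }
            out[y * w + x] = acc;
        }
    }
}
```

The trade-off is extra memory and one copy per plane in exchange for a branch-free inner loop.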

CNN Architecture

The network follows a classic convolution–pooling–fully connected design:

  • Input: 32×32×3 RGB images
  • Layers: Three blocks, each consisting of a zero-padded convolution, a ReLU, and a 2×2 max-pooling layer
  • Classifier: One fully connected layer with 10 outputs followed by a softmax
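The layer list above implies how the spatial size shrinks through the network. The sketch below traces it under two assumptions that are not stated in this README: each convolution uses stride 1 with an odd kernel padded by kernel/2 zeros per side (so it preserves the spatial size), and each pooling layer uses a 2×2 window with stride 2. All function names are hypothetical.

```c
/* Output size of a convolution along one spatial dimension. */
int conv_out_size(int in, int kernel, int pad, int stride)
{
    return (in + 2 * pad - kernel) / stride + 1;
}

/* Output size of an unpadded pooling window along one spatial dimension. */
int pool_out_size(int in, int window, int stride)
{
    return (in - window) / stride + 1;
}

/* Spatial size after `blocks` repetitions of conv -> ReLU -> 2x2 max-pool.
 * ASSUMPTION: stride-1 convolution with an odd kernel and kernel/2 padding. */
int final_spatial_size(int input, int blocks, int kernel)
{
    int size = input;
    for (int b = 0; b < blocks; ++b) {
        size = conv_out_size(size, kernel, kernel / 2, 1); /* ReLU keeps the size */
        size = pool_out_size(size, 2, 2);
    }
    return size;
}
```

Under these assumptions, `final_spatial_size(32, 3, 3)` follows 32 → 16 → 8 → 4, so the fully connected layer would receive 4×4 feature maps. The number of channels per layer is not given here.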

Requirements

Compiler

This project uses the nvc compiler from the NVIDIA HPC SDK, which requires an NVIDIA GPU and the appropriate drivers. To build the serial code with a standard compiler such as gcc instead, set the following in the Makefile of the target directory:

CC = gcc
CFLAGS = -std=c11 -Wall -Wextra -march=native

Dataset

To run the code you need the CIFAR-10 dataset in binary format. Download and extract:

https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz

After extraction, set the dataset path in main.c using the DATA_FOLDER constant, for example:

const char* DATA_FOLDER = "../../cifar-10-batches-bin";

Make sure the path points to the folder that contains the binary files (data_batch_1.bin, data_batch_2.bin, etc.). By default the code uses all 50,000 training images. You can change the number of images in main.c:

#define NUM_IMAGES 50000
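For orientation, each record in the CIFAR-10 binary files is 3073 bytes: one label byte followed by 3072 pixel bytes (the 32×32 red plane, then green, then blue), with 10,000 records per batch file. The sketch below decodes one record from a batch already read into memory. The `cifar_image` struct, the `cifar_decode` name, and the scaling of pixels to [0, 1] are assumptions for illustration, not the repository's loader.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    CIFAR_PIXELS = 32 * 32 * 3,            /* 3072 bytes: R, G, B planes */
    CIFAR_RECORD_BYTES = 1 + CIFAR_PIXELS  /* label byte + pixel bytes */
};

/* Hypothetical container for one decoded image. */
typedef struct {
    uint8_t label;               /* class index, 0..9 */
    float pixels[CIFAR_PIXELS];  /* channel-major, scaled to [0, 1] (assumption) */
} cifar_image;

/* Decode record `index` from a raw batch buffer of `batch_bytes` bytes.
 * Returns 0 on success, -1 if the record lies outside the buffer. */
int cifar_decode(const uint8_t *batch, size_t batch_bytes, size_t index,
                 cifar_image *out)
{
    size_t offset = index * CIFAR_RECORD_BYTES;
    if (offset + CIFAR_RECORD_BYTES > batch_bytes) {
        return -1;
    }
    out->label = batch[offset];
    for (size_t i = 0; i < CIFAR_PIXELS; ++i) {
        out->pixels[i] = batch[offset + 1 + i] / 255.0f;
    }
    return 0;
}
```

With this layout, the five training batches hold 5 × 10,000 = 50,000 records, which matches the default NUM_IMAGES.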

Compile and Run

Each version directory contains its own Makefile. Example usage:

cd openacc
make
./cnn-cifar10

Debugging

Compile with -g for debugging builds:

make debug
./cnn-cifar10-debug

Profiling

Compile with -pg for profiling support:

make profile
./cnn-cifar10-profile

Clean Up

To remove compiled artifacts:

make clean

Results and Speedup

For 50,000 images the CNN achieves 78.84% accuracy. The program prints execution times for major stages (data loading, network initialization, inference).
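An accuracy figure like this is typically the share of images whose highest-scoring class matches the label. The sketch below computes it that way. The function names and the row-major `num_images × num_classes` output layout are assumptions for this example, and the repository may compute accuracy differently.

```c
/* Index of the largest of n scores (first one wins on ties). */
int argmax(const float *scores, int n)
{
    int best = 0;
    for (int i = 1; i < n; ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}

/* Percentage of images whose top-scoring class equals the label.
 * outputs: num_images rows of num_classes scores, row-major (assumption). */
double accuracy_percent(const float *outputs, const unsigned char *labels,
                        int num_images, int num_classes)
{
    int correct = 0;
    for (int i = 0; i < num_images; ++i) {
        if (argmax(outputs + (long)i * num_classes, num_classes) == labels[i]) {
            ++correct;
        }
    }
    return num_images > 0 ? 100.0 * correct / num_images : 0.0;
}
```

Since softmax preserves the ordering of the scores, the predicted class is the same whether the argmax is taken before or after it.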

The following table summarizes the execution times (in seconds) of the different code versions on the CIFAR-10 dataset (50,000 images). Measurements were taken on a machine with an AMD Ryzen 7 5800X CPU and an NVIDIA GeForce RTX 4070 Ti (12 GB) GPU, running Ubuntu 20.04.5 LTS and NVIDIA HPC SDK 22.11.

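| Time (seconds) | Serial   | Serial (padding) | Parallel | Parallel (padding) |
|----------------|----------|------------------|----------|--------------------|
| Net Forward    | 751.8657 | 559.2341         | 5.6889   | 5.9116             |
| Conv           | 742.4356 | 549.7660         | 3.1210   | 3.3586             |
| ReLU           | 4.3622   | 4.3845           | 0.8783   | 0.9097             |
| Pool           | 4.6074   | 4.6209           | 1.1799   | 1.2124             |
| FC             | 0.4396   | 0.4416           | 0.1295   | 0.1292             |
| Softmax        | 0.0083   | 0.0085           | 0.0054   | 0.0053             |
| Total time     | 752.4512 | 559.8212         | 6.3366   | 6.5568             |
| Speedup (×)    | (baseline) | 1.34           | 118.75   | 114.76             |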
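The speedup row is consistent with dividing the serial total time by each version's total time. A minimal sketch of that arithmetic (the helper name `speedup` is hypothetical):

```c
/* Speedup of a version relative to a baseline: baseline time / version time.
 * Both times must be in the same unit; version_seconds must be non-zero. */
double speedup(double baseline_seconds, double version_seconds)
{
    return baseline_seconds / version_seconds;
}
```

For example, `speedup(752.4512, 6.3366)` is about 118.75 and `speedup(752.4512, 559.8212)` is about 1.34, matching the table.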

Example raw output from a run of the serial version (its timings differ from the table above):

$ ./cnn-cifar10 
Serial Code
CNN for 50000 images
Loading input batch 1...
Loading input batch 2...
Loading input batch 3...
Loading input batch 4...
Loading input batch 5...
Load Data time:0.518275 seconds
Create Network time:0.000006 seconds
Load Network Parameters time:0.003319 seconds
Create Ouputs time:0.000235 seconds

Net Forward total time:724.833147 seconds
    Time for conv1: 235.903980 seconds
    Time for relu1: 3.219701 seconds
    Time for pool1: 3.214649 seconds
    Time for conv2: 371.571337 seconds
    Time for relu2: 0.921022 seconds
    Time for pool2: 0.946631 seconds
    Time for conv3: 108.097672 seconds
    Time for relu3: 0.246505 seconds
    Time for pool3: 0.248356 seconds
    Time for fc: 0.442517 seconds
    Time for softmax: 0.008063 seconds

  Conv: 715.572989 seconds
  ReLU: 4.387228 seconds
  Pool: 4.409636 seconds
  FC:   0.442517 seconds
  Softmax: 0.008063 seconds

Net Accuracy: 78.84 % 
Net Accuracy time:0.001737 seconds
Free memory time:0.028954 seconds
Total time:725.385673 seconds
END!
