CUDA_AI_Framework

A CUDA C implementation of an AI framework. It supports inference only, on Nvidia GPUs (desktop, workstation, Jetson).

Document

Please refer to the following HackMD post for details of the framework.

https://hackmd.io/@Erebustsai/Sku_EMMr2 Note that this document describes the OpenCL version of this framework; however, the interface is the same.

Support / Limitation

Supported Layers

2d Convolution Layer
2d Point-wise Convolution Layer
2d Depth-wise Convolution Layer
2d Max Pooling Layer
Batch Normalization Layer
Fully Connected Layer
Concatenation Operation
Global Average Operation
ReLU
ReLU6
LeakyReLU
Softmax
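
When checking GPU output for the activation layers listed above, a small CPU reference can be handy. The following is a minimal sketch of the standard definitions of ReLU, ReLU6, LeakyReLU and Softmax in plain C++; it is not taken from this framework's kernels, and the function names and the `negative_slope` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference versions of the element-wise activations (standard definitions).
inline float relu(float x) { return std::max(0.0f, x); }

// ReLU6 clamps the output to the range [0, 6].
inline float relu6(float x) { return std::min(std::max(0.0f, x), 6.0f); }

// negative_slope is a hypothetical parameter name; pass the value your model was trained with.
inline float leaky_relu(float x, float negative_slope) {
    return x > 0.0f ? x : negative_slope * x;
}

// Softmax over one vector. The maximum is subtracted before exp() so that
// large logits do not overflow; the result is mathematically unchanged.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    if (logits.empty()) return out;
    const float max_val = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_val);
        sum += out[i];
    }
    for (float& v : out) v /= sum;
    return out;
}
```

Comparing these values against a layer's output (for example via getLayerfeature) with a small tolerance is one way to spot a misconfigured layer.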

Helper Functions

Read an image as model input
Device info report
IoU (for YOLO)
NMS (for YOLO)
getArgumentReference (can be used to list candidate values for thread block dimensions)
getLayerfeature (can be used to check whether the model works as expected at a given layer)
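
For readers unfamiliar with the IoU and NMS helpers used for YOLO post-processing, here is a minimal host-side sketch of the usual definitions. It is an illustration under assumptions, not this framework's implementation: the `Box` layout (corner coordinates plus a score) and the function signatures are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical box layout: top-left (x1, y1), bottom-right (x2, y2), confidence score.
struct Box { float x1, y1, x2, y2, score; };

// Intersection over Union: overlap area divided by the area covered by either box.
float iou(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1)
                    + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy non-maximum suppression: visit boxes from highest to lowest score and
// keep a box only if it does not overlap an already kept box by more than the threshold.
std::vector<Box> nms(std::vector<Box> boxes, float iou_threshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > iou_threshold) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

With YOLO-style detectors, NMS is typically run per class after low-confidence boxes have been filtered out.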

How to use

Make sure you have CUDA and cuDNN installed.
Edit the Makefile so that it points to the CUDA samples common headers. Mine are in /usr/local/cuda/samples/Common/. If you do not have them, download the CUDA samples from Nvidia's website.
Edit the Makefile to match your GPU's compute capability.
Export the weights from PyTorch with the following code snippet.

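import os

if EXPORT_WEIGHT:
    model.eval()
    if not LOAD_MODEL:
        print("----------Warning!----------\nNot loading ANY MODEL!\nThe output binary file might not be meaningful.")

    # Swap only the file extension (.pt -> .bin); str.replace('pt', 'bin')
    # would also rewrite any other "pt" that appears in the path.
    export_path = os.path.splitext(LOAD_PATH)[0] + ".bin"
    print("Exporting weight file " + export_path)
    with open(export_path, "wb") as file:
        for param_name, tensor in model.state_dict().items():
            # Skip BatchNorm's num_batches_tracked counter.
            if "num_batches_tracked" in param_name:
                continue
            # Move the tensor to the CPU and write its flattened values as raw
            # bytes, in state_dict order, with no header.
            layer_weight = tensor.detach().cpu().flatten().numpy()
            file.write(layer_weight.tobytes())
    print(export_path + " Weight file exported")
    exit()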

Run make, then run the output binary.
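
To sanity-check an exported .bin file before running the model, it can help to read it back on the host. The sketch below assumes the layout produced by the export snippet above (raw float32 values back to back, no header, host byte order); `load_weights` is a hypothetical helper and is not this framework's own loader.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads a headerless file of raw float32 values into a vector.
// Returns an empty vector if the file cannot be opened, is empty,
// or its size is not a multiple of sizeof(float).
std::vector<float> load_weights(const std::string& path) {
    // Open at the end so tellg() reports the file size in bytes.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return {};
    const std::streamsize bytes = file.tellg();
    if (bytes <= 0 || static_cast<std::size_t>(bytes) % sizeof(float) != 0) return {};

    std::vector<float> weights(static_cast<std::size_t>(bytes) / sizeof(float));
    file.seekg(0, std::ios::beg);
    if (!file.read(reinterpret_cast<char*>(weights.data()), bytes)) return {};
    return weights;
}
```

Comparing `weights.size()` with the total parameter count of the PyTorch model is a quick way to confirm that the export and the model definition agree.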
