
Commit c26a1d5

Merge pull request #11 from JetBrains-Research/cuda-managed
Cuda Managed Mem (minor fixes)
2 parents a1f32b7 + 164b19e commit c26a1d5

10 files changed, +69 -20 lines

Diff for: CHANGELOG.md (-1)

@@ -22,7 +22,6 @@ Added new vector C API, exposed vector primitive into python-package.
 - Vector creation (empty, from data, with random data)
 - Matrix-vector operations (matrix-vector and vector-matrix multiplication)
 - Vector-vector operations (element-wise addition)
-- Matrix operations (equality, reduce to value, extract sub-vector)
 - Vector data extraction (as list of indices)
 - Vector syntax sugar (pretty string printing, slicing, iterating through non-zero indices)
 - Matrix operations (extract row or matrix column as sparse vector, reduce matrix (optionally transposed) to vector)

Diff for: README.md (+27 -4)

@@ -51,8 +51,8 @@ prototyping algorithms on a local computer for later running on a powerful serve
 ### Platforms
 
 - Linux based OS (tested on Ubuntu 20.04)
-- Windows (not tested yet)
-- macOS (not tested yet)
+- Windows (coming soon)
+- macOS (coming soon)
 
 ### Simple example
 
@@ -74,9 +74,32 @@ b[2, 1] = True
 print(a, b, a.mxm(b), sep="\n")
 ```
 
+### Performance
+
+Sparse Boolean matrix-matrix multiplication evaluation results are listed below.
+Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM, GeForce GTX 1070 GPU with 8Gb VRAM.
+
+![time](https://github.com/JetBrains-Research/cuBool/raw/master/docs/pictures/mxm-perf-time.svg?raw=true&sanitize=true)
+![mem](https://github.com/JetBrains-Research/cuBool/raw/master/docs/pictures/mxm-perf-mem.svg?raw=true&sanitize=true)
+
+The matrix data is selected from the SuiteSparse Matrix Collection [link](https://sparse.tamu.edu).
+
+| Matrix name              |    # Rows |     Nnz M | Nnz/row | Max Nnz/row |    Nnz M^2 |
+|---                       |      ---: |      ---: |    ---: |        ---: |       ---: |
+| SNAP/amazon0312          |   400,727 | 3,200,440 |     7.9 |          10 | 14,390,544 |
+| LAW/amazon-2008          |   735,323 | 5,158,388 |     7.0 |          10 | 25,366,745 |
+| SNAP/web-Google          |   916,428 | 5,105,039 |     5.5 |         456 | 29,710,164 |
+| SNAP/roadNet-PA          | 1,090,920 | 3,083,796 |     2.8 |           9 |  7,238,920 |
+| SNAP/roadNet-TX          | 1,393,383 | 3,843,320 |     2.7 |          12 |  8,903,897 |
+| SNAP/roadNet-CA          | 1,971,281 | 5,533,214 |     2.8 |          12 | 12,908,450 |
+| DIMACS10/netherlands_osm | 2,216,688 | 4,882,476 |     2.2 |           7 |  8,755,758 |
+
+Detailed comparison is available in the full paper text at
+[link](https://github.com/YaccConstructor/articles/blob/master/2021/GRAPL/Sparse_Boolean_Algebra_on_GPGPU/Sparse_Boolean_Algebra_on_GPGPU.pdf).
+
 ### Installation
 
-If you are running **Linux based** OS (tested on Ubuntu 20.04) you can download the official
+If you are running **Linux-based** OS (tested on Ubuntu 20.04) you can download the official
 PyPI **pycubool** python package, which includes compiled library source code
 with Cuda and Sequential computations support. Installation process
 requires only `python3` to be installed on your machine. Python can be installed
@@ -102,7 +125,7 @@ These steps are required if you want to build library for your specific platform
 
 ### Requirements
 
-- Linux based OS (tested on Ubuntu 20.04)
+- Linux-based OS (tested on Ubuntu 20.04)
 - CMake Version 3.15 or higher
 - CUDA Compatible GPU device (to run Cuda computations)
 - GCC Compiler
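A note on reading the benchmark table added above: # Rows is the number of matrix rows, Nnz M is the number of non-zero (true) entries in the input matrix, Nnz/row is the average number of non-zeros per row, Max Nnz/row is the densest row, and Nnz M^2 is presumably the non-zero count of the Boolean product of the matrix with itself. The average column follows directly from the first two, e.g. for SNAP/roadNet-PA: Nnz/row = Nnz M / # Rows = 3,083,796 / 1,090,920 ≈ 2.8.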

Diff for: cubool/include/cubool/cubool.h (+1)

@@ -118,6 +118,7 @@ typedef struct cuBool_Vector_t* cuBool_Vector;
 typedef struct cuBool_DeviceCaps {
     char name[256];
     bool cudaSupported;
+    bool managedMem;
     int major;
     int minor;
     int warp;
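For context, a minimal sketch of how a client could consume the new `managedMem` flag through the C API. The struct fields are taken from the header above; the entry-point names (`cuBool_Initialize`, `cuBool_GetDeviceCaps`, `cuBool_Finalize`) and the hint constant are assumptions and should be checked against the actual declarations in `cubool.h`.

```cpp
#include <cubool/cubool.h>
#include <cstdio>

int main() {
    // Entry-point names and the hint value below are assumed, not taken from this diff.
    cuBool_Initialize(CUBOOL_HINT_NO);

    cuBool_DeviceCaps caps;
    cuBool_GetDeviceCaps(&caps);   // hypothetical capabilities query

    if (caps.cudaSupported) {
        // managedMem is the field added by this commit.
        std::printf("device: %s (sm_%d%d), warp %d, %s memory\n",
                    caps.name, caps.major, caps.minor, caps.warp,
                    caps.managedMem ? "managed" : "default");
    }

    cuBool_Finalize();
    return 0;
}
```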

Diff for: cubool/sources/core/library.cpp (+2)

@@ -249,6 +249,7 @@ namespace cubool {
     void Library::queryCapabilities(cuBool_DeviceCaps &caps) {
         caps.name[0] = '\0';
         caps.cudaSupported = false;
+        caps.managedMem = false;
         caps.major = 0;
         caps.minor = 0;
         caps.warp = 0;
@@ -272,6 +273,7 @@ namespace cubool {
             << " name: " << caps.name << ","
             << " major: " << caps.major << ","
             << " minor: " << caps.minor << ","
+            << " mem type: " << (caps.managedMem? "managed": "default") << ","
             << " warp size: " << caps.warp << ","
             << " globalMemoryKiBs: " << caps.globalMemoryKiBs << ","
             << " sharedMemoryPerMultiProcKiBs: " << caps.sharedMemoryPerMultiProcKiBs << ","

Diff for: cubool/sources/cuda/cuda_backend.cu (+1 -1)

@@ -92,7 +92,7 @@ namespace cubool {
     }
 
     void CudaBackend::queryCapabilities(cuBool_DeviceCaps &caps) {
-        CudaInstance::queryDeviceCapabilities(caps);
+        mInstance->queryDeviceCapabilities(caps);
    }
 
     CudaInstance & CudaBackend::getInstance() {
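This one-line change follows from the files below: `queryDeviceCapabilities` now reports whether managed memory is in use, which is per-instance state (`mMemoryType`), so it can no longer be a static member of `CudaInstance` and is called through `mInstance` instead.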

Diff for: cubool/sources/cuda/cuda_instance.cu (+12 -11)

@@ -86,17 +86,7 @@ namespace cubool {
         }
     }
 
-    CudaInstance::MemType CudaInstance::getMemoryType() const {
-        return mMemoryType;
-    }
-
-    bool CudaInstance::isCudaDeviceSupported() {
-        int device;
-        cudaError error = cudaGetDevice(&device);
-        return error == cudaSuccess;
-    }
-
-    void CudaInstance::queryDeviceCapabilities(cuBool_DeviceCaps &deviceCaps) {
+    void CudaInstance::queryDeviceCapabilities(cuBool_DeviceCaps &deviceCaps) const {
         const unsigned long long KiB = 1024;
 
         int device;
@@ -109,6 +99,7 @@ namespace cubool {
         if (error == cudaSuccess) {
             strcpy(deviceCaps.name, deviceProp.name);
             deviceCaps.cudaSupported = true;
+            deviceCaps.managedMem = mMemoryType == MemType::Managed;
             deviceCaps.minor = deviceProp.minor;
             deviceCaps.major = deviceProp.major;
             deviceCaps.warp = deviceProp.warpSize;
@@ -119,6 +110,16 @@ namespace cubool {
         }
     }
 
+    CudaInstance::MemType CudaInstance::getMemoryType() const {
+        return mMemoryType;
+    }
+
+    bool CudaInstance::isCudaDeviceSupported() {
+        int device;
+        cudaError error = cudaGetDevice(&device);
+        return error == cudaSuccess;
+    }
+
     void CudaInstance::allocate(void* &ptr, size_t size) const {
         ptr = malloc(size);
         CHECK_RAISE_ERROR(ptr != nullptr, MemOpFailed, "Failed to allocate memory on the CPU");
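The new `deviceCaps.managedMem` value simply mirrors the instance's memory mode. As a rough, hedged illustration of what `MemType::Managed` usually implies for a GPU allocation helper such as `allocateOnGpu` (the real method body is not part of this diff, and the name of the non-managed enum value is assumed):

```cpp
// Hedged sketch, NOT the actual cuBool implementation: how a GPU allocator
// typically switches between managed (unified) and default device memory.
#include <cuda_runtime.h>
#include <cstddef>

enum class MemType { Default, Managed };   // mirrors the enum used above; value names assumed

void allocateOnGpu(void*& ptr, std::size_t size, MemType memoryType) {
    cudaError_t error;
    if (memoryType == MemType::Managed) {
        // Unified (managed) memory: accessible from host and device,
        // migrated on demand by the CUDA runtime.
        error = cudaMallocManaged(&ptr, size);
    } else {
        // Default: device-only memory, explicit host<->device copies required.
        error = cudaMalloc(&ptr, size);
    }
    if (error != cudaSuccess) {
        ptr = nullptr;  // the library raises MemOpFailed in this situation; simplified here
    }
}
```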

Diff for: cubool/sources/cuda/cuda_instance.hpp (+1 -1)

@@ -51,12 +51,12 @@ namespace cubool {
         void allocateOnGpu(void* &ptr, size_t s) const;
         void deallocate(void* ptr) const;
         void deallocateOnGpu(void* ptr) const;
+        void queryDeviceCapabilities(cuBool_DeviceCaps& deviceCaps) const;
 
         void syncHostDevice() const;
         MemType getMemoryType() const;
 
         static bool isCudaDeviceSupported();
-        static void queryDeviceCapabilities(cuBool_DeviceCaps& deviceCaps);
         static CudaInstance& getInstanceRef();
         static CudaInstance* getInstancePtr();
         static bool isInstancePresent();

Diff for: docs/pictures/mxm-perf-mem.svg (+1)

Diff for: docs/pictures/mxm-perf-time.svg (+1)

Diff for: python/README.md (+23 -2)

@@ -31,8 +31,6 @@ prototyping algorithms on a local computer for later running on a powerful serve
 
 ### Features
 
-- C API for performance-critical computations
-- Python package for every-day tasks
 - Cuda backend for computations
 - Cpu backend for computations
 - Matrix/vector creation (empty, from data, with random data)
@@ -47,6 +45,29 @@ prototyping algorithms on a local computer for later running on a powerful serve
 - GraphViz (export single matrix or set of matrices as a graph with custom color and label settings)
 - Debug (matrix string debug markers, logging)
 
+### Performance
+
+Sparse Boolean matrix-matrix multiplication evaluation results are listed below.
+Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM, GeForce GTX 1070 GPU with 8Gb VRAM.
+
+![time](https://github.com/JetBrains-Research/cuBool/raw/master/docs/pictures/mxm-perf-time.svg?raw=true&sanitize=true)
+![mem](https://github.com/JetBrains-Research/cuBool/raw/master/docs/pictures/mxm-perf-mem.svg?raw=true&sanitize=true)
+
+The matrix data is selected from the SuiteSparse Matrix Collection [link](https://sparse.tamu.edu).
+
+| Matrix name              |    # Rows |     Nnz M | Nnz/row | Max Nnz/row |    Nnz M^2 |
+|---                       |      ---: |      ---: |    ---: |        ---: |       ---: |
+| SNAP/amazon0312          |   400,727 | 3,200,440 |     7.9 |          10 | 14,390,544 |
+| LAW/amazon-2008          |   735,323 | 5,158,388 |     7.0 |          10 | 25,366,745 |
+| SNAP/web-Google          |   916,428 | 5,105,039 |     5.5 |         456 | 29,710,164 |
+| SNAP/roadNet-PA          | 1,090,920 | 3,083,796 |     2.8 |           9 |  7,238,920 |
+| SNAP/roadNet-TX          | 1,393,383 | 3,843,320 |     2.7 |          12 |  8,903,897 |
+| SNAP/roadNet-CA          | 1,971,281 | 5,533,214 |     2.8 |          12 | 12,908,450 |
+| DIMACS10/netherlands_osm | 2,216,688 | 4,882,476 |     2.2 |           7 |  8,755,758 |
+
+Detailed comparison is available in the full paper text at
+[link](https://github.com/YaccConstructor/articles/blob/master/2021/GRAPL/Sparse_Boolean_Algebra_on_GPGPU/Sparse_Boolean_Algebra_on_GPGPU.pdf).
+
 ### Simple example
 
 Create sparse matrices, compute matrix-matrix product and print the result to the output:
