Add CUDA & OpenCL support #227
Open: crazyks wants to merge 190 commits into google:master from ianhuang-777:opencl
e5efe98
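Finish the code flow; the computed results still need correcting
9a6a17c
Fix cuMask computation results
3345026
Restructure the cu code
5d49f24
Adjust code
b4d0ffe
Simplify code
18f9672
Adjust cu compilation
601e367
Support a macro switch for CUDA compilation
c0bab47
Optimize clSetKernelArg code
39bcbd1
Streamline code
1cb6e52
Switch cu compilation back to ahead-of-time compilation with nvcc
cd2e614
Change the mode approach
598603b
Copy memory asynchronously
8c29f1f
Finish CUDA parallel optimization; computed results are correct
d13a9ba
Fix command-line hint; Max Thread Per MP and SP are different concepts
f9ba50e
Tune parameters to test performance
cce5bc3
Fix the macro that detects 64-bit vs. 32-bit
3237a50
Optimize
9f8597d
Restore support for factor=2; performance differs little, but compile time is longer
61fde3c
Improve the build and Test scripts
8fe8454
Reduce some redundant data copies in the kernel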
c90b88a
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
1e4b4f4
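Optimize clDiffmapOpsinDynamicsImageEx
3995006
Add some debugging information
1aa86d5
Use float instead of double in kernel computations to save computation time
8ed0ce3
Fix array length
7aff164
I don't know why either, but deleting this blank line makes the computed results correct
f795ad1
Fix build configuration
13abc16
Fix warning
0c85b8f
Try a different set of compiler options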
45300b4
Merge branch 'googleMaster'
a9ebb86
fix build
9ff693c
add sample picture
fb1032b
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
e89cdcf
float is enough
fe645a9
Add OpenCL Support
c72cece
MinSquareVal with OpenCL
c354348
Optimize convolution with OpenCL
5d8ba53
fix setupopencl
82265a6
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
775c63c
Add comment for understanding.
ianhuang-777 4061ccb
Try porting the Blur function entirely to OpenCL; the computation error is currently rather large. Is there a bug?
d9a87af
add opencl process line
b618843
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
0ba6817
add function
d4c9ed9
Set up the computation flow for clDiffmapOpsinDynamicsImage
dba4c85
Convert OpsinDynamicsImage to opencl
ianhuang-777 ac4254e
fix opencl compile error
0afb0a3
open cl compiler error fix
ianhuang-777 2cbc518
Implement clConvolutionEx
ianhuang-777 fad11fc
Remove useless code
ianhuang-777 b501393
Implement clUpsampleEx
ianhuang-777 8909cda
Implement clMinSquareValEx
ianhuang-777 5ea138c
Implement clMaskEx
ianhuang-777 437fa09
Implement clScaleImageEx
ianhuang-777 a31adf1
Verify the results of clOpinDynamicImage
c30b44d
Try supporting double-precision computation
2e4cf39
Print More DeviceInfo
fd520d3
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
56ac179
Implement clMaskHighIntensityChangeEx
ianhuang-777 a024ec1
Implement clDiffPrecomputeEx
ianhuang-777 13637b2
Implement clDiffPrecomputeEx
ianhuang-777 b7b19ed
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
4aeec41
test for clDiffmapOpsinDynamicsImage
da654cb
fix runtime bug
2ceb635
remove useless code
6981d9f
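Add test cases
7e1ad82
Add more test cases
7ef1b6d
Modify the test case framework
8d35692
Divide up the test case work
5864a11
Unmap is required after MapBuffer
8474de0
First investigate the computation precision problem for sizes >100*100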
1e8972f
Remove _constant for opencl 1.2
ianhuang-777 9400c21
Remove _constant for opencl 2.0
ianhuang-777 6962f20
Fix issue on nVidia graphics cards
c1f83bb
Fix the __constant issue on NVIDIA cards
a8aba9b
Fix __constant error for nvidia device
ianhuang-777 d9e3808
Optimize clDoMask
ianhuang-777 7731427
Build configuration for 32-bit platforms
3116d6a
Move some local constant array to __constant
ianhuang-777 95f10c7
for test
1a8fcc2
Test the convolution function; save one intermediate buffer
e919c9b
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
f947da9
Fix blockDiffMap computation
6ba5810
Merge remote-tracking branch 'origin/master'
crazyks 853222f
add clMinSquareVal test
crazyks 920de33
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
6ce7175
Fix OpsinDynamicsImage computation results
44df712
remove redundant parameter
crazyks 79cb8cd
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
ianhuang-777 7b9cf14
Add tclAverage55
ianhuang-777 81c4354
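Fix computation results + add comparator subclass
389777f
Fix: mapbuffer length does not match the required length
aaddc93
Add clButteraugliComparator to avoid changing the third-party library code too much
36905d7
Standardize kernel function names to start with cl
7c97e95
Fix compilation issue on NVIDIA cards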
5eb14f3
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
f12e272
Add SelectFrequencyMaskingBatch batch processing
dae1673
Set up batched ComputeZeroingOrder on the cl side, all in place
148927e
Adjust division of work
55f60a4
Assign work
16e27ab
clComputeBlockZeroingOrder
87b462a
修正n卡编译兼容问题
0925579
Implement part of BlurEx
ianhuang-777 b3455dd
Merge remote-tracking branch 'origin/master'
crazyks 5e53802
Fix BlurEx
ianhuang-777 e69365c
fix data type of coeff_t
crazyks 8d82c8e
modify MakeInputOrder
crazyks fecac92
Add BlockToImage
crazyks e2b3830
Add MaskHighIntensityChangeBlock
crazyks b76def6
Fix the SelectFrequencyMaskingBatch computation flow; it finally runs properly
caa4fbb
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
8c63e20
Skip the check for 8x8 blocks for now; otherwise it is too slow
6b8bebf
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
6482c67
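Add accessor interface, mainly for data verification
c5a08a1
Add code to verify changes to the original image data; to be deleted
d587e66
Add batching prototype for factor_x = factor_y = 2
8d28110
Port ComputeBlockEx2 to OpenCL
08db770
Debug clComputeBlockZeroingOrderFactor
d931558
Streamline code
0bda30e
Finish factor 2 support
999585d
Merge type declarations; include them in opencl
643e8db
Fix argument-passing bug in clEnqueueUnmapMemObject
b2d8639
Streamline code
5a54624
Clean up code
d0949f1
Clean up code
cc746ff
Clean up code
1f87bb2
Clean up code
add8436
Remove build events
8f80356
Streamline code
f766120
Streamline code
264209c
const control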
ea15082
Fix Average5x5
ianhuang-777 6496886
Inline ScaleIamge in kernel Average5x5
ianhuang-777 89cda39
Avoid const value computing in work item
ianhuang-777 f54bc0e
Fix tclCalculateDiffmap
ianhuang-777 36f2e52
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
ec42b7b
const control
a469c02
Streamline code
e68cea4
Adjust parameter order
7c9c34a
Adjust parameter rules
b0d7b80
Adjust parameter conventions
f5fcd1b
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
bb1e067
Adjust code; fix parameter-passing rules
34af91d
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
b47cb8d
Add CUDA compilation; update with care, it will not build without CUDA installed
99631b8
support cuda opt
ef025da
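Compile .cu at runtime
a8bcf1f
Make compatible with CUDA compilation; compiler syntax checks
4533a02
cuScaleImage now runs
6240ace
cuOpsinDynamicsImage done
49d74ab
Add the remaining cu entry functions
63ac064
Simplify the code a bit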
12cd120
fix linux build
crazyks 7c2e57d
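After merging Google's changes, each compare StartBlockComparisons recomputes the opsin of the original image
79bce89
Fix crash when processing png
9e8bdb3
Avoid redundant computation in clComputeBlockZeroingOrderEx
43834f7
Static library build
e922dbf
Compiler options
230924b
Adjust the test script to support batch optimization of a directory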
0e0edb1
Merge branch 'master' of https://github.com/ianhuang-777/guetzli
c31af45
c optimization option
891def1
Optimize c code
1f26bc0
Stop optimizing c
742b284
Optimize the c version
b67b00d
Modify the flag for creating CUDA context
crazyks c1bc10c
Add macro for opencl version
crazyks 66a8d9f
Add simple cuda memory pool
ianhuang-777 e11a712
Add missing files
ianhuang-777 36a3ce6
Clean code
ianhuang-777 e42fdab
Modify makefile
644f563
Enable CUDA and OPENCL by default
340d914
Remove tcmalloc; it has little effect on performance
6f2726b
Change memory block status to enum
ianhuang-777 46367ce
Remove tcmalloc
ianhuang-777 8031985
Support non-mainstream JPEG formats
zhantong eda913f
Mofidy makefile
4058d6e
Fix libjpeg library failing to build in debug and 32-bit configurations
zhantong c100839
Translate the comment.
ianhuang-777 5f309e7
Remove some redundant files
crazyks 5aa73ae
Modify makefile
c525adf
Disable CUDA & OpenCL by default
crazyks ba21943
Add netpbm
crazyks 93fd3f3
Fix type cast error on Mac
crazyks 1c1d7e6
Update bazel version to 0.5.2
crazyks 1cb26c7
Add oracle-java8-installer
crazyks 40665e2
Try to fix Bazel build
crazyks 05ee2f8
Add author information
crazyks 808e624
Update ReadMe
crazyks af12f12
Update ReadMe & fix some mistakes
crazyks 14ef86d
Update appveyor.xml
crazyks
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -15,3 +15,5 @@ ipch/ | |
| *.cachefile | ||
| *.VC.db | ||
| *.VC.VC.opendb | ||
| guetzli.vcxproj.user | ||
| clguetzli/clguetzli.cu.ptx* | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -99,3 +99,59 @@ attempts made. | |
| Please note that JPEG images do not support alpha channel (transparency). If the | ||
| input is a PNG with an alpha channel, it will be overlaid on black background | ||
| before encoding. | ||
|
|
||
| # Extra features | ||
|
|
||
| **Note:** Please make sure that you can build guetzli successfully before adding the following features. | ||
|
|
||
| ## Enable CUDA/OpenCL support | ||
|
|
||
| **Note:** Before adding [CUDA](https://developer.nvidia.com/cuda-zone) support, please [check](http://developer.nvidia.com/cuda-gpus) whether your GPU supports CUDA. | ||
|
|
||
| **Note:** If you don't have an NVIDIA card that supports CUDA, you can try [OpenCL](https://www.khronos.org/opencl/) instead. You can install any of the OpenCL SDKs, such as [Intel OpenCL SDK](https://software.intel.com/en-us/intel-opencl), [AMD OpenCL SDK](http://developer.amd.com/tools-and-sdks/opencl-zone/), etc. | ||
|
|
||
| **Note:** The steps for adding OpenCL support are very similar to those for adding CUDA support, so the following instructions cover only CUDA. | ||
|
|
||
| ### On POSIX systems | ||
| 1. Follow the [Installation Guide for Linux](https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/CUDA_Installation_Guide_Linux-pdf) to set up the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit). | ||
| 2. Edit `premake5.lua`, add `defines { "__USE_OPENCL__", "__USE_CUDA__" }` and `links { "OpenCL", "cuda" }` under `filter "action:gmake"`. Then run `premake5 --os=linux gmake` to update the makefile. | ||
| 3. Edit `clguetzli/clguetzli.cl` and add `#define __USE_OPENCL__` as the first line. | ||
| 4. Run `make` and expect the binary to be created in `bin/Release/guetzli`. | ||
| 5. Run `./compile.sh 64` or `./compile.sh 32` to build the 64-bit or 32-bit [ptx](http://docs.nvidia.com/cuda/parallel-thread-execution) file; the ptx file will be copied to `bin/Release/clguetzli`. | ||
|
|
||
| ### On Windows | ||
| 1. Follow the [Installation Guide for Microsoft Windows](https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/CUDA_Installation_Guide_Windows-pdf) to set up the `CUDA Toolkit`. | ||
| 2. Copy `<vs2015 dir>\VC\bin\amd64\vcvars64.bat` to `<guetzli dir>\vcvars64.bat`. | ||
| 3. Open the Visual Studio project and edit the project `Property Pages` as follows: | ||
| * Add `__USE_OPENCL__` and `__USE_CUDA__` to preprocessor definitions. | ||
| * Add `OpenCL.lib` and `cuda.lib` to additional dependencies. | ||
| * Add `$(CUDA_PATH)\include` to include directories. | ||
| * Add `$(CUDA_PATH)\lib\Win32` or `$(CUDA_PATH)\lib\x64` to library directories. | ||
| 4. Edit `clguetzli/clguetzli.cl` and add `#define __USE_OPENCL__` as the first line. | ||
| 5. Build it. | ||
|
|
||
| ### Usage | ||
| ```bash | ||
| guetzli [--c|--cuda|--opencl] [other options] original.png output.jpg | ||
| guetzli [--c|--cuda|--opencl] [other options] original.jpg output.jpg | ||
| ``` | ||
| Pass `--c` to enable the procedure optimization, `--cuda` to use CUDA acceleration, or `--opencl` to use OpenCL acceleration. | ||
|
|
||
| If you have any questions about CUDA/OpenCL support, please contact [email protected], [email protected] or [email protected]. | ||
|
|
||
| ## Enable full JPEG format support | ||
| ### On POSIX systems | ||
| 1. Install [libjpeg](http://libjpeg.sourceforge.net/). | ||
| If using your operating system | ||
| package manager, install development versions of the packages if the | ||
| distinction exists. | ||
| * On Ubuntu, do `apt-get install libjpeg8-dev`. | ||
| * On Fedora, do `dnf install libjpeg-devel`. | ||
| * On Arch Linux, do `pacman -S libjpeg`. | ||
| * On Alpine Linux, do `apk add libjpeg`. | ||
| 2. Edit `premake5.lua`, add `defines {"__SUPPORT_FULL_JPEG__"}` and `links { "jpeg" }` under `filter "action:gmake"`. Then do `premake5 --os=linux gmake` to update the makefile. | ||
| 3. Run `make` and expect the binary to be created in `bin/Release/guetzli` | ||
| ### On Windows | ||
| 1. Install `libjpeg-turbo` using vcpkg: `.\vcpkg install libjpeg-turbo` | ||
| 2. Open the Visual Studio project and add `__SUPPORT_FULL_JPEG__` to preprocessor definitions in the project `Property Pages`. | ||
| 3. Build it. | ||
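
The README diff above asks the reader to edit `clguetzli/clguetzli.cl` by hand and add `#define __USE_OPENCL__` as the first line. As a minimal sketch of how that step could be scripted on a POSIX system, the helper below prepends the define only when it is not already there. The function name `ensure_opencl_define` is hypothetical and not part of this pull request; the sketch also assumes the file uses LF line endings.

```bash
# Hypothetical helper (not part of the PR): make sure a kernel source file
# starts with `#define __USE_OPENCL__`, as the README step describes.
ensure_opencl_define() {
  local file="$1"
  local define='#define __USE_OPENCL__'
  local tmp

  # Nothing to do when the first line already is the define.
  if [ "$(head -n 1 "$file")" = "$define" ]; then
    return 0
  fi

  # Build the new contents in a temporary file, then write them back in place
  # so the original file keeps its permissions.
  tmp="$(mktemp)" || return 1
  { printf '%s\n' "$define"; cat "$file"; } > "$tmp" || return 1
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Calling `ensure_opencl_define clguetzli/clguetzli.cl` from the repository root would then stand in for the manual edit, and running it a second time leaves the file unchanged.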
Maybe create a dropbox email like [email protected] that would go to all 3 of you (and could be adjusted on your end to add/remove people as necessary without having to update these docs)