I have a tendency to choose exactly the wrong thing whenever I am given a choice, and I have sometimes wondered why. The good thing about doing things wrong is that you get to learn about things. I have a feeling that doing things wrong, getting feedback and correcting is somehow fundamental to the way learning happens.
I usually start learning a technology or language by jumping right in, doing things wrong and learning on the way. If you are like me, then this will save you some time and some hair pulling.
Before we start, just a very short introduction to the why part; trust me, just the bare essentials.
OpenCV operates on images, which in computers (at least the ones we have now) are stored as pixel matrices. The various algorithms that OpenCV provides, for object detection for example, do a lot of matrix operations. These operations are 'embarrassingly parallel' (data parallel) and can be sped up if executed on the GPU.
NVIDIA GPUs have a parallel programming API called CUDA, which can help in speeding up matrix multiplication. OpenCV has support for it; to use it, however, you need to compile OpenCV with CUDA. CUDA is proprietary to NVIDIA and works only with NVIDIA GPUs.
There is also an open API which should work with different types of GPU cards, and that is OpenCL. However, it may not be as well tuned for a particular card. OpenCV has support for OpenCL too; for now, however, we will use CUDA.
Finally, one more thing: OpenCV's CUDA build can use a BLAS library, and the CUDA SDK provided by NVIDIA has the cuBLAS library for it. Don't ask me why I chose to compile OpenBLAS for it; as I said, CMake gives a lot of choices, and if you don't know as much as the above, you are sure to do some totally unnecessary but very instructive things.
Okay, now to the how, on Windows.
First check if your PC or laptop has an NVIDIA card. The easiest way to do this is via the dxdiag Windows utility.
The next step is to download the CUDA SDK from NVIDIA: https://developer.nvidia.com/cuda-downloads. If you have a 64-bit system, download the 64-bit SDK. Choose the defaults and install it.
Then download the OpenCV source code from Git and download the CMake tool. You also need to download MS Visual Studio Community edition for the C++ compiler.
The main thing for a correct compilation is choosing the right settings in CMake. First, these are the minimum WITH variables that need to be configured
Then I found that the best way to reduce the compile time was to limit the architecture to the number I thought the GPU card supported. In my case, for the GeForce GT 720M card, the CUDA wiki page gave the architecture code name as Fermi and the compute capability as 2.1. That did not work, so I gave 2.0 and found that the compile time decreased considerably.
After that, click Configure and make sure you select the 64-bit Visual Studio compiler. Select 32-bit or make some other mistake and you will be led to a lot of configuration errors.
With the 64-bit compiler selected, CMake will automatically pick the 64-bit libraries from the CUDA SDK. Otherwise it will try to take the 32-bit libraries and you may get a configuration error about BLAS.
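The same choices can be written down as a CMake command line instead of clicking through the GUI. What follows is only a sketch under assumptions of mine: that WITH_CUDA and WITH_CUBLAS are the WITH variables to switch on, that the generator is the 64-bit Visual Studio 2015 one, and that the script runs from a Bash-like prompt (Git Bash, for example). The source folder D:/opencv is the one used elsewhere in this post, and configure_cmd is a helper name made up for illustration. The script only prints the command so that you can review it before running it from your build folder.

```shell
# Sketch only: print the configure command so it can be reviewed before running.
SRC_DIR="D:/opencv"                       # OpenCV source checkout
GENERATOR="Visual Studio 14 2015 Win64"   # assumption: 64-bit Visual Studio 2015 generator
CUDA_ARCH="2.0"                           # the compute capability that worked for the GT 720M

# Hypothetical helper: assembles the cmake call as one line of text.
configure_cmd() {
  printf 'cmake -G "%s" -D WITH_CUDA=ON -D WITH_CUBLAS=ON -D CUDA_ARCH_BIN=%s "%s"\n' \
    "$GENERATOR" "$CUDA_ARCH" "$SRC_DIR"
}

configure_cmd
```

Restricting CUDA_ARCH_BIN to a single value means the CUDA kernels are compiled for one architecture only, which is where the saving in compile time comes from.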
With that you should be able to compile OpenCV. Note that when I used the default CUDA_ARCH_BIN setting, which goes all the way from 1 to 5, I got some linker errors:
Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol __cudaRegisterLinkedBinary_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89 referenced in function "void __cdecl __sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89(void)" (?__sti____cudaRegisterAll_54_tmpxft_000028d8_00000000_15_gpu_mat_compute_37_cpp1_ii_71482d89@@YAXXZ) opencv_core D:\build\opencv2\modules\core\cuda_compile_generated_gpu_mat.cu.obj 1
For a program using the OpenCV built above, most of the libraries given below are usually needed. If your build of OpenCV is proper, you will get this many DLLs in the output folder. If some are missing, try to build them from Visual Studio.
If you get include errors see the link
Finally, check with GPU-Z whether running the program is really using the GPU.
Note: for building your own OpenCV solutions (1), these libs and the following headers have to be added to the Visual Studio project settings.
(1) People detection example - https://gist.github.com/alexcpn/aeb8a4b8304639d8f91cc2fbc0c1c7df
C/C++ --> General --> Additional Include Directories
- D:\opencv\modules\calib3d\include;D:\opencv\modules\videoio\include;D:\opencv\modules\video\include;D:\opencv\modules\imgcodecs\include;D:\opencv\modules\cudaoptflow\include;D:\opencv\modules\cudastereo\include;D:\build\opencv4;d:\opencv\modules\core\include;D:\opencv\modules\cudawarping\include;D:\opencv\include;D:\opencv\modules\cudaobjdetect\include;D:\opencv\modules\cudaimgproc\include;D:\opencv\modules\imgproc\include;D:\opencv\modules\highgui\include;D:\opencv\modules\objdetect\include;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
Note opencv2/opencv_modules.hpp is from the opencv build folder d:\build\opencv4\opencv2
Linker-->Input--> Additional Dependencies --> opencv_calib3d320.lib;opencv_core320.lib;opencv_features2d320.lib;opencv_flann320.lib;opencv_highgui320.lib;opencv_imgcodecs320.lib;opencv_imgproc320.lib;opencv_ml320.lib;opencv_objdetect320.lib;opencv_shape320.lib;opencv_ts320.lib;opencv_video320.lib;opencv_videoio320.lib;opencv_cudaimgproc320.lib;opencv_cudaarithm320.lib;opencv_cudabgsegm320.lib;opencv_cudacodec320.lib;opencv_cudalegacy320.lib;opencv_cudaobjdetect320.lib;opencv_cudawarping320.lib;opencv_cudev320.lib;opencv_cudafilters320.lib;%(AdditionalDependencies)
Lib Directories : - D:\build\opencv4\lib\Release
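The library names above all end in 320, which is the OpenCV version (3.2.0) with the dots removed, so the list has to be retyped for any other version. A small shell loop can generate it instead. This is just a sketch: the module list is a shortened subset of the one above, and VERSION_SUFFIX, MODULES and make_lib_list are names I made up for illustration.

```shell
VERSION_SUFFIX="320"   # OpenCV 3.2.0; change to match your build
# Shortened, illustrative subset of the modules listed above.
MODULES="core imgproc imgcodecs highgui objdetect cudaimgproc cudaobjdetect cudawarping"

# Hypothetical helper: prints a value for Linker --> Input --> Additional Dependencies.
make_lib_list() {
  list=""
  for m in $MODULES; do
    list="${list}opencv_${m}${VERSION_SUFFIX}.lib;"
  done
  # Keep the Visual Studio placeholder for inherited dependencies at the end.
  printf '%s%%(AdditionalDependencies)\n' "$list"
}

make_lib_list
```

Paste the printed line into the Additional Dependencies field; a module that was not built will show up as a missing .lib at link time.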
Here is what I did to install the latest OpenCV on an x86 64-bit machine running Ubuntu
# Do this at the beginning; I did not, got some broken package errors further down, and had to do it then; you learn the hard way :)
sudo apt-get -y update
sudo apt-get -y upgrade
sudo apt-get -y autoremove
sudo apt-get install -y build-essential cmake git pkg-config
# Video codecs; this many are not given on the OpenCV site, I got them from some other blog; I am not sure what the bare minimum is
sudo apt-get install -y libdc1394-22-dev libavcodec-dev libavformat-dev libswscale-dev libtheora-dev libvorbis-dev libxvidcore-dev libx264-dev yasm libopencore-amrnb-dev libopencore-amrwb-dev libv4l-dev libxine2-dev
sudo apt-get install -y libtbb2 libtbb-dev libeigen3-dev
sudo apt-get install -y libgtk2.0-dev qt5-default
# Image formats
sudo apt-get install -y zlib1g-dev libjpeg-dev libwebp-dev libpng-dev libtiff-dev libtiff5-dev libjasper-dev libopenexr-dev libgdal-dev