CUDA
This category describes the ImageDev algorithms GPU-accelerated with the CUDA Toolkit.
- Geometry And Matching: This category provides algorithms for performing geometrical transformations and detecting predefined patterns.
- Image Filtering: This category gathers filtering algorithms (for example, for denoising an image or enhancing its contrast).
- Mathematical Morphology: This category introduces a theory for the analysis of geometrical structures.
Overview
Most ImageDev algorithms are CPU parallelized. Taking advantage of these accelerations is transparent to the user and automatically managed by the algorithms' default implementation.

In addition to CPU parallelization, ImageDev also offers a collection of algorithms optimized for GPU computing. These algorithms are built with the NVIDIA® CUDA® Toolkit and are provided as separate implementations, prefixed with the Cuda keyword. The main motivations for creating new algorithms, rather than automatically invoking the GPU implementations, are:
- Depending on the available hardware and the specific parameters chosen, the CPU implementation of an algorithm may actually be faster than the GPU implementation.
- Some parameters are proposed to perform out-of-core processing when the input data doesn't fit in GPU memory.
Prerequisites
To execute any GPU-accelerated ImageDev algorithm, you need:
- An NVIDIA GPU supporting CUDA Compute Capability 3.5 or higher, with up-to-date drivers. Compatible GPUs are listed here.
- CUDA 12 installed, with the path to its binary files defined in the system path. The CUDA download page is here.
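As a quick sanity check of the second prerequisite, you can verify that the CUDA binary directory is on the system path before running GPU-accelerated algorithms. The helper below is a minimal sketch, not part of the ImageDev API; it simply looks for the nvcc compiler that ships with the CUDA Toolkit:

```python
import shutil


def cuda_toolkit_on_path():
    """Return the path to the nvcc compiler if the CUDA Toolkit's
    binary directory is on the system path, else None."""
    return shutil.which("nvcc")


if cuda_toolkit_on_path() is None:
    print("CUDA Toolkit binaries not found on the system path")
```

Note that this only confirms the toolkit installation is reachable; it does not check the GPU's compute capability or driver version.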
Memory Management
The IOLink ImageView objects can be allocated either in CPU memory or in CUDA GPU memory.
- For an algorithm implemented in CUDA:
- If it receives a CPU ImageView as input, it transfers it to CUDA memory, applies the algorithm on the GPU, and then provides a CUDA ImageView as output.
- If it receives a CUDA ImageView as input, it directly applies the algorithm on the GPU, and then provides a CUDA ImageView as output.
- For an algorithm implemented in CPU:
- If it receives a CPU ImageView as input, it directly applies the algorithm on the CPU, and then provides a CPU ImageView as output.
- If it receives a CUDA ImageView as input, it transfers it to CPU memory, applies the algorithm on the CPU, and then provides a CPU ImageView as output.
#include <iolink/cuda/CudaImageFactory.h>
#include <iolink/view/ImageViewFactory.h>

// Transfer a CPU image to CUDA memory
auto imageCuda = iolink_cuda::CudaImageFactory::copyInCudaMemory( imageCpu );
// This image can now be directly processed on the GPU, and then transferred to CPU memory
auto imageCpuNew = iolink::ImageViewFactory::copyInMemory( imageCuda );
import iolink
import iolink_cuda

# Transfer a CPU image to CUDA memory
image_cuda = iolink_cuda.CudaImageFactory.copy_in_cuda_memory(image_cpu)
# This image can now be directly processed on the GPU, and then transferred to CPU memory
image_cpu_new = iolink.ImageViewFactory.copy_in_memory(image_cuda)
using IOLink;
using IOLink_Cuda;

// Transfer a CPU image to CUDA memory
var imageCuda = CudaImageFactory.CopyInCudaMemory( imageCpu );
// This image can now be directly processed on the GPU, and then transferred to CPU memory
var imageCpuNew = ImageViewFactory.CopyInMemory( imageCuda );
Tiling
If the image to be processed is large, ImageDev can process it tile by tile to reduce the GPU memory requirements. The tiling process is defined by a tiling mode and a size parameter. An overlap is deduced from the selected parameters and set automatically. For example, when applying an erosion of size 10 with tiling enabled, tiles are extracted with an overlap of 10 pixels.

The tiling step can be skipped by setting the tiling mode to NONE.
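The overlap principle described above can be illustrated with a small, hypothetical Python helper (it is not part of the ImageDev API). Each tile's core region is expanded on both sides by the overlap so that a neighborhood filter such as an erosion of size 10 produces correct results at tile borders:

```python
def tile_ranges(length, tile_size, overlap):
    """Split the range [0, length) into tiles of up to tile_size core
    pixels, each expanded by `overlap` pixels on both sides and
    clamped to the image bounds. Returns (start, end) pairs."""
    ranges = []
    start = 0
    while start < length:
        core_end = min(start + tile_size, length)
        ranges.append((max(start - overlap, 0), min(core_end + overlap, length)))
        start = core_end
    return ranges


# Erosion of size 10 on a 1000-pixel axis, processed in 256-pixel tiles:
print(tile_ranges(1000, 256, 10))
# [(0, 266), (246, 522), (502, 778), (758, 1000)]
```

Adjacent tiles share 2 × overlap pixels, so the extra pixels read per tile grow with the filter size, which is why the overlap is deduced from the algorithm's parameters rather than chosen by the user.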
© 2025 Thermo Fisher Scientific Inc. All rights reserved.