ImageDev

Deep learning

This group contains algorithms performing a prediction from a fully convolutional neural network.

Overview

Among machine learning methods, deep learning has proved especially valuable in many image processing tasks. Deep learning models can be trained from a set of input images and the corresponding target results, such as manual segmentations reviewed by an expert. They can then be applied to predict results automatically from previously unseen images.

Deep learning refers to neural network models that contain several layers of neurons. Each neuron combines pieces of input data, or results from other neurons, to produce a result. The combination is realized through a weighted sum, where each weight corresponds to a parameter of the model. Deep learning models can easily involve millions of such parameters.
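As an illustration of this weighted combination, the following minimal sketch (plain Python, not part of the ImageDev API; the function name and values are hypothetical) computes the output of a single neuron as a weighted sum of its inputs followed by a ReLU activation, a common non-linearity in convolutional networks:

```python
def neuron(inputs, weights, bias):
    # Weighted sum of the inputs; each weight is a learnable parameter
    # of the model, as is the bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU activation: negative sums are clipped to zero.
    return max(0.0, z)

# Example: three inputs combined by one neuron.
print(neuron([0.5, -1.0, 2.0], [0.2, 0.4, 0.1], 0.05))  # -> 0.0
```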

Making predictions using a pre-trained model is a straightforward task. The deep learning prediction tools available in ImageDev can apply a variety of trained models, provided they are designed for 2D image processing tasks, such as image restoration or segmentation.

Prerequisites

To benefit from GPU optimizations when launching a prediction with ImageDev, some prerequisites must be met. If the verbose mode is activated, the standard output indicates, for each call to an ONNX command, whether the command is executed on the CPU or on the GPU.

ONNX

ImageDev prediction tools use the Open Neural Network Exchange (ONNX) runtime. ONNX is an interoperable framework enabling collaboration in the AI community. The ONNX framework provides tools for executing AI operations and a data model for representing convolutional neural networks.

ONNX models are the input of ImageDev prediction tools.

ImageDev relies on ONNX Runtime 1.8.2, which can run models with an opset version of 14 or lower.
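The opset constraint can be checked before attempting a prediction. The sketch below is illustrative, not an ImageDev API; the helper name is hypothetical. With the `onnx` Python package, a model's opset can typically be read via `onnx.load("my_model.onnx").opset_import[0].version` (assuming the default domain entry comes first):

```python
# ONNX Runtime 1.8.2, used by ImageDev, supports opset versions up to 14.
MAX_SUPPORTED_OPSET = 14

def is_supported(opset_version):
    # Returns True if a model with this opset can be run by the
    # prediction tools, False otherwise.
    return opset_version <= MAX_SUPPORTED_OPSET
```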

Model conversion

Some models can be found directly on the ONNX Model Zoo. TensorFlow and Keras models can be converted to ONNX with a Python script.

The following snippet assumes that a trained model has been created in the my_path folder. This model is composed of a weight file my_model.hdf5 and a configuration file my_model.json.
import keras
import tf2onnx
import tensorflow as tf

# Load the network architecture from the JSON configuration file
with open(my_path + 'my_model.json', 'r') as json_file:
    model_json = json_file.read()
model_keras = keras.models.model_from_json(model_json)

# Load the trained weights
model_keras.load_weights(my_path + 'my_model.hdf5')

# Define the input signature: a batch of single-channel 2D images
spec = (tf.TensorSpec((None, None, None, 1), tf.float32, name="input"),)

# Convert the Keras model to ONNX with opset 13 and save it
model_onnx, _ = tf2onnx.convert.from_keras(
    model_keras, input_signature=spec, opset=13,
    output_path=my_path + "my_model.onnx")

Pre-processing

Before performing a prediction, a set of operations can be sequentially applied to prepare the data in accordance with what the model expects and how it has been trained.

Normalization

An optional normalization can be applied to the input to map its values to the range expected by the model.

Several normalization modes are available. The normalization can be applied either individually to each image of the input batch, or globally to the whole input batch.
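As an illustration, one common normalization option (min-max scaling to [0, 1]) is sketched below with NumPy. The function and parameter names are hypothetical and do not reflect the actual ImageDev parameters; the sketch only shows the difference between per-image and per-batch normalization:

```python
import numpy as np

def min_max_normalize(batch, per_image=True):
    # batch: array of shape (N, H, W).
    if per_image:
        # Minimum and maximum computed independently for each image.
        mins = batch.min(axis=(1, 2), keepdims=True)
        maxs = batch.max(axis=(1, 2), keepdims=True)
    else:
        # Minimum and maximum computed over the whole batch.
        mins, maxs = batch.min(), batch.max()
    return (batch - mins) / (maxs - mins)
```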

Tiling

If the image to be processed is large, ImageDev can process it tile by tile to reduce GPU memory requirements. The tiling process is defined by a size parameter and an overlap. The patches sent to the prediction always have the size defined by the tileSize parameter. Using an overlap avoids incorrect predictions at the tile borders.

Using tiles presents several benefits, notably reducing the memory footprint of a prediction. Note that the size of the tiles to be processed depends on the architecture of the model: to retrieve a coherent image at the output of the expansive phase, the dimensions of each tile must be compatible with the number of downsamplings performed by the model in its contracting phase. Each component of the tile size must therefore be a multiple of $M = 2^N$, where $N$ is the number of downsampling (or, equivalently, upsampling) layers; it can be determined by counting the upsampling layers of the model. If this condition is not met, an exception is raised.

If the input dimensions are a multiple of $M$, the tiling step can be skipped by setting a tile size equal to the input image size.
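The multiple-of-$M$ constraint can be sketched as follows (an illustrative helper, not an ImageDev function), rounding a requested tile size up to the next valid value:

```python
def valid_tile_size(requested, n_levels):
    # Each tile dimension must be a multiple of M = 2^N, where N is the
    # number of downsampling levels in the model's contracting phase.
    m = 2 ** n_levels
    # Round the requested size up to the next multiple of M.
    return ((requested + m - 1) // m) * m

# Example: a U-Net-like model with 4 downsampling levels (M = 16).
print(valid_tile_size(500, 4))  # -> 512
```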

Data format

Most deep learning models expect a 4D tensor as input. ImageDev can convert the input data set to the NCHW or NHWC tensor layouts commonly used by the deep learning community. The layout expected by the model is specified with the dataFormat parameter; ImageDev then automatically converts the input data set to that layout.
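The difference between the two layouts amounts to the position of the channel axis, as the following NumPy sketch shows (illustrative helpers, not ImageDev functions):

```python
import numpy as np

def nhwc_to_nchw(batch):
    # Move the channel axis from last position (NHWC) to second (NCHW).
    return np.transpose(batch, (0, 3, 1, 2))

def nchw_to_nhwc(batch):
    # Inverse conversion: move the channel axis back to last position.
    return np.transpose(batch, (0, 2, 3, 1))

# Example: a batch of two 64x64 RGB images.
x = np.zeros((2, 64, 64, 3))           # NHWC
print(nhwc_to_nchw(x).shape)           # -> (2, 3, 64, 64)
```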