Image Denoising with GAN

May 4, 2022

1. Introduction

In image understanding (with applications such as information extraction from identity cards, text extraction from scanned documents, object recognition and medical image analysis), one of the main steps is denoising the image, i.e. improving its quality by removing unwanted pixels or patterns introduced by the scanning or capture process. When denoising is done well, we obtain a clean, high-quality image, which improves the quality of the subsequent extraction steps.

Image noise can be caused by intrinsic (e.g., sensor) and extrinsic (e.g., environmental) conditions, which are usually unavoidable in practice. Noise degrades the performance of many practical applications.

Many approaches have been designed to tackle image denoising. Among them, generative models have become one of the most widely studied topics in machine learning, and especially in deep learning. The field recently saw a major breakthrough: Generative Adversarial Networks (GANs) [1].

As we will explain later, a Generative Adversarial Network contains two parts: a generative model and a discriminator model. For a computer vision problem like image denoising, we choose U-Net as the backbone of the generative model. The U-Net architecture is essentially a CNN encoder followed by a CNN decoder, and it was first used for biomedical image segmentation. At La Javaness, we have successfully applied the GAN technique to image denoising problems in our customers' projects.

In this article, we explain our approach step by step through a case study: creating a noisy dataset, implementing a deep learning model and, finally, analyzing the results.

2. Use Cases

As mentioned in Section 1, image denoising benefits many computer vision use cases, such as image restoration, visual tracking, image registration, image segmentation, image classification, image understanding, object recognition and medical image analysis, where recovering the original image or high-quality image content is crucial for strong performance.

For example, in document understanding, the image quality can be very poor because of scanning, photo capture, human handling or the age of the document, which causes many document comprehension issues. This situation occurs in many of our company's client projects. Here, we present a solution for a project on information extraction from ID cards.

Some examples of problems caused by scanning, photo capture and old documents. *(The last image was collected from the Internet for illustration. These are not real dataset images. Essential information is masked.)*

3. Creating a dataset

We created a training dataset of 360 images by downloading images from Google Search with the search term “identity card/passport”. To test the model, we used a real test set of 50 customer-provided ID card images. We converted all images to grayscale.

A portion of the dataset with 40 grayscale images. *(Images collected from the Internet and used for illustration. They are not real dataset images.)*

To train the model, we need pairs of [noisy, ground-truth] images, i.e. a low-quality and a high-quality version of the same image. In our situation, however, we do not always have a real-world noisy image for each ground-truth image, or vice versa. To obtain paired images, we therefore generate synthetic data by adding artificial noise to the clean images. We explain the generation approach in the next section.

4. Noise Generation

Approach 1 — Add real patterns

Creating a synthetic noisy dataset from clean images is an important step towards good results, because learning-based methods perform best on the type of noise they were trained on.

If noisy images are generated artificially from clean images with known types of noise (e.g., additive Gaussian noise, salt-and-pepper noise, noise based on surrounding pixel values), they often differ from real-world noisy images. To address this, the idea is to collect a set of “patterns” (real noise) from real noisy images and add them to clean images. Starting from a high-quality image in the dataset created in the previous step, the method varies the blur level and the brightness of the image, then adds some of the collected “patterns” to it. Examples of the collected patterns cut from real images are listed below:

*Examples of patterns from the real noise images.*

We use the imgaug package (a library for image augmentation in machine learning experiments) and fastai to create the noise (e.g., blur, colour and brightness changes).

The images below are examples of the data augmentation approach described, i.e. adding synthetic noise into real images.

Synthetic noisy images created by adding “patterns”, and by adding a combination of blur, brightness and pattern noise, to the ground-truth image. (Images are collected from the Internet and used for illustration. They are not real dataset images.)

Below is our code for creating the synthetic noisy dataset by the combination of the blur & brightness variation and the addition of noise patterns to the ground images.
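As an illustrative stand-in (a minimal sketch written for this text, not the project's actual code), the combination of blur, brightness variation and pattern pasting could look like the following. It deliberately uses plain NumPy instead of imgaug and fastai so that it stays self-contained. The function names, the blur kernel sizes, the brightness range and the "pixel-wise minimum" rule for pasting a pattern are all assumptions made for the example.

```python
import numpy as np

def box_blur(img, k):
    """Blur a 2-D grayscale image with a k x k mean filter (k odd)."""
    img = img.astype(np.float32)
    if k <= 1:
        return img
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    # Sum the k*k shifted copies of the image, then average them.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def adjust_brightness(img, factor):
    """Scale pixel intensities and keep them in [0, 255]."""
    return np.clip(img * factor, 0, 255)

def paste_pattern(img, pattern, top, left):
    """Overlay a noise pattern by keeping the darker pixel at each position."""
    out = img.copy()
    h, w = pattern.shape
    out[top:top + h, left:left + w] = np.minimum(out[top:top + h, left:left + w], pattern)
    return out

def make_noisy(clean, patterns, rng):
    """Return a synthetic noisy version of `clean` (hypothetical pipeline).

    Assumes every pattern is no larger than the clean image.
    """
    k = int(rng.choice([1, 3, 5]))       # assumed blur levels
    factor = rng.uniform(0.6, 1.4)       # assumed brightness range
    noisy = adjust_brightness(box_blur(clean, k), factor)
    pattern = patterns[int(rng.integers(len(patterns)))]
    top = int(rng.integers(0, clean.shape[0] - pattern.shape[0] + 1))
    left = int(rng.integers(0, clean.shape[1] - pattern.shape[1] + 1))
    return paste_pattern(noisy, pattern, top, left).astype(np.uint8)

# Usage: a blank white "document" and one dark 8x8 stain as the only pattern.
rng = np.random.default_rng(0)
clean = np.full((64, 96), 255, dtype=np.uint8)
stain = np.full((8, 8), 40, dtype=np.uint8)
noisy = make_noisy(clean, [stain], rng)
pair = (noisy, clean)  # one [noisy, ground-truth] training pair
```

Each call draws a new blur level, brightness factor and pattern position, so running `make_noisy` several times on the same clean image yields several distinct training pairs.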

Approach 2 — Grayscale threshold variation

In our project, as ID cards stored in our customer’s server are grayscale images, we also introduced a different method to create the noisy image: changing the binarization threshold when converting from RGB to grayscale. For example, a threshold of 128, which makes the image sharper, can be used to generate a “high-quality” image (as shown on the left in the image below), while a threshold of 64, which makes the image darker, can be used to generate a “noisy” image (as shown on the right).

Ground-truth image (left) and noisy image (right). *(Source images are collected from the Internet and used for illustration. They are not real dataset images.)*
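The threshold variation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names are hypothetical, the grayscale conversion uses the common BT.601 luma weights, and the polarity (pixels at or above the threshold become white) is a choice made here. The tool used in the project may follow a different convention, which changes which threshold looks darker.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Luma-weighted grayscale (BT.601 weights), values in [0, 255]."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights

def binarize(gray, threshold):
    """Map pixels >= threshold to white (255) and the rest to black (0)."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

def make_pair(rgb, clean_threshold=128, noisy_threshold=64):
    """Build a (noisy, clean) pair from one RGB image using two thresholds."""
    gray = rgb_to_gray(rgb)
    return binarize(gray, noisy_threshold), binarize(gray, clean_threshold)

# Usage on a random RGB image standing in for a scanned ID card.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)
noisy, clean = make_pair(rgb)
```

The thresholds 128 and 64 are the two values quoted above; in practice one would likely sample several thresholds to get more varied pairs.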

Combining the two approaches, we got pairs of [noisy, high-quality] images for training.

5. Training GAN

Many deep learning-based denoising and super-resolution models are trained with GAN methods. GAN stands for Generative Adversarial Nets, introduced by Ian Goodfellow and colleagues [1]. The concept is that we train two models at the same time: a generator (G) and a discriminator, also called the critic (D). The critic tries to tell real images from generated ones. The generator tries to fool the critic by creating new images similar to those in the dataset. The generator returns an image, and the critic returns a single number (usually a probability, close to 0 for fake images and close to 1 for real images).

We train the generator and the critic against each other: the objective function is maximized with respect to the critic and minimized with respect to the generator (whereas in ordinary training we only minimize a loss function).

Formulation of the optimization problem.
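In the notation of the original GAN paper [1], this objective is usually written as below, where D(x) is the critic's estimated probability that x is a real image and G(z) is the image produced by the generator from its input z. In a denoising setting, one can read z as the noisy input image; that reading is our interpretation here rather than part of the original formulation.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The critic D pushes V up by scoring real images near 1 and generated images near 0, while the generator G pushes V down by making D(G(z)) close to 1.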

GANs are usually complex (two networks trained together), have a min-max objective rather than a loss that is simply minimized, and are quite sensitive to hyperparameters. It is therefore often necessary to initialize a GAN by pre-training the generator and the critic.

Let’s dive into the training details. The first step is to pre-train the generator. As mentioned earlier, we use U-Net as the backbone of the generator:

Our U-Net has a ResNet34 encoder (ResNet34 forward) followed by a ResNet34 decoder (ResNet34 backward), and the two networks are linked by skip connections. The encoder abstracts the information in the image, while the decoder reconstructs it. The skip connections pass encoder feature maps directly to the matching decoder stages, which gives the reconstruction access to fine detail that would otherwise be lost in the bottleneck. We train the U-Net so that the encoder receives a noisy image and the decoder outputs the corresponding clean image. The loss function is the mean squared error (MSE) over image pixels, so at this stage we try to reconstruct the clean image pixel by pixel.
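To make the pre-training loss concrete, here is a minimal NumPy sketch of a pixel-wise MSE between a generated image and its clean target. The function name is hypothetical, and the sketch assumes both images have the same shape and are scaled to [0, 1]. In an actual training loop the same quantity would be computed by the deep learning framework so that gradients can flow back through the U-Net.

```python
import numpy as np

def pixel_mse(predicted, target):
    """Mean squared error averaged over all pixels of two same-shape images."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((predicted - target) ** 2))

# Usage: a prediction that is off by 0.5 on every pixel.
clean = np.zeros((4, 4))
generated = clean + 0.5
loss = pixel_mse(generated, clean)  # 0.5 ** 2 = 0.25
```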

The second step is to pre-train the critic. We use the images produced by the generator in the previous step, together with the original images in the dataset, to train a classifier that distinguishes fake (generated) images from real (original) ones. The loss function is the usual binary cross-entropy loss for binary classification. Note that in these first two steps each loss is simply minimized, with no adversarial term.
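The critic's pre-training loss can be illustrated with a short NumPy sketch of binary cross-entropy, where label 1 means real and label 0 means generated. The function name and the sample probabilities are made up for the example; it only shows how the loss rewards a critic that separates the two classes.

```python
import numpy as np

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean BCE between predicted probabilities and 0/1 labels."""
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1 - eps)
    y = np.asarray(labels, dtype=np.float64)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Two real images (label 1) followed by two generated images (label 0).
labels = [1, 1, 0, 0]
confident = binary_cross_entropy([0.9, 0.8, 0.2, 0.1], labels)  # good critic
undecided = binary_cross_entropy([0.5, 0.5, 0.5, 0.5], labels)  # equals log 2
```

A critic that cannot tell the two classes apart sits at log 2; pre-training aims to bring the loss well below that value before the adversarial phase starts.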

The code for this step is taken from the fastai course notebook: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb

The third (and final) step is to combine the two models and train the U-Net with the GAN method. The loss is now a min-max objective: it no longer simply decreases during training, which reflects the competition between the two networks, the generator and the critic. We still try to reconstruct the clean image, but the critic now judges the result as a whole rather than pixel by pixel, so the output should look more realistic.
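The competition can be seen numerically with a small NumPy sketch of the standard GAN value function V(D, G) = mean(log D(x)) + mean(log(1 - D(G(z)))) from [1]. The function name and the critic scores below are invented for illustration; the sketch only evaluates the objective and does not train anything.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-7):
    """Value function: mean(log D(x)) + mean(log(1 - D(G(z))))."""
    d_real = np.clip(np.asarray(d_real, dtype=np.float64), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

# Critic scores on real images stay at 0.9 in both scenarios.
critic_wins = gan_value(d_real=[0.9, 0.9], d_fake=[0.1, 0.1])     # fakes detected
generator_wins = gan_value(d_real=[0.9, 0.9], d_fake=[0.9, 0.9])  # fakes pass as real
balanced = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])        # critic is unsure
```

The value is higher when the critic detects the fakes and lower when the generator fools it, which is why the two networks pull the same quantity in opposite directions. When the critic outputs 0.5 everywhere, the value equals -log 4.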

The code for this step is taken from the fastai course notebook: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb

For a full tutorial on GANs with fastai, see:
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb