The convolutional layer in convolutional neural networks systematically applies filters to an input and creates output feature maps.

Although the convolutional layer is very simple, it is capable of achieving sophisticated and impressive results. Nevertheless, it can be challenging to develop an intuition for how the shape of the filters impacts the shape of the output feature map and how related configuration hyperparameters such as padding and stride should be configured.

In this tutorial, you will discover an intuition for filter size, the need for padding, and stride in convolutional neural networks.

After completing this tutorial, you will know:

  • How filter size or kernel size impacts the shape of the output feature map.
  • How the filter size creates a border effect in the feature map and how it can be overcome with padding.
  • How the stride of the filter on the input image can be used to downsample the size of the output feature map.

Let’s get started.

A Gentle Introduction to Padding and Stride for Convolutional Neural Networks
Photo by Red~Star, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Convolutional Layer
  2. Problem of Border Effects
  3. Effect of Filter Size (Kernel Size)
  4. Fix the Border Effect Problem With Padding
  5. Downsample Input With Stride

Convolutional Layer

In a convolutional neural network, a convolutional layer is responsible for the systematic application of one or more filters to an input.

Applying the filter to a filter-sized patch of the input results in a single output value. The input is typically a three-dimensional image (e.g. rows, columns, and channels), and in turn, the filters are also three-dimensional, with the same number of channels as the input image but fewer rows and columns. As such, the filter is repeatedly applied to each part of the input image, resulting in a two-dimensional map of activations, called a feature map.

Keras provides an implementation of the convolutional layer called a Conv2D.

It requires that you specify the expected shape of the input images in terms of rows (height), columns (width), and channels (depth) or [rows, columns, channels].

The filter contains the weights that must be learned during the training of the layer. The filter weights represent the structure or feature that the filter will detect and the strength of the activation indicates the degree to which the feature was detected.

The layer requires that both the number of filters and the shape of the filters be specified.

We can demonstrate this with a small example. In this example, we define a single input image or sample that has one channel and is an eight pixel by eight pixel square with all 0 values and a two-pixel wide vertical line in the center.

# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)

Next, we can define a model that expects input samples to have the shape (8, 8, 1) and has a single hidden convolutional layer with a single filter with the shape of three pixels by three pixels.

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))
# summarize model
model.summary()

The filter is initialized with random weights as part of the initialization of the model. We will overwrite the random weights and hard code our own 3×3 filter that will detect vertical lines.

That is, the filter will strongly activate when it detects a vertical line and weakly activate when it does not. We expect that by applying this filter across the input image, the output feature map will show that the vertical line was detected.

# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)

Next, we can apply the filter to our input image by calling the predict() function on the model.

# apply filter to input data
yhat = model.predict(data)

The result is a four-dimensional output with one batch, a given number of rows and columns, and one filter, or [batch, rows, columns, filters].

We can print the activations in the single feature map to confirm that the line was detected.

# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Tying all of this together, the complete example is listed below.

# example of using a single convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first summarizes the structure of the model.

Of note is that the single hidden convolutional layer will take the 8×8 pixel input image and will produce a feature map with the dimensions of 6×6. We will go into why this is the case in the next section.

We can also see that the layer has 10 parameters: that is, nine weights for the filter (3×3) and one weight for the bias.
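
As a quick check, the parameter count of a Conv2D layer can be computed by hand. Below is a minimal sketch (the helper function is our own, not part of Keras):

# sketch: Conv2D parameters = kernel rows x kernel columns x input channels
# x number of filters, plus one bias weight per filter
def conv2d_params(kernel_rows, kernel_cols, channels, filters):
	return kernel_rows * kernel_cols * channels * filters + filters

print(conv2d_params(3, 3, 1, 1)) # 10, matching the model summary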

Finally, the feature map is printed. We can see from reviewing the numbers in the 6×6 matrix that indeed the manually specified filter detected the vertical line in the middle of our input image.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

Problem of Border Effects

In the previous section, we defined a single filter with the size of three pixels high and three pixels wide (rows, columns).

We saw that the application of the 3×3 filter, referred to as the kernel size in Keras, to the 8×8 input image resulted in a feature map with the size of 6×6.

That is, the input image with 64 pixels was reduced to a feature map with 36 pixels. Where did the other 28 pixels go?

The filter is applied systematically to the input image. It starts at the top left corner of the image and is moved from left to right one pixel column at a time until the edge of the filter reaches the edge of the image.

For a 3×3 pixel filter applied to an 8×8 input image, we can see that it can only be applied six times across, resulting in a width of six in the output feature map.
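
More generally, the number of valid positions of the filter along one dimension is the input size minus the kernel size, plus one. A small sketch confirms the arithmetic:

# sketch: number of valid filter positions along one dimension
def valid_positions(input_size, kernel_size):
	return input_size - kernel_size + 1

print(valid_positions(8, 3)) # 6: a 3x3 filter on an 8x8 image gives a 6x6 map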

For example, let’s work through each of the six patches across the top of the input image (left), taking the dot product (“.” operator) with the filter (right):

0, 0, 0   0, 1, 0
0, 0, 0 . 0, 1, 0 = 0
0, 0, 0   0, 1, 0

Moved right one pixel:

0, 0, 1   0, 1, 0
0, 0, 1 . 0, 1, 0 = 0
0, 0, 1   0, 1, 0

Moved right one pixel:

0, 1, 1   0, 1, 0
0, 1, 1 . 0, 1, 0 = 3
0, 1, 1   0, 1, 0

Moved right one pixel:

1, 1, 0   0, 1, 0
1, 1, 0 . 0, 1, 0 = 3
1, 1, 0   0, 1, 0

Moved right one pixel:

1, 0, 0   0, 1, 0
1, 0, 0 . 0, 1, 0 = 0
1, 0, 0   0, 1, 0

Moved right one pixel:

0, 0, 0   0, 1, 0
0, 0, 0 . 0, 1, 0 = 0
0, 0, 0   0, 1, 0

That gives us the first row of the output feature map:

0.0, 0.0, 3.0, 3.0, 0.0, 0.0

The reduction in the size of the feature map relative to the input is referred to as the border effect. It is caused by the interaction of the filter with the border of the image.

This is often not a problem for large images and small filters but can be a problem with small images. It can also become a problem once a number of convolutional layers are stacked.

For example, below is the same model updated to have two stacked convolutional layers.

This means that a 3×3 filter is applied to the 8×8 input image to result in a 6×6 feature map as in the previous section. A 3×3 filter is then applied to the 6×6 feature map.

# example of stacked convolutional layers
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))
model.add(Conv2D(1, (3,3)))
# summarize model
model.summary()

Running the example summarizes the shape of the output from each layer.

We can see that the application of filters to the feature map output of the first layer, in turn, results in a smaller 4×4 feature map.

This can become a problem as we develop very deep convolutional neural network models with tens or hundreds of layers. We will simply run out of data in our feature maps upon which to operate.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 1)           10
=================================================================
Total params: 20
Trainable params: 20
Non-trainable params: 0
_________________________________________________________________

Effect of Filter Size (Kernel Size)

Different sized filters will detect different sized features in the input image and, in turn, will result in differently sized feature maps.

It is common to use 3×3 sized filters, and perhaps 5×5 or even 7×7 sized filters, for larger input images.

For example, below is the model with a single filter updated to use a filter size of 5×5 pixels.

# example of a convolutional layer
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (5,5), input_shape=(8, 8, 1)))
# summarize model
model.summary()

Running the example demonstrates that the 5×5 filter can only be applied to the 8×8 input image 4 times, resulting in a 4×4 feature map output.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 4, 4, 1)           26
=================================================================
Total params: 26
Trainable params: 26
Non-trainable params: 0
_________________________________________________________________

To further develop the intuition for the relationship between filter size and the output feature map, it may help to look at two extreme cases.

The first is a filter with the size of 1×1 pixels.

# example of a convolutional layer
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (1,1), input_shape=(8, 8, 1)))
# summarize model
model.summary()

Running the example demonstrates that the output feature map has the same size as the input, specifically 8×8. This is because a 1×1 filter can be centered on every pixel, so there is no border effect; the filter has only a single weight (and a bias).

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 8, 8, 1)           2
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

The other extreme is a filter with the same size as the input, in this case, 8×8 pixels.

# example of a convolutional layer
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (8,8), input_shape=(8, 8, 1)))
# summarize model
model.summary()

Running the example, we can see that, as you might expect, there is one weight for each pixel in the input image (64 + 1 for the bias) and that the output is a feature map with a single pixel.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 1, 1, 1)           65
=================================================================
Total params: 65
Trainable params: 65
Non-trainable params: 0
_________________________________________________________________

Now that we are familiar with the effect of filter sizes on the size of the resulting feature map, let’s look at how we can stop losing pixels.

Fix the Border Effect Problem With Padding

By default, a filter starts at the left of the image with the left-hand side of the filter sitting on the far left pixels of the image. The filter is then stepped across the image one column at a time until the right-hand side of the filter is sitting on the far right pixels of the image.

An alternative approach to applying a filter to an image is to ensure that each pixel in the image is given an opportunity to be at the center of the filter.

By default, this is not the case, as the pixels on the edge of the input are only ever exposed to the edge of the filter. Starting the filter outside the frame of the image gives the pixels on the border of the image more of an opportunity to interact with the filter, more of an opportunity for features to be detected by the filter, and, in turn, an output feature map that has the same shape as the input image.

For example, in the case of applying a 3×3 filter to the 8×8 input image, we can add a border of one pixel around the outside of the image. This has the effect of artificially creating a 10×10 input image. When the 3×3 filter is applied, it results in an 8×8 feature map. The added pixels can have the value zero, which has no effect on the dot product operation when the filter is applied.

x, x, x   0, 1, 0
x, 0, 0 . 0, 1, 0 = 0
x, 0, 0   0, 1, 0

The addition of pixels to the edge of the image is called padding.
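
To make this concrete, the sketch below adds the one-pixel zero border with NumPy. This mimics what the layer does internally; it is not how padding is specified in Keras:

# sketch: add a one-pixel border of zeros around the 8x8 image
from numpy import asarray, pad
image = asarray([[0, 0, 0, 1, 1, 0, 0, 0]] * 8)
padded = pad(image, 1, mode='constant') # zeros on every side
print(padded.shape) # (10, 10)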

In Keras, this is specified via the “padding” argument on the Conv2D layer, which has the default value of ‘valid‘ (no padding). This means that the filter is applied only to positions where it fully overlaps the input.

The ‘padding‘ value of ‘same‘ calculates and adds the padding required to the input image (or feature map) to ensure that the output has the same shape as the input.

The example below adds padding to the convolutional layer in our worked example.

# example a convolutional layer with padding
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), padding='same', input_shape=(8, 8, 1)))
# summarize model
model.summary()

Running the example demonstrates that the shape of the output feature map is the same as the input image: the padding had the desired effect.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 8, 8, 1)           10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

The addition of padding allows the development of very deep models in such a way that the feature maps do not dwindle away to nothing.

The example below demonstrates this with three stacked convolutional layers.

# example a deep cnn with padding
from keras.models import Sequential
from keras.layers import Conv2D
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), padding='same', input_shape=(8, 8, 1)))
model.add(Conv2D(1, (3,3), padding='same'))
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()

Running the example, we can see that with the addition of padding, the shape of the output feature maps remains fixed at 8×8 even three layers deep.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 8, 8, 1)           10
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 1)           10
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 1)           10
=================================================================
Total params: 30
Trainable params: 30
Non-trainable params: 0
_________________________________________________________________

Downsample Input With Stride

The filter is moved across the image left to right, top to bottom, with a one-pixel column change on the horizontal movements, then a one-pixel row change on the vertical movements.

The amount of movement between applications of the filter to the input image is referred to as the stride, and it is almost always symmetrical in height and width dimensions.

The default stride in two dimensions is (1,1) for the height and the width movement, and this default works well in most cases.

The stride can be changed, which has an effect both on how the filter is applied to the image and, in turn, the size of the resulting feature map.

For example, the stride can be changed to (2,2). This has the effect of moving the filter two pixels right for each horizontal movement of the filter and two pixels down for each vertical movement of the filter when creating the feature map.

We can demonstrate this with an example using the 8×8 image with a vertical line, taking the dot product (“.” operator) of each patch (left) with the vertical line filter (right) at a stride of two pixels:

0, 0, 0   0, 1, 0
0, 0, 0 . 0, 1, 0 = 0
0, 0, 0   0, 1, 0

Moved right two pixels:

0, 1, 1   0, 1, 0
0, 1, 1 . 0, 1, 0 = 3
0, 1, 1   0, 1, 0

Moved right two pixels:

1, 0, 0   0, 1, 0
1, 0, 0 . 0, 1, 0 = 0
1, 0, 0   0, 1, 0

We can see that there are only three valid applications of the 3×3 filter to the 8×8 input image with a stride of two. This will be the same in the vertical dimension.

This has the effect of applying the filter in such a way that the normal feature map output (6×6) is down-sampled so that the size of each dimension is reduced by half (3×3), resulting in 1/4 the number of pixels (36 pixels down to 9).
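
Putting filter size, padding, and stride together, the output size along one dimension follows a single formula; below is a minimal sketch (the integer division reflects that partial filter positions are discarded):

# sketch: output size of a convolution along one dimension
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
	return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(8, 3, stride=2)) # 3, matching the worked example
print(conv_output_size(8, 3, padding=1)) # 8, the padded case from the previous section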

The stride can be specified in Keras on the Conv2D layer via the ‘strides‘ argument, specified as a tuple of height and width.

The example below demonstrates the application of our manual vertical line filter to the 8×8 input image with a convolutional layer that has a stride of two.

# example of vertical line filter with a stride of 2
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), strides=(2, 2), input_shape=(8, 8, 1)))
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example, we can see from the summary of the model that the shape of the output feature map will be 3×3.

Applying the handcrafted filter to the input image and printing the resulting activation feature map, we can see that, indeed, the filter still detected the vertical line, and can represent this finding with less information.

Downsampling may be desirable in some cases where deeper knowledge of the filters used in the model or of the model architecture allows for some compression in the resulting feature maps.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 3, 3, 1)           10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]


Convolution and the convolutional layer are the major building blocks used in convolutional neural networks.

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

The innovation of convolutional neural networks is the ability to automatically learn a large number of filters in parallel specific to a training dataset under the constraints of a specific predictive modeling problem, such as image classification. The result is highly specific features that can be detected anywhere on input images.

In this tutorial, you will discover how convolutions work in the convolutional neural network.

After completing this tutorial, you will know:

  • Convolutional neural networks apply a filter to an input to create a feature map that summarizes the presence of detected features in the input.
  • Filters can be handcrafted, such as line detectors, but the innovation of convolutional neural networks is to learn the filters during training in the context of a specific prediction problem.
  • How to calculate the feature map for one- and two-dimensional convolutional layers in a convolutional neural network.

Let’s get started.

A Gentle Introduction to Convolutional Layers for Deep Learning Neural Networks
Photo by mendhak, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. Convolution in Convolutional Neural Networks
  2. Convolution in Computer Vision
  3. Power of Learned Filters
  4. Worked Example of Convolutional Layers

Convolution in Convolutional Neural Networks

The convolutional neural network, or CNN for short, is a specialized type of neural network model designed for working with two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data.

Central to the convolutional neural network is the convolutional layer that gives the network its name. This layer performs an operation called a “convolution“.

In the context of a convolutional neural network, a convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Given that the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel.

The filter is smaller than the input data and the type of multiplication applied between a filter-sized patch of the input and the filter is a dot product. A dot product is the element-wise multiplication between the filter-sized patch of the input and filter, which is then summed, always resulting in a single value. Because it results in a single value, the operation is often referred to as the “scalar product“.

Using a filter smaller than the input is intentional as it allows the same filter (set of weights) to be multiplied by the input array multiple times at different points on the input. Specifically, the filter is applied systematically to each overlapping part or filter-sized patch of the input data, left to right, top to bottom.

This systematic application of the same filter across an image is a powerful idea. If the filter is designed to detect a specific type of feature in the input, then the application of that filter systematically across the entire input image allows the filter an opportunity to discover that feature anywhere in the image. This capability is commonly referred to as translation invariance, e.g. the general interest in whether the feature is present rather than where it was present.

Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is. For example, when determining whether an image contains a face, we need not know the location of the eyes with pixel-perfect accuracy, we just need to know that there is an eye on the left side of the face and an eye on the right side of the face.

— Page 342, Deep Learning, 2016.

The output from multiplying the filter with the input array one time is a single value. As the filter is applied multiple times to the input array, the result is a two-dimensional array of output values that represent a filtering of the input. As such, the two-dimensional output array from this operation is called a “feature map“.

Once a feature map is created, we can pass each value in the feature map through a nonlinearity, such as a ReLU, much like we do for the outputs of a fully connected layer.
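
In Keras, this nonlinearity can be specified directly on the layer via the activation argument; a minimal sketch:

# sketch: pass the feature map values through a ReLU nonlinearity
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))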

Example of a Filter Applied to a Two-Dimensional Input to Create a Feature Map

If you come from a digital signal processing field or related area of mathematics, you may understand the convolution operation on a matrix as something different. Specifically, the filter (kernel) is flipped prior to being applied to the input. Technically, the convolution as described in the use of convolutional neural networks is actually a “cross-correlation”. Nevertheless, in deep learning, it is referred to as a “convolution” operation.

Many machine learning libraries implement cross-correlation but call it convolution.

— Page 333, Deep Learning, 2016.
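
The difference is easy to see with NumPy’s one-dimensional correlate() and convolve() functions; a small sketch (the values here are arbitrary):

# sketch: cross-correlation vs. true convolution in NumPy
from numpy import convolve, correlate
x = [0, 0, 1, 1, 0]
w = [1, 2, 3]
print(correlate(x, w, mode='valid')) # [3 5 3]: what CNNs actually compute
print(convolve(x, w, mode='valid')) # [1 3 5]: the kernel is flipped first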

In summary, we have an input, such as an image of pixel values, and we have a filter, which is a set of weights, and the filter is systematically applied to the input data to create a feature map.

Convolution in Computer Vision

The idea of applying the convolutional operation to image data is not new or unique to convolutional neural networks; it is a common technique used in computer vision.

Historically, filters were designed by hand by computer vision experts and then applied to an image to produce a feature map; the output of applying the filter makes the analysis of the image easier in some way.

For example, below is a hand crafted 3×3 element filter for detecting vertical lines:

0.0, 1.0, 0.0
0.0, 1.0, 0.0
0.0, 1.0, 0.0

Applying this filter to an image will result in a feature map that only contains vertical lines. It is a vertical line detector.

You can see this from the weight values in the filter: any pixel values in the center vertical column are positively weighted and any on either side are multiplied by zero and ignored. Dragging this filter systematically across pixel values in an image can only highlight vertical line pixels.

A horizontal line detector could also be created and also applied to the image, for example:

0.0, 0.0, 0.0
1.0, 1.0, 1.0
0.0, 0.0, 0.0

Combining the results from both filters, e.g. combining both feature maps, will result in all of the lines in an image being highlighted.
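
As a sketch of this idea, the two feature maps can be computed manually and merged; the element-wise maximum used to combine them here is one simple choice, an assumption rather than a rule:

# sketch: apply both line detectors and combine the feature maps
from numpy import asarray, maximum, tensordot
image = asarray([[0, 0, 0, 1, 1, 0, 0, 0]] * 8)
vertical = asarray([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
horizontal = asarray([[0, 0, 0], [1, 1, 1], [0, 0, 0]])

# slide a 3x3 kernel over the image, taking the dot product at each position
def apply_filter(img, kernel):
	n = img.shape[0] - kernel.shape[0] + 1
	return asarray([[tensordot(img[r:r+3, c:c+3], kernel) for c in range(n)] for r in range(n)])

# keep the strongest activation at each position
combined = maximum(apply_filter(image, vertical), apply_filter(image, horizontal))
print(combined)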

A suite of tens or even hundreds of other small filters can be designed to detect other features in the image.

The innovation of using the convolution operation in a neural network is that the values of the filter are weights to be learned during the training of the network.

The network will learn what types of features to extract from the input. Specifically, training under stochastic gradient descent, the network is forced to learn to extract features from the image that minimize the loss for the specific task the network is being trained to solve, e.g. extract features that are the most useful for classifying images as dogs or cats.

In this context, you can see that this is a powerful idea.

Power of Learned Filters

Learning a single filter specific to a machine learning task is a powerful technique.

Yet, convolutional neural networks achieve much more in practice.

Multiple Filters

Convolutional neural networks do not learn a single filter; they, in fact, learn multiple filters in parallel for a given input.

For example, it is common for a convolutional layer to learn from 32 to 512 filters in parallel for a given input.

This gives the model 32, or even 512, different ways of extracting features from an input, or many different ways of both “learning to see” and, after training, many different ways of “seeing” the input data.

This diversity allows specialization, e.g. not just lines, but the specific lines seen in your specific training data.

Multiple Channels

Color images have multiple channels, typically one for each color channel, such as red, green, and blue.

From a data perspective, that means that a single image provided as input to the model is, in fact, three images.

A filter must always have the same number of channels as the input, often referred to as “depth“. If an input image has 3 channels (e.g. a depth of 3), then a filter applied to that image must also have 3 channels (e.g. a depth of 3). In this case, a 3×3 filter would in fact be 3x3x3 or [3, 3, 3] for rows, columns, and depth. Regardless of the depth of the input and depth of the filter, the filter is applied to the input using a dot product operation which results in a single value.

This means that if a convolutional layer has 32 filters, these 32 filters are not just two-dimensional for the two-dimensional image input, but are also three-dimensional, having specific filter weights for each of the three channels. Yet, each filter results in a single feature map, which means that the depth of the output of applying the convolutional layer with 32 filters is 32, one for each feature map created.
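
We can confirm these shapes with a small sketch in Keras:

# sketch: 32 filters applied to a 3-channel input; each filter is 3x3x3
from keras.models import Sequential
from keras.layers import Conv2D
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(8, 8, 3)))
print(model.layers[0].get_weights()[0].shape) # (3, 3, 3, 32)
print(model.output_shape) # (None, 6, 6, 32): one feature map per filter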

Multiple Layers

Convolutional layers are not only applied to input data, e.g. raw pixel values, but they can also be applied to the output of other layers.

The stacking of convolutional layers allows a hierarchical decomposition of the input.

Consider that the filters that operate directly on the raw pixel values will learn to extract low-level features, such as lines.

The filters that operate on the output of these first layers may extract features that are combinations of lower-level features, such as features that comprise multiple lines to express shapes.

This process continues until very deep layers are extracting faces, animals, houses, and so on.

This is exactly what we see in practice: the abstraction of features to higher and higher orders as the depth of the network is increased.

Worked Example of Convolutional Layers

The Keras deep learning library provides a suite of convolutional layers.

We can better understand the convolution operation by looking at some worked examples with contrived data and handcrafted filters.

In this section, we’ll look at both a one-dimensional convolutional layer and a two-dimensional convolutional layer example to both make the convolution operation concrete and provide a worked example of using the Keras layers.

Example of 1D Convolutional Layer

We can define a one-dimensional input that has eight elements, all with the value 0.0 except for a two-element bump of 1.0 values in the middle.

[0, 0, 0, 1, 1, 0, 0, 0]

The input to Keras must be three dimensional for a 1D convolutional layer.

The first dimension refers to each input sample; in this case, we only have one sample. The second dimension refers to the length of each sample; in this case, the length is eight. The third dimension refers to the number of channels in each sample; in this case, we only have a single channel.

Therefore, the shape of the input array will be [1, 8, 1].

# define input data
data = asarray([0, 0, 0, 1, 1, 0, 0, 0])
data = data.reshape(1, 8, 1)

We will define a model that expects input samples to have the shape [8, 1].

The model will have a single filter with the shape of 3, or three elements wide. Keras refers to the shape of the filter as the kernel_size.

# create model
model = Sequential()
model.add(Conv1D(1, 3, input_shape=(8, 1)))

By default, the filters in a convolutional layer are initialized with random weights. In this contrived example, we will manually specify the weights for the single filter. We will define a filter that is capable of detecting bumps, that is, a high input value surrounded by low input values, as we defined in our input example.

The three element filter we will define looks as follows:

[0, 1, 0]

The convolutional layer also has a bias input value that also requires a weight that we will set to zero.

Therefore, we can force the weights of our one-dimensional convolutional layer to use our handcrafted filter as follows:

# define a bump detector
weights = [asarray([[[0]],[[1]],[[0]]]), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)

The weights must be specified in a three-dimensional structure, in terms of rows, columns, and channels. The filter has a single row, three columns, and one channel.

We can retrieve the weights and confirm that they were set correctly.

# confirm they were stored
print(model.get_weights())

Finally, we can apply the single filter to our input data.

We can achieve this by calling the predict() function on the model. This will return the feature map directly: that is the output of applying the filter systematically across the input sequence.

# apply filter to input data
yhat = model.predict(data)
print(yhat)

Tying all of this together, the complete example is listed below.

# example of calculation 1d convolutions
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv1D
# define input data
data = asarray([0, 0, 0, 1, 1, 0, 0, 0])
data = data.reshape(1, 8, 1)
# create model
model = Sequential()
model.add(Conv1D(1, 3, input_shape=(8, 1)))
# define a bump detector
weights = [asarray([[[0]],[[1]],[[0]]]), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())
# apply filter to input data
yhat = model.predict(data)
print(yhat)

Running the example first prints the weights of the network; that is, the confirmation that our handcrafted filter was set in the model as we expected.

Next, the filter is applied to the input pattern and the feature map is calculated and displayed. We can see from the values of the feature map that the bump was detected correctly.

[array([[[0.]],
       [[1.]],
       [[0.]]], dtype=float32), array([0.], dtype=float32)]

[[[0.]
  [0.]
  [1.]
  [1.]
  [0.]
  [0.]]]

Let’s take a closer look at what happened here.

Recall that the input is an eight element vector with the values: [0, 0, 0, 1, 1, 0, 0, 0].

First, the three-element filter [0, 1, 0] was applied to the first three elements of the input [0, 0, 0] by calculating the dot product (“.” operator), which resulted in a single output value of zero in the feature map.

Recall that a dot product is the sum of the element-wise multiplications, or here it is (0 x 0) + (1 x 0) + (0 x 0) = 0. In NumPy, this can be implemented manually as:

from numpy import asarray
print(asarray([0, 1, 0]).dot(asarray([0, 0, 0])))

In our manual example, this is as follows:

[0, 1, 0] . [0, 0, 0] = 0

The filter was then moved along one element of the input sequence and the process was repeated; specifically, the same filter was applied to the input sequence at indexes 1, 2, and 3, which also resulted in a zero output in the feature map.

[0, 1, 0] . [0, 0, 1] = 0

We are being systematic, so again, the filter is moved along one more element of the input and applied to the input at indexes 2, 3, and 4. This time the output is a value of one in the feature map. We detected the feature and activated appropriately.

[0, 1, 0] . [0, 1, 1] = 1

The process is repeated until we calculate the entire feature map.

[0, 0, 1, 1, 0, 0]

Note that the feature map has six elements, whereas our input has eight elements. This is an artefact of how the filter was applied to the input sequence. There are other ways to apply the filter to the input sequence that change the shape of the resulting feature map, such as padding, but we will not discuss these methods in this post.

You can imagine that with different inputs, we may detect the feature with more or less intensity, and with different weights in the filter, that we would detect different features in the input sequence.

Example of 2D Convolutional Layer

We can expand the bump detection example in the previous section to a vertical line detector in a two-dimensional image.

Again, we can constrain the input, in this case to a square 8×8 pixel input image with a single channel (e.g. grayscale) with a single vertical line in the middle.

[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 1, 1, 0, 0, 0]

The input to a Conv2D layer must be four-dimensional.

The first dimension defines the samples; in this case, there is only a single sample. The second dimension defines the number of rows; in this case, eight. The third dimension defines the number of columns, again eight in this case, and finally the fourth dimension defines the number of channels, one in this case.

Therefore, the input must have the four-dimensional shape [samples, rows, columns, channels] or [1, 8, 8, 1] in this case.

# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)

We will define the Conv2D with a single filter as we did in the previous section with the Conv1D example.

The filter will be two-dimensional and square with the shape 3×3. The layer will expect input samples to have the shape [rows, columns, channels] or [8,8,1].

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))

We will define a vertical line detector filter to detect the single vertical line in our input data.

The filter looks as follows:

0, 1, 0
0, 1, 0
0, 1, 0

We can implement this as follows:

# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())

Finally, we will apply the filter to the input image, which will result in a feature map that we would expect to show the detection of the vertical line in the input image.

# apply filter to input data
yhat = model.predict(data)

The shape of the feature map output will be four-dimensional with the shape [batch, rows, columns, filters]. We will be performing a single batch and we have a single filter (one filter and one input channel), therefore the output shape is [1, ?, ?, 1]. We can pretty-print the content of the single feature map as follows:

for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Tying all of this together, the complete example is listed below.

# example of calculation 2d convolutions
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), input_shape=(8, 8, 1)))
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# confirm they were stored
print(model.get_weights())
# apply filter to input data
yhat = model.predict(data)
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first confirms that the handcrafted filter was correctly defined in the layer weights.

Next, the calculated feature map is printed. We can see from the scale of the numbers that indeed the filter has detected the single vertical line with strong activation in the middle of the feature map.

[array([[[[0.]],
        [[1.]],
        [[0.]]],
       [[[0.]],
        [[1.]],
        [[0.]]],
       [[[0.]],
        [[1.]],
        [[0.]]]], dtype=float32), array([0.], dtype=float32)]

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

Let’s take a closer look at what was calculated.

First, the filter was applied to the top left corner of the image, or an image patch of 3×3 elements. Technically, the image patch is three-dimensional with a single channel, and the filter has the same dimensions. We cannot implement this in NumPy using the dot() function; instead, we must use the tensordot() function so we can appropriately sum across all dimensions, for example:

from numpy import asarray
from numpy import tensordot
m1 = asarray([[0, 1, 0],
			  [0, 1, 0],
			  [0, 1, 0]])
m2 = asarray([[0, 0, 0],
			  [0, 0, 0],
			  [0, 0, 0]])
print(tensordot(m1, m2))

This calculation results in a single output value of 0.0, i.e. the feature was not detected.


Data augmentation is a technique often used to improve performance and reduce generalization error when training neural network models for computer vision problems.

The image data augmentation technique can also be applied when making predictions with a fit model in order to allow the model to make predictions for multiple different versions of each image in the test dataset. The predictions on the augmented images can be averaged, which can result in better predictive performance.

In this tutorial, you will discover test-time augmentation for improving the performance of models for image classification tasks.

After completing this tutorial, you will know:

  • Test-time augmentation is the application of data augmentation techniques normally used during training when making predictions.
  • How to implement test-time augmentation from scratch in Keras.
  • How to use test-time augmentation to improve the performance of a convolutional neural network model on a standard image classification task.

Let’s get started.

How to Use Test-Time Augmentation to Improve Model Performance for Image Classification
Photo by daveynin, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Test-Time Augmentation
  2. Test-Time Augmentation in Keras
  3. Dataset and Baseline Model
  4. Example of Test-Time Augmentation
  5. How to Tune Test-Time Augmentation Configuration

Test-Time Augmentation

Data augmentation is an approach typically used during the training of the model that expands the training set with modified copies of samples from the training dataset.

Data augmentation is often performed with image data, where copies of images in the training dataset are created with some image manipulation techniques performed, such as zooms, flips, shifts, and more.

The artificially expanded training dataset can result in a more skillful model, as often the performance of deep learning models continues to scale in concert with the size of the training dataset. In addition, the modified or augmented versions of the images in the training dataset assist the model in extracting and learning features in a way that is invariant to their position, lighting, and more.

Test-time augmentation, or TTA for short, is an application of data augmentation to the test dataset.

Specifically, it involves creating multiple augmented copies of each image in the test set, having the model make a prediction for each, then returning an ensemble of those predictions.

Augmentations are chosen to give the model the best opportunity for correctly classifying a given image, and the number of copies of an image for which a model must make a prediction is often small, such as less than 10 or 20.

Often, a single simple test-time augmentation is performed, such as a shift, crop, or image flip.

In their 2015 paper titled “Very Deep Convolutional Networks for Large-Scale Image Recognition,” which achieved then state-of-the-art results on the ILSVRC dataset, the authors use horizontal flip test-time augmentation:

We also augment the test set by horizontal flipping of the images; the soft-max class posteriors of the original and flipped images are averaged to obtain the final scores for the image.
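
This kind of flip augmentation can be written directly, without a data generator; a minimal sketch (it assumes a fit Keras model and test images testX with the shape [samples, rows, cols, channels]):

# sketch of horizontal flip test-time augmentation
from numpy import flip
# axis=2 is the width axis, giving a left-right flip of every image
probs = (model.predict(testX) + model.predict(flip(testX, axis=2))) / 2.0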

Similarly, in their 2015 paper on the inception architecture titled “Rethinking the Inception Architecture for Computer Vision,” the authors at Google use cropping test-time augmentation, which they refer to as multi-crop evaluation.

Test-Time Augmentation in Keras

Test-time augmentation is not provided natively in the Keras deep learning library but can be implemented easily.

The ImageDataGenerator class can be used to configure the choice of test-time augmentation. For example, the data generator below is configured for horizontal flip image data augmentation.

# configure image data augmentation
datagen = ImageDataGenerator(horizontal_flip=True)

The augmentation can then be applied to each sample in the test dataset separately.

First, the dimensions of the single image can be expanded from [rows][cols][channels] to [samples][rows][cols][channels], where the number of samples is one, for the single image. This transforms the array for the image into an array of samples with one image.

# convert image into dataset
samples = expand_dims(image, 0)

Next, an iterator can be created for the sample, and the batch size can be used to specify the number of augmented images to generate, such as 10.

# prepare iterator
it = datagen.flow(samples, batch_size=10)

The iterator can then be passed to the predict_generator() function of the model in order to make a prediction. Specifically, a batch of 10 augmented images will be generated and the model will make a prediction for each.

# make predictions for each augmented image
yhats = model.predict_generator(it, steps=10, verbose=0)

Finally, an ensemble prediction can be made. A prediction is made for each augmented image, and each prediction contains a probability of the image belonging to each class, in the case of multiclass image classification.

An ensemble prediction can be made using soft voting where the probabilities of each class are summed across the predictions and a class prediction is made by calculating the argmax() of the summed predictions, returning the index or class number of the largest summed probability.

# sum across predictions
summed = numpy.sum(yhats, axis=0)
# argmax across classes
return argmax(summed)

We can tie these elements together into a function that will take a configured data generator, fit model, and single image, and will return a class prediction (integer) using test-time augmentation.

# make a prediction using test-time augmentation
def tta_prediction(datagen, model, image, n_examples):
	# convert image into dataset
	samples = expand_dims(image, 0)
	# prepare iterator
	it = datagen.flow(samples, batch_size=n_examples)
	# make predictions for each augmented image
	yhats = model.predict_generator(it, steps=n_examples, verbose=0)
	# sum across predictions
	summed = numpy.sum(yhats, axis=0)
	# argmax across classes
	return argmax(summed)
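
As a usage sketch (assuming a fit model and the testX test images prepared in the next section), the function can be called for a single image as follows:

# sketch: classify one test image with test-time augmentation
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(horizontal_flip=True)
yhat = tta_prediction(datagen, model, testX[0], n_examples=10)
print(yhat) # the predicted class as an integer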

Now that we know how to make predictions in Keras using test-time augmentation, let’s work through an example to demonstrate the approach.

Dataset and Baseline Model

We can demonstrate test-time augmentation using a standard computer vision dataset and a convolutional neural network.

Before we can do that, we must select a dataset and a baseline model.

We will use the CIFAR-10 dataset, comprised of 60,000 32×32 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, etc. CIFAR-10 is a well-understood dataset and widely used for benchmarking computer vision algorithms in the field of machine learning. The problem is “solved.” Top performance on the problem is achieved by deep learning convolutional neural networks with a classification accuracy above 96% or 97% on the test dataset.

We will also use a convolutional neural network, or CNN, model that is capable of achieving good (better than random) results, but not state-of-the-art results, on the problem. This will be sufficient to demonstrate the lift in performance that test-time augmentation can provide.

The CIFAR-10 dataset can be loaded easily via the Keras API by calling the cifar10.load_data() function, which returns a tuple with the training and test datasets split into input (images) and output (class labels) components.

# load dataset
(trainX, trainY), (testX, testY) = load_data()

It is good practice to normalize the pixel values from the range 0-255 down to the range 0-1 prior to modeling. This ensures that the inputs are small and close to zero, and will, in turn, mean that the weights of the model will be kept small, leading to faster and better learning.

# normalize pixel values
trainX = trainX.astype('float32') / 255
testX = testX.astype('float32') / 255

The class labels are integers and must be converted to a one hot encoding prior to modeling.

This can be achieved using the to_categorical() Keras utility function.

# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)

We are now ready to define a model for this multi-class classification problem.

The model has a convolutional layer with 32 filter maps with a 3×3 kernel using the rectified linear (ReLU) activation, “same” padding so the output is the same size as the input, and the He weight initialization. This is followed by a batch normalization layer and a max pooling layer.

This pattern is repeated with a convolutional, batch norm, and max pooling layer, although the number of filters is increased to 64. The output is then flattened before being interpreted by a dense layer and finally provided to the output layer to make a prediction.

# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

The Adam variation of stochastic gradient descent is used to find the model weights.

The categorical cross entropy loss function is used, required for multi-class classification, and classification accuracy is monitored during training.

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

The model is fit for three training epochs and a large batch size of 128 images is used.

# fit model
model.fit(trainX, trainY, epochs=3, batch_size=128)

Once fit, the model is evaluated on the test dataset.

# evaluate model
_, acc = model.evaluate(testX, testY, verbose=0)
print(acc)

The complete example is listed below and will easily run on the CPU in a few minutes.

# baseline cnn model for the cifar10 problem
from keras.datasets.cifar10 import load_data
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import BatchNormalization
# load dataset
(trainX, trainY), (testX, testY) = load_data()
# normalize pixel values
trainX = trainX.astype('float32') / 255
testX = testX.astype('float32') / 255
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainY, epochs=3, batch_size=128)
# evaluate model
_, acc = model.evaluate(testX, testY, verbose=0)
print(acc)

Running the example shows that the model is capable of learning the problem well and quickly.

A test set accuracy of about 66% is achieved, which is okay, but not terrific. The chosen model configuration has already started to overfit and could benefit from the use of regularization and further tuning. Nevertheless, this provides a good starting point for demonstrating test-time augmentation.

Epoch 1/3
50000/50000 [==============================] - 64s 1ms/step - loss: 1.2135 - acc: 0.5766
Epoch 2/3
50000/50000 [==============================] - 63s 1ms/step - loss: 0.8498 - acc: 0.7035
Epoch 3/3
50000/50000 [==============================] - 63s 1ms/step - loss: 0.6799 - acc: 0.7632
0.6679

Neural networks are stochastic algorithms and the same model fit on the same data multiple times may find a different set of weights and, in turn, have different performance each time.

In order to even out the estimate of model performance, we can change the example to re-run the fit and evaluation of the model multiple times and report the mean and standard deviation of the distribution of scores on the test dataset.

First, we can define a function named load_dataset() that will load the CIFAR-10 dataset and prepare it for modeling.

# load and return the cifar10 dataset ready for modeling
def load_dataset():
	# load dataset
	(trainX, trainY), (testX, testY) = load_data()
	# normalize pixel values
	trainX = trainX.astype('float32') / 255
	testX = testX.astype('float32') / 255
	# one hot encode target values
	trainY = to_categorical(trainY)
	testY = to_categorical(testY)
	return trainX, trainY, testX, testY

Next, we can define a function named define_model() that will define a model for the CIFAR-10 dataset, ready to be fit and then evaluated.

# define the cnn model for the cifar10 dataset
def define_model():
	# define model
	model = Sequential()
	model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	model.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	model.add(Flatten())
	model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	model.add(Dense(10, activation='softmax'))
	# compile model
	model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
	return model

Next, an evaluate_model() function is defined that will fit the defined model on the training dataset and then evaluate it on the test dataset, returning the estimated classification accuracy for the run.

# fit and evaluate a defined model
def evaluate_model(model, trainX, trainY, testX, testY):
	# fit model
	model.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)
	# evaluate model
	_, acc = model.evaluate(testX, testY, verbose=0)
	return acc

Next, we can define a function with new behavior to repeatedly define, fit, and evaluate a new model and return the distribution of accuracy scores.

The repeated_evaluation() function below implements this, taking the dataset and using a default of 10 repeated evaluations.

# repeatedly evaluate model, return distribution of scores
def repeated_evaluation(trainX, trainY, testX, testY, repeats=10):
	scores = list()
	for _ in range(repeats):
		# define model
		model = define_model()
		# fit and evaluate model
		accuracy = evaluate_model(model, trainX, trainY, testX, testY)
		# store score
		scores.append(accuracy)
		print('> %.3f' % accuracy)
	return scores

Finally, we can call the load_dataset() function to prepare the dataset, then repeated_evaluation() to get a distribution of accuracy scores that can be summarized by reporting the mean and standard deviation.

# load dataset
trainX, trainY, testX, testY = load_dataset()
# evaluate model
scores = repeated_evaluation(trainX, trainY, testX, testY)
# summarize result
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Tying all of this together, the complete code example of repeatedly evaluating a CNN model on the CIFAR-10 dataset is listed below.

# baseline cnn model for the cifar10 problem, repeated evaluation
from numpy import mean
from numpy import std
from keras.datasets.cifar10 import load_data
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import BatchNormalization

# load and return the cifar10 dataset ready for modeling
def load_dataset():
	# load dataset
	(trainX, trainY), (testX, testY) = load_data()
	# normalize pixel values
	trainX = trainX.astype('float32') / 255
	testX = testX.astype('float32') / 255
	# one hot encode target values
	trainY = to_categorical(trainY)
	testY = to_categorical(testY)
	return trainX, trainY, testX, testY

# define the cnn model for the cifar10 dataset
def define_model():
	# define model
	model = Sequential()
	model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform', input_shape=(32, 32, 3)))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	model.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	model.add(MaxPooling2D((2, 2)))
	model.add(Flatten())
	model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
	model.add(BatchNormalization())
	model.add(Dense(10, activation='softmax'))
	# compile model
	model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
	return model

# fit and evaluate a defined model
def evaluate_model(model, trainX, trainY, testX, testY):
	# fit model
	model.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)
	# evaluate model
	_, acc = model.evaluate(testX, testY, verbose=0)
	return acc

# repeatedly evaluate model, return distribution of scores
def repeated_evaluation(trainX, trainY, testX, testY, repeats=10):
	scores = list()
	for _ in range(repeats):
		# define model
		model = define_model()
		# fit and evaluate model
		accuracy = evaluate_model(model, trainX, trainY, testX, testY)
		# store score
		scores.append(accuracy)
		print('> %.3f' % accuracy)
	return scores

# load dataset
trainX, trainY, testX, testY = load_dataset()
# evaluate model
scores = repeated_evaluation(trainX, trainY, testX, testY)
# summarize result
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

Running the example may take a while on modern CPU hardware and is much faster on GPU hardware.

The accuracy of the model is reported for each repeated evaluation and the final mean model performance is reported.

In this case, we can see that the mean accuracy of the chosen model configuration is about 68%, which is close to the estimate from a single model run.

> 0.690
> 0.662
> 0.698
> 0.681
> 0.686
> 0.680
> 0.697
> 0.696
> 0.689
> 0.679
Accuracy: 0.686 (0.010)

Now that we have developed a baseline model for a standard dataset, let’s look at updating the example to use test-time augmentation.

Example of Test-Time Augmentation

We can now update our repeated evaluation of the CNN model on CIFAR-10 to use test-time augmentation.

The tta_prediction() function developed in the section above on implementing test-time augmentation in Keras can be used directly.

# make a prediction using test-time augmentation
def tta_prediction(datagen, model, image, n_examples):
	# convert image into dataset
	samples = expand_dims(image, 0)
	# prepare iterator
	it = datagen.flow(samples, batch_size=n_examples)
	# make predictions for each augmented image
	yhats = model.predict_generator(it, steps=n_examples, verbose=0)
	# sum across predictions
	summed = numpy.sum(yhats, axis=0)
	# argmax across classes
	return argmax(summed)

We can develop a function that will drive the test-time augmentation by defining the ImageDataGenerator configuration and calling tta_prediction() for each image in the test dataset.

It is important to consider the types of image augmentations that may benefit a model fit on the CIFAR-10 dataset. Augmentations that cause minor modifications to the photographs might be useful. This might include augmentations such as zooms, shifts, and horizontal flips.

In this example, we will only use horizontal flips.

# configure image data augmentation
datagen = ImageDataGenerator(horizontal_flip=True)

We will configure the image generator to create seven augmented photos, from which an aggregated prediction for each example in the test set will be made.

The tta_evaluate_model() function below configures the ImageDataGenerator then enumerates the test dataset, making a class label prediction for each image in the test dataset. The accuracy is then calculated by comparing the predicted class labels to the class labels in the test dataset. This requires that we reverse the one hot encoding performed in load_dataset() by using argmax().

# evaluate a model on a dataset using test-time augmentation
def tta_evaluate_model(model, testX, testY):
	# configure image data augmentation
	datagen = ImageDataGenerator(horizontal_flip=True)
	# define the number of augmented images to generate per test set image
	n_examples_per_image = 7
	yhats = list()
	for i in range(len(testX)):
		# make augmented prediction
		yhat = tta_prediction(datagen, model, testX[i], n_examples_per_image)
		# store for evaluation
		yhats.append(yhat)
	# calculate accuracy
	testY_labels = argmax(testY, axis=1)
	acc = accuracy_score(testY_labels, yhats)
	return acc

The evaluate_model() function can then be updated to call tta_evaluate_model() in order to get model accuracy scores.

# fit and evaluate a defined model
def evaluate_model(model, trainX, trainY, testX, testY):
	# fit model
	model.fit(trainX, trainY, epochs=3, batch_size=128, verbose=0)
	# evaluate model using tta
	acc = tta_evaluate_model(model, testX, testY)
	return acc

Tying all of this together, the complete example of the repeated evaluation of a CNN for CIFAR-10 with test-time augmentation is listed below.

# cnn model for the cifar10 problem with test-time augmentation
import numpy
from numpy import argmax
from numpy import mean
from numpy import std
from numpy import expand_dims
from sklearn.metrics import accuracy_score
from keras.datasets.cifar10 import load_data
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import BatchNormalization

The remainder of the listing combines the load_dataset(), define_model(), tta_prediction(), tta_evaluate_model(), evaluate_model(), and repeated_evaluation() functions exactly as defined above, followed by the same driver code that loads the dataset, calls repeated_evaluation(), and reports the mean and standard deviation of the scores.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

In this tutorial, you will discover how to use image data augmentation when training deep learning neural networks.

After completing this tutorial, you will know:

  • Image data augmentation is used to expand the training dataset in order to improve the performance and ability of the model to generalize.
  • Image data augmentation is supported in the Keras deep learning library via the ImageDataGenerator class.
  • How to use shift, flip, brightness, and zoom image data augmentation.

Let’s get started.

Tutorial Overview

This tutorial is divided into eight parts; they are:

  1. Image Data Augmentation
  2. Sample Image
  3. Image Augmentation With ImageDataGenerator
  4. Horizontal and Vertical Shift Augmentation
  5. Horizontal and Vertical Flip Augmentation
  6. Random Rotation Augmentation
  7. Random Brightness Augmentation
  8. Random Zoom Augmentation
Image Data Augmentation

The performance of deep learning neural networks often improves with the amount of data available.

Data augmentation is a technique to artificially create new training data from existing training data. This is done by applying domain-specific techniques to examples from the training data that create new and different training examples.

Image data augmentation is perhaps the most well-known type of data augmentation and involves creating transformed versions of images in the training dataset that belong to the same class as the original image.

Transforms include a range of operations from the field of image manipulation, such as shifts, flips, zooms, and much more.

The intent is to expand the training dataset with new, plausible examples. That is, variations of the training set images that are likely to be seen by the model. For example, a horizontal flip of a picture of a cat may make sense, because the photo could have been taken from the left or right. A vertical flip of the photo of a cat does not make sense and would probably not be appropriate given that the model is very unlikely to see a photo of an upside down cat.

As such, it is clear that the specific data augmentation techniques used for a training dataset must be chosen carefully, within the context of the training dataset and knowledge of the problem domain. In addition, it can be useful to experiment with data augmentation methods in isolation and in concert to see if they result in a measurable improvement to model performance, perhaps with a small prototype dataset, model, and training run.

Modern deep learning algorithms, such as the convolutional neural network, or CNN, can learn features that are invariant to their location in the image. Nevertheless, augmentation can further aid in this transform-invariant approach to learning and can aid the model in learning features that are also invariant to transforms such as left-to-right and top-to-bottom ordering, light levels in photographs, and more.

Image data augmentation is typically only applied to the training dataset, and not to the validation or test dataset. This is different from data preparation such as image resizing and pixel scaling; they must be performed consistently across all datasets that interact with the model.

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Sample Image

We need a sample image to demonstrate standard data augmentation techniques.

In this tutorial, we will use a photograph of a bird titled “Feathered Friend” by AndYaDontStop, released under a permissive license.

Download the image and save it in your current working directory with the filename ‘bird.jpg‘.

Feathered Friend, taken by AndYaDontStop.
Some rights reserved.

Image Augmentation With ImageDataGenerator

The Keras deep learning library provides the ability to use data augmentation automatically when training a model.

This is achieved by using the ImageDataGenerator class.

First, the class may be instantiated and the configuration for the types of data augmentation are specified by arguments to the class constructor.

A range of techniques are supported, as well as pixel scaling methods. We will focus on five main types of data augmentation techniques for image data; specifically:

  • Image shifts via the width_shift_range and height_shift_range arguments.
  • Image flips via the horizontal_flip and vertical_flip arguments.
  • Image rotations via the rotation_range argument.
  • Image brightness via the brightness_range argument.
  • Image zoom via the zoom_range argument.

For example, an instance of the ImageDataGenerator class can be constructed.

...
# create data generator
datagen = ImageDataGenerator()
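
As a sketch of how the five techniques listed above map onto constructor arguments, a configured instance might look as follows (the specific values are illustrative only, not recommendations):

...
# create data generator with all five augmentation types configured
datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
	horizontal_flip=True, rotation_range=30, brightness_range=[0.8,1.2], zoom_range=0.2)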

Once constructed, an iterator can be created for an image dataset.

The iterator will return one batch of augmented images for each iteration.

An iterator can be created from an image dataset loaded in memory via the flow() function; for example:

...
# load image dataset
X, y = ...
# create iterator
it = datagen.flow(X, y)

Alternately, an iterator can be created for an image dataset located on disk in a specified directory, where images in that directory are organized into subdirectories according to their class.

...
# create iterator
it = datagen.flow_from_directory('path/to/dataset/', ...)

Once the iterator is created, it can be used to train a neural network model by calling the fit_generator() function.

The steps_per_epoch argument must specify the number of batches of samples comprising one epoch. For example, if your original dataset has 10,000 images and your batch size is 32, then a reasonable value for steps_per_epoch when fitting a model on the augmented data might be ceil(10,000/32), or 313 batches.

# define model
model = ...
# fit model on the augmented dataset
model.fit_generator(it, steps_per_epoch=313, ...)

The images in the dataset are not used directly. Instead, only augmented images are provided to the model. Because the augmentations are performed randomly, this allows both modified images and close facsimiles of the original images (e.g. almost no augmentation) to be generated and used during training.

A data generator can also be used to specify the validation dataset and the test dataset. Often, a separate ImageDataGenerator instance is used that may have the same pixel scaling configuration (not covered in this tutorial) as the ImageDataGenerator instance used for the training dataset, but would not use data augmentation. This is because data augmentation is only used as a technique for artificially extending the training dataset in order to improve model performance on an unaugmented dataset.
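
For example, a common pattern is one generator with augmentation for the training dataset and a second generator with only the same pixel scaling for the validation and test datasets; a minimal sketch, assuming pixel scaling via the rescale argument:

...
# training generator: pixel scaling plus augmentation
train_datagen = ImageDataGenerator(rescale=1.0/255.0, horizontal_flip=True, width_shift_range=0.1)
# validation/test generator: the same pixel scaling, no augmentation
test_datagen = ImageDataGenerator(rescale=1.0/255.0)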

Now that we are familiar with how to use the ImageDataGenerator, let’s look at some specific data augmentation techniques for image data.

We will demonstrate each technique standalone by reviewing examples of images after they have been augmented. This is a good practice and is recommended when configuring your data augmentation. It is also common to use a range of augmentation techniques at the same time when training. We have isolated the techniques to one per section for demonstration purposes only.

Horizontal and Vertical Shift Augmentation

A shift to an image means moving all pixels of the image in one direction, such as horizontally or vertically, while keeping the image dimensions the same.

This means that some of the pixels will be clipped off the image and there will be a region of the image where new pixel values will have to be specified.

The width_shift_range and height_shift_range arguments to the ImageDataGenerator constructor control the amount of horizontal and vertical shift respectively.

These arguments can specify a floating point value that indicates the percentage (between 0 and 1) of the width or height of the image to shift. Alternately, a number of pixels can be specified to shift the image.

Specifically, a value in the range between no shift and the percentage or pixel value will be sampled for each image and the shift performed, e.g. [0, value]. Alternately, you can specify a tuple or array of the min and max range from which the shift will be sampled; for example: [-100, 100] or [-0.5, 0.5].

The example below demonstrates a horizontal shift with the width_shift_range argument between [-200,200] pixels and generates a plot of generated images to demonstrate the effect.

# example of horizontal shift image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(width_shift_range=[-200,200])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example creates the instance of ImageDataGenerator configured for image augmentation, then creates the iterator. The iterator is then called nine times in a loop and each augmented image is plotted.

We can see in the plot of the result that a range of different randomly selected positive and negative horizontal shifts was performed and the pixel values at the edge of the image are duplicated to fill in the empty part of the image created by the shift.

Plot of Augmented Images Generated With a Random Horizontal Shift

Below is the same example updated to perform vertical shifts of the image via the height_shift_range argument, in this case specifying the shift as 0.5, or 50 percent, of the height of the image.

# example of vertical shift image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(height_shift_range=0.5)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example creates a plot of images augmented with random positive and negative vertical shifts.

We can see that both horizontal and vertical positive and negative shifts probably make sense for the chosen photograph, but in some cases, the replicated pixels at the edge of the image may not make sense to a model.

Note that other fill modes can be specified via the “fill_mode” argument.
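
For example, a sketch of the same vertical shift configured with a constant fill instead of the default ‘nearest‘:

...
# fill the empty region created by the shift with black pixels
datagen = ImageDataGenerator(height_shift_range=0.5, fill_mode='constant', cval=0)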

Plot of Augmented Images With a Random Vertical Shift

Horizontal and Vertical Flip Augmentation

An image flip means reversing the rows or columns of pixels in the case of a vertical or horizontal flip respectively.

The flip augmentation is specified by a boolean horizontal_flip or vertical_flip argument to the ImageDataGenerator class constructor. For photographs like the bird photograph used in this tutorial, horizontal flips may make sense, but vertical flips would not.

For other types of images, such as aerial photographs, cosmology photographs, and microscopic photographs, perhaps vertical flips make sense.

The example below demonstrates augmenting the chosen photograph with horizontal flips via the horizontal_flip argument.

# example of horizontal flip image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example creates a plot of nine augmented images.

We can see that the horizontal flip is applied randomly to some images and not others.

Plot of Augmented Images With a Random Horizontal Flip

Random Rotation Augmentation

A rotation augmentation randomly rotates the image by an amount sampled from a configured range of degrees, which can be as large as 360.

The rotation will likely rotate pixels out of the image frame and leave areas of the frame with no pixel data that must be filled in.

The example below demonstrates random rotations via the rotation_range argument, rotating the image by up to 90 degrees in either direction.

# example of random rotation image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example generates examples of the rotated image, showing in some cases pixels rotated out of the frame and the nearest-neighbor fill.

Plot of Images Generated With a Random Rotation Augmentation

Random Brightness Augmentation

The brightness of the image can be augmented by either randomly darkening images, brightening images, or both.

The intent is to allow a model to generalize across images trained on different lighting levels.

This can be achieved by specifying the brightness_range argument to the ImageDataGenerator() constructor, which takes a min and max range of floats specifying how much to darken or brighten the image.

Values less than 1.0 darken the image, e.g. [0.5, 1.0], whereas values larger than 1.0 brighten the image, e.g. [1.0, 1.5], where 1.0 has no effect on brightness.

The example below demonstrates a brightness image augmentation, allowing the generator to randomly darken the image by a factor sampled between 0.2 (heavily darkened) and 1.0 (no change).

# example of brightness image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(brightness_range=[0.2,1.0])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Running the example shows the augmented images with varying amounts of darkening applied.

Plot of Images Generated With a Random Brightness Augmentation

Random Zoom Augmentation

A zoom augmentation randomly zooms the image in or out, either interpolating pixel values (when zooming in) or adding new pixel values around the image (when zooming out).

Image zooming can be configured by the zoom_range argument to the ImageDataGenerator constructor. You can specify the percentage of the zoom as a single float or a range as an array or tuple.

If a float is specified, then the range for the zoom will be [1-value, 1+value]. For example, if you specify 0.3, then the range will be [0.7, 1.3], or between 70% (zoom in) and 130% (zoom out).

The zoom amount is uniformly randomly sampled from the zoom region for each dimension (width, height) separately.

The zoom may not feel intuitive. Note that zoom values less than 1.0 will zoom the image in, e.g. [0.5,0.5] makes the object in the image 50% larger or closer, and values larger than 1.0 will zoom the image out, e.g. [1.5,1.5] makes the object in the image 50% smaller or further away. A zoom of [1.0,1.0] has no effect.

The example below demonstrates zooming the image in, e.g. making the object in the photograph larger.

# example of zoom image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator (values below 1.0 zoom the image in)
datagen = ImageDataGenerator(zoom_range=[0.5,1.0])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate batch of images
	batch = it.next()
	# convert to unsigned integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

There are conventions for storing and structuring your image dataset on disk in order to make it fast and efficient to load when training and evaluating deep learning models.

Once structured, you can use tools like the ImageDataGenerator class in the Keras deep learning library to automatically load your train, test, and validation datasets. In addition, the generator will progressively load the images in your dataset, allowing you to work with both small and very large datasets containing thousands or millions of images that may not fit into system memory.

In this tutorial, you will discover how to structure an image dataset and how to load it progressively when fitting and evaluating a deep learning model.

After completing this tutorial, you will know:

  • How to organize train, test, and validation image datasets into a consistent directory structure.
  • How to use the ImageDataGenerator class to progressively load the images for a given dataset.
  • How to use a prepared data generator to train, evaluate, and make predictions with a deep learning model.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Dataset Directory Structure
  2. Example Dataset Structure
  3. How to Progressively Load Images
Dataset Directory Structure

There is a standard way to lay out your image data for modeling.

After you have collected your images, you must sort them first by dataset, such as train, test, and validation, and second by their class.

For example, imagine an image classification problem where we wish to classify photos of cars based on their color, e.g. red cars, blue cars, etc.

First, we have a data/ directory where we will store all of the image data.

Next, we will have a data/train/ directory for the training dataset and a data/test/ for the holdout test dataset. We may also have a data/validation/ for a validation dataset during training.

So far, we have:

data/
data/train/
data/test/
data/validation/

Under each of the dataset directories, we will have subdirectories, one for each class where the actual image files will be placed.

For example, if we have a binary classification task for classifying photos of cars as either a red car or a blue car, we would have two classes, ‘red‘ and ‘blue‘, and therefore two class directories under each dataset directory.

For example:

data/
data/train/
data/train/red/
data/train/blue/
data/test/
data/test/red/
data/test/blue/
data/validation/
data/validation/red/
data/validation/blue/

Images of red cars would then be placed in the appropriate class directory.

For example:

data/train/red/car01.jpg
data/train/red/car02.jpg
data/train/red/car03.jpg
...
data/train/blue/car01.jpg
data/train/blue/car02.jpg
data/train/blue/car03.jpg
...

Remember, we are not placing the same files under the red/ and blue/ directories; instead, there are different photos of red cars and blue cars respectively.

Also recall that we require different photos in the train, test, and validation datasets.

The filenames used for the actual images often do not matter as we will load all images with given file extensions.

A good naming convention, if you have the ability to rename files consistently, is to use some name followed by a number with zero padding, e.g. image0001.jpg if you have thousands of images for a class.
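
As a small sketch, zero-padded filenames of this kind can be generated with standard string formatting:

# generate zero-padded image filenames, e.g. image0001.jpg
for i in range(1, 4):
	print('image%04d.jpg' % i)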

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Example Dataset Structure

We can make the image dataset structure concrete with an example.

Imagine we are classifying photographs of cars, as we discussed in the previous section. Specifically, a binary classification problem with red cars and blue cars.

We must create the directory structure outlined in the previous section, specifically:

data/
data/train/
data/train/red/
data/train/blue/
data/test/
data/test/red/
data/test/blue/
data/validation/
data/validation/red/
data/validation/blue/

Let’s actually create these directories.
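
A short script along the following lines can build the full tree; a sketch, assuming Python 3 for the exist_ok argument to makedirs():

# create the directory structure for the dataset
from os import makedirs
# one subdirectory per dataset, one per class
for subdir in ['train', 'test', 'validation']:
	for labeldir in ['red', 'blue']:
		newdir = 'data/' + subdir + '/' + labeldir + '/'
		makedirs(newdir, exist_ok=True)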

We can also put some photos in the directories.

You can use the creative commons image search to find some images with a permissive license that you can download and use for this example.

I will use two images:

Red Car, by Dennis Jarvis

Blue Car, by Bill Smith

Download the photos to your current working directory and save the photo of the red car as ‘red_car_01.jpg‘ and the photo of the blue car as ‘blue_car_01.jpg‘.

We must have different photos for each of the train, test, and validation datasets.

In the interest of keeping this tutorial focused, we will re-use the same image files in each of the three datasets but pretend they are different photographs.

Place copies of the ‘red_car_01.jpg‘ file in data/train/red/, data/test/red/, and data/validation/red/ directories.

Now place copies of the ‘blue_car_01.jpg‘ file in data/train/blue/, data/test/blue/, and data/validation/blue/ directories.

We now have a very basic dataset layout that looks like the following (output from the tree command):

data
├── test
│   ├── blue
│   │   └── blue_car_01.jpg
│   └── red
│       └── red_car_01.jpg
├── train
│   ├── blue
│   │   └── blue_car_01.jpg
│   └── red
│       └── red_car_01.jpg
└── validation
    ├── blue
    │   └── blue_car_01.jpg
    └── red
        └── red_car_01.jpg

Below is a screenshot of the directory structure, taken from the Finder window on macOS.

Screenshot of Image Dataset Directory and File Structure

Now that we have a basic directory structure, let’s practice loading image data from file for use with modeling.

How to Progressively Load Images

It is possible to write code to manually load image data and return data ready for modeling.

This would include walking the directory structure for a dataset, loading image data, and returning the input (pixel arrays) and output (class integer).

Thankfully, we don’t need to write this code. Instead, we can use the ImageDataGenerator class provided by Keras.

The main benefit of using this class to load the data is that images are loaded for a single dataset in batches, meaning that it can be used for loading both small datasets as well as very large image datasets with thousands or millions of images.

Instead of loading all images into memory, it will load just enough images into memory for the current and perhaps the next few mini-batches when training and evaluating a deep learning model. I refer to this as progressive loading, as the dataset is progressively loaded from file, retrieving just enough data for what is needed immediately.

Two additional benefits of using the ImageDataGenerator class are that it can automatically scale pixel values of images and it can automatically generate augmented versions of images. We will leave these topics for discussion in another tutorial and instead focus on how to use the ImageDataGenerator class to load image data from file.

The pattern for using the ImageDataGenerator class is as follows:

  1. Construct and configure an instance of the ImageDataGenerator class.
  2. Retrieve an iterator by calling the flow_from_directory() function.
  3. Use the iterator in the training or evaluation of a model.

Let’s take a closer look at each step.

The constructor for the ImageDataGenerator contains many arguments to specify how to manipulate the image data after it is loaded, including pixel scaling and data augmentation. We do not need any of these features at this stage, so configuring the ImageDataGenerator is easy.

...
# create a data generator
datagen = ImageDataGenerator()

Next, an iterator is required to progressively load images for a single dataset.

This requires calling the flow_from_directory() function and specifying the dataset directory, such as the train, test, or validation directory.

The function also allows you to configure more details related to the loading of images. Of note is the ‘target_size‘ argument that allows you to load all images to a specific size, which is often required when modeling. The function defaults to square images with the size (256, 256).

The function also allows you to specify the type of classification task via the ‘class_mode‘ argument, specifically whether it is ‘binary‘ or a multi-class classification ‘categorical‘.

The default ‘batch_size‘ is 32, which means that 32 randomly selected images from across the classes in the dataset will be returned in each batch when training. Larger or smaller batches may be desired. You may also want to return batches in a deterministic order when evaluating a model, which you can do by setting ‘shuffle‘ to ‘False.’

There are many other options, and I encourage you to review the API documentation.
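
As a sketch of how these arguments fit together, a deterministic test iterator with a smaller image size might be prepared as follows (the directory, size, and batch size are illustrative):

...
# load test images at 128x128 pixels, in a fixed order
test_it = datagen.flow_from_directory('data/test/', class_mode='binary', batch_size=64, target_size=(128, 128), shuffle=False)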

We can use the same ImageDataGenerator to prepare separate iterators for separate dataset directories. This is useful if we would like the same pixel scaling applied to multiple datasets (e.g. train, test, etc.).

...
# load and iterate training dataset
train_it = datagen.flow_from_directory('data/train/', class_mode='binary', batch_size=64)
# load and iterate validation dataset
val_it = datagen.flow_from_directory('data/validation/', class_mode='binary', batch_size=64)
# load and iterate test dataset
test_it = datagen.flow_from_directory('data/test/', class_mode='binary', batch_size=64)

Once the iterators have been prepared, we can use them when fitting and evaluating a deep learning model.

For example, fitting a model with a data generator can be achieved by calling the fit_generator() function on the model and passing the training iterator (train_it). The validation iterator (val_it) can be specified when calling this function via the ‘validation_data‘ argument.

The ‘steps_per_epoch‘ argument must be specified for the training iterator in order to define how many batches of images defines a single epoch.

For example, if you have 1,000 images in the training dataset (across all classes) and a batch size of 64, then the steps_per_epoch would be about 16, or 1000/64.

Similarly, if a validation iterator is used, then the ‘validation_steps‘ argument must also be specified to indicate the number of batches in the validation dataset defining one epoch.

...
# define model
model = ...
# fit model
model.fit_generator(train_it, steps_per_epoch=16, validation_data=val_it, validation_steps=8)

Once the model is fit, it can be evaluated on a test dataset using the evaluate_generator() function and passing in the test iterator (test_it). The ‘steps‘ argument defines the number of batches of samples to step through when evaluating the model before stopping.

...
# evaluate model
loss = model.evaluate_generator(test_it, steps=24)

Finally, if you want to use your fit model for making predictions on a very large dataset, you can create an iterator for that dataset as well (e.g. predict_it) and call the predict_generator() function on the model.

...
# make a prediction
yhat = model.predict_generator(predict_it, steps=24)

Let’s use our small dataset defined in the previous section to demonstrate how to define an ImageDataGenerator instance and prepare the dataset iterators.

A complete example is listed below.

# example of progressively loading images from file
from keras.preprocessing.image import ImageDataGenerator
# create generator
datagen = ImageDataGenerator()
# prepare an iterator for each dataset
train_it = datagen.flow_from_directory('data/train/', class_mode='binary')
val_it = datagen.flow_from_directory('data/validation/', class_mode='binary')
test_it = datagen.flow_from_directory('data/test/', class_mode='binary')
# confirm the iterator works
batchX, batchy = train_it.next()
print('Batch shape=%s, min=%.3f, max=%.3f' % (batchX.shape, batchX.min(), batchX.max()))

Running the example first creates an instance of the ImageDataGenerator with all default configuration.

Next, three iterators are created, one for each of the train, validation, and test binary classification datasets. As each iterator is created, we can see debug messages reporting the number of images and classes discovered and prepared.

Finally, we test out the train iterator that would be used to fit a model. The first batch of images is retrieved and we can confirm that the batch contains two images, as only two images were available. We can also confirm that the images were loaded and forced to the square dimensions of 256 rows and 256 columns of pixels and the pixel data was not scaled and remains in the range [0, 255].

Found 2 images belonging to 2 classes.
Found 2 images belonging to 2 classes.
Found 2 images belonging to 2 classes.
Batch shape=(2, 256, 256, 3), min=0.000, max=255.000

Summary

In this tutorial, you discovered how to structure an image dataset and how to load it progressively when fitting and evaluating a deep learning model.

Specifically, you learned:

  • How to organize train, test, and validation image datasets into a consistent directory structure.
  • How to use the ImageDataGenerator class to progressively load the images for a given dataset.
  • How to use a prepared data generator to train, evaluate, and make predictions with a deep learning model.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post How to Load Large Datasets From Directories for Deep Learning with Keras appeared first on Machine Learning Mastery.

Deep Learning for Computer Vision Crash Course.
Bring Deep Learning Methods to Your Computer Vision Project in 7 Days.

We are awash in digital images from photos, videos, Instagram, YouTube, and increasingly live video streams.

Working with image data is hard as it requires drawing upon knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and these days, deep learning.

Deep learning methods are out-competing the classical and statistical methods on some challenging computer vision problems with single, simpler models.

In this crash course, you will discover how you can get started and confidently develop deep learning for computer vision problems using Python in seven days.

Note: This is a big and important post. You might want to bookmark it.

Let’s get started.

How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)
Photo by oliver.dodd, some rights reserved.

Who Is This Crash-Course For?

Before we get started, let’s make sure you are in the right place.

The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

You need to know:

  • You need to know your way around basic Python, NumPy, and Keras for deep learning.

You do NOT need to be:

  • You do not need to be a math wiz!
  • You do not need to be a deep learning expert!
  • You do not need to be a computer vision researcher!

This crash course will take you from a developer who knows a little machine learning to a developer who can bring deep learning methods to your own computer vision project.

Note: This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed. If you need help with your environment, you can follow the step-by-step tutorial here:

Crash-Course Overview

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore). It really depends on the time you have available and your level of enthusiasm.

Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:

  • Lesson 01: Deep Learning and Computer Vision
  • Lesson 02: Preparing Image Data
  • Lesson 03: Convolutional Neural Networks
  • Lesson 04: Image Classification
  • Lesson 05: Train Image Classification Model
  • Lesson 06: Image Augmentation
  • Lesson 07: Face Detection

Each lesson could take you anywhere from 60 seconds up to 30 minutes. Take your time and complete the lessons at your own pace. Ask questions and even post results in the comments below.

The lessons might expect you to go off and find out how to do things. I will give you hints, but part of the point of each lesson is to force you to learn where to go to look for help about deep learning, computer vision, and the best-of-breed tools in Python (hint: I have all of the answers on this blog; just use the search box).

Post your results in the comments; I’ll cheer you on!

Hang in there; don’t give up.

Note: This is just a crash course. For a lot more detail and fleshed out tutorials, see my book on the topic titled “Deep Learning for Computer Vision.”

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Lesson 01: Deep Learning and Computer Vision

In this lesson, you will discover the promise of deep learning methods for computer vision.

Computer Vision

Computer Vision, or CV for short, is broadly defined as helping computers to “see” or extract meaning from digital images such as photographs and videos.

Researchers have been working on the problem of helping computers see for more than 50 years, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.

The problem of understanding images is not solved, and may never be. This is primarily because the world is complex and messy. There are few rules. And yet we can easily and effortlessly recognize objects, people, and context.

Deep Learning

Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity.

In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Promise of Deep Learning for Computer Vision

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification, and more recently in object detection and face recognition.

The three key promises of deep learning for computer vision are as follows:

  • The Promise of Feature Learning. That is, that deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert.
  • The Promise of Continued Improvement. That is, that the performance of deep learning in computer vision is based on real results and that the improvements appear to be continuing and perhaps speeding up.
  • The Promise of End-to-End Models. That is, that large end-to-end deep learning models can be fit on large datasets of images or video offering a more general and better-performing approach.

Computer vision is not “solved” but deep learning is required to get you to the state-of-the-art on many challenging problems in the field.

Your Task

For this lesson, you must research and list five impressive applications of deep learning methods in the field of computer vision. Bonus points if you can link to a research paper that demonstrates the example.

Post your answer in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to prepare image data for modeling.

Lesson 02: Preparing Image Data

In this lesson, you will discover how to prepare image data for modeling.

Images are comprised of matrices of pixel values.

Pixel values are often unsigned integers in the range between 0 and 255. Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modeling, such as slower than expected training of the model.

Instead, there can be great benefit in preparing the image pixel values prior to modeling, from simply scaling pixel values to the range 0-1, to centering and even standardizing the values.

This is called normalization and can be performed directly on a loaded image. The example below uses the PIL library (the standard image handling library in Python) to load an image and normalize its pixel values.

First, confirm that you have the Pillow library installed; it is installed with most SciPy environments, but you can learn more here:

Next, download a photograph of Bondi Beach in Sydney Australia, taken by Isabell Schulz and released under a permissive license. Save the image in your current working directory with the filename ‘bondi_beach.jpg‘.

Next, we can use the Pillow library to load the photo, confirm the min and max pixel values, normalize the values, and confirm the normalization was performed.

# example of pixel normalization
from numpy import asarray
from PIL import Image
# load image
image = Image.open('bondi_beach.jpg')
pixels = asarray(image)
# confirm pixel range is 0-255
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert from integers to floats
pixels = pixels.astype('float32')
# normalize to the range 0-1
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))

Your Task

Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after the normalization.

For bonus points, you can update the example to standardize the pixel values.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover information about convolutional neural network models.

Lesson 03: Convolutional Neural Networks

In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

Convolutional Layers

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel shape.

Pooling Layers

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.

Classifier Layer

Once the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.

This can be achieved by first flattening the two-dimensional feature maps, and then adding a fully connected output layer. For a binary classification problem, the output layer would have one node that would predict a value between 0 and 1 for the two classes.

Convolutional Neural Network

The example below creates a convolutional neural network that expects grayscale images with the square size of 256×256 pixels, with one convolutional layer with 32 filters, each with the size of 3×3 pixels, a max pooling layer, and a binary classification output layer.

# cnn with single convolutional, pooling and output layer
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# create model
model = Sequential()
# add convolutional layer
model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()

Your Task

Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers.

For extra points, you could try adding more convolutional or pooling layers and describe the effect it has on the image as it flows through the model.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will learn how to use a deep convolutional neural network to classify photographs of objects.

Lesson 04: Image Classification

In this lesson, you will discover how to use a pre-trained model to classify photographs of objects.

Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.

A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.

The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.

Download this photograph of a dog taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the filename ‘dog.jpg‘.

The example below will load the photograph and output a prediction, classifying the object in the photograph.

Note: The first time you run the example, the pre-trained model will have to be downloaded, which is a few hundred megabytes and may take a few minutes depending on the speed of your internet connection.

# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))

Your Task

Your task in this lesson is to run the example and report the result.

For bonus points, try running the example on another photograph of a common object.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to fit and evaluate a model for image classification.

Lesson 05: Train Image Classification Model

In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.

It is a dataset comprised of 60,000 small square 28×28 pixel grayscale images of items of 10 types of clothing, such as shoes, t-shirts, dresses, and more.

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.

The example will run in just a few minutes on a modern CPU; no GPU is required.

# fit a cnn on the fashion mnist dataset
from keras.datasets import fashion_mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
# load dataset
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# convert from integers to floats
trainX, testX = trainX.astype('float32'), testX.astype('float32')
# normalize to range 0-1
trainX, testX = trainX / 255.0, testX / 255.0
# one hot encode target values
trainY, testY = to_categorical(trainY), to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
# evaluate model
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)

Your Task

Your task in this lesson is to run the example and report the performance of the model on the test dataset.

For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing.
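
For the saving and loading part of the bonus, a minimal sketch is shown below; the filename ‘fashion_mnist.h5‘ is an assumption, and the new image must be prepared exactly like the training data, i.e. reshaped to (1, 28, 28, 1) and scaled to the range 0-1.

# sketch: save the fit model, then load it later to make a prediction
from keras.models import load_model
model.save('fashion_mnist.h5')
...
model = load_model('fashion_mnist.h5')
# img is assumed to be a prepared array with the shape (1, 28, 28, 1)
yhat = model.predict(img)
print('Predicted class: %d' % yhat.argmax())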

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to use image augmentation on training data.

Lesson 06: Image Augmentation

In this lesson, you will discover how to use image augmentation.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Download a photograph of a bird by AndYaDontStop, released under a permissive license. Save it into your current working directory with the name ‘bird.jpg‘.

The example below will load the photograph as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.

# example using image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# generate a batch of augmented images
	batch = it.next()
	# convert to unsigned 8-bit integers for viewing
	image = batch[0].astype('uint8')
	# plot raw pixel data
	pyplot.imshow(image)
# show the figure
pyplot.show()

Your Task

Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.

For bonus points, try additional types of image augmentation, supported by the ImageDataGenerator class.
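
For example, below is a hedged sketch of a generator configured with a few of the other augmentation types supported by the class; the argument values are illustrative rather than recommendations.

# sketch: a few other augmentation types supported by the class
datagen = ImageDataGenerator(
	width_shift_range=0.2,        # shift images horizontally by up to 20%
	height_shift_range=0.2,       # shift images vertically by up to 20%
	zoom_range=0.2,               # zoom in or out by up to 20%
	brightness_range=[0.5, 1.5])  # randomly darken or brighten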

Post your findings in the comments below. I would love to see what you find.

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.

Lesson 07: Face Detection

In this lesson, you will discover how to use a convolutional neural network for face detection.

Face detection is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-task Cascade Convolutional Neural Network, or MTCNN for short.

The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed easily as follows:

sudo pip install mtcnn

Download a photograph of a person on the street taken by Holland and released under a permissive license. Save it into your current working directory with the name ‘street.jpg‘.

The example below will load the photograph and use the MTCNN model to detect faces and will plot the photo and draw a box around the first detected face.

# face detection with mtcnn on a photograph
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mtcnn.mtcnn import MTCNN
# load image from file
pixels = pyplot.imread('street.jpg')
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# plot the image
pyplot.imshow(pixels)
# get the context for drawing boxes
ax = pyplot.gca()
# get coordinates from the first face
x, y, width, height = faces[0]['box']
# create the shape
rect = Rectangle((x, y), width, height, fill=False, color='red')
# draw the box
ax.add_patch(rect)
# show the plot
pyplot.show()

Your Task

Your task in this lesson is to run the example and describe the result.

For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face.
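
For the bonus, the single-face code can be generalized by looping over all of the detected faces; a minimal sketch using the same variables as the example above:

# sketch: draw a box around every detected face
for face in faces:
	# get coordinates for this face
	x, y, width, height = face['box']
	# draw the box
	ax.add_patch(Rectangle((x, y), width, height, fill=False, color='red'))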

Post your findings in the comments below. I would love to see what you discover.

The End!
(Look How Far You Have Come)

You made it. Well done!

Take a moment and look back at how far you have come.

You discovered:

  • What computer vision is and the promise and impact that deep learning is having on the field.
  • How to scale the pixel values of image data in order to make them ready for modeling.
  • How to develop a convolutional neural network model from scratch.
  • How to use a pre-trained model to classify photographs of objects.
  • How to train a model from scratch to classify photographs of clothing.

It can be convenient to use a standard computer vision dataset when getting started with deep learning methods for computer vision.

Standard datasets are often well understood, small, and easy to load. They can provide the basis for testing techniques and reproducing results in order to build confidence with libraries and methods.

In this tutorial, you will discover the standard computer vision datasets provided with the Keras deep learning library.

After completing this tutorial, you will know:

  • The API and idioms for downloading standard computer vision datasets using Keras.
  • The structure, nature, and top results for the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 computer vision datasets.
  • How to load and visualize standard computer vision datasets using the Keras API.

Let’s get started.

How to Load and Visualize Standard Computer Vision Datasets With Keras
Photo by Marina del Castell, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Keras Computer Vision Datasets
  2. MNIST Dataset
  3. Fashion-MNIST Dataset
  4. CIFAR-10 Dataset
  5. CIFAR-100 Dataset
Keras Computer Vision Datasets

The Keras deep learning library provides access to four standard computer vision datasets.

This is particularly helpful as it allows you to rapidly start testing model architectures and configurations for computer vision.

Four specific multi-class image classification datasets are provided; they are:

  • MNIST: Classify photos of handwritten digits (10 classes).
  • Fashion-MNIST: Classify photos of items of clothing (10 classes).
  • CIFAR-10: Classify small photos of objects (10 classes).
  • CIFAR-100: Classify small photos of common objects (100 classes).

The datasets are available under the keras.datasets module via dataset-specific load functions.

After a call to the load function, the dataset is downloaded to your workstation and stored in the ~/.keras directory under a “datasets” subdirectory. The datasets are stored in a compressed format, but may also include additional metadata.

Once a dataset has been downloaded via the first call to its dataset-specific load function, it does not need to be downloaded again; subsequent calls will load the dataset immediately from disk.

The load functions return two tuples, the first containing the input and output elements for samples in the training dataset, and the second containing the input and output elements for samples in the test dataset. The splits between train and test datasets often follow a standard split, used when benchmarking algorithms on the dataset.

The standard idiom for loading the datasets is as follows:

...
# load dataset
(trainX, trainy), (testX, testy) = load_data()

The train and test X and y elements are NumPy arrays of pixel values and class values, respectively.

Two of the datasets contain grayscale images and two contain color images. The shape of the grayscale images must be converted from two-dimensional to three-dimensional arrays to match the preferred channel ordering of Keras. For example:

# reshape grayscale images to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))

Both grayscale and color image pixel data are stored as unsigned integer values with values between 0 and 255.

Before modeling, the image data will need to be rescaled, e.g. such as normalization to the range 0-1 and perhaps further standardized. For example:

# normalize pixel values
trainX = trainX.astype('float32') / 255
testX = testX.astype('float32') / 255
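
If further standardization is desired, the statistics should be calculated on the training dataset only and then applied to both datasets; a minimal sketch:

# standardize pixel values using statistics from the training dataset
mean, std = trainX.mean(), trainX.std()
trainX = (trainX - mean) / std
testX = (testX - mean) / std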

The output elements of each sample (y) are stored as class integer values. Each problem is a multi-class classification problem (more than two classes); as such, it is common practice to one hot encode the class values prior to modeling. This can be achieved using the to_categorical() function provided by Keras; for example:

...
# one hot encode target values
trainy = to_categorical(trainy)
testy = to_categorical(testy)

Now that we are familiar with the idioms for working with the standard computer vision datasets provided by Keras, let’s take a closer look at each dataset in turn.

Note: the examples in this tutorial assume that you have internet access and may download the datasets the first time each example is run on your system. The download speed will depend on the speed of your internet connection, and it is recommended that you run the examples from the command line.

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

MNIST Dataset

MNIST is an acronym that stands for the Modified National Institute of Standards and Technology dataset.

It is a dataset of 70,000 small square 28×28 pixel grayscale images of handwritten single digits between 0 and 9, split into 60,000 training images and 10,000 test images.

The task is to classify a given image of a handwritten digit into one of 10 classes representing integer values from 0 to 9, inclusive.

It is a widely used and deeply understood dataset and, for the most part, is “solved.” Top-performing models are deep learning convolutional neural networks that achieve a classification accuracy of above 99%, with an error rate of between 0.4% and 0.2% on the holdout test dataset.

The example below loads the MNIST dataset using the Keras API and creates a plot of the first 9 images in the training dataset.

# example of loading the mnist dataset
from keras.datasets import mnist
from matplotlib import pyplot
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))
# show the figure
pyplot.show()

Running the example loads the MNIST train and test dataset and prints their shape.

We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 28×28 pixels.

Train: X=(60000, 28, 28), y=(60000,)
Test: X=(10000, 28, 28), y=(10000,)

A plot of the first nine images in the dataset is also created showing the natural handwritten nature of the images to be classified.

Plot of a Subset of Images From the MNIST Dataset

Fashion-MNIST Dataset

Fashion-MNIST was proposed as a more challenging replacement for the MNIST dataset.

It is a dataset comprised of 70,000 small square 28×28 pixel grayscale images of items of 10 types of clothing, such as shoes, t-shirts, dresses, and more, split into 60,000 training images and 10,000 test images.

It is a more challenging classification problem than MNIST and top results are achieved by deep learning convolutional networks with a classification accuracy of about 95% to 96% on the holdout test dataset.

The example below loads the Fashion-MNIST dataset using the Keras API and creates a plot of the first nine images in the training dataset.

# example of loading the fashion mnist dataset
from matplotlib import pyplot
from keras.datasets import fashion_mnist
# load dataset
(trainX, trainy), (testX, testy) = fashion_mnist.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(trainX[i], cmap=pyplot.get_cmap('gray'))
# show the figure
pyplot.show()

Running the example loads the Fashion-MNIST train and test dataset and prints their shape.

We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 28×28 pixels.

Train: X=(60000, 28, 28), y=(60000,)
Test: X=(10000, 28, 28), y=(10000,)

A plot of the first nine images in the dataset is also created, showing that indeed the images are grayscale photographs of items of clothing.

Plot of a Subset of Images From the Fashion-MNIST Dataset

CIFAR-10 Dataset

CIFAR is an acronym that stands for the Canadian Institute For Advanced Research and the CIFAR-10 dataset was developed along with the CIFAR-100 dataset (covered in the next section) by researchers at the CIFAR institute.

The dataset is comprised of 60,000 32×32 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, etc.

These are very small images, much smaller than a typical photograph, and the dataset is intended for computer vision research.

CIFAR-10 has been widely used for benchmarking computer vision algorithms in the field of machine learning, and the problem is effectively “solved.” Top performance on the problem is achieved by deep learning convolutional neural networks with a classification accuracy above 96% or 97% on the test dataset.

The example below loads the CIFAR-10 dataset using the Keras API and creates a plot of the first nine images in the training dataset.

# example of loading the cifar10 dataset
from matplotlib import pyplot
from keras.datasets import cifar10
# load dataset
(trainX, trainy), (testX, testy) = cifar10.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(trainX[i])
# show the figure
pyplot.show()

Running the example loads the CIFAR-10 train and test dataset and prints their shape.

We can see that there are 50,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 32×32 pixels and color, with three channels.

Train: X=(50000, 32, 32, 3), y=(50000, 1)
Test: X=(10000, 32, 32, 3), y=(10000, 1)

A plot of the first nine images in the dataset is also created. It is clear that the images are indeed very small compared to modern photographs; it can be challenging to see what exactly is represented in some of the images given the extremely low resolution.

This low resolution is likely the cause of the limited performance that top-of-the-line algorithms are able to achieve on the dataset.

Plot of a Subset of Images From the CIFAR-10 Dataset

CIFAR-100 Dataset

The CIFAR-100 dataset was prepared along with the CIFAR-10 dataset by academics at the Canadian Institute For Advanced Research (CIFAR).

The dataset is comprised of 60,000 32×32 pixel color photographs of objects from 100 classes, such as fish, flowers, insects, and much more.

Like CIFAR-10, the images are intentionally small and unrealistic photographs and the dataset is intended for computer vision research.

The example below loads the CIFAR-100 dataset using the Keras API and creates a plot of the first nine images in the training dataset.

# example of loading the cifar100 dataset
from matplotlib import pyplot
from keras.datasets import cifar100
# load dataset
(trainX, trainy), (testX, testy) = cifar100.load_data()
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))
# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(trainX[i])
# show the figure
pyplot.show()

Running the example loads the CIFAR-100 train and test dataset and prints their shape.

We can see that there are 50,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 32×32 pixels and color, with three channels.

Train: X=(50000, 32, 32, 3), y=(50000, 1)
Test: X=(10000, 32, 32, 3), y=(10000, 1)

A plot of the first nine images in the dataset is also created, and like CIFAR-10, the low resolution of the images can make it challenging to clearly see what is present in some photos.

Plot of a Subset of Images From the CIFAR-100 Dataset

In addition to the 100 fine-grained classes, the images in CIFAR-100 are also organized into 20 super-classes, i.e. groups of related classes.

Keras will return labels for the 100 fine classes by default, although the super-class labels can be retrieved by setting the “label_mode” argument to “coarse” (instead of the default “fine“) when calling the load_data() function. For example:

# load coarse labels
(trainX, trainy), (testX, testy) = cifar100.load_data(label_mode='coarse')

The difference is made clear when the labels are one hot encoded using the to_categorical() function, where instead of each output vector having 100 dimensions, it will only have 20. The example below demonstrates this by loading the dataset with coarse labels and encoding the class labels.

# example of loading the cifar100 dataset with coarse labels
from keras.datasets import cifar100
from keras.utils import to_categorical
# load coarse labels
(trainX, trainy), (testX, testy) = cifar100.load_data(label_mode='coarse')
# one hot encode target values
trainy = to_categorical(trainy)
testy = to_categorical(testy)
# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))

Running the example loads the CIFAR-100 dataset as before, but images are now classified as belonging to one of the twenty super-classes.

The class labels are one hot encoded and we can see that each label is represented by a 20-element vector instead of the 100-element vector we would expect for the fine class labels.

Train: X=(50000, 32, 32, 3), y=(50000, 20)
Test: X=(10000, 32, 32, 3), y=(10000, 20)

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered the standard computer vision datasets provided with the Keras deep learning library.

Specifically, you learned:

  • The API and idioms for downloading standard computer vision datasets using Keras.
  • The structure, nature, and top results for the MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 computer vision datasets.
  • How to load and visualize standard computer vision datasets using the Keras API.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post How to Load and Visualize Standard Computer Vision Datasets With Keras appeared first on Machine Learning Mastery.


Color images have height, width, and color channel dimensions.

When represented as three-dimensional arrays, the channel dimension for the image data is last by default, but may be moved to be the first dimension, often for performance-tuning reasons.

The use of these two “channel ordering formats” and preparing data to meet a specific preferred channel ordering can be confusing to beginners.

In this tutorial, you will discover channel ordering formats, how to prepare and manipulate image data to meet formats, and how to configure the Keras deep learning library for different channel orderings.

After completing this tutorial, you will know:

  • The three-dimensional array structure of images and the channels first and channels last array formats.
  • How to add a channels dimension and how to convert images between the channel formats.
  • How the Keras deep learning library manages a preferred channel ordering and how to change and query this preference.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Images as 3D Arrays
  2. Manipulating Image Channels
  3. Keras Channel Ordering
Images as 3D Arrays

An image can be stored as a three-dimensional array in memory.

Typically, the image format has one dimension for rows (height), one for columns (width) and one for channels.

If the image is black and white (grayscale), the channels dimension may not be explicitly present, e.g. there is one unsigned integer pixel value for each (row, column) coordinate in the image.

Colored images typically have three channels, for the pixel value at the (row, column) coordinate for the red, green, and blue components.

Deep learning neural networks require that image data be provided as three-dimensional arrays.

This applies even if your image is grayscale. In this case, the additional dimension for the single color channel must be added.

There are two ways to represent the image data as a three-dimensional array. The first involves having the channels as the last or third dimension in the array. This is called “channels last”. The second involves having the channels as the first dimension in the array, called “channels first”.

  • Channels Last. Image data is represented in a three-dimensional array where the last dimension represents the color channels, e.g. [rows][cols][channels].
  • Channels First. Image data is represented in a three-dimensional array where the first dimension represents the color channels, e.g. [channels][rows][cols].

Some image processing and deep learning libraries prefer channels first ordering, and some prefer channels last. As such, it is important to be familiar with the two approaches to representing images.

Want Results with Deep Learning for Computer Vision?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Manipulating Image Channels

You may need to change or manipulate the image channels or channel ordering.

This can be achieved easily using the NumPy python library.

Let’s look at some examples.

In this tutorial, we will use a photograph taken by Larry Koester, some rights reserved, of the Phillip Island Penguin Parade.

Phillip Island Penguin Parade
Photo by Larry Koester, some rights reserved.

Download the image and place it in your current working directory with the filename “penguin_parade.jpg“.

The code examples in this tutorial assume that the Pillow library is installed.

How to Add a Channel to a Grayscale Image

Grayscale images are loaded as a two-dimensional array.

Before they can be used for modeling, you may have to add an explicit channel dimension to the image. This does not add new data; instead, it changes the array data structure to have an additional third axis with one dimension to hold the grayscale pixel values.

For example, a grayscale image with the dimensions [rows][cols] can be changed to [rows][cols][channels] or [channels][rows][cols] where the new [channels] axis has one dimension.

This can be achieved using the expand_dims() NumPy function. The “axis” argument allows you to specify where in the array the new dimension will be added, e.g. axis=0 (first) for channels first or axis=2 (last) for channels last.

The example below loads the Penguin Parade photograph using the Pillow library as a grayscale image and demonstrates how to add a channel dimension.

# example of expanding dimensions
from numpy import expand_dims
from numpy import asarray
from PIL import Image
# load the image
img = Image.open('penguin_parade.jpg')
# convert the image to grayscale
img = img.convert(mode='L')
# convert to numpy array
data = asarray(img)
print(data.shape)
# add channels first
data_first = expand_dims(data, axis=0)
print(data_first.shape)
# add channels last
data_last = expand_dims(data, axis=2)
print(data_last.shape)

Running the example first loads the photograph using the Pillow library, then converts it to a grayscale image.

The image object is converted to a NumPy array and we confirm the shape of the array is two dimensional, specifically (424, 640).

The expand_dims() function is then used to add a channel via axis=0 to the front of the array and the change is confirmed with the shape (1, 424, 640). The same function is then used to add a channel to the end or third dimension of the array with axis=2 and the change is confirmed with the shape (424, 640, 1).

(424, 640)
(1, 424, 640)
(424, 640, 1)

Another popular alternative to expanding the dimensions of an array is to use the reshape() NumPy function and specify a tuple with the new shape; for example:

data = data.reshape((424, 640, 1))
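
If you prefer not to hard-code the image dimensions, the same result can be achieved by appending a new axis to the existing shape; for example:

# reshape without hard-coding the image dimensions
data = data.reshape(data.shape + (1,))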

How to Change Image Channel Ordering

After a color image is loaded as a three-dimensional array, the channel ordering can be changed.

This can be achieved using the moveaxis() NumPy function. It allows you to specify the index of the source axis and the destination axis.

This function can be used to change an array in channels last format, such as [rows][cols][channels], to channels first format, such as [channels][rows][cols], or the reverse.

The example below loads the Penguin Parade photograph in channels last format and uses the moveaxis() function to change it to channels first format.

# change image from channels last to channels first format
from numpy import moveaxis
from numpy import asarray
from PIL import Image
# load the color image
img = Image.open('penguin_parade.jpg')
# convert to numpy array
data = asarray(img)
print(data.shape)
# change channels last to channels first format
data = moveaxis(data, 2, 0)
print(data.shape)
# change channels first to channels last format
data = moveaxis(data, 0, 2)
print(data.shape)

Running the example first loads the photograph using the Pillow library and converts it to a NumPy array confirming that the image was loaded in channels last format with the shape (424, 640, 3).

The moveaxis() function is then used to move the channels axis from position 2 to position 0 and the result is confirmed showing channels first format (3, 424, 640). This is then reversed, moving the channels in position 0 to position 2 again.

(424, 640, 3)
(3, 424, 640)
(424, 640, 3)

Keras Channel Ordering

The Keras deep learning library is agnostic to how you wish to represent images, in either channels first or channels last format, but the preference must be specified and adhered to when using the library.

Keras wraps a number of mathematical libraries, and each has a preferred channel ordering. The three main libraries that Keras may wrap and their preferred channel ordering are listed below:

  • TensorFlow: Channels last order.
  • Theano: Channels first order.
  • CNTK: Channels last order.

By default, Keras is configured to use TensorFlow, and the channel ordering defaults to channels last. You can use either channel ordering with any backend when using Keras.

Some libraries claim that the preferred channel ordering can result in a large difference in performance. For example, the documentation for using the MXNet mathematical library as the backend for Keras recommends channels first ordering for better performance:

We strongly recommend changing the image_data_format to channels_first. MXNet is significantly faster on channels_first data.

Performance Tuning Keras with MXNet Backend, Apache MXNet

Default Channel Ordering

The library and preferred channel ordering are listed in the Keras configuration file, stored in your home directory under ~/.keras/keras.json.

The preferred channel ordering is stored in the “image_data_format” configuration setting and can be set as either “channels_last” or “channels_first“.

For example, below is the contents of a keras.json configuration file. In it, you can see that the system is configured to use tensorflow and channels_last order.

{
    "image_data_format": "channels_last",
    "backend": "tensorflow",
    "epsilon": 1e-07,
    "floatx": "float32"
}

Based on your preferred channel ordering, you will have to prepare your image data to match the preferred ordering.

Specifically, this will include tasks such as:

  • Resizing or expanding the dimensions of any training, validation, and test data to meet the expectation.
  • Specifying the expected input shape of samples when defining models (e.g. input_shape=(28, 28, 1)), as in the sketch below.
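
For example, a minimal sketch of how the expected input shape changes with the channel ordering, assuming 28×28 grayscale images:

...
# channels last: samples have the shape [rows][cols][channels]
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
# channels first: samples would instead use [channels][rows][cols]
# model.add(Conv2D(32, (3, 3), input_shape=(1, 28, 28)))
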
Model-Specific Channel Ordering

In addition, those neural network layers that are designed to work with images, such as Conv2D, also provide an argument called “data_format” that allows you to specify the channel ordering. For example:

...
model.add(Conv2D(..., data_format='channels_first'))

By default, this will use the preferred ordering specified in the “image_data_format” value of the Keras configuration file. Nevertheless, you can change the channel order for a given model, and in turn, the datasets and input shape would also have to be changed to use the new channel ordering for the model.

This can be useful when loading a model used for transfer learning that has a channel ordering different to your preferred channel ordering.

Query Channel Ordering

You can confirm your current preferred channel ordering by printing the result of the image_data_format() function. The example below demonstrates.

# show preferred channel order
from keras import backend
print(backend.image_data_format())

Running the example prints your preferred channel ordering as configured in your Keras configuration file. In this case, the channels last format is used.

channels_last

Accessing this property can be helpful if you want to automatically construct models or prepare data differently depending on the system’s preferred channel ordering; for example:

if backend.image_data_format() == 'channels_last':
	...
else:
	...
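
A concrete version of this idiom, assuming 28×28 grayscale images, might select the input shape for a model as follows:

# choose the input shape to match the preferred channel ordering
from keras import backend
if backend.image_data_format() == 'channels_last':
	input_shape = (28, 28, 1)
else:
	input_shape = (1, 28, 28)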

Force Channel Ordering

Finally, the channel ordering can be forced for a specific program.

This can be achieved by calling the set_image_dim_ordering() function on the Keras backend with either ‘th‘ (Theano) for channels-first ordering, or ‘tf‘ (TensorFlow) for channels-last ordering.

This can be useful if you want a program or model to operate consistently regardless of Keras default channel ordering configuration.

# force a channel ordering
from keras import backend
# force channels-first ordering
backend.set_image_dim_ordering('th')
print(backend.image_data_format())
# force channels-last ordering
backend.set_image_dim_ordering('tf')
print(backend.image_data_format())

Running the example first forces channels-first ordering, then channels-last ordering, confirming each configuration by printing the channel ordering mode after the change.

channels_first
channels_last

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Summary

In this tutorial, you discovered channel ordering formats, how to prepare and manipulate image data to meet formats, and how to configure the Keras deep learning library for different channel orderings.

Specifically, you learned:

  • The three-dimensional array structure of images and the channels first and channels last array formats.
  • How to add a channels dimension and how to convert images between the channel formats.
  • How the Keras deep learning library manages a preferred channel ordering and how to change and query this preference.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to Channels First and Channels Last Image Formats for Deep Learning appeared first on Machine Learning Mastery.


The pixel values in images must be scaled prior to providing the images as input to a deep learning neural network model during the training or evaluation of the model.

Traditionally, the images would have to be scaled prior to the development of the model and stored in memory or on disk in the scaled format.

An alternative approach is to scale the images using a preferred scaling technique just-in-time during the training or model evaluation process. Keras supports this type of data preparation for image data via the ImageDataGenerator class and API.

In this tutorial, you will discover how to use the ImageDataGenerator class to scale pixel data just-in-time when fitting and evaluating deep learning neural network models.

After completing this tutorial, you will know:

  • How to configure and use the ImageDataGenerator class for train, validation, and test datasets of images.
  • How to use the ImageDataGenerator to normalize pixel values when fitting and evaluating a convolutional neural network model.
  • How to use the ImageDataGenerator to center and standardize pixel values when fitting and evaluating a convolutional neural network model.

Let’s get started.

How to Normalize, Center, and Standardize Images With the ImageDataGenerator in Keras
Photo by Sagar, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. MNIST Handwritten Image Classification Dataset
  2. ImageDataGenerator class for Pixel Scaling
  3. How to Normalize Images With ImageDataGenerator
  4. How to Center Images With ImageDataGenerator
  5. How to Standardize Images With ImageDataGenerator
MNIST Handwritten Image Classification Dataset

Before we dive into the usage of the ImageDataGenerator class for preparing image data, we must select an image dataset on which to test the generator.

The MNIST problem is an image classification problem comprised of 70,000 images of handwritten digits.

The goal of the problem is to classify a given image of a handwritten digit as an integer from 0 to 9. As such, it is a multiclass image classification problem.

This dataset is provided as part of the Keras library and can be automatically downloaded (if needed) and loaded into memory by a call to the keras.datasets.mnist.load_data() function.

The function returns two tuples: one for the training inputs and outputs and one for the test inputs and outputs. For example:

# example of loading the MNIST dataset
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

We can load the MNIST dataset and summarize the dataset. The complete example is listed below.

# load and summarize the MNIST dataset
from keras.datasets import mnist
# load dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# summarize dataset shape
print('Train', train_images.shape, train_labels.shape)
print('Test', test_images.shape, test_labels.shape)
# summarize pixel values
print('Train', train_images.min(), train_images.max(), train_images.mean(), train_images.std())
print('Test', test_images.min(), test_images.max(), test_images.mean(), test_images.std())

Running the example first loads the dataset into memory. Then the shape of the train and test datasets is reported.

We can see that all images are 28 by 28 pixels with a single channel for black-and-white images. There are 60,000 images for the training dataset and 10,000 for the test dataset.

We can also see that pixel values are integer values between 0 and 255 and that the mean and standard deviation of the pixel values are similar between the two datasets.

Train (60000, 28, 28) (60000,)
Test (10000, 28, 28) (10000,)
Train 0 255 33.318421449829934 78.56748998339798
Test 0 255 33.791224489795916 79.17246322228644

We will use this dataset to explore different pixel scaling methods using the ImageDataGenerator class in Keras.

ImageDataGenerator Class for Pixel Scaling

The ImageDataGenerator class in Keras provides a suite of techniques for scaling pixel values in your image dataset prior to modeling.

The class will wrap your image dataset, then when requested, it will return images in batches to the algorithm during training, validation, or evaluation and apply the scaling operations just-in-time. This provides an efficient and convenient approach to scaling image data when modeling with neural networks.

The usage of the ImageDataGenerator class is as follows.

  1. Load your dataset.
  2. Configure the ImageDataGenerator (e.g. construct an instance).
  3. Calculate image statistics (e.g. call the fit() function).
  4. Use the generator to fit the model (e.g. pass the instance to the fit_generator() function).
  5. Use the generator to evaluate the model (e.g. pass the instance to the evaluate_generator() function).

The ImageDataGenerator class supports a number of pixel scaling methods, as well as a range of data augmentation techniques. We will focus on the pixel scaling techniques and leave the data augmentation methods to a later discussion.

The three main types of pixel scaling techniques supported by the ImageDataGenerator class are as follows:

  • Pixel Normalization: scale pixel values to the range 0-1.
  • Pixel Centering: scale pixel values to have a zero mean.
  • Pixel Standardization: scale pixel values to have a zero mean and unit variance.

The pixel standardization is supported at two levels: either per-image (called sample-wise) or per-dataset (called feature-wise). Specifically, the statistics required to standardize pixel values (the mean, or the mean and standard deviation) can be calculated from the pixel values in each image only (sample-wise) or across the entire training dataset (feature-wise).

Other pixel scaling methods are supported, such as ZCA, brightening, and more, but we will focus on these three most common methods.

The choice of pixel scaling is selected by specifying arguments to the ImageDataGenerator when an instance is constructed; for example:

# create and configure the data generator
datagen = ImageDataGenerator(...)

Next, if the chosen scaling method requires that statistics be calculated across the training dataset, then these statistics can be calculated and stored by calling the fit() function.

When evaluating and selecting a model, it is common to calculate these statistics on the training dataset and then apply them to the validation and test datasets.

# calculate scaling statistics on the training dataset
datagen.fit(trainX)

Once prepared, the data generator can be used to fit a neural network model by calling the flow() function to retrieve an iterator that returns batches of samples and passing it to the fit_generator() function.

# get batch iterator
train_iterator = datagen.flow(trainX, trainy)
# fit model
model.fit_generator(train_iterator, ...)

If a validation dataset is required, a separate batch iterator can be created from the same data generator that will perform the same pixel scaling operations and use any required statistics calculated on the training dataset.

# get batch iterator for training
train_iterator = datagen.flow(trainX, trainy)
# get batch iterator for validation
val_iterator = datagen.flow(valX, valy)
# fit model
model.fit_generator(train_iterator, validation_data=val_iterator, ...)

Once fit, the model can be evaluated by creating a batch iterator for the test dataset and calling the evaluate_generator() function on the model.

Again, the same pixel scaling operations will be performed and any statistics calculated on the training dataset will be used, if needed.

# get batch iterator for testing
test_iterator = datagen.flow(testX, testy)
# evaluate model loss on test dataset
loss = model.evaluate_generator(test_iterator, ...)

Now that we are familiar with how to use the ImageDataGenerator class for scaling pixel values, let’s look at some specific examples.

How to Normalize Images With ImageDataGenerator

The ImageDataGenerator class can be used to rescale pixel values from the range of 0-255 to the range 0-1 preferred for neural network models.

Scaling data to the range of 0-1 is traditionally referred to as normalization.

This can be achieved by setting the rescale argument to a ratio by which each pixel can be multiplied to achieve the desired range.

In this case, the ratio is 1/255 or about 0.0039. For example:

# create generator (1.0/255.0 = 0.003921568627451)
datagen = ImageDataGenerator(rescale=1.0/255.0)

The ImageDataGenerator does not need to be fit in this case because there are no global statistics that need to be calculated.

Next, iterators can be created using the generator for both the train and test datasets. We will use a batch size of 64. This means that each of the train and test datasets of images is divided into groups of 64 images that will then be scaled when returned from the iterator.

We can see how many batches there will be in one epoch, e.g. one pass through the training dataset, by printing the length of each iterator.

# prepare iterators to scale images
train_iterator = datagen.flow(trainX, trainY, batch_size=64)
test_iterator = datagen.flow(testX, testY, batch_size=64)
print('Batches train=%d, test=%d' % (len(train_iterator), len(test_iterator)))

We can then confirm that the pixel normalization has been performed as expected by retrieving the first batch of scaled images and inspecting the min and max pixel values.

# confirm the scaling works
batchX, batchy = train_iterator.next()
print('Batch shape=%s, min=%.3f, max=%.3f' % (batchX.shape, batchX.min(), batchX.max()))

Next, we can use the data generator to fit and evaluate a model. We will define a simple convolutional neural network model and fit it on the train_iterator for five epochs with 60,000 samples divided by 64 samples per batch, or about 938 batches per epoch.

# fit model with generator
model.fit_generator(train_iterator, steps_per_epoch=len(train_iterator), epochs=5)

Once fit, we will evaluate the model on the test dataset, with about 10,000 images divided by 64 samples per batch, or about 157 steps in a single epoch.

_, acc = model.evaluate_generator(test_iterator, steps=len(test_iterator), verbose=0)
print('Test Accuracy: %.3f' % (acc * 100))

We can tie all of this together; the complete example is listed below.

# example of using ImageDataGenerator to normalize images
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainY), (testX, testY) = mnist.load_data()
# reshape dataset to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
# confirm scale of pixels
print('Train min=%.3f, max=%.3f' % (trainX.min(), trainX.max()))
print('Test min=%.3f, max=%.3f' % (testX.min(), testX.max()))
# create generator (1.0/255.0 = 0.003921568627451)
datagen = ImageDataGenerator(rescale=1.0/255.0)
# prepare iterators to scale images
train_iterator = datagen.flow(trainX, trainY, batch_size=64)
test_iterator = datagen.flow(testX, testY, batch_size=64)
print('Batches train=%d, test=%d' % (len(train_iterator), len(test_iterator)))
# confirm the scaling works
batchX, batchy = train_iterator.next()
print('Batch shape=%s, min=%.3f, max=%.3f' % (batchX.shape, batchX.min(), batchX.max()))
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, channels)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model with generator
model.fit_generator(train_iterator, steps_per_epoch=len(train_iterator), epochs=5)
# evaluate model
_, acc = model.evaluate_generator(test_iterator, steps=len(test_iterator), verbose=0)
print('Test Accuracy: %.3f' % (acc * 100))

Running the example first reports the min and max pixel values on the train and test sets. This confirms that indeed the raw data has pixel values in the range 0-255.

Next, the data generator is created and the iterators are prepared. We can see that we have 938 batches per epoch with the training dataset and 157 batches per epoch with the test dataset.

We retrieve the first batch from the dataset and confirm that it contains 64 images with the height and width (rows and columns) of 28 pixels and 1 channel, and that the new minimum and maximum pixel values are 0 and 1 respectively. This confirms that the normalization has had the desired effect.

Train min=0.000, max=255.000
Test min=0.000, max=255.000
Batches train=938, test=157
Batch shape=(64, 28, 28, 1), min=0.000, max=1.000

The model is then fit on the normalized image data. Training does not take long on the CPU. Finally, the model is evaluated on the test dataset, applying the same normalization.

Epoch 1/5
938/938 [==============================] - 12s 13ms/step - loss: 0.1841 - acc: 0.9448
Epoch 2/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0573 - acc: 0.9826
Epoch 3/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0407 - acc: 0.9870
Epoch 4/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0299 - acc: 0.9904
Epoch 5/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0238 - acc: 0.9928
Test Accuracy: 99.050

Now that we are familiar with how to use the ImageDataGenerator in general and specifically for image normalization, let’s look at examples of pixel centering and standardization.

How to Center Images With ImageDataGenerator

Another popular pixel scaling method is to calculate the mean pixel value across the entire training dataset, then subtract it from each image.

This is called centering and has the effect of centering the distribution of pixel values on zero: that is, the mean pixel value for centered images will be zero.

The ImageDataGenerator class refers to centering that uses the mean calculated on the training dataset as feature-wise centering. It requires that the statistic is calculated on the training dataset prior to scaling.

# create generator that centers pixel values
datagen = ImageDataGenerator(featurewise_center=True)
# calculate the mean on the training dataset
datagen.fit(trainX)

This is different from calculating the mean pixel value for each image, which Keras refers to as sample-wise centering; that approach does not require any statistics to be calculated on the training dataset.

# create generator that centers pixel values
datagen = ImageDataGenerator(samplewise_center=True)

We will demonstrate feature-wise centering in this section. Once the statistic is calculated on the training dataset, we can confirm the value by accessing and printing it; for example:

# print the mean calculated on the training dataset.
print(datagen.mean)

We can also confirm that the scaling procedure has had the desired effect by calculating the mean of a batch of images returned from the batch iterator. We would expect the mean to be a small value close to zero, but not zero because of the small number of images in the batch.

# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())

A better check would be to set the batch size to the size of the training dataset (e.g. 60,000 samples), retrieve one batch, then calculate the mean. It should be a very small value close to zero.

# try to flow the entire training dataset
iterator = datagen.flow(trainX, trainy, batch_size=len(trainX), shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())

The complete example is listed below.

# example of centering an image dataset
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# reshape dataset to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))
# report the mean pixel value of the train and test datasets
print('Means train=%.3f, test=%.3f' % (trainX.mean(), testX.mean()))
# create generator that centers pixel values
datagen = ImageDataGenerator(featurewise_center=True)
# calculate the mean on the training dataset
datagen.fit(trainX)
print('Data Generator Mean: %.3f' % datagen.mean)
# demonstrate effect on a single batch of samples
iterator = datagen.flow(trainX, trainy, batch_size=64)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())
# demonstrate effect on entire training dataset
iterator = datagen.flow(trainX, trainy, batch_size=len(trainX), shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())

Running the example first reports the mean pixel value for the train and test datasets.

The MNIST dataset only has a single channel because the images are black and white (grayscale); if the images were color, the mean pixel values would be calculated per channel, across all images in the training dataset.

The ImageDataGenerator is fit on the training dataset and we can confirm that the mean pixel value matches our own manual calculation.

A single batch of centered images is retrieved and we can confirm that the mean pixel value is a small-ish value close to zero. The test is repeated using the entire training dataset as the batch size, and in this case, the mean pixel value for the scaled dataset is a number very close to zero, confirming that centering is having the desired effect.

Means train=33.318, test=33.791
Data Generator Mean: 33.318
(64, 28, 28, 1) 0.09971977
(60000, 28, 28, 1) -1.9512918e-05

We can demonstrate centering with our convolutional neural network developed in the previous section.

The complete example with feature-wise centering is listed below.

# example of using ImageDataGenerator to center images
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainY), (testX, testY) = mnist.load_data()
# reshape dataset to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
# create generator to center images
datagen = ImageDataGenerator(featurewise_center=True)
# calculate mean on training dataset
datagen.fit(trainX)
# prepare iterators to scale images
train_iterator = datagen.flow(trainX, trainY, batch_size=64)
test_iterator = datagen.flow(testX, testY, batch_size=64)
print('Batches train=%d, test=%d' % (len(train_iterator), len(test_iterator)))
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, channels)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model with generator
model.fit_generator(train_iterator, steps_per_epoch=len(train_iterator), epochs=5)
# evaluate model
_, acc = model.evaluate_generator(test_iterator, steps=len(test_iterator), verbose=0)
print('Test Accuracy: %.3f' % (acc * 100))

Running the example prepares the ImageDataGenerator, centering images using statistics calculated on the training dataset.

We can see that performance starts off poor but does improve. The centered pixel values will range from about -33 to 222 (given a mean pixel value of about 33), and neural networks often train more efficiently with small inputs. Normalizing followed by centering would be a better approach in practice.
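
One simple way to achieve this is to normalize the arrays manually before fitting the generator, so that the mean is calculated on the rescaled values; a minimal sketch:

# sketch: normalize manually, then let the generator center the result
trainX = trainX.astype('float32') / 255.0
testX = testX.astype('float32') / 255.0
datagen = ImageDataGenerator(featurewise_center=True)
# the mean is now calculated on pixel values in the range 0-1
datagen.fit(trainX)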

Importantly, the model is evaluated on the test dataset, where the images in the test dataset were centered using the mean value calculated on the training dataset. This is to avoid any data leakage.

Batches train=938, test=157
Epoch 1/5
938/938 [==============================] - 12s 13ms/step - loss: 12.8824 - acc: 0.2001
Epoch 2/5
938/938 [==============================] - 12s 13ms/step - loss: 6.1425 - acc: 0.5958
Epoch 3/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0678 - acc: 0.9796
Epoch 4/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0464 - acc: 0.9857
Epoch 5/5
938/938 [==============================] - 12s 13ms/step - loss: 0.0373 - acc: 0.9880
Test Accuracy: 98.540

How to Standardize Images With ImageDataGenerator

Standardization is a data scaling technique that assumes that the distribution of the data is Gaussian and shifts the distribution of the data to have a mean of zero and a standard deviation of one.

Data with this distribution is referred to as a standard Gaussian. It can be beneficial when training neural networks as the data is centered on zero and the inputs are small values in the rough range of about -3.0 to 3.0 (e.g. 99.7% of the values will fall within three standard deviations of the mean).

Standardization of images is achieved by subtracting the mean pixel value and dividing the result by the standard deviation of the pixel values.

The mean and standard deviation statistics can be calculated on the training dataset, and as discussed in the previous section, Keras refers to this as feature-wise.

# feature-wise generator
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# calculate mean and standard deviation on the training dataset
datagen.fit(trainX)

The statistics can also be calculated for each image and used to standardize it separately; Keras refers to this as sample-wise standardization.

# sample-wise standardization
datagen = ImageDataGenerator(samplewise_center=True,..

The Keras deep learning library provides a sophisticated API for loading, preparing, and augmenting image data.

Also included in the API are some undocumented functions that allow you to quickly and easily load, convert, and save image files. These functions can be convenient when getting started on a computer vision deep learning project, allowing you to use the same Keras API initially to inspect and handle image data.

In this tutorial, you will discover how to use the basic image handling functions provided by the Keras API.

After completing this tutorial, you will know:

  • How to load and display an image using the Keras API.
  • How to convert a loaded image to a NumPy array and back to PIL format using the Keras API.
  • How to convert a loaded image to grayscale and save it to a new file using the Keras API.

Let’s get started.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Test Image
  2. Keras Image Processing API
  3. How to Load an Image With Keras
  4. How to Convert an Image With Keras
  5. How to Save an Image With Keras

Test Image

The first step is to select a test image to use in this tutorial.

We will use a photograph of Bondi Beach, Sydney, taken by Isabell Schulz, released under a permissive creative commons license.

Bondi Beach, Sydney

Download the image and place it into your current working directory with the filename “bondi_beach.jpg“.

Keras Image Processing API

The Keras deep learning library provides utilities for working with image data.

The main API is the ImageDataGenerator class that combines data loading, preparation, and augmentation.

We will not cover the ImageDataGenerator class in this tutorial. Instead, we will take a closer look at a few less-documented or undocumented functions that may be useful when working with image data and modeling with the Keras API.

Specifically, Keras provides functions for loading, converting, and saving image data. The functions are implemented in the utils.py module and exposed via the image.py module.

These functions can be useful convenience functions when getting started on a new deep learning computer vision project or when you need to inspect specific images.

Some of these functions are demonstrated when working with pre-trained models in the Applications section of the API documentation.

All image handling in Keras requires that the Pillow library is installed. If it is not installed, it can be added via pip (e.g. pip install Pillow).

Let’s take a closer look at each of these functions in turn.

How to Load an Image with Keras

Keras provides the load_img() function for loading an image from file as a PIL image object.

The example below loads the Bondi Beach photograph from file as a PIL image and reports details about the loaded image.

# example of loading an image with the Keras API
from keras.preprocessing.image import load_img
# load the image
img = load_img('bondi_beach.jpg')
# report details about the image
print(type(img))
print(img.format)
print(img.mode)
print(img.size)
# show the image
img.show()

Running the example loads the image and reports details about the loaded image.

We can confirm that the image was loaded as a PIL image in JPEG format with RGB channels and a size of 640 by 427 pixels.

<class 'PIL.JpegImagePlugin.JpegImageFile'>
JPEG
RGB
(640, 427)

The loaded image is then displayed using the default application on the workstation, in this case, the Preview application on macOS.

Example of Displaying a PIL image using the Default Application

The load_img() function provides additional arguments that may be useful when loading the image:

  • ‘grayscale‘: allows the image to be loaded in grayscale (defaults to False).
  • ‘color_mode‘: allows the image mode or channel format to be specified (defaults to ‘rgb‘).
  • ‘target_size‘: allows a tuple of (height, width) to be specified, resizing the image automatically after it is loaded.
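
For example, the sketch below loads the photograph as grayscale and resizes it on load; the 100 by 100 target size is a hypothetical choice.

# a minimal sketch of loading a resized grayscale image (the target size is hypothetical)
from keras.preprocessing.image import load_img
img = load_img('bondi_beach.jpg', color_mode='grayscale', target_size=(100, 100))
print(img.mode)
print(img.size)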

How to Convert an Image With Keras

Keras provides the img_to_array() function for converting a loaded image in PIL format into a NumPy array for use with deep learning models.

The API also provides the array_to_img() function that can be used for converting a NumPy array of pixel data into a PIL image. This can be useful if the pixel data is modified while the image is in array format and can then be saved or viewed.

The example below loads the test image, converts it to a NumPy array, and then converts it back into a PIL image.

# example of converting an image with the Keras API
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import array_to_img
# load the image
img = load_img('bondi_beach.jpg')
print(type(img))
# convert to numpy array
img_array = img_to_array(img)
print(img_array.dtype)
print(img_array.shape)
# convert back to image
img_pil = array_to_img(img_array)
print(type(img_pil))

Running the example first loads the photograph in PIL format, then converts the image to a NumPy array and reports the data type and shape.

We can see that the pixel values are converted from unsigned integers to 32-bit floating point values, and in this case, converted to the array format [height, width, channels]. Finally, the image is converted back into PIL format.

<class 'PIL.JpegImagePlugin.JpegImageFile'>
float32
(427, 640, 3)
<class 'PIL.Image.Image'>
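
To make the use case concrete, the sketch below modifies the pixel data while it is in array format and then converts the result back for viewing; the inversion is just a hypothetical transform.

# a minimal sketch: modify pixels in array form, then convert back to PIL format
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import array_to_img
pixels = img_to_array(load_img('bondi_beach.jpg'))
# invert the pixel values while the image is in array format (a hypothetical transform)
pixels = 255.0 - pixels
# convert back to a PIL image so it can be viewed or saved
array_to_img(pixels).show()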

How to Save an Image With Keras

The Keras API also provides the save_img() function to save an image to file.

The function takes the path to save the image, and the image data in NumPy array format. The file format is inferred from the filename, but can also be specified via the ‘file_format‘ argument.

This can be useful if you have manipulated image pixel data, such as scaling, and wish to save the image for later use.
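
For example, the sketch below forces the PNG format explicitly; the output filename is a hypothetical choice.

# a minimal sketch of forcing the file format explicitly (the filename is hypothetical)
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import save_img
pixels = img_to_array(load_img('bondi_beach.jpg'))
# the format is forced to PNG via the file_format argument
save_img('bondi_beach_copy.png', pixels, file_format='png')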

The example below loads the photograph image in grayscale format, converts it to a NumPy array, and saves it to a new file name.

# example of saving an image with the Keras API
from keras.preprocessing.image import load_img
from keras.preprocessing.image import save_img
from keras.preprocessing.image import img_to_array
# load the image as grayscale
img = load_img('bondi_beach.jpg', grayscale=True)
# convert image to a numpy array
img_array = img_to_array(img)
# save the image with a new filename
save_img('bondi_beach_grayscale.jpg', img_array)
# load the image to confirm it was saved correctly
img = load_img('bondi_beach_grayscale.jpg')
print(type(img))
print(img.format)
print(img.mode)
print(img.size)
img.show()

Running the example first loads the image and forces the format to be grayscale.

The image is then converted to a NumPy array and saved to the new filename ‘bondi_beach_grayscale.jpg‘ in the current working directory.

To confirm that the file was saved correctly, it is loaded again as a PIL image and details of the image are reported.

<class 'PIL.Image.Image'>
None
RGB
(640, 427)

The loaded grayscale image is then displayed using the default image preview application on the workstation, which in macOS is the Preview application.

Example of Saved Grayscale Image Shown Using the Default Image Viewing Application

Summary

In this tutorial, you discovered how to use the basic image handling functions provided by the Keras API.

Specifically, you learned:

  • How to load and display an image using the Keras API.
  • How to convert a loaded image to a NumPy array and back to PIL format using the Keras API.
  • How to convert a loaded image to grayscale and save it to a new file using the Keras API.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
