Digit Recogniser Using Neural Networks

by ap333

Interested in Machine Learning and Neural Networks?

Have you ever wondered how applications like facial recognition work? Machine learning is the answer. One of the most widely used models today is the neural network.

In this instructable, you will deploy your own neural network in less than two hours! If you haven't programmed before, don't worry: all the code is provided, along with links to instructions on how to set up the environment. Prior experience with neural networks is not required, but it will help you understand some of the concepts.

Let's Get Started!


What Are Neural Networks?

Neural networks are a computational model based on a large collection of artificial neurons, loosely mimicking the way biological neurons solve problems in large clusters. Each neuron is connected to many others, and each connection can excite or inhibit the neurons it feeds into.

A signal is provided as an input to the network (here, the image of a digit) and propagates through the network. The hidden layers decide on a response and supply it as the output. To correct any mistakes the model makes, the error at the output is propagated back into the hidden layers, allowing them to adjust their weights. This is how a neural network learns.
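
To build some intuition, here is a toy sketch of the core learning idea (an illustration only, not part of the project code): nudge a weight in the direction that reduces the error. Real backpropagation does this for every weight in the network at once.

w = 0.0                       # a single weight; we want output 1.0 for input 2.0
for _ in range(100):
	prediction = w * 2.0      # forward pass
	error = prediction - 1.0  # how far off we are
	w -= 0.1 * error * 2.0    # step against the gradient of the squared error
print(w)                      # converges towards 0.5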

Neural Networks can be stacked in different configurations for specific applications. For the digit recognizer, we will use a convolutional neural network.

For more detailed information on how neural networks work, Andrej Karpathy has written a very helpful blog post.

Classifying Digits

In this instructable, the task we will use neural networks for is classifying digits. The digits are 0-9, and we will classify a single digit at a time. One of the most important aspects of machine learning is the quality and quantity of the training data. Each training example is a pair: an image of a digit and the digit it represents. The neurons in the network optimize themselves according to the training data, so greater variety is better. A popular dataset available online is the MNIST Digits Dataset.

Setup Environment

Note:

Unlike most Unix systems, Windows does not come with Python pre-installed. To make sure Python works correctly, take a look at the Python documentation. You will also need TensorFlow, which this tutorial uses throughout; the TensorFlow website has installation instructions.

Also, you can use this Python Primer to learn how to use Python.

MNIST Dataset

Before we dive into coding, it might be worthwhile to take a look at the data set that we will be using. The MNIST database is a large collection of images of handwritten digits. It also contains the label for each image, which indicates the correct digit the image represents.

Each image is 28 pixels by 28 pixels. We can think of an image as a matrix with 28 x 28 = 784 numbers in it. It is also important to note that the images are grayscale so each pixel can be represented by a single number.
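
As a quick illustration (using a stand-in array rather than a real MNIST image), this is all the "flattening" amounts to:

import numpy as np

image = np.random.rand(28, 28)  # stand-in for one grayscale digit image
flat = image.reshape(784)       # the simple model sees each image as 784 numbers
print(flat.shape)               # (784,)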

The training data that we will be using has 55,000 images and 55,000 labels. We also use 5,000 images for validation of the model and 10,000 images to test it.

Note:
If you have set up TensorFlow, you do not need to worry about downloading the dataset; the code in the next step fetches it automatically.

A Simple Learner

Note:
You can skip this step if you don't want to start from a basic model and dive straight into Convolutional Neural Networks.

Before we dive into convolutional neural networks, you might want to start with an easier model. It will also let us set a performance baseline. The model we will use is logistic regression along with the softmax function.

Let's Get Started!

Create a file called 'simple_learner.py' using your preferred text editor.

First, we want to make sure that we have the data set in place.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

This stores the dataset in the variable 'mnist'. You might be wondering what one_hot means: it is a vector with all entries set to 0 except for a single 1, which lets each label be represented as a vector. Since we have the digits 0-9, the vector has 10 entries.
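
As an illustration (NumPy is used here only for the demonstration), the one-hot vector for the digit 3 looks like this:

import numpy as np

label = 3
one_hot = np.zeros(10)
one_hot[label] = 1
print(one_hot)  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]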

Now, let's set up the placeholders that will hold the images and labels, along with the variables for the model's parameters. We can do that by using TensorFlow's 'placeholder' object. Note that 'x' holds images with 28 x 28 = 784 values each, while 'y_data', the label, has 10 values per image.

x = tf.placeholder(tf.float32, [None, 784])
y_data = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

'W' is the weight matrix and 'b' is the bias vector. We multiply the 'x' and 'W' matrices using the 'matmul' function and add the bias. We then apply the softmax function to get the predicted label which is stored in 'y'.
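
If the softmax function is new to you, here is a minimal NumPy sketch of what tf.nn.softmax computes for a single row of scores: it turns arbitrary scores into probabilities that sum to 1.

import numpy as np

def softmax(z):
	e = np.exp(z - np.max(z))  # subtract the max for numerical stability
	return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # [0.659 0.242 0.099] -- probabilities summing to 1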

In order to train the model, we minimize a cost function. The cost function measures how far our predictions are from the truth. The cost function we use is cross-entropy.

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_data * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

The training step minimizes the cross-entropy using a gradient descent optimizer with a learning rate of 0.5.
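
To make the cost concrete, here is a hand-worked example (an illustration only, not part of the script): for a one-hot label, cross-entropy reduces to the negative log of the probability the model assigned to the correct digit.

import numpy as np

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])    # one-hot label for the digit 3
y_pred = np.array([0.01] * 3 + [0.91] + [0.01] * 6)   # a confident, correct prediction
print(-np.sum(y_true * np.log(y_pred)))               # ~0.094, a small loss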

Now we initialize all the variables that we created and run the model.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

# Train for n steps on random batches of 100 images at a time.
n = 1000
for i in range(n):
	batch_xs, batch_ys = mnist.train.next_batch(100)
	sess.run(train_step, feed_dict={x: batch_xs, y_data: batch_ys})

# Accuracy is the fraction of test images whose most probable digit matches the label.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_data, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_data: mnist.test.labels}))

We run the training step 'n' times on small batches of the data. At the end, we print the accuracy of our model, which we get by taking the mean of the correct predictions.

To run this, open a command line, navigate to the directory where you stored the file, and type:

python simple_learner.py

You should get about 92% accuracy.
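
If you want to see the model in action on a single example, something along these lines should work while the session is still open (this snippet is an extra, not part of the original script):

# Pick the most probable digit for the first test image.
prediction = tf.argmax(y, 1)
digit = sess.run(prediction, feed_dict={x: mnist.test.images[:1]})
print("predicted digit:", digit[0])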

CNNs

From the Logistic Regression model, we were able to get around 92% accuracy but we can do much better. We will now build a Convolutional Neural Network. Our CNN will have two convolution layers.

Start out by creating a file called 'advanced_learner.py'. The first few steps are similar to the Logistic Regression example.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])
y_data = tf.placeholder(tf.float32, [None, 10])

Note:
Instead of initializing the session separately, we are using the interactive mode.

Since we have many weights to initialize, we create helper functions to avoid repetition. We also create helper functions for the convolution and max-pooling layers.

def weight_variable(shape):
	i = tf.truncated_normal(shape, stddev=0.1)
	return tf.Variable(i)
def bias_variable(shape):
	i = tf.constant(0.1, shape=shape)
	return tf.Variable(i)
def conv2d(x, W):
	return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')
def max_pool(x):
	return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

Note:
In this case, the weights are initialized with some deviation rather than zero. To understand the parameters for the convolution and max-pool layers, you can refer to the Wikipedia article.
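
If the pooling operation feels abstract, here is a toy NumPy illustration (not part of the model code) of 2x2 max-pooling on a 4x4 matrix: each 2x2 block is replaced by its maximum, halving the width and height.

import numpy as np

m = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]])
pooled = m.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each 2x2 block
print(pooled)  # [[4 2]
               #  [2 8]]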

Now we can start creating all the weight variables and the network layers. We first reshape the image matrix to be of size 28 pixels by 28 pixels.

x_image = tf.reshape(x, [-1,28,28,1])

We then create the first layer. This will compute 32 features for each 5x5 portion of the image.

Note:
We are using the ReLU activation function.

Wcon1 = weight_variable([5,5,1,32])
bcon1 = bias_variable([32])
hcon1 = tf.nn.relu(conv2d(x_image, Wcon1) + bcon1)
hpool1 = max_pool(hcon1)

The second layer computes 64 features for each 5x5 portion of the output from layer 1.

Wcon2 = weight_variable([5,5,32,64])
bcon2 = bias_variable([64])
hcon2 = tf.nn.relu(conv2d(hpool1, Wcon2) + bcon2)
hpool2 = max_pool(hcon2)

We now add a fully connected layer that considers all the features from layer 2. Pooling reduces the size of the image at each step: each 2x2 max-pool halves the width and height, so the image goes from 28x28 to 14x14 after the first layer and to 7x7 after the second.

Wfc1 = weight_variable([7*7*64, 1024])
bfc1 = bias_variable([1024])
hpool2_f = tf.reshape(hpool2, [-1, 7*7*64])
hfc1 = tf.nn.relu(tf.matmul(hpool2_f, Wfc1) + bfc1)

One of the biggest problems in machine learning is overfitting to the training data: the model learns to correctly identify the images in the training set but fails to generalize to unseen examples. Dropout is one of the standard remedies in neural networks: during training, each activation is randomly dropped (set to zero) with some probability.
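
Roughly, dropout does something like this under the hood (a NumPy sketch for intuition only; the real call is below):

import numpy as np

keep_prob = 0.5                                # same role as the 'keep' placeholder below
activations = np.array([0.2, 1.5, 0.7, 0.9])
mask = np.random.rand(4) < keep_prob           # each activation survives with probability 0.5
print(activations * mask / keep_prob)          # survivors scaled so the expected value is unchanged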

keep = tf.placeholder(tf.float32)
hfc1drop = tf.nn.dropout(hfc1, keep)

Now let's add the final layer that gives the output.

Wfc2 = weight_variable([1024, 10])
bfc2 = bias_variable([10])
ycon = tf.nn.softmax(tf.matmul(hfc1drop, Wfc2) + bfc2)

The last step is to create the cost function and set the optimizer. We again use cross-entropy as the cost function, but we use the Adam optimizer instead of plain gradient descent. Also, since we train this model for much longer, we want to see intermediate progress, so we print the training accuracy every 100 iterations.

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_data * tf.log(ycon), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_pred = tf.equal(tf.argmax(ycon, 1), tf.argmax(y_data, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
sess.run(tf.initialize_all_variables())

for i in range(20000):
	batch = mnist.train.next_batch(50)
	if i % 100 == 0:
		# Evaluate on the current batch with no dropout (keep all activations).
		train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_data: batch[1], keep: 1.0})
		print("step %d, training accuracy %g" % (i, train_accuracy))
	train_step.run(feed_dict={x: batch[0], y_data: batch[1], keep: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_data: mnist.test.labels, keep: 1.0}))

Run the file from the command line using:

python advanced_learner.py

This should give an accuracy of about 99%.

Moving Forward

Congratulations!

You have built your first Neural Network.

If you do not have much experience with machine learning and neural networks, a lot of the code might not have made sense yet. But if you found this interesting, there are many resources that explain in detail how these models work.