MNIST Hello World Example Part 1

This is the MNIST (Modified National Institute of Standards and Technology) data set that comes with Keras. It has 60,000 training images and 10,000 test images of handwritten digits. This data set is sometimes referred to as the "hello world" of deep learning.

We will step through the data process.

Data Process Steps

  1. Goal or Hypothesis

  2. Data Retrieval

  3. Data Processing

  4. Data Exploration

  5. Model Data

  6. Present Results

1. Goal

The goal for this project is to build a machine learning model in Keras that predicts handwritten digits.

2. Data Retrieval

In this example, Keras already comes with the MNIST data set for handwritten digit recognition; the data just needs to be loaded. It comes pre-split into a training set and a test set, which is the split we will use.

In [116]:
from keras.datasets import mnist #Loads the Data Set

#load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

3. Data Processing

After loading the data, we need to process it into a format that we can use. Since our goal states we want to use Keras, that tells us what format the data needs to be in. By default, the loaded data consists of grayscale images with pixel values in the range 0 to 255.

The data processing step is also a good place to decide whether we will use a validation set. We could split one off from the test set or from the training set. In this example, I will not use a separate validation set.
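If we did want a separate validation set, a simple hold-out from the training data could look like the sketch below. This cell is a hypothetical illustration and is not used in the rest of this notebook; later we rely on the validation_split option of model.fit instead.

In [ ]:
#Hypothetical sketch only (not run below): hold out the last 10,000 training images for validation
x_val, y_val = x_train[-10000:], y_train[-10000:]
x_train_rest, y_train_rest = x_train[:-10000], y_train[:-10000]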

Let's quickly look at a single image.

In [117]:
#Before Data Processing or Data Exploration, let's see what an image looks like
import matplotlib.pyplot as plt #Used to Display data and results

#To display an example image, here is one digit
digit = x_train[0]
plt.imshow(digit, cmap=plt.cm.binary)
plt.show()
                   

Processing the data

We can see what our data looks like, but we need to put it into a format that Keras can use. First we reshape each image into a single input dimension, i.e. a flattened array of 784 values. Then we convert the uint8 values (0 to 255) to float32 values scaled to the 0 to 1 range, which generally trains better. Below is how we do this.

Next we want to change our labels to categorical format. This turns the labels with values 0-9 into arrays of binary values (one-hot encoding). For example, the label 1 becomes [0,1,0,0,0,0,0,0,0,0].
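As a quick illustration (a separate sketch from the full conversion in the next cell), calling to_categorical on a single label produces exactly that kind of array:

In [ ]:
#Quick illustration only: one-hot encode a single label
from keras.utils import to_categorical
print(to_categorical([1], num_classes=10))  #[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]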

In [118]:
from keras.utils import to_categorical 

#Change the shape to a single input dimension (flatten 28x28 to 784)
#Change values from uint8 (0 to 255) to float32 (0 to 1)
x_train = x_train.reshape(x_train.shape[0],28*28)
x_train = x_train.astype('float32')/255   

x_test = x_test.reshape(x_test.shape[0], 28*28)
x_test = x_test.astype('float32')/255


#Change labels to categorical format
y_test = to_categorical(y_test)
y_train = to_categorical(y_train)

4. Data Exploration

Now that our data is in a format Keras can use, let's look at the data more closely; in particular, how balanced it is. For starters, we will count how many images are in the training and test sets, and then check how balanced the class labels are in our training set.

In [119]:
train_count = (y_train).sum()
test_count = (y_test).sum()
print("Number of Training images " +  str(train_count))
print("Number of Testing images " + str(test_count))
Number of Training images 60000.0
Number of Testing images 10000.0

This is roughly an 85/15 split (closer to 86/14). A more common approach is an 80/20 train/test split, but since the data already comes split this way, we will keep it.

This is an important point, because later on, when the Keras model's fit method is used, there is an option to do a validation split. Since our test data is roughly 15% of the total, we will use the same percentage for the validation split.
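As a quick sanity check on those percentages, we can compute the split fractions directly from the train_count and test_count values of the previous cell (a small sketch added for illustration):

In [ ]:
#Sanity check: compute the train/test split fractions (roughly 0.86 / 0.14)
total = train_count + test_count
print("Train fraction " + str(train_count / total))
print("Test fraction " + str(test_count / total))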

In [120]:
#Let's explore our training data. These are the per-class counts of the y labels for the training and validation data
import numpy as np

plt.clf()
values1 = (y_train).sum(axis=0)

display = [0,1,2,3,4,5,6,7,8,9]
plt.bar(display, values1, color = 'b', align='center', label='Train')

plt.xticks(display,display)
plt.ylim(bottom=5000)
plt.yticks(np.arange(5000, 7001, 200))
plt.ylabel('Frequency')
plt.xlabel('Number')
plt.title('Data Distribution')
plt.legend()
plt.show()
In [121]:
plt.clf()
values1 = (y_test).sum(axis=0)

display = [0,1,2,3,4,5,6,7,8,9]
plt.bar(display, values1, color = 'b', align='center', label='Test')

plt.xticks(display,display)
plt.ylim(bottom=500)
plt.yticks(np.arange(500, 1201, 100))
plt.ylabel('Frequency')
plt.xlabel('Number')
plt.title('Data Distribution')
plt.legend()
plt.show()
    

From the data distributions, there isn't a big difference between the training and test sets. In both cases there are fewer 5s than any other digit, and clearly more 1s, but the imbalance isn't drastic.
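To put numbers on that, one quick check (a small sketch added for illustration) is to print the per-class counts in the training labels and compare the least and most frequent digits:

In [ ]:
#Quick check: per-class counts in the one-hot training labels
train_counts = y_train.sum(axis=0).astype(int)
for digit, count in enumerate(train_counts):
    print("Digit " + str(digit) + ": " + str(count))
print("Least frequent count " + str(train_counts.min()))
print("Most frequent count " + str(train_counts.max()))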

5. Model Data

The next step is to apply a model to our data. For this example we will use Keras. Starting from a basic Sequential model, we experiment with different layers. Without getting into cross validation, we explore a few common and basic optimizations.

In [122]:
#Create the model. Several fully connected (dense) baselines were tried; all but one are commented out.
#This is used as a baseline, while the CNN in Part 2 is the real implementation

# The first step in our process is to import the libraries we will need
from keras import models #Loads models
from keras import layers #Loads Layers

def my_model():

    '''
    #Baseline 1 - First attempt
    #Train - On epoch 6 - loss: 0.1367 - acc: 0.9605 - val_loss: 0.1389 - val_acc: 0.9611
    # Test - 6 epochs - Loss 0.13756338410377503 Accuracy 0.9608
    model = models.Sequential()
    model.add(layers.Dense(32, input_shape=(28*28,), activation='relu'))
    '''
    
    
    #Baseline 2
    # Train - On epoch 6 - loss: 0.1305 - acc: 0.9614 - val_loss: 0.1392 - val_acc: 0.9613
    # Test - Loss 0.13497577327471227 Accuracy 0.9604
    model = models.Sequential()
    model.add(layers.Dense(32, activation = 'relu', input_shape = (28*28,))) 
    model.add(layers.Dense(16, activation = 'relu',)) 
    
        
    '''
    #Baseline 3 - Best results
    # Train - On epoch 3 - loss: 0.0654 - acc: 0.9808 - val_loss: 0.0751 - val_acc: 0.9786
    # Test - 3 epochs - Loss 0.07999640309328679 Accuracy 0.9767
    model = models.Sequential()
    model.add(layers.Dense(512, activation = 'relu', input_shape = (28*28,)))  
    '''
    
    '''
    #Baseline 4
    # Train - on epoch 3 - loss: 0.0839 - acc: 0.9753 - val_loss: 0.0853 - val_acc: 0.9764
    # Test - Loss 0.08825657425741665 Accuracy 0.9729
    model = models.Sequential()
    model.add(layers.Dense(256, activation = 'relu', input_shape = (28*28,))) 
    '''
    
    '''
    #Baseline 5
    # Train - on epoch 3 - loss: 0.0890 - acc: 0.9746 - val_loss: 0.0983 - val_acc: 0.9742
    # Test - 3 epochs - Loss 0.07707974565780605 Accuracy 0.9788
    model = models.Sequential()
    model.add(layers.Dense(512, activation = 'relu', input_shape = (28*28,)))
    model.add(layers.Dense(256, activation = 'relu'))
    '''
    
    '''
    #Baseline 6
    # Train - on epoch 20 - loss: 0.1173 - acc: 0.9784 - val_loss: 0.1158 - val_acc: 0.9812
    # Test - 20 epochs - Loss 0.1360144746005074 Accuracy 0.9808
    model = models.Sequential()
    model.add(layers.Dense(512, activation = 'relu', input_shape = (28*28,)))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(256, activation = 'relu'))
    model.add(layers.Dropout(0.5))
    '''

    #This is the output layer; we need 10 units for the 10 classes, and it returns an array of 10 probabilities
    model.add(layers.Dense(10,activation='softmax'))
    
    #Used to compile the model
    model.compile(optimizer="RMSprop", loss="categorical_crossentropy", metrics=["accuracy"])
   


    return model
In [123]:
#Arbitrarily pick 20 epochs and batch_size 64
epochs = 20
batch_size = 64
model = my_model()
history = model.fit(np.array(x_train), np.array(y_train), epochs=epochs, batch_size=batch_size, validation_split=0.15)
Train on 51000 samples, validate on 9000 samples
Epoch 1/20
51000/51000 [==============================] - 3s 64us/step - loss: 0.4731 - acc: 0.8620 - val_loss: 0.2315 - val_acc: 0.9331
Epoch 2/20
51000/51000 [==============================] - 3s 52us/step - loss: 0.2287 - acc: 0.9330 - val_loss: 0.1772 - val_acc: 0.9499
Epoch 3/20
51000/51000 [==============================] - 3s 61us/step - loss: 0.1814 - acc: 0.9478 - val_loss: 0.1521 - val_acc: 0.9571
Epoch 4/20
51000/51000 [==============================] - 3s 55us/step - loss: 0.1534 - acc: 0.9551 - val_loss: 0.1398 - val_acc: 0.9593
Epoch 5/20
51000/51000 [==============================] - 3s 49us/step - loss: 0.1343 - acc: 0.9610 - val_loss: 0.1261 - val_acc: 0.9644
Epoch 6/20
51000/51000 [==============================] - 2s 48us/step - loss: 0.1206 - acc: 0.9647 - val_loss: 0.1351 - val_acc: 0.9624
Epoch 7/20
51000/51000 [==============================] - 3s 61us/step - loss: 0.1100 - acc: 0.9675 - val_loss: 0.1195 - val_acc: 0.9677
Epoch 8/20
51000/51000 [==============================] - 4s 82us/step - loss: 0.1017 - acc: 0.9707 - val_loss: 0.1208 - val_acc: 0.9681
Epoch 9/20
51000/51000 [==============================] - 3s 54us/step - loss: 0.0942 - acc: 0.9723 - val_loss: 0.1225 - val_acc: 0.9659
Epoch 10/20
51000/51000 [==============================] - 2s 47us/step - loss: 0.0891 - acc: 0.9739 - val_loss: 0.1187 - val_acc: 0.9670
Epoch 11/20
51000/51000 [==============================] - 2s 44us/step - loss: 0.0840 - acc: 0.9754 - val_loss: 0.1160 - val_acc: 0.9689
Epoch 12/20
51000/51000 [==============================] - 2s 42us/step - loss: 0.0785 - acc: 0.9768 - val_loss: 0.1183 - val_acc: 0.9672
Epoch 13/20
51000/51000 [==============================] - 2s 43us/step - loss: 0.0751 - acc: 0.9776 - val_loss: 0.1131 - val_acc: 0.9710
Epoch 14/20
51000/51000 [==============================] - 2s 44us/step - loss: 0.0707 - acc: 0.9795 - val_loss: 0.1177 - val_acc: 0.9704
Epoch 15/20
51000/51000 [==============================] - 2s 46us/step - loss: 0.0672 - acc: 0.9800 - val_loss: 0.1198 - val_acc: 0.9688
Epoch 16/20
51000/51000 [==============================] - 2s 43us/step - loss: 0.0644 - acc: 0.9808 - val_loss: 0.1195 - val_acc: 0.9698
Epoch 17/20
51000/51000 [==============================] - 2s 43us/step - loss: 0.0620 - acc: 0.9818 - val_loss: 0.1157 - val_acc: 0.9706
Epoch 18/20
51000/51000 [==============================] - 2s 42us/step - loss: 0.0594 - acc: 0.9824 - val_loss: 0.1267 - val_acc: 0.9681
Epoch 19/20
51000/51000 [==============================] - 2s 39us/step - loss: 0.0568 - acc: 0.9828 - val_loss: 0.1217 - val_acc: 0.9696
Epoch 20/20
51000/51000 [==============================] - 2s 41us/step - loss: 0.0546 - acc: 0.9841 - val_loss: 0.1402 - val_acc: 0.9649
In [127]:
#Now let's plot the training history to evaluate the results we've seen
history_dict = history.history

loss_values = history_dict['loss']
loss_values = np.array(loss_values)


val_loss_values = history_dict['val_loss']
val_loss_values = np.array(val_loss_values)

epochs = range(1, len(loss_values) + 1)
plt.xticks(epochs)

plt.plot(epochs, loss_values, label='Training loss')
plt.plot(epochs, val_loss_values, label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.clf()
plt.xticks(epochs)
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
plt.plot(epochs, acc_values, label='Training acc')
plt.plot(epochs, val_acc_values, label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
In [126]:
#Fit/Train the final model based on the evaluation above
#Since accuracy looks good after about 6 epochs, we select 6 epochs

model = my_model()
model.fit(x_train, y_train, epochs = 6, batch_size=64)
 
#See how accurate the training model is against a test set
loss, accuracy = model.evaluate(x_test,y_test)
print("Loss " + str(loss))
print("Accuracy " + str(accuracy))
Epoch 1/6
60000/60000 [==============================] - 3s 54us/step - loss: 0.4432 - acc: 0.8762
Epoch 2/6
60000/60000 [==============================] - 3s 43us/step - loss: 0.2130 - acc: 0.9397
Epoch 3/6
60000/60000 [==============================] - 3s 42us/step - loss: 0.1720 - acc: 0.9513
Epoch 4/6
60000/60000 [==============================] - 4s 59us/step - loss: 0.1474 - acc: 0.9580
Epoch 5/6
60000/60000 [==============================] - 4s 67us/step - loss: 0.1308 - acc: 0.9625
Epoch 6/6
60000/60000 [==============================] - 3s 57us/step - loss: 0.1184 - acc: 0.9662
10000/10000 [==============================] - 1s 67us/step
Loss 0.13513277810718866
Accuracy 0.9615

Results and Analysis

Results: Our goal was to get good accuracy, and we did. The initial attempt reached about 96% accuracy, and with some slight tuning we were able to get to about 98%.

Further Exploration:

For a better result we could use cross validation to find the best parameters. We did not use cross validation to tune parameters here, so adding and tuning multiple layers might give us better results.
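As one possible direction, a k-fold cross validation loop around my_model could look roughly like the sketch below. It assumes scikit-learn is available and reuses the processed x_train/y_train arrays from above; this is a hypothetical outline, not something run in this notebook.

In [ ]:
#Hypothetical sketch: 5-fold cross validation over the training data (assumes scikit-learn)
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []
for train_idx, val_idx in kfold.split(x_train):
    m = my_model()
    m.fit(x_train[train_idx], y_train[train_idx], epochs=6, batch_size=64, verbose=0)
    _, acc = m.evaluate(x_train[val_idx], y_train[val_idx], verbose=0)
    cv_scores.append(acc)
print("Mean CV accuracy " + str(np.mean(cv_scores)))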

Areas to explore:

Optimisers: I did not change the optimiser because I know RMSprop does fine for this example. An optimiser like Adam could likely do better.
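For example, swapping in Adam would only change the compile call. A hypothetical variant of the single-layer 512-unit baseline might look like this (not run or tuned here):

In [ ]:
#Hypothetical sketch: same architecture as Baseline 3, compiled with Adam instead of RMSprop
adam_model = models.Sequential()
adam_model.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
adam_model.add(layers.Dense(10, activation='softmax'))
adam_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])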

Loss: We used this to see whether the model is improving or getting worse alongside the accuracy. Categorical crossentropy is the standard choice for a multi-class problem like this one.

Accuracy: Since our goal was to get the best accuracy, we used this as our metric.

Next Goal: Let's explore how a CNN performs on the same data.

Part 2

Now try using a convolutional neural network in Part 2.
