Handwritten digit recognition – MNIST via Amazon Machine Learning APIs

In our last post we explored how to use the Amazon Machine Learning APIs to solve machine learning problems.  We looked at the SDK, we looked at the machine learning client documentation, and we even looked at an example on Github.  But we didn’t write any code.

If it is so easy to consume AWS APIs with code then why don’t we take a shot at writing some code?  Point taken, this post will include code!  We will use Python to write some functions to call Amazon Machine Learning APIs to create a machine learning model and run some batch predictions.  The code I’ve written is heavily influenced by AWS Python sample code – I’ve written my functions and main body but borrowed a few bits like the polling functions and random character generation.

MNIST dataset – handwritten digit recognition

The MNIST dataset is a “hello world” type machine learning problem that engineers typically use to smoke test an algorithm or ML process.  The dataset contains 70,000 handwritten digits from 0-9, each scanned into a 28×28 pixel image.  Each of the 784 (28 x 28) values in an image is the intensity of one pixel, from 0 (white) to 255 (black).  See the Wikipedia entry for a sample visualization of a subset of the images.

We are going to pull the MNIST data from Kaggle to make things easy.  Why pull it from Kaggle?  It saves us some time: the Kaggle version has conveniently split the data into 42,000 images in a training CSV and 28,000 images in a test CSV.  The training CSV has the labels in the first column, which are just the digits represented by the pixels in each row.  The test CSV file has the label removed so it can be used for predictions.  Kaggle has made things a bit easier for us; we don’t have to do any manipulation of the data.
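To make the row layout concrete, here is a minimal sketch that splits one training row into its label and a 28×28 grid.  It assumes the Kaggle column naming (label, pixel0 … pixel783) and uses a made-up, all-white row held in memory rather than the real train.csv; parse_training_row is a name I picked for illustration.

```python
import csv
import io


def parse_training_row(row):
    """Split one training row into (label, 28x28 grid of pixel intensities)."""
    label = int(row[0])                      # first column is the digit
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != 784:
        raise ValueError(f"expected 784 pixels, got {len(pixels)}")
    # Slice the flat list into 28 rows of 28 pixels each.
    grid = [pixels[r * 28:(r + 1) * 28] for r in range(28)]
    return label, grid


# Tiny in-memory stand-in for train.csv: a header plus one blank image labeled 7.
header = ["label"] + [f"pixel{i}" for i in range(784)]
sample_row = ["7"] + ["0"] * 784
sample_csv = io.StringIO(",".join(header) + "\n" + ",".join(sample_row) + "\n")

reader = csv.reader(sample_csv)
next(reader)                                 # skip the header row
label, grid = parse_training_row(next(reader))
print(label, len(grid), len(grid[0]))        # 7 28 28
```

A test row would parse the same way minus the label, which is exactly the difference the two schema files have to describe later on.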

Code on Github – https://github.com/keithxanderson

The code I have written to run MNIST through Amazon Machine learning via APIs is up on Github.  See the README.md for a detailed explanation of how to use the code and what exact steps are required to get things working.  We are going to use Python and specifically the ‘machinelearning’ client of Boto3 in order to pass our calls to AWS.

While the README.md explains things pretty well, let’s summarize here.  The main MNIST.py Python file uses the MNIST training data to create and evaluate an ML model and then uses the test data to create a batch prediction.  Data in the test file contains the pixels but no label; our model predicts which digit the pixels represent after “learning” from the training data with labels (the answers).

There is a Python function written for each of the high-level operations:

create_training_datasource – takes a training file from an S3 bucket, a schema file from an S3 bucket, and a percentage argument and splits the training data into a training dataset and an evaluation dataset.  The number of rows in each dataset is determined by the percentage set.  The schema file defines the schema (obviously).

create_model – takes a training datasource ID and a recipe file to create and train a model.  The function is hard-coded to create a multiclass classification model (multinomial logistic regression algorithm).

create_evaluation – takes our model ID and our evaluation datasource ID and creates an evaluation, which simply scores the performance of our model using the reserved evaluation data.  Results are seen manually in the AWS console but could be added to our code if we wanted.

create_batch_prediction_dataset – takes a test file from an S3 bucket and a schema file (a different schema file than our training schema file) and creates a batch dataset.  We will use this to make predictions; this data does not contain any labels.

batch_prediction – takes an existing model ID and a batch datasource ID and runs predictions of the batch data against our model.  Results are written to a flat file in the S3 location given by the output argument.
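The five functions above map fairly directly onto calls on the Boto3 ‘machinelearning’ client.  What follows is a rough sketch of that mapping, not the code from the repo: the entity names, the 70% split, and the S3 paths are placeholders I made up, and DryRunClient is a hypothetical stand-in that only records calls so the sequence can be checked without touching AWS.

```python
import json
import random
import string


def random_id(prefix, length=8):
    """Build an ID such as 'ds-K3J9Q2LM'; Amazon ML expects the caller to supply IDs."""
    chars = string.ascii_uppercase + string.digits
    return prefix + "-" + "".join(random.choice(chars) for _ in range(length))


def create_training_datasource(ml, data_s3, schema_s3, percent):
    """Create a training datasource (first `percent`%) and an evaluation one (the rest)."""
    ids = []
    for name, begin, end in (("training", 0, percent), ("evaluation", percent, 100)):
        split = {"splitting": {"percentBegin": begin, "percentEnd": end}}
        ds_id = random_id("ds")
        ml.create_data_source_from_s3(
            DataSourceId=ds_id,
            DataSourceName="MNIST " + name,
            DataSpec={
                "DataLocationS3": data_s3,
                "DataSchemaLocationS3": schema_s3,
                "DataRearrangement": json.dumps(split),
            },
            ComputeStatistics=True,  # statistics are required for training data
        )
        ids.append(ds_id)
    return ids[0], ids[1]


def create_model(ml, training_ds_id, recipe_s3):
    """Train a multiclass classification model on the training datasource."""
    model_id = random_id("ml")
    ml.create_ml_model(MLModelId=model_id, MLModelName="MNIST multiclass",
                       MLModelType="MULTICLASS",
                       TrainingDataSourceId=training_ds_id, RecipeUri=recipe_s3)
    return model_id


def create_evaluation(ml, model_id, evaluation_ds_id):
    """Score the model against the reserved evaluation datasource."""
    evaluation_id = random_id("ev")
    ml.create_evaluation(EvaluationId=evaluation_id, EvaluationName="MNIST evaluation",
                         MLModelId=model_id, EvaluationDataSourceId=evaluation_ds_id)
    return evaluation_id


def create_batch_prediction_dataset(ml, data_s3, schema_s3):
    """Create the unlabeled datasource used for batch predictions."""
    ds_id = random_id("ds")
    ml.create_data_source_from_s3(
        DataSourceId=ds_id,
        DataSourceName="MNIST batch",
        DataSpec={"DataLocationS3": data_s3, "DataSchemaLocationS3": schema_s3},
        ComputeStatistics=False,
    )
    return ds_id


def batch_prediction(ml, model_id, batch_ds_id, output_s3):
    """Run the batch datasource through the model; results land under output_s3."""
    prediction_id = random_id("bp")
    ml.create_batch_prediction(BatchPredictionId=prediction_id,
                               BatchPredictionName="MNIST batch prediction",
                               MLModelId=model_id,
                               BatchPredictionDataSourceId=batch_ds_id,
                               OutputUri=output_s3)
    return prediction_id


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
            return {}
        return record


ml = DryRunClient()
bucket = "s3://my-mnist-bucket"  # placeholder bucket name
train_id, eval_id = create_training_datasource(
    ml, bucket + "/train.csv", bucket + "/train.csv.schema", 70)
model_id = create_model(ml, train_id, bucket + "/recipe.json")
evaluation_id = create_evaluation(ml, model_id, eval_id)
batch_id = create_batch_prediction_dataset(
    ml, bucket + "/test.csv", bucket + "/test.csv.schema")
prediction_id = batch_prediction(ml, model_id, batch_id, bucket + "/output/")
print([name for name, _ in ml.calls])
```

To run this against AWS, swap DryRunClient() for boto3.client('machinelearning') and point the paths at real files.  Passing the client in as an argument is just a convenience that keeps the functions easy to exercise offline.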

Schema files and recipe

Our datasources all need a schema and we can define these with a schema file.  The schema for the training and evaluation data are the same and can use the same file.  The batch datasource does not contain labels so needs a separate schema file.

The training and evaluation schema file for MNIST can be found in the Github project with the Python code.  We simply define every column and declare each of the columns as “CATEGORICAL” since this is a multiclass classification model.  Each handwritten number is a category (0-9) and each pixel is a category (0-255).  The format of the schema file is defined by AWS; if you create a datasource manually in the console you can copy the AWS-generated schema into your schema file.

The batch schema file is also found in the same project.  The only difference between the batch schema file and the training/evaluation schema file is that the batch schema does not contain the labels (numbers).
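With 785 columns, nobody wants to type these files by hand.  Here is a small sketch that generates both schemas, assuming the Kaggle column names (label, pixel0 … pixel783) and the standard Amazon ML schema keys; build_schema is a helper name I made up, and the files in the repo may be laid out differently.

```python
import json


def build_schema(include_label):
    """Return an Amazon ML style schema dict with every column CATEGORICAL."""
    attributes = []
    if include_label:
        attributes.append({"attributeName": "label", "attributeType": "CATEGORICAL"})
    attributes += [
        {"attributeName": f"pixel{i}", "attributeType": "CATEGORICAL"}
        for i in range(784)
    ]
    schema = {
        "version": "1.0",
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": attributes,
    }
    if include_label:
        # Only the training/evaluation schema names a target column.
        schema["targetAttributeName"] = "label"
    return schema


training_schema = build_schema(include_label=True)
batch_schema = build_schema(include_label=False)
print(len(training_schema["attributes"]), len(batch_schema["attributes"]))  # 785 784

# json.dumps(training_schema, indent=2) gives the text to upload to S3.
training_schema_text = json.dumps(training_schema, indent=2)
```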

Lastly, our recipe file is found on Github with the Python code and schema files.  The recipe simply instructs AWS how to treat the input and output data.  Our example is very simple since all features (columns) are categorical and we just define all of our outputs as “ALL_CATEGORICAL”.  This saves us from manually declaring each pixel feature as a category.
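For reference, a recipe of this kind is only a few lines of JSON.  The sketch below shows what I would expect a minimal one to look like, using the standard groups / assignments / outputs sections of the Amazon ML recipe format; treat it as an illustration rather than a copy of the file in the repo.

```python
import json

# No custom groups or intermediate assignments; pass every categorical
# column straight through to the learning algorithm.
recipe = {
    "groups": {},
    "assignments": {},
    "outputs": ["ALL_CATEGORICAL"],
}

recipe_text = json.dumps(recipe, indent=2)
print(recipe_text)
```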

Running the code

See the README.md for detailed instructions on running the code.  Provided we have our CSV files, schema files, and recipe file in an S3 bucket, we would just need to modify the MNIST.py file with the S3 locations for each file in the main body of the code along with an output S3 bucket for the prediction results (all can share the same S3 bucket).

After executing the MNIST.py file we will see the training, evaluation, and batch dataset creation execute in parallel and the model and evaluation process go pending.  Once we have our datasets created we will see the model training start and once we see the model training complete we will see the evaluation start.

A polling function is called within the batch prediction function to wait on the evaluation process to complete prior to running a batch prediction.  This is optional; there is really no dependency, since the batch prediction can run in parallel with the evaluation scoring of our model.
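The polling idea is simple enough to sketch.  This is not the borrowed AWS sample function, just a minimal version under my own assumptions: wait_until_done is a made-up name, the delay and retry limits are arbitrary, and the status lookup is passed in as a callable so the loop can be tried without AWS.

```python
import time


def wait_until_done(get_status, delay_seconds=30, max_checks=120, sleep=time.sleep):
    """Call get_status() until Amazon ML reports a terminal status."""
    for _ in range(max_checks):
        status = get_status()
        if status == "COMPLETED":
            return status
        if status in ("FAILED", "DELETED"):
            raise RuntimeError(f"entity ended in status {status}")
        sleep(delay_seconds)  # still PENDING or INPROGRESS, so try again later
    raise TimeoutError("gave up waiting for the entity to complete")


# Simulated status sequence so the loop can be exercised offline.
statuses = iter(["PENDING", "INPROGRESS", "COMPLETED"])
result = wait_until_done(lambda: next(statuses), delay_seconds=0)
print(result)  # COMPLETED
```

Against AWS, get_status could be something like lambda: ml.get_evaluation(EvaluationId=evaluation_id)["Status"], where ml is the Boto3 client.  Boto3 also ships waiters for this service (for example ml.get_waiter("evaluation_available")), which may be the simpler option if you don’t need custom logging.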

Model evaluation performance

If all went well we will have an evaluation score in the AWS console.  Let’s take a look!  Looking at our ML model evaluation performance in the console we see an F1 score of .914 compared against a baseline score of .020 which is very good.
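If we did want the score in code instead of the console, the get_evaluation call returns it.  A minimal sketch, assuming the multiclass metric is reported under the MulticlassAvgFScore property; fetch_f1 and CannedClient are hypothetical names, and the canned response simply echoes the score we saw in the console.

```python
def fetch_f1(ml, evaluation_id):
    """Return the average F1 score of a finished evaluation, or None if not ready."""
    response = ml.get_evaluation(EvaluationId=evaluation_id)
    if response.get("Status") != "COMPLETED":
        return None
    properties = response["PerformanceMetrics"]["Properties"]
    # Metric values come back as strings, so convert before using them.
    return float(properties["MulticlassAvgFScore"])


class CannedClient:
    """Hypothetical stand-in returning a fixed response instead of calling AWS."""

    def get_evaluation(self, EvaluationId):
        return {
            "Status": "COMPLETED",
            "PerformanceMetrics": {"Properties": {"MulticlassAvgFScore": "0.914"}},
        }


score = fetch_f1(CannedClient(), "ev-example")
print(score)  # 0.914
```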

ML Model Performance

Exploring our model performance against our evaluation data we see that the model is very good at predicting the handwritten digits.  The matrix below shows dark blue wherever a digit was correctly classified.  This is not terribly surprising, since most algorithms perform pretty well with the MNIST dataset, but give credit to AWS for tuning the multiclass classification algorithm to this level of performance.

Explore Model Performance

Batch prediction results

We should also have an output file in our batch prediction output S3 bucket if everything ran to completion.  I’ve included a sample output CSV file within my MNIST project to view.  See the AWS documentation for more information on how to interpret the output.  Each column corresponds to one of the prediction categories, which are the digits 0-9.  Each row holds the predicted probabilities that one test image belongs to each of those categories.

For example, in our sample CSV file the first prediction shows about a 99% probability that the digit is a 2, and our second prediction shows a 99% probability that the next digit is a 0.
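Turning those probabilities into a single predicted digit per row is a matter of picking the largest value.  Here is a short sketch; the two rows are numbers I made up to mirror the example above, not the contents of the sample file, and most_likely_digits is a name chosen for illustration.

```python
import csv
import io


def most_likely_digits(csv_text):
    """Return (digit, probability) for the highest-scoring class in each row."""
    reader = csv.reader(io.StringIO(csv_text))
    classes = next(reader)  # header row holds the class names
    results = []
    for row in reader:
        scores = [float(value) for value in row]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((classes[best], scores[best]))
    return results


# Illustrative output: one header row, then one row of probabilities per image.
sample_output = (
    "0,1,2,3,4,5,6,7,8,9\n"
    "0.001,0.001,0.99,0.001,0.001,0.002,0.001,0.001,0.001,0.001\n"
    "0.99,0.001,0.001,0.001,0.001,0.001,0.002,0.001,0.001,0.001\n"
)

predictions = most_likely_digits(sample_output)
print(predictions)  # [('2', 0.99), ('0', 0.99)]
```

The file AWS writes to S3 is normally gzip-compressed, so a real run would need to decompress it before handing the text to a function like this.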

Why use Amazon Machine Learning?

We’ve discussed the value of using Amazon Machine Learning before but it is worth repeating.  It is very easy to incorporate basic machine learning algorithms into your application by offloading this work to AWS.  Avoid spending time and money building out infrastructure or expertise for an in-house machine learning engine.  Just follow the basic workflow outlined above and make a few API calls to Amazon Machine Learning.

Thanks for reading!