How to use the Amazon Machine Learning API

Machine learning, AI, deep learning – all of these technologies are popular and the hype cycle is intense.  When one digs into the topic of machine learning, it turns out to be a lot of math, data wrangling, and writing code.  Which means one can quickly become overwhelmed by the details and lose sight of the practicality of machine learning.

Take a step back from all the complexity and ask a fundamental question.  Why should we even look at machine learning?   Simple, we want our applications to get smarter and make better decisions.   Ignore the complexities of machine learning and think about the end result instead.  Machine learning is a powerful tool that can be used to our advantage when building applications.

Now consider how we would implement machine learning in our applications.  We could write lots of code within our app to pull historical data, use this data to create and train models, evaluate and tweak our models, and finally make predictions.  Or we could offload this work to public cloud services like Amazon Machine Learning.  All it takes is some code within our application to make calls to the Amazon Machine Learning APIs.

Machine learning via APIs?

I’ve been an infrastructure guy for most of my career.  While I have worked for software companies and have worked closely with software developers – I’m not a developer and have not made my living writing code.

That being said, machine learning requires learning to write code.  One wouldn’t go out and buy machine learning from a vendor but would rather write code to build and use machine learning models.  Furthermore, machine learning models don’t provide much value by themselves but are valuable when built into an application.  So coding is important for using machine learning.

Think along the same lines regarding consumption of public cloud.  Cloud isn’t just about cost, it’s about applications consuming infrastructure  and services with code.  Anything that can be created, configured, or consumed in AWS can (and should?) be done so with code.   So coding is also important for consuming public cloud.

Fortunately, writing code to use Amazon Machine Learning isn’t very hard at all.  Amazon publishes a machine learning developer guide and API reference documentation that show how to write code to add machine learning to your application.  That, combined with some machine learning examples on GitHub, makes the process easy to follow.

Overview of an application using Amazon Machine Learning

What is the high-level architecture of an application using Amazon Machine Learning?  Let’s use an example as we walk through the architecture.  Think of a hypothetical customer loyalty application that sends rebates to customers we think may stop using our theoretical company’s services.  We could incorporate Amazon machine learning into the app to flag at-risk customers for rebates.

Raw data – first we need data.  This could be in a database, data warehouse, or even flat files in a filesystem.  In our example, these are the historical records of all customers that have used our service and have either remained loyal to our company or stopped using our services (the historical data is labeled with the outcomes).

Data processed and uploaded to S3 – Our historical data needs to be extracted from our original source (above), cleaned up, converted into a CSV file, and then uploaded to an AWS S3 bucket.

Training datasource – our application needs to invoke Amazon Machine Learning via an API and create an ML datasource out of our CSV file.

Model – our application needs to create and train a model via an AWS API using the datasource we created above.

Model evaluation – our app will need to measure how well our model performs with some of the data reserved from the datasource.  Our evaluation scoring gives us an idea of how well our model makes predictions on new data.

Prediction Datasource –  if we want to make predictions with machine learning, we need new customer data that matches our training data schema but does not contain the customer outcome.    So we need to collect this new data into a separate CSV file and upload it to S3.

Batch predictions – now for the fun part: we can call our model via AWS APIs to predict the target value of the new data we uploaded to S3.  The output will be written to a flat file in an S3 bucket we specify, which our app can read to take appropriate action (do nothing or send a rebate).
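To make the two CSV files in this flow concrete, here is a minimal sketch of the data preparation step.  The column names (customer_id, months_active, support_tickets, churned) are made up for our hypothetical loyalty app; the only point is that the training file carries the outcome column and the prediction file does not.

```python
import csv
import io

# Hypothetical columns for the customer loyalty example.
FEATURES = ["customer_id", "months_active", "support_tickets"]
LABEL = "churned"  # 1 = stopped using the service, 0 = stayed


def to_csv(records, columns):
    """Render a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def training_csv(records):
    """Historical rows: the features plus the known outcome."""
    return to_csv(records, FEATURES + [LABEL])


def prediction_csv(records):
    """New rows: the same features with the outcome column left out."""
    return to_csv(records, FEATURES)
```

Either string could then be written to a local file and pushed to S3, for example with the Boto3 S3 client’s upload_file method.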

How to get started?

First pick a programming language.  Python is popular both for machine learning and for interacting with AWS so we’ll assume Python is our language.  Next read through the AWS Python SDK Quickstart guide to get a feel for how to use AWS with Python.  Python uses the SDK Boto3 which needs to be installed and also needs to be configured to use your AWS account credentials.

Lastly, read through the machine learning section of the Boto3 SDK guide to get a feel of the logical flow of how to consume Amazon Machine Learning via APIs.  Within this guide we would need to call the Python methods below using the recommended syntax given.

create_data_source_from_s3 – how we create a datasource from a CSV file

create_ml_model – how we create an ML model from a datasource

create_evaluation – how we create an evaluation score for the performance of our model

create_batch_prediction –  how we predict target values for new datasources
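Here is a rough sketch of how those four methods might be strung together.  It is not the code from the AWS examples: the function names, the ID naming scheme, and the 70/30 split are our own assumptions, and the client is passed in (for example boto3.client("machinelearning")) so the flow is easy to read and to test.

```python
import json


def split_spec(data_url, schema_url, begin, end):
    """DataSpec for the rows between two percentages of the CSV file."""
    return {
        "DataLocationS3": data_url,
        "DataSchemaLocationS3": schema_url,
        "DataRearrangement": json.dumps(
            {"splitting": {"percentBegin": begin, "percentEnd": end}}
        ),
    }


def build_and_evaluate(ml, name, data_url, schema_url, recipe_json):
    """Create training/evaluation datasources, a model, and an evaluation.

    `ml` is an Amazon Machine Learning client, for example
    boto3.client("machinelearning"). All IDs are hypothetical.
    """
    train_id, test_id = f"ds-{name}-train", f"ds-{name}-test"
    model_id = f"ml-{name}"
    ml.create_data_source_from_s3(
        DataSourceId=train_id,
        DataSpec=split_spec(data_url, schema_url, 0, 70),
        ComputeStatistics=True,  # required for datasources used in training
    )
    ml.create_data_source_from_s3(
        DataSourceId=test_id,
        DataSpec=split_spec(data_url, schema_url, 70, 100),
        ComputeStatistics=True,
    )
    ml.create_ml_model(
        MLModelId=model_id,
        MLModelType="BINARY",  # customer stays or leaves
        TrainingDataSourceId=train_id,
        Recipe=recipe_json,
    )
    ml.create_evaluation(
        EvaluationId=f"ev-{name}",
        MLModelId=model_id,
        EvaluationDataSourceId=test_id,
    )
    return model_id


def batch_predict(ml, name, model_id, data_url, schema_url, output_url):
    """Create a datasource for unlabeled rows and score it with the model."""
    ds_id = f"ds-{name}-unscored"
    ml.create_data_source_from_s3(
        DataSourceId=ds_id,
        DataSpec={"DataLocationS3": data_url, "DataSchemaLocationS3": schema_url},
    )
    ml.create_batch_prediction(
        BatchPredictionId=f"bp-{name}",
        MLModelId=model_id,
        BatchPredictionDataSourceId=ds_id,
        OutputUri=output_url,
    )
```

These calls only start the work; the resources are built asynchronously on the AWS side, so a real app would likely check that the model has finished before requesting batch predictions.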

Getting started with an example

At this point we have a good idea of what we need to do to incorporate Amazon machine learning into a Python app.  Those already proficient in Python could start writing new functions to call each of these Boto3 ML methods following the architecture we described above.

What about an example for those of us who aren’t as proficient in Python?  Fortunately there are plenty of examples online of how to write apps to call AWS APIs.  Even better, AWS released some in-depth ML examples on GitHub that walk through examples in several languages including Python.

Specifically let’s look at the targeted marketing example for Python.  First check out the instructions that come with the example.  We see that we have to have our AWS credentials set up locally where we will run the code so that we can authenticate to our AWS account.  We’ll also need Python installed locally (as well as Boto3) so that we can run Python executables (.py files).

Next take a look at the Python code to build an ML model.  A few things jump out in the program:

—  The code imports a few libraries to use in the code (boto3, base64, json, os, sys).

—  The code then defines the S3 bucket and CSV file that we will use to train our model (“TRAINING_DATA_S3_URL”).

—  Functions are written for each subtask that needs to be performed for our machine learning application.  We have functions to create a datasource (def create_data_sources), build a model (def build_model), train a model (def create_model), and perform an evaluation of the model (def create_evaluation).

—  The main body of the code (line 123 – ‘if __name__‘) ties everything together using the CSV file, a user defined schema file, and a user defined recipe file to call each function and results in an ML model with a performance measurement.

Schema and recipe

Along with the Python code we find a .schema file and recipe.json file.  We explored these in previous posts when using Amazon Machine Learning through the AWS console wizard.  When using machine learning with code through the AWS API, the schema and recipe must be created and formatted manually and passed as arguments when calling the Boto3 ML functions.

Data pulled out of a database probably already has a defined schema – these are just the rows and columns that describe the data fields and define the types of data inside them.  The ML schema needs to be defined in the correct format with our attributes and our target value along with some other information.
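As a rough illustration of what such a schema looks like, the sketch below builds one as a Python dictionary and serializes it to JSON.  The field names (version, dataFormat, dataFileContainsHeader, targetAttributeName, attributes) follow the Amazon ML schema format as we understand it; the columns themselves are hypothetical, so check the result against the developer guide before relying on it.

```python
import json


def make_schema(attributes, target=None, has_header=True):
    """Build an Amazon ML style schema document as a dict.

    `attributes` maps column name -> type, where the type is one of
    BINARY, CATEGORICAL, NUMERIC, or TEXT.
    """
    schema = {
        "version": "1.0",
        "dataFormat": "CSV",
        "dataFileContainsHeader": has_header,
        "attributes": [
            {"attributeName": name, "attributeType": kind}
            for name, kind in attributes.items()
        ],
    }
    if target is not None:
        schema["targetAttributeName"] = target
    return schema


# Hypothetical columns for the customer loyalty example.
columns = {
    "customer_id": "CATEGORICAL",
    "months_active": "NUMERIC",
    "support_tickets": "NUMERIC",
    "churned": "BINARY",
}

# Training data includes the target; unlabeled data leaves it out.
training_schema = make_schema(columns, target="churned")
features_only = {k: v for k, v in columns.items() if k != "churned"}
prediction_schema = make_schema(features_only)

schema_json = json.dumps(training_schema, indent=2)
```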

A recipe is a bit more complicated; the format for creating a recipe file gives the details.  The recipe simply defines how our model treats the input and output data defined in our schema.  Our example recipe file only defines outputs using some predefined groups (ALL_TEXT, ALL_NUMERIC, ALL_CATEGORICAL, ALL_BINARY) and does a few transformations on the data (quantile bin the numbers and convert text to lower case).
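A recipe along those lines might look roughly like the sketch below.  This is not a copy of the example’s recipe file; the bin count of 50 is an assumption picked for illustration, and the layout (groups, assignments, outputs) is the recipe format as we understand it from the developer guide.

```python
import json

# A small recipe that only defines outputs: binary and categorical
# attributes pass through unchanged, numeric attributes are quantile
# binned, and text attributes are lowercased.
recipe = {
    "groups": {},
    "assignments": {},
    "outputs": [
        "ALL_BINARY",
        "ALL_CATEGORICAL",
        "quantile_bin(ALL_NUMERIC,50)",  # 50 bins is an assumed value
        "lowercase(ALL_TEXT)",
    ],
}

# The recipe is handed to the API as a JSON string.
recipe_json = json.dumps(recipe)
```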

Making predictions in our application 

Assuming everything worked when we ran the model-building executable, we would have a model available to make predictions on unlabeled data.  The prediction executable does just that.  We just need to get this new unlabeled data in a CSV file and into an S3 bucket (“UNSCORED_DATA_S3_URL”) and allow the code to create a new datasource to use for predictions.  The main section of this code ties everything together and calls the functions with the input/output parameters required.

Think about it, we have all the code syntax, formatting, and steps necessary to add predictive ability to our application by simply calling AWS APIs.  No complex rules in our application, no need to write an entire machine learning framework into our application, just offload to Amazon Machine Learning via APIs.
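Once the predictions land in S3, our loyalty app still has to act on them.  Here is a small sketch of that last step.  It assumes the output file has already been downloaded and decompressed, that it has a header row with a score column (as we recall the binary classification output having), and that its rows come back in the same order as the rows we submitted.  The function name and the 0.5 threshold are our own.

```python
import csv
import io


def customers_to_rebate(customer_ids, predictions_csv, threshold=0.5):
    """Return the IDs whose predicted score meets the threshold.

    customer_ids: IDs in the same order as the rows we uploaded.
    predictions_csv: text of the prediction file (assumed to have a
    header row that includes a 'score' column).
    """
    rows = csv.DictReader(io.StringIO(predictions_csv))
    return [
        customer_id
        for customer_id, row in zip(customer_ids, rows)
        if float(row["score"]) >= threshold
    ]
```

Raising the threshold sends fewer rebates to customers the model is more sure about; lowering it casts a wider net.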

Thanks for reading!


Amazon Machine Learning for a multiclass classification dataset

We’ve taken a tour of Amazon Machine Learning over the last three posts.  Quickly recapping, Amazon supports three types of ML models with their machine learning as a service (MLaaS) engine – regression, binary classification, and multiclass classification.  Public cloud economics and automation make MLaaS an attractive option to prospective users looking to outsource their machine learning using public cloud API endpoints.

To demonstrate Amazon’s service we’ve taken the Kaggle red wine quality dataset and adjusted the dataset to demonstrate each of the AWS MLaaS model types.  Regression worked fairly well with very little effort.  Binary classification worked better with a bit of effort.  Now to finish the tour of Amazon Machine Learning we will look at altering the wine dataset once more to turn our wine quality dataset prediction engine into a multiclass classification problem.

Multiclass Classification

What is a machine learning multiclass classification problem?  It’s one that tries to predict if an instance or set of data belongs to one of three or more categories.  A dataset is first labeled with the known categories and used to train an ML model.  That model is then fed new unlabeled data and attempts to predict what categories the new data belongs to.

An example of multiclass classification is an image recognition system.  Say we wanted to digitally scan handwritten zip codes on envelopes and have our image recognition system predict the numbers.  Our multiclass classification model would first need to be trained to detect ten digits (0-9) using existing labeled data.  Then our model could use newly scanned handwritten zip codes (unlabeled) and predict (with an accuracy value) what digits were written on the envelope.

Let’s explore how Amazon Machine Learning performs with a multiclass classification dataset.

Kaggle Red Wine Quality Dataset

We ran the Kaggle Red Wine Quality dataset untouched through the Amazon machine learning regression algorithm.  The algorithm interpreted the scores as floating point numbers rather than integer categories, which isn’t necessarily what we were after.  We then doctored up the ratings into binary categories of “great” and “not-so-great” and ran the dataset through the binary classification algorithm.  The binary categories were closer to what we wanted: treating the ratings as categories instead of decimal numbers.

This time we want to change things again.  We are going to add a third category to our rating system and change the quality score again.  Rather than try to predict the wide swing of a 0-10 rating system, we will label the quality ratings as either good, bad, or average – three classes.  And Amazon ML will create a multiclass classification model to predict one of the three ratings on new data.

Data preparation – converting to three classes

We will go ahead and download the “winequality-red.csv” file again from Kaggle.  We need to replace the existing 0-10 quality scores with categories of bad, average, and good.  We’ll use numeric categories to represent quality – bad will be 1, average will be 2, and good will be 3.

Looking at the dataset histogram below, most wines in this dataset are average – rated a 5 or 6.  Good wines are rated > 6 and the bad wines are < 5.  So for the sake of this exercise, we’ll say wines rated 7 and higher are good (3), wines rated 5 or 6 are average (2), and those rated less than 5 are bad (1).  Even though the rating system is 0-10, we don’t have any wines rated less than 3 or greater than 8.

Wine quality Histogram

We could use spreadsheet formulas or write a simple Python ‘for’ loop to update our CSV “quality” column.  Replace the ratings from 3-4 with a 1 (bad), replace our 5-6 ratings with a 2 (average), and replace our 7-8 ratings with a 3 (good).  Then we’ll write out the changes to a new CSV file which we’ll use for our new ML model datasource.
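The ‘for’ loop version could look something like this sketch.  It works on CSV text in memory so it is easy to try out; it assumes a comma-separated file with a header row and a “quality” column, and the function names are our own.

```python
import csv
import io


def quality_class(score):
    """Map a 0-10 quality score to 1 (bad), 2 (average), or 3 (good)."""
    if score >= 7:
        return 3
    if score >= 5:
        return 2
    return 1


def relabel(csv_text, column="quality"):
    """Return a copy of the CSV text with the quality column replaced."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = quality_class(int(row[column]))
        writer.writerow(row)
    return out.getvalue()
```

To convert the real file we would read “winequality-red.csv” into a string, pass it through relabel, and write the result to a new file for upload.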

Wine quality Histogram AFTER PROCESSING

Running the dataset through Amazon Machine Learning

Our wine dataset now has a multiclass style rating – 1 for bad wine, 2 for average wine, and 3 for good wine.  Upload this new CSV file to an AWS S3 bucket that we will use for machine learning.

The process to create a datasource, model, and evaluation of the model is the same for multiclass classification as the one we documented in the blog post about linear regression.  Create a new datasource and ML model from the main dashboard and point the datasource to the S3 bucket with our multiclass CSV file.  The wizard will verify the schema and confirm that our file is clean.

We need to identify our quality rating as a category rather than numeric for Amazon to recognize this as a multiclass classification problem.  When setting up the schema, edit the ‘quality’ column and set the data type to ‘categorical’ rather than the default ‘numerical’.  We will again select “yes” to show our CSV file contains column names.


After finishing the schema, select the target prediction value which is the ‘quality’ column.  Continue through the rest of the wizard and accept all the default values.  Return to the machine learning dashboard and wait for the model to complete and get evaluated.

How did we do?

Let’s first look at our source dataset prior to the 70/30 split just to get an idea of our data distribution.  82% of the wines are rated a 2 (average), 14% are rated a 3 (good), and 4% are rated a 1 (bad).  By default, the ML wizard is going to randomly split this data 70/30 to use for training and performance evaluation.

Datasource Histogram prior to Train/test split

If all went well, we should have a completed model and evaluation of that model with no failures in our dashboard.  Let’s take a look at our evaluation summary and see how we did.


Looking at our evaluation summary, things look pretty good.  We have a summary saying our ML model’s quality score was measured with an average F1 score of 0.421.  That summary is somewhat vague so let’s click “explore model performance”.

Model Performance- Confusion matrix

Multiclass prediction accuracy can be measured by combining the results for the individual classes.  How well did we predict good wines (3), average wines (2), and bad wines (1)?  For each class, precision and recall are combined into an F1 score (their harmonic mean), and Amazon ML averages the per-class F1 scores into a single number, the macro-average F1 score.  A higher F1 score is better than a lower score.
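For anyone curious how a single F1 number falls out of a confusion matrix, here is a small sketch that computes a per-class F1 and then takes the plain average (the macro average).  The 3x3 matrix in the example is invented for illustration; it is not our wine evaluation, since we only have part of that matrix in the text.

```python
def macro_f1(confusion):
    """Unweighted average of the per-class F1 scores.

    confusion[i][j] is the number of items whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed items of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # wrongly labeled as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Made-up counts: rows are true classes, columns are predicted classes.
example = [
    [2, 1, 0],
    [1, 8, 1],
    [0, 2, 5],
]
score = macro_f1(example)  # about 0.73 for this made-up matrix
```

Because every class counts equally in the average, a model that ignores the rare classes gets punished even if it nails the common one.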

A visualization of this accuracy is displayed above in a confusion matrix.  The matrix makes it easy to see where the model is successful and where it is not so successful.  The darker the blue, the more accurate the correct prediction, the darker the orange/red, the more inaccurate the prediction.

Looking at our accuracy above, it seems we are good at predicting average wine (2) – our accuracy is almost 90% and we predicted 361/404 wines correctly as average (2).  However, the model was not so good at predicting good (3) and bad (1) wines.  We only correctly predicted 13/47 (28%) as good and only correctly predicted 2/26 (8%) as bad (1).  Our model is good at predicting the easy (2) category but not so good at predicting the more difficult and less common (1) and (3) categories.
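A quick arithmetic check of those numbers, using only the counts quoted above and reading each denominator as the number of wines that truly belong to that class (that reading is our assumption).  It also works out the accuracy we would get by always answering average (2).

```python
# Counts quoted in the text: (correct predictions, wines in the class).
results = {
    "bad (1)": (2, 26),
    "average (2)": (361, 404),
    "good (3)": (13, 47),
}

# Share of each class that the model labeled correctly.
per_class = {label: correct / total for label, (correct, total) in results.items()}

total_wines = sum(total for _, total in results.values())
model_accuracy = sum(correct for correct, _ in results.values()) / total_wines

# A "model" that always predicts average is right for every average wine.
baseline_accuracy = results["average (2)"][1] / total_wines

for label, rate in per_class.items():
    print(f"{label}: {rate:.1%}")
print(f"model accuracy: {model_accuracy:.1%}")
print(f"always-average accuracy: {baseline_accuracy:.1%}")
```

Run as written, this shows the model getting about 78.8% of the 477 evaluation wines right, while always answering average would get about 84.7%.  Raw accuracy flatters the majority class, which is part of why the per-class view matters.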

Can we do better?

Our model is disappointing.  We could have done almost as well if we just predicted that every wine was average (2).  We want to train a model that is better at predicting good (3) wines and bad (1) wines.  But how?  We could use a few ML techniques to get a better model:

Collect more data – More data would help train our model better, especially having more samples of good (3) and bad (1) wines.  In our case this isn’t possible since we are working with a static public dataset.

Remove some features – Our algorithm may have an easier time evaluating fewer features.  We have 11 features (pH, sulphates, alcohol, etc.) and we really aren’t sure if all 11 features have a direct impact on wine quality.  Removing some features could make for a better model but would require some trial and error.  We’ll skip this for now.

Use a more powerful algorithm – Amazon uses multinomial logistic regression (multinomial logistic loss + SGD) for multiclass classification problems.  This may not be the best algorithm for our wine dataset; there may be a more powerful algorithm that works better.  However, this isn’t an option when using Amazon Machine Learning – we’d have to look at a more powerful tool like Amazon SageMaker if we wanted to experiment with different algorithms.

Data wrangling –  Feeding raw data to the Amazon Machine Learning service untouched is easy but if we aren’t getting good results we will need to perform some pre-processing to get a better model.  Some features have wide ranges and some do not – for example the citric acid values range from 0-1 while the total sulfur dioxide values range from 6-289.  So some feature scaling might be a good idea.

Also, the default random 70/30 training and testing data split may not be the greatest way to train and test our model.  We may want to use a more powerful method to split the folds of the dataset ourselves rather than to let Amazon randomly split it.  Running a Stratified ShuffleSplit might be helpful prior to uploading it to Amazon.

Lastly, Amazon Machine Learning uses a technique called quantile binning for numeric values.  Instead of treating a range of numbers as discrete values, Amazon puts the range of values in “bins” and converts them into categories.  This may work well for non-linear features but may not work great for features with a direct linear correlation to our quality ratings.  Amazon recommends some experimentation with their data transformation recipes to tweak model performance.  The default recipes may not be the best for all problems.
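To give a feel for two of the data wrangling ideas above, here is a bare-bones sketch of min-max feature scaling and a simple rank-based quantile binning.  Both are our own simplified versions written for illustration; in particular the binning is not Amazon’s implementation, and the sample values are invented numbers inside the 6-289 range mentioned for total sulfur dioxide.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest is 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quantile_bin(values, bins):
    """Give each value a bin index from 0 to bins-1.

    Values are ranked and the ranks are cut into equal-sized groups,
    so every bin holds roughly the same number of values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


# Invented total sulfur dioxide readings.
sulfur = [289, 6, 60, 34]
scaled = min_max_scale(sulfur)    # 289 maps to 1.0 and 6 maps to 0.0
binned = quantile_bin(sulfur, 2)  # lower half gets bin 0, upper half bin 1
```

Scaling keeps the ordering and relative spacing of the numbers, while binning throws the spacing away and keeps only which group a value falls into.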

Final Thoughts

Machine learning is hard.  And while Amazon’s MLaaS is a powerful tool – it isn’t perfect and it doesn’t do everything.  Data needs to be clean going in and most likely needs some manipulation using various ML techniques if we want to get decent predictions from Amazon algorithms.

Just for fun I did some data wrangling on the red wine dataset to see if I could get some better prediction results.  I split the dataset myself using a stratified shuffle split and then ran it through the machine learning wizard using the “custom” option which allowed me to turn off the AWS 70/30 split.  The results?  Just doing a bit of work I improved the prediction accuracy for our good (3) wines to 65% correct, bad (1) wines to 25% correct, and raised the F1 score to .59 (up from .42, higher is better).

Confusion matrix with stratified shuffling
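For readers who want to try a stratified split without extra libraries, here is a hand-rolled sketch of the idea.  It is a stand-in for illustration, not necessarily the exact tool used for the results above: each class is shuffled and cut 70/30 on its own, so the rare good (3) and bad (1) wines show up in both files in proportion.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.7, seed=0):
    """Split rows into (train, test) while keeping class proportions.

    label_of: function that returns the class label for a row.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])

    # Shuffle again so the classes are not grouped together in the output.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

With rows read from our relabeled CSV, label_of would simply pick out the quality column, and the two lists would be written to separate files for upload.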

Thanks for reading!