Amazon Machine Learning – Commodity Machine Learning as a Service

Commodity turns into “as a service”

Find something in computing that is expensive and cumbersome to work with and Amazon will find a way to commoditize it.  AWS created commodity storage, networking, and compute services for end users with resources left over from its online retail business.  The margins may not have been great, but Amazon could make this type of business profitable because of its scale.

But what happens now that Microsoft and Google can offer the same services at the same scale?  The need for differentiation drives innovative cloud products and platforms with the goal of attracting new customers and keeping those customers within a single ecosystem.  Basic services like storage, networking, and compute differ little between cloud providers, so new software as a service (SaaS) offerings are what make one provider more enticing than another when shopping for cloud services.

Software as a service (SaaS) probably isn’t the best way to describe these differentiated services offered by public cloud providers.  SaaS typically refers to subscription-based software running in the cloud like Salesforce.com or ServiceNow.  Recent cloud services are better described by their functionality with an “as a service” tacked on.  Database as a service.  Data warehousing as a service.  Analytics as a service.  Cloud providers build the physical infrastructure, write the software, and implement vertical scaling with a self-service provisioning portal for easy consumption.  Consumers simply implement the service within their application without worrying about the underlying details.

Legacy datacenter hardware and software vendors may not appreciate this approach due to lost revenue, but the “as a service” model is a good thing for IT consumers.  Services that were previously unavailable to everyday users have been democratized and are available to anyone with an AWS account and a credit card.  Cost isn’t necessarily the primary benefit to consumers; the bigger benefits are accessibility and the ability to consume these services under a public utility model.  All users now have access to previously exotic technologies and can pay by the hour to use them.

Machine Learning as a Service

Machine learning is at the peak of the emerging technology hype cycle.  But is it really all hype?  Machine learning (ML) has been around for decades.  ML techniques allow users to “teach” computers to solve problems without hard-coding hundreds or thousands of explicit rules.  This isn’t a new idea, but the hype is definitely a recent phenomenon, and there are a number of reasons for it.

So why all the machine learning hype?  The startup costs, both in hardware/software spend and in the effort needed to get access, are much lower, which presents an opportunity to implement machine learning that wasn’t available in the past.  Data is more abundant and data storage is cheap.  Computing resources are abundant and CPU/GPU cycles are (relatively) cheap.  Most importantly, the barriers to combining advanced computing resources with vast sets of data have been lifted.  What used to require specialized onsite HPC clusters and expensive data storage arrays can now be performed in the cloud for a reasonable cost.  Services like Amazon Machine Learning are giving everyday users the ability to perform complicated machine learning computations that in the past were only available to researchers at universities.

How does Amazon provide machine learning as a service?  Think of the service as a black box.  The implementation details within the black box are unimportant.  Data is fed into the system, a predictive model/algorithm is created and trained, the system is tweaked for its effectiveness, and then the system is used to make predictions using the trained model.  Users can use the predictive system without really knowing what is happening inside the black box.
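To make the black box idea concrete, here is a minimal local sketch of the same interface: feed in data, train, score the result, and predict.  It is a toy stand-in written for this post, not what Amazon runs inside its service.  The class name, the data, and the choice of a one-feature least-squares fit are all assumptions made for illustration.

```python
from statistics import mean


class BlackBoxRegressor:
    """Toy stand-in for a managed ML service: fit, evaluate, predict."""

    def fit(self, xs, ys):
        # Ordinary least squares for a single input feature.
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, x):
        return self.intercept + self.slope * x

    def evaluate(self, xs, ys):
        # Root mean squared error on held-out data: lower is better.
        squared_errors = [(self.predict(x) - y) ** 2 for x, y in zip(xs, ys)]
        return mean(squared_errors) ** 0.5


# Hypothetical data where y is roughly 2x + 1.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x, test_y = [7, 8], [15.0, 17.1]

model = BlackBoxRegressor().fit(train_x, train_y)
rmse = model.evaluate(test_x, test_y)
print(f"held-out RMSE: {rmse:.3f}")
print(f"prediction for x=10: {model.predict(10):.2f}")
```

The caller only ever touches fit, evaluate, and predict.  Swapping the internals for something far more sophisticated would not change how the box is used, which is the whole appeal of the service model.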

This machine learning black box isn’t magical.  It is limited to three basic types of models (algorithms): regression, binary classification, and multiclass classification.  More advanced work means moving to Amazon SageMaker, which demands a higher skill level than the basic Amazon Machine Learning black box.  However, these three basic model types can get you started on some real-world problems very quickly without knowing much math or programming.

Amazon Machine Learning Workflow

So how does this process work at a high level?  If a dataset and use case can be identified as a regression or binary/multiclass classification problem, then the data can simply be fed to the AWS machine learning black box.  AWS will use the data to automatically select a model and train it on your input data.  The model is then evaluated, and the result is a numerical score that measures how effective it is.  The model is ready to use at this point but can also be tweaked to improve the score.  Bulk data can be fed to the trained model for batch predictions, or ad-hoc predictions can be performed using the AWS console or programmatically through the AWS API.
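For readers who prefer the API to the console, the workflow above can be sketched as a handful of requests.  This is a rough sketch, assuming the boto3 `machinelearning` client and its `create_data_source_from_s3`, `create_ml_model`, and `create_evaluation` calls.  The IDs, the S3 URIs, the 70/30 split, and the helper names `build_requests` and `submit` are all hypothetical, and the parameter names should be checked against the current boto3 documentation before use.  Nothing here contacts AWS until `submit` is called with a real client.

```python
import json


def build_requests(prefix, data_s3_uri, schema_s3_uri, model_type="BINARY"):
    """Build keyword arguments for the train/evaluate steps (no AWS calls)."""

    def data_source(suffix, begin, end):
        # Each data source reads one percentage slice of the same S3 file.
        return {
            "DataSourceId": f"{prefix}-ds-{suffix}",
            "DataSpec": {
                "DataLocationS3": data_s3_uri,
                "DataSchemaLocationS3": schema_s3_uri,
                "DataRearrangement": json.dumps(
                    {"splitting": {"percentBegin": begin, "percentEnd": end}}
                ),
            },
            "ComputeStatistics": True,
        }

    return {
        "train_source": data_source("train", 0, 70),   # first 70% for training
        "eval_source": data_source("eval", 70, 100),   # last 30% held out
        "model": {
            "MLModelId": f"{prefix}-model",
            "MLModelType": model_type,  # REGRESSION, BINARY, or MULTICLASS
            "TrainingDataSourceId": f"{prefix}-ds-train",
        },
        "evaluation": {
            "EvaluationId": f"{prefix}-eval",
            "MLModelId": f"{prefix}-model",
            "EvaluationDataSourceId": f"{prefix}-ds-eval",
        },
    }


def submit(client, plan):
    """Send the requests in workflow order; `client` is a boto3 ML client."""
    client.create_data_source_from_s3(**plan["train_source"])
    client.create_data_source_from_s3(**plan["eval_source"])
    client.create_ml_model(**plan["model"])
    client.create_evaluation(**plan["evaluation"])


# Hypothetical bucket and file names, for illustration only.
plan = build_requests(
    "demo",
    "s3://example-bucket/customers.csv",
    "s3://example-bucket/customers.csv.schema",
)
print(json.dumps(plan["model"], indent=2))
```

With credentials configured, the plan would be sent with something like `submit(boto3.client("machinelearning"), plan)`.  Batch and ad-hoc predictions would follow as separate calls once the model has finished training.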

Knowing that a problem can be solved by AWS takes a bit of high-level machine learning knowledge.  The end user needs to understand their data and the three model types offered in the AWS black box.  Reading through the AWS Machine Learning developer guide is a good way to get an overview.  Regression models solve problems that need to predict numeric values.  Binary classification models predict binary outcomes (true/false, yes/no), and multiclass classification models predict one of more than two outcomes (categories).
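The mapping from "what am I trying to predict?" to a model type can be written down as a simple rule of thumb.  The helper below is hypothetical and deliberately naive: it looks only at the values in the column you want to predict.  Real datasets need more judgment, for example when numeric codes actually stand for categories.

```python
def suggest_model_type(target_values):
    """Guess a model type from the values of the column to be predicted."""
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct target values")
    if len(distinct) == 2:
        # Two outcomes (yes/no, true/false, 0/1): binary classification.
        return "BINARY"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        # Many numeric values (prices, temperatures): regression.
        return "REGRESSION"
    # Three or more non-numeric labels: multiclass classification.
    return "MULTICLASS"


# Hypothetical target columns, for illustration only.
print(suggest_model_type([199000, 250500, 310000]))   # house prices
print(suggest_model_type(["yes", "no", "yes"]))       # will the customer churn?
print(suggest_model_type(["red", "green", "blue"]))   # product category
```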

Why use Amazon Machine Learning?

For those starting out with machine learning, this AWS service may sound overcomplicated or of questionable value.  Most tutorials show this type of work done on a laptop with free programming tools, so why is AWS necessary?  The point is that a novice user can do some basic machine learning in AWS without the high upfront cost of learning to program or learning how to use machine learning software packages.  Simply feed the AWS machine learning engine data and get results quickly.

Anyone can run a proof of concept data pipeline into AWS and perform some basic machine learning predictions.  Some light programming skills would be helpful but are not mandatory.  A good starting point is a dataset with a well-defined schema and a business goal of using that dataset to make predictions on similar sets of incoming data.  AWS can provide these predictions for pennies an hour and eliminate the startup costs that would normally delay or even halt these types of projects.

Amazon Machine Learning is a way to get productive quickly and move a project past the proof of concept phase in making predictions based on company data.  Access to the platform is open to all users, so don’t rely on the AWS models for a competitive advantage or as a product differentiator.  Instead, use Amazon Machine Learning as a tool to get a machine learning project started quickly without much investment.

Thanks for reading.  Future blog posts here will run some well-known machine learning problems through Amazon Machine Learning to highlight the process and results.