Exporting Lightsail instance snapshots to Amazon EC2

I’ve been using Amazon Lightsail for the past six months to host my WordPress blog. I’m happy with Lightsail – I have full control over both my Linux instance OS and my WordPress administration, but I don’t have to deal with the headaches (or the price) of a full-blown EC2 instance.

A few months back I wrote a blog post listing the pros and cons of Amazon Lightsail. Why would one use Lightsail versus Amazon EC2? Within the Amazon ecosystem, Lightsail is the cheapest and easiest-to-use platform if the goal is to get a Linux instance and a simple application up and running in AWS.

Think of Lightsail as a lightweight version of AWS EC2. There really is no learning curve to get started on Lightsail – just pick some basic options and launch the instance. The Lightsail pricing structure is straightforward and predictable. Both the pricing and the ease of use make Lightsail an attractive hosting platform for bloggers and developers getting started on Amazon – the experience is quick, cheap, and easy.

With this simplicity come limitations. A lightweight version of an Amazon EC2 instance doesn’t have the full complement of EC2 features. This makes sense – Lightsail is geared towards bloggers and developers and is kept separate from EC2. A lightweight version works fine when starting a project, but what if we hit a point where the advanced networking features of a VPC are required? Or autoscaling groups? Or spot instances?

Fortunately this changed at AWS re:Invent 2018 – we can now convert a Lightsail instance into an EC2 instance using Lightsail instance snapshots! I’ll walk through the process in this post and show some of the strategies and considerations needed to move a Lightsail instance out of the limited Lightsail environment and into the feature-rich EC2 environment.

Why should I move my instance out of Lightsail?

Let’s face it – we aren’t going to move our instance out of Lightsail unless we have to.  Lightsail is the cheapest option for running an instance in AWS.  Instance monthly plans start as low as $3.50/month as long as the instance stays within the resource bounds of the pricing plan.

Adding more horsepower to a Lightsail instance is also not a reason to move to EC2. Lightsail allows upgrading an instance to use more resources (for a price) – just use an instance snapshot to clone the existing instance to a new plan with more resources (the application inside the instance persists through an instance snapshot).

But what if we want to take our application out of the Lightsail environment and place it into more of a production environment? What if we have found limitations in Lightsail and our application needs to reside in EC2 to take advantage of all the available EC2 features? What if we need web-scale features, advanced networking, and access to services that are limited in the Lightsail environment? In those cases we need to move to EC2.

AWS terminology and concepts

Before we go through the conversion process, let’s clear up some terminology and concepts that will help illustrate the task.

Instance – this is just Amazon’s fancy word for a virtual machine.  The term “virtual machine” probably sounded too much like VMware so Amazon calls virtual machines “instances”.  An instance is a virtual machine but runs on Amazon’s proprietary hypervisor environment.

Lightsail instance – An AWS instance running in the Lightsail environment. Lightsail is separate from other AWS environments and runs on a dedicated hardware stack in selected data centers.

EC2 instance – An AWS instance running in the EC2 environment. EC2 is completely separate from Lightsail, runs on its own hardware, and has many more management and configuration options.

Instance snapshot – A snapshot of either a Lightsail or EC2 instance that includes not only the state of the virtual machine but also a snapshot of the attached OS disk.

Disk snapshot – Unlike an instance snapshot, a disk snapshot only tracks changes to the block storage disk attached to the virtual machine. No virtual machine state is captured, so this type of snapshot is only good for keeping track of block disk data, not for the virtual machine state.

CloudFormation stack – A collection of AWS resources created and managed as a single unit from a template. CloudFormation can be used to define the configuration details of an AWS instance.

Key pair – Quite simply, a .PEM file used to connect to an instance. We can’t connect to an instance remotely (SSH, PuTTY) without the .PEM key pair.

AMI – A preconfigured template, associated with an EBS snapshot, used to launch an instance with a specified configuration.

How exporting an instance snapshot works

Instance snapshots are portable. When an instance snapshot is taken of a Lightsail instance, the OS and disk state are stored to flat files and kept in some form of S3 object storage. The flat files can be shared within Lightsail to create a new instance from the same snapshot – we can even assign more CPU or RAM when launching from a snapshot to upgrade the instance to something faster.
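
As an aside, snapshots don’t have to come from the console – the Lightsail API exposes the same operation. Here’s a minimal boto3 sketch (the instance and snapshot names are hypothetical placeholders):

```python
import boto3

# Create a Lightsail instance snapshot programmatically.
# Instance and snapshot names below are placeholders.
lightsail = boto3.client('lightsail', region_name='us-east-2')

response = lightsail.create_instance_snapshot(
    instanceName='my-wordpress-instance',
    instanceSnapshotName='my-wordpress-snapshot'
)
print(response['operations'][0]['status'])
```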

With the new Lightsail snapshot export feature we can now take that same snapshot and share it with the AWS EC2 infrastructure. This allows EC2 to import the Lightsail snapshot, cloning our Lightsail instance and spinning it up in EC2. Lightsail snapshots were always portable within Lightsail; the export feature extends that portability outside of Lightsail.

Behind the scenes, Amazon uses CloudFormation to enable this conversion process, which simply means a CloudFormation stack (built from a template) is created automatically when we move our Lightsail instance snapshot out to EC2. But we don’t have to worry about CloudFormation at all – the process is transparent and doesn’t even require opening the CloudFormation console.
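
For the API-minded, the export itself is a single Lightsail call. A hedged boto3 sketch (the snapshot name is again a placeholder):

```python
import boto3

lightsail = boto3.client('lightsail', region_name='us-east-2')

# Kick off the export; the CloudFormation stack is created for us.
lightsail.export_snapshot(sourceSnapshotName='my-wordpress-snapshot')

# The export runs asynchronously - list the export records to check state.
for record in lightsail.get_export_snapshot_records()['exportSnapshotRecords']:
    print(record['name'], record['state'])
```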

Walkthrough of moving a Lightsail instance to EC2

Exporting a Lightsail instance is very easy. First, log in to your AWS account and open the Lightsail console. Select your existing Lightsail instance to see the options below.

Lightsail console

Open the snapshots tab and create an instance snapshot (if you don’t already have one). Use any name you like and click “Create snapshot”. The process will most likely take a few minutes, and the progress is shown in the console.

Create Instance Snapshot in Lightsail console

After the snapshot is completed it will display in the list of “recent snapshots”. We can then select this snapshot (three orange dots on the right) and choose “Export to Amazon EC2”. This will copy the snapshot out of Lightsail object storage over to EC2.

Recent Snapshots in Lightsail Console

An informational pop-up will appear explaining the outcome (an EBS snapshot and an AMI) along with a warning that EC2 billing rates will apply. Click “Yes, continue”.

Export Lightsail Snapshot to EC2 in Lightsail console

One last warning will appear, alerting that the existing default Lightsail key pair (.PEM file) should be replaced with an individual key pair. Click “Acknowledged”.

Export Security Warning in Lightsail Console

The progress can be monitored using the “gear” icon at the top of the Lightsail console, which will “spin” during the export process. When the export completes, the gear stops spinning and the task shows as “Exported to EC2” in the task history.

Running Snapshot export task in Lightsail Console

Completed snapshot export in Lightsail console

Using the exported EC2 Lightsail instance

Now that we have exported our Lightsail instance snapshot to EC2, what happens next? By default, nothing. Looking at the completed task screenshot (above), we can open the EC2 console to see what we have, or we can use the Lightsail console to launch an EC2 instance from our exported snapshot.

Lightsail is great and I love the simplified console, but we really should use the EC2 console at this point. Lightsail has all-new wizard screens to manage our instance in EC2, but my feeling is that if we are going to export to EC2, we should switch over and learn to use the EC2 console.

As shown in the screenshot below, open the EC2 console and make sure the region is the same one that was used for Lightsail (in my case, US East (Ohio)).

EC2 in the AWS Console

Open the “Elastic Block Store” link on the left side of the EC2 console. On the top right side of the console we can see the instance snapshot that was created by Lightsail. Since this is an instance snapshot, we will also find an AMI associated with the EBS snapshot.

Elastic Block Store snapshots in EC2 console

EBS snapshots in EC2 console

Open the “Images” – “AMIs” link in the EC2 console. We can see in the upper right frame that the Lightsail snapshot has been converted to an EC2 AMI. This was done with AWS CloudFormation, but that doesn’t really matter – the AMI is ready for us to use without worrying about how it was created.

AMIs in EC2 console

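If you’d rather verify all of this from code than from the console, the same resources are visible through the EC2 API. A quick boto3 sketch:

```python
import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')

# The exported Lightsail snapshot appears as an EBS snapshot we own...
for snap in ec2.describe_snapshots(OwnerIds=['self'])['Snapshots']:
    print(snap['SnapshotId'], snap.get('Description', ''))

# ...along with an AMI registered from that snapshot.
for image in ec2.describe_images(Owners=['self'])['Images']:
    print(image['ImageId'], image.get('Name', ''))
```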

Now the fun part: we can launch our Lightsail instance in EC2 using the AMI. Just click “Launch”! A few things to consider when launching the AMI (the details are out of scope for this post; a boto3 sketch of the equivalent API call follows this list):

Instance type – pick an EC2 instance type for your Lightsail instance; the EC2 options have different pricing plans and different levels of CPU, RAM, GPU, storage, etc.

Security (key pair, security groups, VPC info) – design a strategy (if you don’t already have one) to secure your new instance with a VPC, security group, key pair, etc.

Additional storage – additional disk storage can be attached during the EC2 launch.

Tags – optionally you can tag your instance for easier AWS billing.
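
The console launch wizard walks through all of these options, but for completeness here is a rough boto3 equivalent. Every ID and name below is a placeholder, not a value from my account:

```python
import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')

# Launch an EC2 instance from the exported AMI.
# All IDs and names are hypothetical - substitute your own.
response = ec2.run_instances(
    ImageId='ami-0123456789abcdef0',            # the AMI created by the export
    InstanceType='t3.micro',                    # instance type/size
    KeyName='my-new-ec2-keypair',               # a fresh key pair, per the warning
    SecurityGroupIds=['sg-0123456789abcdef0'],  # security group in your VPC
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        'ResourceType': 'instance',
        'Tags': [{'Key': 'Name', 'Value': 'wordpress-from-lightsail'}],
    }],
)
print(response['Instances'][0]['InstanceId'])
```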

Pick your options and you are ready to go – your Lightsail instance is now running in EC2! Thanks for reading!

Handwritten digit recognition – MNIST via Amazon Machine Learning APIs

In our last post we explored how to use the Amazon Machine Learning APIs to solve machine learning problems. We looked at the SDK, we looked at the machine learning client documentation, and we even looked at an example on GitHub. But we didn’t write any code.

If it is so easy to consume AWS APIs with code, then why don’t we take a shot at writing some? Point taken – this post will include code! We will use Python to write some functions that call the Amazon Machine Learning APIs to create a machine learning model and run some batch predictions. The code I’ve written is heavily influenced by AWS Python sample code – I wrote my own functions and main body but borrowed a few bits like the polling functions and random character generation.

MNIST dataset – handwritten digit recognition

The MNIST dataset is a “hello world” type machine learning problem that engineers typically use to smoke-test an algorithm or ML process. The dataset contains 70,000 handwritten digits from 0-9, each scanned into a 28×28 pixel representation. Each pixel value represents one of the 784 (28 × 28) pixels’ intensity, from 0 (white) to 255 (black). See the Wikipedia entry for a sample visualization of a subset of the images.

We are going to pull the MNIST data from Kaggle to make things easy. Why pull it from Kaggle? It saves us some time – the Kaggle version has conveniently split the data into 42,000 images in a training CSV and 28,000 images in a test CSV. The training CSV has the labels in the first column, which are just the numbers represented by all the pixels in the row. The test CSV file has the label removed for use in predictions. Kaggle has made things a bit easier for us – we don’t have to do any manipulation of the data.

Code on GitHub – https://github.com/keithxanderson

The code I have written to run MNIST through Amazon Machine Learning via the APIs is up on GitHub. See the README.md for a detailed explanation of how to use the code and the exact steps required to get things working. We are going to use Python, specifically the ‘machinelearning’ client of Boto3, to pass our calls to AWS.
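
Everything below goes through a single Boto3 client object, and creating it is one line (the region shown is just an example – use whichever region holds your Amazon ML resources):

```python
import boto3

# All of the functions in this post pass their calls through this client.
ml = boto3.client('machinelearning', region_name='us-east-1')
```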

While the README.md explains things pretty well, let’s summarize here. The main MNIST.py Python file uses the MNIST training data to create and evaluate an ML model and then uses the test data to create a batch prediction. Data in the test file contains the pixels but no label; our model predicts which digit is represented by the pixels after “learning” from the training data with labels (the answers).

There is a Python function written for each of the high-level operations:

create_training_datasource – takes a training file from an S3 bucket, a schema file from an S3 bucket, and a percentage argument and splits the training data into a training dataset and an evaluation dataset.  The number of rows in each dataset is determined by the percentage set.  The schema file defines the schema (obviously).

create_model – takes a training datasource ID and a recipe file to create and train a model.  The function is hard-coded to create a multiclass classification model (multinomial logistic regression algorithm).

create_evaluation – takes our model ID and our evaluation datasource ID and creates an evaluation, which simply scores the performance of our model using the reserved evaluation data. Results are viewed manually in the AWS console but could be retrieved in our code if we wanted.

create_batch_prediction_dataset – takes a test file from an S3 bucket and a schema file (a different schema file than our training schema file) and creates a batch dataset. We will use this to make predictions; this data does not contain any labels.

batch_prediction – takes an existing model ID and a batch datasource ID and runs predictions of the batch data against our model. Results are written to a flat file at the S3 output location given as an argument.
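
To give a feel for what these functions wrap, here is a condensed sketch of the underlying boto3 calls. The IDs, bucket, and file names are made up for illustration – the real wiring (including the evaluation datasource built with the complementary split) lives in MNIST.py:

```python
import boto3

ml = boto3.client('machinelearning', region_name='us-east-1')

# Training datasource: the first 70% of the training CSV rows.
# (The evaluation datasource is created the same way with the
# complementary split: percentBegin 70, percentEnd 100.)
ml.create_data_source_from_s3(
    DataSourceId='mnist-training-ds',
    DataSpec={
        'DataLocationS3': 's3://my-bucket/train.csv',
        'DataSchemaLocationS3': 's3://my-bucket/train.csv.schema',
        'DataRearrangement':
            '{"splitting": {"percentBegin": 0, "percentEnd": 70}}',
    },
    ComputeStatistics=True,  # required for datasources used in training
)

# Multiclass model trained on that datasource, using the recipe file.
ml.create_ml_model(
    MLModelId='mnist-model',
    MLModelType='MULTICLASS',  # multinomial logistic regression
    TrainingDataSourceId='mnist-training-ds',
    RecipeUri='s3://my-bucket/recipe.json',
)

# Score the model against the held-back evaluation rows.
ml.create_evaluation(
    EvaluationId='mnist-evaluation',
    MLModelId='mnist-model',
    EvaluationDataSourceId='mnist-evaluation-ds',
)

# Predict against the unlabeled test datasource; results land in S3.
ml.create_batch_prediction(
    BatchPredictionId='mnist-batch',
    MLModelId='mnist-model',
    BatchPredictionDataSourceId='mnist-batch-ds',
    OutputUri='s3://my-bucket/output/',
)
```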

Schema files and recipe

Our datasources all need a schema, and we can define these with a schema file. The schema for the training and evaluation data is the same, so both can use the same file. The batch datasource does not contain labels, so it needs a separate schema file.

The training and evaluation schema file for MNIST can be found in the GitHub project with the Python code. We simply define every column and declare each of the columns as “CATEGORICAL” since this is a multiclass classification model. Each handwritten number is a category (0-9) and each pixel value is a category (0-255). The format of the schema file is defined by AWS; if you create a datasource manually in the console, you can copy the AWS-generated schema into your own schema file.
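
To make that concrete, here is a sketch of generating such a schema file from Python, assuming the Kaggle column names (label, pixel0 through pixel783):

```python
import json

# One CATEGORICAL attribute for the target plus one per pixel column.
# Column names assume the Kaggle CSV header: label, pixel0 .. pixel783.
attributes = [{'attributeName': 'label', 'attributeType': 'CATEGORICAL'}]
attributes += [{'attributeName': f'pixel{i}', 'attributeType': 'CATEGORICAL'}
               for i in range(784)]

schema = {
    'version': '1.0',
    'targetAttributeName': 'label',
    'dataFormat': 'CSV',
    'dataFileContainsHeader': True,
    'attributes': attributes,
}

with open('train.csv.schema', 'w') as f:
    json.dump(schema, f, indent=2)
```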

The batch schema file is also found in the same project.  The only difference between the batch schema file and the training/evaluation schema file is that the batch schema does not contain the labels (numbers).

Lastly, our recipe file is found on GitHub with the Python code and schema files. The recipe simply instructs AWS how to treat the input and output data. Our example is very simple since all features (columns) are categorical, and we just define all of our outputs as “ALL_CATEGORICAL”. This saves us from manually declaring each pixel feature as a category.
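
The recipe itself is tiny. A sketch of its contents, written out from Python for consistency with the rest of the post:

```python
import json

# No feature groups or transformations - just emit every categorical
# feature as-is via the ALL_CATEGORICAL shorthand.
recipe = {'groups': {}, 'assignments': {}, 'outputs': ['ALL_CATEGORICAL']}

with open('recipe.json', 'w') as f:
    json.dump(recipe, f, indent=2)
```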

Running the code

See the README.md for detailed instructions on running the code. Provided we have our CSV files, schema files, and recipe file in an S3 bucket, we just need to modify the main body of MNIST.py with the S3 locations for each file, along with an output S3 bucket for the prediction results (all can share the same S3 bucket).

After executing the MNIST.py file we will see the training, evaluation, and batch dataset creation execute in parallel while the model and evaluation processes sit pending. Once the datasets are created we will see the model training start, and once the model training completes we will see the evaluation start.

A polling function is called within the batch prediction function to wait for the evaluation process to complete before running a batch prediction. This is optional – there is really no dependency, since the batch prediction can run in parallel with the evaluation scoring of our model.
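
For reference, a simplified version of that kind of polling loop might look like this (the exponential backoff mirrors the borrowed AWS sample code; get_evaluation is the real boto3 call):

```python
import time

def poll_evaluation_until_done(ml, evaluation_id):
    """Poll an evaluation until it reaches a terminal state.

    A simplified sketch of this kind of polling; the terminal
    statuses come from the Amazon ML API documentation.
    """
    delay = 2
    while True:
        status = ml.get_evaluation(EvaluationId=evaluation_id)['Status']
        print(f'Evaluation {evaluation_id} is {status}...')
        if status in ('COMPLETED', 'FAILED', 'DELETED'):
            return status
        time.sleep(delay)
        delay = min(delay * 2, 60)  # back off, capped at 60 seconds
```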

Model evaluation performance

If all went well we will have an evaluation score in the AWS console. Let’s take a look! Looking at our ML model evaluation performance in the console, we see an F1 score of 0.914 compared against a baseline score of 0.020, which is very good.

ML Model Performance

Exploring our model performance against our evaluation data, we see that the model is very good at predicting the handwritten digits. The matrix below shows dark blue for every digit correctly classified as the number below it. This is not terribly surprising – most algorithms perform pretty well on the MNIST dataset, but give credit to AWS for tuning the multiclass classification algorithm to this level of performance.

Explore Model Performance

Batch prediction results

We should also have an output file in our batch prediction output S3 bucket if everything ran to completion. I’ve included a sample output CSV file within my MNIST project to view. See the AWS documentation for more information on how to interpret the output. Each column contains one of the categories for prediction, which are the digits 0-9. Each row holds the predicted probability that the test record belongs to each category.

For example, in our sample CSV file the first prediction shows about a 99% probability that the number is a 2, and the second prediction shows a 99% probability that the next number is a 0.
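
If you want to pull the winning digit out of each row programmatically, here is a small sketch. The downloaded file name is hypothetical, and this assumes the rows contain only the probability columns described above:

```python
import csv
import gzip

# Read the batch prediction output: the header row holds the class
# labels (0-9) and each data row holds the per-class probabilities.
# The file name is a placeholder; Amazon ML writes the results gzipped.
with gzip.open('mnist-batch-prediction.csv.gz', 'rt') as f:
    reader = csv.reader(f)
    labels = next(reader)
    for row in reader:
        probs = [float(p) for p in row]
        best = probs.index(max(probs))
        print(f'predicted digit: {labels[best]} '
              f'(probability {probs[best] * 100:.1f}%)')
```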

Why use Amazon Machine Learning?

We’ve discussed the value of using Amazon Machine Learning before, but it is worth repeating. It is very easy to incorporate basic machine learning algorithms into your application by offloading the work to AWS. Avoid spending time and money building out infrastructure or expertise for an in-house machine learning engine – just follow the basic workflow outlined above and make a few API calls to Amazon Machine Learning.

Thanks for reading!