Deep learning (DL) frameworks enable machine learning (ML) practitioners to build and train ML models. However, the process of deploying ML models in production to serve predictions (also known as inferences) in real time is more complex. It requires that ML practitioners build a scalable and performant model server, which can host these models and handle inference requests at scale.
Model Server for Apache MXNet (MMS) was developed to address this hurdle. MMS is a highly scalable, production-ready inference server. It was designed to be ML/DL framework-agnostic, so it can host models trained in any ML/DL framework.
In this post, we showcase how you can use MMS to host a model trained using any ML/DL framework or toolkit in production. We chose Amazon SageMaker for production hosting. This PaaS solution does a lot of heavy lifting to provide infrastructure and allows you to focus on your use cases.
For this solution, we use the approach outlined in Bring your own inference code with Amazon SageMaker hosting. That guide explains how to package your models together with all necessary dependencies, libraries, frameworks, and other components into a single custom-built Docker container, and then host that container on Amazon SageMaker.
To showcase the ML/DL framework-agnostic architecture of MMS, we chose to launch a model trained with the PaddlePaddle framework into production. The steps for taking a model trained on any ML/DL framework to Amazon SageMaker using an MMS bring your own (BYO) container are illustrated in the following diagram:
As this diagram shows, you need two main components to bring your ML/DL framework to Amazon SageMaker using an MMS BYO container:
- Model artifacts/model archive: These are all the artifacts required to run your model on a given host.
  - Model files: Usually symbols and weights. They are the artifacts of training a model.
  - Custom service file: Contains the entry point that is called every time an inference request is received and served by MMS. This file contains the logic to initialize the model in a particular ML/DL framework, preprocess the incoming request, and run inference. It also contains the post-processing logic that takes the data coming out of the framework’s inference method and converts it to end-user consumable data.
  - MANIFEST: The interface between the custom service file and MMS. This file is generated by running a tool called the model-archiver, which comes as a part of the MMS distribution.
- Container artifact: To load and run a model written in a custom DL framework on Amazon SageMaker, you must provide a container that Amazon SageMaker can run. In this post, we show you how to use the MMS base container and extend it to support custom DL frameworks and other model dependencies. The MMS base container is a Docker container that comes with a highly scalable and performant model server, which is ready to launch in Amazon SageMaker.
In the following sections, we describe each of the components in detail.
Preparing a model
The MMS container is ML/DL framework-agnostic. Write your models in the ML/DL framework of your choice and bring them to Amazon SageMaker with an MMS BYO container to gain scalability and performance. We show you how to prepare a PaddlePaddle model in the following sections.
Preparing model artifacts
Use the Understand Sentiment example that is available and published in the examples section of the PaddlePaddle repository.
First, create a model following the instructions provided in the PaddlePaddle/book repository. Download the container and run the training using the notebook provided as part of the example. We used the Stacked Bidirectional LSTM network for training, and trained the model for 100 epochs. At the end of this training exercise, we got the following list of trained model artifacts.
These artifacts constitute a PaddlePaddle model.
Writing custom service code
You now have the model files required to host the model in production. To take this model into production with MMS, provide a custom service script that knows how to use these files. This script must also know how to pre-process the raw request coming into the server and how to post-process the responses coming out of the PaddlePaddle framework’s infer method.
Create a custom service file called paddle_sentiment_analysis.py. Here, define a class called PaddleSentimentAnalysis that contains methods to initialize the model and also defines pre-processing, post-processing, and inference methods. The skeleton of this file is as follows:
To understand the details of this custom service file, see paddle_sentiment_analysis.py. This custom service code file allows you to tell MMS what the lifecycle of each inference request should look like. It also defines how a trained model-artifact can initialize the PaddlePaddle framework.
Now that you have the trained model artifacts and the custom service file, create a model-archive that can be used to create your endpoint on Amazon SageMaker.
Creating a model-artifact file to be hosted on Amazon SageMaker
To load this model in Amazon SageMaker with an MMS BYO container, do the following:
- Create a MANIFEST file, which is used by MMS as a model’s metadata to load and run the model.
- Add the custom service script created earlier and the trained model-artifacts, along with the MANIFEST file, to a .tar.gz file.
Use the model-archiver tool to do this. Before you use the tool to create a .tar.gz artifact, put all the model artifacts in a separate folder, including the custom service script mentioned earlier. To ease this process, we have made all the artifacts available for you. Run the following commands:
Now you are ready to create the artifact required for hosting in Amazon SageMaker, using the model-archiver tool. The model-archiver tool is a part of the MMS toolkit. To get this tool, run the following commands in a Python virtual environment, which isolates the installation from the rest of your working environment.
The model-archiver tool comes preinstalled when you install mxnet-model-server.
This generates a file called sentiment.tar.gz in the /model-store directory. This file contains all the artifacts of the models and the manifest file.
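Before uploading the archive, it can be useful to confirm that it contains what you expect. The following is a small sketch that uses only the Python standard library. The helper names list_archive and has_manifest are hypothetical, and the check assumes only that the manifest file name starts with MANIFEST; the exact layout inside the archive depends on the model-archiver version.

```python
import tarfile


def list_archive(path):
    """Return the sorted names of all regular files in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def has_manifest(names):
    """Check whether any archived file looks like a MANIFEST file."""
    return any(
        name.split("/")[-1].upper().startswith("MANIFEST") for name in names
    )
```

For example, has_manifest(list_archive("/model-store/sentiment.tar.gz")) should return True for an archive that includes its manifest.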
You now have all the model artifacts that can be hosted on Amazon SageMaker. Next, look at how to build a container and bring it into Amazon SageMaker.
Building a BYO container with MMS
In this section, you build your own MMS-based container (also known as a BYO container) that can be hosted in Amazon SageMaker.
To help with this process, every released version of MMS comes with corresponding MMS base CPU and GPU containers hosted on DockerHub, which can be hosted on Amazon SageMaker.
For this example, use a container tagged awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6. To host the model created in the earlier section, install the PaddlePaddle and numpy packages in the container. Create a Dockerfile that extends from the base MMS image and installs the Python packages. The artifacts that you downloaded earlier come with the sample Dockerfile necessary to install required packages:
Now that you have the Dockerfile that describes your BYO container, build it:
You have the BYO container with all of the model artifacts in it, and you’re ready to launch it in Amazon SageMaker.
Creating an Amazon SageMaker endpoint with the PaddlePaddle model
In this section, you create an Amazon SageMaker endpoint in the console using the artifacts created earlier. We also provide an interactive Jupyter Notebook example of creating an endpoint using the Amazon SageMaker Python SDK and AWS SDK for Python (Boto3). The notebook is available on the mxnet-model-server GitHub repository.
Before you create an Amazon SageMaker endpoint for your model, do some preparation:
- Upload the model archive sentiment.tar.gz created earlier to an Amazon S3 bucket. For this post, we uploaded it to an S3 bucket called paddle_paddle.
- Upload the container image created earlier, paddle-mms, to an Amazon ECR repository. For this post, we created an ECR repository called “paddle-mms” and uploaded the image there.
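The first of these uploads can also be scripted. Here is a brief sketch that assumes you pass in a Boto3 S3 client, for example boto3.client("s3"). The helper name upload_model_archive is hypothetical, and the object key defaults to the archive's file name. Pushing the image to ECR is done with the Docker CLI and is not covered by this sketch.

```python
import os


def upload_model_archive(s3_client, archive_path, bucket, key=None):
    """Upload a model archive to S3 and return its s3:// URL.

    s3_client is expected to behave like boto3.client("s3").
    """
    key = key or os.path.basename(archive_path)
    s3_client.upload_file(archive_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)
```

The returned URL is the value you later supply as the location of the model artifacts.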
Creating the Amazon SageMaker endpoint
Now that the model and container artifacts are uploaded to S3 and ECR, you can create the Amazon SageMaker endpoint. Complete the following steps:
- Create a model configuration.
- Create an endpoint configuration.
- Create a user endpoint.
- Test the endpoint.
Create a model configuration
First, create a model configuration.
- On the Amazon SageMaker console, choose Models, Create model.
- Provide values for Model name, IAM role, Location of inference code image (the image in your ECR repository), and Location of model artifacts (the S3 location where the model artifact was uploaded).
- Choose Create Model.
Create endpoint configuration
After you create the model configuration, create an endpoint configuration.
- In the left navigation pane, choose Endpoint Configurations, Create endpoint configuration.
- Enter an endpoint configuration name, choose Add model, and add the model that you created earlier. Then choose Create endpoint configuration.
The final step is to create an endpoint that users can send inference requests to.
Create user endpoint
- In the left navigation pane, choose Endpoints, Create endpoint.
- For Endpoint name, enter a value such as sentiment and select the endpoint configuration that you created earlier.
- Choose Select endpoint configuration, Create endpoint.
You have created an endpoint called “sentiment” on Amazon SageMaker with an MMS BYO container to host a model built with the PaddlePaddle DL framework.
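The same three console steps can be scripted with Boto3. The following is a minimal sketch rather than the notebook's code. It assumes a client that behaves like boto3.client("sagemaker"). The helper names, the "-config" suffix, the variant name, and the ml.m4.xlarge instance type are illustrative assumptions, and you supply your own image URI, S3 URL, and IAM role ARN.

```python
def build_requests(model_name, image_uri, model_data_url, role_arn,
                   endpoint_name, instance_type="ml.m4.xlarge"):
    """Build the request payloads for the model, endpoint config, and endpoint."""
    config_name = model_name + "-config"  # hypothetical naming scheme
    model = {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # BYO container image in ECR
            "ModelDataUrl": model_data_url,  # model archive in S3
        },
        "ExecutionRoleArn": role_arn,
    }
    endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    }
    endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": config_name,
    }
    return model, endpoint_config, endpoint


def deploy(sm_client, model, endpoint_config, endpoint):
    """Issue the three calls in the same order as the console steps."""
    sm_client.create_model(**model)
    sm_client.create_endpoint_config(**endpoint_config)
    sm_client.create_endpoint(**endpoint)
```

Separating payload construction from the API calls lets you inspect or log the requests before anything is created in your account.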
Now test this endpoint and make sure that it can indeed serve inference requests.
Testing the endpoint
Create a simple test client using the Boto3 library. Here is a small test script that sends a payload to the Amazon SageMaker endpoint and retrieves its response:
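A minimal sketch of such a client might look like the following. It assumes the endpoint name sentiment from the previous section and a client that behaves like boto3.client("sagemaker-runtime"). The plain-text payload and the text/plain content type are assumptions; use whatever format your custom service's pre-processing expects.

```python
def query_endpoint(runtime_client, endpoint_name, text,
                   content_type="text/plain"):
    """Send one payload to a SageMaker endpoint and return the decoded reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=text.encode("utf-8"),
        ContentType=content_type,
    )
    # The response body is a stream; read it fully and decode as UTF-8.
    return response["Body"].read().decode("utf-8")
```

With valid AWS credentials, you would call query_endpoint(boto3.client("sagemaker-runtime"), "sentiment", "your review text").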
The corresponding output from running this script is as follows:
In this post, we showed you how to build and host a PaddlePaddle model on Amazon SageMaker using an MMS BYO container. This flow can be reused with minor modifications to build BYO containers serving inference traffic on Amazon SageMaker endpoints with MMS for models built using many ML/DL frameworks, not just PaddlePaddle.
For a more interactive example to deploy the above PaddlePaddle model into Amazon SageMaker using MMS, see Amazon SageMaker Examples. To learn more about the MMS project, see the mxnet-model-server GitHub repository.
About the Authors
Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoys spending time with family and playing badminton.
Denis Davydenko is an Engineering Manager with AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time he enjoys spending time with his family, playing poker and video games.