{"id":946,"date":"2019-09-20T14:08:40","date_gmt":"2019-09-20T12:08:40","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=946"},"modified":"2021-03-24T17:59:08","modified_gmt":"2021-03-24T16:59:08","slug":"machine-learning-on-aws-how-to-create-and-deploy-a-ml-backed-service-with-aws-sagemaker","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/machine-learning-on-aws-how-to-create-and-deploy-a-ml-backed-service-with-aws-sagemaker\/","title":{"rendered":"Machine Learning on AWS: How to create and deploy a ML backed service with AWS SageMaker"},"content":{"rendered":"
In the last decade, the way we deal with and manage information has changed dramatically for two main reasons: on the one hand, the cost of data storage keeps getting lower,<\/strong> mainly due to the broad adoption of public cloud services; on the other, thanks to the ubiquitous use of ERPs, CRMs, IoT platforms, and other monitoring and profiling software, a huge amount of data has become available<\/strong> to companies, both about their internal processes and about customer preferences and behaviors. In short, we can work with more and more data, of ever-increasing quality<\/strong>, at a fraction of the cost.<\/span><\/p>\n The availability of these big datasets to be analyzed and mined for information has sparked renewed interest in Artificial Intelligence<\/strong> and particularly Machine Learning (ML),<\/strong> which can be used to extract a predictive model from an existing dataset.<\/span><\/p>\n The rise of internet-connected devices (IoT) has dramatically increased the rate at which data are created and stored; the availability of these data, combined with new ML techniques, in turn gives rise to a plethora of novel possibilities that would have been unthinkable just a few years ago. 
For example, it is now possible to know how customers use the products, which mistakes or unintended operations they make, which parts of a device wear out first in real-world conditions, and which components of an industrial machine are most likely to fail given usage time and sensor readings (predictive maintenance), or to determine automatically whether a produced part is good or faulty based only on images of the given component and a huge collection of images of good and faulty components.<\/span><\/p>\n The ability to correctly extract, interpret and leverage the information<\/strong> contained in the collected data is thus a huge added value and a competitive advantage for the companies that undertake the often significant effort to develop it.<\/span><\/p>\n Cloud providers, such as Amazon Web Services (AWS),<\/strong> nowadays offer a wide range of Machine Learning centered services to meet the most common customer use cases. On AWS, these ML-backed services are currently available:<\/span><\/p>\n In addition to these completely managed services, AWS also offers a more configurable, customizable and generic service: AWS SageMaker.<\/strong> In the second part of this article, we\u2019ll explain the structure of a SageMaker project, how to create a trivial inference model and how to deploy a SageMaker-backed HTTP Machine Learning service.<\/strong><\/span><\/p>\n After opening the AWS SageMaker Console, our attention is immediately drawn to the sidebar containing the list of the SageMaker components and configurations.<\/strong> As one can see, the SageMaker service consists of four \u201csubservices\u201d:<\/span><\/p>\n It is also possible to deploy complete SageMaker projects directly from the AWS Marketplace to tackle specific tasks. The selection of ML projects offered in the Marketplace is quite comprehensive, and it is advisable to look there for a project capable of executing the needed task before starting the 
development of a new project.<\/span><\/p>\n When developing an ML service, usually the first step is to create a Notebook instance<\/strong> using the AWS console. As shown in the screenshot, it is possible to choose the name of the instance and the IAM role it will use, and to decide whether to put the instance in our VPC (in which case it is also possible to choose the security group and the subnet).<\/span><\/p>\n Even if we choose not to put the instance in our VPC, the Jupyter notebook can only be reached using a pre-assigned URL generated from our AWS credentials.<\/strong> Once we click on Open Jupyter, we are presented with a standard Jupyter interface:<\/span><\/p>\n AWS provides a series of examples, found in the SageMaker Examples tab, to help data scientists grow confident with SageMaker. In this article, we will follow the Image classification example using the Caltech 256 Dataset.<\/strong> To start, we can just click on Use Image-classification-fulltraining.ipynb<\/em><\/strong><\/span><\/p>\n A normal Jupyter notebook is shown and we can follow the provided instructions. <\/span><\/p>\n First of all, we need to create an S3 bucket<\/strong> and check that the role we assigned to the Notebook instance has permission to read and write<\/strong> in that bucket.<\/span><\/p>\n Running the notebook will then download the Caltech 256 dataset<\/strong> and prepare the training job for the image classification algorithm. The algorithm used in this example is the AWS Image Classification Algorithm,<\/strong> which uses a convolutional neural network (ResNet). 
AWS provides a wide array of general algorithms already optimized for SageMaker (see AWS docs<\/a><\/span>); however, it is also possible to deploy a custom model to be trained and packaged in a Docker container.<\/span><\/p>\n It is important to understand how to split the dataset into two parts: the first one for training<\/strong> and the second for the validation<\/strong> step. In this example, each image class (ball, bathtub, Mars, rifle, etc.) is divided into two sets: 60 images for training and the rest for validation.<\/span><\/p>\n Before starting the training, it is essential to choose sensible Hyperparameters:<\/strong> a wrong choice will result in a poorly performing model. Choosing these parameters is often a trial-and-error process, and for this particular application the most relevant ones are the learning rate and the number of epochs.<\/span><\/p>\n The learning rate<\/strong> is the step size of the gradient descent algorithm: too high and you will overshoot the optimum, too low and the training will take forever. The number of epochs<\/strong> is how many times the training dataset is passed forward and backward through the neural network: too few epochs will result in underfitting of the data (but also a short training time), while too many will result in overfitting and a very long training time. Both overfitting and underfitting should be avoided<\/strong> because the trained model will perform badly on validation data.<\/span><\/p>\n For example, using the default values provided by AWS in this example, the training time is very low (just a few minutes) but the model will classify every image with something reddish and vaguely round in it as… Mars! 
(Not so precise :))<\/span><\/p>\n Once the training is complete, it is time to deploy and test the trained model.<\/strong> First of all, we can create the model using the training result:<\/span><\/p>\n After this, it is time to choose a part of the dataset to be used for testing<\/strong>: in this example, bathtubs<\/strong> were chosen:<\/p>\n Now we can create a batch job<\/strong> to feed all the test images to the model to be classified.<\/p>\n Creating a batch job will launch an EC2 instance<\/strong> of the chosen type to classify the images. Once the process is finished, we can check the results: if only two epochs were used during training, very few images will be classified correctly.<\/p>\n However, if we are satisfied with the result, we can create an endpoint and thus deploy the model:<\/strong><\/p>\n Once the endpoint is created successfully, we can send images to it over HTTP using the AWS SDKs,<\/strong> for example using Python's boto3.<\/p>\n The command is:<\/strong><\/p>\n where the<\/p>\n is the image.<\/p>\n This endpoint can thus be used both by an application deployed on AWS (e.g. 
Lambda functions, Docker microservices running on Fargate, applications deployed on EC2 instances) and by applications running on-premises, or even directly by clients.<\/span><\/p>\n To conclude, we presented a brief review of AWS Machine Learning services, focusing in particular on how to deploy a simple ML application using SageMaker.<\/strong> If you are interested in this topic or have any questions, contact us!<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":" In the last decade, the way we deal with and manage information dramatically changed due to two main reasons: on […]<\/p>\n","protected":false},"author":9,"featured_media":960,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[248],"tags":[322,300,324],"class_list":["post-946","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml-en","tag-ai-en","tag-how-to-en","tag-ml-en"],"yoast_head":"\n\n
SageMaker: Building Blocks<\/span><\/h2>\n
\n
The Creation of a Sample ML project<\/span><\/h2>\n
<\/p>\n
<\/p>\n
<\/p>\n
<\/p>\n
<\/p>\n
<\/p>\n
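Under the hood, the notebook launches the training job with the chosen hyperparameters. As a minimal sketch of that step (the bucket name, role ARN, container URI and hyperparameter values below are illustrative placeholders, not the article's actual values), the request for the built-in image classification algorithm looks roughly like this:

```python
import time

# Illustrative placeholders -- in the notebook these come from earlier cells.
bucket = "my-sagemaker-bucket"
role = "arn:aws:iam::123456789012:role/SageMakerRole"
training_image = "<image-classification-container-uri>"  # e.g. from get_image_uri()

# The two hyperparameters discussed above, plus the other settings the
# built-in image-classification algorithm expects (values are strings).
hyperparameters = {
    "num_layers": "18",             # depth of the ResNet
    "image_shape": "3,224,224",     # channels,height,width
    "num_classes": "257",           # Caltech 256 classes + clutter
    "num_training_samples": "15420",
    "mini_batch_size": "64",
    "learning_rate": "0.01",        # too high overshoots, too low is slow
    "epochs": "10",                 # too few underfit, too many overfit
}

training_params = {
    "TrainingJobName": "image-classification"
        + time.strftime("-%Y-%m-%d-%H-%M-%S", time.gmtime()),
    "AlgorithmSpecification": {
        "TrainingImage": training_image,
        "TrainingInputMode": "File",
    },
    "RoleArn": role,
    "HyperParameters": hyperparameters,
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.p2.xlarge",
        "VolumeSizeInGB": 50,
    },
    "OutputDataConfig": {"S3OutputPath": "s3://{}/output".format(bucket)},
    "StoppingCondition": {"MaxRuntimeInSeconds": 360000},
}

# The actual (asynchronous) call would then be:
#   sage.create_training_job(**training_params)
```

Tweaking `learning_rate` and `epochs` in this dictionary and re-running the job is the trial-and-error loop described above.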
model_name = \"DEMO-full-image-classification-model\" + time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\r\nprint(model_name)\r\ninfo = sage.describe_training_job(TrainingJobName=job_name)\r\nmodel_data = info['ModelArtifacts']['S3ModelArtifacts']\r\nprint(model_data)\r\n\r\nhosting_image = get_image_uri(boto3.Session().region_name, 'image-classification')\r\n\r\nprimary_container = {\r\n    'Image': hosting_image,\r\n    'ModelDataUrl': model_data,\r\n}\r\n\r\ncreate_model_response = sage.create_model(\r\n    ModelName=model_name,\r\n    ExecutionRoleArn=role,\r\n    PrimaryContainer=primary_container)\r\n\r\nprint(create_model_response['ModelArn'])\r\n<\/pre>\n
batch_input = 's3:\/\/{}\/image-classification-full-training\/test\/'.format(bucket)\r\ntest_images = '\/tmp\/images\/008.bathtub'\r\n\r\n!aws s3 cp $test_images $batch_input --recursive --quiet\r\n<\/pre>\n
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\r\nbatch_job_name = \"image-classification-model\" + timestamp\r\nrequest = \\\r\n{\r\n    \"TransformJobName\": batch_job_name,\r\n    \"ModelName\": model_name,\r\n    \"MaxConcurrentTransforms\": 16,\r\n    \"MaxPayloadInMB\": 6,\r\n    \"BatchStrategy\": \"SingleRecord\",\r\n    \"TransformOutput\": {\r\n        \"S3OutputPath\": 's3:\/\/{}\/{}\/output'.format(bucket, batch_job_name)\r\n    },\r\n    \"TransformInput\": {\r\n        \"DataSource\": {\r\n            \"S3DataSource\": {\r\n                \"S3DataType\": \"S3Prefix\",\r\n                \"S3Uri\": batch_input\r\n            }\r\n        },\r\n        \"ContentType\": \"application\/x-image\",\r\n        \"SplitType\": \"None\",\r\n        \"CompressionType\": \"None\"\r\n    },\r\n    \"TransformResources\": {\r\n        \"InstanceType\": \"ml.p2.xlarge\",\r\n        \"InstanceCount\": 1\r\n    }\r\n}\r\n\r\nsagemaker = boto3.client('sagemaker')\r\nsagemaker.create_transform_job(**request)\r\n\r\nprint(\"Created Transform job with name: \", batch_job_name)\r\n\r\nwhile True:\r\n    response = sagemaker.describe_transform_job(TransformJobName=batch_job_name)\r\n    status = response['TransformJobStatus']\r\n    if status == 'Completed':\r\n        print(\"Transform job ended with status: \" + status)\r\n        break\r\n    if status == 'Failed':\r\n        message = response['FailureReason']\r\n        print('Transform failed with the following error: {}'.format(message))\r\n        raise Exception('Transform job failed')\r\n    time.sleep(30)\r\n<\/pre>\n
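The batch transform job writes one small JSON file per input image to the S3 output prefix, containing a list of per-class probabilities. A minimal sketch of inspecting one such result offline (the raw output and the three-class label list here are hypothetical; the real notebook maps all 257 Caltech 256 indices to their labels):

```python
import json

# Hypothetical content of one '<image>.out' file produced by the transform job.
raw_output = '{"prediction": [0.02, 0.91, 0.07]}'

# Hypothetical label list for a three-class example.
object_categories = ["ball", "bathtub", "mars"]

# Pick the class with the highest probability.
result = json.loads(raw_output)
probabilities = result["prediction"]
best = probabilities.index(max(probabilities))
print("{} ({:.0%})".format(object_categories[best], probabilities[best]))
```

Counting how many test bathtub images land on the `bathtub` label this way is a quick check of how well the chosen epochs number worked.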
import time\r\n\r\njob_name_prefix = \"test\"\r\n\r\ntimestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\r\nendpoint_config_name = job_name_prefix + '-epc-' + timestamp\r\nendpoint_config_response = sage.create_endpoint_config(\r\n    EndpointConfigName=endpoint_config_name,\r\n    ProductionVariants=[{\r\n        'InstanceType': 'ml.m4.xlarge',\r\n        'InitialInstanceCount': 1,\r\n        'ModelName': model_name,\r\n        'VariantName': 'AllTraffic'}])\r\n\r\ntimestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\r\nendpoint_name = job_name_prefix + '-ep-' + timestamp\r\nprint('Endpoint name: {}'.format(endpoint_name))\r\n\r\nendpoint_params = {\r\n    'EndpointName': endpoint_name,\r\n    'EndpointConfigName': endpoint_config_name,\r\n}\r\nendpoint_response = sage.create_endpoint(**endpoint_params)\r\n<\/pre>\n
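Endpoint creation is asynchronous: `create_endpoint` returns immediately while AWS provisions the instance behind the scenes. A small polling helper, sketched here under the assumption that a SageMaker boto3 client and the endpoint name from the previous step are available:

```python
import time

def wait_for_endpoint(client, name, poll_seconds=30):
    """Poll describe_endpoint until the endpoint leaves the 'Creating'
    state; returns the final status string."""
    while True:
        status = client.describe_endpoint(EndpointName=name)["EndpointStatus"]
        if status != "Creating":
            return status  # 'InService' on success, 'Failed' otherwise
        time.sleep(poll_seconds)

# With the names from the cell above, one would run:
#   status = wait_for_endpoint(sagemaker_client, endpoint_name)
#   assert status == 'InService'
```

Only once the status reaches `InService` can the endpoint accept inference requests.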
runtime = boto3.client('runtime.sagemaker')\r\n\r\nresponse = runtime.invoke_endpoint(EndpointName=endpoint_name,\r\n                                   ContentType='application\/x-image',\r\n                                   Body=payload)\r\n<\/pre>\n
payload<\/pre>\n
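Putting the client side together: the endpoint returns a JSON list of per-class probabilities in the response body, which the caller can parse to get the predicted class. A minimal sketch (the live-call portion is commented out and assumes the endpoint name and image bytes from the steps above):

```python
import json

def best_class(body):
    """Given the endpoint response body (a JSON list of per-class
    probabilities), return (index, probability) of the top class."""
    probs = json.loads(body)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, probs[idx]

# With a live endpoint the flow would be:
#   runtime = boto3.client('runtime.sagemaker')
#   response = runtime.invoke_endpoint(EndpointName=endpoint_name,
#                                      ContentType='application/x-image',
#                                      Body=payload)
#   idx, prob = best_class(response['Body'].read())

# Offline check with a synthetic three-class response:
print(best_class('[0.1, 0.2, 0.7]'))  # -> (2, 0.7)
```

The same helper works for any application consuming the endpoint, whether it runs on AWS or on-premises.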