{"id":3577,"date":"2021-10-01T13:59:00","date_gmt":"2021-10-01T11:59:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=3577"},"modified":"2021-10-04T09:55:02","modified_gmt":"2021-10-04T07:55:02","slug":"mlops-essentials-four-pillars-for-machine-learning-operations-on-aws","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/mlops-essentials-four-pillars-for-machine-learning-operations-on-aws\/","title":{"rendered":"MLOps essentials: four pillars for Machine Learning Operations on AWS"},"content":{"rendered":"\n
When we approach modern Machine Learning problems in an AWS environment, there is more to consider than traditional data preparation, model training, and final inference. Nor is raw computing power the only concern we must deal with when creating an ML solution.<\/p>\n\n\n\n
There is a substantial difference<\/strong> between creating and testing a Machine Learning model<\/strong> locally inside a Jupyter Notebook and releasing it on a production infrastructure capable of generating business value. <\/p>\n\n\n\n The complexity of taking a Machine Learning workflow live in the Cloud is known as the deployment gap<\/a>, and throughout this article we will see how to tackle it by combining the speed and agility of modeling and training with the solidity, scalability, and resilience required by production environments.<\/p>\n\n\n\n The path is similar to the one "traditional" software development took with the DevOps model, and the resulting paradigm, called MLOps, is commonly described as "an end-to-end process to design, create and manage Machine Learning applications in a reproducible, testable and evolutionary way<\/em><\/a>".<\/p>\n\n\n\n In the following paragraphs, we will dive deep into the reasons and principles behind the MLOps paradigm, and into how it relates to the AWS ecosystem and the best practices of the AWS Well-Architected Framework.<\/p>\n\n\n\n Let\u2019s start!<\/p>\n\n\n\n As said before, Machine Learning workloads are essentially complex pieces of software, so many "traditional" software practices still apply. Nonetheless, due to its experimental nature, Machine Learning brings some essential differences<\/strong> to the table, which call for a lifecycle management paradigm tailored to its needs. <\/p>\n\n\n\n These differences appear at every step of a workload and contribute significantly to the deployment gap mentioned above, so they deserve a closer look:<\/p>\n\n\n\n Managing code in Machine Learning applications is a complex matter. 
Let\u2019s see why!<\/p>\n\n\n\n Collaboration on model experiments among data scientists<\/strong> is not as easy as sharing traditional code files: Jupyter Notebooks are JSON documents that mix code, outputs, and metadata, so keeping code synchronized between users means noisy diffs, harder git chores, and frequent merge conflicts<\/strong>.<\/p>\n\n\n\n Developers must also work on several distinct sub-projects: ETL jobs<\/strong>, model logic<\/strong>, training and validation<\/strong>, inference logic<\/strong>, and Infrastructure-as-Code templates<\/strong>. All of these separate projects must be centrally managed and adequately versioned!<\/p>\n\n\n\n For modern software applications, there are many consolidated Version Control procedures<\/strong>, such as conventional commits<\/a>, feature branching, squash and rebase<\/a>, and continuous integration<\/a>. <\/p>\n\n\n\n These techniques, however, are not always applicable to Jupyter Notebooks<\/strong> since, as stated before, they are not simple text files.<\/p>\n\n\n\n Data scientists need to try many combinations of datasets, features, modeling techniques, algorithms, and parameter configurations to find the solution that best extracts business value<\/strong>. <\/p>\n\n\n\n The key point is finding ways to track both successful<\/strong> and failed<\/strong> experiments while maintaining reproducibility<\/strong> and code reusability<\/strong>. Pursuing this goal means having instruments that allow for quick rollbacks and efficient monitoring of results, ideally with visual tools.<\/p>\n\n\n\n Testing a Machine Learning workload is more complex<\/strong> than testing traditional software. <\/p>\n\n\n\n Datasets require continuous validation<\/strong>. Models developed by data scientists require ongoing<\/strong> quality evaluation<\/strong>, training validation, and performance checks<\/strong>. <\/p>\n\n\n\nWhy do we need MLOps?<\/h2>\n\n\n\n
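One common mitigation for the notebook-versioning pain described above is to strip cell outputs and execution counts before committing, so git diffs show only code changes. Here is a minimal sketch in pure Python of the idea (tools such as nbstripout apply it automatically as a git filter; the function name is ours):<\/p>\n\n\n\n

```python
import json

def strip_notebook_outputs(nb_json: str) -> str:
    """Remove outputs and execution counts from a Jupyter notebook's JSON,
    leaving only the source, so git diffs stay readable."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1, sort_keys=True)

# Example: a notebook with one executed code cell
raw = json.dumps({
    "cells": [{"cell_type": "code", "source": ["x = 1"],
               "execution_count": 7, "outputs": [{"text": "1"}]}],
    "nbformat": 4, "nbformat_minor": 5,
})
cleaned = json.loads(strip_notebook_outputs(raw))
print(cleaned["cells"][0]["outputs"])          # []
print(cleaned["cells"][0]["execution_count"])  # None
```

With outputs and counters normalized, two data scientists editing the same notebook only conflict when the actual code diverges.<\/p>\n\n\n\n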
Code<\/h3>\n\n\n\n
Development<\/h3>\n\n\n\n
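To make the point about tracking both successful and failed experiments concrete, here is a minimal, self-contained sketch of an experiment log. In real workloads a dedicated tool (for example SageMaker Experiments or MLflow) plays this role; every name and value below is purely illustrative:<\/p>\n\n\n\n

```python
import json
import time
import uuid
from pathlib import Path

def log_experiment(run_dir: Path, params: dict, metrics: dict, status: str) -> Path:
    """Persist one experiment run (parameters, metrics, outcome) as a JSON
    file, so both successful and failed runs stay comparable and reproducible."""
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "run_id": uuid.uuid4().hex,
        "timestamp": time.time(),
        "params": params,    # e.g. algorithm and hyperparameters
        "metrics": metrics,  # e.g. validation score
        "status": status,    # "succeeded" or "failed"
    }
    path = run_dir / f"{record['run_id']}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Two runs: one kept, one failed -- both are tracked
runs = Path("experiment_runs")
log_experiment(runs, {"algorithm": "xgboost", "max_depth": 6}, {"val_auc": 0.91}, "succeeded")
log_experiment(runs, {"algorithm": "xgboost", "max_depth": 40}, {"val_auc": 0.55}, "failed")
best = max(
    (json.loads(p.read_text()) for p in runs.glob("*.json")),
    key=lambda r: r["metrics"]["val_auc"],
)
print(best["params"])  # {'algorithm': 'xgboost', 'max_depth': 6}
```

Even this toy log enables the two capabilities the paragraph above asks for: rolling back to the parameters of any past run, and scanning all runs to monitor results.<\/p>\n\n\n\n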
Testing<\/h3>\n\n\n\n
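The continuous dataset validation mentioned above can start as simply as asserting schema and value-range expectations on every new batch before it reaches training. A minimal sketch, where the column names and bounds are illustrative assumptions (libraries such as Deequ or Great Expectations industrialize this idea):<\/p>\n\n\n\n

```python
def validate_batch(rows: list) -> list:
    """Return a list of violations for a batch of records; an empty list
    means the batch may proceed to training."""
    required = {"age", "income"}
    errors = []
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if row["age"] is None or not (0 <= row["age"] <= 120):
            errors.append(f"row {i}: age out of range: {row['age']}")
        if row["income"] is None or row["income"] < 0:
            errors.append(f"row {i}: negative or null income: {row['income']}")
    return errors

good = [{"age": 35, "income": 40_000}]
bad = [{"age": 200, "income": 40_000}, {"income": 10_000}]
print(validate_batch(good))      # []
print(len(validate_batch(bad)))  # 2
```

Run as a gate in the pipeline, a check like this turns silent data drift into an explicit, inspectable failure instead of a degraded model.<\/p>\n\n\n\n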