{"id":2438,"date":"2021-01-22T10:23:28","date_gmt":"2021-01-22T09:23:28","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=2438"},"modified":"2021-03-18T16:27:08","modified_gmt":"2021-03-18T15:27:08","slug":"a-clustering-process-with-sagemaker-experiments-a-real-world-use-case","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/a-clustering-process-with-sagemaker-experiments-a-real-world-use-case\/","title":{"rendered":"A clustering process with SageMaker Experiments: a real-world use case"},"content":{"rendered":"\n

The development of an efficient Machine Learning<\/strong> model is a highly iterative process with continuous feedback loops from previous trials and tests, more akin to a scientific experiment than to a software development project. Data Scientists usually train many different models every day in search of the most robust one for the scenario at hand, and keeping track of all the tests carried out is often a daunting task, even in a single-person project.<\/p>\n\n\n\n

Amazon offers several tools to help Data Scientists find the correct set of parameters for their models. Automatic Model Tuning and Amazon SageMaker Autopilot help explore large sections of the parameter space quickly and automatically. However, these services also contribute to the never-ending growth of training job parameters and artifacts.<\/p>\n\n\n\n

If the project is big enough, multiple engineers are usually involved. Keeping the project as structured as possible and finding ways to share all datasets, notebooks, hyperparameters, and results is therefore crucial for success.<\/p>\n\n\n\n

The main components of a machine learning project are:<\/p>\n\n\n\n