{"id":2438,"date":"2021-01-22T10:23:28","date_gmt":"2021-01-22T09:23:28","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=2438"},"modified":"2021-03-18T16:27:08","modified_gmt":"2021-03-18T15:27:08","slug":"a-clustering-process-with-sagemaker-experiments-a-real-world-use-case","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/a-clustering-process-with-sagemaker-experiments-a-real-world-use-case\/","title":{"rendered":"A clustering process with SageMaker Experiments: a real-world use case"},"content":{"rendered":"\n
The development of an efficient Machine Learning<\/strong> model is a highly iterative process with continuous feedback loops from previous trials and tests, more akin to a scientific experiment than to a software development project. Data Scientists usually train many different models every day in search of the most robust model for the scenario at hand, and keeping track of all the tests carried out is often a daunting task, even in a single-person project.<\/p>\n\n\n\n Amazon offers several tools to help Data Scientists find the correct set of parameters for their models. Automatic Model Tuning and Amazon SageMaker Autopilot help explore large sections of the parameter space quickly and automatically; however, these services also contribute to the never-ending growth of training job parameters and artifacts.<\/p>\n\n\n\n If the project is big enough, multiple engineers are usually involved. Therefore, keeping the project as structured as possible, as well as finding ways of sharing all datasets, notebooks, hyperparameters, and results, is crucial for success.<\/p>\n\n\n\n The main components of a machine learning project are the datasets, the notebooks, the model hyperparameters, and the resulting artifacts.<\/p>\n\n\n\n Each team member should always have a clear understanding of which version of each component is the latest, and be able to quickly look up results and artifacts from previous runs and trials.<\/p>\n\n\n\n To help data scientists with these ML project structuring and management tasks, Amazon released a new service: SageMaker Experiments. 
This new Amazon SageMaker component aims to solve the management challenge by providing a unified view of parameters, training runs, and output artifacts.<\/p>\n\n\n\n In this article, we present a real-world case in which we used SageMaker Experiments extensively.<\/p>\n\n\n\n The project dealt with clustering a sparse dataset containing several million customers in order to understand their behavior. The structure of the dataset and the available features made both the choice of clustering algorithm and the hyperparameter tuning anything but trivial. We tested several types of clustering algorithms (K-means, Gaussian mixture, DBSCAN) with different combinations of features. PCA and variable correlation were used to identify the features relevant for clustering.<\/p>\n\n\n\n After several iterations, we found that the most stable result was obtained using DBSCAN after dimensionality reduction with UMAP (Uniform Manifold Approximation and Projection). KNN analysis was used to find the optimal radius (eps) for DBSCAN.<\/p>\n\n\n\n The UMAP, DBSCAN, and KNN algorithms can all be massively accelerated through GPU parallelization.<\/p>\n\n\n\n In order to carry out efficient clustering on our dataset, we decided to use the RapidsAI framework, which includes CUDA-enabled GPU versions of all the algorithms needed for our pipeline. AWS offers several options for GPU-enabled SageMaker ML instances. For our workload, we selected an ml.p3.2xlarge for testing and exploration and an ml.g4dn.2xlarge for model training.<\/p>\n\n\n\nCustomer Clustering<\/h2>\n\n\n\n
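To make the eps-selection step concrete, here is a minimal CPU-only sketch of the KNN "k-distance" analysis mentioned above, using plain NumPy on synthetic data. In our pipeline the same analysis ran on GPUs via RapidsAI; the dataset, the value of k, and the cluster shapes below are illustrative assumptions, not our real data:

```python
import numpy as np

def k_distances(X, k=4):
    """Return each point's distance to its k-th nearest neighbour, sorted descending."""
    # Pairwise Euclidean distances (fine for a small demo; cuML scales this to GPUs).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sorting each row puts the zero self-distance in column 0, so column k
    # holds the distance to the k-th nearest neighbour.
    kth = np.sort(dist, axis=1)[:, k]
    return np.sort(kth)[::-1]

rng = np.random.default_rng(42)
# Two tight synthetic clusters plus a few scattered outliers.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(3.0, 0.1, size=(50, 2)),
    rng.uniform(-2.0, 5.0, size=(5, 2)),
])
kd = k_distances(X, k=4)
# Plotting kd gives the classic k-distance curve: outliers sit on the steep
# left part, core points on the flat tail, and the "elbow" between the two
# is a common heuristic for DBSCAN's eps.
```

The same curve is what we inspected (at much larger scale) to choose eps before each DBSCAN run.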
SageMaker training on AWS GPU Instances<\/h2>\n\n\n\n
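Before looking at the details, a hedged sketch of what launching such a job involves: a SageMaker training job is described by a request to the CreateTrainingJob API (for example via boto3's `create_training_job`). The job name, container image URI, role ARN, and S3 paths below are hypothetical placeholders, not the ones we actually used:

```python
# Illustrative CreateTrainingJob request body for a GPU-backed training job.
# All names, ARNs, image URIs, and S3 paths are hypothetical placeholders.
training_job = {
    "TrainingJobName": "customer-clustering-umap-dbscan",
    "AlgorithmSpecification": {
        # A custom container with the RAPIDS stack, pushed to ECR beforehand.
        "TrainingImage": "ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/rapids-clustering:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::ACCOUNT_ID:role/SageMakerExecutionRole",
    "ResourceConfig": {
        "InstanceType": "ml.g4dn.2xlarge",  # the GPU instance type used for training
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/clustering/output"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
# In practice this dict would be submitted with:
#   boto3.client("sagemaker").create_training_job(**training_job)
```

Swapping `InstanceType` between ml.p3.2xlarge and ml.g4dn.2xlarge is all it takes to move between the exploration and training setups described earlier.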
Installing RapidsAI on an ML instance<\/h2>\n\n\n\n
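As a sketch of what this typically involves, RAPIDS is usually installed into its own conda environment on the notebook instance. The version pins below (rapids=0.17, CUDA 11.0, Python 3.8) are assumptions matching the RAPIDS releases available when this project ran; always pick the combination from the RAPIDS install selector that matches your instance's CUDA driver:

```shell
# Create a dedicated conda environment with the RAPIDS stack
# (channels -c rapidsai -c nvidia -c conda-forge are the ones RAPIDS documents).
conda create -n rapids-env \
    -c rapidsai -c nvidia -c conda-forge \
    rapids=0.17 python=3.8 cudatoolkit=11.0 -y

# Activate it and register it as a Jupyter kernel so SageMaker notebooks can use it.
conda activate rapids-env
python -m ipykernel install --user --name rapids-env --display-name "RAPIDS"
```

Once the kernel is registered, notebooks opened with it can import cuDF, cuML's UMAP, DBSCAN, and NearestNeighbors directly.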