{"id":3006,"date":"2021-04-16T13:59:00","date_gmt":"2021-04-16T11:59:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=3006"},"modified":"2023-03-29T15:33:26","modified_gmt":"2023-03-29T13:33:26","slug":"aws-glue-elastic-views-an-almost-no-code-etl-and-aggregation-framework","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/aws-glue-elastic-views-an-almost-no-code-etl-and-aggregation-framework\/","title":{"rendered":"AWS Glue Elastic Views! An almost no code ETL and Aggregation Framework"},"content":{"rendered":"\n

Introduction<\/h2>\n\n\n\n

ETL<\/strong> is a fundamental step of any Machine Learning process, as it is the foundation on which the entire dataset for model definition is built. Because of that, data scientists and MLOps experts carefully plan jobs and pipelines to manage the extraction of data from databases<\/strong>, often of different natures, clean<\/strong> and normalize data<\/strong>, and finally generate a data lake<\/strong> where the data can be further refined during the investigation process.<\/p>\n\n\n\n

Usually, this process involves several steps: coordinating their execution, accessing different databases built on different technologies, preparing many scripts, knowing different query languages to retrieve the relevant data, and so on.<\/p>\n\n\n\n

Taking care of all these steps is a daunting task: it requires a lot of expertise and is, of course, time-consuming, which undercuts the efficiency of the entire project at hand.<\/p>\n\n\n\n

Over the last couple of years, AWS has been aggressively developing tools and services to help with Machine Learning and ETL tasks, and at re:Invent<\/strong> 2020 it introduced another important component for ETL-ML preparation: AWS Glue Elastic Views<\/strong>.<\/p>\n\n\n\n

AWS Glue Elastic Views lets a user request data from different data sources while remaining completely agnostic about their nature, query that data in a SQL-compatible language, and send the results to a target, typically S3 or another data store, in order to aggregate the heterogeneous data in a data lake.<\/strong><\/p>\n\n\n

\n
\"AWS<\/figure><\/div>\n\n\n

Some of the main advantages are: <\/p>\n\n\n\n