{"id":2803,"date":"2021-03-16T12:35:44","date_gmt":"2021-03-16T11:35:44","guid":{"rendered":"https:\/\/blog.besharp.it\/costruire-un-data-lake-su-aws-con-aws-lake-formation\/"},"modified":"2021-04-08T15:29:42","modified_gmt":"2021-04-08T13:29:42","slug":"costruire-un-data-lake-su-aws-con-aws-lake-formation","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/costruire-un-data-lake-su-aws-con-aws-lake-formation\/","title":{"rendered":"Building a Data Lake from scratch on AWS using AWS Lake Formation"},"content":{"rendered":"\n
Leveraging available data (Big Data) has become a major focus for most companies in recent decades. In the last few years, the advent of Cloud Computing has democratized access to powerful IT resources, eliminating the cost and hassle of managing the infrastructure that an on-premises data center requires.<\/p>\n\n\n\n
Cloud Computing also helps companies use their data efficiently, lowering engineering costs thanks to its powerful managed services.<\/p>\n\n\n\n
It also promotes the use of on-demand infrastructure, making it easier to re-think, re-engineer, and re-architect a data lake to explore new use cases.<\/p>\n\n\n\n
Since data is a focal point for business decisions, managing it efficiently is a priority.<\/p>\n\n\n\n
Among the many ways to do so, the data lake, a scalable, low-cost, centralized repository for raw data from various sources, has proven particularly successful. It lets users store data as-is, without structuring it first, and then run different types of analytics on it to gain insights and guide more accurate strategic business decisions.<\/p>\n\n\n\n
Building a data lake is not an easy task: it involves numerous manual steps, making the process complex and, more importantly, very time-consuming. Data usually comes from diverse sources and should be carefully monitored.<\/p>\n\n\n\n
Moreover, managing this amount of data requires several procedures to avoid leaks and security breaches: you need to set up access management policies, enable encryption of sensitive data, and manage the corresponding encryption keys.<\/p>\n\n\n\n
Without the right choices about technology, architecture, data quality, and data governance, a data lake can quickly become an isolated mess of difficult-to-use, hard-to-understand, often inaccessible data.<\/p>\n\n\n\n
Fortunately, the AWS Cloud comes to the rescue with many services designed to help manage a data lake, such as AWS Glue and Amazon S3.<\/p>\n\n\n\n
For this article, we will assume the reader already has some knowledge about AWS Services and understands the concepts behind AWS Glue and S3. If this is not the case, we encourage you to read our latest stories about ingesting data for Machine Learning workloads<\/a> and managing complex Machine Learning projects via Step Functions<\/a>.<\/p>\n\n\n\n We will explore how to build a very simple data lake using Lake Formation quickly. Then, we will focus on the security and governance advantages that this service offers over plain AWS Glue.<\/p>\n\n\n\n Let us dig into it!<\/p>\n\n\n\n Before focusing on the advantages of managing a data lake through AWS Lake Formation, we first need to create a simple one.<\/p>\n\n\n\n Let us go to the AWS console and choose AWS Lake Formation in the service list or via the search bar. We will find this dashboard:<\/p>\n\n\n\nQuick Setup<\/h2>\n\n\n\n