{"id":3775,"date":"2021-11-12T14:00:00","date_gmt":"2021-11-12T13:00:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=3775"},"modified":"2021-11-12T12:00:05","modified_gmt":"2021-11-12T11:00:05","slug":"lake-formation-data-security-and-data-governance-with-lf-tbac","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/","title":{"rendered":"Lake Formation: Data Security and Data Governance with LF-TBAC"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<p>Big Data has rapidly become the common term for information obtained from heterogeneous sources that is incredibly complex to manage in terms of <strong>Variety<\/strong>, <strong>Veracity<\/strong>, <strong>Value<\/strong>, <strong>Volume<\/strong>, and <strong>Velocity<\/strong>. Still, it can be considered the \u201cNew Gold\u201d because of its potential to generate business value.<\/p>\n\n\n\n<p>Without adequate governance or quality, data lakes can quickly turn into unmanageable data swamps. 
Data engineers know the data they need lives in these swamps, but they won&#8217;t be able to find, trust, or use it without a clear data governance strategy.<\/p>\n\n\n\n<p>A very common challenge is <strong>maintaining governance and access control<\/strong> over the users who operate on the Data Lake while protecting sensitive information.&nbsp;<\/p>\n\n\n\n<p>Companies need centralized governance and access control, along with a strategy backed by managed services, to apply fine-grained control over user access to data.<\/p>\n\n\n\n<p>There are typically two approaches to these situations: <em>manual<\/em>, which is <strong>more flexible<\/strong> but <strong>complex<\/strong>, and <em>managed<\/em>, which <strong>requires your solution to fit into specific standards<\/strong> but in return <strong>takes away all management complexities<\/strong> for the developers.<\/p>\n\n\n\n<p>This article will guide you through setting up your Data Lake with Lake Formation, showing the challenges that must be addressed during the process, with a particular eye on Security and Governance through the LF-TBAC approach.&nbsp;<\/p>\n\n\n\n<p>Tag-Based Access Control, in short <strong>TBAC<\/strong>, is an increasingly popular way to solve these challenges by applying constraints based on tags associated with specific resources.<\/p>\n\n\n\n<p>So, without further ado, let\u2019s dig in!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is TBAC?<\/h2>\n\n\n\n<p>Tag-based access control allows administrators of IAM-enabled resources to create access policies based on existing tags associated with eligible resources.&nbsp;<\/p>\n\n\n\n<p>Cloud providers manage permissions of both users and applications with policies: documents containing rules that reference resources. 
By applying tags to those resources, it is possible to define simple and effective allow\/deny conditions.<\/p>\n\n\n\n<p>Using access management tags may reduce the number of access policies needed within a cloud account while also providing a simplified way to grant access to a heterogeneous group of resources.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why S3 alone is not enough<\/h2>\n\n\n\n<p>S3, like most AWS services, <strong>relies on IAM principals for access management<\/strong>, meaning that it is possible to define which parts of a bucket (files and folders\/prefixes) a single IAM principal can read\/write; however, it is not possible to further restrict IAM access to specific parts of an object, nor to certain data segments stored inside objects.<\/p>\n\n\n\n<p>For example, let\u2019s assume that our application data is stored as a collection of parquet files divided per country in different folders.<\/p>\n\n\n\n<p><strong>It is possible to constrain a user to access only the data of users belonging to a given country<\/strong>. 
Still, there is no way to prevent them from reading the personal information (e.g., username and address) stored as columns in the parquet files.&nbsp;<\/p>\n\n\n\n<p>With S3 and IAM alone, the <strong>only way to prevent users from accessing sensitive information would be to encrypt the columns before writing the files to S3, <\/strong>which can be <strong>slow<\/strong> and <strong>cumbersome,<\/strong> and opens a whole new \u2018can of worms\u2019 regarding <strong>key storage<\/strong>, <strong>sharing,<\/strong> and eventually <strong>key decommissioning<\/strong>.<\/p>\n\n\n\n<p>Furthermore, <strong>giving access to external entities using IAM principals is often a non-trivial problem on its own<\/strong>.<\/p>\n\n\n\n<p>Luckily, AWS offers a <strong>batteries-included solution to the S3 Data Lake permission problem<\/strong>: enter AWS Lake Formation!<\/p>\n\n\n\n<p>AWS Lake Formation is a fully managed service that simplifies building, securing, and managing data lakes, automating many of the complex manual steps required to create them.&nbsp;<\/p>\n\n\n\n<p>Lake Formation also provides<strong> its own permissions model, which augments the classical AWS IAM permissions model<\/strong> and is what we want to explore in detail.&nbsp;<\/p>\n\n\n\n<p>This centrally defined permissions model enables fine-grained access to data stored in data lakes through a simple grant\/revoke mechanism.<\/p>\n\n\n\n<p>So, by leveraging the power of Lake Formation, we would like to demonstrate, with a simple solution, how to address the aforementioned S3 challenges; let\u2019s continue!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Leveraging the TBAC approach in Lake Formation<\/h2>\n\n\n\n<p>To help the reader understand why AWS Lake Formation can be a good choice for dealing with the complexities of managing a Data Lake, we have prepared a simple tutorial on how to migrate heterogeneous data from legacy on-prem databases into S3 while also creating a Lake Formation catalog to deal with 
data cleansing, permissions, and further operations.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"664\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image6-1024x664.png\" alt=\"TBAC approach in Lake Formation\" class=\"wp-image-3786\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image6-1024x664.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image6-400x259.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image6-768x498.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image6.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Our example implementation<\/figcaption><\/figure><\/div>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AWS Glue migration of on-prem data<\/h3>\n\n\n\n<p>The first step for creating a Data Lake is obviously to fetch, transform and insert the data. In this simple example, we used a mocked users dataset from a MySQL database. 
AWS Glue is the natural way to connect to heterogeneous data sources, infer their schema, import and transform the data, and finally write it to S3 <a href=\"https:\/\/blog.besharp.it\/using-aws-to-ingest-and-analyze-data-from-an-iot-device-a-simple-example-with-aurora-and-athena\/\" target=\"_blank\" rel=\"noreferrer noopener\">as we explained in detail here<\/a>.<\/p>\n\n\n\n<p>After the data is loaded in a temporary S3 bucket, you need to create a <strong>Database in Lake Formation<\/strong>, connect it to a <strong>Glue Crawler<\/strong>, and run the crawler on your S3 prefix to populate a Glue Catalog for your data.&nbsp;<br>Just go to the <strong>AWS Lake Formation console<\/strong>, open the <em>Databases<\/em> page under the <strong>Data catalog<\/strong> tab, and fill in a Database name and your S3 path.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"923\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11-1024x923.png\" alt=\"create db from lake formation\" class=\"wp-image-3796\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11-1024x923.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11-333x300.png 333w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11-768x692.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11-1536x1385.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image11.png 1790w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption><em>Create a new Database from Lake Formation<\/em><\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p><em>Note: creating a database from Lake Formation ensures that the correct permissions are associated with it. We could have done the same thing from AWS Glue, but we would have needed extra effort to modify permissions for the next steps.<\/em><\/p>\n\n\n\n<p>After the database is created, we need the 
Glue Catalog, which is a metastore containing the schema (schema-on-read) of your data saved in S3 (usually as parquet files).&nbsp;<br>Having a Glue Schema is <strong>necessary to set up the AWS Lake Formation access layer in front of your S3 Data Lake<\/strong>. To make it, just create a Crawler and link it to the same S3 path as the Database, and <strong>set that DB as the crawler output<\/strong>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14-1024x536.png\" alt=\"AWS Glue Crawler\" class=\"wp-image-3802\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14-1024x536.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14-400x210.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14-768x402.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14-1536x804.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image14.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Setup of a basic AWS Glue Crawler<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>In order to use the Crawler, an IAM role is necessary, but luckily AWS has a step for that in the Crawler creation wizard:<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"440\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image7-1024x440.png\" alt=\"IAM role for crawler\" class=\"wp-image-3788\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image7-1024x440.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image7-400x172.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image7-768x330.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image7.png 
1396w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>How to create an IAM role for using the Crawler<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Once the Crawler is created, and data is imported into the catalog, we are ready for the next step.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"269\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10-1024x269.png\" alt=\"Cloudwatch Logs for crawler\" class=\"wp-image-3794\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10-1024x269.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10-400x105.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10-768x202.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10-1536x404.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image10.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Cloudwatch Logs demonstrating that Crawler worked correctly<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AWS Lake Formation<\/h3>\n\n\n\n<p>By having a Glue Data catalog in place, it is time to set up Lake Formation to finally manage user access permissions.&nbsp;<\/p>\n\n\n\n<p>In order to do so, let\u2019s start by going to the Lake Formation dashboard and <strong>removing the usual S3 access permissions<\/strong>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"333\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3-1024x333.png\" alt=\"Lake Formation dashboard\" class=\"wp-image-3780\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3-1024x333.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3-400x130.png 400w, 
https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3-768x250.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3-1536x500.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image3.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Lake Formation dashboard<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>So we can go to <em>Data Catalog Settings<\/em> and uncheck <em>Use only IAM access control for new databases <\/em>and <em>Use only IAM access control for new tables in new databases.<\/em>&nbsp;<br>By default, access to Data Catalog resources and Amazon S3 locations are controlled solely by AWS Identity and Access Management (IAM) policies, unchecking the values allows Individual Lake Formation <a href=\"https:\/\/docs.aws.amazon.com\/lake-formation\/latest\/dg\/change-settings.html\">permissions<\/a> to take effect.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"511\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5-1024x511.png\" alt=\"Lake Formation data catalog setting\" class=\"wp-image-3784\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5-1024x511.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5-400x199.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5-768x383.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5-1536x766.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image5.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Lake Formation data catalog setting: disable both the Use only flag<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Once access <strong>responsibilities are delegated to Lake Formation<\/strong>, we can remove the access for the standard IAMAllowedPrincipals IAM group, in the data lake 
<em>Permissions<\/em> tab, select the <strong>permission of the IAM group<\/strong> and click <em>Revoke<\/em>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"375\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1-1024x375.png\" alt=\"revoke IAM group permission \" class=\"wp-image-3776\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1-1024x375.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1-400x147.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1-768x282.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1-1536x563.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image1.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Revoke standard IAMAllowedPrincipals permissions<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The user creating the DataLake will also be listed in this section with admin privileges, if you want that user to retain access to the data you can leave the permission as they are, otherwise you can either <strong>revoke the permission to the user\/role or restrict them<\/strong>.<br><em>Note: if you need to add a Data lake administrator principal, you can do so by going to the Administrative roles and tasks and adding a <\/em><strong><em>Data lake admin<\/em><\/strong><em>.<\/em><\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"481\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4-1024x481.png\" alt=\"adding a Data lake admin\" class=\"wp-image-3782\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4-1024x481.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4-400x188.png 400w, 
https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4-768x361.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4-1536x722.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image4.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Add admin and db creator console<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Once all these steps are completed, it is time to start defining Lake Formation tags (<strong>LF-Tags<\/strong> from now on), which will be used to restrict access to the data lake.&nbsp;<br>From the <em>LF-Tags<\/em> page under the <em>Permissions<\/em> tab <strong>create a new LF-Tag<\/strong> and for key use <em>level<\/em> and add <em>private, sensitive, <\/em>and<em> public<\/em> as value separated by comma just like in the figure. Click <strong>Add LF-tag<\/strong>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"764\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image8-1024x764.png\" alt=\"Add LF-Tag\" class=\"wp-image-3790\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image8-1024x764.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image8-400x298.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image8-768x573.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image8.png 1204w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>LF-Tag creation<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now once created, how can we use these tags to enforce access control? First of all, let\u2019s go to the database section and <strong>select our database<\/strong>, created at the beginning of the tutorial. 
In <em>database actions<\/em>, you can select the tag you\u2019ve created and the level to assign.&nbsp;<br>Usually, we leave the database access open and restrict permissions on a per-table and per-column basis, but this varies from database to database. In our example, we assign the level <strong>public<\/strong> to the whole example database.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"569\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image12-1024x569.png\" alt=\"Edit LF-Tag\" class=\"wp-image-3798\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image12-1024x569.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image12-400x222.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image12-768x427.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image12.png 1210w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Edit LF-Tag for the entire database<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now, if we want to <strong>restrict access to the columns in the user table containing personal info<\/strong>, we can go to the table we want to modify, select the column, and change its LF-Tag from <strong>public<\/strong> to <strong>private<\/strong> (see figures).<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1010\" height=\"1024\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9-1010x1024.png\" alt=\"database schema\" class=\"wp-image-3792\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9-1010x1024.png 1010w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9-296x300.png 296w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9-768x779.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9-1514x1536.png 1514w, 
https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image9.png 1690w\" sizes=\"auto, (max-width: 1010px) 100vw, 1010px\" \/><figcaption>Schema of our example database in which we select a column<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"596\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image13-1024x596.png\" alt=\"Editing a per column LF-Tag\" class=\"wp-image-3800\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image13-1024x596.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image13-400x233.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image13-768x447.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image13.png 1202w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Editing a per column LF-Tag<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Now we just need to define which IAM principals (i.e, our test user) will have access to a given LF-Tag. 
To do so, let\u2019s go to <em>Data lake permissions<\/em> and <strong>grant permissions to an IAM user or role to access resources tagged with a given LF-Tag<\/strong>.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"502\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image15-1024x502.png\" alt=\"read permission\" class=\"wp-image-3804\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image15-1024x502.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image15-400x196.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image15-768x377.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image15.png 1474w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Grant read permission<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This example shows how to give a user access to all the resources tagged with \u201clevel\u201d: \u201cpublic\u201d.&nbsp;<\/p>\n\n\n\n<p>This user will thus be able to see the whole database except for the personal data tagged as private. 
Another user may need access to both public and private information: to allow that, just add the private level in the LF-Tag section or modify the column tags according to your needs.<\/p>\n\n\n\n<p>We can now query the database table using our test user, which, based on our set of permissions, is not able to see the first_name column (which is tagged as private).<\/p>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"598\" src=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2-1024x598.png\" alt=\" not able to see the first_name column\" class=\"wp-image-3778\" srcset=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2-1024x598.png 1024w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2-400x234.png 400w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2-768x449.png 768w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2-1536x897.png 1536w, https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/image2.png 1999w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Athena is used to query the data, demonstrating that first_name is not shown in the table select because it is tagged as private<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>As shown in the figure, we have successfully managed to deny our test user the right to see a \u201csensitive\u201d column of our choice.&nbsp;<\/p>\n\n\n\n<p>We encourage the reader to experiment with adding or removing the Describe and Select options from the LF-Tag permissions in the Data Lake section, to see that we can also deny listing both databases and tables.<\/p>\n\n\n\n<p><em>Note: as of <\/em><em>Nov 3, 2021<\/em><em>, to enhance security, AWS Lake Formation also added support for <\/em><em>managed VPC endpoints via <\/em><a href=\"https:\/\/aws.amazon.com\/privatelink\/\"><em>AWS PrivateLink<\/em><\/a><em> to access a data lake in a Virtual Private 
Cloud.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Feature in preview: row-level security<\/h2>\n\n\n\n<p>Lake Formation is still a young service, so there is much room for improvement. AWS is constantly working on adding features to its services, and Lake Formation is no exception.<\/p>\n\n\n\n<p>AWS Lake Formation already allows setting access policies to hide data, such as a column with sensitive information, from users who do not have permission to view that data.&nbsp;<\/p>\n\n\n\n<p>Row-level security will build on that by allowing row-level policies to be set in addition to column-level policies.&nbsp;<\/p>\n\n\n\n<p>An example could be setting a policy that gives a data scientist access to only the experiment data marked with a specific id.<\/p>\n\n\n\n<p>Another interesting benefit would be sharing the same Data Lake across different datasets to reduce costs and management effort.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">To sum up<\/h2>\n\n\n\n<p>In this article, we have seen how we can leverage the power of AWS Services for Storage and Data Analytics to tackle the challenges posed by Big Data, in particular how to manage access, permissions, and governance.<\/p>\n\n\n\n<p>We have shown that AWS Glue crawlers can effectively retrieve unstructured data from temporary repositories, whether they are databases, such as RDS or on-premises ones, or object storage like S3, and obtain a schema to populate a Glue Catalog.<\/p>\n\n\n\n<p>We have seen that, starting from S3 and a metadata store, it is possible to create a Lake Formation Catalog on top of S3, entirely managed by AWS, to drastically reduce the management effort needed to set up and administer a Data Lake.<\/p>\n\n\n\n<p>We have briefly seen what Tag-Based Access Control (TBAC) is and how it can be used effectively to manage access and permissions.<\/p>\n\n\n\n<p>We have shown that AWS Lake Formation can apply IAM policies and TBAC rules to grant or restrict access to users and groups even on a 
per-column\/row basis. We demonstrated that, with Lake Formation and AWS Glue, we can hide sensitive data from specific principals.<\/p>\n\n\n\n<p>We have described LF-Tags in detail, with a simple tutorial. Finally, we have talked about Row-Level Security.<\/p>\n\n\n\n<p>To conclude, we can say that for challenges regarding Big Data and proper storage solutions, with an eye for security and governance matters, there are always two possible choices to make: DIY or opt for a managed solution.<\/p>\n\n\n\n<p>In this article, we chose a <strong>managed<\/strong> solution to show all the benefits of a more rigid approach to the problem. Although less adaptable, it adheres more closely to best practices and reduces the administration and governance burden.<\/p>\n\n\n\n<p>As always, feel free to comment in the section below, and reach out to us with any doubts, questions, or ideas!<\/p>\n\n\n\n<p>See you on <strong>Proud2beCloud<\/strong> in a couple of weeks for a new story!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Big Data has rapidly grown as a way to describe information obtained from heterogeneous sources when it becomes incredibly [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":3815,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[468],"tags":[445,278,466,462,540],"class_list":["post-3775","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-management-governance-en","tag-aws-glue-en","tag-aws-identity-and-access-management-iam-en","tag-aws-lake-formation-en","tag-data-security-and-governance-en","tag-database"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Lake Formation: Data Security and Data Governance with LF-TBAC - Proud2beCloud Blog<\/title>\n<meta name=\"description\" content=\"Setting 
up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Lake Formation: Data Security and Data Governance with LF-TBAC\" \/>\n<meta property=\"og:description\" content=\"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/\" \/>\n<meta property=\"og:site_name\" content=\"Proud2beCloud Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-12T13:00:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21-social-eng.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"628\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Alessandro Gaggia\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Lake Formation: Data Security and Data Governance with LF-TBAC\" \/>\n<meta name=\"twitter:description\" content=\"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21-social-eng.png\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta 
name=\"twitter:data1\" content=\"Alessandro Gaggia\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/\",\"url\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/\",\"name\":\"Lake Formation: Data Security and Data Governance with LF-TBAC - Proud2beCloud Blog\",\"isPartOf\":{\"@id\":\"https:\/\/blog.besharp.it\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png\",\"datePublished\":\"2021-11-12T13:00:00+00:00\",\"author\":{\"@id\":\"https:\/\/blog.besharp.it\/#\/schema\/person\/f27fc12d10867c6ea6e0158ce4dd8924\"},\"description\":\"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC 
approach.\u00a0\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage\",\"url\":\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png\",\"contentUrl\":\"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png\",\"width\":1600,\"height\":900,\"caption\":\"Lake Formation: Data Security and Data Governance with LF-TBAC\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.besharp.it\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lake Formation: Data Security and Data Governance with LF-TBAC\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.besharp.it\/#website\",\"url\":\"https:\/\/blog.besharp.it\/\",\"name\":\"Proud2beCloud Blog\",\"description\":\"The beSharp blog\",\"alternateName\":\"Proud2beCloud Blog\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.besharp.it\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.besharp.it\/#\/schema\/person\/f27fc12d10867c6ea6e0158ce4dd8924\",\"name\":\"Alessandro
Gaggia\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.besharp.it\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f58dc28050f26409e22ab60346d06220?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f58dc28050f26409e22ab60346d06220?s=96&d=mm&r=g\",\"caption\":\"Alessandro Gaggia\"},\"description\":\"Head of software development at beSharp and Full-Stack developer. I make sure our entire codebase stays state of the art. I write code in almost any language, but I prefer TypeScript. I live and breathe computer science, game design, cinema, comics, and good food. I draw for the love of it!\",\"url\":\"https:\/\/blog.besharp.it\/author\/alessandro-gaggia\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Lake Formation: Data Security and Data Governance with LF-TBAC - Proud2beCloud Blog","description":"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/","og_locale":"en_US","og_type":"article","og_title":"Lake Formation: Data Security and Data Governance with LF-TBAC","og_description":"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0","og_url":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/","og_site_name":"Proud2beCloud Blog","article_published_time":"2021-11-12T13:00:00+00:00","og_image":[{"width":1200,"height":628,"url":"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21-social-eng.png","type":"image\/png"}],"author":"Alessandro
Gaggia","twitter_card":"summary_large_image","twitter_title":"Lake Formation: Data Security and Data Governance with LF-TBAC","twitter_description":"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC approach.\u00a0","twitter_image":"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21-social-eng.png","twitter_misc":{"Written by":"Alessandro Gaggia","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/","url":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/","name":"Lake Formation: Data Security and Data Governance with LF-TBAC - Proud2beCloud Blog","isPartOf":{"@id":"https:\/\/blog.besharp.it\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage"},"image":{"@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png","datePublished":"2021-11-12T13:00:00+00:00","author":{"@id":"https:\/\/blog.besharp.it\/#\/schema\/person\/f27fc12d10867c6ea6e0158ce4dd8924"},"description":"Setting up a Data Lake with Lake Formation with a particular eye on Security and Governance through the LF-TBAC 
approach.\u00a0","breadcrumb":{"@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#primaryimage","url":"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png","contentUrl":"https:\/\/blog.besharp.it\/wp-content\/uploads\/2021\/11\/Copertina-blog-12-11-21_12-11-21.png","width":1600,"height":900,"caption":"Lake Formation: Data Security and Data Governance with LF-TBAC"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.besharp.it\/lake-formation-data-security-and-data-governance-with-lf-tbac\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.besharp.it\/"},{"@type":"ListItem","position":2,"name":"Lake Formation: Data Security and Data Governance with LF-TBAC"}]},{"@type":"WebSite","@id":"https:\/\/blog.besharp.it\/#website","url":"https:\/\/blog.besharp.it\/","name":"Proud2beCloud Blog","description":"The beSharp blog","alternateName":"Proud2beCloud Blog","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.besharp.it\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.besharp.it\/#\/schema\/person\/f27fc12d10867c6ea6e0158ce4dd8924","name":"Alessandro
Gaggia","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.besharp.it\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f58dc28050f26409e22ab60346d06220?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f58dc28050f26409e22ab60346d06220?s=96&d=mm&r=g","caption":"Alessandro Gaggia"},"description":"Head of software development at beSharp and Full-Stack developer. I make sure our entire codebase stays state of the art. I write code in almost any language, but I prefer TypeScript. I live and breathe computer science, game design, cinema, comics, and good food. I draw for the love of it!","url":"https:\/\/blog.besharp.it\/author\/alessandro-gaggia\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/posts\/3775","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/comments?post=3775"}],"version-history":[{"count":0,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/posts\/3775\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/media\/3815"}],"wp:attachment":[{"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/media?parent=3775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/categories?post=3775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.besharp.it\/wp-json\/wp\/v2\/tags?post=3775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}