<\/figure><\/div>\n\n\n<\/p>\n\n\n\n
Another quick tip: spend some time adding metadata and descriptions to your databases, tables, and columns to improve discoverability. For example, you can add business descriptions so that business users can use their own terminology to find the right data, dramatically reducing the time spent on data discovery.<\/p>\n\n\n\n
Another best practice is to associate tags with data inside AWS LakeFormation. LF-Tags<\/strong> (Lake Formation Tags) are key-value pairs that you can attach to databases, tables, and columns to describe attributes such as the sensitivity of the data, the business domain, or any other classification that is valuable to you.<\/p>\n\n\n\nWe will use these tags later to implement attribute-based access control (ABAC), an approach that scales far better than traditional role-based access control.<\/p>\n\n\n\n
Data Governance<\/h4>\n\n\n\n Now that we have all the sources in our data lake, let\u2019s start building the foundations to perform data lake administration and grant data access in an easy way.<\/p>\n\n\n\n
As mentioned before, we start by defining our tag strategy with LF-Tags to implement attribute-based access control (ABAC) and manage data access at scale. This step is crucial for data democratization since we will use these tags to both let users self-discover data and grant access to that data.<\/p>\n\n\n\n
In our example, we developed this simple strategy:<\/p>\n\n\n\n
\nArea: Marketing, Sales<\/li>\n\n\n\n Domain: Customers, Food, Electronics<\/li>\n\n\n\n Sensitivity: Public, Private, PII<\/li>\n<\/ul>\n\n\n\nAfter creating the LF-Tags with their allowed value sets, you can start assigning them to databases, tables, or views, and even to specific columns. Remember that LF-Tags are inherited by lower-level resources: all tags assigned to a database also apply to all its tables, and all tags assigned to a table\/view also apply to all its columns.<\/p>\n\n\n\n
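As a rough illustration of the strategy and the inheritance rule above, here is a minimal Python sketch. It does not call AWS: it only builds the argument dictionaries that could be passed to the boto3 Lake Formation client's `create_lf_tag` call and models inheritance locally. The helper names (`TAG_STRATEGY`, `create_lf_tag_requests`, `effective_tags`) and the sample database, table, and column tags are assumptions made for illustration, and the sketch assumes that a value set at a lower level overrides the inherited one.

```python
# Tag strategy from the article: tag key -> allowed values.
TAG_STRATEGY = {
    "Area": ["Marketing", "Sales"],
    "Domain": ["Customers", "Food", "Electronics"],
    "Sensitivity": ["Public", "Private", "PII"],
}


def create_lf_tag_requests(strategy):
    """Build one keyword-argument dict per LF-Tag.

    Each dict could be passed as client.create_lf_tag(**kwargs), where
    client = boto3.client("lakeformation"). Nothing is called in this sketch.
    """
    return [
        {"TagKey": key, "TagValues": list(values)}
        for key, values in strategy.items()
    ]


def effective_tags(database_tags, table_tags=None, column_tags=None):
    """Model LF-Tag inheritance locally.

    Lower levels inherit every tag from the level above; a value assigned
    at a lower level replaces the inherited one (assumption of this sketch).
    """
    resolved = dict(database_tags)
    resolved.update(table_tags or {})
    resolved.update(column_tags or {})
    return resolved


# Hypothetical example: a sales database, a food table, and a PII column.
database = {"Area": "Sales", "Sensitivity": "Public"}
table = {"Domain": "Food"}
email_column = {"Sensitivity": "PII"}

table_view = effective_tags(database, table)
column_view = effective_tags(database, table, email_column)
```

In this model the table ends up with `Area=Sales`, `Sensitivity=Public`, and `Domain=Food`, while the hypothetical email column keeps the same area and domain but is marked as PII.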
The real power of LF-Tags, compared with traditional approaches, appears when implementing complex access patterns, because they let you follow the principle of least privilege with great precision. Data stewards can create tag-based access policies that automatically grant users permissions on every resource matching the policy's LF-Tag expression. Here are a few quick examples based on our test data:<\/p>\n\n\n\n
\nA policy stating that users with the tag UserRole=FoodAnalyst<\/em> can access any data tagged with Domain=Food<\/em> and Sensitivity=Public<\/em>.<\/li>\n\n\n\nA policy stating that users with the tag UserRole=MarketingAnalyst<\/em> can access any data tagged with Sensitivity=PII<\/em>. This classification framework is very useful for quickly identifying all resources containing PII or other regulated data types across your entire data lake. We can use it to hide some columns from users who don\u2019t need this information. Moreover, it enables comprehensive audit and compliance reporting.<\/li>\n<\/ul>\n\n\n\nThis approach drastically reduces operational overhead as the data lake grows. As new datasets are added, their administrators only need to apply the appropriate tags, and the correct access controls are enforced by design. Similarly, when a new user joins, data lake administrators just need to assign the appropriate LF-Tag expressions, and the user gains access to all relevant data inside the organization.<\/p>\n\n\n\n
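To make the matching logic behind the first example policy concrete, here is a small Python sketch that evaluates an LF-Tag expression against the tags of a few tables. It is a local model, not a call to Lake Formation: the `matches` helper and the table names in `catalog` are hypothetical. The expression uses the same shape as the `Expression` field of a Lake Formation LF-Tag policy, where conditions on different keys are combined with AND and the values listed for one key are combined with OR.

```python
def matches(expression, resource_tags):
    """Return True when the resource satisfies every condition.

    Conditions are ANDed across tag keys; the values listed for a single
    key are ORed. A resource without the key does not match.
    """
    return all(
        resource_tags.get(condition["TagKey"]) in condition["TagValues"]
        for condition in expression
    )


# Expression for the food analyst policy described above.
FOOD_ANALYST_EXPRESSION = [
    {"TagKey": "Domain", "TagValues": ["Food"]},
    {"TagKey": "Sensitivity", "TagValues": ["Public"]},
]

# Hypothetical tables with their effective tags.
catalog = {
    "food_sales": {"Area": "Sales", "Domain": "Food", "Sensitivity": "Public"},
    "food_customers": {"Area": "Sales", "Domain": "Food", "Sensitivity": "PII"},
    "electronics_sales": {
        "Area": "Sales",
        "Domain": "Electronics",
        "Sensitivity": "Public",
    },
}

visible_to_food_analyst = sorted(
    name
    for name, tags in catalog.items()
    if matches(FOOD_ANALYST_EXPRESSION, tags)
)
```

With these sample tags only `food_sales` matches. Tagging a new table with `Domain=Food` and `Sensitivity=Public` would make it visible without touching the policy, which is the operational benefit described above.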
Data Roles and Administration<\/h4>\n\n\n\n Data is now categorized with tags, and we have developed tag policies to define how to access it. It\u2019s time to define the entities that will access data and start granting permissions.<\/p>\n\n\n\n
First, let\u2019s define the entities that will access our data. <\/p>\n\n\n\n
The idea here is to create roles that reflect business functions inside the organization. This approach is very effective: when a new person joins the project, they simply need to be assigned to the appropriate role, with no custom permission configuration, which significantly reduces administrative overhead.<\/p>\n\n\n\n
In our case, we defined:<\/p>\n\n\n\n
\ntwo administrative roles: one for the marketing area and one for the sales area<\/li>\n\n\n\n an analyst role: for the food domain of the sales area<\/li>\n<\/ul>\n\n\n\nThe core idea is to delegate database administration to the administrative roles so that they can grant permissions independently, promoting a democratized approach to data.<\/p>\n\n\n\n
Finally, it\u2019s time to grant permissions!<\/p>\n\n\n\n
Self-Service Data Access<\/h2>\n\n\n\n Now that we have placed all the necessary pieces, we have the foundation to achieve self-service data access. <\/p>\n\n\n\n
With all data structured and organized in AWS LakeFormation, and its content precisely described via tags down to the column level, users can easily search for the data they need and start extracting value from it.<\/p>\n\n\n\n
As AWS LakeFormation administrators, we start by delegating the data administration of each area of the organization to local administrators. As mentioned before, we define two administrators, one for the marketing area and one for sales, using the LF-Tag expressions:<\/p>\n\n\n\n
\nArea = Marketing<\/em><\/li>\n\n\n\nArea = Sales<\/em><\/li>\n<\/ul>\n\n\n\nWith these simple expressions, we select all databases, tables, and columns that are tagged with those LF-Tag values.<\/p>\n\n\n\n
Since we are creating administrator roles, we grant Super permissions on both databases and tables. Just as importantly, we make the read permissions grantable, so that these roles can administer data access themselves and self-serve their teams.<\/p>\n\n\n\n
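As a hedged sketch of what such a delegation could look like in code, the Python function below builds the arguments for the boto3 Lake Formation `grant_permissions` call using an LF-Tag policy on `Area`. It only constructs the request and does not call AWS. The role ARN, the function name `admin_grant_request`, and the exact choice of grantable permissions are assumptions for illustration: the console's Super permission corresponds to `ALL` in the API, and the split between `Permissions` and `PermissionsWithGrantOption` should be checked against your own requirements before use.

```python
def admin_grant_request(principal_arn, area, resource_type):
    """Build kwargs for client.grant_permissions(**kwargs).

    client would be boto3.client("lakeformation"); it is not created here.
    resource_type must be "DATABASE" or "TABLE", the two resource types
    an LF-Tag policy can target.
    """
    if resource_type == "DATABASE":
        grantable = ["DESCRIBE"]  # read-style permission on databases
    elif resource_type == "TABLE":
        grantable = ["SELECT", "DESCRIBE"]  # read permissions on tables
    else:
        raise ValueError(f"unsupported resource type: {resource_type}")

    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": resource_type,
                "Expression": [{"TagKey": "Area", "TagValues": [area]}],
            }
        },
        # "Super" in the console is the ALL permission in the API.
        "Permissions": ["ALL"],
        # Assumption: only read permissions are made grantable.
        "PermissionsWithGrantOption": grantable,
    }


# Hypothetical role ARN for the sales area administrator.
SALES_ADMIN_ARN = "arn:aws:iam::111122223333:role/SalesAdmin"

sales_admin_requests = [
    admin_grant_request(SALES_ADMIN_ARN, "Sales", "DATABASE"),
    admin_grant_request(SALES_ADMIN_ARN, "Sales", "TABLE"),
]
```

Building the requests as plain dictionaries keeps them easy to review or store in version control before anything is applied to the data lake.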
Here, we see the real power of the AWS LakeFormation permission model, which, instead of IAM policies, provides data-specific controls<\/strong> that operate independently from the underlying storage layer. This is a crucial advantage since data access policies remain consistent regardless of how users access the data. Whether it is a query on a table through Athena, an analysis with SageMaker, or a dashboard using QuickSight, AWS LakeFormation permissions remain the same and apply consistently. <\/p>\n\n\n\nArea administrators can now see all the tables associated with their respective areas inside the organization. To complete the example, we can use the sales area administrator to grant access to food sales data to their team, represented by the data analyst role. By doing so, the analyst will be able to see and use food sales data to perform analysis and create dashboards for visualization.<\/p>\n\n\n\n
We have finally democratized data access. Users can now freely discover data using tags and access it!<\/p>\n\n\n\n
<\/p>\n\n\n
\n
<\/figure><\/div>\n\n\n<\/p>\n\n\n\n
Monitoring and Auditing<\/h2>\n\n\n\n Before wrapping up, let’s have a look at AWS LakeFormation monitoring and auditing features.<\/p>\n\n\n\n
AWS LakeFormation offers a centralized dashboard that security teams can use to review all data access events. This centralization enables more effective governance, simpler compliance, and faster audit responses. <\/p>\n\n\n\n
Here is an example image of the dashboard:<\/p>\n\n\n\n
<\/p>\n\n\n
\n
<\/figure><\/div>\n\n\n<\/p>\n\n\n\n
At a glance, the dashboard shows each event together with the user who triggered it and the time it occurred. Each event also has a detailed description, which is its log entry in CloudTrail.<\/p>\n\n\n\n
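Since each dashboard entry is backed by a CloudTrail record, the same view can be approximated from exported records. The Python sketch below is a minimal example under stated assumptions: it relies on the standard CloudTrail record fields `eventTime`, `eventSource`, `eventName`, and `userIdentity.arn`, while the function name and the sample records (including the ARNs and timestamps) are made up for illustration.

```python
LAKE_FORMATION_SOURCE = "lakeformation.amazonaws.com"


def summarize_lake_formation_events(records):
    """Return (time, user ARN, event name) rows for Lake Formation events.

    Records from other services are skipped. Rows are sorted by event
    time; CloudTrail timestamps are ISO 8601 strings in UTC, so plain
    string ordering is chronological.
    """
    rows = [
        (
            record.get("eventTime"),
            record.get("userIdentity", {}).get("arn"),
            record.get("eventName"),
        )
        for record in records
        if record.get("eventSource") == LAKE_FORMATION_SOURCE
    ]
    return sorted(rows, key=lambda row: row[0] or "")


# Hypothetical sample records, trimmed to the fields used above.
sample_records = [
    {
        "eventTime": "2025-01-15T10:05:00Z",
        "eventSource": "lakeformation.amazonaws.com",
        "eventName": "GetDataAccess",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:role/FoodAnalyst"},
    },
    {
        "eventTime": "2025-01-15T09:30:00Z",
        "eventSource": "lakeformation.amazonaws.com",
        "eventName": "GrantPermissions",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:role/SalesAdmin"},
    },
    {
        "eventTime": "2025-01-15T09:45:00Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:role/Ingestion"},
    },
]

audit_rows = summarize_lake_formation_events(sample_records)
```

For the sample data this yields two rows, the permission grant by the sales administrator followed by the analyst's data access, which mirrors the event, user, and time columns of the dashboard.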
Key takeaways and some little spoilers<\/h2>\n\n\n\n In this article, we’ve explored how AWS LakeFormation can transform data access in organizations through a self-service data platform.<\/p>\n\n\n\n
By combining the initial setup, the registration of data locations, comprehensive data cataloging with business metadata, and LF-Tags for attribute-based access control, organizations can achieve true data democratization while maintaining robust security.<\/p>\n\n\n\n
The power of AWS LakeFormation lies in its ability to define granular permissions at database, table, column, and row levels, allowing administrators to delegate access control to area-specific data stewards. This approach significantly reduces administrative overhead while ensuring users can discover and access only the data they need.<\/p>\n\n\n\n
Through proper implementation of role-based structures aligned with business functions and consistent permission models across all AWS analytics services, AWS LakeFormation creates a foundation for data-driven decision-making across the enterprise.<\/p>\n\n\n\n
Now that you know how to democratize data with AWS LakeFormation, in the next chapter of this series of articles, we will explore services that build on top of it, like DataZone.<\/p>\n\n\n\n
Have you ever tried to democratize data access on your own? Let us know in the comments!<\/p>\n\n\n\n
\n\n\n\nAbout Proud2beCloud<\/h4>\n\n\n\n Proud2beCloud<\/strong> is a blog by beSharp<\/a>, an Italian APN Premier Consulting Partner expert in designing, implementing, and managing complex Cloud infrastructures and advanced services on AWS. Before being writers, we are Cloud Experts working daily with AWS services since 2007. We are hungry readers, innovative builders, and gem-seekers. On Proud2beCloud, we regularly share our best AWS pro tips, configuration insights, in-depth news, tips&tricks, how-tos, and many other resources. Take part in the discussion!<\/p>\n","protected":false},"excerpt":{"rendered":"In this series of articles, we will describe how to properly create and structure a self-service Data Platform for data […]<\/p>\n","protected":false},"author":16,"featured_media":7704,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[475],"tags":[419,721,723],"class_list":["post-7690","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics-en","tag-data-ingestion-en","tag-data-platform","tag-medallion-architecture"],"yoast_head":"\n
Democratize data access through a self-service Data Platform using AWS LakeFormation - Part 2 - Proud2beCloud Blog<\/title>\n \n \n \n \n \n \n \n \n \n \n \n \n\t \n\t \n\t \n \n \n \n \n \n \n\t \n\t \n\t \n