{"id":1980,"date":"2020-11-26T16:51:14","date_gmt":"2020-11-26T15:51:14","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=1980"},"modified":"2021-03-18T16:25:38","modified_gmt":"2021-03-18T15:25:38","slug":"etl-orchestration-on-aws-with-aws-step-functions","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/etl-orchestration-on-aws-with-aws-step-functions\/","title":{"rendered":"ETL Orchestration on AWS with AWS Step Functions"},"content":{"rendered":"\n
In recent years, the engineering, governance, and analysis of data have become a very common talking point.
The need for data-driven decision-making has, in fact, increased the need to collect and analyze data in many different ways, and AWS has shown particular interest in this field by developing multiple tools to achieve these business goals.
Before the data analyst can explore and visualize the data, a crucial step is needed. This procedure is commonly identified as ETL (extract, transform, and load) and, usually, it is far from simple.

Whoever carries out this process is responsible for tasks such as:

- extracting raw data, which may arrive at very different paces and volumes, from the sources into a staging storage (commonly Amazon S3 buckets);
- validating, cleaning, and transforming the collected data with services such as AWS Lambda and AWS Glue;
- loading the results into the targets used for business analysis and visualization.
## ETL on AWS

As briefly seen, a couple of AWS services have already been cited as important components of an infrastructure capable of hosting an ETL process.

AWS, however, has developed other services as well, and they have already become the state of the art in the construction of data ingestion pipelines.
### ETL Extraction on AWS

The data from which the business analytics of an organization can profit may arrive at very disparate paces and sizes: from the hundreds of orders per second submitted to an e-commerce store during Black Friday, to the ingestion of a monthly business report. The ETL infrastructure must always be ready to welcome the new information into the staging storage.

AWS services can help accommodate such dissimilar business needs by conveying the data into the same repository, commonly identified with S3 buckets.

Depending on the volume of the expected data, it is possible to defer the validation of incoming files to different AWS services. To achieve the best cost/performance ratio, it is necessary to choose between AWS Lambda, for an event-driven pattern when small files are expected, and AWS Glue, with scheduled batch job runs, when data may reach volumes that exceed AWS Lambda's computational limits.
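To make the event-driven option concrete, here is a minimal sketch of a validation Lambda, assuming the staging bucket is configured to send object-created notifications to the function; the size threshold, the JSON-validity rule, and the quarantine prefix are all hypothetical choices that would depend on the actual data format:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical limit: files above this size are left to the scheduled Glue batch job.
MAX_SIZE_BYTES = 50 * 1024 * 1024

def handler(event, context):
    """Triggered by s3:ObjectCreated events on the staging bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"]["size"]

        if size > MAX_SIZE_BYTES:
            # Too big to validate inside Lambda: skip it here.
            continue

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        try:
            json.loads(body)  # example rule: the file must be well-formed JSON
        except ValueError:
            # Move invalid files to a quarantine prefix for later inspection.
            s3.copy_object(
                Bucket=bucket,
                Key=f"quarantine/{key}",
                CopySource={"Bucket": bucket, "Key": key},
            )
            s3.delete_object(Bucket=bucket, Key=key)
```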
### ETL Transformation on AWS

The transformation of the incoming data is commonly a heavy-duty job to be executed in batches. For this reason, the best candidates for this task are Glue resources: AWS Glue is based on serverless clusters that can seamlessly scale to terabytes of RAM and thousands of core workers.

It is possible to run Python scripts or PySpark/Spark code for optimal scalability. Python shell Glue jobs are mostly indicated for low-to-medium loads, since they cannot scale beyond a single worker (4 vCPU and 16 GB of RAM).

However, although Spark Glue Jobs and Glue Studio make it possible to create very meticulous transformation jobs, it is likely that the new AWS Glue DataBrew service can fulfill this need with its very clear and complete web interface.

It is important to note that, to allow Glue Jobs to retrieve the needed data from a single source, AWS Glue incorporates the Data Catalog in its interface. As the name suggests, it maintains an archive of the data present in our data stores, which is then used for ingestion. To maintain and update the catalog, an AWS Glue component called Crawler is used. The crawler, in fact, gives the jobs trying to fetch data from the sources visibility of new files and partitions.
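As a sketch of what such a job can look like, the following Glue Spark script reads a table that a crawler has registered in the Data Catalog, drops malformed records, and writes the result to the curated area of the data lake; the database, table, field names, and output path are hypothetical placeholders:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw data through the Data Catalog (database/table created by the crawler).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="staging_db",    # hypothetical catalog database
    table_name="raw_orders",  # hypothetical crawled table
)

# Example transformation: drop malformed records and keep only the fields we need.
clean = orders.filter(lambda r: r["order_id"] is not None).select_fields(
    ["order_id", "customer_id", "amount", "order_date"]
)

# Write the transformed data to the curated area, partitioned by date.
glue_context.write_dynamic_frame.from_options(
    frame=clean,
    connection_type="s3",
    connection_options={
        "path": "s3://my-curated-bucket/orders/",  # hypothetical target
        "partitionKeys": ["order_date"],
    },
    format="parquet",
)

job.commit()
```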
### ETL Load on AWS

After the transformation process, a specific Glue Job, or the same component employed in the previous step, can finally store the valid, clean, and transformed data in the targets used for business analysis and visualization via, for example, Amazon QuickSight dashboards.

To preserve the privacy of the sensitive data that may travel through the pipeline, it is important to set up the needed security measures, such as KMS encryption for the data at rest in the buckets and databases, and SSL-protected transfers for the data in transit. Moreover, it is good practice to obfuscate any PII stored in the domain.
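A minimal boto3 sketch of these two measures, assuming a hypothetical bucket name and KMS key alias: default SSE-KMS encryption for data at rest, plus a bucket policy that rejects non-SSL transfers.

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-data-lake-bucket"  # hypothetical bucket name

# Encrypt data at rest: make SSE-KMS the default for every new object.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/data-lake-key",  # hypothetical KMS key alias
                }
            }
        ]
    },
)

# Protect data in transit: deny any request that does not use SSL.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```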
## ETL orchestration on AWS

Managing the bits and bytes flowing through the whole ETL data pipeline is commonly not an easy task.

To apply appropriate governance to the data produced by the process, ad-hoc quality checks are usually performed. It is important, in fact, to catch any breach of the business requirements, such as missing data in the data lake caused by an error in the code of the validation job.

AWS Glue has the tools to create workflows and triggers with which some sort of data pipeline can be built. However, the possible solutions are very limited by the lack of directives for loops, retries, proper error handling, and the invocation of AWS services outside of AWS Glue.

AWS, however, provides a specific tool that allows the scrupulous orchestration of serverless services: AWS Step Functions. This tool allows the management of retry logic and error handling, so that our distributed applications react better to unexpected behaviors.

In the following sections, we are going to discover and employ Step Functions for the orchestration of a realistic ETL use case.
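As a first taste of these capabilities, the sketch below registers a minimal state machine that runs a Glue job synchronously, retries it with exponential backoff, and publishes a notification if every attempt fails; the job name, role ARN, and SNS topic ARN are hypothetical:

```python
import json
import boto3

# Amazon States Language definition: run a Glue job, retry on failure,
# and publish to an SNS topic if all attempts fail.
definition = {
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "transform-orders"},  # hypothetical Glue job
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:eu-west-1:123456789012:etl-alerts",  # hypothetical
                "Message": "The transformation job failed after all retries.",
            },
            "End": True,
        },
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="etl-orchestrator",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/etl-step-functions-role",  # hypothetical
)
```

The `.sync` suffix on the Glue integration makes Step Functions wait for the job run to finish before evaluating retries, which is what makes the retry and catch logic meaningful here.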
### AWS Step Functions

This AWS service allows the construction of highly scalable finite-state machines which, in the Express configuration, can handle up to one hundred thousand state changes per second.

It is important to note that a workflow built with this service is mainly composed of: