{"id":5242,"date":"2022-12-23T09:30:00","date_gmt":"2022-12-23T08:30:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=5242"},"modified":"2022-12-22T17:52:30","modified_gmt":"2022-12-22T16:52:30","slug":"serverless-etl-on-aws-sns-to-kinesis-direct-integration","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/serverless-etl-on-aws-sns-to-kinesis-direct-integration\/","title":{"rendered":"Serverless ETL on AWS: SNS to Kinesis direct integration"},"content":{"rendered":"\n
Today more than ever, companies understand the real value of their data. As a result, <strong>ETL solutions are increasingly common and varied<\/strong>.<\/p>\n\n\n\n
In this article, we will talk about an ETL pipeline built around an integration that is rarely used in this kind of scenario, chosen to meet specific business needs.<\/p>\n\n\n\n
Extraction, Transformation, and Loading (ETL) is now a standard pattern for data pipelines. The extraction step gathers data from various sources: a properly organized Data Lake is the key to accomplishing this step seamlessly. Once you have the data, you can apply whatever transformations are needed to extract the most value from it. These processing steps, summarized in what is called the \u201ctransformation step\u201d, are very specific to each use case. Their output is finally stored for later use: this is the \u201cloading step\u201d.<\/p>\n\n\n\n
Outputs can then be queried and visualized to gain insights that help guide decisions. The benefit of this structured process is how easy it becomes to modify a transformation or add a new one. The time needed drops drastically, giving an enormous advantage in answering emerging business questions that can be key drivers for management decisions.<\/p>\n\n\n\n
Every step can be accomplished in several ways. In this article, we will show you one possible approach, along with some advice on how to implement it well.<\/p>\n\n\n\n
Before diving into the technical solution, let\u2019s put this data flow in context. The idea is a service where people subscribe and, while using the application, send a continuous stream of data that must be stored and processed in near-real time. Ingesting data correctly, without loss, is key for this data flow, and every component must be able to scale efficiently.<\/p>\n\n\n\n
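As a minimal sketch of the ingestion side, the application could publish each event to an SNS topic with boto3. The topic ARN and the payload fields below are assumptions for illustration; the only real requirement is that the message body is a single JSON document every subscriber can consume.

```python
import json


def build_publish_request(topic_arn: str, event: dict) -> dict:
    """Build the keyword arguments for sns.publish().

    The payload is JSON-serialized so that every subscriber
    (Firehose, SQS, e-mail) receives the same document.
    """
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(event),
    }


request = build_publish_request(
    "arn:aws:sns:eu-west-1:123456789012:ingest-topic",  # hypothetical ARN
    {"user_id": "u-42", "metric": "clicks", "value": 3},
)

# At runtime (requires AWS credentials):
#   import boto3
#   sns = boto3.client("sns")
#   sns.publish(**request)
```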
The technical solution: SNS – Kinesis Firehose and SQS integration<\/h2>\n\n\n\n
Now that we have the full picture, we can describe the building blocks of the technical solution: their features, how to connect them, and how they interact with each other.<\/p>\n\n\n\n
The SNS service can deliver messages via several protocols: from classic e-mail and SMS, through endpoint requests like HTTP\/HTTPS, to direct integrations with AWS services such as SQS, Lambda functions, and Kinesis Data Firehose. For these reasons, <strong>SNS topics are a fantastic way to decouple pieces of your infrastructure<\/strong>.<\/p>\n\n\n\n
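A sketch of how the two direct integrations could be wired up with boto3. The SNS-to-Firehose subscription uses the `firehose` protocol and requires a `SubscriptionRoleArn`: an IAM role that SNS assumes to put records into the delivery stream. All ARNs below are hypothetical placeholders.

```python
def firehose_subscription_request(topic_arn: str, stream_arn: str, role_arn: str) -> dict:
    """Keyword arguments for sns.subscribe() targeting a Kinesis Data
    Firehose delivery stream. SNS assumes role_arn to write records,
    so that role must grant firehose:PutRecord permissions.
    """
    return {
        "TopicArn": topic_arn,
        "Protocol": "firehose",
        "Endpoint": stream_arn,
        "Attributes": {"SubscriptionRoleArn": role_arn},
    }


def sqs_subscription_request(topic_arn: str, queue_arn: str) -> dict:
    """Keyword arguments for sns.subscribe() targeting an SQS queue."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "sqs",
        "Endpoint": queue_arn,
    }


# At runtime (requires AWS credentials; ARNs are hypothetical):
#   import boto3
#   sns = boto3.client("sns")
#   sns.subscribe(**firehose_subscription_request(
#       "arn:aws:sns:eu-west-1:123456789012:ingest-topic",
#       "arn:aws:firehose:eu-west-1:123456789012:deliverystream/raw-events",
#       "arn:aws:iam::123456789012:role/sns-to-firehose",
#   ))
#   sns.subscribe(**sqs_subscription_request(
#       "arn:aws:sns:eu-west-1:123456789012:ingest-topic",
#       "arn:aws:sqs:eu-west-1:123456789012:transform-queue",
#   ))
```

The SQS queue must also carry an access policy that allows the topic to send messages to it; that policy is omitted here for brevity.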
The SNS topic will send data to multiple subscribers: a <strong>Kinesis Delivery Stream<\/strong> and an <strong>SQS queue<\/strong>. We can also add an e-mail subscription and use SNS filter policies to get notified only about a subset of the input.<\/p>\n\n\n\n
The idea is to process the input data and, at the same time, store it with a separate part of our infrastructure, so that we can both visualize it and, if needed, re-process it later with other transformations or even with an updated version of our data flow. Hence, the SNS topic interacts with both the extraction and transformation components of the ETL flow: the Delivery Stream stores the input data that will become part of our Data Lake, while the SQS queue routes the same data to the processor that performs the transformation.<\/p>\n\n\n\n
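One detail worth sketching on the transformation side: unless raw message delivery is enabled on the SQS subscription, SNS wraps the published payload in a JSON notification envelope, so the consumer has to unwrap the `Message` field before processing. The sample event below is a hypothetical payload for illustration.

```python
import json


def extract_payload(sqs_body: str):
    """Return the original event that was published to SNS.

    By default SNS delivers to SQS inside a JSON envelope whose
    "Message" field holds the published text; with raw message
    delivery enabled, the body is already the payload itself.
    """
    body = json.loads(sqs_body)
    if isinstance(body, dict) and body.get("Type") == "Notification":
        return json.loads(body["Message"])
    return body


# Example envelope, shaped like what SNS delivers to the queue:
envelope = json.dumps({
    "Type": "Notification",
    "MessageId": "00000000-0000-0000-0000-000000000000",
    "Message": json.dumps({"user_id": "u-42", "value": 3}),
})
```

Handling both shapes in one helper keeps the processor working whether or not raw message delivery is later turned on for the subscription.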