The beauty of Firehose is that you can also add it as a subsequent step! In our simple coffee application we are not using it: images and analyses are saved directly to Amazon S3 and Amazon Aurora Serverless MySQL by Lambda Functions. However, if the app grows bigger, we can integrate it seamlessly!
## Analysis Step

Once your data is in storage, it is time to analyze it. Methodologies can differ greatly: common examples range from simple queries run in relational databases, to long and complex analytical jobs run in Redshift data warehouses, to near real-time processing using Amazon Kinesis connected to EMR or Elasticsearch.
In our case, we can just run simple queries using our web application backend and display the results in the browser.
However, in the future we may be interested in running much more advanced queries on our data, and maybe in doing some data quality inspection and machine learning training. So we need to get these data out of Amazon Aurora and into Amazon S3 in order to analyze them with AWS Glue jobs and DataBrew and, if needed, to load them easily with Apache Spark, either from AWS Glue or Amazon EMR. To do this, we can follow several paths: for example, we could use AWS Database Migration Service to move the data to Amazon S3 as Parquet files, or we could create an AWS Glue Job, load the data using an AWS Glue Connection to RDS and Spark, and then write them into Amazon S3.
After this, we would need to run an AWS Glue crawler in order to create the Data Catalog that will be used by Amazon Athena and AWS Glue for queries and jobs.
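As a rough, hypothetical sketch of that second path (the Glue Connection name and the bucket below are placeholders, not values from our application), a Glue PySpark job could look like this:

```python
# Sketch of the "AWS Glue Job + JDBC Connection" export path.
# Connection name and bucket are placeholders for illustration.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the coffees table from Aurora MySQL through a pre-created Glue Connection
coffees = glue_context.create_dynamic_frame.from_options(
    connection_type="mysql",
    connection_options={
        "useConnectionProperties": "true",
        "connectionName": "iot-aurora-connection",  # hypothetical Glue Connection
        "dbtable": "coffees",
    },
)

# Write the data to Amazon S3 as Parquet, ready for Athena, Spark, or DataBrew
glue_context.write_dynamic_frame.from_options(
    frame=coffees,
    connection_type="s3",
    connection_options={"path": "s3://my-analytics-bucket/coffees_parquet/"},
    format="parquet",
)

job.commit()
```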
Here, however, we will show you a different and sometimes much more **flexible** path to export, clean, and catalog our data from a relational database: an **Amazon Athena custom data source**.

By default, Amazon Athena comes with the Amazon S3 – AWS Glue Data Catalog integration, but AWS recently added the possibility to add custom data sources such as JDBC-connected databases and AWS CloudWatch, or to query Amazon S3 using a custom Apache Hive metastore. In our case we are interested in connecting to MySQL Amazon Aurora Serverless, so we need to go to the Amazon Athena home, configure a workgroup named AmazonAthenaPreviewFunctionality, and then add an Amazon S3 query output path:
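If you prefer to script this setup step, the same workgroup can also be created with boto3 (the results bucket below is a placeholder):

```python
# Sketch: create the preview workgroup and give it an S3 query output location.
# The results bucket name is a placeholder.
import boto3

athena = boto3.client("athena")

athena.create_work_group(
    Name="AmazonAthenaPreviewFunctionality",
    Configuration={
        "ResultConfiguration": {
            "OutputLocation": "s3://my-athena-results-bucket/queries/"
        }
    },
    Description="Workgroup used for Athena federated data sources",
)
```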
After this, we can go back to the Amazon Athena home and select Connect Data Source:
We are presented with a web page where we need to select the type of data source: we go for Query a data source (beta) and MySQL:
After that, you are asked to enter the name and description of the new catalog and to select or create a Lambda Function to manage the connection. Choose the name you like the most and click Configure new AWS Lambda Function.
You are presented with a page where you need to enter the JDBC connection URI for Amazon Aurora and select the subnet and security group for the Lambda function that Amazon Athena will use to establish the JDBC connection. Choose them **wisely**, otherwise the Lambda won’t reach the Amazon Aurora instance!
The Secret Name prefix is used to store the DB credentials in AWS Secrets Manager. This is essential for a production environment; leaving it blank means no integration will be created. After you click Deploy and select the Lambda you just created, in the Amazon Athena dashboard you’ll see a new catalog alongside the standard AwsGlueCatalog:
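A quick note on the Secret Name prefix mentioned above: for the Secrets Manager integration to work, the database credentials must live in a secret whose name starts with the chosen prefix. As a hedged illustration (secret name, user, and password are placeholders), such a secret could be created with boto3:

```python
# Sketch: store the Aurora credentials in AWS Secrets Manager so the Athena
# connector can reference them. Secret name and values are placeholders.
import json

import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="AthenaJdbc/iotarticolo",  # the name should start with the configured prefix
    SecretString=json.dumps({"username": "<db-user>", "password": "<db-password>"}),
)
```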
Note that at first, databases and tables won’t appear. Fear not: if you go to the Lambda function you’ll see failures, and in Amazon CloudWatch you’ll see an error like:
```
Catalog is not supported in multiplexer. After registering the catalog in Athena, must set 'iotarticolo_connection_string' environment variable in Lambda. See JDBC connector README for further details.: java.lang.RuntimeException
```

Go on and set the required Lambda function environment variable, using the same JDBC connection string entered as DefaultConnection string in the preceding step. After this, the connection will work and you’ll be able to query your DB directly from Amazon Athena! Sweet!
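For instance (the function name is hypothetical and the JDBC URI is only a placeholder: the exact format expected by the connector is described in its README), the variable can be set with boto3:

```python
# Sketch: set the connection string environment variable on the Athena connector Lambda.
# Function name and JDBC URI are placeholders; check the JDBC connector README
# for the exact URI format. Note: this call replaces the whole Environment,
# so include any other variables the function already has.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="iotarticolo-athena-connector",
    Environment={
        "Variables": {
            "iotarticolo_connection_string": "mysql://jdbc:mysql://<aurora-endpoint>:3306/iot?user=<user>&password=<password>"
        }
    },
)
```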
However, at a closer look we immediately notice that something is amiss with the data: here is a screen of what we can read directly from MySQL:
As you can see, Amazon Athena was smart enough to convert tinyint(1) data to bool, but could not fetch MySQL datetime columns. This is due to a very well-known problem with the JDBC connector, and the easiest fix is to just create a new field containing the datetime as a string in Java datetime format:
```sql
ALTER TABLE coffees ADD COLUMN coffee_hour_str VARCHAR(255) AFTER coffee_hour;
UPDATE coffees SET coffees.coffee_hour_str=DATE_FORMAT(coffee_hour, '%Y-%m-%d %H:%i:%s');
```

At this point, Amazon Athena will be able to read the new field. **And now we are ready for a beautiful trick:** let’s just go to the AWS Glue dashboard and create a new Database. A Database is just a logical container for metadata. You can choose the name you prefer:
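The same Database can also be created programmatically; here is a minimal boto3 sketch using the name `iotarticologlue`, the one referenced by the query in the next step:

```python
# Sketch: create the AWS Glue database (a logical container for table metadata).
# The name matches the one used in the CTAS query below.
import boto3

glue = boto3.client("glue")

glue.create_database(
    DatabaseInput={
        "Name": "iotarticologlue",
        "Description": "Catalog database for the coffee IoT dataset",
    }
)
```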
At this point we can go back to Amazon Athena and run a query like this:
```sql
CREATE TABLE iotarticologlue.coffees
WITH (
    format = 'PARQUET',
    external_location = 's3://besharp-athena/coffees_parquet',
    parquet_compression = 'GZIP'
) AS
SELECT photo_url, smile, beard, mustache, glasses, coffee_hour_str
FROM "iotarticolo"."iot"."coffees"
WHERE photo_url LIKE 'https://%';
```
This will create a new Table in the Database we just added to our AWS Glue Data Catalog and save all the data to Amazon S3 as GZIP-compressed Parquet files. Furthermore, you could also change the compression (e.g. Snappy or BZIP) if you like.
The query will also filter out the rows with a bad Amazon S3 URL in photo_url!
So we now have a super-fast way to export our DB to Amazon S3 as Parquet while automatically creating the AWS Glue catalog entry (the query also does that for free under the hood).
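From now on, the Parquet copy behaves like any other Athena table; as a small illustrative example (the results bucket is a placeholder), it can also be queried programmatically:

```python
# Sketch: query the new Parquet-backed Glue table from boto3.
# The results bucket name is a placeholder.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT glasses, count(*) AS coffees FROM coffees GROUP BY glasses",
    QueryExecutionContext={"Database": "iotarticologlue"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
)
print(response["QueryExecutionId"])
```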
And now it is trivial to visualize this new catalog in AWS Glue DataBrew: go to the dashboard and create a new project:
Now create a new dataset in the Add dataset section:
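The dataset can also be registered from code; here is a minimal boto3 sketch pointing DataBrew at the Glue table created above (the dataset name is arbitrary):

```python
# Sketch: register the Glue-cataloged table as an AWS Glue DataBrew dataset.
# The dataset name is arbitrary; database and table match the ones created above.
import boto3

databrew = boto3.client("databrew")

databrew.create_dataset(
    Name="coffees-parquet",
    Input={
        "DataCatalogInputDefinition": {
            "DatabaseName": "iotarticologlue",
            "TableName": "coffees",
        }
    },
)
```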
And create the project!
If you encounter an error, try setting the object name to parquet in Amazon S3 and crawling the table again with AWS Glue crawlers (DataBrew is pretty new too!).
And voilà: a beautiful data visualization of our dataset, complete with column statistics!
## Conclusion

In this article, we described a very simple **IoT application** using **Amazon Rekognition** and **Amazon Aurora**. We explained how it can be enhanced with Amazon Kinesis Data Firehose and, finally, we used Amazon Athena to transform and clean the collected data and save it very easily to Parquet, ready to be analyzed with AWS Glue DataBrew, Amazon Athena, and other AWS tools such as EMR.

Have you ever tried something similar for your data analysis process?
Feel free to write to us about your solutions: we’ll be glad to offer you a “connected” coffee 😀
That’s all for today.
**Keep reading and see you in 14 days on #Proud2beCloud!**