{"id":2627,"date":"2021-02-19T11:05:48","date_gmt":"2021-02-19T10:05:48","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=2627"},"modified":"2023-02-22T17:05:12","modified_gmt":"2023-02-22T16:05:12","slug":"orchestrating-data-analytics-and-business-intelligence-pipelines-via-step-function","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/orchestrating-data-analytics-and-business-intelligence-pipelines-via-step-function\/","title":{"rendered":"Orchestrating Data Analytics and Business Intelligence pipelines via AWS Step Functions"},"content":{"rendered":"\n

ETL pipelines on AWS usually have a linear behavior: they start from one service and end at another. This time, though, we would like to present a more flexible setup, in which some ETL jobs can be skipped depending on the data. Furthermore, some of the transformed data in our data lake needs to be queried by AWS Athena in order to generate BI dashboards in QuickSight, while other data partitions are used to train an ad-hoc anomaly detection model via Amazon SageMaker.<\/p>\n\n\n\n

A powerful tool to orchestrate this type of ETL pipeline is AWS Step Functions.<\/p>\n\n\n\n

In this article, we want to show you some of the steps involved in the creation of the pipeline, as well as how several AWS data analytics services can be used in near real-time scenarios to manage a high volume of data in a scalable way.<\/p>\n\n\n\n

In particular, we\u2019ll investigate AWS Glue connectors and Crawlers, AWS Athena, QuickSight, Kinesis Data Firehose, and finally give a brief explanation of how to use SageMaker to create forecasts from the collected data. To learn more about SageMaker you can also take a look at our other articles<\/a>.<\/p>\n\n\n\n

Let\u2019s start!<\/p>\n\n\n\n

Our setup<\/h2>\n\n\n\n

In this example, we\u2019ll set up several temperature sensors that send temperature and diagnostic data to our pipeline, we\u2019ll perform different BI analyses to verify efficiency, and we\u2019ll use a SageMaker model to check for anomalies.<\/p>\n\n\n\n

To keep things interesting, we also want to grab historical data from two different locations: an S3 bucket and a database residing on an EC2 instance in a different VPC from the one used by our ETL pipelines.<\/p>\n\n\n\n

We will use different ETL jobs to extract cleaned data from the raw data, and AWS Step Functions to orchestrate all the crawlers and jobs.<\/p>\n\n\n\n

Kinesis Data Firehose will continuously ingest the sensors\u2019 data, and with AWS Athena we will query both aggregated and per-sensor data to show graphical stats in Amazon QuickSight.<\/p>\n\n\n\n

Here is a simple schema illustrating the services involved and the complete flow.<\/p>\n\n\n\n

\"infrastructure
infrastructure diagram<\/em><\/figcaption><\/figure>\n\n\n\n

Kinesis Data Firehose <\/h2>\n\n\n\n

Kinesis Data Firehose can be used to obtain near real-time data from sensors, leveraging the IoT Core SDK to connect to the actual devices. As seen in this article<\/a>, we can create a \u201cThing\u201d, thus generating a topic<\/strong>. By connecting to that topic<\/strong>, several devices can deliver their metrics to Firehose by sending messages using the MQTT protocol<\/a>, and, should you need it, IoT Core can also manage device authentication<\/strong>.<\/p>\n\n\n\n

To start sending sensors\u2019 data, we need to download the connection kit from the AWS IoT<\/a> page following in-page instructions.<\/p>\n\n\n\n

\"Select
Select OS and Language for downloading the connection kit<\/em><\/figcaption><\/figure>\n\n\n\n

Once downloaded, initialize a new Node.js project and install the aws-iot-device-sdk<\/strong> package. After that, it is possible to run the included start.sh<\/strong> script, making sure all the certificates, downloaded alongside the kit, are in the same directory. We can now create a local script that sends data to a topic, importing the required modules and using device.publish(\u201c<topic>\u201d, payload)<\/strong>:<\/p>\n\n\n\n

const deviceModule = require('aws-iot-device-sdk').device;\nconst cmdLineProcess = require('aws-iot-device-sdk\/examples\/lib\/cmdline');\n\/\/ ... create the `device` object with deviceModule({...}), passing the downloaded certificates, client ID and IoT endpoint\ndevice.publish('topic', JSON.stringify(payload));\n<\/pre>\n\n\n\n

The data is sent in JSON format with the following structure:<\/p>\n\n\n\n

{\n   \"timestamp\": \"YYYY-MM-DD HH:MM:SS\",\n   \"room_id\": \"XXXX\",\n   \"temperature\": 99\n}\n<\/pre>\n\n\n\n

To create a Firehose delivery stream, go to the Kinesis Data Firehose<\/strong> service dashboard in the AWS web console, click \u201cCreate delivery stream\u201d, select a name, and then choose \u201cDirect PUT or other sources\u201d as in the figure below:<\/p>\n\n\n\n

\"Delivery
Creating a new Firehose delivery stream<\/em><\/figcaption><\/figure>\n\n\n\n

Leave \u201cData transformation\u201d and \u201cRecord format conversion\u201d as default. Choose an S3 destination as the target. Remember to also define an IoT Rule<\/strong> to send IoT messages to a Firehose delivery stream.<\/p>\n\n\n\n
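The IoT Rule can be created from the IoT Core console or with a few lines of code. Here is a minimal sketch using boto3; the rule name, the topic in the SQL statement, the delivery stream name, and the role ARN are placeholders to adapt to your account:<\/p>\n\n\n\n
import boto3

iot = boto3.client('iot')

# Forward every message published on the sensors' topic to the Firehose delivery stream.
# Rule name, topic, stream name and role ARN below are placeholders.
iot.create_topic_rule(
    ruleName='temperature_to_firehose',
    topicRulePayload={
        'sql': "SELECT * FROM 'topic'",
        'ruleDisabled': False,
        'actions': [
            {
                'firehose': {
                    'deliveryStreamName': 'temperature-delivery-stream',
                    'roleArn': 'arn:aws:iam::123456789012:role/iot-to-firehose-role',
                    'separator': '\n',
                }
            }
        ],
    },
)
<\/pre>\n\n\n\n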

Glue crawlers and connectors<\/h2>\n\n\n\n

AWS Glue can be used to Extract and Transform data from a multitude of different data sources, thanks to the possibility of defining different types of connectors.<\/p>\n\n\n\n

Database on EC2 instance<\/strong><\/p>\n\n\n\n

We want to be able to generate a Glue Data Catalog from a Microsoft SQL Server DB residing on an EC2 Instance in another VPC. To do so we need to create a JDBC connection, which can be done easily by going to the AWS Glue service page and by adding a new connection, found under the \u201cData Catalog – Databases\u201d section of the sidebar menu.<\/p>\n\n\n\n

Just add a name for the connection (which will be used by the related crawler job), the JDBC URL, following the right convention for SQL Server DBs, the username and password, and the required VPC and subnet.<\/p>\n\n\n\n

\"JDBC
JDBC connection parameters<\/em><\/figcaption><\/figure>\n\n\n\n
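For reference, the same JDBC connection could also be created with boto3. This is only a sketch: the connection name, URL, credentials, subnet, availability zone, and security group are placeholders, and the URL follows the SQL Server convention (jdbc:sqlserver://host:port;databaseName=db):<\/p>\n\n\n\n
import boto3

glue = boto3.client('glue')

# All values below are placeholders: adapt them to your database and networking setup.
glue.create_connection(
    ConnectionInput={
        'Name': 'ec2-sqlserver-connection',
        'ConnectionType': 'JDBC',
        'ConnectionProperties': {
            'JDBC_CONNECTION_URL': 'jdbc:sqlserver://10.0.1.10:1433;databaseName=sensors',
            'USERNAME': 'glue_user',
            'PASSWORD': 'change-me',
        },
        'PhysicalConnectionRequirements': {
            'SubnetId': 'subnet-0123456789abcdef0',
            'SecurityGroupIdList': ['sg-0123456789abcdef0'],
            'AvailabilityZone': 'eu-west-1a',
        },
    }
)
<\/pre>\n\n\n\n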

In order to establish a Glue connection to the database, we need to create a new dedicated VPC that will only be used by Glue. The VPC is connected to the one containing the data warehouse using VPC peering<\/a>, but other options are also possible: for example, we could have used AWS Transit Gateway. Once the peering is established, remember to add routes to both the Glue and the DB subnets so that they can exchange traffic, and to open the DB security group to allow incoming traffic on the relevant port from the Glue security group in the new VPC.<\/p>\n\n\n\n
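As a sketch of that last step (both security group IDs are placeholders, and 1433 is the default SQL Server port), the ingress rule could be added like this:<\/p>\n\n\n\n
import boto3

ec2 = boto3.client('ec2')

# Allow the Glue security group (in the new VPC) to reach the DB security group on port 1433.
# Both group IDs are placeholders.
ec2.authorize_security_group_ingress(
    GroupId='sg-0aaaaaaaaaaaaaaaa',  # security group of the database instance
    IpPermissions=[
        {
            'IpProtocol': 'tcp',
            'FromPort': 1433,
            'ToPort': 1433,
            'UserIdGroupPairs': [{'GroupId': 'sg-0bbbbbbbbbbbbbbbb'}],  # Glue security group
        }
    ],
)
<\/pre>\n\n\n\n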

Data on S3<\/strong><\/p>\n\n\n\n

Data on S3 doesn\u2019t need a connector and can be set up directly from the AWS Glue console. Create a new crawler, selecting \u201cdata stores\u201d as the crawler source type<\/strong>; then also check \u201cCrawl all folders\u201d. After that, it is just a matter of setting the S3 bucket, the right IAM role, and creating a new Glue database (schema) for this crawler. Also set the frequency to \u201cRun on demand\u201d.<\/p>\n\n\n\n
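The same crawler can also be defined with boto3; in the sketch below the crawler name, IAM role, target database, and bucket path are placeholders:<\/p>\n\n\n\n
import boto3

glue = boto3.client('glue')

# Crawl the historical data bucket and write the discovered schema to a Glue database.
# Name, Role, DatabaseName and Path are placeholders.
glue.create_crawler(
    Name='historical-temperatures-crawler',
    Role='arn:aws:iam::123456789012:role/glue-crawler-role',
    DatabaseName='historical_raw_temperatures',
    Targets={'S3Targets': [{'Path': 's3://my-historical-data-bucket/'}]},
)

# No Schedule is passed, so the crawler runs on demand,
# e.g. glue.start_crawler(Name='historical-temperatures-crawler')
<\/pre>\n\n\n\n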

Glue Jobs<\/strong><\/p>\n\n\n\n

Glue jobs are the steps of the ETL pipeline. They allow extracting, transforming, and saving data back to a data lake. In our example, we would like to show two different approaches: jobs managed by AWS Glue Studio<\/strong> and jobs using custom code<\/strong>. Both will later be called by AWS Step Functions.<\/p>\n\n\n\n

For historical data on S3, we can define Jobs from Glue Studio. For S3 select the following options in order:<\/p>\n\n\n\n

    \n
1. On the Manage Jobs<\/strong> page, choose the \u201cSource and target added to the graph\u201d option. Then, choose S3 for the Source and S3 for the Target.<\/li>\n\n\n\n
  2. Click on the S3 Data source, select the source bucket.<\/li>\n\n\n\n
  3. On the Node Properties tab, enter a name. Choose the Data source properties \u2013 S3 tab in the node details panel. Select your schema from the list of available databases in the Glue Data Catalog. Choose the correct table from the Catalog.<\/li>\n\n\n\n
  4. Verify the mapping is correct.<\/li>\n\n\n\n
5. On the Node S3 Data target, select the output bucket, CSV as the format (Parquet would be better, but we need CSV for Random Cut Forest), and no compression.<\/li>\n<\/ol>\n\n\n\n
    \"Target
    Target node properties<\/em><\/figcaption><\/figure>\n\n\n\n

In order to extract data from our EC2 instance instead, we need a custom job. To create it we need to write a script ourselves; don\u2019t worry, it\u2019s fairly easy! Here are the key points you need to know to create a Spark job with Glue: the ETL process is composed of 6 distinct areas in the script.<\/p>\n\n\n\n

    Import libraries<\/strong><\/p>\n\n\n\n

The basic set of imports needed for a Glue script:<\/p>\n\n\n\n

    import sys\nfrom awsglue.transforms import *\nfrom awsglue.utils import getResolvedOptions\nfrom pyspark.context import SparkContext\nfrom awsglue.context import GlueContext\nfrom awsglue.job import Job\nfrom awsglue.dynamicframe import DynamicFrame\n<\/pre>\n\n\n\n

    Prepare connectors and other variables<\/strong><\/p>\n\n\n\n

    To be used inside the script:<\/p>\n\n\n\n

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])\nsc = SparkContext()\nglueContext = GlueContext(sc)\nspark = glueContext.spark_session\njob = Job(glueContext)\njob.init(args['JOB_NAME'], args)\n<\/pre>\n\n\n\n

    Get Dynamic Frames out of a Glue Catalog obtained by a Crawler<\/strong><\/p>\n\n\n\n

    Use these dynamic frames to perform queries and transform data<\/p>\n\n\n\n

    rooms_temperatures_df = glueContext.create_dynamic_frame.from_catalog(database = \"raw_temperatures\", table_name = \"temperatures\", transformation_ctx = \"temperature_transforms\").toDF()\nrooms_temperatures_df.createOrReplaceTempView(\"TEMPERATURES\")\n<\/pre>\n\n\n\n

The last line registers the resulting DataFrame as a temporary view (TEMPERATURES), so that it can be queried with SQL in the next step.<\/p>\n\n\n\n

    Apply SQL operations<\/strong><\/p>\n\n\n\n

    To extract distinct information<\/p>\n\n\n\n

result = glueContext.sql(\"SELECT * FROM TEMPERATURES WHERE room_id = 'ROOM_ID'\")  # one query per room\n<\/pre>\n\n\n\n

In our case, we needed to generate 3 distinct results, one for each room, using a simple WHERE room_id = <value><\/strong> filter; see the sketch below.<\/p>\n\n\n\n
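Continuing the same script, a minimal sketch of how the three per-room results could be produced (the room ids below are just placeholders) is the following; each resulting DataFrame then goes through the mapping and write steps shown next:<\/p>\n\n\n\n
# Hypothetical room ids: replace them with the values actually present in your data.
room_ids = ['room_a', 'room_b', 'room_c']

# One filtered result per room, all queried from the TEMPERATURES temporary view.
results_by_room = {
    room_id: glueContext.sql("SELECT * FROM TEMPERATURES WHERE room_id = '{}'".format(room_id))
    for room_id in room_ids
}
<\/pre>\n\n\n\n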

    Apply mapping<\/strong><\/p>\n\n\n\n

    To generate a conversion schema<\/p>\n\n\n\n

    dynamicFrameResult = DynamicFrame.fromDF(result, glueContext, \"Result\")\napplymapping = ApplyMapping.apply(frame = dynamicFrameResult, mappings = [(\"temp\", \"bigint\", \"temp\",\"bigint\"), (\"room_id\", \"string\", \"room_id\",\"string\"), (\"timestamp\", \"string\", \"timestamp\",\"string\")])\n<\/pre>\n\n\n\n

    Save back to S3<\/strong><\/p>\n\n\n\n

    To manipulate data later on<\/p>\n\n\n\n

to_be_written = glueContext.write_dynamic_frame.from_options(frame = applymapping, connection_type = \"s3\", connection_options = {\"path\": \"s3:\/\/<path>\", \"partitionKeys\": [\"timestamp\"]}, format = \"csv\", transformation_ctx = \"to_be_written\")\njob.commit()\n<\/pre>\n\n\n\n

Step Functions<\/h2>\n\n\n\n

The Step Functions state machine represents the core logic of our sample solution. Its main purpose is to manage all the ETL jobs, keep them synchronized, and handle errors. One advantage is that we can use Step Functions to regulate the data being injected into the central S3 bucket, which is where we save all the cleaned data.<\/p>\n\n\n\n

To start, this is the state machine schema we used for this example:<\/p>\n\n\n\n

    \"Our
    Our example pipeline<\/em><\/figcaption><\/figure>\n\n\n\n

In our example there are a couple of things we would like to share about Step Functions. First, we have 2 main crawler loops: the first one has parallel branches and runs 2 crawlers (one for the standard S3 source, and one for the EC2 database, which is the custom one); the second one takes all the data retrieved from both historical data sources and the live one (from Kinesis Data Firehose) and extracts per-room datasets in order to use them with Amazon SageMaker.<\/p>\n\n\n\n

As crawlers are asynchronous, we can\u2019t simply wait for them to return, so we created 2 waiting loops, one for each of the execution steps.<\/p>\n\n\n\n

AWS Lambda is used to call the AWS Glue APIs in order to start and monitor the crawlers and jobs we have configured before.<\/p>\n\n\n\n
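As a minimal sketch (function and event field names are placeholders), the Lambda functions that start a crawler and check its state for the wait loop could look like this:<\/p>\n\n\n\n
import boto3

glue = boto3.client('glue')

def start_crawler_handler(event, context):
    # Kick off the crawler whose name is passed in the state machine input.
    glue.start_crawler(Name=event['crawler_name'])
    return event

def check_crawler_handler(event, context):
    # Return the crawler state (READY, RUNNING or STOPPING) so that a Choice state
    # can decide whether to keep waiting or move on.
    state = glue.get_crawler(Name=event['crawler_name'])['Crawler']['State']
    return {**event, 'crawler_state': state}
<\/pre>\n\n\n\n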

To give a hint, here are some interesting parts of the JSON definition of the state machine.<\/p>\n\n\n\n

    \"Type\": \"Parallel\",\n  \"Branches\": [\n        {\n          \"StartAt\": \"Import Raw from EXTERNAL_DB\",\n          \"States\": {\n            \"Import Raw from EXTERNAL_DB\": {\n              \"Type\": \"Task\",\n              \"Resource\": \"arn:aws:states:::glue:startJobRun.sync\",\n<\/pre>\n\n\n\n

In AWS Step Functions, we can launch tasks in parallel (for us, the two historical-data Glue jobs) using \u201cType\u201d: \u201cParallel\u201d and \u201cBranches\u201d. After the \u201cBranches\u201d block, it is also possible to collect the output of the parallel execution:<\/p>\n\n\n\n

    \"ResultPath\": \"$.ParallelExecutionOutput\",\n\"Next\": \"Start LAKE_DATA Crawler\"\n<\/pre>\n\n\n\n

We can run a Glue job defined in the console synchronously by passing the job\u2019s name (note the .sync suffix in the resource ARN), and we can also enable the Glue Data Catalog during the process.<\/p>\n\n\n\n

    \"Parameters\": {\n                \"JobName\": \"EXTERNAL_DB_IMPORT_TO_RAW\",\n                \"Arguments\": {\n                  \"--enable-glue-datacatalog\": \"true\",\n<\/pre>\n\n\n\n

    It is possible to catch exceptions directly in Step Function by moving to an error state using \u201cCatch\u201d:<\/p>\n\n\n\n

    \"Catch\": [\n        {\n          \"ErrorEquals\": [\n            \"States.TaskFailed\"\n          ],\n          \"Next\": \"Data Pipeline Failure\"\n        }\n],\n<\/pre>\n\n\n\n

Because we don\u2019t have a standard way to wait for the crawlers to finish, we use the parallel branches\u2019 output and a Step Functions wait cycle to check whether the operation is done; for that, we use the \u201cWait\u201d state:<\/p>\n\n\n\n

    \"Wait for LAKE_DATA Crawler\": {\n      \"Type\": \"Wait\",\n      \"Seconds\": 5,\n      \"Next\": \"Check LAKE_DATA Crawler\"\n},\n<\/pre>\n\n\n\n

    The rest of the flow is pretty much a repetition of these components.<\/p>\n\n\n\n

The interesting fact is that we can apply some starting conditions to alter the execution of the flow, for example skipping some jobs when they are not needed at the moment, or even running another state machine from a precise step, to modularize the most complicated parts of our example.<\/p>\n\n\n\n
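For example (the state machine ARN and the input flag below are placeholders, to be matched by Choice states in the definition), an execution that skips the historical import could be started like this:<\/p>\n\n\n\n
import json
import boto3

sfn = boto3.client('stepfunctions')

# The input document is inspected by Choice states inside the state machine
# to decide which branches to run; both the ARN and the flag are placeholders.
sfn.start_execution(
    stateMachineArn='arn:aws:states:eu-west-1:123456789012:stateMachine:etl-pipeline',
    input=json.dumps({'skip_historical_import': True}),
)
<\/pre>\n\n\n\n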

Athena and QuickSight<\/h2>\n\n\n\n

Athena can generate tables that can be queried using standard SQL, and not only that: the results of Athena queries can be imported into Amazon QuickSight to rapidly generate charts and reports based on your data.<\/p>\n\n\n\n

In our workflow, it is possible to run Athena queries on the target S3 bucket, which contains both global temperature data and sensor-specific data. Let\u2019s quickly review how to do that:<\/p>\n\n\n\n

      \n
    1. If you have already created a Glue Crawler, you\u2019ll have a Datasource and a table.<\/li>\n\n\n\n
    2. Select the database and table in Athena\u2019s dashboard in the left sidebar (we used temperatures_db and temperatures from our crawlers).<\/li>\n\n\n\n
    3. Create a simple query that can later be used by QuickSight to show a chart, for example, a simple \u201cSELECT * FROM temperatures\u201d. <\/li>\n<\/ol>\n\n\n\n

After these 3 steps, Athena will show the result of the query, as shown below:<\/p>\n\n\n\n

      \"Athena
      Athena sample query<\/em><\/figcaption><\/figure>\n\n\n\n
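The same query can also be launched programmatically, which is handy when you want to automate reporting. Here is a sketch with boto3, where only the S3 output location is a placeholder (temperatures_db and temperatures come from our crawlers):<\/p>\n\n\n\n
import time
import boto3

athena = boto3.client('athena')

# Start the query; Athena writes the results to the given S3 output location (placeholder).
execution = athena.start_query_execution(
    QueryString='SELECT * FROM temperatures',
    QueryExecutionContext={'Database': 'temperatures_db'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/'},
)
query_id = execution['QueryExecutionId']

# Poll until the query leaves the QUEUED/RUNNING states, then read the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state not in ('QUEUED', 'RUNNING'):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
<\/pre>\n\n\n\n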

      A couple of tips when working with Athena:<\/p>\n\n\n\n