{"id":6408,"date":"2023-10-27T09:00:00","date_gmt":"2023-10-27T07:00:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=6408"},"modified":"2024-02-02T11:58:12","modified_gmt":"2024-02-02T10:58:12","slug":"nightmare-infrastructures-episode-2-besharps-halloween-special","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/nightmare-infrastructures-episode-2-besharps-halloween-special\/","title":{"rendered":"Nightmare Infrastructures episode 2 – beSharp\u2019s Halloween special"},"content":{"rendered":"\n
Boys and girls of every age
Wouldn’t you like to see something strange?
Come with us, and you will see
This, our town of Halloween
– The Nightmare Before Christmas

Last year, we saw some scary infrastructures. Are you ready for the new episode?

In this article, we’ll look at some strange infrastructure designs and practices we have encountered, telling the stories of Cloud anti-patterns that become an absolute nightmare in the long term.

Hold your breath; we’re about to start!

The undead

The modern concept of zombies is influenced by the Haitian Vodou religion, in which some people believe that a witch doctor can revive a dead person as their slave using magic or a secret potion.

Sometimes, an ECS service keeps resurrecting tasks that should stay buried (and stopped).

We saw it happen in a development environment, where a small, pretty PHP container broke because of an error in a failed pipeline.

The task struggled to stay alive, but the brutal Application Load Balancer health check shot the container in the head. The ECS service did its best to revive the task, pulling the image from the ECR repository every single time.

No one noticed the issue until the end of the month, when the bill rose to over $800 because of 16 terabytes of traffic through the NAT Gateway.

In this case, the ECS deployment circuit breaker was willing to help, but no one asked him. When enabled, it marks a deployment whose tasks cannot reach a steady state as failed and stops launching new tasks. With the rollback option, it also returns the service to the last deployment that completed successfully.

To avoid making ECS zombies, please involve him next time you deploy a container!

It’s a matter of trust

I’m lying when I say, “Trust me”
I can’t believe this is true…
Trust hurts, why does trust equal suffering?
– Megadeth, Trust

This is a short but even scarier episode. While investigating a failed pipeline deployment, we found this trust policy on a role with administrator permissions on every resource:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

With "AWS": "*" as the principal, identities from any AWS account, not just the customer’s, can assume the role and inherit its administrator permissions.

We first detached the administrator permissions policy. Then, as a proof of concept during a call with the customer, we assumed the role from our personal AWS account. Pretty scary, huh?

Like in “The Sixth Sense”…

I see dead lambdas

It was a cold winter night. During a storm, our on-call phone started ringing desperately: a serverless application was struggling to survive, and its API Gateway kept returning 5xx errors.

A colleague started investigating and, strangely, everything was quiet. Too quiet. CloudWatch had no logs for the Lambda function behind the troubled route.

Requests sent with Postman or curl worked like a charm.

Since everything seemed to work again, the investigation was postponed until the next morning, but… an hour later, the phone started ringing again. And still, there was no trace of failures, not even in the logs.

Determined to solve the mystery, another colleague joined the investigation and, while reviewing the configuration… it suddenly appeared!

In the past, our customer had some trouble with the Lambda function timing out, so they “reserved some capacity” for it. It turned out that its “reserved concurrency” was set to 1.

According to the AWS documentation: “Reserved concurrency is the maximum number of concurrent instances you want to allocate to your function. When a function has reserved concurrency, no other function can use that concurrency”.

But there’s a catch: reserved concurrency is also the maximum number of instances of the function that can run concurrently. Setting it to 1 throttles the function. If two users call the API route at the same time, the second invocation is rejected and API Gateway returns a 5xx error. Throttled invocations never execute, which is why they left no logs in CloudWatch. It also explains why a single request from Postman or curl always worked.
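To see why one-at-a-time tests never reproduced the problem, here is a deliberately simplified Python model of the behavior described above. It is not how Lambda is implemented internally. The model assumes that each invocation holds one concurrency slot for its whole duration. A request that arrives while all reserved slots are busy is throttled instead of queued. The model ignores retries, burst limits, and the account-level concurrency pool. `Request` and `simulate` are hypothetical names, not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical API request that triggers one Lambda invocation."""
    start: float     # arrival time, in seconds
    duration: float  # how long the invocation runs, in seconds


def simulate(requests: list[Request], reserved_concurrency: int) -> tuple[int, int]:
    """Simplified model: return (served, throttled) for the given requests.

    Each running invocation holds one of `reserved_concurrency` slots until it
    finishes; a request arriving while every slot is busy is throttled, so the
    function code never runs for it.
    """
    running_until: list[float] = []  # end times of in-flight invocations
    served = throttled = 0
    for req in sorted(requests, key=lambda r: r.start):
        # Free the slots of invocations that finished before this arrival.
        running_until = [end for end in running_until if end > req.start]
        if len(running_until) < reserved_concurrency:
            running_until.append(req.start + req.duration)
            served += 1
        else:
            throttled += 1
    return served, throttled


if __name__ == "__main__":
    # One request at a time, like testing with curl: nothing is throttled.
    sequential = [Request(start=0.0, duration=1.0), Request(start=5.0, duration=1.0)]
    print(simulate(sequential, reserved_concurrency=1))  # (2, 0)

    # Two users hitting the route almost together: the second one is throttled.
    simultaneous = [Request(start=0.0, duration=1.0), Request(start=0.2, duration=1.0)]
    print(simulate(simultaneous, reserved_concurrency=1))  # (1, 1)
    print(simulate(simultaneous, reserved_concurrency=2))  # (2, 0)
```

With a reservation of 1, a request only gets through when it is alone. Raising the value, or removing the reservation, lets concurrent callers in. As the quoted documentation notes, any concurrency you reserve is taken away from the account’s other functions.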
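Going back to “It’s a matter of trust”: a wildcard principal like that one is easy to catch automatically. The sketch below is a hypothetical Python helper; `wildcard_assume_role_statements` is our name, not an AWS API. It parses a trust policy and returns the `Allow` statements that let `"*"` assume the role with no `Condition` attached. It deliberately ignores `NotPrincipal`, wildcards inside ARNs, and the strength of any conditions. Treat it as a quick first check, not a complete policy analyzer; IAM Access Analyzer is the AWS service built for finding roles that can be accessed from outside your account.

```python
import json
from typing import Any


def _as_list(value: Any) -> list:
    """Policy fields may hold a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def _principal_is_wildcard(principal: Any) -> bool:
    """True for "Principal": "*" and for {"AWS": "*"} (alone or in a list)."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS", []))
    return False


def _allows_assume_role(actions: Any) -> bool:
    """Action names are case-insensitive; "sts:*" and "*" also grant AssumeRole."""
    return any(
        str(action).lower() in ("sts:assumerole", "sts:*", "*")
        for action in _as_list(actions)
    )


def wildcard_assume_role_statements(policy_json: str) -> list[dict]:
    """Hypothetical helper: return the Allow statements of a trust policy that
    let any AWS principal assume the role with no Condition attached."""
    policy = json.loads(policy_json)
    findings = []
    for statement in _as_list(policy.get("Statement", [])):
        if (
            statement.get("Effect") == "Allow"
            and _principal_is_wildcard(statement.get("Principal"))
            and _allows_assume_role(statement.get("Action", []))
            and "Condition" not in statement
        ):
            findings.append(statement)
    return findings


if __name__ == "__main__":
    # The trust policy from the story: flagged.
    scary = """{"Version": "2012-10-17", "Statement": [{"Effect": "Allow",
        "Principal": {"AWS": "*"}, "Action": "sts:AssumeRole"}]}"""
    print(len(wildcard_assume_role_statements(scary)))  # 1

    # Trusting one specific account (111122223333 is a placeholder ID): not flagged.
    scoped = scary.replace('"*"', '"arn:aws:iam::111122223333:root"')
    print(len(wildcard_assume_role_statements(scoped)))  # 0
```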
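And for the undead ECS tasks of the first story, here is one way to involve the circuit breaker. This is a minimal sketch that assumes the service uses the rolling update (ECS) deployment controller, the only one the circuit breaker supports. It sets `deploymentConfiguration.deploymentCircuitBreaker` through boto3’s `update_service`. The function name and the cluster and service names are hypothetical. The client is passed in as a parameter, so the function can be exercised without AWS credentials. If the service is managed with infrastructure as code, the same `DeploymentCircuitBreaker` setting belongs in the template instead; otherwise the change will drift.

```python
from typing import Any


def enable_circuit_breaker(ecs_client: Any, cluster: str, service: str) -> dict:
    """Enable the ECS deployment circuit breaker, with rollback, on one service.

    `ecs_client` is expected to be a boto3 ECS client, i.e. boto3.client("ecs");
    the service is assumed to use the rolling update (ECS) deployment controller.
    """
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        deploymentConfiguration={
            "deploymentCircuitBreaker": {
                "enable": True,    # fail the deployment and stop launching new tasks
                "rollback": True,  # return to the last deployment that completed successfully
            },
        },
    )


# Usage (needs boto3 and credentials allowed to call ecs:UpdateService);
# "dev-cluster" and "php-app" are hypothetical names:
#   import boto3
#   enable_circuit_breaker(boto3.client("ecs"), cluster="dev-cluster", service="php-app")
```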