{"id":6519,"date":"2023-12-08T14:54:23","date_gmt":"2023-12-08T13:54:23","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=6519"},"modified":"2023-12-07T16:52:40","modified_gmt":"2023-12-07T15:52:40","slug":"incidents-in-the-cloud-deal-with-them","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/incidents-in-the-cloud-deal-with-them\/","title":{"rendered":"Incidents in the Cloud: deal with them!"},"content":{"rendered":"\n

“I don\u2019t make things complicated. That\u2019s the way they get, all by themselves.”<\/em> <\/p>\n\n\n\n

– Martin Riggs, Lethal Weapon.<\/p>\n\n\n\n

We design architectures following all the best practices, we train our collaborators so they acquire strong skills, and we document everything. BUT… sometimes something still goes wrong, and we hear (or say) the word “incident”. <\/p>\n\n\n\n

But what is an incident<\/em>?  <\/p>\n\n\n\n

An incident is an event that disrupts or reduces the quality of an IT service or poses a risk to the security or performance of a system. It can be a server outage, a network failure, a malware infection, or a data breach.<\/p>\n\n\n\n

We all work to avoid them and to keep our exposure to risk at a minimum; however, IT incidents are inevitable, and they can significantly affect business performance, reputation, and customer experience. Sooner or later, all of us have to deal with incidents in our careers, and managing them well is not an easy skill to acquire.<\/p>\n\n\n\n

This article will explain the key elements and best practices of incident management<\/strong>. <\/p>\n\n\n\n

We will also introduce you to AWS Systems Manager Incident Manager, a capability of AWS Systems Manager that helps you prepare for and respond to application and infrastructure incidents.<\/p>\n\n\n\n

Incident Management<\/h2>\n\n\n\n

Incident management is crucial when offering customers a reliable and secure service. It’s a process that helps identify, analyze, and resolve any unplanned event or issue that affects the quality, availability, or performance of the service as experienced by users.<\/p>\n\n\n\n

Incidents can have different levels of severity and impact, depending on the type of service, the number of users affected, the duration of the disruption, and the potential consequences. An incident management process aims to restore normal service operations as quickly as possible. <\/p>\n\n\n\n

By implementing an effective incident management process, we can prevent incidents and reduce (or even eliminate) downtime by lowering our Mean Time To Resolution (MTTR), which leads to a better customer experience.<\/p>\n\n\n\n
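As a quick illustration of the metric, MTTR is commonly computed as the total time spent resolving incidents divided by the number of incidents. The following Python sketch assumes a hypothetical list of incidents, each with a detection and a resolution timestamp; the Incident record, its field names, and the sample values are made up for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: when the incident was detected and when it was resolved.
    detected_at: datetime
    resolved_at: datetime


def mean_time_to_resolution(incidents: list[Incident]) -> timedelta:
    # MTTR = total resolution time / number of incidents.
    if not incidents:
        raise ValueError('MTTR is undefined without incidents')
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


# Illustrative sample data, not real measurements.
incidents = [
    Incident(datetime(2023, 12, 1, 9, 0), datetime(2023, 12, 1, 10, 30)),  # 90 minutes
    Incident(datetime(2023, 12, 3, 14, 0), datetime(2023, 12, 3, 14, 30)),  # 30 minutes
]

print(mean_time_to_resolution(incidents))  # 1:00:00
```

Tracking this value over time shows whether changes to the incident management process are actually shortening the time needed to restore service.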

In brief, the incident management process typically consists of the following stages:<\/p>\n\n\n\n