{"id":3547,"date":"2021-09-17T13:59:00","date_gmt":"2021-09-17T11:59:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=3547"},"modified":"2021-09-30T16:43:52","modified_gmt":"2021-09-30T14:43:52","slug":"how-we-used-the-aws-cloud-to-optimize-a-mongodb-based-document-management-system","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/how-we-used-the-aws-cloud-to-optimize-a-mongodb-based-document-management-system\/","title":{"rendered":"How we used the AWS Cloud to optimize a MongoDB-based document management system"},"content":{"rendered":"\n

These days, a lot of applications are developed with the cloud in mind, and an excellent architecture isn\u2019t the only thing we need to consider to succeed. <\/p>\n\n\n\n

In today\u2019s article, we\u2019ll see that it\u2019s always better to stop and choose the right solution for the job, even when a familiar technology is available both in the cloud and on-prem. <\/p>\n\n\n\n

Storage is a perfect example to state our case: there\u2019s a wide choice of available services that can help you reach your goal, and using the right one can be a game-changer, enabling you to reduce maintenance tasks and lower operational costs.<\/p>\n\n\n\n

If you take the time to explore cloud-native technologies, you\u2019ll often find better solutions, even if adopting them can lead to an application refactor.<\/p>\n\n\n\n

We\u2019ll walk you through a case that involved a data migration, even though the application was already running in the AWS Cloud. <\/p>\n\n\n\n

Some time ago, we were asked to help improve the performance and reduce the costs of a custom-developed document management system, designed to be hosted both on-prem and in the cloud, that was running on AWS. <\/p>\n\n\n\n

The application was designed for high availability and scalability, built on a microservices architecture that used a MongoDB cluster to store documents as BSON<\/a>.<\/p>\n\n\n\n

A huge number of documents was expected to be ingested, so elasticity was the key factor that led to cloud adoption.<\/p>\n\n\n\n

Porting the application to the AWS Cloud went smoothly: an ECS Fargate cluster ran the containers with autoscaling enabled, and a three-node MongoDB Atlas cluster stored the data.<\/p>\n\n\n\n

After some time (and millions of documents later), issues began to arise in the form of a noticeable increase in the monthly AWS bill, driven by storage occupation and cross-availability-zone traffic.<\/p>\n\n\n\n

Data transfer costs were due to cluster synchronization: the data-bearing instances were deployed in different AZs to maintain high availability, so charges<\/a> reflected the amount of cross-AZ traffic the cluster had to sustain.<\/p>\n\n\n\n

Storage costs, on the other hand, were proportional to the size of the EBS volumes required to store the data, multiplied by the number of replicas defined in MongoDB Atlas\u2019s \u201creplicationSpecs\u201d parameter. <\/p>\n\n\n\n
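As a back-of-the-envelope sketch (the figures below are hypothetical, not taken from the actual project), billed storage grows linearly with the replica count, because every replica set member keeps a full copy of the data:

```python
# Hypothetical figures: every replica set member stores a full copy of
# the data, so provisioned EBS storage scales with the replica count.
data_size_gb = 500   # logical data size (assumed for illustration)
replicas = 3         # one data-bearing node per AZ, per "replicationSpecs"

provisioned_gb = data_size_gb * replicas
print(provisioned_gb)  # 1500 GB of EBS storage billed for 500 GB of data
```

The same multiplier applies to every extra node added to the cluster, which is why scaling out for performance also scales the storage bill.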

In addition to the aforementioned costs, two extra nodes had to be added to the cluster to maintain a high level of performance during traffic spikes. When a node needed maintenance or, even worse, failed, additional work was required to keep up with the service level agreement.<\/p>\n\n\n\n

It was becoming clear that something had to be done to lower costs and the amount of maintenance required. <\/p>\n\n\n\n

Sometimes the best solution to a complex problem is the simplest one: meet Amazon S3 (Simple Storage Service), one of the first services generally available on the AWS Cloud, released in March 2006. <\/p>\n\n\n\n

Amazon S3<\/a> is an object storage service designed for performance, scalability, availability, and durability (with its famous eleven 9\u2019s). It offers a wide range of cost-effective storage classes and data management APIs that have become a de facto industry standard.<\/p>\n\n\n\n
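To make the comparison concrete, here is a minimal sketch of how a document could be stored and fetched through the S3 API with boto3, instead of as a BSON row in MongoDB. The bucket name and key layout are assumptions for illustration, not the project\u2019s actual ones:

```python
import json


def document_key(doc_id: str) -> str:
    # Deterministic object key layout; the "documents/" prefix is an assumption.
    return f"documents/{doc_id}.json"


def put_document(s3, bucket: str, doc_id: str, document: dict) -> None:
    """Store a document as a JSON object in S3 (replacing a BSON row in MongoDB)."""
    s3.put_object(
        Bucket=bucket,
        Key=document_key(doc_id),
        Body=json.dumps(document).encode("utf-8"),
        ContentType="application/json",
    )


def get_document(s3, bucket: str, doc_id: str) -> dict:
    """Fetch a document back from S3 and decode it."""
    obj = s3.get_object(Bucket=bucket, Key=document_key(doc_id))
    return json.loads(obj["Body"].read())


# Usage (requires boto3 and AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# put_document(s3, "my-document-store", "42", {"title": "invoice"})
# doc = get_document(s3, "my-document-store", "42")
```

With this model there is no cluster to synchronize, so both the cross-AZ traffic charges and the per-replica storage multiplier disappear.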

Some of the main reasons we chose Amazon S3 as a storage service are:<\/p>\n\n\n\n