{"id":3547,"date":"2021-09-17T13:59:00","date_gmt":"2021-09-17T11:59:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=3547"},"modified":"2021-09-30T16:43:52","modified_gmt":"2021-09-30T14:43:52","slug":"how-we-used-the-aws-cloud-to-optimize-a-mongodb-based-document-management-system","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/how-we-used-the-aws-cloud-to-optimize-a-mongodb-based-document-management-system\/","title":{"rendered":"How we used the AWS Cloud to optimize a MongoDB-based document management system"},"content":{"rendered":"\n

These days, a lot of applications are developed with the cloud in mind, and an excellent architecture isn\u2019t the only thing we need to consider to succeed. <\/p>\n\n\n\n

In today\u2019s article, we\u2019ll see that it\u2019s always better to stop and choose the right solution for the job, even when a familiar technology is available both in the cloud and on-prem. <\/p>\n\n\n\n

Storage is a perfect example to state our case: there\u2019s a wide choice of available services that can help you reach your goal, and using the right one can be a game-changer, enabling you to reduce maintenance tasks and lower operational costs.<\/p>\n\n\n\n

If you take the time to explore cloud-native technologies, you\u2019ll often find better solutions, even if adopting them can lead to an application refactor.<\/p>\n\n\n\n

We\u2019ll walk you through a case that involved a data migration, even though the application was already running in the AWS Cloud. <\/p>\n\n\n\n

Some time ago, we were asked to help improve the performance and reduce the costs of a custom-developed document management system, designed to be hosted both on-prem and in the cloud, that was running on AWS. <\/p>\n\n\n\n

The application was designed for high availability and scalability, built on a microservices architecture that used a MongoDB cluster to store documents as BSON<\/a>.<\/p>\n\n\n\n

A huge number of documents was expected to be ingested, so elasticity was the key factor that led to cloud adoption.<\/p>\n\n\n\n

Porting the application to the AWS Cloud went smoothly: an ECS Fargate cluster ran the containers with autoscaling enabled, and a three-node MongoDB Atlas cluster stored the data.<\/p>\n\n\n\n

After some time (and millions of documents later), issues began to arise in the form of a noticeable increase in the monthly AWS bill, driven by storage occupation and cross-availability-zone traffic.<\/p>\n\n\n\n

Data transfer costs were due to cluster synchronization: the data-bearing instances were deployed in different AZs to maintain high availability, so charges<\/a> reflected the amount of cross-AZ traffic the cluster had to sustain.<\/p>\n\n\n\n

Storage costs, on the other hand, were proportional to the size of the EBS volumes required to store the data, multiplied by the number of replicas defined in MongoDB Atlas\u2019s \u201creplicationSpecs\u201d parameter. <\/p>\n\n\n\n
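As a back-of-the-envelope sketch (the figures below are hypothetical, not taken from the actual project), billed storage grows linearly with the replica count, because every replica set member keeps a full copy of the data:

```python
# Hypothetical figures: every replica set member stores a full copy of
# the data, so provisioned EBS storage scales with the replica count.
data_size_gb = 500   # logical data size (assumed for illustration)
replicas = 3         # one data-bearing node per AZ, per "replicationSpecs"

provisioned_gb = data_size_gb * replicas
print(provisioned_gb)  # 1500 GB of EBS storage billed for 500 GB of data
```

The same multiplier applies to every extra node added to the cluster, which is why scaling out for performance also scales the storage bill.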

In addition to the aforementioned costs, two extra nodes had to be added to the cluster to maintain a high level of performance during traffic spikes. When a node needed maintenance or, even worse, failed, additional work was required to keep up with the service level agreement.<\/p>\n\n\n\n

It was becoming clear that something had to be done to lower costs and the amount of maintenance required. <\/p>\n\n\n\n

Sometimes the best solution to a complex problem is the simplest one: meet Amazon S3 (Simple Storage Service), one of the first services generally available on the AWS Cloud, released in March 2006. <\/p>\n\n\n\n

Amazon S3<\/a> is an object storage service designed for performance, scalability, availability, and durability (with its famous eleven 9\u2019s). It offers a wide range of cost-effective storage classes and data management APIs that have become a de facto industry standard.<\/p>\n\n\n\n
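To make the comparison concrete, here is a minimal sketch of how a document could be stored and fetched through the S3 API with boto3, instead of as a BSON row in MongoDB. The bucket name and key layout are assumptions for illustration, not the project\u2019s actual ones:

```python
import json


def document_key(doc_id: str) -> str:
    # Deterministic object key layout; the "documents/" prefix is an assumption.
    return f"documents/{doc_id}.json"


def put_document(s3, bucket: str, doc_id: str, document: dict) -> None:
    """Store a document as a JSON object in S3 (replacing a BSON row in MongoDB)."""
    s3.put_object(
        Bucket=bucket,
        Key=document_key(doc_id),
        Body=json.dumps(document).encode("utf-8"),
        ContentType="application/json",
    )


def get_document(s3, bucket: str, doc_id: str) -> dict:
    """Fetch a document back from S3 and decode it."""
    obj = s3.get_object(Bucket=bucket, Key=document_key(doc_id))
    return json.loads(obj["Body"].read())


# Usage (requires boto3 and AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# put_document(s3, "my-document-store", "42", {"title": "invoice"})
# doc = get_document(s3, "my-document-store", "42")
```

With this model there is no cluster to synchronize, so both the cross-AZ traffic charges and the per-replica storage multiplier disappear.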

Some of the main reasons we chose Amazon S3 as a storage service are:<\/p>\n\n\n\n