{"id":6095,"date":"2023-08-04T09:00:00","date_gmt":"2023-08-04T07:00:00","guid":{"rendered":"https:\/\/blog.besharp.it\/?p=6095"},"modified":"2023-08-02T10:55:36","modified_gmt":"2023-08-02T08:55:36","slug":"opensearch-everything-you-need-to-know-for-the-perfect-setup","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/opensearch-everything-you-need-to-know-for-the-perfect-setup\/","title":{"rendered":"OpenSearch: everything you need to know for the perfect setup"},"content":{"rendered":"\n
Data is all around us. Every organization, regardless of size or industry, faces the challenge of managing huge amounts of data and extracting valuable information from it. In this scenario, fast and effective data handling and processing becomes a priority for companies that need to adapt, react, and keep pace with today\u2019s fast-changing environments. For this reason, selecting the tool that best meets your organization\u2019s needs is crucial.<\/p>\n\n\n\n
In this blog post, we present OpenSearch, a highly scalable toolset providing fast access and response to large volumes of data.<\/p>\n\n\n\n
Let\u2019s dive deep!<\/p>\n\n\n\n
Cluster fundamentals<\/h2>\n\n\n\n
OpenSearch is an open-source search and analytics suite used to query large volumes of data through API calls or an integrated dashboard. OpenSearch offers features such as full-text querying, autocomplete, scroll search, customizable scoring and ranking, fuzzy matching, phrase matching, and more. Responses can be returned in JDBC, CSV, raw, or JSON format.<\/p>\n\n\n\n
Let\u2019s briefly describe the fundamental components of an OpenSearch cluster:<\/p>\n\n\n\n
\n
Indexes<\/li>\n\n\n\n
Shards<\/li>\n\n\n\n
Nodes<\/li>\n\n\n\n
Node types<\/li>\n<\/ul>\n\n\n\n
To search data, you must organize it into indexes<\/strong>. Indexes store documents (sets of fields with key-value pairs) and optimize them for search. Optimization is possible because each field has a specific type. You can specify field types explicitly; otherwise, OpenSearch tries to infer them automatically.<\/p>\n\n\n\n
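As a rough illustration of explicit field types, the sketch below assembles the request body that could be sent when creating an index. The index name blog-posts and its four fields are hypothetical and chosen only for this example; the mappings/properties layout follows the OpenSearch create-index API.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = 'blog-posts'


def build_index_body():
    '''Return a create-index request body with explicit field types.'''
    return {
        'mappings': {
            'properties': {
                # Analyzed for full-text search.
                'title': {'type': 'text'},
                # Stored as an exact value, suited to filters and aggregations.
                'author': {'type': 'keyword'},
                'published_at': {'type': 'date'},
                'views': {'type': 'integer'},
            }
        }
    }


body = build_index_body()
# This JSON would be the body of a request such as: PUT /blog-posts
print(json.dumps(body, indent=2))
```

Any field left out of the mapping would fall back to the automatic type detection described above.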
Another form of optimization is splitting the index into several shards<\/strong>. Each shard contains a subset of the documents inside the index. When you search for data, queries run across different shards in parallel if each shard is located on a different node. The size of shards should be around 10-30 GB for workloads requiring low search latency and 30-50 GB for write-heavy workloads such as storing logs.<\/p>\n\n\n\n
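The sizing guidance above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, not an official formula: the function name and the workload labels are assumptions made for this example, and the result is only a starting range for the number of primary shards.

```python
import math

# Target shard sizes in GB, taken from the ranges quoted above.
TARGET_SHARD_SIZE_GB = {
    'search': (10, 30),  # low search latency
    'logs': (30, 50),    # write-heavy workloads
}


def estimate_primary_shards(index_size_gb, workload='search'):
    '''Return (fewest, most) primary shards that keep shards in the target range.'''
    low, high = TARGET_SHARD_SIZE_GB[workload]
    # Fewest shards: every shard is as large as the range allows.
    fewest = max(1, math.ceil(index_size_gb / high))
    # Most shards: every shard is as small as the range allows.
    most = max(fewest, math.floor(index_size_gb / low))
    return fewest, most


print(estimate_primary_shards(300, 'search'))
print(estimate_primary_shards(300, 'logs'))
```

For an index smaller than the lower bound of the range, the function simply suggests a single shard.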
OpenSearch instances are called nodes<\/strong>. OpenSearch can operate as a single-node or multi-node cluster. When creating a multi-node cluster, the number of nodes, the node types, and their hardware depend on your use case.<\/p>\n\n\n\n
The types of nodes<\/strong> are:<\/p>\n\n\n\n
\n
Master<\/strong>: the master node handles cluster-wide tasks such as managing indexes, keeping track of the cluster nodes, running health checks, and allocating shards<\/li>\n\n\n\n
Master-eligible<\/strong>: master-eligible nodes can be promoted to master through a voting process<\/li>\n\n\n\n
Data<\/strong>: data nodes perform all data-related operations on local shards, such as indexing, searching, and aggregating<\/li>\n\n\n\n
Ingest<\/strong>: ingest nodes run pipelines to transform data before storing it<\/li>\n\n\n\n
Coordinating<\/strong>: coordinating nodes delegate client requests to data nodes and aggregate the results into a single response before sending it to the client<\/li>\n<\/ul>\n\n\n\n
A node can have multiple types. Each node is a master-eligible, data, ingest, and coordinating node by default.<\/p>\n\n\n\n
When creating an Amazon OpenSearch Service cluster, you can customize Data Nodes and Dedicated Master Nodes.<\/p>\n\n\n\n
For Data Nodes, you can specify the number of nodes, the instance type, the volume type, and the volume size.<\/p>\n\n\n\n
Dedicated Master Nodes are nodes that do not contain data and are dedicated to cluster management. In development or test environments, you can choose not to have any Dedicated Master Nodes. Since they do not contain data, you only need to specify the number of nodes and the instance type.<\/p>\n\n\n\n
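The options above map onto the ClusterConfig and EBSOptions parameters of the OpenSearch Service create_domain API. The sketch below only builds those parameters as a plain dictionary and makes no AWS call; the helper name, instance types, and sizes are assumptions chosen for illustration.

```python
def build_domain_config(data_nodes, data_instance_type, volume_type,
                        volume_size_gb, master_nodes=0,
                        master_instance_type=None):
    '''Build ClusterConfig and EBSOptions for a domain (hypothetical helper).'''
    cluster_config = {
        'InstanceType': data_instance_type,
        'InstanceCount': data_nodes,
        'DedicatedMasterEnabled': master_nodes > 0,
    }
    if master_nodes > 0:
        # Dedicated masters hold no data: only a count and a type are needed.
        if master_instance_type is None:
            raise ValueError('master_instance_type is required')
        cluster_config['DedicatedMasterType'] = master_instance_type
        cluster_config['DedicatedMasterCount'] = master_nodes
    return {
        'ClusterConfig': cluster_config,
        'EBSOptions': {
            'EBSEnabled': True,
            'VolumeType': volume_type,
            'VolumeSize': volume_size_gb,  # size in GiB per data node
        },
    }


# Illustrative values: a production-like layout and a small test domain.
prod = build_domain_config(3, 'r6g.large.search', 'gp3', 100,
                           master_nodes=3,
                           master_instance_type='m6g.large.search')
dev = build_domain_config(1, 't3.small.search', 'gp3', 10)
print(prod)
print(dev)
```

With boto3 installed and credentials configured, a dictionary like this could be unpacked into the create_domain call of the opensearch client together with a DomainName; that step is left out here so the example runs without an AWS account.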
Provisioning types<\/h2>\n\n\n\n
On AWS, you can deploy an OpenSearch cluster in two different ways:<\/p>\n\n\n\n
\n
Serverfull<\/li>\n\n\n\n
Serverless<\/li>\n<\/ul>\n\n\n\n
Serverfull<\/h3>\n\n\n\n
The classic approach is the Serverfull one, in which you choose how many nodes to create and specify the node types, sizing, and other properties.<\/p>\n\n\n\n
Let\u2019s open the AWS Console and go to the Amazon OpenSearch Service. The first step is to create a domain, that is, an OpenSearch cluster.<\/p>\n\n\n\n
You will have to choose between two options: Easy create<\/strong> or Standard create.<\/strong><\/p>\n\n\n\n