17 June 2026 - 1 min. read
Keidi Xhafa

RAG, Vector Search, Embeddings: Not Trends Anymore
These aren't emerging technologies anymore. They're foundations. Every enterprise application that truly wants to leverage Generative AI starts here.
The point isn't whether to implement them. It's how to do it without blowing your budget or turning your infrastructure into an operational nightmare.
And if your stack is on AWS, there's another question to ask: how much does it cost to leave the ecosystem for a feature you already have?
Amazon OpenSearch Service wasn't built as a pure vector database. It doesn't need to be. It's the orchestrator that combines text search, vector search, and native integration with Bedrock, SageMaker, and Lambda. Qdrant flies. Pinecone simplifies. But adding an external service comes with a cost (operational, financial, or in added complexity) that's rarely fully accounted for.
This article isn't theory. It's an architectural deep-dive for those who want production-ready RAG systems. Provisioned vs Serverless. k-Nearest Neighbors index configuration. Chunking strategy. The decisions that separate a prototype from a system that scales.
Let's start with the basics. OpenSearch isn't a pure vector database. It's an open-source fork of Elasticsearch (2021), built for search and analytics. Vector support came later.
So why use it?
Concrete Reasons to Choose OpenSearch
1. Native hybrid search.
Real use cases rarely need just vector similarity search. You need to combine semantic search (vectors), keyword search (BM25), and metadata filters. OpenSearch does it all in a single query. Zero external orchestration.
2. Mature ecosystem.
Already using OpenSearch or Elasticsearch for logging, monitoring, search? Adding vector search means extending what you have, not building something new. Less operational complexity. Fewer hidden costs.
3. Native AWS integration.
Bedrock Knowledge Bases, SageMaker, Lambda, Kinesis. AWS-centric stack? Integration overhead is minimal.
4. Storage tiering.
With UltraWarm and Cold Storage you keep historical vectors at reduced cost. Hot tier only for the most accessed data. Try doing that with Pinecone or Weaviate without architectural gymnastics.
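Point 1 is easier to judge with a concrete request. As a minimal sketch, assuming OpenSearch 2.10 or later with the neural-search plugin's `normalization-processor`, the Python below builds the two JSON bodies involved: a search pipeline that rescales BM25 and k-NN scores onto a common range, and a `hybrid` query whose keyword and vector legs share one metadata filter. The field names (`text`, `embedding`, `category`), the 0.3/0.7 weights, and the helper names are hypothetical.

```python
def hybrid_pipeline_body(keyword_weight=0.3, vector_weight=0.7):
    """Search pipeline that normalizes BM25 and k-NN scores before combining them."""
    return {
        "description": "Hybrid BM25 + k-NN scoring (illustrative)",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # Weights follow the order of the sub-queries in the hybrid query
                        "parameters": {"weights": [keyword_weight, vector_weight]},
                    },
                }
            }
        ],
    }


def hybrid_query_body(query_text, query_vector, category, k=10):
    """Keyword (BM25) and vector sub-queries, both restricted by the same metadata filter."""
    metadata_filter = {"term": {"category": category}}
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical leg: BM25 match on the text field, filtered by metadata
                    {"bool": {"must": {"match": {"text": query_text}},
                              "filter": metadata_filter}},
                    # Semantic leg: approximate k-NN with the filter applied during the search
                    {"knn": {"embedding": {"vector": query_vector, "k": k,
                                           "filter": metadata_filter}}},
                ]
            }
        },
    }
```

Register the pipeline with `PUT /_search/pipeline/<name>`, then send the query body to `_search` with `?search_pipeline=<name>`. The normalization step matters because raw BM25 scores and vector similarities live on different scales.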
Let's be honest. OpenSearch isn't always the right choice.
Here are scenarios where looking elsewhere makes sense:
No need for hybrid search and no existing stack to integrate with? Purpose-built databases like Pinecone or Qdrant offer lower latency, simpler setup, and a developer experience optimized for that specific case. If you're starting from zero and similarity search is your only goal, they're worth considering first.
Amazon S3 Vectors (GA since late 2025) is the AWS answer for simple cases. Save vectors directly to S3, query with ANN, pay for what you consume. Zero infrastructure.
It's useful when:
It's not useful when:
If the architectural constraint is avoiding AWS lock-in at all costs, solutions like Weaviate or Milvus offer more deployment flexibility. But if you're already working in AWS, as most enterprise teams are, this scenario rarely justifies the added complexity.
Managing shard allocation, replicas, heap memory, and index configuration requires operational skills you can't improvise. If your team is small and nobody has operated an OpenSearch cluster, the time-to-value of a managed solution like Pinecone can be much lower, at least initially.
Amazon OpenSearch Service offers two deployment models. The choice impacts costs, performance, and how much you'll manage manually.
Provisioned: traditional clusters of EC2 nodes. You choose instance types, storage, shard count, and replicas. Maximum flexibility. Maximum responsibility.
Use it when:
Instance sizing: don't get it wrong.
For vector-intensive workloads, choose memory-optimized instances (r6g, r7g). k-NN indices consume RAM like no tomorrow, because the graphs need to sit in memory to be searched efficiently. An r6g.xlarge.search with 32 GB of RAM handles vector queries better than a c6g.2xlarge.search with 16 GB, even though it has fewer vCPUs (4 versus 8).
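To put "consume RAM" into numbers: the OpenSearch k-NN documentation gives a rough memory estimate for HNSW graphs with 32-bit float vectors of about 1.1 × (4 × dimension + 8 × m) bytes per vector. The sketch below applies that formula as we read it; multiplying by the number of copies (primary plus replicas) and the example sizes are our own illustrative assumptions.

```python
def hnsw_memory_bytes(num_vectors, dimension, m=16, replicas=1):
    """Rough native-memory estimate for HNSW graphs with 32-bit float vectors.

    Per vector: 4 bytes per dimension for the raw floats plus about 8 bytes per
    graph link (m), with 10% overhead. Each replica holds its own copy of the graph.
    """
    per_copy = 1.1 * (4 * dimension + 8 * m) * num_vectors
    return per_copy * (1 + replicas)


def to_gib(num_bytes):
    return num_bytes / 2**30


# Example: 10 million 1024-dimensional vectors, m=16, one replica
estimate = hnsw_memory_bytes(10_000_000, 1024, m=16, replicas=1)
print(f"{to_gib(estimate):.1f} GiB of graph memory across the cluster")
```

The example comes out at roughly 86.5 GiB, spread over the data nodes. Keep in mind that only part of each node's RAM is available for graphs, since the JVM heap takes its share.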
OpenSearch Serverless eliminates infrastructure management. You create collections and index data, and AWS scales capacity automatically. You pay for the OpenSearch Compute Units (OCUs) you consume.
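For reference, here is a minimal sketch of creating a vector search collection with boto3, assuming the `opensearchserverless` client. A collection can only be created once an encryption policy covering its name exists; the network and data access policies you also need before clients can query it are left out. The `-enc` policy name suffix and the helper names are our own.

```python
import json


def encryption_policy(collection_name):
    """Encryption policy using an AWS-owned key; it must exist before the collection."""
    return json.dumps({
        "Rules": [{"ResourceType": "collection",
                   "Resource": [f"collection/{collection_name}"]}],
        "AWSOwnedKey": True,
    })


def create_vector_collection(aoss_client, collection_name):
    """aoss_client is assumed to be boto3.client("opensearchserverless")."""
    aoss_client.create_security_policy(
        name=f"{collection_name}-enc",  # hypothetical naming convention
        type="encryption",
        policy=encryption_policy(collection_name),
    )
    # VECTORSEARCH is the collection type that supports k-NN indexes
    return aoss_client.create_collection(name=collection_name, type="VECTORSEARCH")
```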
Use it when:
Attention: Serverless ≠ economical.
OCU-based pricing can get expensive for high, steady volumes. For that kind of load, a Provisioned cluster with Reserved Instances can cost 40-60% less. Do the math before deciding.
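To "do the math", a back-of-the-envelope comparison is enough to start. The sketch below covers compute only: OCU-hours on Serverless versus instance-hours on a Provisioned cluster. All prices and the Reserved Instance discount are inputs you fill in from current AWS pricing for your Region (no real rates are hard-coded), and 730 hours per month is a common approximation.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def serverless_monthly_compute(avg_indexing_ocus, avg_search_ocus, price_per_ocu_hour):
    """Serverless compute: indexing and search OCUs are both billed per OCU-hour."""
    return (avg_indexing_ocus + avg_search_ocus) * price_per_ocu_hour * HOURS_PER_MONTH


def provisioned_monthly_compute(node_count, price_per_node_hour, reserved_discount=0.0):
    """Provisioned compute: fixed nodes billed per instance-hour, minus any RI discount."""
    return node_count * price_per_node_hour * HOURS_PER_MONTH * (1 - reserved_discount)


def cheaper_option(serverless_cost, provisioned_cost):
    return "Serverless" if serverless_cost < provisioned_cost else "Provisioned"
```

Storage, data transfer, and any minimum capacity are left out; add them before drawing conclusions. The shape of the comparison is the point: Serverless cost tracks average load, Provisioned cost tracks the cluster you keep running.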
The heart of vector search on OpenSearch is the k-NN plugin. Configuring it means making architectural decisions that affect performance, recall accuracy, and cost.
A few technical choices have to be settled before moving on to the setup.
OpenSearch supports two ANN (Approximate Nearest Neighbors) algorithms. They're not equivalent.
HNSW (Hierarchical Navigable Small World)
IVF (Inverted File Index)
NOTE: IVF requires a training step. Before indexing, you must train a model with the Train API using the IVF method definition. Training needs at least nlist data points (more is better). That's extra complexity compared with HNSW, which doesn't require training.
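The training workflow from the note can be sketched as request bodies plus one call, assuming the Faiss engine and the opensearch-py client. The index names, `nlist=128`, `nprobes=8`, and the helper functions are illustrative assumptions; the training index must already contain sample vectors in a `knn_vector` field of the same dimension.

```python
def ivf_training_body(training_index, training_field, dimension, nlist=128):
    """Body for the k-NN Train API; training_index must already hold sample vectors."""
    return {
        "training_index": training_index,
        "training_field": training_field,
        "dimension": dimension,
        "description": "IVF model (illustrative)",
        "method": {
            "name": "ivf",
            "engine": "faiss",
            "space_type": "l2",
            # nlist = number of clusters (training needs at least this many vectors);
            # nprobes = clusters visited per query
            "parameters": {"nlist": nlist, "nprobes": 8},
        },
    }


def train_ivf_model(client, model_id, body):
    """client is assumed to be an opensearch-py OpenSearch instance."""
    return client.transport.perform_request(
        "POST", f"/_plugins/_knn/models/{model_id}/_train", body=body
    )


def ivf_index_body(model_id, field="embedding"):
    """Index whose vector field reuses the trained model's method definition."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {field: {"type": "knn_vector", "model_id": model_id}}},
    }
```

Training runs asynchronously: poll `GET /_plugins/_knn/models/<model_id>` until the model's `state` is `created` before creating the index that references it.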
The distance metric (space_type) depends on your embedding model:
cosinesimil: Measures the angle between vectors. Useful if vectors aren't normalized and you only care about orientation, not magnitude.
innerproduct: Dot product. The best choice for performance if your vectors are already normalized (like OpenAI or Cohere embeddings). On unit-length vectors the dot product is mathematically identical to cosine similarity, but it's cheaper, because OpenSearch doesn't have to compute vector magnitudes at query time.
l2 (Euclidean distance): Measures the straight-line distance between points. Use it when magnitude (vector length) has a specific meaning in your domain, for example in certain recommendation systems where vector length reflects the frequency, intensity, or confidence of the data, not just thematic similarity.
Practical rule: Using OpenAI or similar models? Normalize your vectors (or let OpenSearch handle it from version 2.18+) and use innerproduct to push query performance to the max.
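A quick numeric check of the innerproduct claim, in plain Python with no OpenSearch involved: cosine similarity divides by both magnitudes on every comparison, while the dot product of vectors normalized once, up front, gives the same value with no division.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def normalize(v):
    """Scale a vector to unit length."""
    length = norm(v)
    return [x / length for x in v]


def cosine_similarity(a, b):
    # Magnitudes are recomputed for every comparison
    return dot(a, b) / (norm(a) * norm(b))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))          # both magnitudes computed at query time
print(dot(normalize(a), normalize(b)))  # vectors normalized once, before indexing
```

Both lines print 0.96, up to floating-point rounding.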
k-NN tuning is a continuous tradeoff. More recall = more latency. Know the current default values before touching anything.
ef_construction (HNSW, index-time)
ef_search (HNSW, query-time)
m (HNSW)
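To make these parameters concrete, here is a sketch of an index body and a query body for an HNSW field on the Faiss engine with `innerproduct`. The values (`ef_construction=128`, `m=16`, `ef_search=100`) are illustrative starting points, not recommendations and not necessarily the defaults of your version. Passing `ef_search` per query through `method_parameters` assumes OpenSearch 2.16 or later, as we understand the k-NN docs; the field names are hypothetical, and `dimension` must match your embedding model.

```python
def hnsw_index_body(dimension, ef_construction=128, m=16):
    """Index settings and mapping for an HNSW k-NN field (Faiss engine)."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN for this index
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "innerproduct",
                        # Index-time: candidate list size while building, links per node
                        "parameters": {"ef_construction": ef_construction, "m": m},
                    },
                },
            }
        },
    }


def knn_query_body(query_vector, k=10, ef_search=100):
    """Approximate k-NN query; ef_search trades latency for recall at query time."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": k,
                    "method_parameters": {"ef_search": ef_search},
                }
            }
        },
    }
```

Raising `m` or `ef_construction` costs memory and indexing time once; raising `ef_search` costs latency on every query. That asymmetry is why query-time tuning is usually the first knob to try.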
Infrastructure ready? Let's build the pipeline.
Chunking is the most underestimated variable. Chunks that are too small lose context; chunks that are too large add noise and cost. There's no universal answer: it depends on your data.
Here are the three fundamental blocks.
1. Chunking with semantic awareness
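from langchain_text_splitters import RecursiveCharacterTextSplitter
# Inside a chunker class: split on paragraphs first, then lines, sentences, words, characters
self.splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,        # maximum characters per chunk
    chunk_overlap=chunk_overlap,  # characters repeated between neighboring chunks
    separators=["\n\n", "\n", ". ", " ", ""]
)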
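LangChain's splitter does more than this (it also applies `chunk_overlap`, for example). To show the core idea without dependencies, here is a simplified, stdlib-only sketch of recursive separator-based splitting. It is our own illustration, not LangChain's implementation, and it omits overlap entirely.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ", "")):
    """Split text into chunks of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too large: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Paragraph boundaries win whenever a whole paragraph fits; only oversized pieces fall through to lines, sentences, words, and finally raw characters.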
2. Embedding generation with Amazon Bedrock Titan
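import json
# self.bedrock is a boto3 "bedrock-runtime" client; self.embedding_model is a Titan embedding model ID
response = self.bedrock.invoke_model(
    modelId=self.embedding_model,
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": text})
)
# The response body is a stream; Titan returns the vector under "embedding"
embedding = json.loads(response['body'].read())['embedding']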
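If you use Titan Text Embeddings V2, the request can ask for unit-length vectors directly, which fits the innerproduct advice above. A minimal sketch, assuming the model ID `amazon.titan-embed-text-v2:0` and its `dimensions` and `normalize` request fields; the helper names are hypothetical, and the index mapping's `dimension` must match the size you request.

```python
import json

TITAN_V2_MODEL_ID = "amazon.titan-embed-text-v2:0"


def titan_v2_request(text, dimensions=1024, normalize=True):
    """Request body for Titan Text Embeddings V2 (256, 512 or 1024 dimensions)."""
    if dimensions not in (256, 512, 1024):
        raise ValueError("Titan V2 supports 256, 512 or 1024 dimensions")
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": normalize})


def embed(bedrock_runtime, text, dimensions=1024):
    """bedrock_runtime is assumed to be boto3.client("bedrock-runtime")."""
    response = bedrock_runtime.invoke_model(
        modelId=TITAN_V2_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=titan_v2_request(text, dimensions),
    )
    return json.loads(response["body"].read())["embedding"]
```

Smaller dimensions shrink the HNSW memory footprint roughly in proportion, at some cost in retrieval quality; test the trade-off on your own data.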
3. Bulk indexing on OpenSearch
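from opensearchpy import helpers
# client is an opensearchpy.OpenSearch instance; actions is an iterable of bulk action dicts
success, failed = helpers.bulk(
    client,
    actions,
    chunk_size=batch_size,  # documents sent per bulk request
    raise_on_error=False    # collect per-document errors instead of raising
)
# success: number of indexed documents; failed: list of error items to inspect or retry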
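Between steps 2 and 3 sits the glue that turns chunks and embeddings into bulk actions. A minimal sketch, assuming one `index` action per chunk; the field names, the `<doc_id>-<position>` ID scheme, and the helper name are hypothetical.

```python
def build_bulk_actions(index_name, doc_id, chunks, embeddings):
    """Yield one bulk 'index' action per chunk, pairing each chunk with its embedding."""
    if len(chunks) != len(embeddings):
        raise ValueError("every chunk needs exactly one embedding")
    for position, (chunk, vector) in enumerate(zip(chunks, embeddings)):
        yield {
            "_op_type": "index",
            "_index": index_name,
            # Deterministic IDs make re-runs overwrite chunks instead of duplicating them
            "_id": f"{doc_id}-{position}",
            "_source": {
                "doc_id": doc_id,
                "chunk_index": position,
                "text": chunk,
                "embedding": vector,
            },
        }
```

Pass the generator as `actions` to `helpers.bulk`; with `raise_on_error=False`, the returned `failed` list holds the per-document errors to inspect or retry.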
RAG and Vector Search aren't experiments anymore. They're production.
OpenSearch gives you the tools. But tools aren't enough: you need to know how to choose.
Provisioned or Serverless? HNSW or IVF? Chunking strategy? There's no universal answer. There's the right one for your use case. Hopefully this article gave you the resources to ask the right questions.
Want to talk about implementing all this in your AWS stack? You know where to find us.
This article is based on official documentation, AWS best practices, and real implementations in production environments.