Is your Elastic Cloud Cluster Right Sized?

Michael Cizmar

President And Managing Director @ MC+A

How and when you right size your cluster can lead to better utilization of your cloud spend

Elastic Cloud cluster for ELK Stack = Easy

Elastic Cloud makes it easy to deploy, operate, and scale the Elastic Stack (ELK) in the cloud. It is run by Elastic, the maker of Elasticsearch and related products, and is available on all the major public cloud providers (aka hyperscalers). Through its management console, you can start with the minimal infrastructure you need and then scale up to hundreds of nodes to service your use case. In this article we’ll discuss the major considerations when optimizing your cloud cluster size, and therefore your bill.

How to size the major components of your Elastic Cloud deployment?

Start by understanding your ELK use case.

Elastic Cloud serves a diverse range of use cases, including Log Analytics, Enterprise Search, APM (Application Performance Monitoring), and Security Monitoring. Your use case will determine the hardware profile and redundancy that you’ll need to configure. A deployment has three basic hardware profiles, along with additional variants depending on the hosting platform you choose. These hardware profiles determine the ratio of RAM, disk, and CPU resources. The following table shows the basic differences for search nodes running on AWS (Amazon Web Services).

Profile                   | RAM to Disk | vCPU/RAM | Notes
Storage Optimized         | 1:30        | 0.138    |
Storage Optimized (Dense) | 1:80        | 0.133    | Similar CPU and RAM but a lot more disk.
CPU Optimized             | 1:12        | 0.529    | Roughly 2x the RAM ratio and 4x the CPU of Storage Optimized.
CPU Optimized (ARM)       | 1:30        | 0.533    | New / faster CPUs.
General Purpose           | 1:10        | 0.267    | Best RAM to disk ratio.
General Purpose (ARM)     | 1:15        | 0.267    | New / faster CPUs.
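To make the table concrete, here is a minimal sketch that turns a chosen amount of RAM into the disk and vCPU allotment each profile implies. The ratios are copied from the table above; the names `PROFILES` and `resources_for` are our own illustrative choices, not part of any Elastic tooling.

```python
# Ratios copied from the table above:
# (GB of disk per GB of RAM, vCPU per GB of RAM)
PROFILES = {
    "Storage Optimized": (30, 0.138),
    "Storage Optimized (Dense)": (80, 0.133),
    "CPU Optimized": (12, 0.529),
    "CPU Optimized (ARM)": (30, 0.533),
    "General Purpose": (10, 0.267),
    "General Purpose (ARM)": (15, 0.267),
}


def resources_for(profile: str, ram_gb: float) -> dict:
    """Return the disk and vCPU allotment implied by a profile for a given RAM size."""
    disk_per_ram, vcpu_per_ram = PROFILES[profile]
    return {
        "ram_gb": ram_gb,
        "disk_gb": ram_gb * disk_per_ram,
        "vcpu": round(ram_gb * vcpu_per_ram, 2),
    }


if __name__ == "__main__":
    # Compare what 64 GB of RAM buys you under each profile.
    for name in PROFILES:
        print(name, resources_for(name, 64))
```

Running it for the same RAM size across profiles shows the trade-off directly: the same 64 GB of RAM comes with far more disk on the storage optimized profiles and far more CPU on the CPU optimized ones.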

If you are storing a large amount of data with a low query volume, we generally recommend leaning towards the storage optimized profiles. For a traditional enterprise search or vector search use case, we recommend leaning towards the CPU optimized profiles.

Your use case is key for determining whether you need to plan for data growth or significant query volume. You can segment this further by placing your data in specific data tiers and using Index Lifecycle Management (ILM) policies to move data into the appropriate tier as it grows older and is less likely to be accessed.
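As a rough illustration of such a policy, the sketch below builds a request body of the kind accepted by the Elasticsearch `PUT _ilm/policy/<name>` API. The function name `build_ilm_policy` and every threshold (7d, 50gb, 30d, 90d, 365d) are assumptions chosen for the example, so tune them to your own retention and access patterns.

```python
import json


def build_ilm_policy(warm_after: str = "30d",
                     cold_after: str = "90d",
                     delete_after: str = "365d") -> dict:
    """Build an ILM policy body; all thresholds here are illustrative assumptions."""
    return {
        "policy": {
            "phases": {
                # Hot: roll over to a new index by age or primary shard size.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": "7d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                # Warm and cold: lower recovery priority as the data ages.
                "warm": {
                    "min_age": warm_after,
                    "actions": {"set_priority": {"priority": 50}},
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {"set_priority": {"priority": 0}},
                },
                # Delete: drop the index once it is past retention.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

In recent Elasticsearch versions, entering the warm or cold phase also moves the index to the matching data tier by default, which is what places older data on the cheaper, disk-heavy nodes.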

The data ingestion and query volume will be key factors in how much CPU capacity you need. 

More Clusters = Better, right? ¯\_(ツ)_/¯

The business flexibility that comes with the ability to create clusters on demand shouldn’t be underestimated. But because of some of the limitations with cluster sizing, our recommendation is to stick with one cluster for each use case. The logical separation has practical benefits when it comes to scaling clusters. Narrowly focusing your clusters also allows you to take advantage of the most applicable and optimized hardware profile for a given use case. For example, you can have a production search cluster powering your e-commerce website on the CPU optimized profile, while an observability cluster on the storage optimized profile monitors the performance of the other clusters and gathers the system’s logs.

Computing Resources - The Basic Costs

If you review the pricing table for Elastic Cloud, you can see that you are essentially paying for GB of RAM per hour. Based on the hardware profile, you get an allotment of disk and compute. The more RAM you consume, the more you are charged. There are other incidentals to consider, including network I/O and snapshots, but these are typically less significant than the main cluster cost.
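Because the bill scales with RAM, a back-of-the-envelope estimate is simple arithmetic. The sketch below assumes a hypothetical price of $0.02 per GB of RAM per hour, which is not a real Elastic price; substitute the rate from the pricing table for your provider, region, and profile.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(total_ram_gb: float, price_per_gb_ram_hour: float) -> float:
    """Estimate the monthly cluster cost from total RAM and an hourly per-GB rate."""
    return total_ram_gb * price_per_gb_ram_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    hypothetical_rate = 0.02  # assumed rate, for illustration only
    for ram_gb in (32, 64, 128):
        print(f"{ram_gb} GB RAM: ~${monthly_cost(ram_gb, hypothetical_rate):,.2f}/month")
```

The estimate ignores incidentals such as network I/O and snapshots, so treat it as a floor rather than a forecast.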


The fundamental computing unit for Elasticsearch is RAM. This matches the principal sizing metric for a cluster, which is to maintain a certain amount of machine RAM relative to disk, known as the “RAM to disk ratio.” Our general rule is to maintain a specific RAM to disk ratio depending on the use case for the cluster. The ratio is the total RAM available in the cluster’s data nodes divided by the index size of the primary shards and their replicas. Standard ratios are:

  • 1:15 – Autocomplete Query Completion
  • 1:30 – Enterprise Search
  • 1:100 – Warm Data
  • 1:1000 – Cold or Frozen Data
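To check where an existing cluster sits against these ratios, you can compute the ratio from the definition above. This is a minimal sketch; the function names and the sample figures (128 GB of RAM, 1,500 GB of primary data, one replica) are made up for illustration.

```python
def disk_gb_per_ram_gb(total_data_node_ram_gb: float,
                       primary_index_gb: float,
                       replicas: int = 1) -> float:
    """Return N for a 1:N RAM to disk ratio (GB of index data per GB of RAM)."""
    # Index size counts the primary shards plus every replica copy.
    stored_gb = primary_index_gb * (1 + replicas)
    return stored_gb / total_data_node_ram_gb


def format_ratio(n: float) -> str:
    """Render the ratio in the 1:N form used in the list above."""
    return f"1:{n:.0f}"


if __name__ == "__main__":
    n = disk_gb_per_ram_gb(total_data_node_ram_gb=128, primary_index_gb=1500, replicas=1)
    print("Current ratio:", format_ratio(n))
    print("Enterprise Search target: 1:30")
```

A current ratio with a smaller N than your target means the cluster holds more RAM per GB of data than the use case calls for, which is one hint that it may be oversized.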

But as mentioned above, this ratio is fixed by the hardware profile chosen for your cluster and for each tier of nodes (i.e. hot/warm/cold/frozen).


A cluster needs enough storage to hold all of the data needed to service your use case. Storage is the simplest component of a cluster to scale out, as ESS can autoscale it when your disk capacity runs low. Additionally, you can take advantage of warm and cold tiers, moving data to a part of the cluster that provides more disk per GB of RAM. Moving data down the availability tiers allows you to store data that is accessed less frequently at a lower price.

Ultimately, the data can be written into a snapshot and mounted read-only at the most extreme RAM to disk ratio using the frozen tier.
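The effect of tiering on RAM, and therefore on cost, can be sketched with the standard ratios listed earlier. Mapping the hot tier to the 1:30 Enterprise Search ratio is our assumption, and the data volumes are invented for the example.

```python
# GB of data per GB of RAM, taken from the standard ratios above.
# Assumption: the hot tier uses the 1:30 Enterprise Search ratio.
TIER_RATIOS = {"hot": 30, "warm": 100, "cold": 1000}


def ram_needed_gb(data_gb_by_tier: dict) -> float:
    """Sum the RAM each tier needs to hold its share of the data."""
    return sum(gb / TIER_RATIOS[tier] for tier, gb in data_gb_by_tier.items())


if __name__ == "__main__":
    all_hot = {"hot": 6000}
    tiered = {"hot": 1500, "warm": 1500, "cold": 3000}
    print("All data in hot tier:", ram_needed_gb(all_hot), "GB RAM")
    print("Data spread over tiers:", ram_needed_gb(tiered), "GB RAM")
```

Since you pay per GB of RAM, holding the same 6,000 GB with roughly a third of the RAM is where the savings from warm and cold tiers come from.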


Cluster compute size depends on your use case and on a variety of factors, including shard configuration, query volume, index size, and the complexity of your queries. Compute capacity refers to the number of CPU threads available to your cluster. More threads means more concurrent requests and processes.

A word of caution: threads are easily consumed by poorly designed indexes and queries. For example, a query against a single index with 8 shards will consume 8 threads. If that query takes 100 milliseconds and your cluster has 4 cores (each with two hyper-threads, so 8 threads in total), your cluster can sustain 10 QPS, with no capacity left to do anything else (not a typical scenario).
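The arithmetic behind that example can be written out as a small sketch. It uses the same simplified model as the paragraph above (one thread per shard per query, no queueing, nothing else running), and the function name is our own.

```python
def max_sustained_qps(cores: int,
                      threads_per_core: int,
                      shards_per_query: int,
                      query_ms: float) -> float:
    """Upper bound on queries per second when each query holds one thread per shard."""
    total_threads = cores * threads_per_core
    # How many queries can hold their threads at the same time.
    concurrent_queries = total_threads // shards_per_query
    # Each concurrent slot completes 1000 / query_ms queries per second.
    return concurrent_queries * (1000 / query_ms)


if __name__ == "__main__":
    # The example from the text: 4 cores x 2 hyper-threads, 8 shards, 100 ms per query.
    print(max_sustained_qps(cores=4, threads_per_core=2, shards_per_query=8, query_ms=100))
    # The same hardware with a single-shard index.
    print(max_sustained_qps(cores=4, threads_per_core=2, shards_per_query=1, query_ms=100))
```

Under this model, the shard count of the index being queried matters as much as the hardware, which is why shard configuration appears in the list of sizing factors.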

Specialty Nodes

There are additional specialty node types beyond the basic search node, including Machine Learning (ML), Coordinating, Ingestion, Integration, Enterprise Search, and Kibana. These are also a consideration, and we’ll have a follow-up article about how, when, and at what scale to deploy them.

Which Version? It Matters

Over the past few years, numerous version updates have included performance improvements. Upgrading to the most recent version of Elastic can significantly reduce index size and is recommended to improve your shard querying strategy.

How do you determine if you are over or undersized?

Like Goldilocks and the three bears, your cluster could be undersized, right-sized, or oversized. Typically, you probably already know when your cluster is undersized: Elastic generates a warning if there is an issue with disk or CPU utilization. But how do you know if your cluster is right-sized (which would be great) or is oversized and wasting resources and money?

Determining if your ESS cluster has too much capacity (is oversized) involves evaluating multiple performance metrics and resource utilization against the requirements of your application and use case. These metrics can be collected through an Elastic observability cluster or through another monitoring system like Grafana.

Here is a short list of some of the metrics that you should use to assess if your cluster is oversized:

  • Monitor Resource Utilization
    • CPU Usage: Low average CPU usage with no spikes
      • Are your shards allocated to all nodes?
      • Does your search ratio utilize the nodes (searches * shards = current CPU threads)?
    • Memory Usage:
      • Monitor the heap and ensure it remains within the typical operational ranges
      • Compare your index size to the amount of RAM
    • Disk Usage: Monitor your overall disk usage and shard allocations

A couple of key questions:

  • Do you need the RAM to disk ratio that you selected in your profile?
  • Are you seeing CPU spikes?
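The checks above can be turned into a simple heuristic once you have the numbers from your monitoring system. This is a sketch only: the `ClusterMetrics` fields and every threshold below are hypothetical values we picked for illustration, not Elastic recommendations, and should be tuned to your own workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterMetrics:
    """Utilization figures, in percent, taken from your monitoring system."""
    avg_cpu_pct: float
    peak_cpu_pct: float
    avg_heap_pct: float
    disk_used_pct: float


def oversize_signals(m: ClusterMetrics,
                     cpu_avg_max: float = 20.0,   # hypothetical threshold
                     cpu_peak_max: float = 50.0,  # hypothetical threshold
                     heap_max: float = 50.0,      # hypothetical threshold
                     disk_max: float = 40.0       # hypothetical threshold
                     ) -> List[str]:
    """Return the list of signs that the cluster may have too much capacity."""
    signals = []
    if m.avg_cpu_pct < cpu_avg_max and m.peak_cpu_pct < cpu_peak_max:
        signals.append("low average CPU usage with no spikes")
    if m.avg_heap_pct < heap_max:
        signals.append("heap usage below the chosen operating range")
    if m.disk_used_pct < disk_max:
        signals.append("low overall disk usage")
    return signals


if __name__ == "__main__":
    quiet = ClusterMetrics(avg_cpu_pct=8, peak_cpu_pct=25, avg_heap_pct=35, disk_used_pct=20)
    for signal in oversize_signals(quiet):
        print("Possible oversizing:", signal)
```

A single signal is rarely conclusive. Several of them holding over a representative period, including your peak traffic windows, is a stronger case for trying a smaller size.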

Remember, it’s always easier to scale up than down. Scaling down can cause reshuffling of shards and potentially the removal of nodes, both of which can have negative effects.

A note about Elastic Cloud

As of this writing, ESS will autoscale under the following conditions:
  1. Data Nodes – The data exceeds the storage capacity
    • No capabilities for CPU or RAM scaling
    • No downward scaling
  2. ML Nodes
    • Is based on loaded models and jobs
    • Scales up and down

Reference: Deployment autoscaling | Elasticsearch Service Documentation | Elastic

Want an expert opinion?

MC+A has consulting experience with clients that maintain some of the largest Elasticsearch deployments in the world. Reach out to engage us on a health check, and we’ll provide feedback on the sizing of your cluster and any basic improvements you can make.


