Performing Capacity Estimation for Kafka Cluster

How many brokers should we have? What is the ideal RAM size? Should we use RAID or SSDs? We come across questions like these while configuring and deploying a Kafka cluster.

In this article, we will briefly cover Kafka capacity estimation and planning. Please note that this is subjective advice, and you should tune it to your own needs.

This estimation assumes a cluster of 3 Kafka brokers and 3 ZooKeeper nodes.

For this estimation, we assume the cluster should handle 1 million messages per minute (roughly 16,700 messages per second).

CPU

To run ZooKeeper in production, you should use a CPU with 2 cores or more. You must have hyper-threading support enabled.

To run a Kafka broker in production, you should use a multi-core server, such as a 12-core CPU or higher. You must have hyper-threading support enabled.

RAM

To run ZooKeeper in production, you should use between 16 and 24 GB of RAM. In my experience, ZooKeeper is memory-hungry, so having enough RAM is a priority.

To run Kafka in production, you should use around 24-32 GB of RAM. We use 36 GB, and our usage never goes above 60%.

Disk

The disk size for ZooKeeper can range from 500 GB to 1 TB. I use 500 GB and it works pretty well.

For Kafka brokers, you can calculate the disk size based on your retention period. For example:

Assume we use one partition and replicate the topic across 3 nodes. Here is a sample capacity plan.

Retention period: 2 weeks

Assuming 100 messages per second.

That is 6,000 messages per minute and 360,000 per hour.

Assuming each message is 1 KB in size, we need 360,000 KB, or 360 MB, of storage per hour.

With a 2-week retention period (14 days × 24 hours = 336 hours), that comes to 336 × 360 MB = 120,960 MB, i.e. about 121 GB per replica. Since the topic is replicated across 3 nodes, the cluster as a whole needs roughly 363 GB, or about 121 GB per broker.
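
The same arithmetic can be captured in a small shell sketch so you can plug in your own numbers; the rate, message size, retention, and replication factor below are just the sample assumptions from this example:

    # rough Kafka disk estimate using the sample assumptions above
    awk 'BEGIN {
        msgs_per_sec = 100    # message rate
        msg_size_kb  = 1      # average message size in KB
        hours        = 336    # retention: 14 days x 24 hours
        replication  = 3      # topic replication factor
        per_replica_gb = msgs_per_sec * 3600 * msg_size_kb * hours / 1e6
        printf "per replica: %.2f GB, cluster total: %.2f GB\n",
               per_replica_gb, per_replica_gb * replication
    }'

Running it prints roughly 121 GB per replica and about 363 GB for the cluster, matching the figures above.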

There is no strict need for SSDs, since Kafka writes sequentially and recent log data is mostly served from the OS page cache, with writes flushed to disk periodically. The ext4 filesystem is a good option as well.

JVM Heap Size

Make sure you allocate at least 6-8 GB of RAM to the JVM heap. Under-allocating the heap is one of the grave mistakes most of us make. Give the JVM a good heap.
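
For example, with the scripts that ship with Kafka you can set the broker heap via the KAFKA_HEAP_OPTS environment variable before starting the server; the 6 GB figure here is just the lower bound suggested above:

    # give the broker a fixed 6 GB heap (adjust to your sizing)
    export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
    bin/kafka-server-start.sh config/server.properties

Setting -Xms equal to -Xmx keeps the heap at a fixed size, avoiding resize pauses at runtime.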

File Descriptor Limit

At the operating-system level, you should increase the file descriptor limit to anywhere between 100K and 150K. Kafka opens a file handle for every log segment and every network connection, so a low limit can cripple the broker under load.
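
As a quick sketch on a typical Linux setup (the "kafka" user name below is just an assumption for whichever user runs the broker):

    # check the current open-file limit for this shell
    ulimit -n

    # raise it for the current session
    ulimit -n 150000

    # to make it permanent, add entries like these to
    # /etc/security/limits.conf (assuming the broker runs as user "kafka"):
    #   kafka  soft  nofile  150000
    #   kafka  hard  nofile  150000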

This article is part of a series; check out the other articles here:

1: What is Kafka
2: Setting Up Zookeeper Cluster for Kafka in AWS EC2
3: Setting up Multi-Broker Kafka in AWS EC2
4: Setting up Authentication in Multi-broker Kafka cluster in AWS EC2
5: Setting up Kafka management for Kafka cluster
6: Capacity Estimation for Kafka Cluster in production
7: Performance testing Kafka cluster

Pankaj Kumar