How to Set Up a ZooKeeper Cluster for Kafka

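ZooKeeper is a centralized service for maintaining configuration information and providing distributed synchronization. Kafka uses ZooKeeper to coordinate its brokers: it keeps track of which brokers are alive, elects the controller broker, and stores cluster metadata, which is what enables high availability and failover.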

But what if ZooKeeper fails? Running a single ZooKeeper server for a distributed system makes it a single point of failure. To avoid this, we run multiple ZooKeeper servers as a cluster, called an ensemble, which keeps working as long as a majority of its servers (a quorum) is available.

In this article, we will set up a three-node ZooKeeper cluster for Kafka on Amazon EC2 instances.

What you need

You need an AWS account with permission to create EC2 instances. We will use three EC2 instances to install and configure the ZooKeeper cluster.

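We use an odd number of ZooKeeper nodes because the ensemble needs a majority of its nodes to elect a leader after a failure. Three nodes tolerate one failure, while a fourth node would not let the ensemble survive any additional failures.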
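The majority rule behind this choice is simple arithmetic. A minimal sketch (the function names are illustrative, not part of ZooKeeper): an ensemble of n servers needs floor(n/2) + 1 of them to form a quorum, so it survives n minus that many failures.

```shell
#!/usr/bin/env bash
# Quorum arithmetic for a ZooKeeper ensemble of n servers:
# a majority is floor(n/2) + 1, so the ensemble survives n - majority failures.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

tolerated_failures() {
  echo $(( $1 - $(quorum_size "$1") ))
}

# Print the table for small ensembles.
for n in 1 2 3 4 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For n = 3 the quorum is 2 and one failure is tolerated; for n = 4 the quorum rises to 3 while the ensemble still tolerates only one failure, which is why odd sizes are preferred.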

Creating EC2 Instances

Create three EC2 instances. The t2.small type is enough if you are just learning or setting up a test environment. For production, choose an instance type with 6 to 8 GB of RAM, since ZooKeeper runs on the Java Virtual Machine and needs enough memory for its heap.

I am using Ubuntu 16.04 for the tutorial.

Edit the security group of each instance to allow inbound TCP traffic on port 2181 (client connections) and on the port range 2888-3888 (traffic between the ZooKeeper servers).

You can use different port numbers if you prefer; this tutorial uses the ones above.
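If you prefer the command line to the AWS console, the same rules can be added with the AWS CLI. This is a sketch under stated assumptions: the security group ID and the allowed CIDR block are hypothetical placeholders, and limiting the source to your VPC's address range (rather than 0.0.0.0/0) is a choice made here, not part of the original setup. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: open the ZooKeeper ports on a security group using the AWS CLI.
# SG_ID and ALLOWED_CIDR are hypothetical placeholders; replace them with your own.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
ALLOWED_CIDR="${ALLOWED_CIDR:-10.0.0.0/16}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_zk_ports() {
  local port
  # 2181: client port. 2888-3888: follower-to-leader and leader-election traffic.
  for port in 2181 2888-3888; do
    run aws ec2 authorize-security-group-ingress \
      --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "$ALLOWED_CIDR"
  done
}
```

Preview the commands with DRY_RUN=1 open_zk_ports, then run open_zk_ports to apply them; this needs AWS credentials that are allowed to modify the security group.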

Once your EC2 instances are running, we can begin the setup.

Setting up Zookeeper

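Log in to each EC2 instance, update the package index, and install Java, which both Kafka and ZooKeeper need to run.

$ sudo apt-get update
$ sudo apt-get install -y default-jre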

Then repeat the following steps on each of the instances.

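1: Download Kafka. ZooKeeper ships with the Kafka distribution, so there is nothing else to install. This tutorial uses Kafka 2.1.0 built for Scala 2.11; if the mirror below no longer hosts it, the same file is available from the Apache archive at https://archive.apache.org/dist/kafka/2.1.0/.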

$ wget http://mirrors.estointernet.in/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz

2: Extract the Kafka tar file

$ tar xzf kafka_2.11-2.1.0.tgz

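3: Create the ZooKeeper data directory, make it writable by your user (ZooKeeper stores its data there), and write this server's ID to a myid file.

$ sudo mkdir -p /var/lib/zookeeper
$ sudo chown $USER:$USER /var/lib/zookeeper
$ echo "1" > /var/lib/zookeeper/myid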

Note: each server needs a unique ID. Use 1 on the first server, 2 on the second, and 3 on the third. If you add more servers, keep assigning incremental numbers as IDs.

For example, on the second server:

$ sudo mkdir -p /var/lib/zookeeper
$ sudo chown $USER:$USER /var/lib/zookeeper
$ echo "2" > /var/lib/zookeeper/myid

On the third server:

$ sudo mkdir -p /var/lib/zookeeper
$ sudo chown $USER:$USER /var/lib/zookeeper
$ echo "3" > /var/lib/zookeeper/myid
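Because a wrong or duplicated ID is easy to miss, this step can be wrapped in a small helper. A minimal sketch; the function name write_myid is hypothetical, and the 1-255 range check follows the range given in the ZooKeeper administrator's guide.

```shell
#!/usr/bin/env bash
# Sketch: write this server's ID into ZooKeeper's myid file.
# The default directory matches dataDir in this tutorial; pass another path to override.
write_myid() {
  local id="$1"
  local data_dir="${2:-/var/lib/zookeeper}"

  # Accept only integers 1-255 without leading zeros.
  if ! [[ "$id" =~ ^[1-9][0-9]{0,2}$ ]] || (( id > 255 )); then
    echo "invalid server id: $id" >&2
    return 1
  fi

  mkdir -p "$data_dir" || return 1
  # '>' overwrites the file, so re-running never leaves a stale or duplicated ID.
  printf '%s\n' "$id" > "$data_dir/myid"
}
```

For example, run write_myid 1 on the first server, write_myid 2 on the second, and so on, as a user that can write to /var/lib/zookeeper.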

Next, we will configure ZooKeeper. On each server, open the ZooKeeper configuration file that ships with Kafka.

$ cd kafka_2.11-2.1.0
$ vi config/zookeeper.properties

Replace the contents of the file with the configuration for that server, as shown below.

Server 1:

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
initLimit=5
syncLimit=2
tickTime=2000
# list of servers
server.1=0.0.0.0:2888:3888
server.2=<ip of second server>:2888:3888
server.3=<ip of third server>:2888:3888

Server 2:

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
initLimit=5
syncLimit=2
tickTime=2000
# list of servers
server.1=<ip of first server>:2888:3888
server.2=0.0.0.0:2888:3888
server.3=<ip of third server>:2888:3888

Server 3:

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
initLimit=5
syncLimit=2
tickTime=2000
# list of servers
server.1=<ip of first server>:2888:3888
server.2=<ip of second server>:2888:3888
server.3=0.0.0.0:2888:3888

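Notice how the IP addresses change in each configuration. In each server.N line, N is the ID we wrote to the myid file earlier. For the local instance we use 0.0.0.0, and for every other ID we give the IP address of that ZooKeeper server (on EC2, typically its private IP). Followers connect to the leader on port 2888, and the servers use port 3888 for leader election.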

This is a very important step: a single wrong IP address can leave you debugging Java connection errors for hours.
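Since the three files differ only in which line uses 0.0.0.0, they can also be generated, which removes the chance of a typo in the ports. A sketch, assuming the IP arguments are passed in server-ID order (first IP is server.1); gen_zk_config and the example IPs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate zookeeper.properties for one server of the ensemble.
# Usage: gen_zk_config <my-id> <ip1> <ip2> <ip3>
# The local server is written as 0.0.0.0; every line gets both ports.
gen_zk_config() {
  local my_id="$1"; shift
  local ips=("$@") i host

  cat <<'EOF'
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
initLimit=5
syncLimit=2
tickTime=2000
# list of servers
EOF
  for i in "${!ips[@]}"; do
    host="${ips[$i]}"
    # Server IDs are 1-based, so array index 0 becomes server.1.
    if (( i + 1 == my_id )); then
      host="0.0.0.0"
    fi
    echo "server.$((i + 1))=${host}:2888:3888"
  done
}
```

For example, on the second server: gen_zk_config 2 10.0.0.11 10.0.0.12 10.0.0.13 > config/zookeeper.properties. The timing values are the same on every server: tickTime is in milliseconds, while initLimit and syncLimit are counted in ticks, so followers get 5 × 2000 ms = 10 seconds to connect and sync with the leader at startup and may be dropped if they cannot stay in sync within 2 × 2000 ms = 4 seconds.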

Once the configuration is in place on every server, start ZooKeeper.

Run this command on each server from the Kafka directory (kafka_2.11-2.1.0):

$ bin/zookeeper-server-start.sh config/zookeeper.properties
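Started this way, ZooKeeper runs in the foreground and stops when you close the SSH session. A minimal sketch of an alternative, assuming you are in the Kafka directory: Kafka's start script accepts a -daemon flag, and it applies its default 512 MB heap (-Xmx512M -Xms512M) only when KAFKA_HEAP_OPTS is unset. The 1 GB heap and the start_zk function name are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: start ZooKeeper in the background with a larger heap.
# Assumes the current directory is the Kafka folder (kafka_2.11-2.1.0).
start_zk() {
  # Illustrative sizing for a t2.small (2 GiB RAM); adjust for your instance.
  export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
  # -daemon detaches the process; console output goes to logs/zookeeper.out.
  bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
}
```

Call start_zk on each server; bin/zookeeper-server-stop.sh stops the background process.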

And that’s it.

You have successfully set up the ZooKeeper cluster.
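To confirm that the servers actually formed an ensemble, you can ask each one for its role with ZooKeeper's srvr four-letter command on the client port. A sketch that assumes nc (netcat) is installed; the function names are hypothetical and the host addresses are supplied by you.

```shell
#!/usr/bin/env bash
# Sketch: report each ZooKeeper server's role using the "srvr" four-letter command.

# Print the value of the "Mode:" line (leader, follower or standalone) read from stdin.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Query every host on port 2181 and report its mode plus the number of leaders seen.
check_ensemble() {
  local host mode leaders=0
  for host in "$@"; do
    mode="$(echo srvr | nc -w 2 "$host" 2181 | zk_mode)"
    echo "$host: ${mode:-unreachable}"
    if [ "$mode" = "leader" ]; then
      leaders=$((leaders + 1))
    fi
  done
  echo "leaders: $leaders"
}
```

Run check_ensemble with the three server IPs. A healthy ensemble reports exactly one leader and two followers; a server that reports standalone did not pick up its server.N lines.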

This article is part of a series; check out the other articles here:

1: What is Kafka
2: Setting Up Zookeeper Cluster for Kafka in AWS EC2
3: Setting up Multi-Broker Kafka in AWS EC2
4: Setting up Authentication in Multi-broker Kafka cluster in AWS EC2
5: Setting up Kafka management for Kafka cluster
6: Capacity Estimation for Kafka Cluster in production
7: Performance testing Kafka cluster

Shahid

Founder of Codeforgeek. Technologist. Published Author. Engineer. Content Creator. Teaching Everything I learn!
