Practice setting up a Kafka cluster on AWS EC2

Tram Ho

In this article, I will walk step by step through setting up a cluster of EC2 servers on AWS running Kafka, consisting of 3 Zookeeper servers and 3 Kafka broker (bootstrap) servers. To follow along, it helps to have some knowledge of AWS Cloud, Linux, and Kafka. If you don't, that's okay; during the setup I will try to explain things as concisely as possible.

Kafka really needs a separate series to cover all of its theory; in this article we focus only on how to set up a Kafka cluster. So I will briefly introduce just the two concepts that matter here, Zookeeper and the Kafka bootstrap server. The other concepts you can read about elsewhere.

What is Zookeeper?

Zookeeper in Kafka can be understood simply as a place to manage and store information about the cluster, including brokers, topics, partitions and other metadata. It also provides a version control suite to make Kafka cluster management and administration easier. With ZooKeeper support, nodes in a Kafka cluster can communicate with each other to manage operations such as increasing or decreasing the number of brokers, adding or removing topics, etc.

Setup Zookeeper on AWS

Setting up Zookeeper is very important because most of the configuration happens at this step, so I will devote this whole article to doing it carefully and in detail; in the next part I will set up the 3 remaining Kafka servers. Here are the steps I will take:

  1. Create an AWS account
  2. Set up network security allowing ports 2181, 2888, and 3888
  3. Set up a separate VPC for the cluster
  4. Create 3 EC2 instances (t2.medium, 4 GB RAM)

For the first step, you can follow AWS's own guide to create yourself an AWS account. After successful creation, go to the EC2 service; this is where we will initialize the servers.


The next step is to create a security group so that our servers and machines can connect to each other.

A security group secures us at the instance level; in the next step I will create a separate VPC to make our cluster more secure at the subnet level.

Go to the VPC service, select Create VPC, and name it as you like; here I name it my-kafka-vpc, with an IPv4 CIDR block of your choosing. Because my Kafka cluster needs 6 servers, choose a region with at least 6 AZs; each AZ will hold 1 EC2 instance.


Next, go to the Subnets tab, select Create Subnet, and create 6 subnets, one per AZ, with the following information:


Create an Internet Gateway, attach it to the newly created VPC, and associate the Route table with each newly created Subnet. This gives us connectivity to the outside.


Then we will initialize one EC2 instance first. The reason we create a single instance is that we configure it once, then create an AMI from it, which lets us launch more instances that already have our configuration in place.

After the initialization is complete, you can SSH into it; make sure that in the security group your IP is allowed on port 22.

Next, I need to install the necessary packages and services; run the following commands.
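The original command block did not survive extraction; as a sketch, on an Ubuntu AMI you would update packages and install a JDK (Kafka and Zookeeper both run on the JVM; the Java version here is my assumption):

```shell
# Refresh package metadata and install a JDK (required by Kafka/Zookeeper)
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk

# netcat is used later for the connectivity checks (nc -vz ...)
sudo apt-get install -y netcat

# Confirm Java is available
java -version
```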

Note that we also need to disable RAM swap, as it can cause problems for Kafka.
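The swap commands were also lost; a common way to do this (the swappiness value of 1 follows the usual Kafka guidance) is:

```shell
# Disable swap for the current session
sudo swapoff -a

# Keep swapping effectively off across reboots
echo 'vm.swappiness=1' | sudo tee --append /etc/sysctl.conf
```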

So that the servers can reach each other by name, add the lines below to the /etc/hosts file.
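The hosts entries themselves are not in the text; their shape would be as below, with zookeeper1..3 matching the hostnames used later in this article (the kafka1..3 names are my assumption for the remaining brokers):

```
<your-ip-address> zookeeper1
<your-ip-address> zookeeper2
<your-ip-address> zookeeper3
<your-ip-address> kafka1
<your-ip-address> kafka2
<your-ip-address> kafka3
```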

In the <your-ip-address> places, fill in the private IPs of the other servers; in my example, I add the 6 private IPs of my 6 servers in turn.

In the next step, we will download Kafka (which bundles Zookeeper) to the server.
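The download commands were not preserved; a sketch using the Apache archive (the Kafka version here is my assumption, pick whichever release you want):

```shell
# Download and extract a Kafka release; the tarball includes Zookeeper
wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.13-2.8.1.tgz
tar -xzf kafka_2.13-2.8.1.tgz
mv kafka_2.13-2.8.1 kafka
```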

After the download is complete, do a test run of Zookeeper (note that all the configuration files are located in the config folder shipped with Kafka).
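The test-run command is missing from the text; with the Kafka layout above it would be the bundled startup script:

```shell
# Start Zookeeper in the foreground using the default bundled config
cd kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
```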

When the screen shows output like this, we have successfully run Zookeeper on the server. But running it this way is not ideal; we should set it up so it can be started and stopped as a service running in the background.

Please run the command below to create the service script.

With nano open, copy in the content from the file at the path /zookeeper/zookeeper in the article's repo. Once that is saved, you can start the service with the command below.
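The repo file is not reproduced in this article; a minimal init script along these lines, saved as /etc/init.d/zookeeper, would do the job (the KAFKA_HOME path is my assumption, adjust it to where you extracted Kafka):

```shell
#!/bin/bash
# /etc/init.d/zookeeper -- minimal start/stop wrapper around the bundled scripts
KAFKA_HOME=/home/ubuntu/kafka

case "$1" in
  start)
    # -daemon detaches Zookeeper so it runs in the background
    "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties"
    ;;
  stop)
    "$KAFKA_HOME/bin/zookeeper-server-stop.sh"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
```

Make it executable with sudo chmod +x /etc/init.d/zookeeper, then start the service with sudo service zookeeper start.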

To prove the service is running, run the command nc -vz localhost 2181


And to turn off the service we run sudo service zookeeper stop


I use the command nc -vz localhost 2181 to check whether the service is working, and I can see that the service setup was successful. So we have officially set up one server running Zookeeper; next we will clone it to create 2 more servers with the same configuration.

Setup Zookeeper Cluster

First, we need to create the AMI from the previous instance, launch more instances from it, and start the Zookeeper service on each. Stop the running instance, go to Action, and click Create image.

After creating the AMI, we can launch new instances with the pre-set configuration. In the network section, under Primary IP, choose a private IP appropriate to the IPv4 block of the subnet you pick (note that this address must match the host IP in the /etc/hosts file above).


So we now have 3 instances running Zookeeper, but first we need to check the connectivity between these 3 nodes to confirm that the network we set up earlier is correct. First, SSH into all 3 servers. As in the earlier test with nc -vz localhost 2181, this time I replace localhost with each hostname in the cluster in turn. For example, I SSH into zookeeper1 and run nc -vz zookeeper2 2181, and the result returns:


So our network setup and cluster config are almost done. The next thing is to set up the data directory and check whether the cluster is working properly.

In the Kafka directory, the Zookeeper configuration file is located at kafka/config/zookeeper.properties, with the following content.
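The file content itself was not preserved; a typical clustered configuration, consistent with the dataDir, ports, and hostnames used in this article, looks like this:

```
# kafka/config/zookeeper.properties (clustered setup)
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
initLimit=10
syncLimit=5
# One entry per Zookeeper node: server.<id>=<host>:<peer-port>:<leader-election-port>
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
```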

Each Zookeeper node must have the /data/zookeeper path configured as dataDir above, so we create the directory and change its owner.
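The commands are missing here; a sketch (the ubuntu user is my assumption, based on the default user of an Ubuntu AMI):

```shell
# Create the Zookeeper data directory and hand ownership to the login user
sudo mkdir -p /data/zookeeper
sudo chown -R ubuntu:ubuntu /data/zookeeper
```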

Each Zookeeper in the cluster needs a file called myid containing a unique ID. For Zookeeper 1:
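The command did not survive; writing the ID is a one-liner (the ID must match the server.1 entry in zookeeper.properties):

```shell
# On zookeeper1: record this node's ID in the data directory
echo "1" > /data/zookeeper/myid
```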

Do the same for Zookeeper 2 and 3 (note the IDs must match the server order in the config file).

So the cluster setup is done; now let's check that the nodes work together. Open the Zookeeper shell on zookeeper 1 with bin/zookeeper-shell.sh zookeeper1:2181 and run the command create /my-node "testing".
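For clarity, the whole session on zookeeper 1 might look like this (zookeeper-shell.sh ships in the bin folder of the Kafka download):

```shell
# Connect the bundled Zookeeper CLI to the first node
bin/zookeeper-shell.sh zookeeper1:2181

# Inside the shell: create a test znode, then list the root to confirm it exists
create /my-node "testing"
ls /
```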

After creating a node on zookeeper 1, open the zookeeper 2 shell to see whether the newly created node is there.


It can be seen that my-node has also been created on zookeeper 2. You can also run the command echo stat | nc <hostname> <port>. This will show the current state of that ZooKeeper node, including information about connected clients, the node count, and more. If the connections between the servers are working fine, you will see output indicating that the node is active and the cluster is working properly. In my example:


So we have successfully configured the Zookeeper cluster. The next part will create the Kafka cluster and practice on the whole cluster. Goodbye and good luck.



Source : Viblo