Practice setting up a Kafka cluster on AWS EC2

Thursday, 09/02/2023

Tram Ho

In this article, I will walk step by step through setting up a cluster of EC2 servers on AWS running Kafka, made up of 3 Zookeeper servers and 3 Kafka bootstrap servers. To follow along, it helps to have some knowledge of AWS, Linux, and Kafka. If you don't, that's okay: I will explain things as concisely as I can during the setup.

Kafka's theory deserves a separate series of its own; this article focuses only on how to set up a Kafka cluster. I will therefore briefly introduce only the two concepts needed here, Zookeeper and the Kafka bootstrap server. You can read about the other concepts here.

What is Zookeeper?

Zookeeper in Kafka can be understood simply as a place to manage and store information about the cluster, including brokers, topics, partitions and other metadata. It also provides a version control suite to make Kafka cluster management and administration easier. With ZooKeeper support, nodes in a Kafka cluster can communicate with each other to manage operations such as increasing or decreasing the number of brokers, adding or removing topics, etc.

Setup Zookeeper on AWS

Setting up Zookeeper is very important because most of the configuration is done at this step, so I will use this whole article to set it up carefully and in detail; in the next part I will set up the remaining 3 Kafka servers. Here are the steps I will take:

Create an AWS account
Set up network security to allow ports 2181, 2888 and 3888
Set up a separate VPC for the cluster
Create 3 EC2 instances (t2.medium, 4 GB RAM)

For the first step, you can read here to create an AWS account. Once the account is created, go to the EC2 service, which is where the servers are initialized.

The next step is to create a security group so that our servers and machines can connect to each other.
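For reference, the three ports have different roles: 2181 is the Zookeeper client port, 2888 is used by the Zookeeper servers to talk to the leader, and 3888 is used for leader election. If you prefer the AWS CLI to the console, the ingress rules could be sketched roughly as below. The function name, the security group ID and the CIDR are placeholders I made up for illustration; the function only prints the commands so you can review them before running anything.

```shell
# Hypothetical helper: print (not run) the AWS CLI calls that open the
# Zookeeper ports to a given source range.
print_zk_ingress_rules() {
  local group_id="$1"   # security group ID (placeholder value below)
  local cidr="$2"       # source range allowed to connect (assumed)
  local port
  for port in 2181 2888 3888; do
    echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port $port --cidr $cidr"
  done
}

# Placeholder ID and range; substitute your own before running the output.
print_zk_ingress_rules sg-0123456789abcdef0 192.168.1.0/24
```

Printing first and running later keeps a typo in the group ID or range from opening the wrong ports.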

A security group secures us at the instance level. In the next step I will create a separate VPC to make our cluster more secure at the subnet level.

Go to the VPC service, select Create VPC, and name it as you like; here I name it my-kafka-vpc with IPv4 CIDR 192.168.1.0/24. Because my Kafka cluster needs 6 servers, choose a region with at least 6 AZs, so that one EC2 instance can be placed in each AZ.

Next, we go to the Subnets tab and select Create Subnet and create 6 subnets corresponding to each AZ with the following information:
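The exact ranges are up to you as long as they sit inside the VPC block and do not overlap. As one possible layout (an assumption on my part, not a requirement), the /24 block can be cut into /27 blocks of 32 addresses each, which gives 8 blocks and so leaves room for 6 subnets. The helper name and the subnet labels below are made up for illustration.

```shell
# Hypothetical helper: print the first N /27 blocks of a /24 network.
# The /27 size and the subnet-N labels are assumptions for illustration.
plan_subnets() {
  local prefix="$1"   # first three octets, e.g. 192.168.1
  local count="$2"    # number of subnets wanted (at most 8 for /27 blocks)
  local i=0
  while [ "$i" -lt "$count" ]; do
    # Each /27 block spans 32 addresses, so block i starts at i * 32.
    echo "subnet-$((i + 1)) ${prefix}.$((i * 32))/27"
    i=$((i + 1))
  done
}

plan_subnets 192.168.1 6
```

Keep in mind that AWS reserves 5 addresses in every subnet, so a /27 leaves 27 usable addresses, which is plenty for one server per AZ.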

Create an Internet Gateway and attach it to the newly created VPC, then associate a route table that routes through the gateway with each newly created subnet. This lets the instances connect to the outside.

Then we will create a single EC2 instance first. The reason is that we configure it once and then create an AMI from it, so that further instances can be launched with our configuration already in place.

After the instance is initialized, you can ssh into it; make sure the security group allows your IP on port 22.

Next, install the necessary packages and services by running the following commands:

# Packages
sudo apt-get update &&
      sudo apt-get -y install wget ca-certificates zip net-tools vim nano tar netcat

# Java OpenJDK 8
sudo apt-get -y install openjdk-8-jdk
java -version

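Note that we also need to turn RAM swapping down as far as possible, since it can cause errors; setting vm.swappiness to 1 makes the kernel swap only as a last resort.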

# Disable RAM Swap - can set to 0 on certain Linux distros
sudo sysctl vm.swappiness=1
echo 'vm.swappiness=1' | sudo tee --append /etc/sysctl.conf
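One thing to watch: running the tee --append line more than once leaves duplicate entries in /etc/sysctl.conf. A small sketch of a variant that is safe to re-run is shown below. The helper name append_once is my own invention, and the demo writes to a scratch file rather than the real config.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup does not duplicate it.
append_once() {
  local line="$1"
  local file="$2"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "$line" >> "$file"
  fi
}

# Demo against a scratch file; on the server the target would be
# /etc/sysctl.conf and the function would have to run as root.
demo_conf="$(mktemp)"
append_once 'vm.swappiness=1' "$demo_conf"
append_once 'vm.swappiness=1' "$demo_conf"
cat "$demo_conf"   # the line appears only once
rm -f "$demo_conf"
```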

So that the servers can reach each other by name, append the lines below to the /etc/hosts file:

# Add hosts entries (mocking DNS) - put relevant IPs here
echo "<your-ip-address> kafka1
<your-ip-address> zookeeper1
<your-ip-address> kafka2
<your-ip-address> zookeeper2
<your-ip-address> kafka3
<your-ip-address> zookeeper3" | sudo tee --append /etc/hosts

Replace each <your-ip-address> with the private IP of the corresponding server; in my example, I add the 6 private IPs of my 6 servers in turn.
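If you would rather generate these entries than type them, a sketch like the following could work. The helper name and the argument order (three Zookeeper IPs first, then three Kafka IPs) are my own convention, and the IPs are placeholders picked from the example VPC range, not real addresses.

```shell
# Hypothetical helper: print /etc/hosts entries for the cluster.
# Arguments are private IPs in order: zookeeper1..3 first, then kafka1..3.
hosts_entries() {
  local i=1
  local ip
  for ip in "$@"; do
    if [ "$i" -le 3 ]; then
      echo "$ip zookeeper$i"
    else
      echo "$ip kafka$((i - 3))"
    fi
    i=$((i + 1))
  done
}

# Placeholder IPs (assumed). Review the output before piping it to:
#   sudo tee --append /etc/hosts
hosts_entries 192.168.1.10 192.168.1.40 192.168.1.70 \
              192.168.1.100 192.168.1.130 192.168.1.170
```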

In the next steps, we will download Zookeeper and Kafka to the server.

# Download Kafka, which bundles Zookeeper. This example uses Kafka 0.10.2.1 built for Scala 2.12
wget https://archive.apache.org/dist/kafka/0.10.2.1/kafka_2.12-0.10.2.1.tgz
tar -xvzf kafka_2.12-0.10.2.1.tgz
rm kafka_2.12-0.10.2.1.tgz
mv kafka_2.12-0.10.2.1 kafka
cd kafka/

After the download is complete, do a test run of Zookeeper (note that all the configuration files are located in the config folder shipped with Kafka).

# Testing Zookeeper install
# Start Zookeeper in the background
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
# Open the Zookeeper shell, then list the root znodes from inside it
bin/zookeeper-shell.sh localhost:2181
ls /

When the shell connects and ls / returns a list of znodes, Zookeeper is running successfully on the server.

Next, install Zookeeper as a service that starts at boot by running the commands below:

# Install Zookeeper boot scripts
sudo nano /etc/init.d/zookeeper
sudo chmod +x /etc/init.d/zookeeper
sudo chown root:root /etc/init.d/zookeeper

# you can safely ignore the warning
sudo update-rc.d zookeeper defaults

In the nano editor, paste the content of the file under the path /zookeeper/zookeeper in your repo. Once the commands above have run successfully, you can start the service with the command

sudo service zookeeper start

To prove the service is running, run the command nc -vz localhost 2181

And to turn off the service, run sudo service zookeeper stop

When nc -vz localhost 2181 reports a successful connection, the service setup has worked. We now officially have one server running Zookeeper; next we will clone it to create 2 more servers with the same configuration.

Setup Zookeeper Cluster

First, we need to create an AMI from the instance configured above: stop the running instance, go to Actions and click Create image. Then create more instances from that AMI and start the Zookeeper service on each.

After creating the AMI, we can launch instances with the configuration already in place. In the network section, under Primary IP, you can choose a private IP that falls within the IPv4 block of the subnet you select (note that this address is the host IP written in the /etc/hosts file above).

We now have 3 instances running Zookeeper, but first we need to check the connections between these 3 nodes to confirm that the network we set up earlier is correct. SSH into all 3 servers. As in the test above with nc -vz localhost 2181, this time replace localhost with each hostname in the cluster in turn. For example, I ssh into zookeeper1 and run the command nc -vz zookeeper2 2181, which should report a successful connection.
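To repeat that check against every node in one pass, the nc test can be wrapped in a loop. This is only a sketch: the function name is my own, it assumes the zookeeper1 to zookeeper3 hostnames from /etc/hosts, and it uses nc's -z (scan only) and -w (timeout in seconds) options.

```shell
# Hypothetical helper: probe the Zookeeper client port on each node.
check_zookeeper_nodes() {
  local port="${1:-2181}"
  local host
  local failed=0
  for host in zookeeper1 zookeeper2 zookeeper3; do
    # -z: only check that the port accepts connections; -w 3: wait at most 3 seconds
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}

# Usage, from any node in the cluster:
#   check_zookeeper_nodes 2181
```

The function returns a non-zero status if any node fails, so it can also be used in a script that should stop when the network is not ready.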

So our network setup and cluster config are almost done. The next thing is to set up the data directory and check whether the cluster works properly.

In the Kafka directory, the Zookeeper configuration file is located at kafka/config/zookeeper.properties; set it to the following content.

dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=128
initLimit=10
syncLimit=5
tickTime=6000
server.1=<zookeeper_1_IP>:2888:3888
server.2=<zookeeper_2_IP>:2888:3888
server.3=<zookeeper_3_IP>:2888:3888
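A note on the timing values: tickTime is in milliseconds, while initLimit and syncLimit are counted in ticks, so the real timeouts are the product of the two. The sketch below reads those three keys from a properties file and prints the timeouts in seconds. The helper name is made up, and the demo uses a scratch file holding the values from this article.

```shell
# Hypothetical helper: read tickTime, initLimit and syncLimit from a
# properties file and print the resulting timeouts in seconds.
zk_timeouts() {
  local file="$1"
  awk -F= '
    $1 == "tickTime"  { tick_ms = $2 }
    $1 == "initLimit" { init_ticks = $2 }
    $1 == "syncLimit" { sync_ticks = $2 }
    END {
      printf "initLimit timeout: %d s\n", init_ticks * tick_ms / 1000
      printf "syncLimit timeout: %d s\n", sync_ticks * tick_ms / 1000
    }
  ' "$file"
}

# Demo with the values used in this article.
demo_props="$(mktemp)"
printf 'tickTime=6000\ninitLimit=10\nsyncLimit=5\n' > "$demo_props"
zk_timeouts "$demo_props"   # 10 * 6000 ms = 60 s and 5 * 6000 ms = 30 s
rm -f "$demo_props"
```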

Each Zookeeper node must have the /data/zookeeper path configured as dataDir above, so we create the path and change its owner.

# create data directory for zookeeper
sudo mkdir -p /data/zookeeper
sudo chown -R ubuntu:ubuntu /data/

For each Zookeeper in the cluster, there should be a file called myid with a unique ID. For Zookeeper 1:

echo "1" > /data/zookeeper/myid

Do the same on Zookeeper 2 and 3 with IDs 2 and 3 (note that each ID must match the number of that server's server.N entry in the config file).
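Since a wrong myid is an easy mistake to make on cloned instances, the ID could also be derived from the config file instead of typed by hand. The following is a sketch under my own assumptions: write_myid is a hypothetical helper, and the host argument must be spelled exactly as it appears in the server.N line (IP or hostname).

```shell
# Hypothetical helper: find the server.N entry that matches this host
# and write N into the myid file under the data directory.
write_myid() {
  local host="$1"       # host or IP exactly as written in the config
  local config="$2"     # path to zookeeper.properties
  local data_dir="$3"   # must match dataDir in the config
  local id
  id="$(awk -F= -v h="$host" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, parts, ":")            # value looks like host:2888:3888
      if (parts[1] == h) { sub(/^server\./, "", $1); print $1 }
    }' "$config")"
  if [ -z "$id" ]; then
    echo "no server.N entry for $host in $config" >&2
    return 1
  fi
  echo "$id" > "$data_dir/myid"
}

# Usage on the second node, assuming the config lists it by this name:
#   write_myid zookeeper2 config/zookeeper.properties /data/zookeeper
```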

The cluster setup is now done, so check whether the nodes work together. Open the shell on zookeeper 1 with bin/zookeeper-shell.sh zookeeper1:2181, then run the command create /my-node "testing"

Running ls / in a shell connected to zookeeper 2 shows that my-node has been created there as well. You can also run the command echo stat | nc <hostname> <port>. This shows the current state of that Zookeeper server, including its connected clients, its mode (leader or follower), and its znode count. If the connection between the servers is working, every server answers and reports a mode, which indicates that the cluster is working properly.
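To compare the nodes at a glance, the Mode line can be pulled out of the stat output. In a healthy 3-node cluster, one server should report leader and the other two follower. The helper name below is my own, and the usage loop assumes the hostnames from /etc/hosts and client port 2181.

```shell
# Hypothetical helper: extract the value of the "Mode:" line from
# Zookeeper "stat" output read on stdin.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Usage on a cluster node (needs nc and a running cluster):
#   for h in zookeeper1 zookeeper2 zookeeper3; do
#     echo "$h: $(echo stat | nc "$h" 2181 | zk_mode)"
#   done
```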

So we have successfully configured the Zookeeper cluster. The next part will create a Kafka cluster and practice on the whole cluster. Goodbye and good luck.

Source : Viblo
