I. Situation
The default networking model of Amazon Elastic Kubernetes Service (EKS) at cluster creation:
- Normally, a Kubernetes CNI plugin allocates each pod an address from an intra-cluster (overlay) IP range, without consuming addresses from the host network range.
- However, AWS EKS uses the Amazon VPC CNI by default, which assigns each pod an IP address from the primary subnet (the subnet the primary ENI is attached to), which is usually the host/node's own network range. This becomes a problem when the CIDR range of the planned host subnet does not hold enough IPv4 addresses for both the pods and all other resources.
- Not only that, the VPC CNI plugin also keeps a warm pool of IP addresses on each node for quick assignment to new pods. Each instance type has its own limit on the number of network interfaces (detailed list here), which results in a specific max-pods value for each instance type (max-pods details here); see the sketch after this list for the arithmetic.
- In addition, in terms of security, by default a separate security group cannot be used for the primary network interface (the node's network port) versus the secondary network interfaces (the ports that back the pods), because the VPC CNI plugin automatically applies the same security group to both types of network interfaces.
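To make the max-pods arithmetic concrete, here is a minimal sketch (the formula is the one EKS documents; m5.large is only an example instance type):

```
# Real AWS CLI call: query an instance type's ENI and per-ENI IPv4 limits
aws ec2 describe-instance-types --instance-types m5.large \
  --query 'InstanceTypes[0].NetworkInfo.{ENIs:MaximumNetworkInterfaces,IPv4PerENI:Ipv4AddressesPerInterface}' \
  --output table

# max-pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# (each ENI reserves one IP as its own primary address; +2 covers the
#  host-networked aws-node and kube-proxy pods)
enis=3          # m5.large
ips_per_eni=10  # m5.large
echo $(( enis * (ips_per_eni - 1) + 2 ))   # prints 29
```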
II. Satisfying EKS's IPv4 thirst
By extending the VPC with a second CIDR range combined with CNI custom networking.
1. Secondary CIDR block
Amazon Elastic Kubernetes Service (EKS) allows a cluster to be created in a VPC with a second IPv4 CIDR block, and the 100.64.0.0/10 and 198.19.0.0/16 ranges can be used exclusively for pods. These ranges lie outside the commonly used intranet ranges and are not routable on the internet (non-routable). This increases the number of IPs available to the cluster without overlapping the internal IP range.
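Before picking a range, it can help to confirm it does not overlap anything already associated with the VPC; a minimal sketch (vpc-xxxxxxxx is a placeholder for your VPC ID):

```
# List every CIDR block currently associated with the VPC:
aws ec2 describe-vpcs --vpc-ids vpc-xxxxxxxx \
  --query 'Vpcs[0].CidrBlockAssociationSet[*].{CIDR:CidrBlock,State:CidrBlockState.State}' \
  --output table
```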
2. Combination of secondary-CIDR-block and CNI-Custom-networking
The CNI custom networking feature is the solution to all the problems outlined in the Situation section. Moreover, it also supports the case where nodes in public subnets need their pods placed in private subnets. Specifically, CNI custom networking assigns pod IPs from the VPC's secondary CIDR block, and the ENIConfig custom resource lets you specify exactly which subnets and security groups the pod network interfaces may use.
3. Step-by-step instructions for converting an existing cluster
Prerequisites:
- The user account has sufficient permissions to operate on VPC and EKS.
- AWS CLI credentials and the EKS kubectl context are already configured.
- The Amazon VPC CNI must be version 1.6.3-eksbuild.2 or later; check by running the following command:
```
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni: | cut -d "/" -f 2
```
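The output is the image name and tag; for example (the version shown here is hypothetical):

```
amazon-k8s-cni:v1.7.10
```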
If the version is lower, follow the upgrade instructions to update the CNI first.
3.2. Configuring the secondary CIDR block
Before configuring EKS, we need to add secondary CIDR blocks to the VPC and make sure they are properly configured with tags and route tables.
There are some restrictions on secondary CIDR blocks when extending a VPC; see details here.
3.2.1. Using the CLI
Add the secondary CIDR block. The following two commands add the 100.64.0.0/16 CIDR range to the EKS cluster's VPC. Change my-eks-cluster to the name of the existing EKS cluster.
```
vpc_id=$(aws eks describe-cluster --name my-eks-cluster --query "cluster.resourcesVpcConfig.vpcId" --output text)
aws ec2 associate-vpc-cidr-block --vpc-id $vpc_id --cidr-block 100.64.0.0/16
```
Create the subnets. For an environment whose instances span 3 subnets (3 different AZs), create subnets in the 3 corresponding AZs. Change my-eks-cluster to the name of the existing EKS cluster:
```
export POD_AZS=($(aws ec2 describe-instances --filters "Name=tag-key,Values=eks:cluster-name" "Name=tag-value,Values=my-eks-cluster*" --query 'Reservations[*].Instances[*].[Placement.AvailabilityZone]' --output text | sort | uniq))
echo ${POD_AZS[@]}
new_subnet_id_1=$(aws ec2 create-subnet --cidr-block 100.64.0.0/19 --vpc-id $vpc_id --availability-zone ${POD_AZS[0]} --query 'Subnet.SubnetId' --output text)
export new_subnet_id_1
new_subnet_id_2=$(aws ec2 create-subnet --cidr-block 100.64.32.0/19 --vpc-id $vpc_id --availability-zone ${POD_AZS[1]} --query 'Subnet.SubnetId' --output text)
export new_subnet_id_2
new_subnet_id_3=$(aws ec2 create-subnet --cidr-block 100.64.64.0/19 --vpc-id $vpc_id --availability-zone ${POD_AZS[2]} --query 'Subnet.SubnetId' --output text)
export new_subnet_id_3
```
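Section 3.2 asks for proper tags and route tables; here is a minimal sketch of tagging the new subnets for the cluster and attaching them to an existing route table (the kubernetes.io/cluster/&lt;name&gt; tag is the standard convention; rtb-xxxxxxxx is a placeholder for the route table your existing worker subnets use):

```
for subnet in $new_subnet_id_1 $new_subnet_id_2 $new_subnet_id_3; do
  # Tag the subnet so the cluster treats it as one of its own:
  aws ec2 create-tags --resources $subnet \
    --tags Key=kubernetes.io/cluster/my-eks-cluster,Value=shared
  # Route the subnet the same way as the existing worker subnets:
  aws ec2 associate-route-table --route-table-id rtb-xxxxxxxx --subnet-id $subnet
done
```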
3.2.2. Using the web console
Add the secondary CIDR block
- Visit: https://ap-southeast-1.console.aws.amazon.com/vpc/home?region=ap-southeast-1#vpcs:
- Right-click the VPC that needs the secondary CIDR block
- Select Edit CIDRs
- In the Edit CIDRs screen, select Add new IPv4 CIDR
- Enter 100.64.0.0/16, then select Save
- Visit: https://ap-southeast-1.console.aws.amazon.com/vpc/home?region=ap-southeast-1#CreateSubnet:
- In the VPC ID section, select the VPC containing the EKS cluster.
- In the Subnet settings area, create the 3 new subnets in turn (100.64.0.0/19, 100.64.32.0/19, and 100.64.64.0/19, each in a different AZ); use the Add new subnet button at the bottom right to add all 3 subnets.
3.3. Configure Kubernetes
3.3.1. Configure custom networking
Enable custom networking by setting AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true on the aws-node DaemonSet, then verify its environment:
```
kubectl set env ds aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl describe daemonset aws-node -n kube-system | grep -A5 Environment
```
The output looks similar to:

```
Environment:
  DISABLE_TCP_EARLY_DEMUX:  false
  ENABLE_IPv6:              false
Mounts:
  /host/opt/cni/bin from cni-bin-dir (rw)
Containers:
  Environment:
    ADDITIONAL_ENI_TAGS:                 {}
    AWS_VPC_CNI_NODE_PORT_SUPPORT:       true
    AWS_VPC_ENI_MTU:                     9001
    AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER:  false
    AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:  true
```
3.3.1.1. Security group
Create a separate security group and get its ID, or get the security group ID of the node/cluster, then run the following command (the ID below is an example):
```
export security_group_id="sg-0e83050f797699c99"
```
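If you do not have a dedicated group yet, here is a minimal sketch of creating one (the group name is an illustrative assumption, $vpc_id comes from section 3.2.1, and the new group starts with no ingress rules, so add rules to suit your workloads):

```
# Create an empty security group for pod ENIs and capture its ID:
security_group_id=$(aws ec2 create-security-group \
  --group-name eks-pod-eni-sg \
  --description "Security group for pod ENIs under CNI custom networking" \
  --vpc-id $vpc_id \
  --query 'GroupId' --output text)
export security_group_id
```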
3.3.1.2. Subnet IDs
Note: if you used the CLI in section 3.2.1, skip this step.
- If using the web console, follow these instructions:
- Open: https://ap-southeast-1.console.aws.amazon.com/vpc/home?region=ap-southeast-1#subnets:
- Get the IDs of the subnets (secondary CIDR) just created, substitute them for the subnet IDs below, and use the correct AZ of each subnet in the command:
```
export new_subnet_id_1="subnet-95f9d9c99f0fff2ff"
export az_1="ap-southeast-1a"
export new_subnet_id_2="subnet-930864a80d468721d"
export az_2="ap-southeast-1b"
export new_subnet_id_3="subnet-999869a89d468799d"
export az_3="ap-southeast-1c"
```
3.3.1.3. Creating and deploying the YAML files
Note: if you used the CLI in section 3.2.1, replace $az_1 with ${POD_AZS[0]} and so on, up to $az_3 with ${POD_AZS[2]}.
- Subnet1-AZ1
```
cat >$az_1.yaml <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $az_1
spec:
  securityGroups:
    - $security_group_id
  subnet: $new_subnet_id_1
EOF
kubectl apply -f $az_1.yaml
```
- Subnet2-AZ2
```
cat >$az_2.yaml <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $az_2
spec:
  securityGroups:
    - $security_group_id
  subnet: $new_subnet_id_2
EOF
kubectl apply -f $az_2.yaml
```
- Subnet3-AZ3
```
cat >$az_3.yaml <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $az_3
spec:
  securityGroups:
    - $security_group_id
  subnet: $new_subnet_id_3
EOF
kubectl apply -f $az_3.yaml
```
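Since the three manifests differ only in their variables, they can also be generated in one loop; a sketch assuming the $az_N and $new_subnet_id_N variables from the previous steps:

```
for i in 1 2 3; do
  az_var="az_$i"; subnet_var="new_subnet_id_$i"
  az=${!az_var}; subnet=${!subnet_var}   # bash indirect expansion
  cat >$az.yaml <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $az
spec:
  securityGroups:
    - $security_group_id
  subnet: $subnet
EOF
  kubectl apply -f $az.yaml
done
```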
3.3.1.4. Check
```
kubectl get ENIConfigs
```
You should get a result like:
```
NAME              AGE
ap-southeast-1a   5m
ap-southeast-1b   5m
ap-southeast-1c   5m
```
3.3.2. Automatic configuration with Availability Zone labels
You can let Kubernetes automatically apply the matching ENIConfig to each worker node according to its Availability Zone (AZ). This works because the ENIConfig names were chosen to match the AZ names of the subnets (ap-southeast-1a, ap-southeast-1b, ap-southeast-1c), and Kubernetes automatically adds the label topology.kubernetes.io/zone to each worker node with its AZ name. You can check with the command:
```
kubectl describe nodes | grep 'topology.kubernetes.io/zone'
```
The output is similar to:
```
topology.kubernetes.io/zone=ap-southeast-1a
topology.kubernetes.io/zone=ap-southeast-1b
topology.kubernetes.io/zone=ap-southeast-1c
```
We therefore take advantage of this label to automatically apply the ENIConfig matching each node's AZ with the command:
```
kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone
```
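To confirm the variable took effect, mirror the earlier verification step:

```
kubectl describe daemonset aws-node -n kube-system | grep ENI_CONFIG_LABEL_DEF
```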
Note: the label failure-domain.beta.kubernetes.io/zone would also work here, but it is deprecated and planned for removal, so always use topology.kubernetes.io/zone instead.
3.4. Apply the configuration to worker nodes
We need to replace all running worker nodes with new nodes so that they start up with the newly created network configuration. In a dev/test environment, where downtime is no concern, you can freely delete all the nodes and launch new ones. In a production environment, please consider a few of my suggestions below.
NOTE DOWNTIME: with EKS clusters running production, nodes have to be terminated, so be very careful at this point:
- Deploy at off-peak hours.
- Make sure that each deployment has a minimum number of pods spread across the nodes.
- Rolling-update the nodes: launch a new one, then delete an old one. Best of all, first launch double the current number of nodes, then scale in to the original count and let the Auto Scaling group handle the rest (by default it prefers terminating instances with the oldest launch configuration; drain each node, for example with kubectl drain, before it is terminated so that its running pods are rescheduled onto other nodes). A sketch follows this list.
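A minimal sketch of the scale-out/scale-in approach (the Auto Scaling group name and node counts are placeholders; adjust them to your cluster):

```
# Scale out so new nodes (which pick up the ENIConfig) join first:
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-eks-nodes-asg --desired-capacity 6

# Wait for the new nodes to become Ready, drain the old ones, then scale in:
kubectl get nodes
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-eks-nodes-asg --desired-capacity 3

# Finally, verify pods now get IPs from the secondary CIDR (100.64.0.0/16):
kubectl get pods --all-namespaces -o wide | grep ' 100\.64\.'
```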