Network SLA and QoS in Kubernetes

Updated: Jul 29

A quick glance at QoS (Quality of service) basics


QoS has been used extensively for quite some time, especially in routers and switches. The key goal is to enable organizations to prioritize traffic, which includes offering dedicated bandwidth, controlled jitter, and lower latency. These technologies are vital for enhancing the performance of business applications, wide-area networks (WANs) and service provider networks, and they are now increasingly seen in cloud-native environments as well. Let's look at the main building blocks of network QoS.

For the sake of this discussion, we will focus on what is known as traffic policing, as it is one of the major QoS features used to enforce SLAs (service level agreements).


Traffic policing checks whether the incoming traffic at an input port conforms to the traffic rates agreed upon between the customer and the IP network service provider. It consists of metering the traffic against preset traffic rates and marking or remarking the packets based on the outcome of the metering. Depending on that outcome, packets may also need to be dropped.


Typically, traffic policing checks the rate of the incoming traffic against either a single rate, the Committed Information Rate (CIR), or two rates, the CIR and the Peak Information Rate (PIR). To “police” the CIR and the PIR, traffic policing uses three auxiliary parameters: the Peak Burst Size (PBS), the Committed Burst Size (CBS) and the Excess Burst Size (EBS). There are two common traffic policing schemes: the Single-Rate Three-Color Marker (srTCM) and the Two-Rate Three-Color Marker (trTCM).
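
To make the metering concrete, here is a minimal sketch of a single-rate three-color marker in Go, in the spirit of RFC 2697. The committed rate, burst sizes and packet length used below are illustrative assumptions, not values from any particular policer implementation.

package main

import (
    "fmt"
    "time"
)

type Color int

const (
    Green  Color = iota // conforms to the CIR within the CBS
    Yellow              // exceeds the CBS but fits in the EBS
    Red                 // exceeds both burst allowances
)

// srTCM meters traffic against a single rate (CIR) using two buckets:
// a committed bucket of size CBS and an excess bucket of size EBS,
// both replenished at the CIR.
type srTCM struct {
    cir      float64 // Committed Information Rate, bytes/sec
    cbs, ebs float64 // Committed / Excess Burst Size, bytes
    tc, te   float64 // current tokens in the committed / excess buckets
    last     time.Time
}

// mark refills the buckets for the elapsed time and colors the packet.
func (m *srTCM) mark(pktLen float64, now time.Time) Color {
    m.tc += m.cir * now.Sub(m.last).Seconds()
    m.last = now
    if m.tc > m.cbs { // committed-bucket overflow spills into the excess bucket
        m.te += m.tc - m.cbs
        m.tc = m.cbs
        if m.te > m.ebs {
            m.te = m.ebs
        }
    }
    switch {
    case pktLen <= m.tc:
        m.tc -= pktLen
        return Green
    case pktLen <= m.te:
        m.te -= pktLen
        return Yellow
    default:
        return Red
    }
}

func main() {
    // Assumed values: 500 Mbps CIR (expressed in bytes/sec) with 64 KB committed and excess bursts.
    m := &srTCM{cir: 500e6 / 8, cbs: 64 * 1024, ebs: 64 * 1024, tc: 64 * 1024, te: 64 * 1024, last: time.Now()}
    fmt.Println(m.mark(1500, time.Now())) // prints 0 (Green): a 1500-byte packet fits the committed burst
}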


The need for QoS in cloud-native/Kubernetes environments


As we all know, the network is one of the most important components of a microservice architecture. Every Deployment's pods need to connect to pods from a hierarchy of other Deployments. But different microservice-based application deployments (or their respective pods/containers) have different characteristics from a business point of view. Let’s consider the following hypothetical scenario:



In this simple scenario, Pod1B, which runs a highly critical business application, can easily be disrupted by the non-critical Pod1C. It is easy to imagine what happens when thousands of pods are running in a cluster. Apart from the critical issue of connectivity and bandwidth disruption, this also causes what is known as the noisy neighbor problem, which adversely affects various performance metrics of a Kubernetes cluster.


Highly efficient QoS for cloud-native workloads with CNI


NetLOX, which provides a CNI for high-performance cloud-native environments, implements comprehensive QoS features to avoid such issues. It is one of the first CNIs to work in a hybrid-mode architecture:

  • eBPF-only mode - Provides a native eBPF/XDP policer implementation that works accurately and efficiently across a wide range of traffic profiles. Run-of-the-mill policer/metering implementations usually don’t handle varying traffic profiles well (ref)

  • DPU mode - Seamlessly integrates with and offloads the policer logic to DPU hardware blocks on top of its eBPF engine. The biggest advantage of using a DPU is that running such QoS policies consumes 0% of the server's CPU cores

With these advanced QoS features under its belt, the CNI now allows Kubernetes operators to administer policy-based network QoS SLAs at the Deployment/Pod level.
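
Whether it runs as eBPF/XDP bytecode or is offloaded to a DPU, the data-path policer ultimately enforces the cap with a per-packet pass/drop decision. The following Go sketch illustrates that enforcement stage with a simplified single token bucket (no color marking); the rate and burst values are assumptions, and this is not the actual in-kernel or DPU implementation.

package main

import (
    "fmt"
    "time"
)

// bucket is a token bucket refilled at rate bytes/sec and capped at burst bytes.
type bucket struct {
    rate, burst, tokens float64
    last                time.Time
}

// allow refills the bucket for the elapsed time and returns true if the packet
// fits in the available tokens (forward), false otherwise (drop).
func (b *bucket) allow(pktLen float64, now time.Time) bool {
    b.tokens += b.rate * now.Sub(b.last).Seconds()
    b.last = now
    if b.tokens > b.burst {
        b.tokens = b.burst
    }
    if pktLen <= b.tokens {
        b.tokens -= pktLen
        return true // analogous to an XDP_PASS verdict
    }
    return false // analogous to an XDP_DROP verdict
}

func main() {
    // Assumed values: a 500 Mbps cap (in bytes/sec) with a 64 KB burst allowance.
    b := &bucket{rate: 500e6 / 8, burst: 64 * 1024, tokens: 64 * 1024, last: time.Now()}
    fmt.Println(b.allow(1500, time.Now())) // true: a 1500-byte packet fits the burst
}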



Example usage in Kubernetes


The following is an example of a QoS-policy-enabled Deployment using the CNI. It creates a ReplicaSet that brings up three nginx Pods, with ingress QoS admission control set to a maximum of 500 Mbps:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        iqp: 500M   # ingress QoS policy label: cap ingress at 500 Mbps
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

The Deployment can be applied as usual:

# kubectl apply -f nginx-deployment.yaml
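
How the CNI translates the iqp label into a policer configuration is internal to the CNI. Purely as an illustration, a label value such as 500M could be parsed into a bit rate along the following lines; the parseRate helper and its unit suffixes are hypothetical and are not LoxiCNI's API.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseRate accepts hypothetical label values such as "500M", "1G" or "750K"
// and returns the rate in bits/sec.
func parseRate(v string) (uint64, error) {
    mult := uint64(1)
    switch {
    case strings.HasSuffix(v, "K"):
        mult, v = 1_000, strings.TrimSuffix(v, "K")
    case strings.HasSuffix(v, "M"):
        mult, v = 1_000_000, strings.TrimSuffix(v, "M")
    case strings.HasSuffix(v, "G"):
        mult, v = 1_000_000_000, strings.TrimSuffix(v, "G")
    }
    n, err := strconv.ParseUint(v, 10, 64)
    if err != nil {
        return 0, fmt.Errorf("invalid rate %q: %w", v, err)
    }
    return n * mult, nil
}

func main() {
    bps, _ := parseRate("500M") // the iqp label value from the manifest above
    fmt.Println(bps)            // 500000000 bits/sec
}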

Network QoS policy admission control in action


To check and verify how the QoS policy works in real life, we create three pods: iperf1 (capped at 500 Mbps) and iperf2/iperf3 with no cap:

apiVersion: v1
kind: Pod
metadata:
  name: iperf1
  labels:
    app: iperf
    iqp: 500M   # ingress QoS policy label: cap ingress at 500 Mbps
spec:
  containers:
  - image: eyes852/ubuntu-iperf-test:0.5
    command:
      - sleep
      - "3600"
    imagePullPolicy: Always
    name: iperf
    securityContext:
       capabilities:
         add:
           - NET_ADMIN
  restartPolicy: Always
  nodeSelector:
    kubernetes.io/hostname: node3
---
apiVersion: v1
kind: Pod
metadata:
  name: iperf2
  labels:
    app: iperf
spec:
  containers:
  - image: eyes852/ubuntu-iperf-test:0.5
    command:
      - sleep
      - "3600"
    imagePullPolicy: Always
    name: iperf
    securityContext:
       capabilities:
         add:
           - NET_ADMIN
  restartPolicy: Always
  nodeSelector:
    kubernetes.io/hostname: node4
---
apiVersion: v1
kind: Pod
metadata:
  name: iperf3
  labels:
    app: iperf
spec:
  containers:
  - image: eyes852/ubuntu-iperf-test:0.5
    command:
      - sleep
      - "3600"
    imagePullPolicy: Always
    name: iperf
    securityContext:
       capabilities:
         add:
           - NET_ADMIN
  restartPolicy: Always
  nodeSelector:
    kubernetes.io/hostname: node5

Step 1: Confirm the pods are created


netlox@nd:~$ kubectl get pods -owide
NAME     READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
iperf1   1/1     Running   0          2m    10.177.0.223   node3   <none>           <none>
iperf2   1/1     Running   0          2m    10.177.1.142   node4   <none>           <none>
iperf3   1/1     Running   0          2m    10.177.2.154   node5   <none>           <none>

Step 2: Check traffic performance between pods with no QoS policy

netlox@nd:~$ kubectl exec -i -t iperf3 -- /bin/bash
root@iperf3:/# iperf -s 

netlox@nd:~$ kubectl exec -i -t iperf2 -- /bin/bash
root@iperf2:/#  iperf -c 10.177.2.154 -t 5
------------------------------------------------------------
Client connecting to 10.177.2.154, TCP port 5001
TCP window size: 2.76 MByte (default)
------------------------------------------------------------
[  3] local 10.177.1.142 port 54910 connected with 10.177.2.154 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  9.59 GBytes  9.1 Gbits/sec

Step 3: Check traffic performance between pods with the QoS policy applied

netlox@nd:~$ kubectl exec -i -t iperf3 -- /bin/bash
root@iperf3:/# iperf -s 


netlox@nd:~$ kubectl exec -i -t iperf1 -- /bin/bash
root@iperf1:/#  iperf -c 10.177.2.154 -t 5
------------------------------------------------------------
Client connecting to 10.177.2.154, TCP port 5001
TCP window size: 2.76 MByte (default)
------------------------------------------------------------
[  3] local 10.177.0.223 port 54910 connected with 10.177.2.154 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  9.38 GBytes  496 Mbits/sec

This concludes our look at how easily and effectively policy-based network QoS can be managed and applied in cloud-native environments. Although this blog focused on Pod/Deployment-level QoS, LoxiCNI provides a wide variety of other QoS features, such as marking, queuing and scheduling, to meet various end-to-end network QoS requirements in Kubernetes. We believe that in the days to come, it will play an increasingly important role in trading, banking and telco cloud-native environments.



