Why Kubernetes Pod Quality of Service matters

I (painfully) discovered a few weeks ago why specifying how much CPU and RAM a container needs is so important on Kubernetes. In this post, I will explain why ;)

1. The basics

Each container in a Kubernetes pod can specify the following:

CPU request
Memory request
CPU limit
Memory limit

The pod's requests and limits are the sums of the requests and limits of all its containers.

When a pod is created, the Kubernetes scheduler selects a node that has enough unreserved CPU and memory to satisfy the pod's requests. Limits, on the other hand, are enforced at runtime: a container that tries to use more CPU than its limit is throttled, while a container that uses more memory than its limit becomes a candidate for termination and can therefore be killed at any time.
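To make the scheduling side concrete, here is a minimal Python sketch of how pod requests add up and how they compare against what a node has left. It is not the scheduler's actual code: the helper names (parse_cpu, parse_memory, pod_requests, fits) are made up for this post, only the most common quantity suffixes are handled, and init containers and pod overhead are ignored.

```python
from typing import Dict, List

# Common suffixes for Kubernetes memory quantities (binary and decimal).
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def parse_cpu(quantity: str) -> int:
    """Return a CPU quantity in millicores ("250m" -> 250, "0.5" -> 500)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def parse_memory(quantity: str) -> int:
    """Return a memory quantity in bytes ("128Mi" -> 134217728)."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def pod_requests(containers: List[dict]) -> Dict[str, int]:
    """Sum the requests of every container; a missing request counts as 0."""
    cpu = 0
    memory = 0
    for container in containers:
        requests = (container.get("resources") or {}).get("requests") or {}
        cpu += parse_cpu(requests.get("cpu", "0"))
        memory += parse_memory(requests.get("memory", "0"))
    return {"cpu_millicores": cpu, "memory_bytes": memory}


def fits(node_free: Dict[str, int], pod: Dict[str, int]) -> bool:
    """True if the node still has room for the pod's requests (simplified)."""
    return (pod["cpu_millicores"] <= node_free["cpu_millicores"]
            and pod["memory_bytes"] <= node_free["memory_bytes"])


containers = [
    {"name": "app", "resources": {"requests": {"cpu": "250m", "memory": "128Mi"}}},
    {"name": "sidecar", "resources": {"requests": {"cpu": "0.5", "memory": "64Mi"}}},
]
total = pod_requests(containers)
node_free = {"cpu_millicores": 1000, "memory_bytes": parse_memory("1Gi")}
print(total, fits(node_free, total))
```

Note that a container without any request counts for zero here, which is exactly why a node can end up packed with pods that use far more memory than the scheduler accounted for.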

2. The Quality of Service

This is where the QoS classes really matter:

a pod in which every container has both CPU and memory limits set, each equal to the corresponding request, will be classified as Guaranteed
a pod in which at least one container has a request or a limit set, but which does not meet the Guaranteed criteria, will be classified as Burstable
a pod in which no container has any request or limit set will be classified as BestEffort.

As you probably guessed, Guaranteed > Burstable > BestEffort. A Guaranteed pod is guaranteed not to be killed unless it exceeds its limits (or the cluster is under memory pressure and there are no lower-priority pods left to kill). This is definitely what you want for your most important pods.
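The classification rules above can be sketched in a few lines of Python. This is only an illustration, not the code Kubernetes runs: the function name qos_class is made up, the containers are plain dictionaries shaped like the containers list of a pod spec, and quantities are compared as plain strings (so "1" and "1000m" would wrongly count as different).

```python
from typing import List


def qos_class(containers: List[dict]) -> str:
    """Return the QoS class a pod with these containers would get."""
    any_set = False      # did we see at least one request or limit?
    guaranteed = True    # stays True only if every container qualifies

    for container in containers:
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}

        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(name, limit)

            if limit is not None or request is not None:
                any_set = True
            # Guaranteed needs a limit, and a request equal to it.
            if limit is None or request != limit:
                guaranteed = False

    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


same = {"cpu": "100m", "memory": "64Mi"}
print(qos_class([{"resources": {"requests": same, "limits": same}}]))
print(qos_class([{"resources": {"requests": {"cpu": "100m"}}}]))
print(qos_class([{"name": "no-resources-at-all"}]))
```

On a real cluster there is no need to guess: the class Kubernetes assigned is exposed in the pod's status.qosClass field.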

3. What happened in my case

At Hunter, we moved from OVH to DigitalOcean, and took the opportunity to deploy our Kubernetes cluster on instances with less RAM than our previous nodes. As a result, our new cluster was under constant memory pressure, whereas the old one always had plenty of memory left.

We had installed Flannel on our brand new cluster using the official guide, which, as you may see, doesn’t set requests and limits for the kube-flannel container. The result is devastating if you run several pods with requests and limits set.

As explained earlier, the kube-flannel-ds pod will be classified as BestEffort. For this reason, Kubernetes will kill it when it needs to reclaim memory, which breaks the cluster's internal network. Pods won’t be able to ping each other, Services won’t be reachable from pods, etc., because the pod network no longer updates the node routing table…

This had never happened on the old cluster (with the same setup) because it had never been under such memory pressure.

4. How to prevent this

You should always set resource requests and limits on your pods, unless you are willing to let Kubernetes kill them when the cluster is under memory pressure.
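One way to catch the problem before it bites is to check manifests for containers that lack requests or limits. Here is a minimal Python sketch of such a check. The function names (pod_spec_of, missing_resources) are made up for this post, and the DaemonSet below is a stripped-down stand-in that only reuses the kube-flannel-ds and kube-flannel names; it is not the real Flannel manifest.

```python
from typing import Any, Dict, List


def pod_spec_of(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return the pod spec of a Pod, or of a workload's pod template."""
    spec = manifest.get("spec") or {}
    if manifest.get("kind") == "Pod":
        return spec
    # Deployments, DaemonSets, StatefulSets... nest it under spec.template.
    return (spec.get("template") or {}).get("spec") or {}


def missing_resources(manifest: Dict[str, Any]) -> List[str]:
    """List every CPU/memory request or limit that a container lacks."""
    problems = []
    for container in pod_spec_of(manifest).get("containers") or []:
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            for name in ("cpu", "memory"):
                if name not in values:
                    # "requests" -> "request", "limits" -> "limit"
                    problems.append(
                        f"{container.get('name', '?')}: no {name} {section[:-1]}"
                    )
    return problems


# Illustrative stand-in for a DaemonSet shipped without any resources.
daemonset = {
    "kind": "DaemonSet",
    "metadata": {"name": "kube-flannel-ds"},
    "spec": {"template": {"spec": {"containers": [{"name": "kube-flannel"}]}}},
}

for problem in missing_resources(daemonset):
    print(problem)
```

Running this kind of check against every manifest before applying it would have flagged the kube-flannel container long before the cluster ran short of memory. The values to set are a separate question and depend on what the container actually uses.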

You should also spread the word every time you follow a tutorial that does not set requests or limits on containers (even if I haven’t been really successful myself ;)).