Deep-dive into Kubernetes Scheduling

Chaithanya Kopparthi
3 min read · Apr 3, 2022

When deploying large-scale applications with hundreds of microservices in Kubernetes, controlling the placement and priority of workloads becomes necessary due to several constraints. Kubernetes offers multiple ways to do this.

Photo by David Levêque on Unsplash

In this article, we cover some of the ways to control the placement of pods in Kubernetes:

  • Affinity
  • Taints
  • Priority classes

Affinity:

Using affinities, one can control the placement of pods based on the labels assigned to pods or nodes in the Kubernetes cluster. Currently, there are two types of affinity policies: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution.

requiredDuringSchedulingIgnoredDuringExecution:

When a workload is created with this policy, the given rules must be satisfied for the pod to be scheduled; otherwise, the pod will remain in the Pending state.

preferredDuringSchedulingIgnoredDuringExecution:

When a workload is created with this policy, the scheduler will first try to satisfy the affinity rules, but it will still schedule the pod even if it cannot satisfy them, so the pod will end up running either way.

Affinities are categorized into two types:

  • Pod Affinities
  • Node Affinities

Using affinities, one can attract or repel pods: affinities are used to attract pods to certain nodes, whereas anti-affinities are used to repel them.

Sample definition of the different types of affinities:

spec:
  affinity:
    nodeAffinity:
      # Attracts the pods to a matching node
      requiredDuringSchedulingIgnoredDuringExecution:
        # strict enforcement of the policy
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
        # pods will only be scheduled on nodes with the label kubernetes.io/os: linux
    podAntiAffinity:
      # Prevents the pods from being scheduled on the same node
      preferredDuringSchedulingIgnoredDuringExecution:
      # no strict enforcement of the policy
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          # pods with the label security: S2 will not land on the same node
          topologyKey: kubernetes.io/hostname
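For the nodeAffinity rule above to match, the target nodes must carry the referenced label. The kubelet sets kubernetes.io/os automatically; custom labels can be added and inspected with kubectl (the node name kube-node-0 and the label disktype=ssd below are just illustrative):

```shell
# Add a custom label to a node (for use in your own affinity rules)
kubectl label nodes kube-node-0 disktype=ssd

# Verify which labels the node carries
kubectl get node kube-node-0 --show-labels
```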

Taints:

Taints are the opposite of node affinities: they repel pods from a particular node. If pods need to be scheduled on a tainted node, matching tolerations must be specified on the pods.

A node can be tainted using the following command:

kubectl taint nodes kube-node-0 mode=maintenance:NoSchedule

After adding the above taint, only pods with a toleration matching the taint mode=maintenance can be scheduled onto node kube-node-0.

To add the toleration to a pod, add the below snippet to the podSpec:

tolerations:
- key: "mode"
  operator: "Equal"
  value: "maintenance"
  effect: "NoSchedule"
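Once maintenance is done, the taint can be removed by repeating the same taint specification with a trailing dash:

```shell
# Remove the mode=maintenance:NoSchedule taint from kube-node-0
kubectl taint nodes kube-node-0 mode=maintenance:NoSchedule-
```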

Priority classes:

Using priority classes, priorities can be assigned to pods as an integer value; the higher the value, the greater the priority. While scheduling, the kube-scheduler places all pending pods in a queue, picks the pods with the highest priority, assigns them to nodes first, and then tries to schedule the lower-priority pods afterwards.

Pods with higher priorities can also preempt (evict) pods with lower priority if they cannot otherwise be scheduled on any of the nodes. To disable this default behavior, add the preemptionPolicy: Never option to the priority class.
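As a sketch, a non-preempting priority class could look like this (the name high-priority-nonpreempting is illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 100000
# This class is still scheduled ahead of lower-priority pods,
# but it will never evict running pods to make room for itself
preemptionPolicy: Never
description: "High priority, but will not preempt running pods."
```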

Sample creation of a priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: diamond
value: 1000
description: "PriorityClass for all the core application pods."

To assign this PriorityClass to a workload, the priorityClassName field should be specified in the pod spec.

Sample pod with the PriorityClass:

apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
  labels:
    app: frontend-pod
spec:
  containers:
  - name: frontend-pod
    image: nginx
  priorityClassName: diamond

PriorityClass also has an optional field globalDefault; if this is set to true, pods created without any priorityClassName will take on this PriorityClass. Only one PriorityClass in the cluster can have globalDefault set to true.
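A cluster-wide default might be defined like this (the name default-priority and the value are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 10
# Applied automatically to any pod that does not set priorityClassName
globalDefault: true
description: "Default priority for pods without an explicit priorityClassName."
```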
