Scheduling
Exam relevance: CKA ✅ (Workloads & Scheduling — 15%) | CKAD ✅ (Application Environment, Configuration and Security — 25%)
How the Scheduler Works (Recap)
When a Pod has no nodeName, the scheduler:
- Filters — eliminates nodes that can't run the pod
- Scores — ranks remaining nodes by preference
- Binds — assigns the pod to the best node (sets nodeName)
You can influence every step of this process.
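The three phases can be sketched in a few lines of Python. This is an illustrative toy only — the real scheduler runs many filter and score plugins per phase — and the `free_cpu_m` field and greedy most-free-CPU scoring are assumptions made for the sketch:

```python
# Toy sketch of the scheduler's three phases (illustration only;
# the real kube-scheduler runs many plugins per phase).
def schedule(pod, nodes):
    # 1. Filter: drop nodes that cannot run the pod
    feasible = [n for n in nodes if n["free_cpu_m"] >= pod["cpu_m"]]
    if not feasible:
        return None  # no feasible node: the pod stays Pending
    # 2. Score: rank the remaining nodes (here: most free CPU wins)
    best = max(feasible, key=lambda n: n["free_cpu_m"])
    # 3. Bind: record the decision by setting nodeName on the pod
    pod["nodeName"] = best["name"]
    return best["name"]

nodes = [{"name": "worker-1", "free_cpu_m": 500},
         {"name": "worker-2", "free_cpu_m": 2000}]
pod = {"cpu_m": 1000}
print(schedule(pod, nodes))  # worker-2
```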
nodeSelector — Simple Node Selection
The simplest way to constrain a pod to specific nodes. Match nodes by their labels.
```bash
# Add a label to a node
kubectl label node worker-1 disktype=ssd

# Verify
kubectl get nodes --show-labels
kubectl get nodes -l disktype=ssd
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fast-app
spec:
  nodeSelector:
    disktype: ssd        # Only schedule on nodes with this label
  containers:
  - name: app
    image: nginx
```
If no node matches, the pod stays Pending.
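The matching rule itself is just a subset test: every key/value pair in the pod's nodeSelector must appear in the node's labels. A minimal sketch (the function name and sample labels are made up for illustration):

```python
# nodeSelector matching is a subset test: every key/value pair in the
# pod's nodeSelector must be present in the node's labels.
def matches(node_labels, node_selector):
    return all(node_labels.get(k) == v for k, v in node_selector.items())

node = {"disktype": "ssd", "zone": "us-east-1a"}
print(matches(node, {"disktype": "ssd"}))  # True
print(matches(node, {"disktype": "hdd"}))  # False -> pod stays Pending
```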
Node Affinity — Advanced Node Selection
Node affinity is a more expressive version of nodeSelector. It supports:
- Required rules (must match — hard constraint)
- Preferred rules (try to match — soft constraint)
- Set-based operators (In, NotIn, Exists, DoesNotExist, Gt, Lt)
requiredDuringSchedulingIgnoredDuringExecution
Pod MUST be scheduled on a matching node. If no node matches, pod stays Pending.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: In
            values:
            - "true"
            - "yes"
  containers:
  - name: app
    image: tensorflow/tensorflow:latest-gpu
```
preferredDuringSchedulingIgnoredDuringExecution
The scheduler TRIES to place the pod on a matching node, but will place it elsewhere if none match.
```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80               # Higher weight = stronger preference (1-100)
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1a
      - weight: 20
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1b
```
Operators
| Operator | Meaning |
|---|---|
| In | Label value is in the list |
| NotIn | Label value is NOT in the list |
| Exists | Label key exists (any value) |
| DoesNotExist | Label key does NOT exist |
| Gt | Label value is greater than (numeric) |
| Lt | Label value is less than (numeric) |
"IgnoredDuringExecution" — What Does It Mean?
If a node's labels change AFTER a pod is already running, the pod is NOT evicted. The rule only applies at scheduling time. (There is a planned RequiredDuringExecution that would evict — not yet stable.)
Taints and Tolerations
Taints are applied to nodes. They repel pods unless the pod has a matching toleration.
Think of it as: nodes say "stay away unless you can tolerate me."
Applying Taints
```bash
# Taint a node
kubectl taint nodes worker-1 key=value:NoSchedule

# Examples
kubectl taint nodes worker-1 env=production:NoSchedule
kubectl taint nodes worker-1 gpu=true:NoExecute

# Remove a taint (add - at the end)
kubectl taint nodes worker-1 env=production:NoSchedule-

# View taints on a node
kubectl describe node worker-1 | grep Taints
```
Taint Effects
| Effect | Behavior |
|---|---|
| NoSchedule | New pods without toleration won't be scheduled here. Existing pods stay. |
| PreferNoSchedule | Scheduler avoids this node but will use it as last resort. |
| NoExecute | New pods won't schedule AND existing pods without toleration are evicted. |
Adding Tolerations to Pods
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: production-app
spec:
  tolerations:
  - key: "env"
    operator: "Equal"       # key=value must match exactly
    value: "production"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx
```
Toleration Operators
| Operator | Meaning |
|---|---|
| Equal | Key and value must match exactly |
| Exists | Only the key needs to exist (value is ignored) |
Special Tolerations
```yaml
# Tolerate ALL taints with a specific key (any value, any effect)
tolerations:
- key: "env"
  operator: "Exists"
```

```yaml
# Tolerate EVERYTHING (run anywhere — used by DaemonSets)
tolerations:
- operator: "Exists"
```
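The matching logic behind these rules can be sketched in Python. This is a simplified model of the real semantics, written for illustration: a bare Exists toleration matches any taint, Equal requires an exact key and value match, and an omitted effect tolerates all effects.

```python
# Simplified sketch of taint/toleration matching (not the full spec).
def tolerates(tol, taint):
    if tol.get("operator") == "Exists" and "key" not in tol:
        return True  # bare Exists tolerates everything
    if tol.get("key") != taint["key"]:
        return False  # keys must match
    if tol.get("operator", "Equal") == "Equal" and tol.get("value") != taint["value"]:
        return False  # Equal also requires the value to match
    # an omitted effect tolerates any effect
    return tol.get("effect") in (None, taint["effect"])

taint = {"key": "env", "value": "production", "effect": "NoSchedule"}
print(tolerates({"key": "env", "operator": "Equal",
                 "value": "production", "effect": "NoSchedule"}, taint))  # True
print(tolerates({"key": "env", "operator": "Exists"}, taint))            # True
print(tolerates({"operator": "Exists"}, taint))                          # True
print(tolerates({"key": "gpu", "operator": "Exists"}, taint))            # False
```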
Built-in Taints
Kubernetes automatically adds these taints:
| Taint | When |
|---|---|
| node.kubernetes.io/not-ready | Node is not ready |
| node.kubernetes.io/unreachable | Node is unreachable |
| node.kubernetes.io/memory-pressure | Node is low on memory |
| node.kubernetes.io/disk-pressure | Node is low on disk |
| node.kubernetes.io/pid-pressure | Node has too many processes |
| node.kubernetes.io/unschedulable | Node is cordoned |
| node-role.kubernetes.io/control-plane:NoSchedule | Control plane node |
Taints + Tolerations vs Node Affinity
They solve different problems:
| Mechanism | Who decides? | What it does |
|---|---|---|
| Taints/Tolerations | Node says "keep out" | Repels pods FROM a node |
| Node Affinity | Pod says "I want that node" | Attracts pods TO a node |
To guarantee a pod runs ONLY on specific nodes, use both:
- Taint the nodes → keeps other pods away
- Add toleration + nodeAffinity on your pod → ensures it goes there AND can tolerate the taint
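Putting both together might look like the pod below. The `dedicated=gpu` label, taint key, and pod name are assumptions for the example — it presumes the target nodes were labeled `dedicated=gpu` and tainted with `kubectl taint nodes <node> dedicated=gpu:NoSchedule`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-app
spec:
  tolerations:                 # lets the pod onto the tainted nodes
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  affinity:                    # forces the pod onto ONLY those nodes
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - "gpu"
  containers:
  - name: app
    image: nginx
```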
Pod Affinity and Anti-Affinity
Pod affinity/anti-affinity schedules pods based on which other pods are already running on a node.
Pod Affinity — "Schedule near this pod"
```yaml
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache
        topologyKey: kubernetes.io/hostname   # Same node
```
This says: "Schedule this pod on a node that already has a pod with label app=cache."
Pod Anti-Affinity — "Schedule away from this pod"
```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - webapp
        topologyKey: kubernetes.io/hostname   # Different node
```
This says: "Don't schedule this pod on any node that already has a pod with label app=webapp." Common for spreading replicas across nodes for high availability.
topologyKey
Defines the "zone" for affinity rules:
| topologyKey | Meaning |
|---|---|
| kubernetes.io/hostname | Same/different node |
| topology.kubernetes.io/zone | Same/different availability zone |
| topology.kubernetes.io/region | Same/different region |
Practical Example: Spread Web Pods Across Nodes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: webapp
              topologyKey: kubernetes.io/hostname
      containers:
      - name: webapp
        image: nginx
```
Topology Spread Constraints
More fine-grained control over how pods are spread across topology domains.
```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                         # Max difference in pod count between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # or ScheduleAnyway
    labelSelector:
      matchMatches: webapp
```
- maxSkew: 1 means the difference in pod count between any two zones can't exceed 1
- Ensures even distribution across zones
Resource Requests and Limits
Resources affect scheduling — the scheduler uses requests to decide which node has capacity.
How They Work
| | Requests | Limits |
|---|---|---|
| CPU | Guaranteed minimum. Scheduler uses this to find a node. | Maximum CPU. Pod gets throttled if exceeded. |
| Memory | Guaranteed minimum. Scheduler uses this. | Maximum memory. Pod gets OOMKilled if exceeded. |
```yaml
containers:
- name: app
  image: nginx
  resources:
    requests:
      cpu: "250m"        # 250 millicores = 0.25 CPU
      memory: "128Mi"    # 128 mebibytes
    limits:
      cpu: "500m"        # Throttled above this
      memory: "256Mi"    # OOMKilled above this
```
CPU Units
| Value | Meaning |
|---|---|
| 1 | 1 full CPU core |
| 500m | Half a CPU core |
| 100m | 1/10th of a CPU core |
| 0.1 | Same as 100m |
Memory Units
| Value | Meaning |
|---|---|
| 128Mi | 128 mebibytes (1 Mi = 1,048,576 bytes) |
| 1Gi | 1 gibibyte |
| 128M | 128 megabytes (1 M = 1,000,000 bytes) — use Mi instead |
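The conversions in the two tables above can be checked with a short sketch. This covers only the common suffixes, not the full Kubernetes quantity grammar, and the function names are invented for the example:

```python
# Sketch of the common quantity conversions (subset of the full grammar).
def cpu_to_millicores(q):
    # "250m" -> 250; "0.1" -> 100; "1" -> 1000
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def mem_to_bytes(q):
    # binary suffixes (Ki/Mi/Gi) use powers of 1024; decimal (K/M/G) use 1000
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
             "K": 1000, "M": 1000**2, "G": 1000**3}
    for suffix, factor in units.items():  # two-letter suffixes checked first
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)  # bare number = bytes

print(cpu_to_millicores("250m"))  # 250
print(cpu_to_millicores("0.1"))   # 100
print(mem_to_bytes("128Mi"))      # 134217728
print(mem_to_bytes("128M"))       # 128000000
```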
QoS Classes
Kubernetes assigns a QoS class based on how you set requests and limits:
| Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | All containers have requests = limits (both CPU and memory) | Last to be evicted |
| Burstable | At least one container has a request or limit set | Middle |
| BestEffort | No requests or limits set on any container | First to be evicted |
When a node runs out of memory, Kubernetes evicts BestEffort pods first, then Burstable, then Guaranteed.
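The classification rules can be sketched as a function over the containers' resources. This is simplified — the real Guaranteed rule requires requests = limits for both CPU and memory in every container, which the dict comparison below only approximates — and the data shapes are invented for the example:

```python
# Simplified sketch of the QoS classification rules described above.
def qos_class(containers):
    # Guaranteed: every container sets requests and limits, and they're equal
    if all(c.get("requests") and c.get("limits")
           and c["requests"] == c["limits"] for c in containers):
        return "Guaranteed"
    # Burstable: at least one container sets some request or limit
    if any(c.get("requests") or c.get("limits") for c in containers):
        return "Burstable"
    # BestEffort: nothing set anywhere
    return "BestEffort"

print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits":   {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "100m"}}]))                     # Burstable
print(qos_class([{}]))                                                # BestEffort
```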
LimitRange — Default Resource Boundaries
A LimitRange sets default and max/min resource values for a namespace. Pods that don't specify resources get the defaults.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: dev
spec:
  limits:
  - type: Container
    default:             # Default limits (applied if not specified)
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:      # Default requests (applied if not specified)
      cpu: "100m"
      memory: "128Mi"
    max:                 # Maximum allowed
      cpu: "2"
      memory: "1Gi"
    min:                 # Minimum allowed
      cpu: "50m"
      memory: "64Mi"
  - type: Pod
    max:
      cpu: "4"
      memory: "2Gi"
```
ResourceQuota — Namespace Resource Caps
A ResourceQuota limits the total resources consumed across all pods in a namespace.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"        # Total CPU requests across all pods
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"                # Max number of pods
    services: "10"
    persistentvolumeclaims: "20"
    configmaps: "50"
    secrets: "50"
    services.nodeports: "5"
```
When a ResourceQuota caps a compute resource (like requests.cpu or limits.memory), every pod in the namespace MUST specify that request/limit — otherwise creation is rejected. Use a LimitRange to auto-apply defaults.
```bash
# Check quota usage
kubectl describe resourcequota dev-quota -n dev
```
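The admission check amounts to simple accounting: sum the requests of the pods already in the namespace, add the new pod, and reject if any hard cap would be exceeded. A hypothetical sketch for the requests.cpu cap only:

```python
# Sketch of the ResourceQuota admission check for a single resource
# (requests.cpu). The real quota controller tracks every capped resource.
def admits(existing_requests_cpu, new_pod_cpu, hard_requests_cpu):
    return sum(existing_requests_cpu) + new_pod_cpu <= hard_requests_cpu

# Quota of requests.cpu: "10", with 9.5 CPUs of requests already used:
print(admits([4.0, 3.0, 2.5], 0.25, 10.0))  # True
print(admits([4.0, 3.0, 2.5], 1.0, 10.0))   # False -> pod creation rejected
```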
Manual Scheduling (nodeName)
Bypass the scheduler entirely by setting nodeName:
```yaml
spec:
  nodeName: worker-2    # Pod goes directly to this node
  containers:
  - name: app
    image: nginx
```
Warning: If the node doesn't exist or has no capacity, the pod fails. No filtering or scoring happens.
Priority and Preemption
Higher-priority pods can preempt (evict) lower-priority pods when nodes are full.
```yaml
# Create a PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                          # Higher = more priority
globalDefault: false
preemptionPolicy: PreemptLowerPriority  # or Never
description: "Critical workloads"
```

```yaml
# Use it in a pod
spec:
  priorityClassName: high-priority
  containers:
  - name: critical-app
    image: nginx
```
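The victim-selection idea behind preemption can be sketched as follows. This is a heavily simplified illustration (the real scheduler also respects PodDisruptionBudgets and minimizes disruption); the function, pod data, and greedy lowest-priority-first strategy are assumptions for the example:

```python
# Sketch of preemption: when no node fits, look for lower-priority pods
# whose eviction would free enough room for the incoming pod.
def victims(pods_on_node, incoming_priority, needed_cpu_m):
    freed, chosen = 0, []
    # consider the lowest-priority pods first
    for pod in sorted(pods_on_node, key=lambda p: p["priority"]):
        if pod["priority"] >= incoming_priority:
            break  # never preempt equal- or higher-priority pods
        chosen.append(pod["name"])
        freed += pod["cpu_m"]
        if freed >= needed_cpu_m:
            return chosen  # enough room freed
    return None  # preemption cannot help on this node

pods = [{"name": "batch-1", "priority": 0, "cpu_m": 500},
        {"name": "web-1", "priority": 1000, "cpu_m": 500}]
print(victims(pods, 1000000, 800))  # ['batch-1', 'web-1']
```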
Key Takeaways
- nodeSelector is simple: label nodes, match in pod spec
- Node affinity is powerful: required vs preferred, set-based operators
- Taints repel, tolerations allow — they work together
- Use taints + affinity together to dedicate nodes to specific workloads
- Pod anti-affinity spreads replicas for HA — common exam pattern
- Requests affect scheduling; limits affect runtime enforcement
- CPU is throttled when limit exceeded; memory causes OOMKill
- LimitRange sets defaults; ResourceQuota sets namespace totals
- Guaranteed QoS (requests=limits) is the safest for critical workloads