k8s - Cluster scheduling

 

1. About the scheduler

 

1.1 Introduction

The Scheduler is the Kubernetes scheduler. Its main task is to assign the defined Pods to nodes in the cluster. That sounds simple, but there is a lot to consider:

 

Fairness: how to ensure that each node gets its share of resources

Efficient resource usage: all the resources of the cluster are used to the fullest

Efficiency: scheduling performs well and can complete the scheduling of large batches of Pods

Flexibility: users are allowed to control the scheduling logic according to their own needs

The Scheduler runs as a separate program. After it starts, it keeps watching the API Server, picks up Pods whose PodSpec.NodeName is empty, and for each of these Pods creates a binding that records which node the Pod should be placed on.

1.2 Scheduling process

Scheduling is divided into several steps: first, nodes that do not meet the conditions are filtered out; this process is called predicate. Then the nodes that passed are ranked by priority; this is the priority phase. Finally, the node with the highest priority is selected. If any step returns an error, the error is returned directly. If no suitable node is found in the predicate phase, the Pod stays in the Pending state and the scheduler keeps retrying until some node meets the conditions.

After this filtering step, if more than one node satisfies the conditions, the priorities process continues: the nodes are sorted by priority and the highest-ranked one is chosen.
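
If no node passes the predicate phase, the Pod's events explain why. A minimal check, with my-pod as a placeholder name:

kubectl get pod my-pod          # STATUS stays Pending while no node qualifies
kubectl describe pod my-pod     # the Events section lists FailedScheduling messages with the failed conditions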

2. Node affinity

  • pod.spec.nodeAffinity:

preferredDuringSchedulingIgnoredDuringExecution: soft policy (preferred)

requiredDuringSchedulingIgnoredDuringExecution: hard policy (must be satisfied)

 

  • Key-value operators (see the sketch after this list)

In: the label's value is in a given list

NotIn: the label's value is not in a given list

Gt: the label's value is greater than a given value

Lt: the label's value is less than a given value

Exists: the label exists

DoesNotExist: the label does not exist
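
For illustration only, a sketch of a nodeAffinity block that uses a few of these operators; the node labels disktype and gpu-count are assumptions made up for this example, not labels that exist by default:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype        # assumed custom node label; only its existence is required
          operator: Exists
        - key: gpu-count       # assumed custom node label; Gt compares against an integer value
          operator: Gt
          values:
          - "2"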

 

2.1 Hard policy

The Pod must run on k8s-node2:

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: wangyanglinux/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8s-node2
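
To try it out, apply the manifest and check which node the Pod landed on; the file name node-affinity-required.yaml is only an assumed name for this example:

kubectl apply -f node-affinity-required.yaml
kubectl get pod affinity -o wide     # the NODE column should show k8s-node2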

2.2 Soft policy

We would like the Pod to run on k8s-node3, but if there is no k8s-node3 that is fine too:

apiVersion: v1
kind: Pod
metadata:
  name: affinity1
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: wangyanglinux/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname 
            operator: In
            values:
            - k8s-node3
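
Because this is only a preference, the Pod is still scheduled even when no node called k8s-node3 exists. A quick check, again with an assumed file name:

kubectl apply -f node-affinity-preferred.yaml
kubectl get pod affinity1 -o wide    # lands on some other node when k8s-node3 is absent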

3. Pod affinity

pod.spec.affinity.podAffinity/podAntiAffinity:

 

preferredDuringSchedulingIgnoredDuringExecution: soft policy

requiredDuringSchedulingIgnoredDuringExecution: hard policy

3.1 Hard policy

The Pod pod-3 must run on the same node as the Pods whose app label value is node-affinity-pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: wangyanglinux/myapp:v1
  affinity:
    podAffinity: # must be on the same node as the matching Pods
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - node-affinity-pod
        topologyKey: kubernetes.io/hostname

kubectl get pod --show-labels

3.2 Soft policy

The Pod pod-4 prefers not to run on the same node as the Pod labeled app=pod-3:

 

[root@k8s-master01 diaodu]# cat pod4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-4
  labels:
    app: pod-4
spec:
  containers:
  - name: pod-4
    image: wangyanglinux/myapp:v1
  affinity:
    podAntiAffinity: # avoid the node where the matching Pods run
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
         labelSelector:
           matchExpressions:
           - key: app
             operator: In
             values:
             - pod-3
         topologyKey: kubernetes.io/hostname

One Pod ends up on k8s-node1 and the other on k8s-node2: the anti-affinity keeps them on different nodes.
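
A quick way to confirm that the two Pods were placed on different nodes:

kubectl get pod pod-3 pod-4 -o wide  # the NODE column should differ between the two Pods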

4. Taints and tolerations

4.1 Introduction

Node affinity is a property of a Pod (a preference or a hard requirement) that attracts the Pod to a particular class of nodes. A taint is the opposite: it lets a node repel a particular class of Pods.

Taints and tolerations work together to keep Pods away from inappropriate nodes. One or more taints can be applied to each node; the node will not accept any Pod that cannot tolerate its taints. Applying a toleration to a Pod means the Pod may (but is not required to) be scheduled onto a node with a matching taint.

 

4.2 Taints

https://blog.frognew.com/2018/05/taint-and-toleration.html#%E5%AE%9E%E8%B7%B5kubernetes-master%E8%8A%82%E7%82%B9%E4%B8%8D%E8%BF%90%E8%A1%8C%E5%B7%A5%E4%BD%9C%E8%B4%9F%E8%BD%BD

The composition of a taint (Taint)

The kubectl taint command sets a taint on a Node. Once a Node carries a taint, there is a mutually exclusive relationship between it and Pods: the Node can refuse to have Pods scheduled onto it, and can even evict Pods that are already running on it.

 

Each taint has a key and a value that serve as its label (the value may be empty), plus an effect that describes what the taint does. The taint effect currently supports the following three options (the general command form is shown after the list):

 

NoSchedule: Kubernetes will not schedule the Pod onto this Node
PreferNoSchedule: Kubernetes will try to avoid scheduling the Pod onto this Node
NoExecute: Kubernetes will not schedule the Pod onto this Node, and will evict the Pods already running on it
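
As a reminder of the command syntax, a sketch with <node-name>, key1 and value1 as placeholders:

# set a taint of the form key=value:effect
kubectl taint nodes <node-name> key1=value1:NoSchedule
# remove that taint again by appending a minus sign
kubectl taint nodes <node-name> key1=value1:NoSchedule-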

 

  • View the taints

 

[root@k8s-master01 diaodu]# kubectl describe nodes k8s-master01|grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule


 

  • Remove the taint

 

kubectl taint nodes --all node-role.kubernetes.io/master:NoSchedule-
#  If the command above reports an error, use the following instead
kubectl taint nodes k8s-master01 node-role.kubernetes.io/master-

  • Set a taint

 

#  Set taints on the master node
kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:NoSchedule
kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:PreferNoSchedule
#  Set a taint on a worker node
kubectl taint nodes k8s-node1 key=hu:NoExecute

 

After setting the NoExecute taint, the Pods that had been running on k8s-node1 were all evicted (the Pod lists before and after setting the taint were captured as screenshots).
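
A minimal way to observe the same thing is to run this once before and once after setting the taint:

kubectl get pod -o wide              # after the NoExecute taint, no Pods remain on k8s-node1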

 

  • Remove the taint

kubectl taint nodes k8s-node1 key:NoExecute-

Check the taints again (kubectl describe nodes k8s-node1 | grep Taints) and the taint is gone.

 

4.3 Tolerations

In short: if a toleration is set on a Pod, the Pod can still be scheduled to a node even if that node has a taint.

 

A tainted Node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule or NoExecute), so Pods are not scheduled onto it. However, we can set a toleration on the Pod; a Pod with a matching toleration tolerates the taint and can be scheduled onto the tainted Node.

 

Example :

 

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: wangyanglinux/myapp:v1
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "hu"
    effect: "NoExecute"
    tolerationSeconds: 3600 # the Pod keeps running on the node for 3600 seconds before it is evicted

The key, value, and effect must match the taint that was set on the Node.

When the operator is Exists, the value field is ignored.

tolerationSeconds describes how long the Pod may keep running on the node once it is marked for eviction.
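
If the key=hu:NoExecute taint from section 4.2 is set on k8s-node1 again, this Pod is allowed to run there. A quick check, assuming the manifest above is saved as toleration.yaml:

kubectl apply -f toleration.yaml
kubectl get pod pod-3 -o wide        # may be scheduled onto the tainted k8s-node1, because the taint is tolerated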

4.3.1 When key is not specified, all taint keys are tolerated:

 

tolerations:
- operator: "Exists"

4.3.2 When effect is not specified, all effects of the given key are tolerated:

 

tolerations:
- key: "key"
  operator: "Exists

 

4.3.3 When there are multiple master nodes, the following setting prevents wasting their resources:

Pods will try not to be scheduled onto the master, but if the worker nodes do not have enough capacity they can still be placed there:

 

kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:PreferNoSchedule

 

5. Fixed nodes

5.1 Selecting by the node's hostname

Pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's scheduling policies entirely; the match is forced.

 

[root@k8s-master01 diaodu]# cat pod5.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node1
      containers:
      - name: myweb
        image: wangyanglinux/myapp:v1
        ports:
        - containerPort: 80

Then apply it and check the effect:
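
All three replicas should land on k8s-node1; a quick check using the label from the Deployment above:

kubectl apply -f pod5.yaml
kubectl get pod -l app=myweb -o wide   # the NODE column should show k8s-node1 for every replica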

 

5.2 Selecting by node label

 

Set a label on the node:

kubectl label node k8s-node2 disk=ssd

 

View the label:
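
Either of the following confirms the label was applied:

kubectl get nodes --show-labels | grep disk
kubectl get nodes -l disk=ssd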

 

Example :

 

[root@k8s-master01 diaodu]# cat label.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        disk: ssd # label
      containers:
      - name: myweb
        image: wangyanglinux/myapp:v1
        ports:
        - containerPort: 80

As a result, all the replicas are running on k8s-node2.
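
A quick check using the label from the Deployment above:

kubectl get pod -l app=myweb -o wide   # the NODE column should show k8s-node2 for every replica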