Abstract: This article takes a deep dive into the architecture and code of the K8s Cluster Autoscaler module, and introduces the Huawei Cloud plug-in for the K8s Cluster Autoscaler.
The Huawei Cloud plug-in for the K8s Cluster Autoscaler was built while our business team (Cloud BU Application Platform) was developing a Serverless engine framework. The plug-in has since been contributed to the K8s open source community; see the picture below:
This article covers the following:
1. A deep dive into the architecture and code of the K8s Cluster Autoscaler module, with particular attention to the algorithms behind its core functions.
2. An introduction to the Huawei Cloud plug-in for the K8s Cluster Autoscaler.
3. Some of the author's own experience contributing to the K8s open source project (for example, how to get information and help from the open source community, and what to watch out for when contributing to open source).
Let's go straight to the topic; basic K8s concepts are not covered here.
What is the K8s Cluster Autoscaler (CA)?
What is elastic scaling?
As the name implies, elastic scaling is a management service that automatically adjusts elastic computing resources according to the user's business needs and policies. Its advantages are:
1. From the application developer's perspective: developers can focus on implementing business functions without having to think much about system-level resources.
2. From the system operator's perspective: the operations burden is greatly reduced, and a well-designed system can achieve "zero operations".
3. It is the cornerstone of a Serverless architecture, and also one of the main characteristics of Serverless.
Before explaining the CA concept in detail, let's first take a high-level look at the elastic scaling methods K8s supports (CA is just one of them).
Elastic scaling methods supported by K8s:
Note: For accuracy, each key concept below is introduced with the official K8s definition first :). The "In short" part is the author's own interpretation.
VPA (Vertical Pod Autoscaler)
A set of components that automatically adjust the amount of CPU and memory requested by Pods running in the Kubernetes Cluster. Current state - beta.
In short: scales the resources of a single Pod up or down (there are few scenarios for it, so it is not covered in detail here).
HPA (Horizontal Pod Autoscaler) - Pod level scaling
A component that scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with beta support, on some other, application-provided metrics).
In short: for a given workload, adds or removes Pods according to a preset scaling policy (for example, a threshold on CPU or memory utilization).
- HPA scaling policy:
HPA relies on the metrics-server component to collect metrics from Pods, and then decides whether to scale the Pods out or in according to the preset scaling policy (for example, CPU utilization greater than 50%). CPU/memory utilization is computed as the average across all Pods. For how the calculation works, see the detailed description of the algorithm in the official K8s HPA documentation.
Note: by default, metrics-server only supports scaling policies based on CPU and memory metrics.
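The averaging described above feeds a simple ratio. Below is a minimal sketch in Go, assuming the replica formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The function and variable names are illustrative and are not taken from the HPA source, and real HPA behavior such as the tolerance band and stabilization windows is left out.

```go
package main

import (
	"fmt"
	"math"
)

// averageUtilization returns the mean of the per-Pod utilization values (in percent).
func averageUtilization(perPod []float64) float64 {
	if len(perPod) == 0 {
		return 0
	}
	sum := 0.0
	for _, u := range perPod {
		sum += u
	}
	return sum / float64(len(perPod))
}

// desiredReplicas applies ceil(current * currentMetric / targetMetric).
// A non-positive target leaves the replica count unchanged.
func desiredReplicas(current int, currentMetric, targetMetric float64) int {
	if targetMetric <= 0 {
		return current
	}
	return int(math.Ceil(float64(current) * currentMetric / targetMetric))
}

func main() {
	// Four Pods averaging 90% CPU against a 50% target.
	perPod := []float64{80, 100, 90, 90}
	avg := averageUtilization(perPod)
	fmt.Println(desiredReplicas(len(perPod), avg, 50)) // prints 8
}
```

With an average of 90% against a 50% target, four replicas become eight; at 25% the same formula would shrink them to two.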
- HPA architecture diagram:
The Prometheus monitoring system and the K8s Prometheus Adapter component in the bottom half of the picture are introduced so that scaling policies can be set on custom metrics. Since that is not the focus of this article, it is not covered in detail here; the official K8s documentation has a walkthrough that helps you master the module step by step in practice. If you only need scaling policies based on CPU/memory, deploying the default metrics-server component is enough (installing it is a one-time deployment on K8s, which is very convenient; the installation steps are in the link above).
CA (Cluster Autoscaler) - Node level scaling
A component that automatically adjusts the size of a Kubernetes Cluster so that: all pods have a place to run and there are no unneeded nodes.
In short: for a K8s cluster, adds or removes Nodes to scale the cluster out or in.
Kubernetes (K8s) Cluster Autoscaler (CA) module source code analysis:
With all that groundwork laid, it's time to get to the main topic. Next, I will unveil the CA module from two angles, architecture and code, and answer common questions in FAQ form.
CA overall architecture and its submodules
As shown in the figure above, the CA module contains the following submodules (see the K8s CA module source code on GitHub):
- autoscaler: the core module, containing the core Scale Up and Scale Down functions (the core package on GitHub).
    1. When scaling up: its ScaleUp function calls the estimator module to estimate the number of nodes required.
    2. When scaling down: its ScaleDown function calls the simulator module to work out which nodes can be removed.
- estimator: computes how many Nodes a scale-up needs (the estimator package on GitHub).
- simulator: simulates scheduling to work out which nodes can be removed (the simulator package on GitHub).
- expander: the algorithms that choose the most suitable Node during scale-up (the expander package on GitHub). You can add or customize your own algorithms.
- cloudprovider: the interface module that CA exposes to specific cloud providers (the cloudprovider package on GitHub). This submodule is covered in more depth below, and it is also the extension point for the Huawei Cloud cloudprovider.
    1. The autoscaler connects to specific cloud providers through this module (such as AWS and GCE, shown in the box at the bottom right of the figure above) and can then provision Nodes from them.
    2. cloudprovider predefines a series of interfaces for specific cloud providers to implement, so that Nodes can be provisioned.
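The expander bullet above notes that you can plug in your own selection algorithm. The sketch below illustrates that idea in Go. `Option` and `Strategy` are simplified stand-ins defined here for illustration only; the real types in the expander package carry more fields and a different `BestOption` signature. The `mostPods` strategy is a hypothetical example that picks the candidate able to place the most pending Pods.

```go
package main

import "fmt"

// Option is a simplified stand-in for one candidate scale-up plan:
// which node group to grow, by how many nodes, and which Pods it would help.
type Option struct {
	NodeGroup string
	NodeCount int
	Pods      []string
}

// Strategy mirrors the idea of the expander extension point:
// given several candidate options, return the preferred one.
type Strategy interface {
	BestOption(options []Option) *Option
}

// mostPods is a hypothetical strategy that prefers the option scheduling the most Pods.
type mostPods struct{}

func (mostPods) BestOption(options []Option) *Option {
	var best *Option
	for i := range options {
		if best == nil || len(options[i].Pods) > len(best.Pods) {
			best = &options[i]
		}
	}
	return best // nil when there are no options
}

func main() {
	options := []Option{
		{NodeGroup: "small", NodeCount: 3, Pods: []string{"a", "b"}},
		{NodeGroup: "large", NodeCount: 1, Pods: []string{"a", "b", "c"}},
	}
	var strategy Strategy = mostPods{}
	if best := strategy.BestOption(options); best != nil {
		fmt.Println(best.NodeGroup) // prints large
	}
}
```

Because the caller depends only on the `Strategy` interface, swapping in another selection rule means writing one more type with a `BestOption` method.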
Having gone through the K8s CA module architecture and the organization of its source code, I summarize the following best practices worth learning from; they apply to any programming language:
1. SOLID design principles are everywhere, specifically:
    1. Each submodule is responsible for solving one specific problem only - single responsibility principle
    2. Each submodule has a reserved extension point - open/closed principle
    3. The interfaces of the submodules are clearly separated - interface segregation principle
2. Clear organization of submodule packages
3. Plug-in extension point design
FAQs from CA module users
1. How does CA relate to the other elastic scaling methods in K8s?
    1. VPA updates the resource requests of existing Pods.
    2. HPA updates the replica count of existing Pods.
    3. If there are not enough nodes to run the Pods after a scaling event, CA adds new Nodes to the cluster, and the Pods that were in the Pending state are scheduled onto the new nodes.
2. When does CA adjust the K8s cluster size?
    - When to scale up: when resources are insufficient and Pod scheduling fails, i.e. there are Pods stuck in the Pending state (see the flow chart on the next page), CA adds Nodes to the cluster from the cloud provider.
    - When to scale down: when a Node's resource utilization is low and the Pods on that Node can be rescheduled onto other Nodes.
3. How often does CA check the state of Pods?
    CA checks every 10s for Pods in the Pending state.
4. What keeps a Node from being deleted by CA during scale-down?
    1. The Node has Pods restricted by a PodDisruptionBudget (see PodDisruptionBudgetSpec).
    2. The Node has Pods in the kube-system namespace.
    3. Pods on the Node would have nowhere to go after eviction, i.e. no other suitable Node can schedule them.
    4. The Node has the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true".
    5. The Node has a Pod with the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false".
To learn more, see the more complete list of FAQs and answers in the CA documentation on GitHub.
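Three of the scale-down conditions in question 4 are plain lookups, so they are easy to sketch. The Go example below uses simplified `Node` and `Pod` structs defined here for illustration (not the Kubernetes API types); only the two annotation keys come from the text above. The PodDisruptionBudget and "no other Node can take the Pod" checks are left out because they need cluster-wide state, and the kube-system rule is applied exactly as stated in the list, without any finer exceptions.

```go
package main

import "fmt"

const (
	scaleDownDisabledKey = "cluster-autoscaler.kubernetes.io/scale-down-disabled"
	safeToEvictKey       = "cluster-autoscaler.kubernetes.io/safe-to-evict"
)

// Pod and Node are simplified stand-ins for the Kubernetes objects.
type Pod struct {
	Name        string
	Namespace   string
	Annotations map[string]string
}

type Node struct {
	Name        string
	Annotations map[string]string
	Pods        []Pod
}

// scaleDownBlocker returns the reason a node must be kept,
// or "" if none of the checked conditions applies.
func scaleDownBlocker(n Node) string {
	if n.Annotations[scaleDownDisabledKey] == "true" {
		return "node is annotated scale-down-disabled"
	}
	for _, p := range n.Pods {
		if p.Annotations[safeToEvictKey] == "false" {
			return "pod " + p.Name + " is marked not safe to evict"
		}
		if p.Namespace == "kube-system" {
			return "pod " + p.Name + " is in kube-system"
		}
	}
	return ""
}

func main() {
	nodes := []Node{
		{Name: "node-a", Annotations: map[string]string{scaleDownDisabledKey: "true"}},
		{Name: "node-b", Pods: []Pod{{Name: "dns", Namespace: "kube-system"}}},
		{Name: "node-c", Pods: []Pod{{Name: "web", Namespace: "default"}}},
	}
	for _, n := range nodes {
		if reason := scaleDownBlocker(n); reason != "" {
			fmt.Printf("%s: keep (%s)\n", n.Name, reason)
		} else {
			fmt.Printf("%s: may be considered for scale-down\n", n.Name)
		}
	}
}
```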
CA module source code analysis
Due to space constraints, only the core submodules are covered in depth; the other submodules are introduced through how they cooperate with the core ones.
CA module entry point
Program entry point: kubernetes/autoscaler/cluster-autoscaler/main.go
The autoscaler submodule of CA
As shown in the figure above, autoscaler.go is the interface and static_autoscaler.go is its default implementation, which calls the ScaleDown and ScaleUp functions in scale_down.go and scale_up.go to carry out scale-down and scale-up.
So the question is: when are the ScaleUp and ScaleDown methods called? Let's walk through it step by step. Back at the CA entry point, a loop keeps running the RunOnce method (in static_autoscaler.go, the default implementation of the autoscaler interface), which watches for Pods in the Pending state (i.e. Pods that still need a Node). It is worth noting that inside the RunOnce function in static_autoscaler.go, several if/else checks verify that specific conditions are met before ScaleUp is actually called.
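The loop-plus-RunOnce shape described above can be sketched as follows. This is a minimal illustration in Go, not the code from static_autoscaler.go: the `Pod` struct, `pendingPods`, and this `runOnce` signature are hypothetical, and the many condition checks performed by the real function are reduced to a single "are there pending Pods" test. The scan interval is shortened from the 10s mentioned in the FAQ so that the demo finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a simplified stand-in for a Kubernetes Pod.
type Pod struct {
	Name  string
	Phase string // "Pending" or "Running"
}

// pendingPods returns the Pods that are still waiting for a Node.
func pendingPods(pods []Pod) []Pod {
	var pending []Pod
	for _, p := range pods {
		if p.Phase == "Pending" {
			pending = append(pending, p)
		}
	}
	return pending
}

// runOnce performs one iteration: list Pods and call scaleUp only when
// some of them are pending. It reports whether scale-up was triggered.
func runOnce(listPods func() []Pod, scaleUp func([]Pod)) bool {
	pending := pendingPods(listPods())
	if len(pending) == 0 {
		return false
	}
	scaleUp(pending)
	return true
}

func main() {
	pods := []Pod{{Name: "web-1", Phase: "Running"}, {Name: "web-2", Phase: "Pending"}}
	listPods := func() []Pod { return pods }
	scaleUp := func(pending []Pod) {
		fmt.Printf("scale-up requested for %d pending pod(s)\n", len(pending))
		for i := range pods {
			pods[i].Phase = "Running" // pretend a new Node arrived and the Pods were scheduled
		}
	}

	scanInterval := 10 * time.Millisecond // the article cites 10s; shortened for the demo
	ticker := time.NewTicker(scanInterval)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		if !runOnce(listPods, scaleUp) {
			fmt.Println("nothing pending")
		}
	}
}
```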
The call to ScaleDown works the same way and also happens in the RunOnce function. The main ScaleDown logic follows these steps:
1. Identify Nodes with potentially low utilization (the scaleDownCandidates array variable in the code).
2. Then find "next-door neighbors" for the Pods on those Nodes (the Nodes they could be moved to, corresponding to the podDestinations array variable in the code).
3. Then several if/else checks determine whether the ScaleDown conditions are met before the TryToScaleDown function is executed (see the screenshot below).
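The first two steps can be sketched as a utilization filter followed by a simulated re-placement of Pods. The Go example below is a simplified illustration under stated assumptions: it models CPU only, treats every other node as a possible destination, uses a first-fit placement, and takes 0.5 as an illustrative utilization threshold. The `Node` struct and the `canDrain` helper are hypothetical; only the name `scaleDownCandidates` echoes the variable mentioned above.

```go
package main

import "fmt"

// Node is a simplified node that tracks CPU only.
type Node struct {
	Name     string
	Capacity int   // allocatable CPU in millicores
	Pods     []int // CPU request of each Pod in millicores
}

func (n Node) used() int {
	total := 0
	for _, req := range n.Pods {
		total += req
	}
	return total
}

func (n Node) utilization() float64 {
	if n.Capacity == 0 {
		return 0
	}
	return float64(n.used()) / float64(n.Capacity)
}

// scaleDownCandidates returns the nodes whose utilization is below the threshold.
func scaleDownCandidates(nodes []Node, threshold float64) []Node {
	var out []Node
	for _, n := range nodes {
		if n.utilization() < threshold {
			out = append(out, n)
		}
	}
	return out
}

// canDrain simulates moving every Pod of the candidate onto the other nodes
// (first fit) and reports whether all of them find a new home.
func canDrain(candidate Node, nodes []Node) bool {
	var free []int
	for _, n := range nodes {
		if n.Name != candidate.Name {
			free = append(free, n.Capacity-n.used())
		}
	}
	for _, req := range candidate.Pods {
		placed := false
		for i := range free {
			if free[i] >= req {
				free[i] -= req
				placed = true
				break
			}
		}
		if !placed {
			return false
		}
	}
	return true
}

func main() {
	nodes := []Node{
		{Name: "node-a", Capacity: 4000, Pods: []int{500, 500}},
		{Name: "node-b", Capacity: 4000, Pods: []int{3000}},
		{Name: "node-c", Capacity: 4000, Pods: []int{2500, 1000}},
	}
	for _, c := range scaleDownCandidates(nodes, 0.5) {
		fmt.Printf("%s: utilization %.2f, removable=%v\n", c.Name, c.utilization(), canDrain(c, nodes))
	}
}
```

Here only node-a falls below the threshold, and both of its Pods fit into the free capacity of node-b, so it would be reported as removable.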
From the introduction above and the code fragments, we have seen when the ScaleUp/ScaleDown functions are called. Next, let's look at what happens inside these two core functions when they are called.
Let's look at ScaleUp first:
From the code snippet above and the comments I added to it, you can see that the following happens:
1. Through the cloudprovider submodule (introduced separately below), get the NodeGroups that can be scaled up from the specific cloud provider.
2. Group the unschedulable Pods according to their scale-up requirements (the buildPodEquivalenceGroups call in the code above).
3. The available NodeGroups from step 1 and the Pods to be placed from step 2 are passed as input to the bin-packing algorithm of the estimator submodule (the call happens inside the computeExpansionOption function), which yields several candidate Pod scheduling/placement plans. The core of the estimator submodule is the bin-packing algorithm; the figure below shows the Estimate function that implements it. There is a small trick in the implementation: before the algorithm starts, calculatePodScore is called to reduce the two-dimensional problem (a Pod's CPU and memory requirements) to a one-dimensional one. After that it is the classic bin-packing algorithm, with two for loops finding a suitable Node for each Pod. For how the dimension reduction works, see the source of the calculatePodScore function in binpacking_estimator.go.
4. The candidate plans from step 3 are passed to the expander submodule to obtain the optimal plan (the ExpanderStrategy.BestOption call in the code fragment). expander provides the strategies shown in the screenshot below; users can implement the BestOption function of the expander interface to provide their own expander strategy.
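Step 3 can be illustrated with a first-fit-decreasing bin-packing sketch in Go. It assumes the dimension-reduction trick works by scoring each Pod as the sum of its CPU and memory requests expressed as fractions of one node's capacity; that is my reading of calculatePodScore and should be checked against the source. The `Resources` type and the `podScore` and `estimate` functions are illustrative names, not the estimator package API.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources describes a CPU and memory amount.
type Resources struct {
	CPU int64 // millicores
	Mem int64 // MiB
}

// podScore collapses the two dimensions into one number: the Pod's CPU and
// memory requests as fractions of a node's capacity, added together.
// It assumes the node has non-zero CPU and memory capacity.
func podScore(pod, node Resources) float64 {
	return float64(pod.CPU)/float64(node.CPU) + float64(pod.Mem)/float64(node.Mem)
}

// estimate returns how many identical new nodes are needed to place the Pods,
// using first-fit decreasing: biggest score first, first node that fits.
func estimate(pods []Resources, node Resources) int {
	sorted := append([]Resources(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return podScore(sorted[i], node) > podScore(sorted[j], node)
	})

	var free []Resources // remaining capacity of each node opened so far
	for _, p := range sorted { // outer loop: Pods
		if p.CPU > node.CPU || p.Mem > node.Mem {
			continue // this Pod can never fit on this node type
		}
		placed := false
		for i := range free { // inner loop: nodes opened so far
			if free[i].CPU >= p.CPU && free[i].Mem >= p.Mem {
				free[i].CPU -= p.CPU
				free[i].Mem -= p.Mem
				placed = true
				break
			}
		}
		if !placed {
			free = append(free, Resources{CPU: node.CPU - p.CPU, Mem: node.Mem - p.Mem})
		}
	}
	return len(free)
}

func main() {
	node := Resources{CPU: 4000, Mem: 8192}
	pods := []Resources{
		{CPU: 1000, Mem: 1024},
		{CPU: 2000, Mem: 4096},
		{CPU: 3000, Mem: 2048},
		{CPU: 2000, Mem: 4096},
		{CPU: 1000, Mem: 1024},
		{CPU: 1000, Mem: 1024},
	}
	fmt.Println(estimate(pods, node)) // prints 3
}
```

Running the estimate once per candidate node group gives one plan per group, which is the kind of input the expander in step 4 chooses from.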
The cloudprovider submodule of CA
Through a specific cloud provider (e.g. AWS, GCP, Azure, Huawei Cloud), Nodes are added to or removed from a Node Group (called a Node Pool on some cloud platforms) on the corresponding cloud platform to scale out or in. The code lives in the package of the same name, cloudprovider; see the code on GitHub. Each cloud provider develops its own cloudprovider plug-in by extending CA in the way K8s prescribes, as shown in the picture below:
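The extension-point idea can be sketched as a pair of interfaces plus one implementation. The Go example below is a trimmed-down illustration: the method names follow those I recall from the cloudprovider package (Id, MinSize, MaxSize, TargetSize, IncreaseSize, DeleteNodes, NodeGroups), but the real interfaces have more methods and use Kubernetes node objects instead of plain strings, so treat the signatures as assumptions. `memoryGroup` and `memoryProvider` are hypothetical in-memory stand-ins for a cloud's node pool API.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeGroup is a trimmed-down view of a resizable group of nodes.
type NodeGroup interface {
	Id() string
	MinSize() int
	MaxSize() int
	TargetSize() (int, error)
	IncreaseSize(delta int) error
	DeleteNodes(names []string) error
}

// CloudProvider is a trimmed-down view of what a provider plug-in exposes.
type CloudProvider interface {
	Name() string
	NodeGroups() []NodeGroup
}

// memoryGroup is a hypothetical node pool kept entirely in memory.
type memoryGroup struct {
	id       string
	min, max int
	next     int // suffix for the next generated node name
	nodes    []string
}

func (g *memoryGroup) Id() string               { return g.id }
func (g *memoryGroup) MinSize() int             { return g.min }
func (g *memoryGroup) MaxSize() int             { return g.max }
func (g *memoryGroup) TargetSize() (int, error) { return len(g.nodes), nil }

func (g *memoryGroup) IncreaseSize(delta int) error {
	if delta <= 0 {
		return errors.New("delta must be positive")
	}
	if len(g.nodes)+delta > g.max {
		return fmt.Errorf("size %d would exceed max %d", len(g.nodes)+delta, g.max)
	}
	for i := 0; i < delta; i++ {
		g.nodes = append(g.nodes, fmt.Sprintf("%s-node-%d", g.id, g.next))
		g.next++
	}
	return nil
}

func (g *memoryGroup) DeleteNodes(names []string) error {
	remove := make(map[string]bool, len(names))
	for _, name := range names {
		remove[name] = true
	}
	kept := make([]string, 0, len(g.nodes))
	for _, name := range g.nodes {
		if !remove[name] {
			kept = append(kept, name)
		}
	}
	if len(kept) < g.min {
		return fmt.Errorf("size %d would drop below min %d", len(kept), g.min)
	}
	g.nodes = kept
	return nil
}

// memoryProvider is a hypothetical provider that serves in-memory groups.
type memoryProvider struct{ groups []NodeGroup }

func (p *memoryProvider) Name() string            { return "in-memory" }
func (p *memoryProvider) NodeGroups() []NodeGroup { return p.groups }

func main() {
	var provider CloudProvider = &memoryProvider{
		groups: []NodeGroup{&memoryGroup{id: "pool-a", min: 1, max: 3, next: 1, nodes: []string{"pool-a-node-0"}}},
	}
	for _, group := range provider.NodeGroups() {
		if err := group.IncreaseSize(2); err != nil {
			fmt.Println("scale-up failed:", err)
			continue
		}
		size, _ := group.TargetSize()
		fmt.Printf("%s/%s now has %d nodes\n", provider.Name(), group.Id(), size)
	}
}
```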
The following section describes how Huawei Cloud extends this module.
Huawei Cloud cloudprovider plug-in development and open source contribution experience
How is the Huawei Cloud cloudprovider plug-in extended and developed?
The picture below shows the general code structure of the Huawei Cloud cloudprovider plug-in. The SDK in the green box is what is needed to perform the required operations on CCE (Cloud Container Engine), namely adding and removing Nodes in a Node Pool/Group. In principle we should not have had to write this part ourselves, but because the CCE team's SDK is incomplete, we developed the SDK for the necessary CCE operations ourselves. The key part is the code in the red box:
huaweicloud_cloud_provider.go is the entry point. It reads the configuration via huaweicloud_cloud_config.go and instantiates the huaweicloud_manager.go object. The huaweicloud_manager.go object obtains the overall CCE information by calling the CCE SDK in the blue box. Once the overall CCE information is available, huaweicloud_node_group.go can be called to scale the Nodes in the Node Group/Pool bound to that CCE cluster, which in turn scales the Nodes of the whole CCE cluster.
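The flow just described (read the configuration, build a manager around the SDK, then resize through a node group) can be sketched as follows. Everything in this Go example is hypothetical and invented for illustration: the `Config` fields, the `cceClient` interface standing in for the CCE SDK, and the `manager` and `nodeGroup` types do not reproduce the actual plug-in code.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds what the provider reads at start-up (hypothetical fields).
type Config struct {
	ClusterID string
	Region    string
}

// cceClient stands in for the CCE SDK calls the plug-in needs (hypothetical).
type cceClient interface {
	PoolSize(clusterID, pool string) (int, error)
	ResizePool(clusterID, pool string, size int) error
}

// manager ties the configuration to the SDK client.
type manager struct {
	cfg    Config
	client cceClient
}

func newManager(cfg Config, client cceClient) (*manager, error) {
	if cfg.ClusterID == "" {
		return nil, errors.New("config: cluster ID is required")
	}
	return &manager{cfg: cfg, client: client}, nil
}

// nodeGroup maps to one node pool of the cluster and resizes it via the manager.
type nodeGroup struct {
	name string
	max  int
	mgr  *manager
}

func (g *nodeGroup) IncreaseSize(delta int) error {
	if delta <= 0 {
		return errors.New("delta must be positive")
	}
	size, err := g.mgr.client.PoolSize(g.mgr.cfg.ClusterID, g.name)
	if err != nil {
		return err
	}
	if size+delta > g.max {
		return fmt.Errorf("size %d would exceed max %d", size+delta, g.max)
	}
	return g.mgr.client.ResizePool(g.mgr.cfg.ClusterID, g.name, size+delta)
}

// fakeClient is an in-memory replacement for the SDK, used for the demo.
type fakeClient struct{ sizes map[string]int }

func (f *fakeClient) PoolSize(_, pool string) (int, error) {
	size, ok := f.sizes[pool]
	if !ok {
		return 0, fmt.Errorf("pool %q not found", pool)
	}
	return size, nil
}

func (f *fakeClient) ResizePool(_, pool string, size int) error {
	if _, ok := f.sizes[pool]; !ok {
		return fmt.Errorf("pool %q not found", pool)
	}
	f.sizes[pool] = size
	return nil
}

func main() {
	client := &fakeClient{sizes: map[string]int{"pool-a": 1}}
	mgr, err := newManager(Config{ClusterID: "demo-cluster", Region: "demo-region"}, client)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	group := &nodeGroup{name: "pool-a", max: 5, mgr: mgr}
	if err := group.IncreaseSize(2); err != nil {
		fmt.Println("scale-up failed:", err)
		return
	}
	fmt.Println("pool-a size:", client.sizes["pool-a"]) // prints pool-a size: 3
}
```

Hiding the SDK behind a small interface is a design choice of this sketch: it lets the node group logic be exercised with an in-memory fake, without calling the cloud.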
How to get the resources you need from the open source community, and what to watch out for when open-sourcing?
When I first took on the project, I had no idea where to start. The K8s documentation on this area is not very clear. Drawing on past experience and the information in the K8s GitHub README, I joined their Slack organization, found the channel of the relevant interest group (in my case the sig-autoscaling channel), and asked my question (see the screenshot below). Given the size of the K8s code base, it is almost impossible to change and extend it unless you find the right extension point.
Key point: almost every open source community now has a Slack group. Join it and find the relevant interest group; there are many experts there, and when you raise a question someone will usually be happy to answer. Mailing lists work too, but I find Slack more efficient and closer to real time, and I strongly recommend it. For the open source projects I usually deal with, I generally join their Slack and ask whenever I have a question. Of course, many open source projects contributed from China communicate in WeChat groups :) For example, for Huawei's open source microservice framework project ServiceComb, I have also joined the WeChat group. All in all, for open source projects, be sure to find an efficient way to communicate with the organization.
In addition, when contributing code that uses third-party open source code, try to avoid including the third-party source directly because of copyright and redistribution issues. If you really need it, extend it instead, and attach Huawei's copyright notice and disclaimer to the newly added files.