The essence of a service registry is to decouple service providers from service consumers. Any microservice should, in principle, have or support multiple providers; this follows from the distributed nature of microservices. Furthermore, to support elastic scaling, the number and location of a microservice's providers are often dynamic and cannot be determined in advance. As a result, the static load-balancing (LB) mechanism used in the monolithic application phase no longer applies, and an additional component is needed to manage the registration and discovery of microservice providers. That component is the service registry.
CAP theory is an important theory in distributed architecture:
Consistency (C): all nodes have the same data at the same time.
Availability (A): every request receives a response, whether it succeeds or fails.
Partition tolerance (P): the loss or failure of messages inside the system does not stop the system from continuing to operate.
My understanding of P is that when part of the system hangs or goes down, the operation and use of the system as a whole are not affected.
Availability means that when one node of the system goes down, the system can still accept requests. The three properties of CAP cannot all be had at once; you can only pick two of them.
The reason is that if C is the first priority, A suffers. Consistency requires data synchronization (otherwise different requests would return different results), and synchronization takes time, during which availability drops.
If A is the first priority, then as long as one service instance is still up, requests can be accepted normally, but the returned result is not guaranteed to be correct, because in a distributed deployment data cannot be made consistent as quickly as traffic can be switched to another node.
If both consistency and availability must be met, partition tolerance is hard to guarantee, which in practice means a single node; yet partition tolerance is the basic core of a distributed system. With this theory in hand, you can choose a service registration and discovery solution that fits the scenario.
Service registry solutions
When designing or selecting a service registry, the first thing to consider is the service registration and discovery mechanism. The current mainstream service registry solutions fall roughly into three categories:
In-application: the registry client is integrated directly into the application, and the application itself handles service registration and discovery. The most typical example is Netflix's Eureka.
Out-of-application: the application is treated as a black box, and services are registered with the registry by some mechanism outside the application, minimizing intrusion into the application. Examples are Airbnb's SmartStack and HashiCorp's Consul.
DNS: services are registered as DNS SRV records. Strictly speaking, this is a special form of out-of-application registration. SkyDNS is one representative.
For the first type of registration, besides a one-stop solution such as Eureka, you can also build your own service registration mechanism on top of ZooKeeper or etcd. This is common in large companies, but for small companies the cost-effectiveness is clearly too low.
Because of the inherent caching drawbacks of DNS, this article does not discuss the third type of registration.
Beyond the basic service registration and discovery mechanism, at least five aspects should be considered from the development and operations perspective:
Health checking: after a service is registered, how is it checked to make sure it is available?
Load balancing: when there are multiple service providers, how is the load balanced across them?
Integration: how is the registry integrated on the service provider side or the caller side?
Runtime dependency: after the registry is introduced, what is the impact on the application's runtime environment?
Availability: how is the availability of the registry itself ensured, in particular by eliminating single points of failure?
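To make the first two questions concrete, here is a minimal in-memory sketch of a registry that checks liveness through heartbeats and balances load round-robin. It does not correspond to any product discussed in this article; the class name InMemoryRegistry, the ttl_seconds parameter and the addresses are all assumptions made for illustration.

```python
import itertools
import time


class InMemoryRegistry:
    """Toy registry: providers register, heartbeat, and are picked round-robin."""

    def __init__(self, ttl_seconds=90.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds   # hypothetical expiry window
        self.clock = clock               # injectable so the sketch can be tested
        self._last_seen = {}             # (service, address) -> last heartbeat time
        self._counters = {}              # service -> round-robin counter

    def register(self, service, address):
        self._last_seen[(service, address)] = self.clock()

    # In this sketch a heartbeat simply refreshes the registration timestamp.
    heartbeat = register

    def alive(self, service):
        """Health check: providers whose last heartbeat is within the TTL."""
        now = self.clock()
        return sorted(
            addr for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self.ttl_seconds
        )

    def pick(self, service):
        """Load balancing: round-robin over the providers currently alive."""
        providers = self.alive(service)
        if not providers:
            raise LookupError(f"no live provider for {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        return providers[next(counter) % len(providers)]
```

A caller would register each provider once, call heartbeat periodically, and call pick("orders") before every request. A real registry also has to answer the remaining three questions (integration, runtime dependency, and its own availability), which this single-process sketch ignores.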
Mainstream registry products
Apache Zookeeper --> CP
Unlike Eureka, Apache ZooKeeper follows the CP principle by design: at any time a request to ZooKeeper gets a consistent data result, and the system tolerates network partitions, but ZooKeeper cannot guarantee that every service request will be served.
In practice, when ZooKeeper is used to fetch the service list, the request will not be processed if the Leader of the ZooKeeper cluster is down and the cluster is holding a Leader election, or if more than half of the server nodes in the cluster are unavailable. (A node is judged to be down by the other nodes: with three nodes, if node 1 detects that node 3 is down and node 2 also detects that node 3 is down, then node 3 really is down.) So ZooKeeper cannot guarantee service availability.
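The availability rule described above can be sketched as a small predicate. This is an illustration only, not ZooKeeper code; the function names are hypothetical, and the sketch assumes that a quorum means strictly more than half of the nodes.

```python
def has_quorum(cluster_size, reachable):
    """True if strictly more than half of the nodes are reachable."""
    return reachable > cluster_size // 2


def can_serve(cluster_size, reachable, leader_elected):
    """A CP-style cluster serves requests only with a quorum and a leader."""
    return leader_elected and has_quorum(cluster_size, reachable)
```

With three nodes, losing one still leaves a quorum (2 of 3), but while the leader is being re-elected can_serve returns False even though most nodes are up. That window is the availability cost the text refers to.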
Of course, in most distributed environments, especially where data storage is involved, data consistency should be guaranteed first. This is another reason why ZooKeeper is designed to follow the CP principle.
Service discovery is different, however. For the same service, even if different nodes of the registry hold different provider information, the consequences are not catastrophic.
For service consumers, being able to consume is what matters most. A consumer may get incorrect service instance information and try to consume it anyway, but that is still better than being unable to consume at all because no instance information can be obtained, which leaves the system in an error state. (Taobao's Double Eleven and JD.com's 618 are the best references for closely following AP.)
When the master node loses contact with the other nodes because of a network failure, the remaining nodes re-run the leader election. The problem is that the election takes too long, 30~120s, and the zk cluster is unavailable for the whole election period, so the registration service is down during the election.
In a cloud deployment environment, a zk cluster losing its master node because of network problems is a fairly likely event. Although the service eventually recovers, it is intolerable for registration to be unavailable for a long time because of a long election.
Spring Cloud Eureka --> AP
Spring Cloud Netflix follows the AP principle in its design of Eureka (open-source work on Eureka 2.0 has been discontinued, but Eureka 1.x is still quite active).
Eureka Server can also run as multiple instances that form a cluster, which removes the single point of failure. Unlike ZooKeeper with its leader election, however, Eureka Server uses peer-to-peer communication. This is a decentralized architecture with no master/slave distinction; every peer is equal. In this architecture, nodes register with each other to improve availability, and each node needs one or more valid serviceUrl entries pointing to other nodes. Each node can be regarded as a replica of the other nodes.
In a cluster, if one Eureka Server goes down, Eureka Client requests automatically switch to another Eureka Server node. When the failed server recovers, Eureka brings it back under server cluster management. Once a node starts accepting client requests, every operation is replicated between nodes (replicate to peer): the request is copied to all the other nodes that this Eureka Server currently knows about.
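The peer-to-peer replication described above can be sketched roughly as follows. This is a toy model, not Eureka's implementation: the PeerNode class, the reachable flag and the connect helper are hypothetical names, and replication here is synchronous and best effort.

```python
class PeerNode:
    """Toy registry node that replicates writes to the peers it knows about."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # other PeerNode objects (stands in for serviceUrl)
        self.reachable = True  # set to False to simulate a network fault
        self.registry = {}     # service name -> set of instance addresses

    def register(self, service, address, replicated=False):
        self.registry.setdefault(service, set()).add(address)
        if replicated:
            return  # a replicated write is not forwarded again (avoids loops)
        for peer in self.peers:
            if peer.reachable:  # best effort: unreachable peers are skipped
                peer.register(service, address, replicated=True)

    def lookup(self, service):
        return sorted(self.registry.get(service, set()))


def connect(*nodes):
    """Make every node a peer of every other node (no master, all equal)."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
```

If node B is unreachable while A accepts a registration, A still answers lookups with the new instance and B answers with its older view. Both stay available, and their views differ until B catches up, which is the AP trade-off.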
When a new Eureka Server node starts, it first tries to fetch the full registration list from a neighboring node to complete its initialization. Eureka Server obtains all nodes through the getEurekaServiceUrls() method, and the list is refreshed periodically through heartbeat renewal.
By default, if Eureka Server does not receive a heartbeat from a service instance within a certain period, it deregisters the instance. Instances send a heartbeat every 30 seconds by default, and the default expiration period is 90 seconds, which can be customized with eureka.instance.lease-expiration-duration-in-seconds.
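As a rough illustration of the lease rule, the sketch below decides which instances have expired from their last heartbeat timestamps. It only mirrors the 30-second and 90-second defaults quoted above; the function and constant names are hypothetical and this is not Eureka's eviction code.

```python
HEARTBEAT_INTERVAL_S = 30   # default heartbeat interval quoted in the text
LEASE_EXPIRATION_S = 90     # default lease expiration quoted in the text


def expired_instances(last_heartbeat, now, expiration=LEASE_EXPIRATION_S):
    """Return the instance ids whose last heartbeat is older than the lease."""
    return sorted(
        instance for instance, seen in last_heartbeat.items()
        if now - seen > expiration
    )


def missed_heartbeats(seen, now, interval=HEARTBEAT_INTERVAL_S):
    """Number of whole heartbeat intervals elapsed since the last heartbeat."""
    return int((now - seen) // interval)
```

With these defaults an instance is removed only after roughly three consecutive heartbeats have been missed, which tolerates an occasional lost heartbeat without deregistering a healthy instance.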
When a Eureka Server node loses too many heartbeats in a short time, the node enters self-preservation mode.
In a Eureka cluster, as long as one Eureka node is still up, the registration service remains available (availability is guaranteed), although the information it returns may not be the latest (strong consistency is not guaranteed). In addition, Eureka has a self-preservation mechanism: if, within 15 minutes, the heartbeats it receives fall below 85% of the number expected, Eureka assumes there is a network failure between the clients and the registry, and the following happens:
1. Eureka no longer removes from the registry the services that should have expired because no heartbeat was received for a long time;
2. Eureka still accepts new service registrations and query requests, but they are not synchronized to other nodes (this keeps the current node available);
3. When the network is stable again, the registrations newly received by the current instance are synchronized to the other nodes.
Therefore, Eureka copes well with some nodes losing contact because of a network failure, and unlike ZooKeeper it does not paralyze the whole registration service.
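The self-preservation (self-protection) rule can be sketched as a guard in front of eviction. The sketch assumes that the 85% figure is a threshold on heartbeats received versus heartbeats expected; the function names and the way the counts are passed in are hypothetical, not Eureka's API.

```python
RENEWAL_PERCENT_THRESHOLD = 0.85  # the 85% figure from the text


def self_preservation_active(expected_renewals, actual_renewals,
                             threshold=RENEWAL_PERCENT_THRESHOLD):
    """True when received heartbeats fall below the threshold share of expected."""
    return actual_renewals < expected_renewals * threshold


def evict(expired, expected_renewals, actual_renewals):
    """Return the instances to remove; nothing is removed in self-preservation."""
    if self_preservation_active(expected_renewals, actual_renewals):
        return []
    return list(expired)
```

The design choice is to treat a sudden, widespread loss of heartbeats as evidence of a network problem rather than of mass instance failure, so the registry keeps possibly stale entries instead of emptying itself.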
Consul --> CP
Consul is an open-source tool from HashiCorp for service discovery and configuration in distributed systems. Consul is written in Go, so it is naturally portable (it supports Linux, Windows and Mac OS X).
Consul has built-in service registration and discovery, an implementation of a distributed consensus protocol, health checking, Key/Value storage, and multi-datacenter support. It does not depend on other tools (such as ZooKeeper), so it is easy to use.
Consul follows the CP principle of CAP, guaranteeing strong consistency and partition tolerance. It uses the Raft algorithm, which is simpler than the Paxos-like ZAB protocol used by ZooKeeper. Strong consistency is guaranteed, but availability drops accordingly. For example, registration takes a little longer, because Consul's Raft protocol requires more than half of the nodes to write successfully before a registration counts as successful; and after the leader goes down, Consul is unavailable while a new leader is elected.
Consul essentially belongs to the out-of-application registration approach, but the registration process can be simplified through an SDK. Service discovery is the opposite: by default it depends on the SDK, but the SDK dependency can be removed with Consul Template (described below).
By default, a service caller has to rely on the Consul SDK to discover services, so zero intrusion into the application cannot be guaranteed.
Fortunately, Consul Template can fetch the latest list of service providers from the Consul cluster and refresh the LB configuration (for example nginx's upstream), so the service caller only needs to be configured with one unified service address.
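The idea of regenerating the LB configuration from the provider list can be sketched in a few lines. This is a Python stand-in for the concept, not Consul Template's own template syntax; render_upstream and the sample addresses are hypothetical, and fetching the provider list from Consul is left out.

```python
def render_upstream(name, providers):
    """Render an nginx upstream block from (host, port) pairs."""
    lines = [f"upstream {name} {{"]
    for host, port in sorted(providers):
        lines.append(f"    server {host}:{port};")
    lines.append("}")
    return "\n".join(lines)


def needs_reload(previous_config, new_config):
    """Reload the load balancer only when the rendered configuration changed."""
    return previous_config != new_config
```

A caller would render the block whenever the provider list changes, write it to the nginx configuration and reload nginx if needs_reload is true. The application itself keeps calling one fixed address and never talks to the registry.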
What Consul's strong consistency (C) brings:
- Service registration is a little slower than with Eureka, because Consul's Raft protocol requires more than half of the nodes to write successfully before a registration counts as successful.
- If the leader goes down, Consul is unavailable for the whole re-election period. Strong consistency is guaranteed at the expense of availability.
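The first point can be illustrated with a simplified latency model. It assumes followers write in parallel and that each number is the time until that node acknowledges the write; the function names and the millisecond figures are made up for illustration and are not measurements of Consul or Eureka.

```python
def raft_commit_latency(leader_ms, follower_ms):
    """Latency until a majority (leader included) has written the entry."""
    cluster_size = 1 + len(follower_ms)
    needed_followers = cluster_size // 2  # leader + this many followers = majority
    if needed_followers == 0:
        return leader_ms
    return max(leader_ms, sorted(follower_ms)[needed_followers - 1])


def local_write_latency(node_ms):
    """AP-style registration returns once the receiving node has written."""
    return node_ms
```

In a three-node cluster where the leader writes in 2 ms and the followers acknowledge in 10 ms and 40 ms, the majority write completes when the faster follower answers, while an AP-style registry would have returned after the local 2 ms write.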
Eureka guarantees high availability (A) and eventual consistency:
1. Service registration is relatively fast, because it does not wait for the registration to be replicated to other nodes, and it does not guarantee that the replication succeeds.
2. When the data is inconsistent, the registration information on nodes A and B may differ, but every Eureka node can still serve requests normally. A query for a service may then find nothing on A but succeed on B. This guarantees availability at the expense of consistency.
In other respects, Eureka is a servlet application that runs in a servlet container, while Consul is written in Go.
Nacos
Nacos is open-sourced by Alibaba and supports both DNS-based and RPC-based service discovery. To use Nacos with Spring Cloud, just download Nacos and start the Nacos server; only simple configuration is needed to complete service registration and discovery.
Besides service registration and discovery, Nacos also supports dynamic configuration. The dynamic configuration service lets you manage the application and service configuration of all environments in a centralized, externalized and dynamic way. Dynamic configuration removes the need to redeploy applications and services when configuration changes, which makes configuration management more efficient and agile. Centralized configuration management also makes it easier to implement stateless services and to scale services elastically on demand.
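The dynamic configuration idea (change a value centrally and have running services pick it up without redeployment) can be sketched with a tiny publish/subscribe store. This is not the Nacos API; ConfigCenter, its methods and the key orders.timeout_ms are hypothetical names used only to show the pattern.

```python
class ConfigCenter:
    """Toy configuration center: publish a value, notify subscribers."""

    def __init__(self):
        self._values = {}     # key -> current value
        self._listeners = {}  # key -> callbacks to run on change

    def subscribe(self, key, callback):
        self._listeners.setdefault(key, []).append(callback)

    def publish(self, key, value):
        if key in self._values and self._values[key] == value:
            return  # unchanged: do not notify
        self._values[key] = value
        for callback in self._listeners.get(key, []):
            callback(value)

    def get(self, key, default=None):
        return self._values.get(key, default)
```

A service would subscribe once at startup and apply each new value in its callback, so an operator can change a timeout for every instance by publishing a single value.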
In short, Nacos = Spring Cloud registry + Spring Cloud configuration center.