Redis high availability: you call this the Sentinel cluster principle

Fan Li 2021-04-08 11:58:17
redis high availability sentinel cluster


Summary

We know that "master-slave replication is the cornerstone of high availability". When a slave goes down, requests can still be sent to the master or to other slaves. But when the master goes down, the system can only serve reads; write requests can no longer be executed.

So the master-slave replication architecture has a serious problem: when the master goes down, writes cannot be executed, and no slave is automatically promoted to master. In other words, it cannot fail over automatically.

Late at night with my girlfriend ... (10,000 words omitted here), the master suddenly goes down. You can't just pull up your pants, jump out of bed, switch master and slave by hand, and then tell the other programmers to change the address to the new master.

After all that fuss, I would have been switched from boyfriend to ex-boyfriend. That won't do. So we need a high-availability solution, and Redis officially provides one: Sentinel.

Figure: Redis Sentinel cluster principles

Opening remarks

“Technology iterates very quickly, but the thinking distilled from technology benefits you for life. So don't worry about a midlife crisis; people who worry about one usually have a hard time growing. As long as we keep growing and our understanding keeps breaking through, there is no need to worry about a midlife crisis. The world always needs such talent.”

What is Sentinel

“65 Brother: Margo, although I don't have a girlfriend, I want to master Sentinel mode ahead of time, just in case, so that I never get disturbed in the middle of the night with my girlfriend. Tell me how Sentinel works.”

We build the setup with three sentinels forming a cluster and three data nodes (one master and two slaves), as shown in the figure below:

Figure: Redis Sentinel cluster

Building the Sentinel cluster is not demonstrated here; readers who need it can click "Read the original" in the bottom left corner.

Brother 65, have you heard of Zhang Sanfeng, founder of the Wudang sect? The Redis master-slave architecture is like Wudang, and the sect leader is the master. If the leader dies, an able person must be chosen from the Seven Heroes of Wudang to take over. This requires a department that monitors whether the leader and the other Wudang disciples are alive, can hold a vote among the disciples to elect a capable new leader, and then calls a press conference to announce the new leader to the world. That "department" is Sentinel.

Sentinel faces the following problems when electing a new leader:

  1. How to judge whether the leader is really dead, since he may only be feigning death;
  2. Which Wudang disciple to choose as the new leader;
  3. How to announce the new leader, via the press conference, to all Wudang disciples (slaves and master) and to the whole martial arts world (clients).

Sentinel's main tasks are therefore: monitor all of Wudang, choose a new leader, and notify all of Wudang and the whole martial arts world.

Sentinel's main tasks

Sentinel is an operating mode of Redis. It focuses on monitoring the running state of Redis instances (master and slave nodes). When the master fails, it selects a new master and performs the master-slave switch through a series of mechanisms, achieving failover and keeping the whole Redis system available. According to the official Redis documentation (https://redis.io/topics/sentinel), Sentinel has the following capabilities:

  • Monitoring: continuously check whether the master and slaves are working as expected.
  • Automatic master switchover: when the master fails, Sentinel starts the automatic recovery process and selects one of the slaves as the new master.
  • Notification: have the slaves execute replicaof to sync with the new master, and tell clients to connect to the new master.

Sentinel is itself a Redis process; it just doesn't serve external reads and writes. Sentinels are usually deployed in an odd number. Why? Let "Code Byte" explain step by step.

“65 Brother: So how does this mysterious "Sentinel" department deliver these three capabilities?”

Let's first look at Sentinel from a high level to get a brief picture of the whole process, then analyze each task in detail. We start with monitoring...

Monitoring

Sentinel is simply a special department staffed by Wudang disciples. By default, once per second Sentinel sends a PING command by carrier pigeon to every Wudang disciple, the leader, and the other sentinels (that is, the master, the slaves, and the other Sentinel instances). If a slave does not answer Sentinel's PING within the specified time, Sentinel figures this guy may have kicked the bucket and records him as "offline".

If the master, the leader, does not answer Sentinel's PING within the specified time, Sentinel judges the leader offline and starts the "automatic master switchover" process.

A PING command can be answered in two ways:

  1. Valid reply: any one of +PONG, -LOADING, or -MASTERDOWN;
  2. Invalid reply: any reply other than a valid one, or no reply at all within the specified time.
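
The two reply categories above can be sketched as a small check. This is only an illustrative Python sketch, not Sentinel's actual code: the function name is_valid_ping_reply is hypothetical, and a reply that never arrived (a timeout) is represented here as None.

```python
# Replies that the article lists as valid answers to a Sentinel PING.
VALID_PING_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply):
    """Return True if `reply` counts as a valid PING response.

    `reply` is the raw reply line, or None when nothing arrived
    within the allowed time (a timeout is always invalid).
    """
    if reply is None:
        return False
    # str.startswith accepts a tuple and matches any of its entries.
    return reply.strip().startswith(VALID_PING_REPLIES)
```

Anything that is not one of the three valid prefixes, including an ordinary error reply, falls into the invalid category.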

“65 Brother: How does Sentinel judge that the leader has kicked the bucket? And what if the leader is only playing dead?”

To guard against the leader "feigning death", Sentinel uses two signals: "subjective offline" and "objective offline".

Subjective offline

Sentinel uses the PING command to check whether the leader and the slaves are alive. If the reply is invalid, Sentinel marks that node as "subjectively offline". If the node detected is an ordinary Wudang disciple, that is, a slave, it is simply marked "subjectively offline" and that is the end of it.

Because the master, the leader, is still there, a slave dying has little effect on Wudang. The sect carries on as usual: holding meetings, practicing swordsmanship, eating and drinking well...

If the node detected is the master, however, Sentinel cannot simply mark it "subjectively offline" and start electing a new leader.

The reason is that Sentinel may have misjudged: the leader may not be dead at all. Once a leader switch is triggered, the election, the press conference, and the slaves resynchronizing data with the new master all consume a lot of resources.

So Sentinel needs to reduce the probability of misjudgment. Misjudgment usually happens when the cluster network is under heavy load or congested, or when the master itself is under heavy load.

Since a single judge is prone to misjudgment, let several vote together. Sentinel works the same way: it is deployed as a cluster of multiple instances, the Sentinel cluster. With several sentinel instances judging together, you avoid a single sentinel wrongly deciding that the master is offline just because its own network is bad.

Also, the probability that the networks of several sentinels are unstable at the same time is small, so deciding together lowers the misjudgment rate.

Objective offline

A single sentinel does not have the final say on the master. Only when more than half of the sentinels (more precisely, the configured quorum described below) have judged the master "subjectively offline" can the master be marked "objectively offline". In other words, it is now an objective fact: the leader really is dead, and not even Hua Tuo reborn could save him.

Only when the master is judged "objectively offline" does Sentinel go on to start the master-slave switchover.

Figure: objective offline

The difference between subjective offline and objective offline

Simply put, subjective offline means one sentinel itself thinks a node is down. Objective offline means that not only does this sentinel think the node is down, but after it communicates with the other sentinels, a certain number of them agree that the node is down.

This "certain number" is the quorum, which is set in the sentinel's monitoring configuration. The configuration looks like this:

# sentinel monitor <master-name> <master-host> <master-port> <quorum>
# Examples are as follows :
sentinel monitor mymaster 127.0.0.1 6379 2

This configuration item tells Sentinel which master node to monitor:

  • sentinel monitor: declares a monitoring target.
  • mymaster: the name of the master node; you can choose your own.
  • 127.0.0.1: the IP of the monitored master node; 6379 is its port.
  • 2: the quorum. The master is set to the objective offline state, and failover then proceeds, only when two or more sentinels consider the master unavailable.
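
To make the four fields concrete, here is a small Python sketch that splits a sentinel monitor line into its parts. MonitorConfig and parse_monitor_line are hypothetical names used only for illustration; they are not part of Redis.

```python
from dataclasses import dataclass


@dataclass
class MonitorConfig:
    master_name: str  # e.g. "mymaster"
    host: str         # IP of the monitored master
    port: int         # port of the monitored master
    quorum: int       # sentinels that must agree the master is down


def parse_monitor_line(line):
    """Parse one `sentinel monitor <name> <host> <port> <quorum>` line."""
    parts = line.split()
    if len(parts) != 6 or parts[0] != "sentinel" or parts[1] != "monitor":
        raise ValueError("not a sentinel monitor directive: %r" % line)
    return MonitorConfig(parts[2], parts[3], int(parts[4]), int(parts[5]))


config = parse_monitor_line("sentinel monitor mymaster 127.0.0.1 6379 2")
```

With the example line from the configuration, config.quorum is 2 and config.port is 6379.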

A common standard for "objective offline" is a majority: with N sentinel instances, N/2 + 1 of them must judge the master "subjectively offline" before it is finally marked "objectively offline". This corresponds to setting the quorum to N/2 + 1.
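
The arithmetic can be written out as a short Python sketch. The helper names majority and is_objectively_down are made up for this example, and integer division stands in for the N/2 in the text.

```python
def majority(total_sentinels):
    """Smallest number of sentinels that is more than half: N/2 + 1."""
    return total_sentinels // 2 + 1


def is_objectively_down(sdown_votes, quorum):
    """True once enough sentinels agree the master is subjectively down."""
    return sdown_votes >= quorum


# With 3 sentinels a majority is 2, which matches the quorum of 2
# used in the example configuration.
quorum = majority(3)
```

With 5 sentinels the same rule gives 3, so two sentinels with bad networks cannot mark the master objectively offline on their own.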

Automatic master switchover

“65 Brother: Now that the master has been judged offline, it's time to choose a new leader.”

Sentinel's second task is to select the new master, the new leader. A new leader must be chosen from among the Wudang disciples according to certain rules; once chosen, the new master leads all the disciples on to eat and drink well together.

The strongest candidate is elected leader through a "filter" + "score" strategy. That is, an audition first filters out the unfit using some conditions, then every candidate who passed the audition is scored and ranked, and the highest scorer becomes the new master.

As shown in the figure:

Figure: new master selection

A candidate whose network connection keeps dropping is a bad choice, don't you think? Even if it became master, its network would soon fail and we would have to choose yet another master. That's no joke, so such candidates must be ruled out!

Filter

“65 Brother: What are the screening criteria?”

  • Current online state of the slave: slaves that are offline are discarded directly;
  • Past network connection state, judged against down-after-milliseconds * 10: if a slave has been disconnected from the master for longer than ten times down-after-milliseconds, we have reason to believe that its network is not very good, and it is filtered out.

Scoring

After the unsuitable slaves are filtered out, scoring begins. There are three rounds, one rule per round, applied in order:

  1. Slave priority. The slave-priority configuration item sets different priorities for different slaves (it always helps to have backing behind the scenes). The slave with the highest priority is promoted directly to new master. In Redis, a lower slave-priority number means a higher priority.
  2. Replication progress, the gap between slave_repl_offset and master_repl_offset (the closer your martial arts are to the old leader's, the stronger you are). This compares how far each slave's replication lags behind the old master; the slave with the smallest gap wins. If the gaps are all the same, move on to the next rule.
  3. Slave runID. With equal priority and replication progress, the slave with the smallest runID scores highest and is selected as the new master (ranking by seniority: whoever the runID says came first wins).
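
The filter and the three scoring rounds can be sketched together in Python. This is a simplified illustration, not Sentinel's source: the Slave class, its field names, and select_new_master are hypothetical, the network check is reduced to a precomputed boolean, and a lower priority number is assumed to win.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    run_id: str
    online: bool
    passed_link_check: bool  # result of the down-after-milliseconds * 10 check
    priority: int            # slave-priority; assumed: lower number wins
    repl_offset: int         # slave_repl_offset


def select_new_master(slaves):
    """Filter unsuitable slaves, then rank the rest by the three rules."""
    candidates = [s for s in slaves if s.online and s.passed_link_check]
    if not candidates:
        return None
    # Rule 1: priority. Rule 2: largest offset, i.e. closest to the old
    # master. Rule 3: smallest run ID as the final tie-breaker.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


slaves = [
    Slave("b", True, True, 100, 500),
    Slave("a", True, True, 100, 500),   # ties with "b", smaller run ID
    Slave("c", True, True, 100, 400),   # lags further behind
    Slave("d", False, True, 1, 900),    # offline, filtered out
    Slave("e", True, False, 1, 900),    # poor network, filtered out
]
winner = select_new_master(slaves)
```

Sorting on a tuple applies the rules in order, so a later rule only matters when all earlier ones tie.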

Notification

“65 Brother: Why hold a press conference?”

Electing a new master, a new sect leader, is a big deal; how could the world not be told? Besides, the slaves also need to know who the new leader is, so they can follow him and eat and drink well together.

In this final task, Sentinel sends the new master's connection information to the other slaves, the Wudang disciples, and has each slave execute the replicaof command to connect to the new master and replicate its data, learning all of the new leader's martial arts.

In addition, Sentinel must announce the new leader's connection information to the whole martial arts world (the clients), so that anyone who wants to visit or seek advice can find the new leader and matters can be handed to him for decision (read and write requests move to the new master).

Figure: Sentinel's main tasks and goals

How the Sentinel cluster works

The Sentinel department is not a one-man show: several sentinels work together as a "Sentinel cluster". Even if some sentinels are taken out by Old Wang, the others can still work together to complete monitoring, elect the new leader, and notify the slaves, the master, and everyone in the martial arts world (the clients).

When deploying a Sentinel cluster, each sentinel's configuration only sets the IP and port of the master to monitor; no connection information for the other sentinels is configured.

sentinel monitor <master-name> <ip> <redis-port> <quorum>

How do the sentinels get to know each other? How do they learn about the slaves and monitor them? And which sentinel performs the master-slave switch?

With these questions in mind, follow "Code Byte" to the source and go deep into the heart of the Sentinel cluster.

pub/sub: communication between sentinels, and discovering slaves

“65 Brother: How do the sentinels get to know each other?”

Sentinels can talk to each other, meet up, and get things done mainly thanks to Redis's pub/sub (publish/subscribe) mechanism.

Each sentinel establishes communication with the master and uses the master's publish/subscribe mechanism to publish its own information: height and weight, whether it's single, IP, port...

The master has a dedicated channel, __sentinel__:hello, for publishing and subscribing to messages among sentinels. Think of __sentinel__:hello as a WeChat group: the sentinels post their own news in a group set up on the master and follow the news posted by the other sentinels.

Figure: Redis pub/sub mechanism

Once several sentinel instances have both published and subscribed on the master, they know each other's IP addresses and ports, and can discover and connect to one another.
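
The discovery idea can be simulated without a real Redis server. The Python sketch below is a toy in-memory model under stated assumptions: HelloChannel stands in for the master's __sentinel__:hello channel, ToySentinel and the 10.0.0.x addresses are invented for the example, and the real hello message carries more than an IP and port.

```python
class HelloChannel:
    """In-memory stand-in for the master's __sentinel__:hello channel."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, sentinel):
        self.subscribers.append(sentinel)

    def publish(self, sender, address):
        # Deliver the announcement to every subscriber except the sender.
        for sentinel in self.subscribers:
            if sentinel is not sender:
                sentinel.on_hello(address)


class ToySentinel:
    def __init__(self, ip, port, channel):
        self.address = (ip, port)
        self.peers = set()  # addresses of the other sentinels learned so far
        self.channel = channel
        channel.subscribe(self)

    def announce(self):
        self.channel.publish(self, self.address)

    def on_hello(self, address):
        self.peers.add(address)


channel = HelloChannel()
sentinels = [ToySentinel("10.0.0.%d" % i, 26379, channel) for i in (1, 2, 3)]
for s in sentinels:
    s.announce()
```

After every sentinel has announced itself once, each one knows the addresses of the other two, even though none of them was configured with a peer list.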

Redis keeps messages apart by channel; channels here are just different WeChat groups. For example, the "Code Byte reader technology group" is a technology-sharing group. Friends can follow the official account and reply "Add group" to it, and we can grow together.

“65 Brother: The sentinels are now connected to each other, but they also need connections to the slaves, otherwise they can't monitor them. How do they learn about the slaves and monitor them?”

Exactly. Connecting the sentinels into a cluster is not enough; they also have to connect to the slaves, or they cannot monitor them and cannot do heartbeat checks on both master and slaves.

Besides, after a master-slave switch the slaves must be told to connect to the new master for data synchronization. For how data synchronization works in the master-slave architecture, see 《Redis High availability: You call this master-slave architecture data consistency synchronization》.

The key is the master. A sentinel sends the INFO command to the master, and the master, being the leader, naturally knows which slaves are his underlings. So on receiving the command, the master returns the slave list to the sentinel.

Using the slave list in the master's response, the sentinel connects to every slave and continuously monitors the slave over that connection.

As shown in the figure, sentinel 2 sends the INFO command to the master, the master returns the slave list to sentinel 2, and sentinel 2 uses the connection information in the list to connect to each slave and monitor it continuously over that connection.

The other sentinels monitor the slaves in the same way.

Figure: getting slave information with the INFO command
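
As a rough illustration of what "the master returns the slave list" looks like, the Python sketch below extracts slave addresses from text shaped like the replication section of INFO output. The sample text and its values are made up for the example, and parse_slaves is a hypothetical helper, not a Redis or client-library function.

```python
import re

# Lines such as "slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0"
SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_slaves(info_text):
    """Return (ip, port) for every slaveN line in INFO-style text."""
    slaves = []
    for line in info_text.splitlines():
        match = SLAVE_LINE.match(line.strip())
        if not match:
            continue
        # Turn "ip=...,port=...,state=..." into a dict of fields.
        fields = dict(pair.split("=", 1) for pair in match.group(1).split(","))
        slaves.append((fields["ip"], int(fields["port"])))
    return slaves


sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=1234,lag=1
"""
slave_addresses = parse_slaves(sample)
```

A sentinel would then open a connection to each returned address and start its periodic PING checks against it.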

Selecting the sentinel that performs the master-slave switch

“65 Brother: After the master dies, there are so many sentinels. Which one carries out the switch to the new master?”

This is similar to how sentinels judge the master "objectively offline": it is also decided by vote.

After any sentinel judges the master "subjectively offline", it sends the is-master-down-by-addr command to the other sentinels. Each of them answers Y or N according to the state of its own connection to the master: Y is a vote in favor, N a vote against.

Once a sentinel has collected enough votes in favor, it can mark the master "objectively offline". The number of votes required is set by the quorum item in the sentinel configuration file:

sentinel monitor <master-name> <ip> <redis-port> <quorum>

For example, with a group of 3 sentinels, quorum can be set to 2. When a sentinel has 2 votes in favor, it can mark the master "objectively offline". This count includes its own vote.

A sentinel that has collected enough votes can then send a command to the other sentinels saying that it wants to perform the master-slave switch, and ask them to vote. This voting process is called the "Leader election".

Becoming the Leader isn't that simple; you need some real skill. Both of the following conditions must hold:

  1. It receives votes in favor from more than half of the sentinels;
  2. The number of votes in favor is greater than or equal to the quorum value in the configuration file.

Suppose the Sentinel cluster has only 2 instances. A sentinel that wants to become Leader must then get 2 votes, not 1. So if one sentinel goes down, the cluster can no longer perform a master-slave switch. That is why we usually configure at least 3 sentinel instances.

This is also why Sentinel clusters are deployed with an odd number of instances; an even number adds nothing and is wasteful.
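
The two Leader conditions combine into a single threshold, which a few lines of Python make explicit. The function names votes_needed and can_become_leader are invented for this sketch.

```python
def votes_needed(total_sentinels, quorum):
    """Votes a sentinel needs to become Leader.

    It must satisfy both conditions: more than half of all sentinels,
    and at least the configured quorum.
    """
    return max(total_sentinels // 2 + 1, quorum)


def can_become_leader(votes, total_sentinels, quorum):
    return votes >= votes_needed(total_sentinels, quorum)


# Two-sentinel cluster from the text: 2 votes are required, so a single
# surviving sentinel (1 vote, its own) can never become Leader.
two_node_ok = can_become_leader(1, total_sentinels=2, quorum=1)
```

With 3 sentinels and quorum 2 the threshold is still 2, so the cluster keeps working after losing one sentinel, which is the practical argument for deploying at least three.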

The election process is shown in the figure below:

Figure: Redis sentinels performing the master-slave switch

Client event notification through pub/sub

“65 Brother: The new master has been chosen. How do we announce it to the world?”

With a press conference, of course. Invite the news media to report and spread the word; interested people will subscribe to the relevant events and act on them.

Redis works similarly: it publishes different events through the pub/sub mechanism and lets clients subscribe to them. A client can subscribe to messages from a sentinel. Sentinel offers many subscription channels, and different channels carry the different key events of a master-slave switch.

In other words, different events are published in different "WeChat groups", and whoever is interested in an event joins that group.

Master offline events

  • +sdown: an instance enters the "subjective offline" state;
  • -sdown: an instance leaves the "subjective offline" state;
  • +odown: an instance enters the "objective offline" state;
  • -odown: an instance leaves the "objective offline" state;

Slave reconfiguration events

  • +slave-reconf-sent: the sentinel has sent the replicaof command to reconfigure the slave;
  • +slave-reconf-inprog: the slave is being configured to follow the new master, but synchronization has not finished;
  • +slave-reconf-done: the slave follows the new master and has completed data synchronization with it;

New master switch

+switch-master: the master's address has changed.

Knowing these channels, a client can subscribe to messages from a sentinel. After reading the sentinel's configuration file, the client has the sentinel's address and port and can open a network connection to it.

Then we can run subscribe commands on the client to receive the different event messages.

For example, the following command subscribes to "all events in which an instance enters the objective offline state":

SUBSCRIBE +odown
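
A client that subscribes to the +switch-master channel listed above has to turn the message into a new address. The Python sketch below assumes the payload layout given in the Redis Sentinel documentation (master name, old IP, old port, new IP, new port, separated by spaces); parse_switch_master and MasterSwitch are hypothetical names, and the sample addresses are invented.

```python
from collections import namedtuple

MasterSwitch = namedtuple("MasterSwitch", "name old_ip old_port new_ip new_port")


def parse_switch_master(payload):
    """Split a +switch-master payload into its five fields."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return MasterSwitch(name, old_ip, int(old_port), new_ip, int(new_port))


event = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6380")
# A client would now reconnect to (event.new_ip, event.new_port).
```

Only the parsing step is shown; receiving the message requires a live connection to a sentinel, which is outside the scope of this sketch.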

Notes and configuration instructions

As you may have noticed, Redis's pub/sub (publish/subscribe) mechanism is especially important. With pub/sub, connections can be established among sentinels, between sentinels and slaves, and between sentinels and clients, and the various events are published through this mechanism too.

down-after-milliseconds

The down-after-milliseconds option in the Sentinel configuration file specifies how long it takes Sentinel to decide that an instance has entered the subjective offline state. If an instance keeps returning invalid replies to Sentinel for down-after-milliseconds milliseconds, Sentinel updates its record for that instance to show that it has entered the subjective offline state.
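
The timing rule can be expressed as a one-line comparison. This Python sketch is a simplification under stated assumptions: timestamps are plain millisecond integers, is_subjectively_down is a hypothetical name, and the 30000 ms threshold is just an example value.

```python
def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms):
    """True when no valid reply has been seen for longer than down_after_ms."""
    return now_ms - last_valid_reply_ms > down_after_ms


DOWN_AFTER_MS = 30000  # example value for down-after-milliseconds

still_up = is_subjectively_down(0, 10000, DOWN_AFTER_MS)    # 10 s of silence
marked_down = is_subjectively_down(0, 30001, DOWN_AFTER_MS)  # just over 30 s
```

Each valid reply resets last_valid_reply_ms, so only an uninterrupted stretch of invalid replies or silence reaches the threshold.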

Make sure the configuration is consistent across all sentinel instances, especially the subjective offline threshold down-after-milliseconds. If this value differs between sentinel instances, the Sentinel cluster may fail to reach consensus on a failed master, the master switch is delayed, and the cluster service ends up unstable.

down-after-milliseconds * 10

down-after-milliseconds is the maximum connection timeout we use to decide that master and slave are disconnected. If master and slave have had no network contact for down-after-milliseconds milliseconds, we consider the link disconnected. If a slave stays disconnected for more than 10 times that long (down-after-milliseconds * 10), its network condition is poor and it is not suitable as the new master.

Summary

Sentinel's main tasks

The Redis Sentinel mechanism is one of the means by which Redis achieves high availability and uninterrupted service. Data synchronization in the master-slave cluster is the basic guarantee of data reliability; automatically switching master and slave when the master goes down is the key support for uninterrupted service.

Redis Sentinel automates the master-slave switch, so I no longer need to fear the master going down while I'm with my girlfriend:

  • Monitor the running state of the master and slaves, and judge whether a node is objectively offline;
  • After the master is objectively offline, select a slave and switch it to master;
  • Notify the slaves and the clients of the new master's information.

Sentinel cluster principles

The Sentinel cluster was introduced to avoid a failed master-slave switch when a single sentinel fails, and to reduce misjudgment. The cluster relies on several mechanisms to operate normally:

  • Communication among the sentinels is based on the pub/sub mechanism;
  • The slave list is obtained with the INFO command, which lets the sentinels connect to the slaves;
  • Event notification between clients and sentinels goes through the sentinel's pub/sub.

The master-slave switch is not performed by an arbitrarily chosen sentinel. A Leader is selected by vote, and this Leader is responsible for the master-slave switch.


This article is from the WeChat official account High Performance Server Development (easyserverdev).


Original publication time : 2021-04-01


Copyright notice
This article was written by [Fan Li]. Please include the original link when reprinting. Thanks.
https://javamana.com/2021/04/20210408111751018y.html
