[Redis] Redis persistence: RDB and AOF

Zhou Er Ya 2020-11-07 20:19:09
redis rdb aof redis persistence


Redis Persistence

We know that Redis keeps all of its data in memory. If the machine suddenly goes down, all of that data is lost, so we need a persistence mechanism to make sure data survives a crash. Redis provides two persistence schemes: one based on snapshots, the other based on an AOF log. Let's take a look at both.

Operating system and disk

First we need to understand what role the database itself plays in persistence, so let's trace the path data takes from Redis to the disk:

  • The client sends a write command to the database (the data is in the client's memory);
  • The database receives the write command and the data to be written (the data is in the server's memory);
  • The database calls the system call that writes the data to disk (the data is in the kernel buffer);
  • The operating system transfers the data in the kernel buffer to the disk controller (the data is in the disk cache);
  • The disk controller writes the data in its cache to the physical medium (the data is finally on disk).

This is only a rough sketch of the process; the real cache hierarchy is more complicated than this. What it tells us is that, for persistence, the database's job is mainly step 3: getting the data that lives in memory into the operating system's kernel buffer. The last two steps are the operating system's concern; the database can do nothing about them. The database normally only invokes the system call to flush data from memory to disk when necessary.
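To make step 3 concrete, here is a minimal C sketch (not Redis source; the file name is just an example) of the two system calls involved: write() only moves the data into the kernel buffer (step 3), while fsync() asks the kernel to push it through steps 4 and 5 onto the physical disk.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Open (or create) a data file for appending. */
    int fd = open("data.db", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char *record = "set key value\n";

    /* Step 3: write() returns as soon as the data is in the kernel buffer.
     * If the machine loses power right now, the record may still be lost. */
    if (write(fd, record, strlen(record)) < 0) { perror("write"); return 1; }

    /* Steps 4-5: fsync() blocks until the kernel has pushed the data through
     * the disk controller onto the physical medium. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}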

Persistence schemes

For the persistence process described above, Redis offers several different options:

  • RDB persistence generates point-in-time snapshots of the dataset at specified intervals;
  • AOF persistence records every write command received by the server; when the server restarts, these commands are replayed to rebuild the dataset. Commands are logged in the same format as the Redis protocol itself and are appended to the file, and Redis can also rewrite (compact) the file in the background when it gets too big;
  • If you only want the data to exist while the server is running, you can disable persistence entirely;
  • Redis can also use AOF persistence and RDB persistence at the same time. In that case, when Redis restarts, the AOF file is used preferentially to rebuild the dataset, because the data stored in the AOF is usually more complete than the data stored in the RDB file.

Next I'll focus on the RDB and AOF persistence schemes and their similarities and differences.

RDB Persistence

RDB (Redis Database) saves data to disk in the form of snapshots. A snapshot can be thought of as a photograph of the dataset at a particular point in time. At a configured interval, or when a specific command is executed, Redis saves and backs up the data currently in the system, writing it to disk in binary form; the default file name is dump.rdb.

RDB can be triggered in three ways: executing the save command, executing the bgsave command, or configuring automatic snapshots in redis.conf.

The save trigger

Redis is a single-threaded program: this one thread handles concurrent reads and writes on many client sockets as well as the logical reads and writes on the in-memory data structures. The save command blocks the Redis server; while it is executing, Redis cannot serve any other command until the whole RDB process is finished.

Only when the command finishes and the RDB file has been saved does Redis go back to serving requests. This approach is fine for backing up data onto a new machine, but if it is used in production it is a disaster: with a large dataset the blocking time is far too long. This method is not advisable.

The bgsave trigger

To avoid blocking online traffic, Redis has to persist data while still responding to client requests. So when executing bgsave, Redis forks a child process and hands all the subsequent save work to that child; the parent process keeps responding to requests and never has to worry about disk I/O.
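The following is a minimal sketch in plain C (not Redis source; save_snapshot is a made-up placeholder for serializing the dataset) of the bgsave idea: the parent forks, the child writes the snapshot, and the parent goes straight back to serving requests.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for "serialize the in-memory dataset into dump.rdb". */
static void save_snapshot(void) {
    FILE *f = fopen("dump.rdb", "wb");
    if (f) {
        fputs("binary snapshot would go here\n", f);
        fclose(f);
    }
}

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: has a copy-on-write view of the parent's memory,
         * so it can serialize a consistent snapshot and exit. */
        save_snapshot();
        _exit(0);
    } else if (pid > 0) {
        /* Parent: goes back to the event loop immediately; no disk I/O here.
         * (A real server would reap the child asynchronously.) */
        printf("bgsave started in child %d, parent keeps serving clients\n", (int)pid);
        waitpid(pid, NULL, 0); /* blocking here only to keep the example short */
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}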

Configuration in redis.conf

Both of the methods above require the client to execute save or bgsave. In production we usually want an automated trigger instead, and Redis provides one: we can configure persistence in redis.conf:

################################ SNAPSHOTTING ################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000

In the redis.conf snippet above, save 900 1 means that if at least one key was modified within 900 seconds, an automatic snapshot is taken; save 300 10 likewise means that if at least ten keys were modified within 300 seconds, a snapshot is taken, and so on.

If you don't want persistence at all and only want the data to exist in memory while the server is running, you can disable it with save "".

The configuration file has a few more RDB-related options worth mentioning (a sample snippet follows the list):

  • stop-writes-on-bgsave-error: defaults to yes, meaning that if the last RDB save failed, Redis refuses to accept further writes. The benefit is that users notice that persistence is failing before it turns into a more serious problem;
  • rdbcompression: defaults to yes, meaning the snapshot written to disk is compressed;
  • rdbchecksum: defaults to yes; after the snapshot is written, its data is verified with a CRC64 checksum, at a small performance cost;
  • dbfilename: defaults to dump.rdb, the file name used for the snapshot;
  • dir: the directory where the snapshot is stored.
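A sample of these options as they might appear in redis.conf (the values shown are the usual defaults; adjust dir for your own environment):

stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./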

COW Mechanism

As mentioned earlier, to avoid blocking online traffic Redis must persist data while responding to client requests, which is why it forks a child process to handle the save work. How exactly does this forked child let Redis persist data and serve requests at the same time? The answer is the COW (Copy On Write) mechanism; let's look at it in detail.

During persistence Redis calls glibc's fork function to create a child process, leaves the snapshot persistence to that child, and keeps the parent responding to client requests. Right after the fork, the child effectively shares the code and data segments of its parent: the kernel marks all of the parent's memory pages read-only, and the child's address space simply points at the parent's pages. When the parent later writes to memory, the CPU detects that the page is read-only and raises a page fault (page-fault), trapping into a kernel interrupt routine. In that routine the kernel makes a copy of the page that triggered the fault, so parent and child each end up with their own copy. The data seen by the child never changes; it is still the data as of the moment the child was created, so the child can safely traverse it, serialize it, and write it to disk.

As the parent keeps modifying data, more and more shared pages are split off and memory usage keeps growing, but it will not exceed twice the size of the original dataset (in Redis the proportion of cold data is usually high, so it is rare for every page to be separated).

The benefits of COW are obvious: it avoids the instantaneous latency of allocating and copying all memory up front, and it avoids unnecessary resource allocation. The drawback is just as obvious: if the parent receives a heavy write load, there will be a large number of page faults.
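A small C experiment (not Redis code) that demonstrates the copy-on-write behaviour described above: the parent changes the value after the fork, but the child still sees the value from the moment fork() was called.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int counter = 42;              /* the "dataset" at the moment of the fork */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: wait long enough for the parent to modify its copy,
         * then read the value. COW guarantees we still see 42. */
        sleep(1);
        printf("child sees counter = %d (the point-in-time snapshot)\n", counter);
        _exit(0);
    } else if (pid > 0) {
        counter = 100;             /* the parent keeps "serving writes" */
        printf("parent changed counter to %d\n", counter);
        waitpid(pid, NULL, 0);
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}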

Advantages and disadvantages of RDB

With the explanation above you should now have a general picture of RDB persistence, so let's briefly summarize its advantages and disadvantages.

Advantages:

  • An RDB file is a very compact (binary) file that holds the Redis dataset at a point in time. Such files are ideal for backups: for example, you can keep an RDB file for every hour of the last 24 hours, plus one RDB file for every day of the month. Then even if something goes wrong, you can restore the dataset to any of those versions;
  • RDB is well suited to disaster recovery: it is a single, very compact file that can be (encrypted and) shipped to another data center;
  • RDB maximizes Redis performance: the only thing the parent process has to do when saving an RDB file is fork a child; the child handles all the subsequent save work, and the parent never performs any disk I/O;
  • When restoring a large dataset, RDB is faster than AOF.

Disadvantages:

  • If the business cannot afford to lose data when the server fails, RDB is not suitable. Although Redis lets you configure different save points to control how often RDB files are written, an RDB file has to capture the state of the entire dataset, which is not a fast operation; you may only be able to save an RDB file every five minutes or so. In that case, a crash could cost you several minutes of data.
  • Every time an RDB file is saved, Redis fork()s a child process, and the child does the actual persistence work. When the dataset is large, fork() can be quite time-consuming and may cause the server to stop serving clients for some milliseconds; if the dataset is huge and CPU time is tight, the pause can even last a full second. AOF rewriting also needs fork(), but no matter how often the AOF is rewritten, data durability is not sacrificed.

AOF Persistence

AOF persistence logs every write operation received by the server; these operations can be replayed at server startup to reconstruct the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.

RDB persistence takes full backups, which is time-consuming, so Redis also provides the more lightweight AOF (Append Only File) persistence scheme. Briefly, the AOF log stores the sequence of commands executed by the Redis server, and it only records the commands that modify data in memory.

When the server restarts, Redis replays the operations recorded in the AOF log to rebuild the original dataset from scratch.

After Redis receives a modification command from a client, it checks the parameters and performs the logical processing; if everything succeeds, it immediately appends the command text to the AOF log. In other words, the command is executed before the log is written. This is different from storage engines such as LevelDB or HBase, which write the log first and then do the logical processing.
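Because commands are appended in the Redis protocol format, a command such as SET key value ends up in the AOF file roughly like this (simplified; every line is actually terminated with \r\n):

*3
$3
SET
$3
key
$5
value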

AOF trigger configuration

AOF supports different fsync policies; here is a brief description of the three options:

  • always: every data change is immediately flushed to the disk file. This gives the best durability, but the I/O cost is high and performance is poor;
  • everysec: sync once per second. Much faster, but if the server crashes within that second, the data written in that second may be lost;
  • no: never call fsync explicitly and let the operating system decide when to flush. This is the fastest option but the least safe. (Note that AOF as a whole is disabled by default via appendonly no; once it is enabled, the default appendfsync policy is everysec.)

These are configured in redis.conf: change appendonly no to yes to turn AOF on, then select the policy you need by commenting or uncommenting the corresponding appendfsync line:

############################## APPEND ONLY MODE ###############################
# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.
appendonly no
# The name of the append only file (default: "appendonly.aof")
appendfilename "appendonly.aof"
# ... omitted
# appendfsync always
appendfsync everysec
# appendfsync no

AOF rewrite mechanism

As Redis runs, the AOF keeps growing. If the instance crashes and restarts, replaying the whole AOF is very time-consuming, and the log contains many records that no longer matter: for example, if I incr a key a thousand times, there is no need to keep all 1000 modifications; only the final value matters. That is why the AOF needs to be rewritten.

Redis provides the bgrewriteaof command to rewrite the AOF log. When it runs, a child process is forked to traverse the in-memory dataset, convert it into a sequence of Redis write commands, and serialize them into a new log file. When it is done, the new file replaces the original AOF and the rewrite is complete.
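As an illustrative sketch (not literal file contents), this is the effect of a rewrite on the incr example above, shown here with three increments instead of a thousand:

# Old AOF (simplified):
INCR counter
INCR counter
INCR counter

# Rewritten AOF (simplified):
SET counter 3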

The rewrite trigger can likewise be configured in redis.conf:

no-appendfsync-on-rewrite controls whether Redis skips fsync on the AOF while a background rewrite (or RDB save) is in progress; setting it to yes trades some durability for lower latency during rewrites. auto-aof-rewrite-percentage 100 triggers an automatic rewrite once the file has grown by 100% since the last rewrite, and auto-aof-rewrite-min-size 64mb means the file must reach at least 64 MB before an automatic rewrite is triggered.

# ... omitted
no-appendfsync-on-rewrite no
# Automatic rewrite of the append only file.
# ... omitted
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

Rewriting is itself resource-intensive, so when disk space is plentiful you can raise the 64 MB threshold to reduce how often rewrites happen and thereby optimize performance.

fsync function

When AOF is configured with appendfsync everysec, Redis does not call write immediately after processing a command to append the data to the AOF file; instead it first appends the data to an in-memory AOF buffer (server.aof_buf). The write call is separate from command processing: Redis only performs the write just before entering epoll_wait.

/* Write the append only file buffer on disk.
 *
 * Since we are required to write the AOF before replying to the client,
 * and the only way the client socket can get a write is entering when the
 * the event loop, we accumulate all the AOF writes in a memory
 * buffer and write it on disk using this function just before entering
 * the event loop again.
 *
 * About the 'force' argument:
 *
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync. */
#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;
    /* ... omitted ... */

    /* The BIO thread marks sync_in_progress while a background fsync is pending. */
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(BIO_AOF_FSYNC) != 0;
    /* ... omitted ... */

    /* aofWrite() calls write() to flush the AOF buffer into the AOF file;
     * it only handles EINTR, nothing else. */
    nwritten = aofWrite(server.aof_fd, server.aof_buf, sdslen(server.aof_buf));

    /* Handle the AOF write error. */
    if (nwritten != (ssize_t)sdslen(server.aof_buf)) {
        /* ... omitted ... */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synced on disk. */
            serverLog(LL_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
        /* ... omitted ... */
    }

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is a macro: fdatasync() on Linux, fsync() elsewhere.
         * This is why appendfsync should normally not be set to always --
         * it hurts performance badly. */
        redis_fsync(server.aof_fd); /* Let's try to get this data on the disk */
    } else if (server.aof_fsync == AOF_FSYNC_EVERYSEC &&
               server.unixtime > server.aof_last_fsync) {
        /* everysec is not nearly as slow as always, because it fsyncs in the
         * background: Redis is not strictly single-threaded; it creates a set of
         * BIO threads dedicated to blocking, slow operations (AOF fsync, closing
         * files, and lazy memory freeing). Unlike always, everysec does not call
         * fsync immediately; it hands the job to a BIO thread to run asynchronously.
         * The BIO threads are created at process startup and exchange work with the
         * main thread through the bio_jobs and bio_pending globals: the main thread
         * produces jobs, the BIO threads consume them.
         * If a background fsync is already in progress, do not queue another one. */
        if (!sync_in_progress)
            aof_background_fsync(server.aof_fd);
        server.aof_last_fsync = server.unixtime;
    }
}

Redis has two other policies: one is to never call fsync and let the operating system decide when to sync to disk, which is not safe; the other is to call fsync for every single command, which is very slow. Neither is normally used in production; they are just worth knowing about.
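To make the everysec behaviour more tangible, here is a minimal, self-contained C sketch (this is not Redis's BIO implementation; the file name, the PING entry, and the timings are only illustrative): the main thread only write()s into the kernel buffer, while a background thread fsync()s roughly once per second.

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int aof_fd;

/* Background thread: flush the kernel buffers to disk about once per second,
 * so the main thread never blocks on fsync(). */
static void *fsync_worker(void *arg) {
    (void)arg;
    for (;;) {
        sleep(1);
        fsync(aof_fd);  /* Redis uses fdatasync() on Linux via its redis_fsync macro */
    }
    return NULL;
}

int main(void) {
    aof_fd = open("appendonly.aof", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (aof_fd < 0) { perror("open"); return 1; }

    pthread_t tid;
    pthread_create(&tid, NULL, fsync_worker, NULL);

    /* "Event loop": write() returns once the data is in the kernel buffer;
     * durability is provided asynchronously by the fsync thread, so a crash
     * can lose at most about one second of appended commands. */
    for (int i = 0; i < 10; i++) {
        const char *entry = "*1\r\n$4\r\nPING\r\n";
        if (write(aof_fd, entry, strlen(entry)) < 0) perror("write");
        usleep(100 * 1000);
    }

    close(aof_fd);
    return 0;
}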

Advantages and disadvantages of AOF

Advantages:

  • The default AOF policy is to fsync once per second. With this configuration Redis still performs well, and even in the event of a crash, at most one second of data is lost;
  • The AOF file is an append-only log, so writes to it never need to seek, and even if the log ends with a half-written command for some reason (a full disk, a crash in the middle of a write, and so on), the redis-check-aof tool can easily repair it;
  • Redis can automatically rewrite the AOF in the background when it grows too large: the rewritten AOF contains the minimal set of commands needed to rebuild the current dataset. The whole rewrite operation is completely safe, because while Redis is creating the new AOF file it keeps appending commands to the existing one; even if the server crashes during the rewrite, the existing AOF is not lost. Once the new AOF file is ready, Redis switches from the old file to the new one and starts appending to the new file;
  • The AOF file stores all writes to the database in order, in the format of the Redis protocol, so its contents are easy to read and easy to parse. Exporting an AOF file is also simple: for example, if you accidentally run the FLUSHALL command, then as long as the AOF has not been rewritten yet, you can stop the server, remove the FLUSHALL command from the end of the AOF, restart Redis, and the dataset is restored to its state before the FLUSHALL.

Disadvantages:

  • For the same dataset, the AOF file is usually larger than the corresponding RDB file.
  • Depending on the fsync policy, AOF may be slower than RDB. In general, fsync-per-second performance is still very good, and with fsync disabled AOF can be as fast as RDB even under high load; but under a heavy write load, RDB gives better guarantees about maximum latency.
  • AOF has had bugs in the past where, because of particular commands, reloading the AOF file did not restore the dataset exactly as it was when it was saved.

Hybrid persistence

When Redis restarts, rebuilding the in-memory state from the RDB file alone would lose a lot of data, while replaying only the AOF log is too slow. Redis 4.0 introduced a hybrid persistence scheme that stores the RDB content and the incremental AOF log together. Here the AOF log is no longer the full log, only the increment of writes that occurred between the start and the end of the RDB persistence, and that portion is usually very small.

So when Redis restarts, it can load the RDB content first and then replay the small incremental AOF log, completely replacing the old full AOF replay; restart efficiency is therefore greatly improved.
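Hybrid persistence is switched on with the aof-use-rdb-preamble directive in redis.conf (introduced in Redis 4.0; AOF itself must also be enabled, and whether the option defaults to yes depends on the Redis version):

appendonly yes
aof-use-rdb-preamble yes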

Copyright notice
This article was written by Zhou Er Ya. Please include a link to the original when reposting. Thanks.
