[Redis] RDB and AOF persistence

itread01 2020-11-07 16:46:44
redis rdb aof redis persistence

## Redis Persistence

We know that Redis keeps all of its data in memory, so if the machine suddenly goes down, all of that data is lost. We therefore need a persistence mechanism to make sure data is not lost because of downtime. Redis provides two persistence schemes: one based on snapshots (RDB), the other based on an append-only log (AOF). Let's take a look at both.

### Operating system and disk

First we need to know what role the database itself plays in persistence, so let's start with the path data takes from Redis to the disk:

- The client sends a write command to the database (the data is in the client's memory);
- The database receives the write command and the data to be written (the data is in the server's memory);
- The database invokes the system call that writes data to disk (the data is in the kernel buffer);
- The operating system transfers the data in the kernel buffer to the disk controller (the data is in the disk's buffer);
- The disk controller writes the data in the disk buffer to the physical medium (the data is actually on disk).

This is only a simplified description; a real system has more cache levels than this. What we can learn from it is that, during persistence, the database is mainly responsible for the third step: moving data from its own memory into the operating system's kernel buffer. The last two steps are the operating system's concern, and the database cannot do anything about them. A database usually invokes the system call that writes data from memory to disk only when it has to.
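To make the boundary between the third step and the later ones concrete, here is a minimal Python sketch. It is not Redis code; the helper name `durable_append` is made up for illustration. `os.write` hands the bytes to the kernel buffer, and `os.fsync` asks the operating system to push them on towards the disk. What the disk controller then does with its own cache is outside the program's control.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> int:
    """Append payload to path, then ask the kernel to flush it to the device.

    Hypothetical helper for illustration only; returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # write(2): the data moves from process memory to the kernel buffer.
            written = os.write(fd, view)
            view = view[written:]
        # fsync(2): request that the kernel buffer be pushed to the disk.
        os.fsync(fd)
        return len(payload)
    finally:
        os.close(fd)
```

A call such as `durable_append(os.path.join(tempfile.mkdtemp(), "demo.log"), b"SET k v\n")` returns only after `fsync` has completed, which is why calling it for every single command is expensive.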
### Persistence schemes

For the persistence process described above, Redis offers several options:

- RDB persistence generates point-in-time snapshots of the dataset at specified intervals;
- AOF persistence logs every write command received by the server and replays these commands when the server restarts to rebuild the dataset. Commands are logged in the same format as the Redis protocol itself and are appended to the end of the file. When the file gets too big, Redis can rewrite it in the background to shrink it;
- If you only need the data to exist while the server is running, you can disable persistence completely;
- You can also enable AOF and RDB at the same time. In that case, when Redis restarts it prefers the AOF file to restore the data, because the dataset stored in the AOF file is usually more complete than the one in the RDB file.

Next, I will focus on the similarities and differences between the RDB and AOF persistence schemes.

## RDB Persistence

`RDB (Redis Database)` saves data to disk in the form of a snapshot. A snapshot can be thought of as a photograph of the dataset at a certain point in time. Redis stores the data currently in the system as a backup, either at specified intervals or when a specific command is executed, and writes it to disk in binary form. The default file name is `dump.rdb`.

RDB can be triggered in three ways: by running the `save` command, by running the `bgsave` command, or automatically through settings in `redis.conf`.

### Triggering with save

Redis is a single-threaded program: one thread handles the concurrent reads and writes on all client sockets as well as the logical reads and writes on the in-memory data structures.
The `save` command blocks the Redis server: while the command is running, Redis cannot handle any other command until the whole RDB process is complete. The following picture illustrates this:

![](https://img2020.cnblogs.com/blog/1614350/202011/1614350-20201107153903214-1937165420.png)

Only after the command has finished and the RDB file has been saved does Redis resume responding to requests. This is acceptable for backing up data on a new machine, but in production it would be a disaster: with a large dataset the blocking time is far too long. It is not a good approach.

### Triggering with bgsave

In order not to block online traffic, Redis has to persist data and respond to client requests at the same time. `bgsave` therefore uses `fork` to create a child process, and the child handles all of the saving work. The parent process can continue to respond to requests without caring about the `I/O` operations.

![](https://img2020.cnblogs.com/blog/1614350/202011/1614350-20201107153930593-1648469936.png)

### Configuring redis.conf

Both of the methods above require running `save` or `bgsave` from a client. In production we usually need an automatic trigger, and Redis provides one: persistence can be configured in `redis.conf`:

```config
################################ SNAPSHOTTING  ################################
#
# Save the DB on disk:
#
#   save <seconds> <changes>
#
#   Will save the DB if both the given number of seconds and the given
#   number of write operations against the DB occurred.
#
#   In the example below the behaviour will be to save:
#   after 900 sec (15 min) if at least 1 key changed
#   after 300 sec (5 min) if at least 10 keys changed
#   after 60 sec if at least 10000 keys changed
#
#   Note: you can disable saving completely by commenting out all "save" lines.
#
#   It is also possible to remove all the previously configured save
#   points by adding a save directive with a single empty string argument
#   like in the following example:
#
#   save ""

save 900 1
save 300 10
save 60 10000
```

With the configuration above in `redis.conf`, `save 900 1` means that if at least one modification happened within 900 seconds, a snapshot is taken automatically; `save 300 10` means that if at least ten modifications happened within 300 seconds, a snapshot is taken, and so on.

If you do not want RDB persistence and only need the data to exist in memory while the server is running, you can disable snapshotting with `save ""`.

Here are a few more RDB-related parameters from the configuration file:

- `stop-writes-on-bgsave-error`: the default is yes, meaning that after the last RDB save has failed, Redis refuses to accept writes. The benefit is that users notice that data is not being persisted successfully, which avoids more serious business problems later;
- `rdbcompression`: the default is yes, meaning that snapshots stored on disk are compressed;
- `rdbchecksum`: the default is yes; after the snapshot is saved, the data is verified with a CRC64 checksum, at some cost in performance;
- `dbfilename`: the default is `dump.rdb`, the file name of the snapshot;
- `dir`: the directory where the snapshot is stored.

### The COW mechanism

As mentioned earlier, in order not to block online traffic Redis has to persist data and respond to client requests at the same time, so it uses `fork` to create a child process that handles the saving work. How exactly does the forked child allow Redis to persist and respond at the same time?
This involves the `COW (Copy On Write)` mechanism, explained in detail below.

When persisting, Redis calls the `glibc` function `fork` to create a child process. The child completes the snapshot, and the parent keeps responding to client requests. When the child has just been created, it shares the parent's code and data segments. After `fork`, the kernel marks all memory pages of the parent as `read-only` and points the child's address space at the same pages. When the parent writes to memory, the CPU detects that the page is `read-only` and raises a page fault, which traps into a kernel handler. In the handler, the kernel makes a copy of the page that triggered the fault, so that parent and child each hold their own copy. The child's data does not change: it is still the data from the moment the child was created, so the child can safely traverse it, serialize it and write it to disk.

As the parent keeps modifying data, more and more shared pages are separated and memory usage keeps growing, but it never exceeds twice the size of the original data. (In Redis the proportion of cold data is often fairly high, so it is rare for every page to be separated.)

The benefits of COW are clear: it reduces the momentary latency caused by allocating and copying memory at fork time, and it avoids allocating resources that are never needed. The drawback is just as clear: if the parent receives a large number of writes, there will be a large number of page faults.
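The snapshot semantics of `fork` can be observed directly with a small POSIX-only Python sketch. It only illustrates the behaviour described above and is not how Redis is implemented; the function name `snapshot_via_fork` is hypothetical. The child serializes the data as it was at fork time, even though the parent modifies it immediately afterwards.

```python
import json
import os


def snapshot_via_fork(data: dict) -> dict:
    """Fork a child that serializes `data`; mutate `data` in the parent meanwhile.

    Returns what the child saw, i.e. the state at the moment of fork().
    Illustrative only; requires a POSIX system (os.fork).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: its address space is a copy-on-write view of the parent's.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "w") as out:
                json.dump(data, out)
        finally:
            os._exit(0)  # leave without running the parent's cleanup handlers

    # Parent: keeps "serving writes" while the child is saving.
    os.close(write_fd)
    data["counter"] = data.get("counter", 0) + 1
    with os.fdopen(read_fd) as src:
        snapshot = json.load(src)
    os.waitpid(pid, 0)
    return snapshot
```

Calling `snapshot_via_fork({"counter": 1})` returns the fork-time value of `counter`, while the parent's dictionary already holds the incremented value. This is the same isolation that lets the `bgsave` child write a consistent `dump.rdb`.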
### Pros and cons of RDB

After the explanation above you should have a general understanding of RDB persistence, so let's briefly summarize its advantages and disadvantages.

Advantages:

- An RDB file is a very compact file (it stores binary data) that holds the Redis dataset at a point in time. It is perfect for backups: for example, you can back up an RDB file every hour for the last 24 hours, and also keep one RDB file for every day of the month. That way, even if something goes wrong, you can restore the dataset to a different version at any time;
- RDB is ideal for disaster recovery: it is a single, very compact file that can be (after encryption) sent to another data center;
- RDB maximizes Redis performance: the only thing the parent process has to do when saving an RDB file is `fork` a child, and the child handles all the remaining work. The parent does not perform any disk I/O;
- Restoring a large dataset from RDB is faster than restoring it from AOF.

Disadvantages:

- If the business needs to minimize data loss when the server fails, RDB is not a good fit. Although Redis lets you set different save points to control how often RDB files are saved, an RDB file has to store the state of the entire dataset, so saving is not fast and you will probably save an RDB file no more often than once every 5 minutes. If the machine then goes down, you can lose several minutes of data.
- Every time an RDB file is saved, Redis has to `fork()` a child process, and the child does the actual persistence work. When the dataset is large, `fork()` can be very time-consuming and can cause the server to stop serving clients for some milliseconds. If the dataset is very large and CPU time is very tight, the pause can even last a full second. AOF rewriting also needs `fork()`, but however long the interval between AOF rewrites is, data durability is not reduced.

## AOF Persistence

> The AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.

RDB persistence is a full backup and is time-consuming, so Redis also provides a more efficient scheme, `AOF (Append Only File)` persistence. Briefly, it works like this: the AOF log stores the sequence of commands the Redis server has executed, and it records only the commands that modify memory. When the server restarts, Redis replays the operations recorded in the AOF log to rebuild the original dataset.

![](https://img2020.cnblogs.com/blog/1614350/202011/1614350-20201107153958305-159489071.png)

After receiving a modification command from a client, Redis validates the arguments and executes the logic. If there is no problem, it immediately writes the command text to the AOF log. In other words, it executes the command first and logs it afterwards. This is different from storage engines such as leveldb and hbase, which write the log first and then do the logical processing.
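The "execute first, then log, replay on restart" cycle can be sketched with a toy store in Python. This is an illustration under simplifying assumptions, not Redis itself: the class `TinyStore` and the helpers `encode_resp` and `decode_resp` are hypothetical names, only `SET`, `INCR` and `DEL` are supported, and a real AOF file contains more than this (for example database selection commands). The log entries use the Redis protocol's array-of-bulk-strings encoding, as the quote above describes.

```python
import os
import tempfile


def encode_resp(*args: str) -> bytes:
    """Encode one command as a Redis-protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def decode_resp(buf: bytes):
    """Yield every command stored in buf as a list of strings."""
    pos = 0
    while pos < len(buf):
        if buf[pos:pos + 1] != b"*":
            raise ValueError("expected array marker")
        end = buf.index(b"\r\n", pos)
        count = int(buf[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(count):
            end = buf.index(b"\r\n", pos)
            size = int(buf[pos + 1:end])  # skip the leading "$"
            start = end + 2
            args.append(buf[start:start + size].decode())
            pos = start + size + 2  # skip payload and trailing CRLF
        yield args


class TinyStore:
    """Toy key-value store that rebuilds its state from an append-only log."""

    def __init__(self, aof_path: str):
        self.data = {}
        if os.path.exists(aof_path):
            with open(aof_path, "rb") as log:
                for cmd in decode_resp(log.read()):
                    self._apply(cmd)  # replay on startup
        self.aof = open(aof_path, "ab")

    def _apply(self, cmd):
        name = cmd[0].upper()
        if name == "SET":
            self.data[cmd[1]] = cmd[2]
        elif name == "INCR":
            self.data[cmd[1]] = str(int(self.data.get(cmd[1], "0")) + 1)
        elif name == "DEL":
            self.data.pop(cmd[1], None)
        else:
            raise ValueError("unsupported command: " + name)

    def execute(self, *cmd: str):
        self._apply(list(cmd))              # 1. execute the command
        self.aof.write(encode_resp(*cmd))   # 2. then append it to the log
        self.aof.flush()

    def close(self):
        self.aof.close()
```

Creating a `TinyStore`, running a few commands, closing it and opening a second `TinyStore` on the same path yields the same `data` dictionary, because the second instance replays the log.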
### AOF trigger configuration

AOF offers different `fsync` policies. Here is a brief description of the three of them:

- always: every data change is synced to the disk file immediately. Data integrity is good, but the I/O cost is high and performance is poor;
- everysec: sync once per second. This is faster, but if the server goes down you may lose the data written during the last second;
- no: Redis never calls `fsync` itself and lets the operating system decide when to flush. This is the fastest option but the least safe.

AOF itself is turned off by default (`appendonly no`). To enable it, change `appendonly no` to `yes` in `redis.conf`, then pick the policy you need by commenting or uncommenting the `appendfsync` lines:

```config
############################## APPEND ONLY MODE ###############################

# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
#
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
#
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
#
# Please check http://redis.io/topics/persistence for more information.

appendonly no

# The name of the append only file (default: "appendonly.aof")

appendfilename "appendonly.aof"

# ... (omitted)

# appendfsync always
appendfsync everysec
# appendfsync no
```

### The AOF rewrite mechanism

As Redis keeps running, the AOF file gets longer and longer. If the instance goes down and restarts, replaying the whole AOF file is very time-consuming. The log also contains many records that no longer matter: for example, if I `incr` a value a thousand times, there is no need to record all 1000 modifications, only the final value. That is why the AOF needs to be rewritten.

Redis provides the `bgrewriteaof` command to rewrite the AOF log. When the command runs, Redis forks a child process that traverses memory, converts the data into a series of Redis commands, and serializes them into a new log file. When it is finished, the new file replaces the original AOF file.

Rewriting can also be configured in `redis.conf`. With `no-appendfsync-on-rewrite` set to yes, Redis does not call `fsync` in the main process while a rewrite (or `bgsave`) child is running, which avoids latency spikes at the cost of durability during that time; the sample below keeps the default no. `auto-aof-rewrite-percentage 100` triggers another rewrite once the file has grown by 100% compared with its size after the last rewrite. `auto-aof-rewrite-min-size 64mb` means the file must reach at least 64 MB before an automatic rewrite is triggered.

```config
# ... (omitted)
no-appendfsync-on-rewrite no

# Automatic rewrite of the append only file.
# ... (omitted)
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```

Rewriting is also resource-intensive, so when there is plenty of disk space you can raise `64mb` to a larger value to reduce how often rewrites happen.

### The fsync function

With AOF configured as `appendfsync everysec`, Redis does not call `write` to write data to the AOF file right after processing a command. Instead it appends the data to the AOF buffer (`server.aof_buf`). The `write` call is decoupled from command processing: Redis performs it only right before it re-enters `epoll_wait`.
```c
/* Write the append only file buffer on disk.
 *
 * Since we are required to write the AOF before replying to the client,
 * and the only way the client socket can get a write is when entering
 * the event loop, we accumulate all the AOF writes in a memory
 * buffer and write it on disk using this function just before entering
 * the event loop again.
 *
 * About the 'force' argument:
 *
 * When the fsync policy is set to 'everysec' we may delay the flush if there
 * is still an fsync() going on in the background thread, since for instance
 * on Linux write(2) will be blocked by the background fsync anyway.
 * When this happens we remember that there is some aof buffer to be
 * flushed ASAP, and will try to do that in the serverCron() function.
 *
 * However if force is set to 1 we'll write regardless of the background
 * fsync. */
#define AOF_WRITE_LOG_ERROR_RATE 30 /* Seconds between errors logging. */
void flushAppendOnlyFile(int force) {
    ssize_t nwritten;
    int sync_in_progress = 0;

    /* ... (abridged excerpt; parts of the original function are omitted) */

    /* Is a background fsync still pending in the BIO thread? */
    if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
        sync_in_progress = bioPendingJobsOfType(BIO_AOF_FSYNC) != 0;

    /* aofWrite calls write(2) to write the AOF buffer to the AOF file.
     * It handles EINTR and nothing else. */
    nwritten = aofWrite(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));

    if (nwritten != (ssize_t)sdslen(server.aof_buf)) {
        /* Handle the AOF write error. */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* We can't recover when the fsync policy is ALWAYS since the
             * reply for the client is already in the output buffers, and we
             * have the contract with the user that on acknowledged write data
             * is synced on disk. */
            serverLog(LL_WARNING,"Can't recover from AOF write error when the AOF fsync policy is 'always'. Exiting...");
            exit(1);
        } else {
            return; /* We'll try again on the next call... */
        }
    } else {
        /* Successful write(2). If AOF was in error state, restore the
         * OK state and log the event. */
    }

    /* Perform the fsync if needed. */
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        /* redis_fsync is a macro: on Linux it is fdatasync, elsewhere fsync.
         * Setting appendfsync to always in redis.conf is best avoided,
         * because it hurts throughput badly. */
        redis_fsync(server.aof_fd); /* Let's try to get this data on the disk */
    } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                server.unixtime > server.aof_last_fsync)) {
        /* If an fsync is already in progress, do not start another one.
         *
         * everysec is not that slow because the fsync runs in the background.
         * Redis is not strictly single-threaded: it creates a set of BIO
         * threads dedicated to blocking, slow operations. These include
         * fsync, closing files and freeing memory.
         * Unlike always, everysec does not call fsync right away. It hands
         * the job to a BIO thread to run asynchronously. The BIO threads are
         * created at startup and communicate with the main thread through
         * the global objects bio_jobs and bio_pending: the main thread
         * produces jobs, the BIO threads consume them. */
        if (!sync_in_progress) {
            aof_background_fsync(server.aof_fd);
        }
        server.aof_last_fsync = server.unixtime;
    }
}
```

Redis has two other policies. One never calls `fsync` and lets the operating system decide when to sync to disk, which is not safe. The other calls `fsync` once for every command, which is very slow. Neither policy is commonly used in production, so it is enough to know they exist.
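The difference between the three policies comes down to when `fsync` is called after the buffer has been written. The following Python sketch models that decision under simplifying assumptions: the class `AofWriter`, its `feed` and `flush` methods and the injectable `clock` are all hypothetical, and the sketch calls `fsync` inline, whereas Redis hands the `everysec` sync to a background thread as the excerpt above shows.

```python
import os
import tempfile
import time


class AofWriter:
    """Toy model of an AOF buffer with the always / everysec / no policies."""

    def __init__(self, path: str, policy: str = "everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: " + policy)
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.buf = b""                 # plays the role of server.aof_buf
        self.last_fsync = clock()
        self.fsync_calls = 0

    def feed(self, data: bytes):
        """Called after a command was executed: only buffer, do not write yet."""
        self.buf += data

    def flush(self):
        """Called once per event-loop iteration: write, then maybe fsync."""
        pending, self.buf = self.buf, b""
        view = memoryview(pending)
        while view:
            view = view[os.write(self.fd, view):]
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()
        # policy "no": leave the flushing to the operating system

    def _fsync(self):
        os.fsync(self.fd)
        self.fsync_calls += 1
        self.last_fsync = self.clock()

    def close(self):
        os.close(self.fd)
```

With `policy="always"` every `flush()` pays for an `fsync`; with `everysec` at most one `fsync` happens per second no matter how many flushes occur; with `no` the counter stays at zero.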
### Pros and cons of AOF

Advantages:

- The default AOF policy is to `fsync` once per second. With this configuration Redis still performs well, and even if there is a failure, at most one second of data is lost;
- An AOF file is an append-only log, so writing to it requires no `seek`. Even if the log ends with a command that was not written completely for some reason (for example the disk filled up or the write stopped midway), the `redis-check-aof` tool can easily fix the problem;
- Redis can automatically rewrite the AOF in the background when the file becomes too large. The rewritten AOF file contains the minimum set of commands needed to restore the current dataset. The whole rewrite is completely safe, because while Redis is building the new AOF file it keeps appending commands to the existing one. Even if there is an outage during the rewrite, the existing AOF file is not lost. Once the new AOF file is ready, Redis switches from the old file to the new one and starts appending to the new file;
- The AOF file stores all writes to the database in order, in the format of the Redis protocol, so its contents are easy to read and easy to parse. Exporting an AOF file is also very simple. For example, if you accidentally run the [FLUSHALL](http://redisdoc.com/database/flushall.html#flushall) command, then as long as the AOF file has not been rewritten you can stop the server, remove the [FLUSHALL](http://redisdoc.com/database/flushall.html#flushall) command at the end of the AOF file, and restart Redis to restore the dataset to its state before [FLUSHALL](http://redisdoc.com/database/flushall.html#flushall) was executed.
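The idea that a rewritten AOF holds only "the minimum set of commands required to recover the current dataset" can be sketched in a few lines of Python. Both functions below are hypothetical illustrations: `compact` rebuilds the dataset from an old command list as a stand-in for the in-memory data that the Redis child process traverses, and `should_rewrite` mirrors the growth check implied by `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` as I understand it; treat the exact arithmetic as an assumption.

```python
def compact(commands):
    """Return a minimal command list that rebuilds the same dataset.

    Only SET, INCR and DEL are understood in this toy version.
    """
    state = {}
    for cmd in commands:
        name, key = cmd[0].upper(), cmd[1]
        if name == "SET":
            state[key] = cmd[2]
        elif name == "INCR":
            state[key] = str(int(state.get(key, "0")) + 1)
        elif name == "DEL":
            state.pop(key, None)
        else:
            raise ValueError("unsupported command: " + name)
    # One SET per surviving key replaces the whole history of that key.
    return [["SET", key, value] for key, value in state.items()]


def should_rewrite(current_size: int, base_size: int,
                   percentage: int = 100,
                   min_size: int = 64 * 1024 * 1024) -> bool:
    """Decide whether an automatic rewrite is due.

    current_size: size of the AOF now, in bytes.
    base_size: size of the AOF right after the last rewrite, in bytes.
    """
    if percentage == 0 or current_size <= min_size:
        return False
    base = base_size or 1
    growth = current_size * 100 // base - 100
    return growth >= percentage
```

For a log of one `SET hits 0` followed by a thousand `INCR hits`, `compact` returns the single command `SET hits 1000`, which is why replaying a rewritten file is so much faster than replaying the original.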
### Disadvantages of AOF

- For the same dataset, an AOF file is usually larger than the corresponding RDB file.
- Depending on the `fsync` policy in use, AOF may be slower than RDB. In general, `fsync` once per second is still very efficient, and turning `fsync` off makes AOF as fast as RDB even under high load. When handling a huge write load, however, RDB can give better guarantees about maximum latency.
- AOF has had bugs of this kind in the past: because of certain commands, reloading the AOF file did not restore the dataset exactly as it was when it was saved.

## Mixed persistence

When Redis restarts, restoring the in-memory state from RDB alone can lose a lot of data, while replaying only the AOF log is too slow. Redis 4.0 provides a hybrid persistence scheme that stores the contents of the RDB file together with an incremental AOF log. The AOF log here is no longer the full log. It contains only the incremental commands that arrived between the start and the end of the RDB persistence, and this part of the log is usually very small.

![](https://img2020.cnblogs.com/blog/1614350/202011/1614350-20201107154043136-1912792479.png)

When Redis restarts, it can load the RDB content first and then replay the incremental AOF log. This completely replaces the earlier full AOF replay, so restarts become much faster.

## Reference articles

- [【Redis Persistence】](https://redis.io/topics/persistence);
- [【Redis persistence demystified】](http://antirez.com/post/redis-persistence-demystified.html);
- [【Detailed explanation of Redis appendfsync】](https://blog.csdn.net/aquester/article/details/84869158);
- 《Redis Deep Adventure: Core Principles and Applications
