Redis data structure dictionary

sherlock_lin 2020-11-07 23:59:07

1、 Introduction

When we use Redis's Hash operations, the underlying implementation is the dictionary.

Before introducing the dictionary itself, let's first recall Hash operations in Redis. The most common are HSET and HGET:

> HSET user name sherlock
(integer) 1
> HSET user age 20
(integer) 1
> HGET user name
"sherlock"
> HGET user age
"20"

Besides HSET and HGET, other common commands include HDEL, HEXISTS, HGETALL, HMGET, and so on; they are not all listed here. Redis Hash commands generally start with H.

As we can see, Hash operations can store many key-value pairs, and the underlying implementation is the dictionary.


2、 Definition of the dictionary

The dictionary is defined in the source file src/dict.h. For ease of understanding, let's work up from the most basic structure.


First is dictEntry, which represents one key-value pair in the dictionary. It is declared as follows:

typedef struct dictEntry {
    void *key;                 // key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                       // value
    struct dictEntry *next;    // points to the next key-value pair
} dictEntry;
  • key is the key of the key-value pair;

  • v is the value of the key-value pair; it is a union, meaning the value can be a pointer, a uint64_t, an int64_t, or a double, and using a union saves memory;

  • next points to the next key-value pair. This forms a linked list: when a newly stored key hashes to the same bucket index as an existing one, the pairs are chained together in that one array slot. In Redis, the new entry is inserted at the head of the list.


dictht is the hash table structure used to hold multiple sets of key-value pairs. It is declared as follows:

typedef struct dictht {
    dictEntry **table;      // array of key-value pairs; each element is a linked list
    unsigned long size;     // size of the array
    unsigned long sizemask; // mask used to compute bucket indexes
    unsigned long used;     // number of key-value pairs stored
} dictht;
  • table is a pointer to an array of dictEntry pointers; each element points to the head of a chain of dictEntry nodes;
  • size is the size of that array;
  • sizemask is the mask used to compute the array index when a key-value pair is inserted; it is always size - 1, which we will come back to later;
  • used is the number of key-value pairs currently stored in the hash table;

Below is the storage layout of a dictht in which the keys k0 and k1 hash to the same index, so they are placed in the same array slot:



dict is the top-level dictionary data structure. It is declared as follows:

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators;  /* number of iterators currently running */
} dict;
  • type is a set of function pointers used to operate on a specific kind of key-value pair; Redis installs different functions for different key types;

  • privdata is private data that is passed to those type-specific functions;

  • ht is an array of two dictht hash tables; two are needed because of rehashing, which we will cover later;

  • rehashidx records the progress of rehashing; -1 means no rehash is currently in progress;

  • iterators is the number of iterators currently running on the dictionary; we will not cover it further here;

The figure below shows a dictionary that is not being rehashed:


3、 How the dictionary stores data

As the name suggests, Redis Hash storage is based on hashing, so here is a brief introduction to hash algorithms.

3.1、 Hash algorithms

A hash algorithm, also called a hash function, takes input of arbitrary length and, through a series of computations, produces output of fixed length.

The hash algorithm Redis uses under the hood is MurmurHash, invented by Austin Appleby in 2008. Its advantage is that the output is well distributed regardless of how regular the input is, and it is very fast.

Anyone familiar with hashing will realize that using a finite set of hash values to represent unbounded input data inevitably means different inputs can produce the same hash value. The proper term for this is not "duplication" but a collision.

3.2、 The storage process and key conflicts

Back to the main topic. When a key-value pair is added to the dictionary, the index into table where it will be stored is computed in two steps: first compute the hash value from the key, then compute the index from the hash value and the mask:

hash = dict->type->hashFunction(key);  // compute the hash value from the key
// compute the index from the hash value and the mask; ht[x] may be ht[0]
// or ht[1], depending on whether a rehash is in progress
index = hash & dict->ht[x].sizemask;

First, different keys may produce the same hash value; second, different hash values ANDed (&) with the mask may produce the same index. This is why the linked list is needed, and in Redis this situation is called a key collision.

As mentioned earlier, sizemask is always size - 1, the maximum valid index of the array. hash & sizemask guarantees that index is less than or equal to sizemask, i.e. always within the array.

After the index is computed, the key-value pair is stored at the corresponding position in table. Each element of table is a linked list, and a newly inserted key-value pair is placed at the head of the list. Head insertion is the most efficient, with O(1) time complexity; inserting at the tail would be O(N). Since either end works equally well logically, Redis takes the fastest option.

After storage, retrieving data means indexing into the array and then walking the linked list, so the time complexity is O(1) + O(N). The smaller N is, the better: the shorter the list, the faster the lookup, and ideally there are no key collisions at all, giving O(1).


3.3、 rehash

rehash expands or shrinks the hash table.

This operation is quite necessary. Imagine we create a dictEntry array of only 100 slots. As time goes on, the number of stored key-value pairs grows to five or six hundred; the probability of key collisions rises sharply, and eventually the chains in individual array slots grow long, which greatly reduces read efficiency. At that point the 100-element array needs to be expanded, say to five or six hundred slots, to keep the chains as short as possible. Similarly, when stored key-value pairs are deleted, shrinking the array saves memory: slots that just sit empty are better released.

mention hash, We have to mention the load factor (load factor) The concept of , It is a ratio of the saved key value pairs to the size of the array , Use expansion and contraction , Keep it within a reasonable range , It can avoid memory waste and read inefficiency

The calculation formula of load factor is as follows :

load_factor = ht[0].used / ht[0].size

where used is the number of stored key-value pairs and size is the size of the array.

Thus, if the load factor is too large, many array slots hold long chains, meaning the probability of key collisions rises;

and if the load factor is too small, some array slots may go unused, meaning memory is wasted.

Expansion and shrinking of the hash table:

In Redis, the table is automatically expanded when either of the following two conditions is met:

  1. the server is not executing the BGSAVE or BGREWRITEAOF command, and the load factor is greater than or equal to 1;
  2. the server is executing the BGSAVE or BGREWRITEAOF command, and the load factor is greater than or equal to 5;

Conversely, Redis shrinks the hash table when the load factor falls below 0.1.

This is because while those two commands execute, Redis needs to fork a child process, and most operating systems use copy-on-write to optimize child-process efficiency. While the child process exists, the server therefore raises the load factor required to trigger an expansion, avoiding expansion during that period as much as possible and maximizing memory savings.

While Redis performs a rehash, clients are still using the dictionary, so both the rehash and client operations must be accommodated: old data and new data must remain storable and queryable at the same time. This is why the ht array in the dict structure has two elements.

When no rehash is in progress, only ht[0] is used. During a rehash, the existing entries are hashed again and moved into ht[1]. When the rehash finishes, ht[1] becomes ht[0], the original ht[0] is released, and a new empty ht[1] is created.

3.4、 Progressive rehash

When there is a lot of data, moving everything from ht[0] to ht[1] in one shot could stall the server, which is unacceptable. Therefore, Redis does not perform the rehash in one centralized pass; it is completed gradually over many small steps.

The general steps are as follows:

  1. allocate space for ht[1], so the dictionary holds the two hash tables ht[0] and ht[1];
  2. set rehashidx to 0, marking the start of the rehash;
  3. during the rehash, every add, delete, lookup, or update on the dictionary also rehashes all key-value pairs in bucket rehashidx of ht[0] over to ht[1], then increments rehashidx;
  4. as dictionary operations continue, at some point every key-value pair of ht[0] will have been rehashed to ht[1]; the rehash is then complete and rehashidx is set back to -1;

Progressive rehashing is thus a divide-and-conquer approach: the work of rehashing the key-value pairs is spread across every add, delete, lookup, and update on the dictionary, avoiding the huge burst of computation a centralized rehash would require.

Hash table operations during a progressive rehash:

During a rehash, ht[0] and ht[1] are used at the same time, and adds, deletes, lookups, and updates are performed on both tables in turn. For example, a lookup first searches ht[0]; if the key is not found there, it then searches ht[1].

During a rehash, ht[0] only serves deletes, updates, and lookups; nothing new is added to it. New entries go directly into ht[1], guaranteeing that the number of key-value pairs in ht[0] only ever shrinks, until ht[0] becomes an empty table when the rehash completes.

4、 Summary

  1. Dictionaries are widely used to implement various Redis features, including the database keyspace and hash keys;
  2. the dictionary is implemented on top of hash tables; each dictionary has two, one for normal use and one used during rehashing;
  3. the hash tables resolve key conflicts with singly linked lists (chaining);
  4. when a hash table is expanded or shrunk, Redis rehashes it into the other hash table, and this is done progressively, not in one pass;
