1. Overview
When we use Redis's Hash type, the underlying implementation is the dictionary.
Before looking at the dictionary's internals, let's first recall Redis's Hash operations. The most common are HSET and HGET:
127.0.0.1:6379> HSET user name sherlock
(integer) 1
127.0.0.1:6379> HSET user age 20
(integer) 1
127.0.0.1:6379> HGET user name
"sherlock"
127.0.0.1:6379> HGET user age
"20"
127.0.0.1:6379>
Besides HSET and HGET, other common commands include HDEL, HEXISTS, HGETALL, HMGET, and so on; they won't all be listed here. Redis's Hash commands generally start with H.
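For example, continuing the session above (HEXISTS tests whether a field exists, HDEL removes a field, HGETALL returns every field and value):
127.0.0.1:6379> HEXISTS user age
(integer) 1
127.0.0.1:6379> HDEL user age
(integer) 1
127.0.0.1:6379> HGETALL user
1) "name"
2) "sherlock"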
As we can see, the Hash type can store many key-value pairs, and the structure underneath it is the dictionary.
2. dict
The dictionary is defined in the source file src/dict.h. For ease of understanding, let's build up from the most basic structure.
2.1 dictEntry
First comes dictEntry, which represents a single key-value pair in the dictionary. It is declared as follows:
typedef struct dictEntry {
void *key; // the key
union {
void *val;
uint64_t u64;
int64_t s64;
double d;
} v; // the value
struct dictEntry *next; // points to the next key-value pair in the same bucket
} dictEntry;
- key is the key of the key-value pair;
- v is the value of the key-value pair. v is a union, so the value can be a pointer, a uint64_t, an int64_t, or a double; using a union saves memory;
- next points to the next key-value pair, forming a linked list. When a newly inserted key hashes to the same array index as an existing entry, the entries are chained together in that slot; in Redis, new entries are placed at the head of the list.
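A minimal standalone sketch (not Redis source) showing how the union v lets one entry hold different value types inline, without a separate allocation:

#include <stdint.h>
#include <stdio.h>

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
} dictEntry;

int main(void) {
    dictEntry e = {0};
    e.key = "age";
    e.v.s64 = 20;  // a small integer is stored inline in the entry, no heap allocation
    printf("%s = %lld\n", (char *)e.key, (long long)e.v.s64);
    printf("the value slot is %zu bytes, shared by all four types\n", sizeof(e.v));
    return 0;
}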
2.2 dictht
dictht is the hash table structure that actually holds the data; it stores multiple key-value pairs and is declared as follows:
typedef struct dictht {
dictEntry **table; // array of buckets; each element is the head of a linked list
unsigned long size; // size of the array
unsigned long sizemask; // mask used to compute indexes
unsigned long used; // number of key-value pairs stored
} dictht;
- table is an array of dictEntry pointers; each element points to the head of a chain of dictEntry nodes;
- size is the size of that array;
- sizemask is a mask used to compute the array index when a key-value pair is inserted; it is always size - 1, as discussed below;
- used is the number of key-value pairs currently stored in the hash table;
As an illustration of dictht storage, suppose keys k0 and k1 hash to the same index: they are then chained together in a single slot of table.
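A hedged sketch of how such a bucket is then searched, assuming plain C-string keys (structures condensed from the declarations above; the real lookup is dictFind in src/dict.c):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    unsigned long size, sizemask, used;
} dictht;

// walk one bucket: pick the slot with the mask, then follow the chain
static dictEntry *ht_find(dictht *ht, const char *key, uint64_t hash) {
    dictEntry *he = ht->table[hash & ht->sizemask];
    while (he != NULL) {
        if (strcmp((const char *)he->key, key) == 0) return he; // k0, k1, ... compared in turn
        he = he->next;
    }
    return NULL; // not present in this table
}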
2.3 dict
dict is the final dictionary data structure. It is declared as follows, together with the dictType it references:
typedef struct dictType {
unsigned int (*hashFunction)(const void *key);
void *(*keyDup)(void *privdata, const void *key);
void *(*valDup)(void *privdata, const void *obj);
int (*keyCompare)(void *privdata, const void *key1, const void *key2);
void (*keyDestructor)(void *privdata, void *key);
void (*valDestructor)(void *privdata, void *obj);
} dictType;
typedef struct dict {
dictType *type;
void *privdata;
dictht ht[2];
long rehashidx; /* rehashing not in progress if rehashidx == -1 */
int iterators; /* number of iterators currently running */
} dict;
- dictType is a set of function pointers used to operate on a specific type of key-value pair; Redis installs different functions for different key types;
- privdata holds private data that needs to be passed through to those functions;
- ht is an array of two dictht hash tables; two are needed because of rehash, as discussed below;
- rehashidx tracks the progress of rehashing; -1 means no rehash is currently in progress;
- iterators is the number of iterators currently running over the dictionary; it is not covered further here;
When a dictionary is not being rehashed, only ht[0] is in use and ht[1] stays empty.
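As a hedged illustration of dictType, here is what an instance for plain C-string keys might look like, matching the function-pointer signatures declared above; strHash is a stand-in djb2-style hash for this sketch, not the MurmurHash that Redis actually uses:

#include <string.h>

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

// stand-in hash for illustration only
static unsigned int strHash(const void *key) {
    unsigned int h = 5381;
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    return h;
}

// returns nonzero when the keys are equal, matching how dict.c uses keyCompare
static int strKeyCompare(void *privdata, const void *key1, const void *key2) {
    (void)privdata;
    return strcmp(key1, key2) == 0;
}

dictType stringDictType = {
    strHash,        // hashFunction
    NULL,           // keyDup: NULL means store the caller's pointer as-is
    NULL,           // valDup
    strKeyCompare,  // keyCompare
    NULL,           // keyDestructor: nothing to free
    NULL            // valDestructor
};

Redis defines a number of such dictType instances for its different uses of the dictionary and installs the appropriate one when the dictionary is created.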
3. How the dictionary stores data
As the name suggests, the way Redis's Hash operations store data has everything to do with hashing, so here is a brief introduction to hash algorithms.
3.1 The hash algorithm
A hash algorithm (or hash function) takes input of arbitrary length and, through a series of computations, maps it to output of a fixed length.
The hash algorithm underlying Redis is MurmurHash, invented by Austin Appleby in 2008. Its strengths are that, whether or not the input is regular, the output is evenly distributed, and it is very fast.
Anyone familiar with hashing will understand that using a finite set of hash values to represent unbounded data inevitably means different inputs can produce the same hash value. Strictly speaking this is not called duplication; the technical term is a collision.
3.2 Storage process and key collisions
Back to the point: when a key-value pair is added to the dictionary, the index in table at which to store it is computed in two steps. First, the hash value is computed from the key; then the index is computed from the hash value and the mask:
hash = dict->type->hashFunction(key); // compute the hash value from the key
// compute the index from the hash value and the mask; ht[x] may be ht[0] or ht[1],
// depending on whether a rehash is in progress
index = hash & dict->ht[x].sizemask;
First, different keys may produce the same hash value; second, different hash values, after being ANDed with the mask, may produce the same index. This is why linked lists are used, and in Redis this situation is called a key collision.
As noted earlier, sizemask is always size - 1, the maximum index of the array. hash & sizemask guarantees that index is less than or equal to sizemask, i.e. always within the array bounds.
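A small standalone sketch of the masking step, with values chosen purely for illustration: because size is a power of two, hash & (size - 1) behaves like hash % size, and distinct hash values can still land on the same index:

#include <stdio.h>

int main(void) {
    unsigned long size = 8;            // Redis table sizes are powers of two
    unsigned long sizemask = size - 1; // 8 - 1 = 7 = 0b0111
    unsigned long hashes[] = {5, 13, 21, 99};
    for (int i = 0; i < 4; i++)
        printf("hash %lu -> index %lu\n", hashes[i], hashes[i] & sizemask);
    // 5, 13 and 21 all map to index 5: a collision, resolved by chaining
    return 0;
}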
Once the index is computed, the key-value pair is stored at the corresponding position in table. Each element of table is a linked list, and a newly inserted key-value pair is placed at the head of the list. This is the most efficient choice: head insertion is O(1), whereas tail insertion would be O(N), and since front and back are equivalent for correctness, the fastest insertion point wins.
After storage completes, retrieving data means indexing into the array and then walking the linked list, so the time complexity is O(1) + O(N). The smaller N is, the better; ideally there are no key collisions at all, and lookup is O(1).
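A hedged sketch of head insertion, again with the structures condensed from section 2 (the duplicate-key check of the real dictAdd in src/dict.c is omitted):

#include <stdint.h>

typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictht {
    dictEntry **table;
    unsigned long size, sizemask, used;
} dictht;

// O(1) insertion regardless of how long the bucket's chain already is
static void ht_add_head(dictht *ht, dictEntry *entry, uint64_t hash) {
    unsigned long idx = hash & ht->sizemask;
    entry->next = ht->table[idx]; // old head (possibly NULL) becomes the second node
    ht->table[idx] = entry;       // the new entry becomes the head
    ht->used++;
}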
3.3 rehash
A rehash expands or shrinks the hash table.
This operation is essential. Imagine the dictEntry array we created holds only 100 slots. As time passes, the number of stored key-value pairs grows to five or six hundred; the probability of key collisions rises sharply, and eventually every array slot carries a chain of several entries, which badly hurts read efficiency. At that point the 100-element array needs to be expanded, say to five or six hundred elements, to keep the chains as short as possible. Likewise, when stored key-value pairs are deleted, shrinking the array saves memory; the space is sitting empty anyway, so it is better released.
Speaking of hashing, we must mention the load factor: the ratio of stored key-value pairs to the array size. By expanding and shrinking the table, Redis keeps the load factor within a reasonable range, avoiding both wasted memory and inefficient reads.
The load factor is computed as follows:
load_factor = ht[0].used / ht[0].size
where used is the number of key-value pairs stored and size is the size of the array.
If the load factor is too large, many array slots carry long chains, i.e. the probability of key collisions has grown;
if it is too small, many array slots may be unused, i.e. memory is being wasted.
Expansion and contraction of the hash table:
In Redis, an expansion is triggered automatically when either of the following two conditions holds:
- the server is not executing the BGSAVE or BGREWRITEAOF command and the load factor is greater than or equal to 1;
- the server is executing the BGSAVE or BGREWRITEAOF command and the load factor is greater than or equal to 5;
The reason is that while these two commands run, Redis needs to fork a child process, and most operating systems optimize child processes with copy-on-write. So while a child process exists, the server raises the load factor required to trigger an expansion, avoiding expansion work during that period as far as possible and maximizing memory savings. A sketch of this decision rule follows.
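A minimal sketch of the rule above, for illustration only (in src/dict.c the threshold of 5 appears as the constant dict_force_resize_ratio):

#include <stdio.h>

// decide whether ht[0] should be expanded, per the two conditions above
static int needs_expand(unsigned long used, unsigned long size, int child_running) {
    double load_factor = (double)used / (double)size; // ht[0].used / ht[0].size
    if (!child_running)
        return load_factor >= 1.0; // no BGSAVE/BGREWRITEAOF in progress
    return load_factor >= 5.0;     // stricter threshold while a child process exists
}

int main(void) {
    printf("%d\n", needs_expand(120, 100, 0)); // 1: load factor 1.2 >= 1
    printf("%d\n", needs_expand(120, 100, 1)); // 0: during BGSAVE the threshold is 5
    return 0;
}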
While Redis performs a rehash, clients are still using the dictionary, so both the rehash and client traffic must be accommodated: the original data and newly written data both need to be stored and queried throughout. This is why ht in the dict structure has a size of 2.
When no rehash is in progress, only ht[0] is used. During a rehash, the old data is hashed again along with the new data and stored into ht[1]. When the rehash finishes, ht[1] becomes ht[0], the original ht[0] is released, and a fresh empty ht[1] is created.
3.4 Progressive rehash
When there is a lot of data, moving everything from ht[0] to ht[1] in a single one-shot rehash could stall the server, which is unacceptable. Therefore Redis's rehash is not done all at once in a centralized pass; it is spread over many steps and completed gradually.
The general steps are as follows:
- allocate space for ht[1], so that the dictionary holds the two hash tables ht[0] and ht[1];
- set rehashidx to 0, marking the start of the rehash;
- during the rehash, every add, delete, lookup, or update on the dictionary also, in passing, rehashes all the key-value pairs at index rehashidx of ht[0] over to ht[1], then increments rehashidx;
- as dictionary operations keep executing, at some point every key-value pair of ht[0] will have been rehashed to ht[1]; the rehash is then complete, and rehashidx is set back to -1;
Progressive rehashing is thus divide-and-conquer: the work of rehashing the key-value pairs is spread across every add, delete, lookup, and update performed on the dictionary, avoiding the huge amount of computation a centralized rehash would require. A hedged sketch of one such step follows.
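This is a simplified single step modeled loosely on dictRehash in src/dict.c, with the real code's assertions and empty-bucket visit limit reduced:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// condensed copies of the section 2 declarations, enough for the sketch
typedef struct dictEntry {
    void *key;
    union { void *val; uint64_t u64; int64_t s64; double d; } v;
    struct dictEntry *next;
} dictEntry;

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    // dup/destructor callbacks elided for brevity
} dictType;

typedef struct dictht {
    dictEntry **table;
    unsigned long size, sizemask, used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx;
    int iterators;
} dict;

// move one bucket of ht[0] to ht[1]; called "in passing" from add/delete/find/update
static void rehash_step(dict *d) {
    if (d->rehashidx == -1) return;   // no rehash in progress
    while ((unsigned long)d->rehashidx < d->ht[0].size &&
           d->ht[0].table[d->rehashidx] == NULL)
        d->rehashidx++;               // skip empty buckets
    if ((unsigned long)d->rehashidx >= d->ht[0].size) return;
    dictEntry *he = d->ht[0].table[d->rehashidx];
    while (he) {                      // re-home the whole chain
        dictEntry *next = he->next;
        unsigned long idx = d->type->hashFunction(he->key) & d->ht[1].sizemask;
        he->next = d->ht[1].table[idx]; // head insertion into ht[1]
        d->ht[1].table[idx] = he;
        d->ht[0].used--;
        d->ht[1].used++;
        he = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
    if (d->ht[0].used == 0) {         // done: ht[1] becomes ht[0]
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(d->ht[1])); // fresh empty ht[1]
        d->rehashidx = -1;
    }
}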
Hash table operations while a progressive rehash is executing:
While a rehash is in progress, ht[0] and ht[1] are both in use, and the dictionary's add, delete, lookup, and update operations act on both tables. For a lookup, for example, ht[0] is searched first; if the key is not found there, ht[1] is searched next.
Also, during a rehash ht[0] only serves deletes, updates, and lookups; nothing new is added to it. New key-value pairs go straight into ht[1], which guarantees that the number of entries in ht[0] only ever decreases, until the rehash completes and ht[0] becomes an empty table.
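A hedged sketch of that lookup order, simplified from dictFind in src/dict.c and reusing the condensed declarations from the previous sketch:

// search ht[0] first, then ht[1], but only while a rehash is in progress
static dictEntry *dict_find(dict *d, const void *key) {
    unsigned int h = d->type->hashFunction(key);
    for (int t = 0; t <= 1; t++) {
        dictEntry *he = d->ht[t].table[h & d->ht[t].sizemask];
        while (he) {
            if (d->type->keyCompare(d->privdata, key, he->key)) return he;
            he = he->next;
        }
        if (d->rehashidx == -1) break; // not rehashing: ht[1] is empty, skip it
    }
    return NULL;
}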
4. Summary
- Dictionaries are widely used to implement Redis features, including the database itself and the Hash key type;
- The dictionary is implemented on top of hash tables; each dictionary has two of them, one for normal use and one used during a rehash;
- The hash tables resolve key collisions with singly linked lists;
- To expand or contract a hash table, Redis rehashes entries from one table into the other; this is done progressively, not in a single pass;