How to deal with uneven hash distribution in redis

Gemini lone wolf 2021-01-20 21:46:36
deal uneven hash distribution redis


Preface

Redis Is a key value pair database , The key is stored by hash . Whole Redis It can be thought of as an outer hash , It is called outer hash , Because Redis A hash type is also provided internally , This can be called an internal hash . When we use hash objects for data storage , To the whole Redis for , It goes through two layers of hash storage .

Hash object

The hash object itself is also a key-value Storage structure , The underlying storage structure can also be divided into two types :ziplist( Compressed list ) and hashtable( Hashtable ). These two storage structures are also distinguished by coding :

Encoding properties describe object encoding Command return value
OBJ_ENCODING_ZIPLIST Hash objects using compressed lists ziplist
OBJ_ENCODING_HT Hash objects using dictionaries hashtable

hashtable

Redis Medium key-value It's through dictEntry Object wrapped , And the hash table is to put dictEntry The object is packaged again to get , This is the hash table object dictht

typedef struct dictht {
dictEntry **table;// Hash table array
unsigned long size;// Hash table size
unsigned long sizemask;// Mask size , Used to calculate index values , Always equal to size-1
unsigned long used;// The number of existing nodes in the hash table
} dictht;

Be careful : In the above structure definition table Is an array , Each of its elements is a dictEntry object .

Dictionaries

Dictionaries , Also known as the symbol table (symbol table), Associative array (associative array) Or map (map), A hash table is nested inside the dictionary dictht object , Here's a dictionary ht The definition of :

typedef struct dict {
dictType *type;// Some specific functions of dictionary type
void *privdata;// Private data ,type You may need to use
dictht ht[2];// Hashtable ( Notice that there is 2 Hash table )
long rehashidx; //rehash Indexes , be not in rehash when , The value is -1
unsigned long iterators; // Number of iterators in use
} dict;

among dictType Some common functions are defined internally , Its data structure is defined as follows :

typedef struct dictType {
uint64_t (*hashFunction)(const void *key);// Compute hash function
void *(*keyDup)(void *privdata, const void *key);// Copy key functions
void *(*valDup)(void *privdata, const void *obj);// Copy value function
int (*keyCompare)(void *privdata, const void *key1, const void *key2);// Contrast the bond function
void (*keyDestructor)(void *privdata, void *key);// Destroy key functions
void (*valDestructor)(void *privdata, void *obj);// Destroy the value function
} dictType;

When we create a hash object , You can get the following diagram ( Some attributes are omitted ):

rehash operation

dict An array is defined in ht[2],ht[2] Two hash tables are defined in :ht[0] and ht[1]. and Redis By default, only ht[0], It doesn't use ht[1], Not for ht[1] Initialize allocation space .

When setting a hash object , It's going to be a hash array ( In the picture above dictEntry[3]) Which subscript in , It's determined by calculating the hash value . If there is a hash collision ( The calculated hash values are consistent ), Then there will be more than one subscript dictEntry, So as to form a linked list ( The far right in the picture above points to NULL The location of ), However, it should be noted that the last element inserted is always at the top of the list ( When a hash conflict occurs , Always put the node at the head of the list ).

When reading data, we encounter a node with multiple elements , You need to traverse the linked list , So the longer the list , The worse performance . To ensure the performance of the hash table , When one of the following two conditions is satisfied , Do... On the hash table rehash( Re hash ) operation :

  • The load factor is greater than or equal to 1 And dict_can_resize by 1 when .
  • The load factor is greater than or equal to the security threshold (dict_force_resize_ratio=5) when .

PS: Load factor = Number of nodes used by hash table / Hash table size ( namely :h[0].used/h[0].size).

rehash step

Both extended hashes and contracted hashes are executed by rehash To complete , This involves the allocation and release of space , There are five steps :

  1. For the dictionary dict Of ht[1] Hash table allocation space , Its size depends on the number of saved nodes in the current hash table ( namely :ht[0].used):

    • If it is an extension operation, then ht[1] The size is 2 Of n The first of the powers is greater than or equal to ht[0].used * 2 The value of the property ( such as used=3, here ht[0].used * 2=6, so 2 Of 3 The second party is 8 The first one is greater than used * 2 Value (2 Of 2 Power < 6 And 2 Of 3 Power > 6)).
    • If it is a contraction operation, then ht[1] The size is 2 Of n The first of the powers is greater than or equal to ht[0].used Value .
  2. Put the attributes in the dictionary rehashix Is set to 0, It means that it is executing rehash operation .

  3. take ht[0] All key value pairs in the hash are recalculated in turn , And on the ht[1] Array corresponding position , Every time you complete a key value pair rehash after rehashix The value of needs to increase 1.

  4. When ht[0] All key value pairs in are migrated to ht[1] after , Release ht[0] , And will ht[1] It is amended as follows ht[0], Then create a new one ht[1] Array , For the next time rehash To prepare for .

  5. Put the attributes in the dictionary rehashix Set to -1, This time rehash End of operation , Wait for the next time rehash.

Progressive type rehash

Redis This rehashing operation in Because it's not all at once rehash, It's a lot of times, slowly ht[0] Key value pairs in rehash To ht[1], So this kind of operation is also called progressive rehash. Progressive type rehash You can avoid centralization rehash The huge amount of computation , It's the idea of divide and rule .

In a progressive way rehash In the process , Because there may be new key value pairs saved in , here ** Redis The new method is to put the newly added key value pairs into ht[1] in , This ensures that ht[0] The number of key value pairs will only decrease **.

When executing rehash In operation , If the server receives a command from the client requesting operation , be We'll check ht[0], I can't find the result, and then I go to ht[1] Query in .

ziplist

About ziplist Some characteristics of , Previous articles have analyzed it separately , Want to know more about , Sure Click here . But note that in the hash object ziplist And list objects ziplist The difference is that the hash object is a key-value form , So its ziplist It is also shown in key-value,key and value Close together :

ziplist and hashtable Code conversion of

When a hash object can satisfy either of the following two conditions , The hash object will choose to use ziplist Code to store :

  • The total length of all key value pairs in the hash object ( Including keys and values ) Less than or equal to 64 byte ( This threshold can be determined by the parameter hash-max-ziplist-value To control ).
  • The number of key value pairs in the hash object is less than or equal to 512 individual ( This threshold can be determined by the parameter hash-max-ziplist-entries To control ).

If either of these two conditions is not satisfied , The hash object will choose to use hashtable Code for storage .

Common commands for hash objects

  • hset key field value: Set individual field( Hash object key value ).
  • hmset key field1 value1 field2 value2 : Set up multiple field( Hash object key value ).
  • hsetnx key field value: Hash table key Mesosphere field Is set to value, If field Already exists , Do nothing .
  • hget key field: Get hash table key In the domain field Corresponding value.
  • hmget key field1 field2: Get hash table key Multiple domains in field Corresponding value.
  • hdel key field1 field2: Delete hash table key One or more of field.
  • hlen key: Return hash table key Number of domains .
  • hincrby key field increment: Hash table key In the domain field Plus the increment increment ,increment Can be negative , If field If it's not a number, it will report an error .
  • hincrbyfloat key field increment: Hash table key In the domain field Plus the increment increment,increment Can be negative , If field No float Type will report an error .
  • hkeys key: Get hash table key All domains in .
  • hvals key: Get the values of all fields in the hash table .

Learn about common commands for manipulating hash objects , We can verify the type and encoding of the hash object mentioned above , Before testing, in order to prevent other key The interference of values , Let's do it first flushall Command clear Redis database .

Then execute the following commands in turn :

hset address country china
type address
object encoding address

The results are as follows :

You can see that when there is only one key value pair in our hash object , The underlying code is ziplist.

Now we will hash-max-ziplist-entries Change the parameter to 2, And then restart Redis, Finally, enter the following command to test :

hmset key field1 value1 field2 value2 field3 value3
object encoding key

After output, we get the following result :

You can see , Coding has become hashtable.

summary

This paper mainly introduces Redis in 5 Hash type is one of the most commonly used data types, and the underlying storage structure hashtable Use , And when hash When the distribution is uneven Redis It's about how to rehash , Finally, we learned about some common commands of hash objects , Some examples are given to verify the conclusion .

版权声明
本文为[Gemini lone wolf]所创,转载请带上原文链接,感谢
https://javamana.com/2021/01/20210120214608665z.html

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云