Redis: choose the right data structure and reduce the memory consumption by 80%. Have you got these points?

Ma Nong blossoms 2021-01-14 19:43:08
redis choose right data structure

redis As the most popular nosql Cache database , With its excellent performance 、 Rich data structure has become the preferred caching tool in most scenarios .

because redis It's a pure memory database , When storing large amounts of data , The memory footprint will be huge . So in some scenarios , By choosing the right data structure to store , Can greatly reduce the memory consumption , It can even reduce 80%-99% Memory footprint .

utilize zipList To replace a lot of Key-Value

Let's take a look at the scene , stay Dsp Advertising system 、 Mass user systems often encounter such needs , It is required to quickly find the user's id. For example, according to mac Address or uuid Or cell phone number md5, Go to find the user's id.

It's characterized by a large amount of data 、 Ten million or one hundred million ,key It's a long string , Such as 32 Bit md5 perhaps uuid such .

If not dealt with , Directly to key-value Form for storage , We can test it out very briefly , Go to redis Insert inside 1 Ten million pieces of data ,1550000000 - 1559999999, The form is key(md5(1550000000))→ value(1550000000) such .

And then use info memory Look at the memory usage .


You can see , this 1 Ten million pieces of data , Occupied redis total 1.17G Of memory . When the amount of data becomes 1 One hundred million hours , The measured occupancy is about 8 individual G.

The same data , Let's change the storage mode , Let's see the results first :


Before we use zipList after , Memory usage is 123M, About reduced 85% The space occupied by , How does this work ? Let's come from redis To analyze the underlying storage .

1 redis Data structure and coding


2 redis How to store strings

string yes redis The most commonly used data structure in ,redis The default string and C Language strings are different , It's a self built one called “ Simple dynamic string SDS” Abstract type of .


Specific to the string Underlying storage ,redis There are three common ways , Namely int、embstr and raw.

for example set k1 abc and set k2 123 It will be used separately embstr、int. When value Is longer than 44( or 39, Different versions are different ) Bytes , Will be used raw.


int It's a fixed length structure , Occupy 8 Bytes ( Be careful , amount to java Inside long), It can only be used to store long plastic .

embstr It's dynamic expansion , Every expansion 1 times , exceed 1M when , Just expand the capacity every time 1M.

raw Used to store larger than 44 A byte string .

In our case ,key yes 32 A byte string (embstr),value It's a long plastic (int), So if you can 32 Bit md5 become int, So in key It can be directly reduced on the storage 3/4 Memory footprint .

This is the first optimization point .

3 redis How to store Hash

from 1.1 We can see Hash data structure , There are two ways of coding ,1 yes hashTable,2 yes zipList.

hashTable You are familiar with , and java Inside hashMap It's like , It's all arrays + The way of linked list .java in hashmap In order to reduce the hash Conflict , Set the load factor to 0.75. Again ,redis Of hash There are similar expansion load factors . Don't mention the details , Just make an impression , use hashTable Code words , It will cost at least more than the stored data 25% Only in this space can we save these data . It looks like this :


zipList, Compress the list , It looks like this :


You can see ,zipList The biggest characteristic is , It's not at all hash structure , It's a long string , take key-value They are placed in a long string in order to store . If you want to find someone key Words , Just go through the whole long string .

So obviously ,zipList than hashTable Take up much less space . But it will cost more cpu To query .

So when to use hashTable、zipList Well ? stay redis.conf You can find :


It's when hash The inner layer of the structure field-value No more than 512, also value The number of bytes does not exceed 64 when , Just use zipList.

By measuring ,value Quantity in 512 when , Performance and simplicity hashTable Almost no difference , stay value No more than 1024 when , There's only a tiny drop in performance , A lot of times you can ignore .

And memory usage ,zipList Comparable hashTable It's down a lot .

This is the second optimization point .

4 use zipList Instead of key-value

Through the knowledge above , We came to two conclusions . use int As key, than string Save a lot of space . use hash Medium zipList, than key-value Save a lot of space .

So let's transform the original 1 Ten million key-value.

First step :

We're going to 1 Ten million key value pairs , Put it in N individual bucket in , Every bucket It's a redis Of hash data structure , And let each bucket No more than the default 512 Elements ( If you change the configuration file , Such as 1024, The modified value cannot be exceeded ), To avoid hash Code from zipList become hashTable.

1 Ten million / 512 = 19531. Because in the future, all of key Hash algorithm , To try to spread it all out bucket in , But because of the uncertainty of hash function , It may not be completely evenly distributed . So we need to reserve some space , For example, I assign 25000 individual bucket, or 30000 individual bucket.

The second step :

Choose hash algorithm , It was decided that key Where to put it bucket. Here we use efficient and balanced well-known algorithm crc32, The hash algorithm can change a string into a long The number of type , By getting this md5 Type key Of crc32 after , Right again bucket Take the remaining quantity , You can be sure of key Which to put bucket in .


The third step :

Through the second step , We're sure key To be stored in redis in hash The outer layer of the structure key, For the inner layer field, Let's choose another one hash Algorithm , To avoid two completely different values , adopt crc32(key) % COUNT after , happen field Again the same , produce hash The situation in which the conflict results in the value being overridden . Inner layer field We chose to use bkdr The hash algorithm ( Or directly choose Java Of hashCode), The algorithm will also get a long Plastic numbers .value The storage of remains the same .


Step four :

Load data . The original data structure is key-value,0eac261f1c2d21e0bfdbd567bb270a68 → 1550000000.

Now the data structure is hash,key by 14523,field yes 1927144074,value yes 1550000000.

By measuring , take 1 Tens of millions of data are stored in 25000 individual bucket after , whole hash More balanced , Every bucket There's probably 300 Multiple field-value Key value pair . In theory, as long as it doesn't happen twice hash After algorithm , All produce the same value , Then we can rely on key-field To find the original value. This can be confirmed by calculating the total amount . actually , stay bucket When there are more , And each bucket Next ,value It's not a lot , The probability of a continuous collision is extremely low , Measured in storage 50 In the case of 100 million cell phone numbers , No obvious collision .

Test query speed :

After storing this 1 After tens of millions of data , We did a query test , use key-value The type and hash type , Check separately 100 Ten thousand data , Look at the impact on query speed .

key-value Time consuming :10653、10790、11318、9900、11270、11029 millisecond

hash-field Time consuming :12042、11349、11126、11355、11168 millisecond .

You can see , Take... As a whole hash After storage , Inquire about 100 Ten thousand of them take time , It just added 500 Less than a millisecond . The impact on performance is minimal . But memory usage from 1.1G Turned into 120M, Brought the approach 90% Memory saving .


a large number of key-value, Take up too much key,redis In order to deal with hash Collision , Need to take up more space to store these key-value data .

If key The long and short of , For example, some 40 position , There are some 10 position , Because of alignment problems , So there's a huge amount of memory fragmentation , The occupation of space is more serious . therefore , keep key The length is uniform ( For example, unified adoption int type , Fixed length 8 Bytes ), It also helps with memory usage .

string Type md5, Occupied 32 Bytes . And by hash After algorithm , take 32 Down to 8 Byte long shaping , This significantly reduces key The space occupied by .

zipList Than hashTable Significantly reduced memory usage , Its storage is very compact , It also has little impact on query efficiency . So we should be good at using zipList, To avoid the hash In structure , Store more than 512 individual field-value Elements .

If value Is string 、 Object etc. , Try to use byte[] To store , It can also greatly reduce the memory consumption . For example, you can choose google Of Snappy Compression algorithm , Convert the string to byte[], Very efficient , Compression is also very high .

In order to reduce redis Preallocation and expansion of strings ( Double every time ), Cause memory fragmentation , Should not be used append,setrange etc. . It's direct use set, Replace the original .

Program drawback :

hash Structure does not support for single field Timeout settings for . But the deletion can be controlled by code , For long-term data that doesn't need to time out , There is no such concern .

There is a smaller hash Probability of conflict , For extremely accurate data requirements , Not suitable for this kind of compression .

Based on the above scheme , I rewrote springboot Source code redisTemplate, Provides a CompressRedisTemplate class , It can be taken as redisTemplate Use , It will automatically key-value To hash For storage , To achieve the above .

follow-up , We'll be based on more extreme scenarios , Such as statistics of independent visitors , Take a look at redis The unusual data structure , How to use memory by 20G Down to 5M.

Welcome to the official account 【 Magnon blossoms 】 Learn and grow together
I'll share it all the time Java dried food , We will also share free learning materials, courses and interviews
reply :【 Computer 】【 Design patterns 】【 interview 】 There's a surprise

本文为[Ma Nong blossoms]所创,转载请带上原文链接,感谢

  1. springboot异常处理之404
  2. Spring boot security international multilingual I18N
  3. Spring boot exception handling 404
  4. Netty系列化之Google Protobuf编解码
  5. Netty之编解码
  6. Java编解码
  7. Netty解码器
  8. Netty与TCP粘包拆包
  9. Netty开发入门
  10. Java集合遍历时遇到的坑
  11. Spring IOC 源码解析(下)
  12. Spring IoC源码解析(上)
  13. Google protobuf codec of netty serialization
  14. Encoding and decoding of netty
  15. Java codec
  16. Netty decoder
  17. Netty and TCP packet sticking and unpacking
  18. Introduction to netty development
  19. Problems encountered in Java collection traversal
  20. Spring IOC source code analysis (2)
  21. Spring IOC source code analysis (Part one)
  22. 半小时用Spring Boot注解实现Redis分布式锁
  23. Implementing redis distributed lock with spring boot annotation in half an hour
  24. What should we do if we can't get tickets for Spring Festival transportation? You can solve this problem by using these ticket grabbing apps!
  25. 百度智能(文本识别),API传图OC代码与SDK使用
  26. springboot源码解析-管中窥豹系列之aware(六)
  27. Baidu intelligent (text recognition), API map, OC code and SDK
  28. Spring boot source code analysis
  29. springboot源码解析-管中窥豹系列之aware(六)
  30. 百度智能(文本识别),API传图OC代码与SDK使用
  31. Spring boot source code analysis
  32. Baidu intelligent (text recognition), API map, OC code and SDK
  33. Java学习笔记
  34. Java learning notes
  35. Sentry(v20.12.1) K8S 雲原生架構探索, SENTRY FOR JAVASCRIPT 手動捕獲事件基本用法
  36. 我的程式設計師之路:自學Java篇
  37. SpringBoot專案,如何優雅的把介面引數中的空白值替換為null值?
  38. Sentry (v20.12.1) k8s cloud native architecture exploration, sentry for JavaScript manual capture event basic usage
  39. My way of programmer: self study java
  40. Spring boot project, how to gracefully replace the blank value in the interface argument with null value?
  41. Redis 用的很溜,了解过它用的什么协议吗?
  42. Redis is easy to use. Do you know what protocol it uses?
  43. 《零基础看得懂的C++入门教程 》——(10)面向对象
  44. Introduction to zero basic C + + (10) object oriented
  45. HTTP status code and troubleshooting
  46. Java NIO之Channel(通道)入门
  47. Introduction to Java NiO channel
  48. Spring中的@Valid 和 @Validated注解你用对了吗
  49. Are you using the @ valid and @ validated annotations correctly in spring
  50. Spring中的@Valid 和 @Validated注解你用对了吗
  51. Are you using the @ valid and @ validated annotations correctly in spring
  52. Redis | 慢查询
  53. Redis | slow query
  54. RabbitMQ一个优秀的.NET消息队列框架
  55. Autofac一个优秀的.NET IoC框架
  56. 如何使用Redis实现分布式缓存
  57. Rabbitmq an excellent. Net message queue framework
  58. Autofac is an excellent. Net IOC framework
  59. How to use redis to realize distributed cache
  60. JDK1.7-HashMap原理