A Detailed Explanation of ConcurrentHashMap

Help me to the Internet cafe · 2022-05-14 14:40:03




The technical points below are based on JDK 1.8.

Basic review

As we all know, since JDK 1.8 the underlying data structure of ConcurrentHashMap has changed from Segment-based segmented locking to an array + linked list + red-black tree structure.

It is a concurrent container, and any container holding data will run into all kinds of problems in a concurrent environment. In a single-threaded environment you have the map to yourself; under concurrency, other threads compete with you for data and for bucket slots. Doug Lea, the author of the JUC package, handled each of these scenarios one by one. It can be considered thoroughly thread-safe: at least in all its years of use, no such bug has surfaced.

Red-black trees

The red-black tree data structure

Strictly speaking, the red-black tree in JDK 1.8 is represented by the TreeBin proxy class, which serves as the concrete container for the tree, while TreeNode is the data structure that encapsulates each tree node. So you can think of TreeBin as a container wrapping TreeNodes.

Inside ConcurrentHashMap, the nodes of the red-black tree are additionally linked into a doubly linked list:

Inserting data into the red-black tree

Here, the red-black tree maintains a field called dir.

When inserting data, the node's hash value is obtained and compared with the hash of the current node p. If the inserted node's hash is less than the current node's, dir is -1; otherwise it is 1:

So when dir is -1, the inserted node belongs in the left child of the current node, or the search continues down the left subtree; conversely, when dir is 1, the search goes right. The rule here is the same as for a binary search tree.
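The comparison can be sketched roughly like this (a simplified illustration, not the actual JDK source; the method name is hypothetical, and the real code breaks hash ties with compareTo and a tie-break order):

```java
// Simplified sketch of how the insertion direction is chosen in a
// red-black-tree bin. Names are illustrative, not from the JDK source.
public class DirSketch {
    // Returns -1 to descend left, 1 to descend right, given the inserted
    // node's hash h and the current node's hash ph. (The real code also
    // handles h == ph via Comparable / tieBreakOrder.)
    static int direction(int h, int ph) {
        if (h < ph) return -1;   // smaller hash: go into the left subtree
        else return 1;           // otherwise: go into the right subtree
    }
}
```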

Reads and writes under multithreaded competition

Read operations are inherently thread-safe (the val and next fields of Node are volatile, and get() never locks), so there is no problem with multiple threads reading the same bucket at the same time.

But for competing write operations, a synchronized lock ensures that only one thread at a time can operate on a given bucket.

Looking at the source code, the core of the put() method is putVal():

putVal() is quite long; it mainly locks each bucket's head node with synchronized to guarantee concurrency safety. Two points matter most here. First, determine whether the element being put belongs in a linked list or in a red-black tree. Second, determine whether the inserted key matches an element already in the list or tree. If the inserted key matches none of the keys in the list, the new node is appended to the tail of the list; otherwise the value mapped to that key is replaced.
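The bucket-level logic can be sketched like this (a minimal illustration of the linked-list case only; names are mine, and the real putVal() also handles tree bins, resize assistance, and element counting):

```java
import java.util.Objects;

// Minimal sketch of the list-bin logic inside putVal(): walk the list,
// replace the value when the key matches, otherwise append at the tail.
public class PutSketch {
    static class Node {
        final int hash; final Object key; Object val; Node next;
        Node(int hash, Object key, Object val) {
            this.hash = hash; this.key = key; this.val = val;
        }
    }

    // Returns the previous value if the key already existed, else null.
    static Object putIntoBin(Node head, int hash, Object key, Object val) {
        synchronized (head) {              // only one writer per bucket
            for (Node e = head; ; e = e.next) {
                if (e.hash == hash && Objects.equals(e.key, key)) {
                    Object old = e.val;
                    e.val = val;           // key found: replace the value
                    return old;
                }
                if (e.next == null) {      // end of list: append new node
                    e.next = new Node(hash, key, val);
                    return null;
                }
            }
        }
    }
}
```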

Expansion principle

Before looking at how expansion works, you need to know what triggers it.

Two important fields are involved:

  • TREEIFY_THRESHOLD : the bin-length threshold, default 8; once a bucket's linked list reaches this length, treeification may occur
  • MIN_TREEIFY_CAPACITY : the minimum table capacity for treeification, default 64; below this, the table is resized instead of treeifying

Each time a thread adds an element to a bucket, it checks the length of that bucket's linked list. Only when the list length reaches TREEIFY_THRESHOLD (8) and the table length is at least MIN_TREEIFY_CAPACITY (64) does treeifyBin() convert the list into a red-black tree; if the table is still smaller than 64, the table is expanded instead.
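The decision above can be sketched as a small function (a simplified illustration; the constants match the JDK defaults, but the method and enum names are mine):

```java
// Sketch of the choice treeifyBin() makes: only convert a long bin to a
// red-black tree when the table itself is large enough; otherwise grow
// the table, which usually spreads the elements out again.
public class TreeifySketch {
    static final int TREEIFY_THRESHOLD = 8;      // JDK default
    static final int MIN_TREEIFY_CAPACITY = 64;  // JDK default

    enum Action { NOTHING, RESIZE, TREEIFY }

    static Action onBinGrown(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD) return Action.NOTHING;
        if (tableLength < MIN_TREEIFY_CAPACITY) return Action.RESIZE;
        return Action.TREEIFY;
    }
}
```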

Expansion here means increasing the number of buckets in the array so it can hold more elements.

Besides this, expansion also maintains another important field, sizeCtl:

From the source comments, we can see that this field has three states:

  • sizeCtl < 0: -1 is a marker telling other threads that the table is being initialized; other negative values indicate that the table is currently being resized
  • sizeCtl = 0: the table has not been created yet and no initial capacity was specified
  • sizeCtl > 0: before initialization, the initial capacity to use; after initialization, the threshold that triggers the next expansion

During a resize, the value of this field, viewed as a 32-bit number, splits in two: the high 16 bits hold the resize stamp, which identifies this particular expansion (for example, from length 16 to 32), and the low 16 bits hold the number of threads currently participating in the expansion, plus one.
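As a sketch, the encoding can be reproduced like this. The resizeStamp formula matches the JDK 8 source; the surrounding helper methods are my own illustration of how the stamp and thread count pack into sizeCtl:

```java
// Sketch of the sizeCtl encoding used during a resize:
//   sizeCtl = (resizeStamp(n) << 16) + participatingThreads + 1
// The shifted stamp has its top bit set, so the whole value is negative.
public class StampSketch {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Same formula as the JDK 8 source: unique per table length n.
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // Illustrative helper: sizeCtl value when `threads` threads resize
    // a table of old length n.
    static int sizeCtlFor(int n, int threads) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + threads + 1;
    }

    // Illustrative helper: recover the thread count from sizeCtl.
    static int threadsFrom(int sizeCtl, int n) {
        return sizeCtl - (resizeStamp(n) << RESIZE_STAMP_SHIFT) - 1;
    }
}
```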

Expansion builds a new, longer array and then migrates all the elements from the old array into the new one.

The essential purpose of expansion is to shorten bucket linked lists and improve query efficiency: list lookup is O(n), so an overly long list hurts query performance.

Suppose the table grows from 16 buckets to 32. With more buckets, some elements must move into new buckets after migration. This requires an algorithm to map element positions in the old array to positions in the new one: after expansion, some elements move to a new index, while others stay at the same index as in the old array.

How do we compute which bucket an element lands in after migration? A bitwise AND is used here: the element's hash is ANDed with the old array length itself (note: the length, not length - 1). If the result is 0, the element keeps its index and needs no move; otherwise it moves to its old index plus the old length. During migration, two lists, ln and hn (low and high), collect the nodes that stay and the nodes that move, and the trailing run of nodes that all land in the same list is reused without copying. This is the lastRun mechanism.

Note: unlike HashMap's JDK 1.8 resize, which rebuilds each list with tail insertion and preserves order, ConcurrentHashMap's transfer() reuses the lastRun suffix as-is and copies the earlier nodes with head insertion, so their relative order may be reversed.
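The index calculation can be sketched as follows (a minimal illustration under the assumption that oldCap is a power of two, as it always is for the table; the method name is mine):

```java
// Sketch of the migration index test: AND the hash with the OLD
// capacity. Zero means the element keeps its index; non-zero means it
// moves to oldIndex + oldCap in the doubled table.
public class TransferSketch {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);     // index in the old table
        return (hash & oldCap) == 0
                ? oldIndex                      // "low" list: stays put
                : oldIndex + oldCap;            // "high" list: moves up
    }
}
```

For example, with oldCap = 16, hashes 5 (0b00101) and 21 (0b10101) share old index 5, but only 21 has the 16-bit set, so it moves to index 21 in the new table.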

The simple, ordinary expansion case poses no problem; in most scenarios it works the same way as HashMap's resize.

The problem is that we are in a concurrent environment, and expansion takes time.

Expansion && multiple competing threads

So here comes the more complicated scenario: what if the table is expanding while multiple threads compete to read and write it?

No problem; as before, we discuss it case by case.

Read operation during capacity expansion

If a thread reads an element during expansion, for example calling get() for some key's value, can it still read it?

The answer is yes. The premise is that the node has already been migrated; if the bucket is still in the middle of being migrated, it cannot be accessed this way yet.

Concretely, this is done by calling find().

When a bucket's data needs to be migrated, a ForwardingNode is placed in that bucket to mark whether the node is being migrated or has already been migrated;

Here we call the bucket node before migration the old node, and the bucket node after migration the new node. Once a node has been migrated, an fwd reference is placed in the old bucket, pointing to the new table.

So when a thread accesses this bucket and sees the fwd reference, it knows the table is currently expanding; it then follows the nextTable field to look up the data in the corresponding bucket of the new array and returns it.
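The redirection can be sketched like this (a stripped-down illustration of the idea, not the JDK source: the real ForwardingNode also follows chained forwards and handles tree bins):

```java
// Sketch of how a ForwardingNode redirects reads: its hash is the
// special MOVED marker, and its find() looks the key up in nextTable.
public class FwdSketch {
    static final int MOVED = -1;   // marker hash for a forwarding node

    static class Node {
        final int hash; final Object key; final Object val; final Node next;
        Node(int hash, Object key, Object val, Node next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
        Object find(int h, Object k) {
            for (Node e = this; e != null; e = e.next)
                if (e.hash == h && k.equals(e.key)) return e.val;
            return null;
        }
    }

    static class ForwardingNode extends Node {
        final Node[] nextTable;                  // the table being built
        ForwardingNode(Node[] nextTable) {
            super(MOVED, null, null, null);      // bucket already migrated
            this.nextTable = nextTable;
        }
        @Override Object find(int h, Object k) { // redirect to new table
            Node e = nextTable[h & (nextTable.length - 1)];
            return e == null ? null : e.find(h, k);
        }
    }
}
```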

Write operations during capacity expansion

Write operations are a little more complicated than reads. A read only has to fetch and return data, while a write must modify it. So when a writing thread happens to hit the container in the middle of an expansion, it must assist with the expansion.

The concrete behavior again depends on the situation. If the bucket being accessed has not been migrated yet, the thread simply competes for the lock and operates on the old node.

But if the node the thread wants to modify happens to be an fwd node, the bucket is part of an in-progress expansion. To make full use of the available threads and finish quickly, the current thread assists with the expansion. If several threads are writing at the same time, each of them calls helpTransfer() to assist.

To assist, a thread first obtains the resize stamp, which identifies the target of this expansion. The threads are independent and do not communicate, but they must all do the same job: expand the table to the same new size. So they must obtain the same stamp, and only threads whose stamp matches may join the expansion.

Suppose a container is expanding from a capacity of 16 buckets to 32, and there are two threads, A and B.

If A triggers the expansion mechanism, thread A starts expanding; if thread B then writes and finds an expansion in progress, it enters the assist path.

Threads A and B then share responsibility for expanding the buckets.

The number of buckets each thread is responsible for migrating is computed from the number of CPU cores, with a minimum of 16 (MIN_TRANSFER_STRIDE); that is, each thread handles at least 16 buckets:

As mentioned above, once sizeCtl goes negative during an expansion, its low 16 bits track the number of participating threads. When thread A triggers the expansion, it adds 1 to the low 16 bits, indicating one more expanding thread; when a thread exits the expansion, it subtracts 1, indicating one fewer. In this way, all the threads maintain this field together.
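In spirit, the bookkeeping looks like this (an illustrative sketch using AtomicInteger in place of the Unsafe CAS the JDK actually uses; the real helpTransfer()/addCount() also validate the resize stamp and bounds before joining):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of maintaining the resizing-thread count in the low 16 bits
// of sizeCtl via CAS. `base` stands for (resizeStamp(n) << 16).
public class HelperCountSketch {
    final AtomicInteger sizeCtl;

    HelperCountSketch(int initial) { sizeCtl = new AtomicInteger(initial); }

    boolean tryJoinResize() {            // one more helper thread joins
        int sc = sizeCtl.get();
        return sizeCtl.compareAndSet(sc, sc + 1);
    }

    void leaveResize() {                 // a helper thread finishes
        sizeCtl.decrementAndGet();
    }

    int helperThreads(int base) {        // low bits hold threads + 1
        return sizeCtl.get() - base - 1;
    }
}
```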

So you may be curious: what does the last thread to exit the expansion do? Indeed, the last thread has extra work. When a thread finishes its task, it checks the value of sizeCtl; if it finds that only the last 1 remains in the low 16 bits, so that one more decrement would bring it to 0, it knows it is the last thread to exit the expansion. It must then re-scan the old table to confirm that no slot was missed. Concretely, it polls the old table for buckets without an fwd node; if every bucket has one, migration is complete, otherwise the remaining buckets are migrated to the new table:

Because the source code is very long, we won't paste all of it; instead, a flow chart helps illustrate what happens during the expansion:

Copyright notice: this article was written by [Help me to the Internet cafe]. Please include a link to the original when reposting. Thanks.