Preface

Let me say it up front: the title is a bit of clickbait. How could a scrub like me possibly interview for an Alibaba P7 position? But these really are interview questions for a P7-level post at Alibaba. Of course, it wasn't me doing the interview; it was a senior guy in my department. He shared his interview experience with me, which let me indirectly experience how hard an Alibaba-level interview is. And just like that, having "been through" an Ali P7 interview, I suddenly feel my confidence soaring.

Common interview questions

When it comes to HashMap, we couldn't be more familiar with it: it's the Java collection class used most in daily development, and HashMap knowledge is practically a guaranteed topic in interviews. Taking my own past interviews as an example, these are the most frequently asked questions:

1. What is HashMap's underlying storage structure?

2. Is it thread-safe? Why isn't it safe?

3. What are the differences between the 1.7 and 1.8 versions of HashMap? What hidden danger does 1.7 have, and what causes it?

4. Is a hashCode unique? How are elements compared when one is inserted?

5. What are the differences between HashMap, Hashtable, and ConcurrentHashMap?

For questions like these, if you've read a few blog posts or roughly browsed the source code, you can basically answer them. I've done plenty of interviews before, and I've rarely lost points on HashMap.

Facts proved I was still a bit naive (hey, I'm a post-90s kid after all). Sometimes it's not about how much you know, but how deep you go. If you haven't read the source code with real understanding and thought, then the moment someone looks at it from a slightly different angle, you may be in trouble.

Take the question in the title: why is HashMap's treeify threshold for a list 8? To be honest, although I knew the threshold for treeifying was 8, I had never really thought about why it's that exact number. So I took this opportunity to comb through the HashMap source code again, and this article is a summary of some new insights.

HashMap basics

HashMap is arguably the most commonly used collection class in Java projects. As a typical K-V storage structure, its bottom layer consists of an array of linked lists. When you add a new element, HashMap uses the element's hash value to find the corresponding "bucket" (a Node<K, V> element in the HashMap source) and inserts it into the linked list at that position. If a list grows too long, it is converted into a red-black tree (in JDK 1.8 and later).
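For reference, this is what a bucket entry looks like, abridged from the JDK 8 HashMap source (method bodies omitted):

```java
// Abridged from the JDK 8 HashMap source (accessor methods omitted).
// Each Node caches the key's hash and links to the next node in the
// same bucket, forming the per-bucket linked list.
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
    // getKey(), getValue(), setValue(), equals(), hashCode() omitted
}
```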

Why convert to a red-black tree?

We all know that fetching an element from a linked list means walking from the head node to the target node, a process with O(N) complexity, while a red-black tree, built on a binary tree structure, finds elements in O(logN). For example, a bucket holding 64 nodes needs up to 64 comparisons as a list, but only about log2(64) = 6 as a tree. So when a bucket holds too many elements, storing them in a red-black tree makes lookups more efficient.

Since red-black trees are so efficient, why not start with a red-black tree in the first place?

This is actually a trade-off between space and time, and the JDK source code explains it:

```java
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.
```

Look at those four lines of the comment: a single TreeNode needs roughly twice the space of a regular Node, so a bin is only converted to TreeNodes when it contains enough Nodes, where "enough" is decided by the value of TREEIFY_THRESHOLD (default 8). And when removal or resize (expansion) leaves few enough nodes in a bucket, the red-black tree is converted back into an ordinary linked list; that threshold is UNTREEIFY_THRESHOLD (default 6).

```java
/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;
```

It's easy to see that although a red-black tree queries faster than a linked list, its nodes take up much more space, so treeifying only pays off once a bucket reaches a certain size. This is the balance of time against space.

Why the treeify threshold is 8

As for why the treeify threshold is exactly 8: in the source code, the comment quoted above continues with a long passage, and we can find the answer there. The original text reads:

```java
* In usages with well-distributed user hashCodes, tree bins are
* rarely used.  Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million
```

What it means: if hashCodes are well dispersed, tree bins are rarely used, because when values are evenly distributed, long lists are rare. Ideally, under random hashCodes, list length follows a Poisson distribution, with the probability of each length falling off sharply. The comment shows the exact probabilities for lengths 0 through 8: by length 8, the probability is down to 0.00000006, and anything longer is "less than 1 in ten million". With odds that small, HashMap would almost never treeify; after all, we don't store that much data in daily use. Are you really going to put tens of millions of entries into one HashMap?
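If you want to verify those numbers yourself, here's a minimal sketch (my own check, not JDK code) that recomputes the table from the formula quoted in the comment:

```java
// Recomputes the probability table from the JDK comment:
// P(k) = exp(-0.5) * 0.5^k / k!  (Poisson distribution, parameter 0.5)
public class PoissonCheck {
    public static void main(String[] args) {
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            double p = Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p);
        }
    }
}
```

Running it prints exactly the values in the comment, with length 8 coming out at 0.00000006.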

Of course, that's the ideal case. In practice, some users' way of using HashMap may produce hashCodes with poor dispersion, and converting to a red-black tree in that situation is a good fallback strategy.

As for what kinds of usage lead to that, you can think it over yourself or look it up online; I won't go into it, to save some energy.

Alright, alright, I'll keep writing. It's not easy for you to read this far; I can't just send you away empty-handed and get branded a tease. What would I be playing at?

Let's start here: in HashMap, which "bucket" an object falls into is decided by the object's hashCode, and the JDK can't stop users from implementing their own hash algorithm. If a user overrides hashCode with a poor implementation, HashMap's lists can easily become very long. Like this:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapTest {

    public static void main(String[] args) {
        Map<User, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new User("I'm Xue " + i), i);
        }
    }

    static class User {
        private String name;

        public User(String name) {
            this.name = name;
        }

        // A deliberately terrible hash: every User lands in the same bucket.
        @Override
        public int hashCode() {
            return 1;
        }
    }
}
```

We designed a User class whose hashCode is always 1, so every User object put into this HashMap lands in the same "bucket". Query efficiency is then undoubtedly terrible, and this is exactly why HashMap converts its lists to red-black trees: it effectively limits the damage when a user implements a bad hash algorithm that produces very long lists.

The hash() method

Speaking of hash algorithms, let's expand on another point: what I consider one of the most ingenious designs in HashMap.

In the HashMap source, the hash used to store an object is computed by the hash() method, one of HashMap's core functions. When storing data, the key is passed in and its hash value computed; only further operations on this hash value determine which position among the "buckets" the key should occupy. Here is the method's source:

```java
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

As the code shows, after the key is passed in, hash() takes the key's hashCode, unsigned-right-shifts it by 16 bits, XORs that with the original hashCode, and returns the result. That value is the key's hash. This step exists to reduce collisions: the bucket index is computed from the hash's low bits, and many objects' hashCodes share the same low bits while differing in the high ones, so without this mixing they would collide easily.
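To make that concrete, here's a small illustration of my own (not JDK code, though spread() copies the same expression): two hashCodes that differ only in their high 16 bits collide on a 16-slot table if you index with the raw value, but separate once the high bits are XORed in:

```java
// Illustration: why hash() folds the high bits into the low bits.
// Bucket indexing only looks at the low bits (here, the low 4 bits
// for a 16-slot table), so hashCodes that differ only in their high
// bits would otherwise always collide.
public class HashSpreadDemo {
    // The same spreading step HashMap.hash() applies to a non-null key.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;               // table.length, always a power of two
        int h1 = 0x00010000;      // the two differ only in bits 16 and 17
        int h2 = 0x00020000;

        // Raw hashCodes: both land in bucket 0.
        System.out.println((h1 & (n - 1)) + ", " + (h2 & (n - 1)));
        // After spreading: buckets 1 and 2, no collision.
        System.out.println((spread(h1) & (n - 1)) + ", " + (spread(h2) & (n - 1)));
    }
}
```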

Besides the 16-bit shift, when adding an element HashMap also ANDs the hash with table.length - 1, that is, with the size of the "bucket" array minus one. The result is the index into the bucket array, which locates the linked list the element belongs to. In the source:

```java
// n is table.length
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
```

When the bucket at that index is empty, a new node is created as the head of its list. So why use i = (n - 1) & hash to compute the index?

This is actually an optimization trick. Because the array size is always a power of 2, after an expansion an element's new index is either its original position, or the original position plus the old capacity. The cleverness lies in the & operation: as mentioned, & only looks at the bits covered by n - 1 (n = the array length), and after expansion n has exactly one more significant bit than before (n doubles, which is why keeping the array length a power of 2 matters). So you only need to check whether the hash's bit at that new position is 0 or 1 to get the new index: if it's 0, the index is unchanged; if it's 1, the index is the old index plus the capacity before expansion.

[Diagram: after doubling, an entry either keeps its old index or moves by the old capacity, depending on one hash bit]
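As a runnable sketch of the same idea (my own illustration, not JDK source): four hashes that all share bucket 5 at capacity 16, and the single bit hash & oldCap that decides where each lands at capacity 32:

```java
// Illustration of the resize index trick. For a power-of-two capacity n,
// (n - 1) & hash keeps the low bits of the hash (equivalent to hash % n).
// After doubling, exactly one extra hash bit matters: hash & oldCap.
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;        // the table always doubles: 32

        int[] hashes = {5, 21, 37, 53};  // all map to bucket 5 when n = 16

        for (int hash : hashes) {
            int oldIndex = (oldCap - 1) & hash;
            int newIndex = (newCap - 1) & hash;
            // Predict the new index from the old one plus a single bit test.
            int predicted = ((hash & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%2d  old=%2d  new=%2d  predicted=%2d%n",
                    hash, oldIndex, newIndex, predicted);
        }
    }
}
```

Entries whose extra bit is 0 stay at index 5; those whose bit is 1 move to 5 + 16 = 21, matching the actual (newCap - 1) & hash every time.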

Thanks to this bit trick, there's no need to recompute every hash at each expansion, which saves a lot of time. And since the new significant bit is essentially random (0 or 1), two entries that collided before have a chance of being spread apart again during expansion, producing a better distribution. I can't help but marvel: the design is truly brilliant. A few seemingly simple lines of code hold this much knowledge.

Why the threshold to degrade back to a linked list is 6

As mentioned above, a list converts to a red-black tree when its length reaches the threshold of 8, but the tree only degrades back to a list at a threshold of 6. Why not degrade as soon as it drops below 8, say at 7? Why must it fall to 6 or less?

The gap is mainly a buffer, to avoid frequent conversions between linked list and red-black tree. If the degrade threshold were 7, a bucket sitting right at the boundary would degrade into a list on every deletion and treeify again on every insertion; flipping between structures like that would clearly hurt performance. So the two thresholds are deliberately kept apart.

Finally

There's a lot more to know about HashMap. I strongly encourage you to read the source a few times, not just for interviews, but to get a clearer picture of how to use HashMap better yourself. After all, it's so common that misusing it easily produces bugs. Besides, I think the JDK source really has a lot worth studying for us developers. Take HashMap: it isn't a huge amount of code, yet it's highly efficient, and most importantly, every version keeps getting optimized, with every line carefully crafted. Reading the source, I kept sighing to myself: if I could write code that good, would getting into Alibaba even be a question?


I'm Xue, an internet person not limited to technology. For more interesting articles, follow my official account; it's not just tech, there's fun banter too ~~~


Creating original content isn't easy. Your "like, favorite, and share" will be the biggest driving force behind my writing. Thank you for your support!
