Hashmap and ConcurrentHashmap
Hashmap
- Array init size defaults to 16. Double the size when it is 75% full
- Length is always a power of two, because
hash%length==hash&(length-1)
- How to avoid/optimize rehash when resizing
- When array length > 64 and some link list(LL) length > 8, this LL will be changed to red-black tree(RBT)
- When iterating entrys, prefer EntrySet to retrive k & v at the same time.
Hashmap’s concurrency problem
- When insert two elements with the same hash, while there is no existing node at the entry, in
putVal()
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
and one node will be overwritten
- When
resize()
, this may introduce a cycle in the link
ConcurrentHashMap
Node type
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
volatile V val;
volatile Node<K,V> next;
//Note that Node.setValue gives UnsupportedOperationException(),i.e., Node itself is immutable
//....
Members
/**
* The array of bins. Lazily initialized upon first insertion.
* Size is always a power of two. Accessed directly by iterators.
*/
transient volatile Node<K,V>[] table;
/**
* The next table to use; non-null only while resizing.
*/
private transient volatile Node<K,V>[] nextTable;
Put
- Assuming the normal case, I deleted/commented/erased out some lines for the ease of reading
- CHM does not allow key or value to be null
final V putVal(K key, V value, boolean onlyIfAbsent) {
int hash = spread(key.hashCode());
int binCount = 0;//size in the list to decide if need to convert to RBT
for (Node<K,V>[] tab = table;;) {
Node<K,V> f;//current node at tbl
int n; //size of array, always a power of two
int i; //index at the tbl array
int fh; //mod hash value
if (tab == null || (n = tab.length) == 0)
tab = initTable();
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
// (n-1) & hash, fast mod with 2^n, i,e. first time ever at this slot
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
//can't CAS, spin again
}
else if ((fh = f.hash) == MOVED)
//resizing is underway, let it finish first, may create a thread to speed up resizing
tab = helpTransfer(tab, f);
else {
V oldVal = null;
synchronized (f) {
if (tabAt(tab, i) == f) {
//now that we are in mutex, first check if the node value remains valid
if (fh >= 0) {
binCount = 1;
//linked list upsert
}
else if (f instanceof TreeBin) {
binCount = 2;
//RBT upsert
}
}
}
if (binCount != 0) {
if (binCount >= TREEIFY_THRESHOLD)
treeifyBin(tab, i);
if (oldVal != null)
//update case. no need to resize
return oldVal;
break; //conclude spinning
}
}
}
//attempt to help resize in case of insert
addCount(1L, binCount);
return null;
}
get()
does not use any mutex or CAS, why does it work?- Every time we update the table, we acquire the mutex first, i.e., in effect only 1 thread updating volatiles. Therefore, reading volatiles becomes thread safe
Concurrent resizing
Inside transfer()
- Mark copied node and empty node as
ForwardingNode
. When iterating, when we see such nodes, move to the next - Uses
MIN_TRANSFER_STRIDE
to lower bound the range - No new thread is spawned. It reuses threads in
put()
.