浅析HashMap，何时树化？常见面试题解析

tech2025-05-02 55

首先来看下HashMap的类描述

/** * Hash table based implementation of the <tt>Map</tt> interface. This * implementation provides all of the optional map operations, and permits * <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt> * class is roughly equivalent to <tt>Hashtable</tt>, except that it is * unsynchronized and permits nulls.) This class makes no guarantees as to * the order of the map; in particular, it does not guarantee that the order * will remain constant over time. * * This implementation provides constant-time performance for the basic * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function * disperses the elements properly among the buckets. Iteration over * collection views requires time proportional to the "capacity" of the * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number * of key-value mappings). Thus, it's very important not to set the initial * capacity too high (or the load factor too low) if iteration performance is * important. * * An instance of <tt>HashMap</tt> has two parameters that affect its * performance: initial capacity and load factor. The * capacity is the number of buckets in the hash table, and the initial * capacity is simply the capacity at the time the hash table is created. The * load factor is a measure of how full the hash table is allowed to * get before its capacity is automatically increased. When the number of * entries in the hash table exceeds the product of the load factor and the * current capacity, the hash table is rehashed (that is, internal data * structures are rebuilt) so that the hash table has approximately twice the * number of buckets. * * As a general rule, the default load factor (.75) offers a good * tradeoff between time and space costs. Higher values decrease the * space overhead but increase the lookup cost (reflected in most of * the operations of the <tt>HashMap</tt> class, including * <tt>get</tt> and <tt>put</tt>). The expected number of entries in * the map and its load factor should be taken into account when * setting its initial capacity, so as to minimize the number of * rehash operations. If the initial capacity is greater than the * maximum number of entries divided by the load factor, no rehash * operations will ever occur. * * If many mappings are to be stored in a <tt>HashMap</tt> * instance, creating it with a sufficiently large capacity will allow * the mappings to be stored more efficiently than letting it perform * automatic rehashing as needed to grow the table. Note that using * many keys with the same {@code hashCode()} is a sure way to slow * down performance of any hash table. To ameliorate impact, when keys * are {@link Comparable}, this class may use comparison order among * keys to help break ties. * * Note that this implementation is not synchronized. * If multiple threads access a hash map concurrently, and at least one of * the threads modifies the map structurally, it must be * synchronized externally. (A structural modification is any operation * that adds or deletes one or more mappings; merely changing the value * associated with a key that an instance already contains is not a * structural modification.) This is typically accomplished by * synchronizing on some object that naturally encapsulates the map. * * If no such object exists, the map should be "wrapped" using the * {@link Collections#synchronizedMap Collections.synchronizedMap} * method. This is best done at creation time, to prevent accidental * unsynchronized access to the map:<pre> * Map m = Collections.synchronizedMap(new HashMap(...));</pre> * * The iterators returned by all of this class's "collection view methods" * are fail-fast: if the map is structurally modified at any time after * the iterator is created, in any way except through the iterator's own * <tt>remove</tt> method, the iterator will throw a * {@link ConcurrentModificationException}. Thus, in the face of concurrent * modification, the iterator fails quickly and cleanly, rather than risking * arbitrary, non-deterministic behavior at an undetermined time in the * future. * * Note that the fail-fast behavior of an iterator cannot be guaranteed * as it is, generally speaking, impossible to make any hard guarantees in the * presence of unsynchronized concurrent modification. Fail-fast iterators * throw <tt>ConcurrentModificationException</tt> on a best-effort basis. * Therefore, it would be wrong to write a program that depended on this * exception for its correctness: the fail-fast behavior of iterators * should be used only to detect bugs.

总体来说，HashMap具有以下特性

key & value 可以为空多线程不安全大小为2的幂，就近取负载因子 0.75 是权衡时间&空间的较好值泊松分布初始容量默认 16 负载因子 & 初始容量 ——> 高效的查询和存储均匀分布数组扩容受其影响 * * Implementation notes. * * This map usually acts as a binned (bucketed) hash table, but * when bins get too large, they are transformed into bins of * TreeNodes, each structured similarly to those in * java.util.TreeMap. Most methods try to use normal bins, but * relay to TreeNode methods when applicable (simply by checking * instanceof a node). Bins of TreeNodes may be traversed and * used like any others, but additionally support faster lookup * when overpopulated. However, since the vast majority of bins in * normal use are not overpopulated, checking for existence of * tree bins may be delayed in the course of table methods. * * Tree bins (i.e., bins whose elements are all TreeNodes) are * ordered primarily by hashCode, but in the case of ties, if two * elements are of the same "class C implements Comparable<C>", * type then their compareTo method is used for ordering. (We * conservatively check generic types via reflection to validate * this -- see method comparableClassFor). The added complexity * of tree bins is worthwhile in providing worst-case O(log n) * operations when keys either have distinct hashes or are * orderable, Thus, performance degrades gracefully under * accidental or malicious usages in which hashCode() methods * return values that are poorly distributed, as well as those in * which many keys share a hashCode, so long as they are also * Comparable. (If neither of these apply, we may waste about a * factor of two in time and space compared to taking no * precautions. But the only known cases stem from poor user * programming practices that are already so slow that this makes * little difference.) * * Because TreeNodes are about twice the size of regular nodes, we * use them only when bins contain enough nodes to warrant use * (see TREEIFY_THRESHOLD). And when they become too small (due to * removal or resizing) they are converted back to plain bins. In * usages with well-distributed user hashCodes, tree bins are * rarely used. Ideally, under random hashCodes, the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) with a * parameter of about 0.5 on average for the default resizing * threshold of 0.75, although with a large variance because of * resizing granularity. Ignoring variance, the expected * occurrences of list size k are (exp(-0.5) * pow(0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * more: less than 1 in ten million * * The root of a tree bin is normally its first node. However, * sometimes (currently only upon Iterator.remove), the root might * be elsewhere, but can be recovered following parent links * (method TreeNode.root()). * * All applicable internal methods accept a hash code as an * argument (as normally supplied from a public method), allowing * them to call each other without recomputing user hashCodes. * Most internal methods also accept a "tab" argument, that is * normally the current table, but may be a new or old one when * resizing or converting. * * When bin lists are treeified, split, or untreeified, we keep * them in the same relative access/traversal order (i.e., field * Node.next) to better preserve locality, and to slightly * simplify handling of splits and traversals that invoke * iterator.remove. When using comparators on insertion, to keep a * total ordering (or as close as is required here) across * rebalancings, we compare classes and identityHashCodes as * tie-breakers. * * The use and transitions among plain vs tree modes is * complicated by the existence of subclass LinkedHashMap. See * below for hook methods defined to be invoked upon insertion, * removal and access that allow LinkedHashMap internals to * otherwise remain independent of these mechanics. (This also * requires that a map instance be passed to some utility methods * that may create new nodes.) * * The concurrent-programming-like SSA-based coding style helps * avoid aliasing errors amid all of the twisty pointer operations. */ /** * The default initial capacity - MUST be a power of two. */ static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

HashMap常见面试题：

HashMap的底层数据结构？

HashMap的存取原理？

Java7和Java8的区别？（1.7 resize的时候，单链表的头插入方式，同一位置上新元素总会被放在链表的头部位置，在旧数组中同一条Entry链上的元素，通过重新计算索引位置后，有可能被放到了新数组的不同位置上。可能出现有环链表！而1.8采用尾插法避免此问题！）

为啥会线程不安全？(多线程情况最容易出现的就是：无法保证刚刚put的值，get的时候还是原值，所以线程安全还是无法保证。)

有什么线程安全的类代替么?

默认初始化大小是多少？为啥是这么多？为啥大小都是2的幂？（均匀分布 & 存取高效）

HashMap的扩容方式？负载因子是多少？为什么是这么多？

HashMap的主要参数都有哪些？

HashMap是怎么处理hash碰撞的？

hash的计算规则？

快速失败 & 安全失败？

底层数据结构：

1.8 数组和链表当链表的长度大于8时 ——> 红黑树提升查询效率

当链表的长度超过了默认阈值8的时就树形化

在treeifyBin(Node<K,V>[] tab, int hash)方法里面还要判断下 table 的 length 是否大于64，小于64是直接resize原数组长度的 2 倍。

看下Node

/** * Basic hash bin node, used for most entries. (See below for * TreeNode subclass, and in LinkedHashMap for its Entry subclass.) */ static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final K getKey() { return key; } public final V getValue() { return value; } public final String toString() { return key + "=" + value; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } public final V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } public final boolean equals(Object o) { if (o == this) return true; if (o instanceof Map.Entry) { Map.Entry<?,?> e = (Map.Entry<?,?>)o; if (Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue())) return true; } return false; } }

扩容

当元素个数达负载因子乘上初始容量时 2倍扩容将原有的数据重新hash存储到新的数组

线程安全可替代的类

SynchronizedMap、HashTable、ConcurrentHashMap（并发更高分段锁1.7，而1.8利用CAS和Synchronized来保证并发，内部虽然定义了segment，但仅仅是为了保证序列化时的兼容性！）

/** * Stripped-down version of helper class used in previous version, * declared for the sake of serialization compatibility. */ static class Segment<K,V> extends ReentrantLock implements Serializable { final float loadFactor; Segment(float lf) { this.loadFactor = lf; } }

hash碰撞处理

/** * Computes key.hashCode() and spreads (XORs) higher bits of hash * to lower. Because the table uses power-of-two masking, sets of * hashes that vary only in bits above the current mask will * always collide. (Among known examples are sets of Float keys * holding consecutive whole numbers in small tables.) So we * apply a transform that spreads the impact of higher bits * downward. There is a tradeoff between speed, utility, and * quality of bit-spreading. Because many common sets of hashes * are already reasonably distributed (so don't benefit from * spreading), and because we use trees to handle large sets of * collisions in bins, we just XOR some shifted bits in the * cheapest possible way to reduce systematic lossage, as well as * to incorporate impact of the highest bits that would otherwise * never be used in index calculations because of table bounds. */ static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }

快速失败&安全失败？

/** * An optimized version of AbstractList.Itr */ private class Itr implements Iterator<E> { int cursor; // index of next element to return int lastRet = -1; // index of last element returned; -1 if no such int expectedModCount = modCount; public boolean hasNext() { return cursor != size; } @SuppressWarnings("unchecked") public E next() { checkForComodification(); int i = cursor; if (i >= size) throw new NoSuchElementException(); Object[] elementData = ArrayList.this.elementData; if (i >= elementData.length) throw new ConcurrentModificationException(); cursor = i + 1; return (E) elementData[lastRet = i]; } public void remove() { if (lastRet < 0) throw new IllegalStateException(); checkForComodification(); try { ArrayList.this.remove(lastRet); cursor = lastRet; lastRet = -1; expectedModCount = modCount; } catch (IndexOutOfBoundsException ex) { throw new ConcurrentModificationException(); } } @Override @SuppressWarnings("unchecked") public void forEachRemaining(Consumer<? super E> consumer) { Objects.requireNonNull(consumer); final int size = ArrayList.this.size; int i = cursor; if (i >= size) { return; } final Object[] elementData = ArrayList.this.elementData; if (i >= elementData.length) { throw new ConcurrentModificationException(); } while (i != size && modCount == expectedModCount) { consumer.accept((E) elementData[i++]); } // update once at end of iteration to reduce heap write traffic cursor = i; lastRet = i - 1; checkForComodification(); } final void checkForComodification() { if (modCount != expectedModCount) throw new ConcurrentModificationException(); } }

当通过remove移除HashMap中的一个元素时，会修改modCount值，其他修改HashMap集合的方法也会修改modCount值。该值在创建迭代器的时候，会赋值给expectedModCount，在迭代器工作的时候，会判定检查modCount值是否修改了。如果该值被修改了，则抛出ConcurrentModificationException异常。

//直接从hashtable增删数据就会报错。 //hashtable，hashmap等非并发集合，如果在迭代过程中增减了数据，会快速失败 (一检测到修改，马上抛异常) 修改为期望的值不会抛异常 //java.util.ConcurrentModificationException

Iterator的安全失败是基于对底层集合做拷贝，因此，它不受源集合上修改的影响。

采用安全失败机制的集合容器，在遍历时不是直接在集合内容上访问的，而是先复制原有集合内容，在拷贝的集合上进行遍历。

原理：由于迭代时是对原集合的拷贝进行遍历，所以在遍历过程中对原集合所作的修改并不能被迭代器检测到，所以不会触发Concurrent Modification Exception。

缺点：基于拷贝内容的优点是避免了Concurrent Modification Exception，但同样地，迭代器并不能访问到修改后的内容，即：迭代器遍历的是开始遍历那一刻拿到的集合拷贝，在遍历期间原集合发生的修改迭代器是不知道的。

场景：java.util.concurrent包下的容器都是安全失败，可以在多线程下并发使用，并发修改。

java.util包下面的所有的集合类都是快速失败的，而java.util.concurrent包下面的所有的类都是安全失败的。快速失败的迭代器会抛出ConcurrentModificationException异常，而安全失败的迭代器永远不会抛出这样的异常。

集合遍历是使用Iterator, Iterator是工作在一个独立的线程中，且有一个互斥锁。Iterator 被创建之后会建立一个指向原来对象的单链索引表，当原来的对象数量发生变化时，这个索引表的内容不会同步改变，所以当索引指针往后移动的时候就找不到要迭代的对象，所以按照 fail-fast原则 Iterator 会马上抛出java.util.ConcurrentModificationException 异常。 Iterator 在工作的时候是不允许被迭代的对象被改变的。

HashMap的线程不安全主要体现在下面两个方面：

1.在JDK1.7中，当并发执行扩容操作时会造成环形链和数据丢失的情况。 2.在JDK1.8中，在并发执行put操作时会发生数据覆盖的情况、死循环。

最新回复(0)