java hashcode

tech2023-02-19  111

java hashcode

At SitePoint we’re always looking to expand the range of topics we cover. Lately, we’ve set our sights on exploring the world of Java. If you’re a strong Java developer who wants to contribute to our coverage, get in touch with a few ideas for articles you’d like to write.

在SitePoint,我们一直在寻求扩大我们涵盖的主题范围。 最近,我们将目光投向了探索Java的世界。 如果您是一名强大的Java开发人员,想为我们的报道做出贡献,请与您想写的文章的一些想法保持联系。

So you’ve decided that identity isn’t enough for you and wrote a nice equals implementation? Great! But now you have to implement hashCode as well.

因此,您已经确定身份对您来说还不够,并编写了一个不错的equals实现 ? 大! 但是现在您还必须实现hashCode 。

Let’s see why and how to do it correctly.

让我们看看为什么以及如何正确进行。

平等和哈希码 (Equality and Hash Code)

While equality makes sense from a general perspective, hash codes are much more technical. If we were being a little hard on them, we could say that they are just an implementation detail to improve performance.

虽然从一般角度讲平等是有意义的,但哈希码更具技术性。 如果我们对它们稍加苛刻,可以说它们只是为了提高性能的实现细节。

Most data structures use equals to check whether they contain an element. For example:

大多数数据结构使用equals来检查它们是否包含元素。 例如:

List<String> list = Arrays.asList("a", "b", "c"); boolean contains = list.contains("b");

The variable contains is true because, while instances of "b" are not identical (again, ignoring String interning), they are equal.

变量contains为true是因为,尽管"b"实例不相同(再次忽略String interning ),但它们是相等的。

Comparing every element with the instance given to contains is wasteful, though, and a whole class of data structures uses a more performant approach. Instead of comparing the requested instance with each element they contain, they use a shortcut that reduces the number of potentially equal instances and then only compare those.

但是,将每个元素与指定给contains的实例进行比较是浪费的,并且整个数据结构类都使用性能更高的方法。 他们没有使用请求的实例与它们包含的每个元素进行比较,而是使用了一种捷径来减少可能相等的实例的数量,然后仅对它们进行比较。

This shortcut is the hash code, which can be seen as an object’s equality boiled down to an integer value. Instances with the same hash code are not necessarily equal but equal instances have the same hash code. (Or should have, we will discuss this shortly.) Such data structures are often named after this technique, recognizable by the Hash in their name, with HashMap the most notable representative.

此快捷方式是哈希码,可以看作是将对象的等式简化为整数值。 具有相同哈希码的实例不一定相等,但是相等的实例具有相同的哈希码。 (或者应该简短讨论,我们将对此进行简短讨论。)此类数据结构通常是根据这项技术命名的,其名称由Hash识别,其中HashMap是最著名的代表。

This is how they generally work:

它们通常是这样工作的:

When an element is added, its hash code is used to compute the index in an internal array (called a bucket).

添加元素时,其哈希码用于计算内部数组(称为存储桶)中的索引。 If other, non-equal elements have the same hash code, they end up in the same bucket and must be bundled together, e.g. by adding them to a list.

如果其他不相等的元素具有相同的哈希码,则它们将以相同的桶结尾,并且必须捆绑在一起,例如,通过将它们添加到列表中。

When an instance is given to contains, its hash code is used to compute the bucket. Only elements therein are compared to the instance.

将实例赋予contains ,其哈希码用于计算存储桶。 仅将其中的元素与实例进行比较。

This way, very few, ideally no equals comparisons are required to implement contains.

这样,只有极少数(理想情况下)不需要equals比较来实现contains 。

As equals, hashCode is defined on Object.

作为equals , hashCode其上定义Object 。

关于散列的思考 (Thoughts on Hashing)

If hashCode is used as a shortcut to determine equality, then there is really only one thing we should care about: Equal objects should have the same hash code.

如果将hashCode用作确定相等性的捷径,那么实际上我们只需要关注一件事:相等的对象应该具有相同的哈希码。

This is also why, if we override equals, we must create a matching hashCode implementation! Otherwise things that are equal according to our implementation would likely not have the same hash code because they use Object‘s implementation.

这也是为什么,如果我们重写equals ,我们必须创建一个匹配的hashCode实现! 否则,根据我们的实现相等的事物可能不会具有相同的哈希码,因为它们使用Object的实现。

hashCode合同 (The hashCode Contract)

Quoting the source:

引用来源 :

The general contract of hashCode is:

hashCode的一般约定为:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

只要在Java应用程序执行期间在同一对象上多次调用它, hashCode方法就必须始终返回相同的整数,前提是不修改该对象的equals比较中使用的信息。 从一个应用程序的执行到同一应用程序的另一执行,此整数不必保持一致。

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

如果根据equals(Object)方法两个对象相等,则在两个对象中的每个对象上调用hashCode方法必须产生相同的整数结果。

It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

根据equals(Object)方法,如果两个对象不相等,则不需要在两个对象中的每一个上调用hashCode方法都必须产生不同的整数结果。 但是,程序员应该意识到,为不相等的对象生成不同的整数结果可能会提高哈希表的性能。

The first bullet mirrors the consistency property of equals and the second is the requirement we came up with above. The third states an important detail that we discuss will in a moment.

第一个项目符号反映了equals的一致性属性,第二个项目符号是我们上面提出的要求。 第三部分指出了我们稍后将讨论的重要细节。

实现hashCode (Implementing hashCode)

A very easy implementation of Person.hashCode is the following:

Person.hashCode一个非常简单的实现如下:

@Override public int hashCode() { return Objects.hash(firstName, lastName); }

The person’s hash code is computed by computing the hash codes for the relevant fields and combining them. Both is left to Objects‘ utility function hash.

该人员的哈希码是通过计算相关字段的哈希码并将其组合来计算的。 两者都留给Objects的效用函数hash 。

选择字段 (Selecting Fields)

But which fields are relevant? The requirements help answer this: If equal objects must have the same hash code, then hash code computation should not include any field that is not used for equality checks. (Otherwise two objects that only differ in those fields would be equal but have different hash codes.)

但是哪些领域是相关的? 需求有助于回答这一问题:如果相等的对象必须具有相同的哈希码,则哈希码计算不应包括未用于相等性检查的任何字段。 (否则,两个仅在那些字段中不同的对象将相等,但具有不同的哈希码。)

So the set of fields used for hashing should be a subset of the fields used for equality. By default both will use the same fields but there are a couple of details to consider.

因此,用于哈希的字段集应该是用于相等的字段的子集。 默认情况下,两者都将使用相同的字段,但是有几个细节需要考虑。

一致性 (Consistency)

For one, there is the consistency requirement. It should be interpreted rather strictly. While it allows the hash code to change if some fields change (which is often unavoidable with mutable classes), hashing data structures are not prepared for this scenario.

首先,有一致性要求。 应该相当严格地解释它。 虽然它允许在某些字段发生更改时更改哈希代码(这对于可变类通常是不可避免的),但没有为这种情况准备哈希数据结构。

As we have seen above the hash code is used to determine an element’s bucket. But if the hash-relevant fields change, the hash is not recomputed and the internal array is not updated.

正如我们在上面看到的,哈希码用于确定元素的存储桶。 但是,如果与哈希相关的字段发生更改,则不会重新计算哈希,并且不会更新内部数组。

This means that a later query with an equal object or even with the very same instance fails! The data structure computes the current hash code, different from the one used to store the instance, and goes looking in the wrong bucket.

这意味着以后使用相等对象甚至具有相同实例的查询将失败! 数据结构会计算当前的哈希码(不同于用于存储实例的哈希码),并在错误的存储桶中查找。

Conclusion: Better not use mutable fields for hash code computation!

结论:最好不要将可变字段用于哈希代码计算!

性能 (Performance)

Hash codes might end up being computed about as often as equals is called. This can very well happen in performance critical parts of the code so it makes sense to think about performance. And unlike equals there is a little more wiggle room to optimize it.

哈希代码可能最终会被调用的次数大约equals 。 这很可能在代码的性能关键部分发生,因此考虑性能是有意义的。 而且与equals不同的是,还有更多的回旋余地来优化它。

Unless sophisticated algorithms are used or many, many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But it should be considered whether all fields need to be included in the computation! Particularly collections should be viewed with suspicion. Lists and sets, for example, will compute the hash for each of their elements. Whether calling them is necessary should be considered on a case-by-case basis.

除非使用复杂的算法或涉及许多领域,否则将它们的哈希码组合在一起的算术成本是可忽略的,因为它是不可避免的。 但是应该考虑是否所有字段都需要包含在计算中! 特别是应该怀疑收藏品。 例如,列表和集合将为每个元素计算哈希值。 是否应根据需要考虑调用它们。

If performance is critical, using Objects.hash might not be the best choice either because it requires the creation of an array for its varargs.

如果性能至关重要,则使用Objects.hash也不是最佳选择,因为它需要为其varargs创建一个数组。

But the general rule about optimization holds: Don’t do it prematurely! Use a common hash code algorithm, maybe forego including the collections, and only optimize after profiling showed potential for improvement.

但是关于优化的一般规则仍然成立:不要过早地这样做! 使用通用的哈希码算法,可能会放弃包括集合的算法,并且只有在剖析显示出改进的潜力后才能进行优化。

碰撞 (Collisions)

Going all-in on performance, what about this implementation?

全力以赴,如何实现呢?

@Override public int hashCode() { return 0; }

It’s fast, that’s for sure. And equal objects will have the same hash code so we’re good on that, too. As a bonus, no mutable fields are involved!

很快,这是肯定的。 相等的对象将具有相同的哈希码,因此我们也很擅长。 另外,不涉及任何可变字段!

But remember what we said about buckets? This way all instances will end up in the same! This will typically result in a linked list holding all the elements, which is terrible for performance. Each contains, for example, triggers a linear scan of the list.

但是还记得我们所说的水桶吗? 这样,所有实例都将以相同的方式结束! 通常,这将导致包含所有元素的链表对性能造成严重影响。 每个contains ,例如,触发列表的线性扫描。

So what we want is as few items in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start.

因此,我们想要的是在同一存储桶中尽可能少的物品! 即使对于非常相似的对象,也可以返回千差万别的哈希码的算法是一个不错的开始。

How to get there partly depends on the selected fields. The more details we include in the computation, the more likely it is for the hash codes to differ. Note how this is completely opposite to our thoughts about performance. So, interestingly enough, using too many or too few fields can result in bad performance.

如何到达那里部分取决于所选字段。 我们在计算中包含的详细信息越多,散列码发生差异的可能性就越大。 请注意,这与我们对性能的想法完全相反。 因此,有趣的是,使用太多或太少的字段可能会导致性能下降。

The other part to preventing collisions is the algorithm that is used to actually compute the hash.

防止冲突的另一部分是用于实际计算哈希的算法。

计算哈希 (Computing The Hash)

The easiest way to compute a field’s hash code is to just call `hashCode` on it. Combining them could be done manually. A common algorithm is to start with some arbitrary number and to repeatedly multiply it with another (often a small prime) before adding a field’s hash:

计算字段哈希码的最简单方法是在其上调用`hashCode`。 可以手动组合它们。 一种常见的算法是从某个任意数字开始,然后将其与另一个(通常是一个小的素数)重复相乘,然后再添加字段的哈希值:

int prime = 31; int result = 1; result = prime * result + ((firstName == null) ? 0 : firstName.hashCode()); result = prime * result + ((lastName == null) ? 0 : lastName.hashCode()); return result;

This might result in overflows, which is not particularly problematic because they cause no exceptions in Java.

这可能会导致溢出,这并不是特别有问题,因为它们在Java中不会引起异常。

Note that even great hashing algorithms might result in uncharacteristically frequent collisions if the input data has specific patterns. As a simple example assume we would compute the hash of points by adding their x and y-coordinates. May not sound too bad until we realize that we often deal with points on the line f(x) = -x, which means x + y == 0 for all of them. Collisions, galore!

请注意,如果输入数据具有特定的模式,那么即使是出色的哈希算法也可能导致异常频繁的冲突。 作为一个简单的示例,假设我们将通过添加点的x和y坐标来计算点的哈希值。 直到我们意识到我们经常处理f(x) = -x线上的点时,听起来可能不太糟,这意味着所有点都x + y == 0 。 碰撞,丰盛!

But again: Use a common algorithm and don’t worry until profiling shows that something isn’t right.

再说一遍:使用通用算法,在剖析表明不正确之前不要担心。

摘要 (Summary)

We have seen that computing hash codes is something like compressing equality to an integer value: Equal objects must have the same hash code and for performance reasons it is best if as few non-equal objects as possible share the same hash.

我们已经看到,计算哈希码就像将等式压缩为整数值:相等的对象必须具有相同的哈希码,并且出于性能方面的考虑,最好是尽可能少的不相等的对象共享同一哈希。

This means that hashCode must always be overridden if equals is.

这意味着,如果equals则必须始终覆盖hashCode 。

When implementing hashCode:

在实现hashCode :

Use a the same fields that are used in equals (or a subset thereof).

使用与equals (或其子集)相同的字段。

Better not include mutable fields.

最好不要包含可变字段。

Consider not calling hashCode on collections.

考虑不对集合调用hashCode 。

Use a common algorithm unless patterns in input data counteract them.

除非输入数据中的模式抵消了它们,否则请使用通用算法。

Remember that hashCode is about performance, so don’t waste too much energy unless profiling indicates necessity.

请记住, hashCode与性能有关,因此,除非分析表明必要,否则不要浪费太多精力。

翻译自: https://www.sitepoint.com/how-to-implement-javas-hashcode-correctly/

java hashcode

相关资源:jdk-8u281-windows-x64.exe
最新回复(0)