Python hash() function

Python hash() is one of the built-in functions. Today we will look at how to use the hash() function and how we can override it for our custom objects.

What is a hash?

In the simplest terms, a hash is a fixed-size integer that identifies a particular value. Please note that this is a deliberately simplified explanation.

Let us point out what this implies:

  • The same data always has the same hash value.
  • Even a slight change in the original data can result in a completely different hash value.
  • A hash is obtained from a hash function, whose job is to convert the given information into an encoded hash.
  • Clearly, there can be many more objects than hash values, so two different objects may hash to the same value. This is called a hash collision. It means that two objects with the same hash code do not necessarily have the same value.
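
The last point can be seen with plain integers. The sketch below relies on a CPython implementation detail (hash(-1) is -2, the same as hash(-2)), so treat it as an illustration on that interpreter only, not a guarantee of the language:

```python
# CPython detail (an assumption about the interpreter in use):
# hash(-1) and hash(-2) are both -2, so these two values collide.
a, b = -1, -2

same_hash = hash(a) == hash(b)   # the hashes collide on CPython
same_value = a == b              # the values are still different

# A dict keeps both keys apart, because after the hashes match
# it still compares the keys themselves for equality.
table = {a: "minus one", b: "minus two"}

print(same_hash, same_value, len(table))
```

Equal hashes only suggest that two values might be equal; the equality check makes the final decision.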

What is the Python hash function?

We could go into great detail about hashing, but one important point about what makes a GOOD hash function is worth mentioning here:

A good hash function is one that results in the fewest collisions: ideally, no two different pieces of information should have the same hash value.

Apart from the above definition, the hash value of an object should be cheap to calculate in terms of time and memory.

Hash codes are most commonly used when dictionary keys are compared. When a dictionary lookup is done for a specific key, the hash code of the key is compared first, and the keys themselves are compared for equality only when the hashes match. Comparing hashes is much faster than comparing complete key values, because a hash is a single fixed-size integer while a key can be an arbitrarily large object.
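
To make the lookup idea concrete, here is a minimal sketch of a hash table. ToyTable and its methods are hypothetical names made up for this illustration; Python's real dict is implemented differently and far more efficiently, so read this as a model of the idea only:

```python
class ToyTable:
    """A deliberately tiny hash table: the hash picks a bucket, == confirms the key."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # The hash narrows the search down to one small bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:          # same key: overwrite the value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        # Only the keys that landed in the same bucket are compared.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 4)                        # replaces the earlier value
print(table.get("apple"), table.get("pear"))
```

A lookup never scans the whole table: it computes one hash, jumps to one bucket, and compares only the few keys stored there.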

Also, note that if two numeric values compare as equal, they will have the same hash as well, even if they belong to different data types, like 1 and 1.0.
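
A quick check of this rule is sketched below. Besides 1 and 1.0 from the text, it also tries True, Fraction and Decimal, which are additions for illustration:

```python
from decimal import Decimal
from fractions import Fraction

# Five different types, all representing the number one.
values = [1, 1.0, True, Fraction(1, 1), Decimal("1")]

# Collect the distinct hash values; equal numbers must share one hash.
hashes = {hash(v) for v in values}
print(len(hashes))

# Because they are equal and hash alike, they address the same dict entry.
lookup = {1: "int key"}
print(lookup[1.0], lookup[True])
```

This is why a dictionary cannot hold 1 and 1.0 as two separate keys.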

Python hash() String

Let us start constructing simple examples and scenarios in which the hash() function can be very helpful. In this example, we will simply get the hash value of a String.

name = "Shubham"

hash1 = hash(name)
hash2 = hash(name)

print("Hash 1: %s" % hash1)
print("Hash 2: %s" % hash2)

When we run this script, both lines print the same integer, because hashing the same string twice within one run always gives the same value.

Here is an important catch. If you run the same script again, the printed hash changes. Since Python 3.3, the hashes of str and bytes objects are salted with a random value chosen when the interpreter starts, unless the PYTHONHASHSEED environment variable is set.

So, the hash of a string is only stable for the lifetime of one running program, and it can change as soon as the program has ended and is started again.
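
The sketch below makes this visible by hashing the same string in fresh interpreter processes. The helper name hash_in_new_process is made up for this example; PYTHONHASHSEED is the real environment variable that controls the salt:

```python
import os
import subprocess
import sys

SNIPPET = "print(hash('Shubham'))"


def hash_in_new_process(seed=None):
    """Run SNIPPET in a fresh interpreter and return the hash it prints."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)          # start from the default behaviour
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)    # pin the salt for this process
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Default: each process picks its own salt, so the values normally differ.
print("random salt:", hash_in_new_process(), hash_in_new_process())

# Fixed seed: the salt is pinned, so both processes agree.
print("fixed seed: ", hash_in_new_process(42), hash_in_new_process(42))
```

For a value that must stay the same across runs or machines, hash() is therefore not the right tool.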

Python hash with slight change in data

Here, we will see how a slight change in data can change a hash value. Will it change completely or just a little? A better way is to find out through a script!

name1 = "Shubham"
name2 = "Shubham!"

hash1 = hash(name1)
hash2 = hash(name2)

print("Hash 1: %s" % hash1)
print("Hash 2: %s" % hash2)

When we run this script, the two printed hashes are entirely different integers.

See how the hash changed completely when only one character changed in the original data? This makes a hash value very hard to predict from the input!

How to define the hash() function for custom objects?

Internally, the hash() function works by calling the object’s __hash__() method. It is worth noticing that not every object is hashable (built-in mutable collections such as list, dict and set aren’t). We can also define this method for our custom class. Actually, that’s what we will do now. Before that, let’s point out some important points:

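  • A hash should not be implemented for mutable collections, because the keys of a hash-based collection must not change while they are stored in it.
  • We don’t have to define a custom __eq__() implementation, as a default one (which compares object identity) is defined for all objects. If we do define __eq__(), we must define __hash__() as well: Python 3 makes a class that overrides only __eq__() unhashable, and objects that compare equal must have the same hash.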
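
The interaction between __eq__() and __hash__() can be checked directly. Plain and EqOnly are hypothetical classes written just for this sketch:

```python
class Plain:
    """Defines neither __eq__ nor __hash__: inherits both from object."""


class EqOnly:
    """Defines __eq__ only, so Python 3 sets its __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


plain_hash = hash(Plain())        # works: the default hash is identity-based

try:
    hash(EqOnly(1))
    eq_only_hashable = True
except TypeError:                 # raised because EqOnly.__hash__ is None
    eq_only_hashable = False

print(isinstance(plain_hash, int), eq_only_hashable, EqOnly.__hash__)
```

So overriding __eq__() alone is enough to make instances unusable as dictionary keys until __hash__() is defined too.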

Now, let us define an object and override the __hash__() function:

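class Student:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        # Only compare with other Student objects; let Python handle the rest
        if not isinstance(other, Student):
            return NotImplemented
        return self.age == other.age and self.name == other.name

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal students hash alike
        return hash((self.age, self.name))

student = Student(23, 'Shubham')
print("The hash is: %d" % hash(student))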

Running this script prints a single integer, which is the hash of the tuple (23, 'Shubham').

This program shows how we can override both the __eq__() and the __hash__() functions. This way, we can define our own logic for comparing objects and for using them as dictionary keys or set members.
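
Here is a short sketch of what that buys us. It repeats a Student class like the one above so that it runs on its own; the variable names first, twin and other are only for illustration:

```python
class Student:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Student):
            return NotImplemented
        return (self.age, self.name) == (other.age, other.name)

    def __hash__(self):
        return hash((self.age, self.name))


first = Student(23, "Shubham")
twin = Student(23, "Shubham")     # a separate object with the same data
other = Student(24, "Shubham")

unique = {first, twin, other}     # first and twin collapse into one member
grades = {first: "A"}             # twin can be used to look up first's entry

print(first is twin, first == twin, len(unique), grades[twin])
```

Two distinct objects with the same age and name now behave as one key, which would not happen with the default identity-based hash.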

Why can’t mutable objects be hashed?

As we have seen, Python’s built-in mutable containers cannot be hashed. This restriction of not allowing a mutable object to be hashed simplifies the hash table a lot. Let’s understand how.

If a mutable object were allowed to be hashed by its contents, we would need to update the hash table every time the value of the object changed, because a new value means a new hash. This means that we would have to move the object to a completely different bucket. This is a very costly operation.
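
Python does not perform that move for us, so a key whose hash changes after insertion simply gets lost. The sketch below uses a hypothetical MutableKey class to show the effect; the exact lookup behaviour described in the comments is what CPython does:

```python
class MutableKey:
    """A badly behaved key: its hash depends on a field that can change."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = MutableKey(1)
table = {key: "stored"}
found_before = key in table                    # found: the hash matches the stored slot

key.value = 2                                  # the hash changes, the entry does not move
found_after = key in table                     # the dict now looks in the wrong place

still_listed = any(k is key for k in table)    # yet the entry is still in the dict
print(found_before, found_after, still_listed)
```

The entry is still stored, but it can no longer be reached by lookup, which is exactly the kind of bug the restriction on mutable keys prevents.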

In Python, two built-in types use hash tables, dictionaries and sets:

  • A dictionary is a hash table, also known as an associative array. In a dictionary, only the keys are hashed, not the values. This is why a dictionary key must be hashable (in practice, immutable), while values can be anything, even a list, which is mutable.
  • A set contains unique objects, which must be hashable. If we have non-hashable items, we cannot use a set and must use a list instead.
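
Both points are easy to try out. In the sketch below, try_hash is a small helper written for this example, and converting lists to tuples is one common workaround, not the only one:

```python
# A list is fine as a dictionary value, and can keep changing.
scores = {"alice": [90, 85]}
scores["alice"].append(70)


def try_hash(obj):
    """Return True if obj is hashable, False if hash() raises TypeError."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


results = {
    "tuple": try_hash((1, 2)),
    "frozenset": try_hash(frozenset({1, 2})),
    "list": try_hash([1, 2]),
    "dict": try_hash({"a": 1}),
    "set": try_hash({1, 2}),
    "tuple holding a list": try_hash((1, [2])),
}
print(results)

# Lists cannot go into a set, but their tuple equivalents can.
rows = [[1, 2], [1, 2], [3, 4]]
unique_rows = {tuple(row) for row in rows}
print(len(unique_rows))
```

Note that a tuple is only hashable when everything inside it is hashable too.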

That’s all for a quick roundup of the Python hash() function.

Reference: API Doc

Original article: https://www.journaldev.com/17357/python-hash-function
