What Does Hashing Mean?

Hashing is an important topic for programmers and computer science students to be familiar with. This article is aimed at students, and at programmers with a few months to a year of coding experience.

What Hashing Is

Hashing: generating a value (usually of a fixed size) from a piece of data, such as a string, using a mathematical function

Hashes are mostly used for three things:

  1. Storing stuff without actually knowing what it is
  2. As a convenient way to remember where you put something
  3. To make sure the thing you received is the thing you wanted

That’s super confusing but bear with me.

How It Works

Put another way, hashing is a non-reversible operation that turns a thing into a completely different thing, and that always produces the same result when you repeat it with the same input.
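
To make the "same input, same output" idea concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample strings are just illustrations, not anything from a particular library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("raw egg")
second = sha256_hex("raw egg")   # same input as `first`
other = sha256_hex("raw eggs")   # one extra letter

print(first == second)  # hashing the same input twice gives the same hash
print(first == other)   # a slightly different input gives a different hash
print(len(first))       # the digest length does not depend on the input
```

Nothing in `hashlib` offers a way to go from `first` back to "raw egg", which is the "can't un-boil an egg" part.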

It’s a bit like hard boiled eggs. You can’t un-boil an egg, but you know what you’ll get out if you put a raw egg in some boiling water for 6-8 minutes. In much the same way, you can’t un-hash something.

Here’s a sample of a very simple “hashing function” for integers: it divides a number by 10, then takes the remainder:

def modulo10(egg):
    # Divide by 10 and keep only the remainder (the last digit).
    return egg % 10

If egg=55 it will give me 5, but I have no way of turning 5 back into 55. For modulo10(), the numbers 9, 23950829, 309 and 29 will all come out as 9. An infinite number of values could have gone through that hashing function and returned the same thing.
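
Here is a small runnable sketch of that point in Python. It restates the `modulo10` function from above and feeds it the same numbers mentioned in the text; the variable names are only for illustration.

```python
def modulo10(egg: int) -> int:
    """Toy hash: keep only the remainder after dividing by 10."""
    return egg % 10


eggs = [9, 23950829, 309, 29]
hashes = [modulo10(egg) for egg in eggs]
print(hashes)  # [9, 9, 9, 9]

# Going backwards is a guessing game: many inputs share the hash 5.
candidates = [n for n in range(100) if modulo10(n) == modulo10(55)]
print(candidates)  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```

Even when only looking at numbers below 100, ten different inputs produce the hash 5, so the output alone cannot tell you which one went in.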

When two things have the same hash, it’s called a collision. In a cryptographic hashing function, it should be practically impossible to find two values that have the same hash.

There are two types of hashing functions, used for different things: fast ones and slow ones.

The fast ones are for when you don’t care if anyone knows that 5 came from a 25. They’re used in a few data structures where you need to look stuff up really fast. An example is a hash table, which is pretty neat (I wrote a whole article about them). Fast hashes are also used for verifying data integrity.
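
To show why a fast hash helps with lookups, here is a rough sketch of a tiny hash table in Python that uses the modulo-10 idea to pick a bucket. The class `TinyHashTable` and its methods are made up for this example; real hash tables (such as Python's `dict`) are far more sophisticated.

```python
class TinyHashTable:
    """Toy hash table for integer keys, using `key % bucket_count` as the hash."""

    def __init__(self, bucket_count: int = 10) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key: int) -> list:
        # The hash tells us which bucket to look in, so we never scan them all.
        return self.buckets[key % len(self.buckets)]

    def put(self, key: int, value: str) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: int) -> str:
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = TinyHashTable()
table.put(55, "hard boiled")
table.put(25, "scrambled")  # collides with 55: both land in bucket 5
print(table.get(55))
print(table.get(25))
```

Colliding keys simply share a bucket here, which is one common way (often called chaining) of coping with collisions.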

Data Integrity

Let’s say I torrent a piece of software, the ISO for a Linux distro for example. I might be unsure if what I got is what I meant to download. I could have missed a piece in transit, it could be an older version, or someone may have tampered with it. Lucky for me, I can go to a trusted source, such as the distro’s official site, and find a checksum to compare my file against. A checksum is the value the developers got when they hashed the file they released. Since I can hash the file I got in exactly the same way and compare the two values, I can verify that I have the correct file.
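
The check described above could look something like this in Python. This is a sketch that assumes the developers published a SHA-256 checksum as a hex string; the function names `file_sha256` and `matches_checksum` are hypothetical, and the file path is whatever you downloaded.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, published_hex: str) -> bool:
    """Compare our hash of the file with the checksum the developers published."""
    return file_sha256(path) == published_hex.strip().lower()
```

You would call `matches_checksum` with the path to your download and the checksum copied from the release page; a result of `False` means the file is not the one that was released.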

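You can also use slow hashes for data integrity, but it’s not a huge deal if you use something fast like MD5 when all you want is to catch accidental corruption. If you are worried about deliberate tampering, pick a hash with no known collision attacks, such as SHA-256.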

Passwords

Slow hashes are for when you need to keep whatever you hashed a secret. Because they’re slow and take a lot of computing power, they’re harder to ‘crack’, that is, to figure out what the original value was. Slow hashes are perfect for passwords. This is why we talk about ‘cracking passwords’.

On some sites, when you enter a password, the site matches what you typed in with what it has on the server. However, it doesn’t actually know what your password is. When you sign up, the site generates a bit of random data (a salt), tacks it on to the password you chose, and puts it through a hashing function. It then stores the result of that hash and the salt it used.

When you want to use your password to log in again, it grabs the salt (which is usually kept in the same place as the password hash), does the same process again, and then compares the two results.
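
The sign-up and log-in steps above can be sketched in Python with the standard library's PBKDF2 implementation. The function names, the 16-byte salt, and the iteration count are assumptions chosen for illustration; a real site would tune the work factor and usually rely on a vetted password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher means slower and harder to guess


def hash_password(password: str) -> tuple:
    """Sign-up: make a random salt and hash the password together with it."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest  # both are stored; the password itself is not


def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Log-in: repeat the same process with the stored salt and compare results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```

`hmac.compare_digest` is used for the comparison because it takes the same time whether or not the values match, which avoids leaking information through timing.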

How to Crack Passwords

Remember, since it’s impossible to know for 100% sure what the original value of a hash was, we have to use our best guess. Most of the time, this involves using a list of common passwords and trying each of them against each hash. To do that you have to compute each one, so the slower the hash, the more expensive it will be for a hacker to guess passwords.
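
As a rough illustration of that guessing process, here is a Python sketch of a dictionary attack against an unsalted MD5 hash. The word list and the "leaked" hash are invented for the example, and the function names are hypothetical.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hash a string with MD5 (fast, which is exactly the problem)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, wordlist: list):
    """Hash each candidate and return the one that matches, or None."""
    for candidate in wordlist:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None


common_passwords = ["123456", "password", "qwerty", "pa$$word"]
leaked_hash = md5_hex("qwerty")  # stand-in for a hash found in a breach

print(guess_password(leaked_hash, common_passwords))
```

Every guess costs one hash computation, so swapping `md5_hex` for a deliberately slow function multiplies the attacker's bill by the same factor.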

A salt is important too.

Let’s say User1 and User2 both used pa$$word as their passwords. The MD5 hash for pa$$word is A61A78E492EE60C63ED8F2BB3A6A0072. Hackers already know what the hashes for all the top passwords are. In fact, you can even look up MD5 hashes on sites like crackstation.net. Additionally, even if a password is less common, without a salt they can guess it once and then compromise the accounts of everyone else who used that password.

If I add a salt, then the hashes will be different. For example, using usernames as a salt (just an example, not a good idea in practice):

user1.pa$$word = 8CF41DEBA430F88EBC5DDA0936B3435B
user2.pa$$word = 5161758DEEF000FA5C190573574FAFB9 # <-- completely different hash

See? Completely different hashes. If we had used a slow hash instead of MD5, those user accounts would be as safe as they can be (which is not very, because ‘pa$$word’ is a terrible password).
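
You can try the effect of a salt yourself with a few lines of Python. This sketch assumes the salt is simply joined to the password with a dot, which may not be exactly how the hashes shown above were produced, so it only compares the results with each other instead of printing specific values.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


password = "pa$$word"

# Without a salt, two users with the same password get the same hash.
unsalted_user1 = md5_hex(password)
unsalted_user2 = md5_hex(password)
print(unsalted_user1 == unsalted_user2)

# With a per-user salt (here the username, for illustration only),
# the same password produces unrelated hashes.
salted_user1 = md5_hex("user1." + password)
salted_user2 = md5_hex("user2." + password)
print(salted_user1 == salted_user2)
```

With salts in place, cracking one user's hash no longer reveals who else picked the same password.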

Goodbye MD5

I used a pretty bad example of a slow hash. MD5 was designed as a general-purpose cryptographic hash, and for a long time it was considered good enough to use on passwords, up until around 2005. Now it is considered broken and unsafe to use: practical collision attacks exist, and for passwords the bigger problem is that it’s too fast. Computers have gotten more powerful, so we need slower, stronger hashing. Some better alternatives nowadays are bcrypt and PBKDF2.

When Technology Moves Too Fast

Unfortunately, MD5 is still widely used. If you look at HaveIBeenPwned.com and search for ‘MD5’, lots of results come up from sites that were hacked long after 2005. Why haven’t companies moved away from this highly insecure method?

Part of the problem is that overhauling software, much like cracking secure passwords, can be time-consuming and expensive. The other problem is the nature of hashing itself.

If you don’t actually know what anyone’s password is, you can’t just change the hashing method. Since you can’t turn a hash back into a password, you definitely can’t turn a hash into a different hash that works for the same password.

The best method to deal with this is to send out an email and force everyone to change their passwords. Users really hate this, so many companies have opted to re-hash passwords the next time the user logs in, since that is the one moment the site actually sees the password the user typed. They still support the old method until every password has been replaced. That’s why you’ll see MD5 on some sites that also use another method.
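
Here is a rough Python sketch of that "re-hash on next login" approach. The record layout (a dict with `scheme`, `salt` and `hash` keys), the function names, and the iteration count are all assumptions made up for this example, not how any particular site does it.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for the new scheme


def legacy_md5(password: str) -> str:
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def pbkdf2(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def login(record: dict, password: str) -> bool:
    """Check the password and upgrade old MD5 records once it is known to be right."""
    if record["scheme"] == "md5":
        ok = hmac.compare_digest(legacy_md5(password), record["hash"])
        if ok:
            # We briefly hold the real password, so we can re-hash it properly.
            salt = os.urandom(16)
            record.update(scheme="pbkdf2", salt=salt, hash=pbkdf2(password, salt))
        return ok
    return hmac.compare_digest(pbkdf2(password, record["salt"]), record["hash"])


user = {"scheme": "md5", "hash": legacy_md5("hunter2")}
print(login(user, "hunter2"), user["scheme"])
```

Accounts that never log in again keep their old MD5 hash, which is why the old method has to stay supported for so long.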

What Hashing Isn’t

Encoding and encryption are two things that may be confused with hashing. They all have one thing in common: they turn data into other data that looks different to a human.

Encrypting

Encryption is different from hashing because it allows you to turn encrypted data back into what it was originally: to decrypt it. To do this you need a special key.
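
To see the difference in code, here is a toy Python example of encrypting and decrypting with a key. It uses a one-time pad (XOR with a random key as long as the message) purely because it fits in a few lines of standard-library code; it is an illustration of "reversible with the key", not a recommendation for real-world encryption.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with the matching byte of `key`."""
    if len(key) != len(data):
        raise ValueError("this toy scheme needs a key as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"meet me at noon"
key = secrets.token_bytes(len(message))  # the special key

ciphertext = xor_bytes(message, key)     # encrypt
decrypted = xor_bytes(ciphertext, key)   # decrypt with the same key

print(decrypted == message)
```

Unlike a hash, the original message comes back intact, but only for someone who holds the key.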

Sometimes you might hear bloggers or tech writers say “passwords are encrypted”. This is not technically the case. Passwords should always be ‘hashed’, with one exception: they should be encrypted while they are in transit between your keyboard and the program that hashes them.

Encoding

Students and novice programmers often confuse encoding with hashing or encrypting. This is not good because encoding, like encryption, allows you to turn encoded data back into its original form, except you don’t need a key to do it at all. Anyone can decode encoded data provided they know which encoding was used. Encoding data does not protect it from prying eyes.
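
A quick Python sketch with Base64 shows how little protection encoding gives. The sample string is made up; the point is that decoding needs no key at all.

```python
import base64

secret = "my password is hunter2"

# Encoding makes the text look unfamiliar...
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
print(encoded)

# ...but anyone can turn it straight back, no key required.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```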

An example is JWTs: JSON Web Tokens.

An example JWT looks like the following: not legible to a human unless you can convert Base64 in your head (I doubt anyone could do that for a string this long).

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

JWTs are pretty cool! However, students and newbies often look at them and think the data is secret because they can’t read it. In reality, JWTs are Base64Url encoded, not hashed or encrypted. This means anyone can read the first and second parts of them (in fact there’s a handy tool for it, try it out). The signature at the end is proof that it really came from where it claims to have come from. You can encrypt JWTs if you want, but they are readable by default.
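
If you want to see this for yourself without an online tool, here is a small Python sketch that decodes the first two parts of the example token above using only the standard library. The helper name `decode_segment` is made up; the only trick is restoring the `=` padding that JWTs strip from their Base64Url segments. Note that this only reads the token and does not check the signature.

```python
import base64
import json


def decode_segment(segment: str) -> dict:
    """Decode one Base64Url-encoded JSON segment of a JWT."""
    padded = segment + "=" * (-len(segment) % 4)  # put back the stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)

header_b64, payload_b64, signature_b64 = token.split(".")

print(decode_segment(header_b64))   # {'alg': 'HS256', 'typ': 'JWT'}
print(decode_segment(payload_b64))  # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
```

No key or secret was needed to read the header and payload, which is exactly why nothing sensitive belongs in them.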

Does this mean JWTs are insecure? No! This is by design. Just don’t put anything you don’t want the end user or a hacker to see in one.

Summary

Hashing is pretty cool. You can use it to:

  1. make hash tables that can store data in a way that makes it fast to retrieve
  2. store passwords in a way that keeps them super secret
  3. verify the integrity of data in case it was corrupted in transit or tampered with
  4. do a whole bunch of other stuff I didn’t cover

Hashing is not the same as encoding or encrypting and it’s important to understand the difference between these.

Translated from: https://medium.com/@jasminedevv/hashing-whats-it-for-fb0340c3330c
