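This article may not be reprinted, redistributed, or used for any commercial purpose. All rights belong to Chromium.org, which also reserves the right of interpretation.
Original author: Ian Fette, Product Manager
Translator: [email protected]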
Understanding Phishing and Malware Protection in Google Chrome
Friday, November 14, 2008
Google Chrome includes features to help protect users against phishing and malware attacks. If you have ever hit a red page with the title "Warning: Visiting this site may harm your computer!" (such as our test page) or "Warning: Suspected phishing site!", then you have already seen these features in action. While we try to explain what is happening on that warning page, a number of people have asked for more information about how this feature works: where the data behind those warnings comes from, how that data gets to the computer, and what privacy implications the feature has.
Where does the phishing and malware data come from?
Google is constantly crawling and re-crawling the web, finding new and changed websites along the way. These websites are found by following links from other websites, crawling URLs submitted by webmasters and users, and so forth. Sometimes, during that process, we discover a website where something doesn't seem right. A website may look like a phishing website, designed to steal your personal information, or it may contain signs of potentially malicious activity that would install malware onto your computer without your consent. If we find a website that looks like a phishing page, it is added to a list of suspected phishing websites. If we find a website that contains signs of potentially malicious activity, we start up a virtual machine, browse to that website, and watch what happens. If we see certain activities happen on that virtual machine (such as viruses being downloaded and installed), we add that website to a list of suspected malware-infected websites. The process for discovering suspected malware-infected websites is described in more detail in a paper written by Niels Provos and colleagues from Google's anti-malware team.
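The triage step described above can be sketched as a small routing function. This is a hypothetical illustration, not Google's pipeline: the `looks_like_phishing` and `has_malware_signs` flags stand in for whatever classifiers flag a page during crawling, `run_in_vm` stands in for loading the page in a virtual machine and reporting what happened there, and the `MALICIOUS_EVENTS` set is an assumed list of observations.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set

# Hypothetical VM observations treated as evidence of a drive-by install.
MALICIOUS_EVENTS = {"unexpected_download", "unexpected_install", "new_process_spawned"}


@dataclass
class SuspectLists:
    phishing: Set[str] = field(default_factory=set)
    malware: Set[str] = field(default_factory=set)


def triage(url: str,
           looks_like_phishing: bool,
           has_malware_signs: bool,
           run_in_vm: Callable[[str], Iterable[str]],
           lists: SuspectLists) -> None:
    """Route one crawled URL into the suspect lists (illustrative only)."""
    if looks_like_phishing:
        # Pages that look like phishing go straight onto the phishing list.
        lists.phishing.add(url)
    if has_malware_signs:
        # Suspicious pages are visited in a VM; only pages whose visit
        # produces malicious behavior are added to the malware list.
        observed = set(run_in_vm(url))
        if observed & MALICIOUS_EVENTS:
            lists.malware.add(url)
```

The key design point from the text is that "signs of malicious activity" alone do not put a site on the malware list; the VM visit has to show something actually happening.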
How does this data get to my computer?
If you have phishing and malware protection enabled, then Google Chrome will contact servers at Google within five minutes of startup, and approximately every half hour thereafter, to download updated lists of suspected phishing and malware websites. These lists are then stored on your computer, so that as you browse the web, each page can be checked against the list of suspected phishing and malware websites locally, without sending the address of each webpage you visit to Google. This is designed to offer both performance (by not having to wait on a round-trip request to Google's servers) and privacy (by not sending a record of your browsing session to Google).
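The update timing can be sketched as a function that lists when the browser would fetch new lists. Only the two intervals come from the text; the `update_schedule` name and the random jitter used to model "approximately" are assumptions for illustration, since the post does not say how the interval varies.

```python
import random
from typing import List, Optional

# Timing from the post; everything else is an illustrative assumption.
FIRST_UPDATE_MAX_DELAY_S = 5 * 60  # first fetch "within five minutes of startup"
UPDATE_INTERVAL_S = 30 * 60        # then "approximately every half hour"


def update_schedule(num_updates: int,
                    jitter_s: float = 60.0,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Return times (seconds after startup) of the first num_updates fetches.

    The +/- jitter_s randomization is a hypothetical model of "approximately".
    """
    if num_updates <= 0:
        return []
    if rng is None:
        rng = random.Random()
    times = [rng.uniform(0, FIRST_UPDATE_MAX_DELAY_S)]
    while len(times) < num_updates:
        times.append(times[-1] + UPDATE_INTERVAL_S + rng.uniform(-jitter_s, jitter_s))
    return times
```

Between fetches, every lookup runs against the locally stored lists, so no per-page request is made.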
As the lists are large (hundreds of thousands of entries), we looked for ways to reduce the amount of information that has to be sent to and stored on users' computers, saving bandwidth and storage space. One way we achieve this is by using partial hashes of URLs in the lists downloaded by the computer. Rather than sending down the full URL of each website, we do the following. First, we hash the URL using SHA-256. Then, we add the first 32 bits of that 256-bit hash to the list of phishing or malware websites. Those lists of 32-bit hash prefixes are then downloaded by Google Chrome in the background, as described earlier.
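The prefix computation takes only a few lines with Python's `hashlib`. This minimal sketch assumes the URL is hashed as its UTF-8 string exactly as given; the post does not describe any URL normalization before hashing, so none is done here. The helper names are hypothetical.

```python
import hashlib
from typing import Iterable, Set

PREFIX_BYTES = 4  # 32 bits of the 256-bit SHA-256 digest


def hash_prefix(url: str) -> bytes:
    """SHA-256 the URL and keep only the first 32 bits."""
    return hashlib.sha256(url.encode("utf-8")).digest()[:PREFIX_BYTES]


def build_prefix_list(urls: Iterable[str]) -> Set[bytes]:
    """The set of 32-bit prefixes that would be shipped to the browser."""
    return {hash_prefix(u) for u in urls}
```

Each list entry shrinks to a fixed 4 bytes regardless of URL length, so for an assumed 300,000 entries the prefixes alone take about 1.2 MB.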
How is this data used, and what is sent back to Google?
When you browse the web using Google Chrome, the hash of each URL is computed, and the first 32 bits of that hash are compared against the list of suspected phishing and malware websites. This includes the URL of the website you are visiting, as well as the URLs of any included resources (such as included JavaScript or Adobe Flash movies). If the first 32 bits of the hash match an entry in the list, it is likely that the URL is on the list of suspected phishing or malware websites. At this point we can only say "likely", because there is still a reasonable chance of a hash collision in the 32-bit space: two distinct URLs with distinct 256-bit hashes whose first 32 bits are the same.
To confirm that the URL is suspected as a phishing or malware website, and not just a 32-bit hash collision, the 32-bit hash prefix is sent to Google. Google then returns the full 256-bit hashes of suspected phishing or malware websites that start with those 32 bits. The full 256-bit hash of the URL in question can then be compared against the 256-bit hash(es) returned by Google to determine whether the URL is in fact on the list of suspected phishing or malware websites.
Using this scheme, Google Chrome can quickly check the website and its resources against a local database, and sends information back to Google only when the site matches an entry on the locally stored lists. When information is sent to Google to verify such a suspicion, it consists only of part of the hash of a URL, not the URL itself. As such, Google never gets information that would definitively indicate whether a user has visited a particular website. The end result is an efficient, low-overhead mechanism that helps protect against phishing and malware while also helping to protect users' privacy.
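The two-stage lookup can be sketched end to end with an in-memory stand-in for Google's server. `FakeServer`, `Client`, and their methods are hypothetical names for illustration, not Chrome's or Google's actual interfaces. The sketch shows the property the text emphasizes: a request leaves the client only on a local prefix hit, and that request carries just the 4-byte prefix.

```python
import hashlib
from typing import Dict, Iterable, List, Set

PREFIX_LEN = 4  # 32 bits


def full_hash(url: str) -> bytes:
    return hashlib.sha256(url.encode("utf-8")).digest()


class FakeServer:
    """Hypothetical stand-in for Google: maps 32-bit prefixes to full hashes."""

    def __init__(self, listed_urls: Iterable[str]):
        self.by_prefix: Dict[bytes, List[bytes]] = {}
        for u in listed_urls:
            h = full_hash(u)
            self.by_prefix.setdefault(h[:PREFIX_LEN], []).append(h)
        self.requests: List[bytes] = []  # everything a client ever sent us

    def full_hashes_for(self, prefix: bytes) -> List[bytes]:
        self.requests.append(prefix)
        return self.by_prefix.get(prefix, [])


class Client:
    def __init__(self, local_prefixes: Set[bytes], server: FakeServer):
        self.local_prefixes = local_prefixes
        self.server = server

    def is_suspect(self, url: str) -> bool:
        h = full_hash(url)
        prefix = h[:PREFIX_LEN]
        # Fast path: no local prefix match, so nothing leaves the machine.
        if prefix not in self.local_prefixes:
            return False
        # Prefix hit: a real match or a 32-bit collision. Send only the
        # prefix, then compare the full 256-bit hash locally.
        return h in self.server.full_hashes_for(prefix)

    def check_page(self, page_url: str, resource_urls: Iterable[str]) -> bool:
        # The page URL and every included resource URL are checked.
        return any(self.is_suspect(u) for u in [page_url, *resource_urls])
```

For a sense of how often the confirmation round trip happens: a random URL matches one of n listed prefixes with probability roughly n / 2^32. For an assumed n = 500,000 that is about 1.2 × 10^-4, so the vast majority of lookups never contact the server.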
Posted by Ian Fette, Product Manager