Searching Google Using the Google Web Service
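By Scott Mitchell
Introduction
Did you know that Google provides a Web service that lets you search its database, retrieve cached Web pages, and perform spelling checks? Using Google's Web service, you can easily offer Google-style search on your own site. Over the next month I will write two or three articles on how to use Google's Web service. Here we start with how to search Google's database.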
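Licensing Terms of the Google Web Service
The Google Web service interface is currently in beta and is available for personal use only. To limit excessive use, Google requires anyone who wants to use it to obtain a unique license key (the key is free). The key limits each individual to no more than 1,000 calls per day. Be sure to read the license terms carefully.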
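A Quick Primer on Web Services
A Web service is an external interface provided by one Web site that other Web sites can call. You can think of a Web service as a self-contained component with one or more callable methods. It can reside anywhere on the Internet, and clients anywhere in the world can invoke it through the methods it exposes. For example, Google's Web service provides three methods: doGoogleSearch(), doGetCachedPage(), and doSpellingSuggestion(). The doGoogleSearch() method takes the query you submit and returns an instance of the GoogleSearchResult class, which contains the search results.
Web services are built on open protocols and standards. For example, a client that wants to consume a Web service communicates with it over HTTP, a well-known open protocol, and the parameters and return values passed back and forth are packaged using SOAP, a well-known protocol for data marshalling. As a result, a Web service can be exposed to any platform: a service written in ASP.NET on a Microsoft IIS server can be consumed by a PHP program on an Apache server, or even by a desktop application.
When consuming a Web service, you typically create a proxy first, which shields the client from the complexity of invoking the Web service. A proxy class is a class that contains all of the methods and objects the Web service exposes. When the client program calls these methods, they marshal the parameters into SOAP, send the SOAP request over HTTP, receive the response from the Web service, and unmarshal the return value. The proxy class lets the client program call the Web service as if it were a local component.
If you are not familiar with Web services, this primer is a good introduction, but you should still set aside time to read Creating a Web Service and Creating and Consuming a Web Service.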
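The Google Web Service API
Information about the Google Web service is available at http://www.google.com/apis/. Before you start using it, you need to download the Google Web API Developer's Kit. This 666K file contains the complete WSDL file that describes the Web service, along with examples written in Java, VB.NET, and C#. After downloading the developer's kit, you also need to create an account with Google, which you can do at:
https://www.google.com/accounts/NewAccount?continue=http://api.google.com/createkey&followup=http://api.google.com/createkey
Once this free account is created, you are assigned a unique license key. The key must be supplied every time a Google Web service method is called. Its purpose is to limit each key to no more than 1,000 calls to the Web service per day.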
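Creating the Proxy Class
Once you have a license key and the Google API Developer's Kit, the next step is to create the proxy class used to call the Web service. To do that, we first need the WSDL file, an XML-formatted file that describes the services the Google Web service provides. This file, GoogleSearch.wsdl, is included in the developer's kit.
If you use Visual Studio .NET, copy the file to the folder that holds your ASP.NET files (for example, C:/Inetpub/wwwroot/WebApplication1). Then open the Project menu and choose Add Web Reference. In the dialog box, enter the URL of the WSDL file, such as http://localhost/WebApplication1/GoogleSearch.wsdl, and click the Add Reference button to finish. This creates a proxy class in the localhost namespace (you can rename it if you like).
If you do not have Visual Studio .NET, you can create the proxy class with the command-line program wsdl.exe, which generates a C# or VB.NET file that you then need to compile. To run wsdl.exe, enter the following at the command line:
wsdl /protocol:SOAP /namespace:google /out:GoogleProxy.cs C:/google/GoogleSearch.wsdl
This creates a C# file named GoogleProxy.cs in the google namespace. Compile the class with the C# command-line compiler, csc, as follows:
csc /t:library /out:GoogleProxy.dll GoogleProxy.cs
This creates a file named GoogleProxy.dll. Be sure to copy this file to your Web application's /bin directory.
For More Information on wsdl.exe
For more information on creating a proxy class without Visual Studio .NET, be sure to read the PowerPoint presentation Calling a Web Service from an ASP.NET Web Page.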
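Creating an ASP.NET Web Page that Calls the Google Web Service
Now that the proxy class exists, calling the Web service from an ASP.NET page is easy. Before doing so, we need to check which parameters the Web service expects. Fortunately, they are listed in detail in the reference section of Google's site. Since we are performing a search through the Google Web service here, let's examine the parameters of the doGoogleSearch() method.
The method takes the following ten parameters: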
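key: Provided by Google and required to access the Google service. Google uses the key for authentication and logging.
q: (See Query Terms for details on query syntax.)
start: Zero-based index of the first desired result.
maxResults: Number of results desired per query. The maximum value per query is 10. (Note: if a query has few matches, you may get fewer results than you requested.)
filter: Activates or deactivates automatic filtering, which hides very similar results and results that come from the same host. Filtering is meant to improve the end-user experience on Google, but for your own application you may prefer to turn it off. (See Automatic Filtering for details.)
restricts: Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux". (See Restricts for details.)
safeSearch: A Boolean value that enables filtering of adult content in the search results. (See SafeSearch for details.)
lr: Language Restrict. Restricts the search to documents in one or more languages.
ie: Input Encoding. This parameter is deprecated and ignored. All requests should use UTF-8 encoding. (See Input and Output Encodings for details.)
oe: Output Encoding. This parameter is deprecated and ignored. All requests should use UTF-8 encoding. (See Input and Output Encodings for details.)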
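The doGoogleSearch() method returns an instance of the GoogleSearchResult object. This object has a resultElements property, which is an array of ResultElement objects. Each ResultElement object has a number of properties, such as title, snippet, URL, summary, and so on.
Now let's create a simple ASP.NET Web page that displays the first ten results for the search query ASP. The following code accomplishes this: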
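<asp:DataList runat="server" Font-Name="Verdana" Font-Size="10pt">
  <ItemTemplate>
    <%# Container.DataItem.title %>
    <%# Container.DataItem.summary %>
    [<a href="<%# Container.DataItem.URL %>"><%# Container.DataItem.URL %></a>]
  </ItemTemplate>
</asp:DataList>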
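[View a Live Demo]
The bolded portion of the code above is what is needed to call the Google Web service's doGoogleSearch() method. So little code is required thanks to the proxy class. The search results are displayed in a DataList, with each result showing the title, summary, and the URL of the page.
The live demo above illustrates how to call the Google Web service to perform a search, but it is quite limited, because it only displays the first ten results of a predefined search query. In Part 2 we will see how to build a more useful ASP.NET Web page that uses the Google Web service.
Read Part 2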
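Searching Google Using the Google Web Service (Part 2)
The previous live demo illustrated how to call the Google Web service to perform a search, but it is quite limited: it only displays the first ten results of a fixed search query. In this second part we will see how to build a Google-like search engine, with a page where the user enters a search query and pages through the results.
Building a More Functional Search Engine
To build a more functional search engine on top of Google's search Web service, let's create an ASP.NET Web page that supports paging and lets the user enter the search term. One way to do this is to mimic Google's own approach, which means placing both the search term and the page position in the querystring. For example, if the user searched for "ASP" and was viewing records 10 through 20, the requested URL might be:
http://www.yourserver.com/Search.aspx?q=ASP&first=10&last=20
There are other ways to do this. Another option is to use postback forms. The postback approach fits ASP.NET more naturally than the querystring approach. However, the querystring approach has the advantage that a user can bookmark a particular search (with a postback form, the data travels via HTTP POST, so the querystring does not change when searching or paging through the results).
Despite the bookmarking advantage of the querystring approach, I decided to implement this live demo with the postback approach. You are welcome to use the querystring approach if you prefer. The source code for the postback approach is as follows:
[View a Live Demo]
The main workhorse subroutine in the code above is DisplaySearchResults(), which calls the Web service, binds the results to the DataList, and displays miscellaneous information such as the estimated number of matches and the time the query took to run. This subroutine also determines whether the Prev. LinkButton is enabled.
Note that when calling the Google search Web service, we must specify the starting result index and how many results to show on a page. That is, to view the first ten records of a search, we pass 0 as the starting index and 10 as the number of records to return. To view the next ten records, we pass 10 as the starting index (leaving 10 as the number of records to return). Notice that the ViewState is used to maintain the starting index. The number of records shown per page is given by the constant PAGE_SIZE.
To allow for paging, two LinkButtons are used. Clicking them fires the nextRecs and prevRecs event handlers, which simply update the starting record number and call DisplaySearchResults().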
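Conclusion
In this article we explored how to call the Google search Web service. To use the Web service, we first downloaded the Google Web API Developer's Kit and then created an account with Google to obtain a license key. After that, we created a proxy class based on the Google Web service's WSDL file (GoogleSearch.wsdl, which is included in the Developer's Kit download). With the proxy class in place, only a few lines of ASP.NET code were needed to call the Web service.
Happy Programming!
Scott Mitchell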
Searching Google Using the Google Web Service
Date: 2003-10-06 | Source: 4guysfromrolla | Author: Scott Mitchell
Introduction
Did you know that Google provides a Web service for searching through Google's database, retrieving cached versions of Web pages, and performing spelling checks? Using Google's Web service you can provide Google's search functionality on your own Web site. Over the next month or so I plan on authoring two to three articles describing how to utilize Google's Web services. In this first article, we'll look at how to use the Web service to search through Google's database.
Licensing Terms of the Google Web Service
The Google Web Service API is currently in Beta testing, and is only available for personal use. To limit excessive use, Google requires that those who wish to use the Google Web service acquire a unique license key (which is free to obtain). This license key is used to limit individuals to no more than 1,000 calls to the Google Web service per day. Please be sure to read the license terms.
A Quick Primer on Web Services
A Web service is an external interface provided by a Web site that can be called from other Web sites. Think of a Web service as a self-contained component with one or more methods. This component can reside anywhere on the Internet, and can have its methods invoked by remote clients. For example, the Google Web service provides three methods: doGoogleSearch(), doGetCachedPage(), and doSpellingSuggestion(). The doGoogleSearch() method, which we'll be examining in this article, has a number of input parameters that specify the search query. The method then returns an instance of the GoogleSearchResult object, which has the results of the search.
Web services are built on open protocols and standards. For example, the communication between a client that wishes to consume a Web service, and the Web service itself, happens over HTTP, a well-known, open protocol. The parameters and return values being passed back and forth are packaged using SOAP, a well-known, open protocol for data-marshalling. The relevant point here is that Web services can be exposed on, say, a Microsoft IIS Web server and be consumed by PHP Web pages running on Apache, by ASP.NET Web pages running on IIS 6.0, or even by a desktop application.
When consuming a Web service, typically a proxy class is created to shield the client from the complexity involved in invoking the Web service. A proxy class is a class that itself contains all of the methods and objects that the Web service exposes. These methods, when called from the client program, handle the marshalling of the parameters into SOAP, sending the SOAP request over HTTP, receiving the response from the Web service, and unmarshalling the return value. The proxy class allows the client program to call a Web service as if the Web service was a local component.
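To make the marshalling step concrete, here is a minimal Python sketch of what a proxy method does before anything goes over HTTP: it packs the call's arguments into a SOAP envelope, and the inverse step reads them back out. The `urn:GoogleSearch` namespace and the helper names `build_search_envelope` and `read_argument` are assumptions for illustration, and a real generated proxy would also emit type information taken from GoogleSearch.wsdl.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed target namespace of the service; confirm it against GoogleSearch.wsdl.
GOOGLE_NS = "urn:GoogleSearch"


def build_search_envelope(key, query, start=0, max_results=10):
    """Marshal a few doGoogleSearch() arguments into a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE_NS}}}doGoogleSearch")
    # Each argument becomes a child element holding its string form.
    arguments = (("key", key), ("q", query),
                 ("start", start), ("maxResults", max_results))
    for name, value in arguments:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_argument(envelope_xml, name):
    """Unmarshal one argument back out of the envelope (the inverse step)."""
    root = ET.fromstring(envelope_xml)
    call = root.find(f"{{{SOAP_ENV}}}Body/{{{GOOGLE_NS}}}doGoogleSearch")
    return call.findtext(name)
```

The point of the proxy class is that client code never sees this XML. It calls a method with ordinary arguments and gets an ordinary object back.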
If you are unfamiliar with Web services, this primer serves as a good introduction, but you should definitely take the time to read Creating a Web Service and then Creating and Consuming a Web Service.
The Google Web Service API
The Google Web Service information can be found online at http://www.google.com/apis/. To start using the Google Web Service you will first need to download the Google Web API Developer's Kit. This 666K file includes the WSDL (Web Service Description Language) file that fully describes the Web service, and examples of accessing the Google Web Service in both Java and VB.NET/C#.
After downloading the Google Web API Developer's Kit, you will need to create an account with Google. This can be done at: https://www.google.com/accounts/NewAccount?continue=http://api.google.com/createkey. Once you create one of these free accounts, you will be assigned a unique license number. This license number must be used whenever a Google Web service method is called. The purpose of this license is to limit the number of calls to the Google Web service to 1,000 invocations per license key per day.
Creating the Proxy Class
Once you have a license key and the Google API Developer's Kit, the next step is to create the proxy class that we'll use to call the Web service. To accomplish this, we first need to get our hands on the WSDL file, which is an XML-formatted file that describes the services provided by the Google Web service. This WSDL file, GoogleSearch.wsdl, is located in the Google Web API Developer's Kit.
If you are using Visual Studio .NET, copy this file to the ASP.NET Web directory (like C:/Inetpub/wwwroot/WebApplication1). Then, in Visual Studio .NET, go to the Project menu and select the Add Web Reference option. Then, in the dialog box, enter the URL to the WSDL file, which will look like: http://localhost/WebApplication1/GoogleSearch.wsdl. To complete the process, click the Add Reference button. This will create the proxy class using the namespace localhost (which you can change if you like).
If you do not have Visual Studio .NET, you can create the proxy class through a command-line program called wsdl.exe. Wsdl.exe will create a C# or VB.NET file, which you'll then need to compile. To run wsdl.exe, drop to the command-line and enter:
wsdl /protocol:SOAP /namespace:google /out:GoogleProxy.cs C:/google/GoogleSearch.wsdl
This will create a C# file named GoogleProxy.cs with the namespace google. To compile this class, use the C# command-line compiler, csc, like so:
csc /t:library /out:GoogleProxy.dll GoogleProxy.cs
This will create a file named GoogleProxy.dll. Be sure to copy this file to your Web application's /bin directory!
For More Information on Wsdl.exe
For more information on creating a proxy class without using Visual Studio .NET, be sure to read the PowerPoint presentation: Calling a Web Service from an ASP.NET Web Page.
Creating an ASP.NET Web Page that Calls the Google Web Service
Now that we have created the proxy class, calling the Google Web Service through an ASP.NET Web page is a breeze. Before we examine how, precisely, to do this, we need to first examine what parameters the Web service methods expect. Fortunately, these methods and their input parameters are detailed in the reference section on Google's Web site. Since, in this article, we'll focus on simply performing a search via the Google Web services, let's examine the parameters for the doGoogleSearch() method.
This method takes in 10 parameters:
Name | Description
key | Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.
q | (See Query Terms section for details on query syntax.)
start | Zero-based index of the first desired result.
maxResults | Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
filter | Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host. Filtering tends to improve the end user experience on Google, but for your application you may prefer to turn it off. (See Automatic Filtering section for more details.)
restricts | Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux." (See Restricts for more details.)
safeSearch | A Boolean value which enables filtering of adult content in the search results. See SafeSearch for more details.
lr | Language Restrict - Restricts the search to documents within one or more languages.
ie | Input Encoding - this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding. (See Input and Output Encodings section for details.)
oe | Output Encoding - this parameter has been deprecated and is ignored. All requests to the APIs should be made with UTF-8 encoding. (See Input and Output Encodings for details.)
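As a rough illustration of how these ten parameters fit together, the sketch below groups them in a Python dataclass and checks the two constraints the table states outright (a zero-based start and at most 10 results per query). The class name `SearchRequest`, the default values, and the `validate` helper are hypothetical. They are not part of the Google proxy class.

```python
from dataclasses import dataclass

MAX_RESULTS_LIMIT = 10  # the table caps maxResults at 10 per query


@dataclass
class SearchRequest:
    """The ten doGoogleSearch() parameters; defaults are illustrative guesses."""
    key: str
    q: str
    start: int = 0
    maxResults: int = MAX_RESULTS_LIMIT
    filter: bool = True
    restricts: str = ""
    safeSearch: bool = False
    lr: str = ""
    ie: str = ""  # deprecated and ignored according to the table
    oe: str = ""  # deprecated and ignored according to the table

    def validate(self):
        """Raise ValueError when a documented constraint is violated."""
        if not self.key:
            raise ValueError("a license key is required")
        if self.start < 0:
            raise ValueError("start is a zero-based index and cannot be negative")
        if not 0 <= self.maxResults <= MAX_RESULTS_LIMIT:
            raise ValueError("maxResults must be between 0 and 10")
```

Checking the values before the call avoids spending one of the 1,000 daily requests on a query that would be rejected anyway.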
The doGoogleSearch() method returns an instance of the GoogleSearchResult object. This object has a resultElements property, which is an array of ResultElement objects. Each ResultElement object has a number of properties, such as title, snippet, URL, summary, and so on.
Now, let's create a simple ASP.NET Web page that will display the first 10 search results for the search query ASP. This can be accomplished using the following code:
[ View a Live Demo!]
The bolded text shows the code necessary to call the Google Web service's doGoogleSearch() method. Such little code is needed thanks to the proxy class. The search results are displayed in a DataList, with each result displaying the title, summary, and the URL to access the page.
While the previous live demo illustrates how to call the Google Web service to perform a search, it is fairly limited in that it only displays the first 10 records of a predefined search query. In Part 2 we'll see how to create a more useful ASP.NET Web page that employs the Google search Web service.
While the previous live demo illustrates how to call the Google Web service to perform a search, it is fairly limited in that it only displays the first 10 records of a predefined search query. In this second part we'll examine how to build a "pseudo Google" search engine, by creating a page that the user can enter a search query for and page through the search results.
Building a More Functional Search Engine
In order to create a more functional search through Google's Web service search API, let's create an ASP.NET Web page that allows the user to input the search term and provides pagination through the data. One way to accomplish this would be to mimic Google's own approach, meaning that search terms and page numbers would be placed in the querystring. That is, if the user searched for "ASP" and was viewing records 10 through 20, the URL requested might be:
http://www.yourserver.com/Search.aspx?q=ASP&first=10&last=20
Or something to that effect. Another option is to use postback forms. The postback approach lends itself to ASP.NET more so than the querystring approach. However, the querystring approach has the benefit that a user can bookmark a particular search query (note that with the postback form, the postback occurs via HTTP POST, meaning the actual querystring does not change when searching or paging through the search results).
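The querystring approach can be sketched in a few lines of Python: one helper builds the bookmarkable URL and another recovers the search state from it. The parameter names `q`, `first`, and `last` come from the example URL above. The helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base, query, first, last):
    """Build a bookmarkable URL that carries the whole search state."""
    return f"{base}?{urlencode({'q': query, 'first': first, 'last': last})}"


def parse_search_url(url):
    """Recover (query, first, last) from a URL built by search_url()."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0], int(params["first"][0]), int(params["last"][0])
```

Because the full state lives in the URL, a bookmark or a shared link reproduces the same page of results, which is exactly what the postback form cannot do.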
Despite the querystring approach's bookmarking advantage, I decided to implement this live demo using the postback approach. You are encouraged to implement the querystring approach if you so wish. The source code for the postback approach can be seen below:
[ View a Live Demo!]
The main workhorse subroutine in the above code listing is DisplaySearchResults(), which makes the Web service call, binds the results to the DataList, and displays miscellaneous information, such as the estimated number of matches found, the time to run the query, etc. This subroutine also determines whether or not the Prev. LinkButton should be enabled.
Realize that when calling the Google search Web service, we must specify the starting result index and how many results we want to see in the page. That is, to view the first 10 records of a search, we would pass in 0 as the starting index and 10 as the number of records to return. To view the next 10 records, we'd simply pass in 10 as the starting index (leaving 10 as the number of records to return). Notice that the ViewState is used to maintain the starting index number. The number of records to display per page is denoted by the constant PAGE_SIZE.
To allow for pagination, two LinkButtons are used, which, when clicked, cause the nextRecs and prevRecs event handlers to fire. These event handlers simply update the starting record number to view and then call DisplaySearchResults().
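The paging arithmetic described above is small enough to sketch directly. This Python version mirrors what the nextRecs and prevRecs handlers are described as doing. The function names are hypothetical, and the original page keeps the starting index in the ViewState rather than passing it around.

```python
PAGE_SIZE = 10  # records shown per page, as in the article


def next_start(start, page_size=PAGE_SIZE):
    """Starting index after the Next link is clicked."""
    return start + page_size


def prev_start(start, page_size=PAGE_SIZE):
    """Starting index after the Prev. link is clicked, never below zero."""
    return max(0, start - page_size)


def prev_enabled(start):
    """The Prev. link only makes sense once we are past the first page."""
    return start > 0
```

For the first page the call uses start 0 and a maximum of 10 results. One click on Next gives start 10 with the same maximum.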
Conclusion
In this article we saw how to call the Google search Web service. To use the Google Web services, we started by downloading the Google Web API Developer's Kit and then creating an account to obtain a license key. Following that, we created a proxy class based on the Google Web service's WSDL file (GoogleSearch.wsdl, which is included in the Developer's Kit download). Armed with this proxy class, we could then access the Web service with just a few lines of code from our ASP.NET Web page.
Happy Programming!
As of December 5, 2006, we are no longer issuing new API keys for the SOAP Search API. Developers with existing SOAP Search API keys will not be affected.
Google SOAP Search API Reference
1. Overview
1.1 Search Requests
1.2 Cache Requests
1.3 Spelling Requests
2. Search Request Format
2.1 Search Parameters
2.2 Query Terms
2.3 Automatic Filtering
2.4 Restricts
2.5 Input and Output Encoding
2.6 SafeSearch
2.7 Limitations
3. Search Results Format
3.1 Search Response
3.2 Result Element
3.3 Directory Category
This document explains in detail the semantics of the function calls you can make using the Google SOAP Search API service. In this document, you will learn:
- How Google's query syntax works.
- How to restrict your queries to portions of Google's index, such as a particular language or country.
- How to interpret the search results information sent back by the Google SOAP Search API service.
You may also find the following files from the Google SOAP Search API developer kit to be helpful:
- GoogleSearch.wsdl - WSDL description for Google SOAP Search API SOAP interface.
- soap-samples/ - example SOAP messages and responses.
- javadoc/index.html - javadoc for the example Java libraries.
For comments or questions, please use the Google SOAP Search API discussion group.
1.1 Search Requests
Search requests submit a query string and a set of parameters to the Google SOAP Search API service and receive in return a set of search results. Search results are derived from Google's index of billions of web pages.
The details of the interactions involved with search requests are covered in the Search Request Format and Search Results Format sections of this document.
1.2 Cache Requests
Cache requests submit a URL to the Google SOAP Search API service and receive in return the contents of the URL when Google's crawlers last visited the page (if available).
Please note that Google is not affiliated with the authors of cached pages nor responsible for their content.
The return type for cached pages is base64 encoded text.
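Since the cached page comes back base64 encoded, the caller has to decode it before use. A minimal Python sketch follows. The function name is hypothetical, and UTF-8 is only an assumed default, because the reference does not say which character encoding a given cached page uses.

```python
import base64


def decode_cached_page(payload, encoding="utf-8"):
    """Turn the base64 text from a cache request into a page string."""
    raw = base64.b64decode(payload)
    # Undecodable bytes are replaced rather than raising, since the page's
    # real encoding is not known in advance.
    return raw.decode(encoding, errors="replace")
```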
1.3 Spelling Requests
Spelling requests submit a query to the Google SOAP Search API service and receive in return a suggested spell correction for the query (if available). Spell corrections mimic the same behavior as found on Google's Web site.
Spelling requests are subject to the same query string limitations as any other search request. (The input string is limited to 2048 bytes and 10 individual words.)
The return type for spelling requests is a text string.
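The two limits quoted above (2048 bytes and 10 individual words) are easy to check on the client before a request is sent. The Python sketch below does that. How the service itself counts words is not specified here, so splitting on whitespace and measuring UTF-8 bytes are assumptions, and `check_query` is a hypothetical helper.

```python
MAX_QUERY_BYTES = 2048
MAX_QUERY_WORDS = 10


def check_query(query):
    """Return a list of limit violations; an empty list means the query fits."""
    problems = []
    size = len(query.encode("utf-8"))
    if size > MAX_QUERY_BYTES:
        problems.append(f"query is {size} bytes, limit is {MAX_QUERY_BYTES}")
    words = len(query.split())
    if words > MAX_QUERY_WORDS:
        problems.append(f"query has {words} words, limit is {MAX_QUERY_WORDS}")
    return problems
```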
2. Search Request Format
2.1 Search Parameters
This table lists all the valid name-value pairs that can be used in a search request and describes how these parameters will modify the search results.
Name | Description
key | Provided by Google, this is required for you to access the Google service. Google uses the key for authentication and logging.
q | (See Query Terms section for details on query syntax.)
start | Zero-based index of the first desired result.
maxResults | Number of results desired per query. The maximum value per query is 10. Note: If you do a query that doesn't have many matches, the actual number of results you get may be smaller than what you request.
filter | Activates or deactivates automatic results filtering, which hides very similar results and results that all come from the same Web host. Filtering tends to improve the end user experience on Google, but for your application you may prefer to turn it off. (See Automatic Filtering section for more details.)
restricts | Restricts the search to a subset of the Google Web index, such as a country like "Ukraine" or a topic like "Linux." (See Restricts for more details.)
safeSearch | A Boolean value which enables filtering of adult content in the search results. See SafeSearch for more details.
lr | Language Restrict - Restricts the search to documents within one or more languages.
ie | Input Encoding - this parameter has been deprecated and is ignored. All requests to the API should be made with UTF-8 encoding. (See Input and Output Encodings section for details.)
oe | Output Encoding - this parameter has been deprecated and is ignored. All requests to the API should be made with UTF-8 encoding. (See Input and Output Encodings for details.)
2.2 Query Terms
Default Search
By default, Google searches for all of your search terms, as well as for relevant variations of the terms you've entered. There is no need to include "AND" between terms. Keep in mind that the order of the terms in the query will affect the search results.
Stop Words
Google ignores common words and characters such as "where" and "how," as well as certain single digits and single letters. Common words that are ignored are known as stop words. However, you can prevent Google from ignoring stop words by enclosing them in quotes, such as in the phrase "to be or not to be".
Special Characters
By default, all non-alphanumeric characters that are included in a search query are treated as word separators. The only exceptions are the following: double quote mark ( " ), plus sign ( + ), minus sign or hyphen ( - ), and ampersand ( & ). The ampersand character ( & ) is treated as another character in the query term in which it is included, while the remaining exception characters correspond to search features listed in the section below.
Special Query Terms
Google supports the use of several special query terms that allow the user or search administrator to access additional capabilities of the Google search engine.
Special Query Capability | Example Query | Description
Include Query Term | Star Wars Episode +I | If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it.
Exclude Query Term | bass -music | You can exclude a word from your search by putting a minus sign ("-") immediately in front of the term you want to exclude from the search results.
Phrase Search | "yellow pages" | Search for complete phrases by enclosing them in quotation marks or connecting them with hyphens. Words marked in this way will appear together in all results exactly as entered. Note: You may need to use a "+" to force inclusion of common words in a phrase.
Boolean OR Search | vacation london OR paris | Google search supports the Boolean OR operator. To retrieve pages that include either word A or word B, use an uppercase OR between terms.
Site Restricted Search | admission site:www.stanford.edu | If you know the specific web site you want to search but aren't sure where the information is located within that site, you can use Google to search only within a specific web site. Do this by entering your query followed by the string "site:" followed by the host name. Note: The exclusion operator ("-") can be applied to this query term to remove a web site from consideration in the search. Note: Only one site: term per query is supported.
Date Restricted Search | Star Wars daterange:2452122-2452234 | If you want to limit your results to documents that were published within a specific date range, then you can use the "daterange:" query term to accomplish this. The "daterange:" query term must be in the following format: daterange:<start_date>-<end_date>, where <start_date> is the Julian date indicating the start of the date range and <end_date> is the Julian date indicating the end of the date range. The Julian date is calculated by the number of days since January 1, 4713 BC. For example, the Julian date for August 1, 2001 is 2452122.
Title Search (term) | intitle:Google search | If you prepend "intitle:" to a query term, Google search restricts the results to documents containing that word in the title. Note there can be no space between the "intitle:" and the following word. Note: Putting "intitle:" in front of every word in your query is equivalent to putting "allintitle:" at the front of your query.
Title Search (all) | allintitle: Google search | Starting a query with the term "allintitle:" restricts the results to those with all of the query words in the title.
URL Search (term) | inurl:Google search | If you prepend "inurl:" to a query term, Google search restricts the results to documents containing that word in the result URL. Note there can be no space between the "inurl:" and the following word. Note: "inurl:" works only on words, not URL components. In particular, it ignores punctuation and uses only the first word following the "inurl:" operator. To find multiple words in a result URL, use the "inurl:" operator for each word. Note: Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query.
URL Search (all) | allinurl: Google search | Starting a query with the term "allinurl:" restricts the results to those with all of the query words in the result URL. Note: "allinurl:" works only on words, not URL components. In particular, it ignores punctuation. Thus, "allinurl: foo/bar" restricts the results to pages with the words "foo" and "bar" in the URL, but does not require that they be separated by a slash within that URL, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints.
Text Only Search (all) | allintext: Google search | Starting a query with the term "allintext:" restricts the results to those with all of the query words in only the body text, ignoring link, URL, and title matches.
Links Only Search (all) | allinlinks: Google search | Starting a query with the term "allinlinks:" restricts the results to those with all of the query words in the URL links on the page.
File Type Filtering | Google filetype:doc OR filetype:pdf | The query prefix "filetype:" filters the results returned to include only documents with the extension specified immediately after. Note there can be no space between "filetype:" and the specified extension. Note: Multiple file types can be included in a filtered search by adding more "filetype:" terms to the search query.
File Type Exclusion | Google -filetype:doc -filetype:pdf | The query prefix "-filetype:" filters the results to exclude documents with the extension specified immediately after. Note there can be no space between "-filetype:" and the specified extension. Note: Multiple file types can be excluded in a filtered search by adding more "-filetype:" terms to the search query.
Web Document Info | info:www.google.com | The query prefix "info:" returns a single result for the specified URL if it exists in the index. Note: No other query terms can be specified when using this special query term.
Back Links | link:www.google.com | The query prefix "link:" lists web pages that have links to the specified web page. Note there can be no space between "link:" and the web page URL. Note: No other query terms can be specified when using this special query term.
Related Links | related:www.google.com | The query prefix "related:" lists web pages that are similar to the specified web page. Note there can be no space between "related:" and the web page URL. Note: No other query terms can be specified when using this special query term.
Cached Results Page | cache:www.google.com web | The query prefix "cache:" returns the cached HTML version of the specified web document that the Google search crawled. Note there can be no space between "cache:" and the web page URL. If you include other words in the query, Google will highlight those words within the cached document.
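The "daterange:" term needs Julian dates, which are awkward to work out by hand. The Python sketch below converts calendar dates and assembles the query term. The offset is chosen so that the result reproduces the reference's own example (August 1, 2001 gives 2452122). The astronomical Julian Day Number, which is counted from noon, is one higher, so treat the constant as an assumption about the convention Google uses. Both function names are hypothetical.

```python
from datetime import date

# date.toordinal() counts days from 0001-01-01 as day 1. Adding this offset
# yields the Julian date at the start of the day, rounded down, which is
# what matches the example value given in the reference.
ORDINAL_TO_JULIAN = 1721424


def julian_day(day):
    """Julian date for a calendar day, in the convention described above."""
    return day.toordinal() + ORDINAL_TO_JULIAN


def daterange_term(start, end):
    """Build a daterange: query term covering start through end."""
    if end < start:
        raise ValueError("end date must not be before start date")
    return f"daterange:{julian_day(start)}-{julian_day(end)}"
```

Under this convention the example query in the table, daterange:2452122-2452234, runs from August 1, 2001 to November 21, 2001.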
2.3 Automatic Filtering
The filter parameter causes Google to filter out some of the results for a given search. This is done to enhance the user experience on Google.com, but for your application, you may prefer to turn filtering off in order to get the full set of search results.
When enabled, filtering takes the following actions:
- Near-Duplicate Content Filter: If multiple search results contain identical titles and snippets, then only one of the documents is returned.
- Host Crowding: If multiple results come from the same Web host, then only the first two are returned.
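The two filtering actions can be imitated on the client, which may help when deciding whether to leave filtering on. The Python sketch below applies both rules to a list of result dictionaries with title, snippet, and URL keys. It illustrates the rules as worded above and is not Google's implementation. The order in which the rules are applied and the function name `auto_filter` are assumptions.

```python
from urllib.parse import urlsplit


def auto_filter(results, per_host=2):
    """Drop near-duplicates and keep at most per_host results per Web host."""
    seen_text = set()
    host_counts = {}
    kept = []
    for result in results:
        # Near-duplicate rule: identical title and snippet count as one document.
        fingerprint = (result["title"], result["snippet"])
        if fingerprint in seen_text:
            continue
        # Host crowding rule: only the first per_host results from a host survive.
        host = urlsplit(result["URL"]).hostname or ""
        if host_counts.get(host, 0) >= per_host:
            continue
        seen_text.add(fingerprint)
        host_counts[host] = host_counts.get(host, 0) + 1
        kept.append(result)
    return kept
```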
2.4 Restricts
Google provides the ability to search a predefined subset of Google's web index. This is enabled by using the lr and restrict parameters.
- language restrict
To search for documents within a particular language, use the lr parameter, using one of the values in the table below.
Language | value
Arabic | lang_ar
Chinese (S) | lang_zh-CN
Chinese (T) | lang_zh-TW
Czech | lang_cs
Danish | lang_da
Dutch | lang_nl
English | lang_en
Estonian | lang_et
Finnish | lang_fi
French | lang_fr
German | lang_de
Greek | lang_el
Hebrew | lang_iw
Hungarian | lang_hu
Icelandic | lang_is
Italian | lang_it
Japanese | lang_ja
Korean | lang_ko
Latvian | lang_lv
Lithuanian | lang_lt
Norwegian | lang_no
Portuguese | lang_pt
Polish | lang_pl
Romanian | lang_ro
Russian | lang_ru
Spanish | lang_es
Swedish | lang_sv
Turkish | lang_tr
- Country and Topic Restricts
Google allows you to search for Web information within one or more countries, using an algorithm that considers the top level domain name of the server and the geographic location of the server IP address.
The automatic country sub-collections currently supported are listed below:
Country | value (AD-CL)
Andorra | countryAD
United Arab Emirates | countryAE
Afghanistan | countryAF
Antigua and Barbuda | countryAG
Anguilla | countryAI
Albania | countryAL
Armenia | countryAM
Netherlands Antilles | countryAN
Angola | countryAO
Antarctica | countryAQ
Argentina | countryAR
American Samoa | countryAS
Austria | countryAT
Australia | countryAU
Aruba | countryAW
Azerbaijan | countryAZ
Bosnia and Herzegowina | countryBA
Barbados | countryBB
Bangladesh | countryBD
Belgium | countryBE
Burkina Faso | countryBF
Bulgaria | countryBG
Bahrain | countryBH
Burundi | countryBI
Benin | countryBJ
Bermuda | countryBM
Brunei Darussalam | countryBN
Bolivia | countryBO
Brazil | countryBR
Bahamas | countryBS
Bhutan | countryBT
Bouvet Island | countryBV
Botswana | countryBW
Belarus | countryBY
Belize | countryBZ
Canada | countryCA
Cocos (Keeling) Islands | countryCC
Congo, The Democratic Republic of the | countryCD
Central African Republic | countryCF
Congo | countryCG
Switzerland | countryCH
Cote D'ivoire | countryCI
Cook Islands | countryCK
Chile | countryCL
Country | Value (CM-JO)
Cameroon | countryCM
China | countryCN
Colombia | countryCO
Costa Rica | countryCR
Cuba | countryCU
Cape Verde | countryCV
Christmas Island | countryCX
Cyprus | countryCY
Czech Republic | countryCZ
Germany | countryDE
Djibouti | countryDJ
Denmark | countryDK
Dominica | countryDM
Dominican Republic | countryDO
Algeria | countryDZ
Ecuador | countryEC
Estonia | countryEE
Egypt | countryEG
Western Sahara | countryEH
Eritrea | countryER
Spain | countryES
Ethiopia | countryET
European Union | countryEU
Finland | countryFI
Fiji | countryFJ
Falkland Islands (Malvinas) | countryFK
Micronesia, Federated States of | countryFM
Faroe Islands | countryFO
France | countryFR
France, Metropolitan | countryFX
Gabon | countryGA
United Kingdom | countryUK
Grenada | countryGD
Georgia | countryGE
French Guiana | countryGF
Ghana | countryGH
Gibraltar | countryGI
Greenland | countryGL
Gambia | countryGM
Guinea | countryGN
Guadeloupe | countryGP
Equatorial Guinea | countryGQ
Greece | countryGR
South Georgia and the South Sandwich Islands | countryGS
Guatemala | countryGT
Guam | countryGU
Guinea-Bissau | countryGW
Guyana | countryGY
Hong Kong | countryHK
Heard and McDonald Islands | countryHM
Honduras | countryHN
Croatia (local name: Hrvatska) | countryHR
Haiti | countryHT
Hungary | countryHU
Indonesia | countryID
Ireland | countryIE
Israel | countryIL
India | countryIN
British Indian Ocean Territory | countryIO
Iraq | countryIQ
Iran (Islamic Republic of) | countryIR
Iceland | countryIS
Italy | countryIT
Jamaica | countryJM
Jordan | countryJO
Country | Value (JP-PS)
Japan | countryJP
Kenya | countryKE
Kyrgyzstan | countryKG
Cambodia | countryKH
Kiribati | countryKI
Comoros | countryKM
Saint Kitts and Nevis | countryKN
Korea, Democratic People's Republic of | countryKP
Korea, Republic of | countryKR
Kuwait | countryKW
Cayman Islands | countryKY
Kazakhstan | countryKZ
Lao People's Democratic Republic | countryLA
Lebanon | countryLB
Saint Lucia | countryLC
Liechtenstein | countryLI
Sri Lanka | countryLK
Liberia | countryLR
Lesotho | countryLS
Lithuania | countryLT
Luxembourg | countryLU
Latvia | countryLV
Libyan Arab Jamahiriya | countryLY
Morocco | countryMA
Monaco | countryMC
Moldova | countryMD
Madagascar | countryMG
Marshall Islands | countryMH
Macedonia, The Former Yugoslav Republic of | countryMK
Mali | countryML
Myanmar | countryMM
Mongolia | countryMN
Macau | countryMO
Northern Mariana Islands | countryMP
Martinique | countryMQ
Mauritania | countryMR
Montserrat | countryMS
Malta | countryMT
Mauritius | countryMU
Maldives | countryMV
Malawi | countryMW
Mexico | countryMX
Malaysia | countryMY
Mozambique | countryMZ
Namibia | countryNA
New Caledonia | countryNC
Niger | countryNE
Norfolk Island | countryNF
Nigeria | countryNG
Nicaragua | countryNI
Netherlands | countryNL
Norway | countryNO
Nepal | countryNP
Nauru | countryNR
Niue | countryNU
New Zealand | countryNZ
Oman | countryOM
Panama | countryPA
Peru | countryPE
French Polynesia | countryPF
Papua New Guinea | countryPG
Philippines | countryPH
Pakistan | countryPK
Poland | countryPL
St. Pierre and Miquelon | countryPM
Pitcairn | countryPN
Puerto Rico | countryPR
Palestine | countryPS
Country | Value (PT-ZR)
Portugal | countryPT
Palau | countryPW
Paraguay | countryPY
Qatar | countryQA
Reunion | countryRE
Romania | countryRO
Russian Federation | countryRU
Rwanda | countryRW
Saudi Arabia | countrySA
Solomon Islands | countrySB
Seychelles | countrySC
Sudan | countrySD
Sweden | countrySE
Singapore | countrySG
St. Helena | countrySH
Slovenia | countrySI
Svalbard and Jan Mayen Islands | countrySJ
Slovakia (Slovak Republic) | countrySK
Sierra Leone | countrySL
San Marino | countrySM
Senegal | countrySN
Somalia | countrySO
Suriname | countrySR
Sao Tome and Principe | countryST
El Salvador | countrySV
Syria | countrySY
Swaziland | countrySZ
Turks and Caicos Islands | countryTC
Chad | countryTD
French Southern Territories | countryTF
Togo | countryTG
Thailand | countryTH
Tajikistan | countryTJ
Tokelau | countryTK
Turkmenistan | countryTM
Tunisia | countryTN
Tonga | countryTO
East Timor | countryTP
Turkey | countryTR
Trinidad and Tobago | countryTT
Tuvalu | countryTV
Taiwan | countryTW
Tanzania | countryTZ
Ukraine | countryUA
Uganda | countryUG
United States Minor Outlying Islands | countryUM
United States | countryUS
Uruguay | countryUY
Uzbekistan | countryUZ
Holy See (Vatican City State) | countryVA
Saint Vincent and the Grenadines | countryVC
Venezuela | countryVE
Virgin Islands (British) | countryVG
Virgin Islands (U.S.) | countryVI
Vietnam | countryVN
Vanuatu | countryVU
Wallis and Futuna Islands | countryWF
Samoa | countryWS
Yemen | countryYE
Mayotte | countryYT
Yugoslavia | countryYU
South Africa | countryZA
Zambia | countryZM
Zaire | countryZR
Google also has four topic restricts:
Topic | Value
U.S. Government | unclesam
Linux | linux
Macintosh | mac
FreeBSD | bsd
- Combining the lr and restrict parameters
Search requests that use the lr and restrict parameters support the Boolean operators listed below (in order of precedence).
Note: If both lr and restrict parameters are used in a search request, the sub-collection strings will be combined together using "AND" logic.
- Boolean NOT [ - ]
  Sample usage: -lang_fr
  Description: Removes all results that belong to the sub-collection immediately following the "-" operator. The example restrict value would remove all results in French.
- Boolean AND [ . ]
  Sample usage: linux.countryFR
  Description: Returns results that are in the intersection of the sub-collections on either side of the "." operator. The example restrict value would return all results that are both in the "linux" topic and identified as being located in France.
- Boolean OR [ | ]
  Sample usage: lang_en|lang_fr
  Description: Returns results that are in either of the sub-collections on either side of the "|" operator. The example restrict value would return all results matching the query that are in either the French or English sub-collections.
- Parentheses [ ( ) ]
  Sample usage: (linux).(-(countryUK|countryUS))
  Description: All terms within the innermost set of parentheses in a sub-collection string are evaluated before terms outside the parentheses. Use parentheses to adjust the order of term evaluation. The example restrict value would return all results in the "linux" sub-collection that are not in either the United States or United Kingdom sub-collections.
Note: Spaces are not valid characters in the restrict parameter.
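Restrict strings are easy to get wrong when assembled by hand, so it can help to build them from small helpers. The sketch below is one possible approach; the helper names (`r_not`, `r_and`, `r_or`, `group`, `validate_restrict`) are hypothetical and are not part of the Google API. It only composes the string and checks the no-spaces rule and parenthesis balance.

```python
def r_not(term):
    """Boolean NOT: exclude the sub-collection that follows."""
    return "-" + term


def r_and(*terms):
    """Boolean AND: intersection of the given sub-collections."""
    return ".".join(terms)


def r_or(*terms):
    """Boolean OR: union of the given sub-collections."""
    return "|".join(terms)


def group(expr):
    """Wrap an expression in parentheses to force evaluation order."""
    return "(" + expr + ")"


def validate_restrict(expr):
    """Reject strings that contain whitespace or unbalanced parentheses."""
    if any(ch.isspace() for ch in expr):
        raise ValueError("spaces are not valid in the restrict parameter")
    if expr.count("(") != expr.count(")"):
        raise ValueError("unbalanced parentheses")
    return expr


# Linux topic, excluding results from the UK or US sub-collections.
expr = validate_restrict(
    r_and(group("linux"), group(r_not(group(r_or("countryUK", "countryUS")))))
)
print(expr)  # (linux).(-(countryUK|countryUS))
```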
2.5 Input and Output Encodings - ie, oe
To support searching documents in multiple languages and character encodings, the Google SOAP Search API performs all requests and responses in the UTF-8 encoding. The parameters ie and oe are required in client requests, but their values are ignored. Clients should encode all request data in UTF-8 and should expect results to be in UTF-8.
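In practice this means request text is converted to UTF-8 bytes before it is sent and response bytes are decoded as UTF-8 on receipt. A minimal sketch follows; the function names are hypothetical. Note that the byte length of a query can exceed its character count, which matters for any limit expressed in bytes.

```python
def encode_request_text(text: str) -> bytes:
    """Encode request text as UTF-8 bytes before sending."""
    return text.encode("utf-8")


def decode_response_text(raw: bytes) -> str:
    """Decode response bytes, which the service returns as UTF-8."""
    return raw.decode("utf-8")


# "caf" + e-acute, a space, then the two Chinese characters for "search".
query = "caf\u00e9 \u641c\u7d22"
raw = encode_request_text(query)
print(len(query), len(raw))  # 7 characters, 12 bytes
```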
2.6 SafeSearch - safeSearch
Many Google users prefer not to have adult sites included in their search results. Google's SafeSearch feature screens for sites that contain this type of information and eliminates them from search results. While no filter is 100% accurate, Google's filter uses advanced proprietary technology that checks keywords and phrases, URLs, and Open Directory categories. If you have SafeSearch activated and still find websites containing offensive content in your results, please contact us and we'll investigate it.
2.7 Limitations
There are some important limitations you should be aware of. Some of these are because Google's infrastructure is currently optimized for end users. However, in the future we hope to vastly increase the limits for Google SOAP Search API developers.
Component | Limit
Search request length | 2048 bytes
Maximum number of words in the query | 10
Maximum number of site: terms in the query | 1 (per search request)
Maximum number of results per query | 10
Maximum value of start + maxResults | 1000
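A client can check these limits before sending a request. The sketch below is an illustration under stated assumptions: the function and constant names are hypothetical, words are counted by splitting on whitespace, and the 2048-byte limit is applied to the UTF-8 encoded query string only, whereas the service may measure the request differently.

```python
MAX_REQUEST_BYTES = 2048
MAX_QUERY_WORDS = 10
MAX_SITE_TERMS = 1
MAX_RESULTS_PER_QUERY = 10
MAX_START_PLUS_RESULTS = 1000


def check_limits(query, start=0, max_results=10):
    """Return a list of limit violations; empty if the request is within limits."""
    problems = []
    if len(query.encode("utf-8")) > MAX_REQUEST_BYTES:
        problems.append("query exceeds 2048 bytes")
    words = query.split()
    if len(words) > MAX_QUERY_WORDS:
        problems.append("query has more than 10 words")
    site_terms = [w for w in words if w.lower().startswith("site:")]
    if len(site_terms) > MAX_SITE_TERMS:
        problems.append("query has more than one site: term")
    if max_results > MAX_RESULTS_PER_QUERY:
        problems.append("more than 10 results requested")
    if start + max_results > MAX_START_PLUS_RESULTS:
        problems.append("start + maxResults exceeds 1000")
    return problems
```

For example, `check_limits("linux kernel")` returns an empty list, while a request with `start=995` and `max_results=10` is reported because 995 + 10 exceeds 1000.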
3. Search Results Format
3.1 Search Response
Each time you issue a search request to the Google service, a response is returned. This section describes the meaning of each value in that response.
- documentFiltering: A Boolean value indicating whether filtering was performed on the search results. This will be "true" only if (a) you requested filtering and (b) filtering actually occurred.
- searchComments: A text string intended for display to an end user. One of the most common messages found here is a note that "stop words" were removed from the search automatically. (This happens for very common words such as "and" and "as.")
- estimatedTotalResultsCount: The estimated total number of results that exist for the query. Note: The estimated number may be either higher or lower than the actual number of results that exist.
- estimateIsExact: A Boolean value indicating that the estimate is actually the exact value.
- resultElements: An array of resultElement items. This corresponds to the actual list of search results.
- searchQuery: The value of q for the search request.
- startIndex: The index (1-based) of the first search result in resultElements.
- endIndex: The index (1-based) of the last search result in resultElements.
- searchTips: A text string intended for display to the end user. It provides instructive suggestions on how to use Google.
- directoryCategories: An array of directoryCategory items. This corresponds to the ODP directory matches for this search.
- searchTime: Text, a floating-point number indicating the total server time to return the search results, measured in seconds.
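The index and estimate values in the response are enough to drive paging. The following sketch assumes the field names of the Google SOAP Search API response (startIndex, endIndex, estimatedTotalResultsCount, estimateIsExact) and the API's zero-based start request parameter; the `SearchResponse` class and `next_start` helper are hypothetical. Because the total is only an estimate, stopping on it is a heuristic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResponse:
    estimated_total: int     # estimatedTotalResultsCount
    estimate_is_exact: bool  # estimateIsExact
    start_index: int         # startIndex, 1-based
    end_index: int           # endIndex, 1-based


def next_start(resp: SearchResponse, max_results: int = 10,
               ceiling: int = 1000) -> Optional[int]:
    """Return the zero-based start value for the next page, or None to stop."""
    page_len = resp.end_index - resp.start_index + 1
    if resp.end_index <= 0 or page_len <= 0:
        return None  # assumption: an empty page means there is nothing more
    if resp.end_index >= resp.estimated_total:
        return None  # reached the (estimated) end of the result set
    # The 1-based index of the last result equals the 0-based index of the next.
    candidate = resp.end_index
    if candidate + max_results > ceiling:
        return None  # would exceed the 1000 limit on start + maxResults
    return candidate
```

After a first page covering results 1 to 10 of an estimated 250, the next request would use a start value of 10.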
3.2 Result Element
- summary: If the search result has a listing in the ODP directory, the ODP summary appears here as a text string.
- URL: The URL of the search result, returned as text, with an absolute URL path.
- snippet: A text excerpt from the results page that shows the query in context as it appears on the matching results page. This is formatted HTML and usually includes <b> tags. Query terms are highlighted in bold in the results, and line breaks are included for proper text wrapping. If Google searched for stemmed variants of the query terms using its proprietary technology, those terms are also highlighted in bold in the snippet. Note that the query term does not always appear in the snippet.
- title: The title of the search result, returned as HTML.
- cachedSize: Text (Integer + "k"). Indicates that a cached version of the URL is available; size is indicated in kilobytes.
- relatedInformationPresent: Boolean indicating that the "related:" query term is supported for this URL.
- hostName: When filtering occurs, a maximum of two results from any given host is returned. When this occurs, the second resultElement that comes from that host contains the host name in this parameter.
- directoryCategory: See below.
- directoryTitle: If the URL for this resultElement is contained in the ODP directory, the title that appears in the directory appears here as a text string. Note that the directoryTitle may be different from the URL's title.
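Because the snippet and title of a result are returned as HTML, a client that needs plain text (for logging or a text-only display, say) has to strip the markup and decode entities. The sketch below is a deliberately simple regex-based approach that assumes only simple inline tags such as bold and line-break tags; the function name `snippet_to_text` is hypothetical, and a real HTML parser would be more robust.

```python
import html
import re

_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def snippet_to_text(snippet_html):
    """Convert an HTML snippet or title to plain text on a single line."""
    text = _BR.sub(" ", snippet_html)  # line breaks become spaces
    text = _TAG.sub("", text)          # drop remaining tags such as <b>
    text = html.unescape(text)         # decode entities such as &amp;
    return " ".join(text.split())      # collapse runs of whitespace


print(snippet_to_text("The <b>Google</b> Web APIs<br> let you query &amp; more ..."))
# The Google Web APIs let you query & more ...
```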
3.3 Directory Category
- fullViewableName: Text, containing the ODP directory name for the current ODP category.
- specialEncoding: Specifies the encoding scheme of the directory information.