Ruby 1.9: Using the Googlebot User-Agent to Handle Too Many Redirects

require 'net/http'
require 'uri'

$params = {
    'baseurl'   => 'http://www.XXX.com',
    'pageurl'   => '/XXX',
    'useragent' => 'GoogleBot', 
  } unless $params;  

def fetch(gameurl, limit = 10) # limit is up to you
  # ArgumentError is only a placeholder; a more specific exception would be better.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  uri = URI(gameurl) # URI.parse(gameurl) works as well
  http1 = Net::HTTP.new(uri.host, uri.port)
  request1 = Net::HTTP::Get.new(uri.request_uri)
  request1["User-Agent"] = $params['useragent']
  request1["Accept"] = "*/*"
  response = http1.request(request1)
  case response
  when Net::HTTPSuccess then
    return response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end

response = fetch($params['baseurl'] + $params['pageurl'])
puts response.code
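
The fetch method above passes the Location header to the next request unchanged, which only works when the server sends an absolute URL. The following is a minimal sketch of one way to cope with relative values; resolve_location is a hypothetical helper name that is not part of the original script, and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): resolve a Location
# header value against the URL that produced the redirect, so that a
# relative location becomes an absolute URL.
def resolve_location(current_url, location)
  URI.join(current_url, location).to_s
end

# A path-only location is resolved against the current host.
puts resolve_location('http://www.example.com/a/b', '/c')
# An absolute location is returned unchanged.
puts resolve_location('http://www.example.com/a/b', 'http://other.example.com/x')
```

Inside fetch, the recursive call could then pass resolve_location(gameurl, response['location']) instead of the raw header value.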

Notes:
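1 Parse the URL of the page to request with URI(gameurl) or URI.parse(gameurl).
2 Create the HTTP connection object http1 from the parsed URI. Net::HTTP.new does not open a connection by itself: if the object has not been started, each call to request opens a connection and closes it again afterwards, so later calls simply reopen it. Net::HTTP.start with a block is one-shot: the connection is opened once, stays open while the block runs, and is closed when the block finishes.
For example: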

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new(uri.request_uri) # Ruby 1.9 expects a path string here

  response = http.request request # Net::HTTPResponse object
end

3 Create the request object.
4 Set the header fields, such as User-Agent.
5 Send the request over the connection http1 that was created earlier; the call returns the response.
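
Steps 1 to 4 can be checked without touching the network, because nothing is sent until request is called. The sketch below assumes a hypothetical helper named build_request and a placeholder example.com URL; the use_ssl line is an addition that the original script does not have, included because redirects frequently point to HTTPS.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not in the original script): build the connection
# object and the GET request for a URL without sending anything.
def build_request(url, user_agent)
  uri  = URI.parse(url)                          # step 1: parse the URL
  http = Net::HTTP.new(uri.host, uri.port)       # step 2: connection object, not yet opened
  http.use_ssl = (uri.scheme == 'https')         # assumption: enable TLS for https URLs
  request = Net::HTTP::Get.new(uri.request_uri)  # step 3: request for path plus query
  request['User-Agent'] = user_agent             # step 4: header fields
  request['Accept'] = '*/*'
  [http, request]
end

http, request = build_request('http://www.example.com/page?x=1', 'GoogleBot')
puts request.path
puts request['User-Agent']
# Step 5 would be: response = http.request(request)
```

Keeping the construction separate from the actual call makes it easy to inspect the headers before any traffic is generated.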
