Python Crawler Notes 1 (from MOOC): Getting Started with the Requests Library

General code framework:

import requests


def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # use the encoding guessed from the content
        return r.text
    except requests.RequestException:
        return "An exception occurred"


if __name__ == "__main__":
    url = "http://www.baidu.com"
    print(getHTMLText(url))
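
To see what raise_for_status() does in the framework above without touching the network, here is a minimal sketch. The helper name status_outcome is hypothetical, and building a requests.Response by hand is only for illustration; in normal use requests.get returns the response object.

```python
import requests


def status_outcome(status_code):
    """Return "ok" if raise_for_status() passes, else the exception class name."""
    r = requests.Response()  # hand-built response, no network access (illustration only)
    r.status_code = status_code
    try:
        r.raise_for_status()  # raises HTTPError for 4xx and 5xx codes
        return "ok"
    except requests.HTTPError as e:
        return type(e).__name__


for code in (200, 404, 503):
    print(code, status_outcome(code))
```

Because HTTPError is a subclass of requests.RequestException, a single except clause for RequestException in the framework also covers timeouts and connection errors.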

The examples below all come from this week's course content.


Note: the code and its output follow.

1. Scraping a JD.com product page

Code:

import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])  # print only the first 1000 characters
except requests.RequestException:
    print("Scrape failed")

Output:
Process finished with exit code 0

2. Scraping an Amazon product page

Code:

import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present the request as coming from a browser
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Scrape failed")

Output:

  ue_sid = (document.cookie.match(/session-id=([0-9-]+)/) || [])[1],
        ue_sn = "opfcaptcha.amazon.cn",
        ue_id = 'FNY2VQ38P3R6JETHXGX2';
}
</script>
</head>
<body>

<!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com.cn/index.html/ref=rm_c_sv, or our Product Advertising API at https://associates.amazon.cn/gp/advertising/api/detail/main.html/ref=rm_c_ac for advertising use cases.
-->

<!--
Correios.DoNotSend
-<
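
The headers=kv argument in the code above replaces the User-Agent that requests would otherwise send by default. The following is a minimal sketch of that difference; the helper name prepared_user_agent is hypothetical, and the request is only prepared, never sent, so no network access is needed.

```python
import requests

URL = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"


def prepared_user_agent(url, headers=None):
    """Prepare (but do not send) a GET request and return its User-Agent header."""
    with requests.Session() as session:
        request = requests.Request("GET", url, headers=headers)
        prepared = session.prepare_request(request)  # merges session defaults with our headers
    return prepared.headers["User-Agent"]


# Default value, which identifies the client as python-requests
print(prepared_user_agent(URL))
# Value after overriding the header, as in the Amazon example
print(prepared_user_agent(URL, {"user-agent": "Mozilla/5.0"}))
```

Header names are case-insensitive in requests, so the lowercase 'user-agent' key overrides the default User-Agent entry rather than adding a second one.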
