A Simple Movie Information Scraper

Environment: Python 3.5

Tool: PyCharm

I'm a beginner, so without further ado, here is the code.

    

import requests
from lxml import etree
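import urllib3

# The requests below use verify=False, so silence the warning urllib3 prints for each unverified request
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)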

url = 'https://www.dy2018.com/i/'
headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
}
page = 99400  # the page URL changes with this number
while page < 99403:

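    # verify=False skips certificate verification for this request
    response = requests.get(url + str(page) + '.html', headers=headers, verify=False)
    print('1>>>')  # progress marker, printed once per fetched page
    # response.encoding = 'gbk'
    content = response.content  # raw bytes; decoding is left to the parser
    # The pages are parsed as GBK (getting the encoding right took me quite a while)
    html = etree.HTML(content, parser=etree.HTMLParser(encoding='gbk'))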
    # Collect the text of every <p> inside the div with id="Zoom"
    result = html.xpath('//div[@id="Zoom"]//p/text()')
    for c in result:
        # with open('./img/ha.txt','a') as f:
        #     # f.write(str(c.encode('utf-8'))+'\n')
        #     f.write(c +'\n')
        print(c)
    page += 1
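
The commented-out file write in the loop opens the file without an explicit encoding, so Python uses the platform default, which on Windows may not be able to represent every character. Below is a minimal sketch of one way to handle this, not the method used above: decode the GBK bytes yourself, then append the lines to a file opened as UTF-8. The helper names `decode_page` and `save_lines`, the sample bytes, and the output file name are all made up for illustration.

```python
import os
import tempfile


def decode_page(raw_bytes, encoding='gbk'):
    """Decode raw page bytes; undecodable bytes are replaced instead of raising."""
    return raw_bytes.decode(encoding, errors='replace')


def save_lines(lines, path):
    """Append the non-empty lines to path as UTF-8 text, one per line."""
    with open(path, 'a', encoding='utf-8') as f:
        for line in lines:
            text = line.strip()
            if text:
                f.write(text + '\n')


if __name__ == '__main__':
    # Hypothetical sample standing in for response.content
    raw = '片名:示例\n年代:2018\n'.encode('gbk')
    lines = decode_page(raw).splitlines()
    out_path = os.path.join(tempfile.gettempdir(), 'movie_info.txt')
    save_lines(lines, out_path)
```

In the scraper, `save_lines(result, some_path)` could take the place of the `print(c)` loop, since `html.xpath(...)` already returns decoded strings there.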
