Dynamic Requests vs. Static Requests

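How to tell whether a page uses dynamic or static requests: most pages are static, so treat a page as static first. Only if you cannot get the data that way should you consider whether it is dynamic.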
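The "static first" check can be made concrete. Fetch the raw HTML without a browser, then look for a piece of text you can see on the rendered page. This is a minimal sketch, assuming plain `urllib` is acceptable for the fetch. The helper names `fetch_raw_html` and `is_probably_static` are hypothetical, and the `Mozilla/5.0` User-Agent is only an example.

```python
import html
import urllib.request


def fetch_raw_html(url: str, encoding: str = "utf-8") -> str:
    """Download a page without running any JavaScript (what requests/urllib see)."""
    # A browser-like User-Agent; many sites reject the default Python one
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode(encoding, errors="replace")


def is_probably_static(raw_html: str, expected_text: str) -> bool:
    """True if text visible in the browser already appears in the raw HTML.

    Found -> the data is server-rendered (static): parse the HTML directly.
    Missing -> it is probably filled in later by JavaScript/XHR (dynamic),
    so look for the underlying API request in the browser's Network tab.
    """
    # Unescape entities such as &amp; so they compare equal to the rendered text
    return expected_text in html.unescape(raw_html)
```

Typical use: `is_probably_static(fetch_raw_html(url), "a title you can see on the page")`.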

1. Dynamic request assignment

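1. http://top.baidu.com/buzz?b=1&fr=topindex

2. Scrape Baidu hot search (百度热搜) with the fields title, url, crawled_time.

3. Send the results to your own mailbox as a single email; the relevant library is smtplib.

4. Submit a screenshot of the code and a screenshot of it running.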
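Each result in step 2 is one record with three fields. The helper below builds such a record. It is a sketch that assumes the link text and `href` have already been pulled out of the page. `make_record` is a hypothetical name, and the `urljoin` call is only a precaution in case some links turn out to be relative.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urljoin


def make_record(title: str, href: str, base_url: str,
                now: Optional[datetime] = None) -> dict:
    """Build one result row with the fields title, url, crawled_time."""
    now = now or datetime.now()
    return {
        "title": (title or "").strip(),
        # urljoin leaves absolute links unchanged and resolves relative ones
        "url": urljoin(base_url, href),
        "crawled_time": now.strftime("%Y-%m-%d %H:%M:%S"),
    }
```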

Answer:

1. Locate the fields

import time
import smtplib  # standard-library SMTP client, used to send the email
from email.mime.text import MIMEText
from email.header import Header

import requests
from lxml import etree

url = 'http://top.baidu.com/buzz?b=1&fr=topindex'
r = requests.get(url, timeout=10)
r.encoding = 'gb2312'  # the page is GB2312-encoded; set it so r.text decodes correctly
selector = etree.HTML(r.text)
ls = selector.xpath('//a[@class="list-title"]')
# equivalent CSS selector (needs the cssselect package): selector.cssselect('a.list-title')

to_list = []  # one dict per hot-search entry
for ele in ls:
    se_dict = {
        'title': ele.text,
        'url': ele.get('href'),
        'crawled_time': time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()),
    }
    to_list.append(se_dict)  # append the dict to the list

# Send the results as an HTML email
msg_from = 'sender@qq.com'  # sender's QQ Mail address (placeholder)
passwd = 'your_authorization_code'  # sender's authorization code, not the login password (placeholder)
msg_to = 'receiver@qq.com'  # recipient's address (placeholder)
subject = "百度热搜风云榜"  # subject: "Baidu hot search list"

Email_html = ''
for index, item in enumerate(to_list):
    # one numbered link per entry: index, url, title
    string = '<p>%d. <a href="%s">%s</a></p>\n' % (index + 1, item['url'], item['title'])
    print(string)
    Email_html += string
print('Email body length (characters):', len(Email_html))

msg = MIMEText(Email_html, 'html', 'utf-8')  # UTF-8 can encode any title, unlike GBK
msg['Subject'] = Header(subject, 'utf-8')
msg['From'] = msg_from
msg['To'] = msg_to

try:
    # QQ Mail's SMTP server over SSL on port 465; the with-block closes the connection
    with smtplib.SMTP_SSL("smtp.qq.com", 465) as s:
        s.login(msg_from, passwd)
        s.sendmail(msg_from, msg_to, msg.as_string())
    print("Sent successfully")
except smtplib.SMTPException as e:
    print("Failed to send:", e)
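The loop above writes each entry into the HTML body by hand. A slightly more defensive sketch escapes titles and URLs first, because a `&`, `<` or quote in either could otherwise produce malformed HTML. For example, the Baidu URLs contain `&` in their query strings. `build_email_body` is a hypothetical helper name.

```python
import html


def build_email_body(rows):
    """Render rows of {'title', 'url'} as numbered HTML links, one per line."""
    lines = []
    for index, row in enumerate(rows, start=1):
        # Escape both fields so '&', '<' or quotes cannot break the markup
        url = html.escape(row["url"], quote=True)
        title = html.escape(row["title"])
        lines.append('<p>%d. <a href="%s">%s</a></p>' % (index, url, title))
    return "\n".join(lines)
```

The returned string can be passed straight to `MIMEText(body, 'html', 'utf-8')`.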

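Getting a QQ Mail authorization code

1. Log in to your mailbox and click Settings.

2. Click Account.

3. Click Generate authorization code.

4. Follow the prompts (fill in the form and send the verification message) to get the code.

The authorization code is the password the sending account uses for SMTP, i.e. the value passed to s.login(), not your normal mailbox login password.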
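The authorization code works like a password, so it is safer not to hard-code it in the script. One option is to read it from environment variables. This is a minimal sketch. The variable names `QQ_MAIL_USER` and `QQ_MAIL_AUTH_CODE` and the helper `load_mail_credentials` are hypothetical.

```python
import os


def load_mail_credentials(user_var="QQ_MAIL_USER", code_var="QQ_MAIL_AUTH_CODE"):
    """Read the sender address and SMTP authorization code from the environment.

    The default variable names are hypothetical; any names work.
    """
    user = os.environ.get(user_var)
    code = os.environ.get(code_var)
    if not user or not code:
        raise RuntimeError(f"set {user_var} and {code_var} before sending mail")
    return user, code
```

Usage: `msg_from, passwd = load_mail_credentials()`, then `s.login(msg_from, passwd)` as before.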

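2. Dynamic request assignment

1. https://movie.douban.com/subject/26266893/reviews?start=120

2. Scrape all Douban reviews of this film with the fields author, review content, recommendation stars and review time. If you are interested, you can add other fields.

3. You may run into anti-scraping measures, so keep your crawl rate low.

4. Submit a screenshot of the code and a screenshot of it running.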
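Point 3 usually comes down to two things: send a browser-like User-Agent and pause between pages. Below is a sketch of the pause. The User-Agent string is only an example, and the 2 to 5 second default range is an assumption, not a documented Douban limit. `polite_pause` is a hypothetical name.

```python
import random
import time

# Any realistic browser User-Agent string works; this one is only an example
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0 Safari/537.36")
}


def polite_pause(min_s: float = 2.0, max_s: float = 5.0, sleep=time.sleep) -> float:
    """Sleep a random interval in [min_s, max_s] seconds and return it.

    Random jitter gives a less regular request rhythm than a fixed delay.
    """
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Between pages one would call something like `requests.get(page_url, headers=HEADERS, timeout=10)` followed by `polite_pause()`.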


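Douban notes:

Set a browser User-Agent header.

0. Sort by time, which makes later incremental crawling easier.

1. Work out the total number of reviews.

2. When you reach the folded section, switch to a different way of locating elements.

3. If your IP gets blocked, solve the captcha by hand and the crawler can run for several hundred more pages; four or five rounds of this in total were enough to finish.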
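Notes 0 and 1 can be sketched together. Once the total count is known, every page URL follows from the `start` offset. With a newest-first listing, an incremental run can stop at the first review it has already seen. The sketch makes three assumptions: 20 reviews per page (consistent with `start=120` in the assignment URL), dates in `%Y-%m-%d %H:%M:%S` format, and a time-sorted listing URL copied from the page itself. The function names are hypothetical.

```python
import math
from datetime import datetime

PER_PAGE = 20  # assumption: 20 reviews per page, so start = 0, 20, 40, ...


def review_page_urls(subject_id: str, total: int, per_page: int = PER_PAGE) -> list:
    """List every page URL once the total number of reviews is known."""
    pages = math.ceil(total / per_page)
    return [f"https://movie.douban.com/subject/{subject_id}/reviews?start={i * per_page}"
            for i in range(pages)]


def take_new(reviews: list, last_seen: datetime) -> list:
    """Keep reviews newer than last_seen from a newest-first list.

    Because the list is sorted by time, the first review that is not newer
    marks where the previous run stopped, so everything after it is skipped.
    """
    fresh = []
    for review in reviews:
        if datetime.strptime(review["time"], "%Y-%m-%d %H:%M:%S") <= last_seen:
            break
        fresh.append(review)
    return fresh
```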

Getting rid of the captcha on the review pages

Step 1: set up the database
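The post does not say which database it uses. As an assumption, here is a SQLite sketch with one column per required field. It adds one extra column, `review_url`, as a unique key so that re-crawled pages do not create duplicate rows. That column, the file name and the function names are all hypothetical.

```python
import sqlite3


def init_db(path: str = "douban_reviews.db") -> sqlite3.Connection:
    """Create the reviews table if it does not exist yet and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               review_url  TEXT PRIMARY KEY,  -- unique per review, used to skip duplicates
               author      TEXT,
               content     TEXT,
               rating      INTEGER,           -- recommendation stars, 1-5 (NULL if unrated)
               review_time TEXT
           )"""
    )
    conn.commit()
    return conn


def save_review(conn: sqlite3.Connection, review: dict) -> bool:
    """Insert one review; return False if it was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO reviews "
        "VALUES (:review_url, :author, :content, :rating, :review_time)",
        review,
    )
    conn.commit()
    return cur.rowcount == 1
```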


Matching the reviews that are not in the folded section
