Scraping a table from Runoob (菜鸟教程): I needed the data but, being a crawler hobbyist, didn't want to copy it by hand, so I wrote a spider.

The goal is the first table on the page: the list of Python standard exceptions and their descriptions.
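The code runs as-is once its packages are installed (requests, pandas, beautifulsoup4, lxml, and an Excel writer such as openpyxl, which pandas uses for .xlsx files). At the end, the data is saved to an Excel spreadsheet.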
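Before running the script, it can help to confirm that the dependencies are importable. This is a minimal sketch that assumes the package set below, installed with `pip install requests pandas beautifulsoup4 lxml openpyxl`. Note that beautifulsoup4 is imported as `bs4`. The names `REQUIRED_MODULES` and `missing_modules` are hypothetical.

```python
import importlib.util

# Assumed import names of the scraper's dependencies:
# requests, pandas, beautifulsoup4 (imported as bs4), lxml (parser backend)
# and openpyxl (the engine pandas uses to write .xlsx files).
REQUIRED_MODULES = ["requests", "pandas", "bs4", "lxml", "openpyxl"]


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

`find_spec` only locates modules and does not import them, so the check is cheap and has no side effects.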
Target URL: https://www.runoob.com/python/python-exceptions.html

import requests
import pandas as pd
from bs4 import BeautifulSoup

# A browser-like User-Agent is enough for this page. The stale Cookie, Host and
# Baidu Referer copied from the browser are dropped, and Accept-Encoding is left
# to requests, which only advertises Brotli ('br') when it can decode it.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url1 = 'https://www.runoob.com/python/python-exceptions.html'

# Actually send the headers, bound the wait, and stop on HTTP errors
resp = requests.get(url1, headers=headers, timeout=10)
resp.raise_for_status()
html = BeautifulSoup(resp.content.decode('utf-8'), 'lxml')

# Walk the first <table> row by row: first cell = exception name, second = description.
# Pairing per row keeps names and descriptions aligned even if a cell contains
# nested tags or extra whitespace nodes.
names = []
descriptions = []
for row in html.table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) < 2:  # skip rows without two <td> cells, e.g. a <th> header row
        continue
    # get_text(strip=True) flattens nested tags and trims the \r\n padding
    names.append(cells[0].get_text(strip=True))
    descriptions.append(cells[1].get_text(strip=True))

df = pd.DataFrame({'exception': names, 'description': descriptions})

# Writing .xlsx needs an Excel engine such as openpyxl
df.to_excel('2.xlsx', index=False)
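You can test the row-by-row parsing without contacting the site by wrapping it in a function and running it on a small HTML string. This is a sketch. `table_to_dataframe`, the column names and the sample markup are hypothetical. It also uses Python's built-in `html.parser` instead of `lxml`, so it needs only bs4 and pandas.

```python
from bs4 import BeautifulSoup
import pandas as pd


def table_to_dataframe(html_text):
    """Turn the first two-column <table> in html_text into a DataFrame.

    Rows without at least two <td> cells (e.g. a <th> header row) are skipped.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    records = []
    for row in soup.table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue
        # get_text(strip=True) flattens nested tags such as <a> and trims \r\n
        records.append({
            "exception": cells[0].get_text(strip=True),
            "description": cells[1].get_text(strip=True),
        })
    return pd.DataFrame(records, columns=["exception", "description"])


# Hypothetical sample markup shaped like a two-column exception table
SAMPLE = """
<table>
  <tr><th>Name</th><th>Description</th></tr>
  <tr><td>BaseException</td><td>\r\nBase class of all exceptions\r\n</td></tr>
  <tr><td><a href="#">ZeroDivisionError</a></td><td>Division or modulo by zero</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_dataframe(SAMPLE))
```

The cell wrapped in a link and the cell padded with line breaks each still produce exactly one clean value, which keeps the two columns in step.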
