It's Christmas time once again, so Merry Christmas to everyone!
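First, install bs4 (the BeautifulSoup library) and requests; the script also imports xlwt to write the Excel file. The code follows. Different websites may use different encodings, so adjust the encoding setting to match the page, otherwise the Chinese text will come out garbled. Here we use http://www.hengexing.com/z/80844.html as the example.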
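from bs4 import BeautifulSoup
import requests
import xlwt

# Scrape some short greeting messages
excelTabel = xlwt.Workbook()  # create the Excel workbook object
sheet1 = excelTabel.add_sheet('平安夜短信祝福语')  # sheet name: "Christmas Eve SMS greetings"
nrows = 0  # next row to write in the sheet
url = ""
for num in range(1, 6):
    # Page 1 has no suffix; pages 2 to 5 are 80844_2.html ... 80844_5.html
    if num == 1:
        url = "http://www.hengexing.com/z/80844.html"
    else:
        url = "http://www.hengexing.com/z/80844_%d.html" % num
    r = requests.get(url)
    # This setting depends on the encoding the page uses
    r.encoding = 'gb2312'  # fixes the garbled Chinese text
    soup = BeautifulSoup(r.text, 'html.parser')
    listAA = soup.find_all("p")
    for text in listAA:
        print(text.getText())
        sheet1.write(nrows, 0, text.getText())  # one greeting per row, column 0
        nrows += 1
# Save once, after all pages have been processed
excelTabel.save("平安夜祝福语.xls")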
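
To see why the `r.encoding = 'gb2312'` line matters, here is a minimal offline sketch of the garbling problem. It uses only the standard library; the sample string and the `decode_page` helper are hypothetical names made up for illustration, not part of the script above. It assumes the wrong guess is ISO-8859-1, which is what requests falls back to when a text response declares no charset in its headers.

```python
def decode_page(raw: bytes, encoding: str) -> str:
    """Decode raw response bytes with the given encoding (hypothetical helper)."""
    return raw.decode(encoding)


# Hypothetical sample text, not taken from the site above.
sample = "圣诞节快乐"

# Bytes as a GB2312-encoded page would send them (2 bytes per character).
raw = sample.encode("gb2312")

# Wrong charset: every byte becomes its own Latin-1 character, so the text is garbled.
garbled = decode_page(raw, "iso-8859-1")

# Right charset: the original text comes back.
fixed = decode_page(raw, "gb2312")

print(garbled == sample)  # False
print(fixed == sample)    # True
```

If you are not sure which encoding a page uses, check the charset in its `<meta>` tag, or try `r.encoding = r.apparent_encoding`, which asks requests to guess from the response body. The guess is not always right, so it is worth printing a few lines to confirm.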