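Python in Practice (1): Scraping App Rankings from aso100
Background: With the New Year approaching, I need to review 2016 as a whole, which means comparing against others in the same industry and identifying trends.
Purpose: Scrape the app ranking list for the News category from aso100.com and export it to a file that Excel can open.
Note: This code is for learning and discussion only. Comments are welcome.
Topics covered:
1. BeautifulSoup, which is genuinely pleasant to use. Docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
2. Reading and writing CSV. Docs: https://docs.python.org/3.5/library/csv.html
3. String manipulation with split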
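As a quick illustration of the third point, here is a minimal sketch of pulling fields out of strings with `split`. The sample `href` and `heading` values are made up for the example and only assume a path-style link and a "rank.name" heading; the real page may differ. `partition` is shown as an alternative to slicing the result of `split`, since it cuts on the first dot only.

```python
# Hypothetical sample values, not taken from the site.
href = "/app/rank/appid/123456789/country/cn"
heading = "3.Example News 2.0"


def app_id_from_href(href):
    # The leading "/" produces an empty first element, so the ID is at index 4.
    return href.split("/")[4]


def split_heading(heading):
    # partition cuts at the first "." only, so dots inside the name survive.
    rank, _, name = heading.partition(".")
    return rank, name


print(app_id_from_href(href))
print(split_heading(heading))
# Slicing the split result keeps only the second segment and drops ".0".
print(heading.split(".")[1:2])
```

If the heading contains no dot at all, `partition` returns the whole text as the first element and an empty name, so the unpacking never fails.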
import requests
from bs4 import BeautifulSoup

# Ranking page: country cn, device iphone, free chart, genre 6009
newsurl = 'https://aso100.com/rank/index/country/cn/device/iphone/brand/free/genre/6009'
res = requests.get(newsurl)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")
# print(soup.prettify())
import csv

with open('C:/xxx.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['id', 'url', 'overall rank', 'category rank', 'app name', 'company'])
    for link in soup.find_all('div', class_="thumbnail"):
        # Overall rank stays '-' when the page does not show one
        total = '-'
        if len(link.h6.next_sibling.next_sibling) > 1:
            total = link.h6.next_sibling.next_sibling.contents[1].text
        # The app ID is element 4 of the href once it is split on '/'
        app_id = link.a['href'].split('/')[4]
        url = 'https://aso100.com' + link.a['href']
        # The h5 text is "<category rank>.<app name>"; cut at the first dot only
        rank, _, name = link.a.h5.text.partition('.')
        spamwriter.writerow([app_id, url, total, rank, name, link.a.h6.text])
print('Scraping finished')
# Check the result
with open('C:/xxx.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
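The space delimiter and `|` quote character used above interact in a way that is easy to miss: with `QUOTE_MINIMAL`, any field that contains a space is wrapped in `|`. Here is a minimal in-memory sketch of that round trip. The rows are made-up sample data, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Made-up sample rows; the last field contains a space on purpose.
rows = [
    ["id", "url", "name"],
    ["123456789", "https://example.com/app/123456789", "Example News"],
]

# newline="" keeps the csv module in charge of line endings, as with open().
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=" ", quotechar="|",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

# The raw text shows which fields were quoted.
raw = buffer.getvalue()
print(raw)

# Reading back with the same dialect settings restores the original fields.
buffer.seek(0)
parsed = list(csv.reader(buffer, delimiter=" ", quotechar="|"))
print(parsed)
```

If the file is meant to be opened directly in Excel, a comma delimiter with the default `"` quote character is probably the more convenient choice, since that is what Excel expects from a .csv file.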
Original post: http://blog.csdn.net/lanmao100/article/details/54025983
Please credit the source when reposting.