Boosting CSDN Blog Article View Counts with a Python Crawler

A fun little piece of Python: a small crawler that pads CSDN page views. Hits spaced too closely together are not counted toward the view count, so the script visits the page roughly once every 60 seconds.

The article URL used in the code is: https://blog.csdn.net/weixin_37228152/article/details/100753730

If you're interested, swap the url and code values below for your own article's address (or just run my code as-is and pad mine +_+).

(Side note: at one hit per minute, that's only 60 hits an hour; I worked out that passing 10,000 views would take about seven days and nights of continuous running -。-)
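A quick sanity check on that estimate, using only the numbers above:

hits_needed = 10_000           # target view count
seconds_per_hit = 60           # one counted hit per minute at best
days = hits_needed * seconds_per_hit / 86400
print(days)                    # ≈ 6.9 days of nonstop running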

Sample run:

[Screenshot: console output of the running script, printing the request count and the current read count]

The code:

import re
import time
import random
import requests
import urllib.request
from bs4 import BeautifulSoup

# Pretend to be a desktop Firefox browser
firefoxHead = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0"}
# Matches a dotted-quad IPv4 address (dots escaped so they match literally)
IPRegular = r"(([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])"
host = "https://blog.csdn.net"
url = "https://blog.csdn.net/weixin_37228152/article/details/{}"
code = ["100753730"]  # article IDs to visit; replace with your own
 
def parseIPList(url="http://www.xicidaili.com/"):
    """Scrape a free proxy-list page and return the IPv4 addresses found."""
    # NOTE: free proxy sites come and go; this one may no longer be reachable
    IPs = []
    request = urllib.request.Request(url, headers=firefoxHead)
    response = urllib.request.urlopen(request)
    soup = BeautifulSoup(response, "lxml")
    tds = soup.find_all("td")
    for td in tds:
        string = str(td.string)
        if re.search(IPRegular, string):
            IPs.append(string)
    return IPs
 
def PV(code):
    """Visit the article in a loop, switching to a random proxy each round."""
    s = requests.Session()
    s.headers = firefoxHead
    count = 0
    while True:
        count += 1
        print("asking for {} times".format(count), end="\t")
        IPs = parseIPList()
        # Pick a random proxy; the port is assumed to be 8080. Map both schemes,
        # since requests only uses the proxy whose key matches the URL scheme.
        proxy = "http://{}:8080".format(random.choice(IPs))
        s.proxies = {"http": proxy, "https": proxy}
        try:
            s.get(host)
            r = s.get(url.format(code))
        except requests.RequestException:
            print("proxy failed, retrying")
            continue
        soup = BeautifulSoup(r.text, "html.parser")
        spans = soup.find_all("span")
        print(spans[2].string)  # on this page layout, the third <span> holds the read count
        # Hits less than about a minute apart are not counted, so wait 60-75 s
        time.sleep(random.randint(60, 75))
 
def main():
    PV(code[0])


if __name__ == "__main__":
    main()
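
A practical caveat: parseIPList only scrapes IP addresses, and the loop assumes every proxy is alive and listening on port 8080, so many rounds will hit dead proxies. A minimal health-check sketch you could bolt on (check_proxy is a hypothetical helper, not part of the original script; it reuses the host, firefoxHead, and requests names defined above):

def check_proxy(ip, port=8080, timeout=5):
    """Return True if the proxy can fetch the blog host within the timeout."""
    proxy = {"http": "http://{}:{}".format(ip, port)}
    try:
        requests.get(host, proxies=proxy, headers=firefoxHead, timeout=timeout)
        return True
    except requests.RequestException:
        return False

# e.g. keep only the proxies that actually respond:
# IPs = [ip for ip in parseIPList() if check_proxy(ip)]

Filtering once per round costs a few extra seconds but avoids wasting a whole 60-second cycle on a request that was never going to land.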
