Web scrapers: Lagou, CNKI, QQ Mail, Qiushibaike, Tencent Careers, and more (updated irregularly)

Scraping city data from Lagou

import json
import random
import urllib.request

import jsonpath  # third-party package, not part of the standard library

# Pool of User-Agent headers; one is chosen at random for each run.
headers_list = [
    {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'},
    {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11'},
]
headers = random.choice(headers_list)

# Endpoint that returns Lagou's city search labels as JSON.
url = 'https://www.lagou.com/lbs/getAllCitySearchLabels.json'
request = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(request) as response:
    html = response.read().decode('utf-8')
print(html)

# Parse the response and collect every "name" value at any depth.
jsonobj = json.loads(html)
city_list = jsonpath.jsonpath(jsonobj, '$..name')

# ensure_ascii=False keeps Chinese city names readable, so the file is written as UTF-8.
content = json.dumps(city_list, ensure_ascii=False)
print(content)
with open('city1.json', 'w', encoding='utf-8') as file:
    file.write(content)
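
The expression '$..name' in the script above is JSONPath's recursive descent: it matches every value stored under the key "name" at any depth. As a rough illustration of what that selection does, here is a standard-library-only sketch. The helper collect_names and the sample data are hypothetical, invented for this example; the sample is not real Lagou output, and the helper is not how the jsonpath package is implemented.

```python
import json


def collect_names(node, key="name"):
    """Return every value stored under `key` at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: nested objects may contain further matches.
            found.extend(collect_names(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_names(item, key))
    return found


# Hypothetical sample, loosely shaped like a grouped city list (an assumption).
sample = json.loads(
    '{"content": {"data": {'
    '"A": [{"id": 1, "name": "Anqing"}], '
    '"B": [{"id": 2, "name": "Beijing"}, {"id": 3, "name": "Baoding"}]'
    '}}}'
)

names = collect_names(sample)
print(names)  # ['Anqing', 'Beijing', 'Baoding']
```

A hand-written walk like this can be handy when installing a third-party package is not an option, though it covers only this one pattern and none of the other JSONPath syntax.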
