Scraping Lagou Job Listings with Python and Writing Them to Excel

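Before we knew it, 2018 is almost two months old and Spring Festival is just around the corner. Is everyone else as unable to focus on work as I am? It's year-end: did you all collect so much bonus money that your hands got tired? Haha! If you didn't get a year-end bonus, don't be discouraged. Keep pushing, because a wave of job openings will be waiting for you after the holiday. With some time on my hands, I'll show you how to scrape job data with Python. The example scrapes Python job listings for Hangzhou from Lagou (lagou.com). Enough talk, here's the code: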
```python
import math
import time

import requests      # HTTP client
import xlsxwriter    # writes .xlsx files

# Headers copied from a browser visit to the Hangzhou "python" listing page.
# The Cookie is the author's own session from February 2018 and has most likely
# expired; replace it with one from your own browser session.
headers = {
    'User-Agent':"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
    'Referer':"https://www.lagou.com/jobs/list_python?px=default&city=%E6%9D%AD%E5%B7%9E",
    'Cookie':'user_trace_token=20170814104005-8c79ba9a-88b2-49f0-b55a-84283730e16a; LGUID=20170814104015-e71d7b3c-8099-11e7-af6d-525400f775ce; _ga=GA1.2.1544472457.1502678407; _gid=GA1.2.413489957.1517997531; JSESSIONID=ABAAABAACEFAACG097350D52438FF8E73B79D2C95CD3671; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1517997531; index_location_city=%E5%85%A8%E5%9B%BD; isCloseNotice=0; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1518053302; LGRID=20180208092823-5a881396-0c6f-11e8-af9a-5254005c3644'
}

# Ajax endpoint behind the job list; city=%E6%9D%AD%E5%B7%9E is URL-encoded "杭州" (Hangzhou)
url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&city=%E6%9D%AD%E5%B7%9E&needAddtionalResult=false&isSchoolJob=0'


def fetch_page(page):
    """POST the search form for one result page and return the decoded JSON."""
    res = requests.post(
        url=url,
        headers=headers,
        data={
            'first': 'false',
            'pn': page,        # page number, starting at 1
            'kd': 'Python',    # search keyword
        },
    )
    res.raise_for_status()     # fail loudly on HTTP errors instead of parsing an error page
    return res.json()


# Work out how many pages of results there are
def get_Job_page():
    position_result = fetch_page(1)['content']['positionResult']
    all_count = position_result['totalCount']          # total number of matching jobs
    single_pagecount = position_result['resultSize']   # jobs on the first page, i.e. the page size
    print(all_count, single_pagecount)
    if not single_pagecount:
        return 0
    # Ceiling division, so an exact multiple doesn't add an extra empty page
    return math.ceil(all_count / single_pagecount)


# Get the job records shown on one page of the list
def getJobList(page):
    return fetch_page(page)['content']['positionResult']['result']


workbook = xlsxwriter.Workbook('lagou.xlsx')
worksheet = workbook.add_worksheet()
worksheet.set_column('A:A', 20)    # widen column A (job title)


# Write one row. Called with only a row number, it writes the header labels:
# job title, salary, location, education, work experience, company name.
def write_excel(row=0, positionName='职位名', salary='薪水', city='工作地点',
                education='教育程度', workYear='工作经验', companyFullName='公司名'):
    worksheet.write(row, 0, positionName)
    worksheet.write(row, 1, salary)
    worksheet.write(row, 2, city)
    worksheet.write(row, 3, education)
    worksheet.write(row, 4, workYear)
    worksheet.write(row, 5, companyFullName)


try:
    write_excel(0)    # header row
    row = 1
    pages = get_Job_page()

    for page in range(1, pages + 1):
        for job in getJobList(page=page):
            write_excel(row, positionName=job['positionName'],
                        salary=job['salary'],
                        city=job['city'],
                        education=job['education'],
                        workYear=job['workYear'],
                        companyFullName=job['companyFullName'])
            row += 1
        print('Page %d written' % page)
        time.sleep(0.5)    # pause between requests so we don't hammer the server

    print('All pages written')
finally:
    workbook.close()    # always save the workbook, even if a request fails
```
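The paging in the script comes down to a bit of arithmetic plus a loop. Below is a minimal sketch that pulls both into plain functions so they can be checked offline, without touching Lagou. `page_count`, `crawl` and `fake_fetch` are hypothetical names; `fetch` stands in for whatever function sends the `requests.post(...)` form for a page number and returns `res.json()`, and the sketch assumes the response keeps the `content` → `positionResult` → `totalCount` / `resultSize` / `result` nesting the script reads.

```python
import time
from typing import Callable, Dict, List


def page_count(total: int, page_size: int) -> int:
    """Number of result pages via ceiling division (0 if the page size is 0)."""
    if page_size <= 0:
        return 0
    return -(-total // page_size)   # integer equivalent of math.ceil(total / page_size)


def crawl(fetch: Callable[[int], Dict], delay: float = 0.5) -> List[Dict]:
    """Collect job records from every page using the hypothetical `fetch(page)`.

    Assumes `fetch` returns a dict shaped like the JSON the script parses:
    {'content': {'positionResult': {'totalCount': ..., 'resultSize': ..., 'result': [...]}}}
    """
    first = fetch(1)['content']['positionResult']
    pages = page_count(first['totalCount'], first['resultSize'])
    jobs = list(first['result'])           # reuse page 1 instead of requesting it twice
    for page in range(2, pages + 1):
        time.sleep(delay)                  # pause between requests
        jobs.extend(fetch(page)['content']['positionResult']['result'])
    return jobs


# Hypothetical stand-in for the real request: 7 made-up jobs served 3 per page.
FAKE_JOBS = [{'positionName': 'Job %d' % i} for i in range(7)]


def fake_fetch(page: int) -> Dict:
    start = (page - 1) * 3
    chunk = FAKE_JOBS[start:start + 3]
    return {'content': {'positionResult': {
        'totalCount': len(FAKE_JOBS), 'resultSize': len(chunk), 'result': chunk}}}


print(page_count(7, 3), page_count(30, 15))   # 3 2
print(len(crawl(fake_fetch, delay=0)))         # 7
```

Ceiling division is the design choice worth noting: `int(total / size) + 1` gives the right answer most of the time, but it asks for one extra, empty page whenever `totalCount` is an exact multiple of `resultSize` (30 jobs at 15 per page gives 3 instead of 2). Reusing the first response also saves one request.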

Here is how the scraped data is laid out in Excel:


[Figure: screenshot of the scraped job data in Excel]
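Each row of the sheet holds six fields of one job record, in a fixed column order under a header row. One way to keep the header labels and the JSON keys from drifting apart is a single ordered list of (key, label) pairs, sketched below. `COLUMNS`, `header_row` and `job_to_row` are hypothetical names; the JSON keys and header labels are the ones the script uses, and turning a missing key into an empty string is an assumption of this sketch, not something the script does.

```python
from typing import Dict, List, Tuple

# Column order of the sheet: (JSON key in each job record, header label in row 0).
# Labels: job title, salary, location, education, work experience, company name.
COLUMNS: List[Tuple[str, str]] = [
    ('positionName', '职位名'),
    ('salary', '薪水'),
    ('city', '工作地点'),
    ('education', '教育程度'),
    ('workYear', '工作经验'),
    ('companyFullName', '公司名'),
]


def header_row() -> List[str]:
    """Labels for the first row of the sheet."""
    return [label for _, label in COLUMNS]


def job_to_row(job: Dict) -> List[str]:
    """Values of one job record in column order; a missing key becomes ''."""
    return [job.get(key, '') for key, _ in COLUMNS]


# Hypothetical record with made-up values, not real Lagou data
sample = {'positionName': 'Python Developer', 'salary': '10k-20k', 'city': 'Hangzhou',
          'education': 'Bachelor', 'workYear': '1-3 years', 'companyFullName': 'Example Co.'}
print(job_to_row(sample))
# ['Python Developer', '10k-20k', 'Hangzhou', 'Bachelor', '1-3 years', 'Example Co.']
```

With xlsxwriter, each list can then be written in one call with `worksheet.write_row(row, 0, values)`: `header_row()` into row 0 and `job_to_row(job)` into rows 1, 2, ..., which gives the same layout as the `write_excel` calls in the script.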
