Python Warm-Up: Scraping New-Home Prices and Addresses for a Local City

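The script fetches pages with the requests library and parses them with XPath to collect new-home prices. The rough steps are:

  1. Get the target URL and inspect the page source;
  2. Spoof the User-Agent, send the request, and receive the response;
  3. Parse the tag data;
  4. Loop over the parsed data, extract it, and save it.

The complete script: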
import os

import requests
from lxml import etree

# Send a browser-like User-Agent so the request looks like an ordinary visit
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
}
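file_path = './python_learning/xiantao58.txt'
# Create the output directory if it does not exist yet
os.makedirs(os.path.dirname(file_path), exist_ok=True)
# Write as UTF-8 so Chinese text is saved correctly whatever the platform default is
f = open(file_path, "w", encoding="utf-8")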
for page in range(1,3):  # listing pages 1 and 2
    url = 'https://xiantao.58.com/xinfang/loupan/all/p{0:d}/'.format(page)
    # A timeout keeps the script from hanging if the server never answers
    response = requests.get(url=url, headers=headers, timeout=10)
    response.encoding = "utf-8"
    page_text = response.text
    # print(page_text)
    tree = etree.HTML(page_text)
    # Each child div of the listing container is one property
    div_list = tree.xpath('//div[@class= "key-list imglazyload"]/div')
    # print(div_list)
    for div in div_list:
        # xpath() returns a list; [0] takes the first match and raises IndexError if there is none
        house_name = div.xpath('./div/a[@class="lp-name"]/span/text()')[0]
        # print(house_name)
        house_price = div.xpath('./a[@class="favor-pos"]/p/span/text()')[0]
        # print(house_price)
        house_address = div.xpath('./div/a[@class="address"]/span/text()')[0]
        # Strip brackets and parentheses, then drop the first whitespace-separated token
        cleaned = house_address.translate(str.maketrans('', '', '[]()'))
        address = "".join(cleaned.split()[1:])
        # print(house_name,house_price,house_address,sep="\t")
        # house_address = house_address.encode('iso-8859-1').decode('gbk')
        f.write(house_name+"\t"+house_price+"\t"+address+"\n")
        # break
        print(house_name+" saved successfully!")
f.close()
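
The `[0]` indexing in the script assumes that every listing has a name, a price, and an address. Below is a minimal sketch of a more forgiving variant, assuming that a missing field should simply become an empty string. The helper names `first_or_default` and `clean_address` are hypothetical (they are not part of the script), and the sample strings are made up for illustration.

```python
def first_or_default(matches, default=""):
    """Return the first XPath match, or `default` when the list is empty."""
    return matches[0] if matches else default


# Translation table that deletes the four bracket characters
_BRACKETS = str.maketrans("", "", "[]()")


def clean_address(raw):
    """Remove brackets, then drop the first whitespace-separated token.

    This mirrors the address cleaning done in the script.
    """
    tokens = raw.translate(_BRACKETS).split()
    return "".join(tokens[1:])


# Made-up inputs, shaped like the lists that xpath(...) returns
print(first_or_default(["示例楼盘"]))
print(repr(first_or_default([])))
print(clean_address("[ 仙桃 城区 ] 示例路1号"))
```

In the loop, each `div.xpath(...)[0]` could then be written as `first_or_default(div.xpath(...))`, so one incomplete listing would not abort the whole run.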

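The script builds each tab-separated line by hand and closes the file manually. As an alternative sketch, assuming tab-separated output is still what is wanted, the standard `csv` module combined with a `with` block takes care of quoting and closes the file even if an error occurs. The function name `save_rows`, the header row, and the sample row are hypothetical and only serve the demonstration.

```python
import csv
import os
import tempfile


def save_rows(rows, path):
    """Write (name, price, address) tuples as tab-separated UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "price", "address"])  # header row
        writer.writerows(rows)


# Made-up row, written to a temporary directory for the demo
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
save_rows([("示例楼盘", "6000", "城区示例路1号")], demo_path)

with open(demo_path, encoding="utf-8") as f:
    print(f.read())
```

Collecting the rows in a list first and writing them once at the end also keeps the scraping loop free of file handling.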