Python web scraping

Scraping Beijing rental listings from Lianjia with Python

1. Import the packages:

import requests
from bs4 import BeautifulSoup

2. Fetch the content of the page at the given URL and return a soup object:

def get_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    return soup
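In practice, Lianjia may reject requests that carry the default python-requests User-Agent. A hedged variant of get_page with a browser-like header and a timeout (both are assumptions added here, not part of the original) could look like:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical browser-like header; the exact value is an assumption,
# added because many sites block the default python-requests User-Agent.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def get_page(url):
    # A timeout prevents the scraper from hanging on a slow response.
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()          # fail loudly on HTTP errors
    return BeautifulSoup(response.text, 'lxml')
```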

3. Wrap the link collection in a function that gathers the links to every rental detail page on a listing page and returns them as a list:

def get_links(link_url):
    soup = get_page(link_url)
    content_list = soup.find('div', class_="content__list")
    links_div = content_list.find_all('div', class_="content__list--item")
    links = []
    for div in links_div:
        a = div.find('a')
        if a is not None:            # find() returns None (not -1) when no <a> exists
            links.append(a.get('href'))
    return links

4. Use get_links(link_url) to collect the links to all rental pages from the Lianjia listing page:

url = 'https://bj.lianjia.com/zufang/'
get_links(url)
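To see what get_links extracts without hitting the network, the same parsing logic can be run against a minimal inline fragment. The sample HTML below is hypothetical; it only mirrors the class names (content__list, content__list--item) assumed above:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mirroring the listing-page markup assumed above.
sample_html = '''
<div class="content__list">
  <div class="content__list--item"><a href="/zufang/BJ111.html">房源A</a></div>
  <div class="content__list--item"><a href="/zufang/BJ222.html">房源B</a></div>
  <div class="content__list--item"><p>no link here</p></div>
</div>
'''

soup = BeautifulSoup(sample_html, 'html.parser')   # stdlib parser, no lxml needed
content_list = soup.find('div', class_='content__list')
links = []
for div in content_list.find_all('div', class_='content__list--item'):
    a = div.find('a')
    if a is not None:                # items without an <a> are skipped
        links.append(a.get('href'))

print(links)                         # ['/zufang/BJ111.html', '/zufang/BJ222.html']
```

On the real site the hrefs may be relative paths like these, in which case they would need 'https://bj.lianjia.com' prepended before fetching.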

5. Extract the house details from a rental page:

house_url = 'https://bj.lianjia.com/zufang/BJ2360825321093611520.html?nav=0&unique_id=ee73be87-3abd-477e-af89-a3f320eda277zufang1602224217718'
soup = get_page(house_url)
price = int(soup.find('div', class_='content__aside--title').span.text)
good_house = soup.find('p', class_='content__aside--tags').text.replace('\n', '')
aside_items = soup.find('ul', class_='content__aside__list').find_all('li')
house_way = aside_items[0].text[5:]          # strip the leading "租赁方式:" label
house_type = aside_items[1].text[5:]         # strip the leading "房屋类型:" label
house_floor = aside_items[2].text[5:][:-2]   # strip the label and trailing characters
risk_warning = aside_items[3].text[5:]       # strip the leading "风险提示:" label
info = {
    'price': price,
    'highlights': good_house,
    'rental type': house_way,
    'house type': house_type,
    'orientation/floor': house_floor,
    'risk warning': risk_warning
}
info
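The field extraction above can be wrapped into a reusable function so it can be applied to every link returned by get_links. A sketch, tested here against a hypothetical inline fragment that mirrors the detail-page class names assumed above (the real page may differ):

```python
from bs4 import BeautifulSoup

def get_house_info(soup):
    # Same field extraction as above, wrapped for reuse across pages.
    price = int(soup.find('div', class_='content__aside--title').span.text)
    tags = soup.find('p', class_='content__aside--tags').text.replace('\n', '')
    items = soup.find('ul', class_='content__aside__list').find_all('li')
    return {
        'price': price,
        'highlights': tags,
        'rental type': items[0].text[5:],        # strip "租赁方式:" label
        'house type': items[1].text[5:],         # strip "房屋类型:" label
        'orientation/floor': items[2].text[5:],  # strip "朝向楼层:" label
        'risk warning': items[3].text[5:],       # strip "风险提示:" label
    }

# Hypothetical fragment mirroring the detail-page markup assumed above.
sample_detail = '''
<div class="content__aside--title"><span>7500</span>元/月</div>
<p class="content__aside--tags">
必看好房
</p>
<ul class="content__aside__list">
  <li>租赁方式:整租</li>
  <li>房屋类型:2室1厅1卫 70㎡</li>
  <li>朝向楼层:南/中楼层(6层)</li>
  <li>风险提示:请核实房源信息</li>
</ul>
'''

info = get_house_info(BeautifulSoup(sample_detail, 'html.parser'))
print(info['price'], info['rental type'])    # 7500 整租
```

With this in place, a full crawl would simply loop: for link in get_links(url), fetch the page with get_page and collect get_house_info(soup) into a list.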
