Python Web Scraping Practice Notes 2-2: Scraping Phone Numbers

Exercise: a two-stage workflow

Step 1: collect the target URLs and store them in the database (mongoconn.py)

Step 2: read the URLs back from the database and extract the target information from each page (homework2_2.py)
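
The two steps above can be sketched without any network or database access. This is only an illustration of the control flow, not the original script: the function names `collect_urls` and `scrape_details`, the `url_store` list standing in for the MongoDB collection, and the lambdas standing in for page downloads are all hypothetical.

```python
def collect_urls(list_pages, extract_links, store):
    """Stage 1: walk the list pages and save every detail-page URL."""
    for page in list_pages:
        for url in extract_links(page):
            store.append({"url": url})


def scrape_details(store, fetch_detail):
    """Stage 2: read the saved URLs back and build one record per URL."""
    return [fetch_detail(doc["url"]) for doc in store]


# Demo with stand-ins: a list plays the database, lambdas play the network.
url_store = []
collect_urls(
    ["list-1", "list-2"],
    lambda page: [page + "/item-a", page + "/item-b"],
    url_store,
)
records = scrape_details(url_store, lambda url: {"url": url, "number": None})
print(len(url_store), len(records))
```

Keeping the stages separate means a failed detail request does not force the list pages to be crawled again, because the URLs are already saved.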

Source code

mongoconn.py

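#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# connect mongodb

import pymongo


def mongoset(db, table):
    # Connect to the local MongoDB server and return the collection db.table
    client = pymongo.MongoClient('localhost', 27017)
    data = client[db]
    sheet = data[table]
    return sheet


def mongoinsert(table, data):
    # Insert a list of documents into the given collection
    table.insert_many(data)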
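
One thing to watch for when calling `mongoinsert`: as far as I know, pymongo's `insert_many` does not accept an empty list, so a list page that yields no links would raise an error. Below is a hedged variant that skips empty batches. The return value and the `FakeCollection` class are my own additions for the demo, not part of the original mongoconn.py; `FakeCollection` only imitates the single method used here so the example runs without a MongoDB server.

```python
def mongoinsert(table, data):
    """Insert documents, skipping the call when there is nothing to insert."""
    docs = list(data)
    if not docs:
        return 0
    table.insert_many(docs)
    return len(docs)


class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo collection (demo only)."""

    def __init__(self):
        self.docs = []

    def insert_many(self, docs):
        self.docs.extend(docs)


urls = FakeCollection()
inserted = mongoinsert(
    urls,
    [{"url": "http://example.com/1"}, {"url": "http://example.com/2"}],
)
skipped = mongoinsert(urls, [])  # empty batch: insert_many is never called
print(inserted, skipped)
```

With a real collection, `urls` would instead come from `mongoset(db, table)`.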

homework2_2.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Scrape phone numbers
# step1 get all urls, save them to db
# step2 get detail info by accessing those urls

from bs4 import BeautifulSoup
import requests
import time
from mongoconn import mongoset, mongoinsert


def get_soup(url):
    source = requests.get(url)
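
The listing ends right after the request is sent. Judging by its name, `get_soup` most likely returns a parsed BeautifulSoup object, so here is a minimal sketch under that assumption. The `timeout` parameter, the `raise_for_status()` check, and the choice of the built-in `html.parser` are my assumptions and do not come from the original code.

```python
from bs4 import BeautifulSoup
import requests


def get_soup(url, timeout=10):
    """Download a page and return it parsed as a BeautifulSoup tree."""
    # timeout keeps a stalled server from hanging the crawl (assumed value)
    source = requests.get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of parsing an error page
    source.raise_for_status()
    # html.parser ships with Python; lxml could be used if it is installed
    return BeautifulSoup(source.text, "html.parser")
```

Step 2 would then call `get_soup` once per URL read from the database, perhaps with a short `time.sleep` between requests to avoid hammering the site.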
