1. Web Crawler Basics: Fetching an HTML Page

1. Development environment: Python 2.7

2. Code for fetching a Tieba page

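# -*- coding:utf-8 -*-
"""
    Fetch a page of the Python Tieba forum
"""

# Import the required module
import urllib2

# URL of the first page of the Python Tieba forum
# (the trailing space inside the original string has been removed)
url = "http://tieba.baidu.com/f?kw=download_file&ie=utf-8&pn=0"

# Send the request and get the response object
response = urllib2.urlopen(url)

# Read the response body into the content variable
content = response.read()
print content

# Save the raw bytes to a local file; "wb" avoids newline translation on Windows
with open("python_1.html", "wb") as f:
    f.write(content)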
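
The script above only runs on Python 2.7, because urllib2 no longer exists in Python 3 (its functionality moved into urllib.request). For readers on Python 3, the same two steps (fetch the page, save the bytes) might look roughly like the sketch below. The helper names fetch_page and save_page and the 10-second timeout are assumptions made for illustration; they are not part of the original script.

```python
# Python 3 sketch of the same fetch-and-save flow.
# fetch_page / save_page are hypothetical helper names, not from the original post.
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the raw response body for url as bytes."""
    # strip() guards against stray whitespace around the URL string
    with urlopen(url.strip(), timeout=timeout) as response:
        return response.read()


def save_page(content, path):
    """Write the bytes to path unchanged (binary mode, no re-encoding)."""
    with open(path, "wb") as f:
        f.write(content)
```

With these helpers, the original behaviour would be save_page(fetch_page(url), "python_1.html"), where url is the same Tieba address used above. Note that response.read() returns bytes in Python 3, so the file is opened in binary mode rather than decoding the page first.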
