Basic Steps and Learning Resources for Scraping Web Data with Python

Basic steps for scraping web data with Python:

1. Fetch the data: Requests, urllib

2. Parse the data: BeautifulSoup, XPath

3. Save the data: MongoDB, MySQL, SQLite, CSV, Excel, etc.
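
The three steps above can be sketched end to end with only the standard library. This is a minimal sketch, not the author's own method: it swaps in urllib.request for Requests and html.parser for BeautifulSoup/XPath so it runs without third-party packages, and it saves to CSV and SQLite from the storage list. The function names (fetch_html, parse_links, write_csv, save_sqlite), the choice to extract links from <a href> tags, and the example URL are illustrative assumptions.

```python
from __future__ import annotations

import csv
import io
import sqlite3
import urllib.request
from html.parser import HTMLParser


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Step 1, fetch: download a page with urllib (standing in for Requests)."""
    # Some sites reject the default Python user agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2, parse: collect (text, href) for every <a> tag that has an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> we are currently inside
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_links(html: str) -> list[tuple[str, str]]:
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def write_csv(rows: list[tuple[str, str]], f) -> None:
    """Step 3a, save: write a header plus one row per link to a text file object.

    For a real file, open it with newline="" and encoding="utf-8-sig"
    so that Excel displays Chinese text correctly.
    """
    writer = csv.writer(f)
    writer.writerow(["text", "href"])
    writer.writerows(rows)


def save_sqlite(rows: list[tuple[str, str]], db_path: str = ":memory:") -> sqlite3.Connection:
    """Step 3b, save: insert the links into a SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS links (text TEXT, href TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    # Real use (hypothetical URL): html = fetch_html("https://www.example.com/")
    html = '<ul><li><a href="/news/1">News 1</a></li><li><a href="/news/2">News 2</a></li></ul>'
    rows = parse_links(html)
    buf = io.StringIO()
    write_csv(rows, buf)
    print(buf.getvalue())
    conn = save_sqlite(rows)  # pass a file name such as "links.db" to persist
    print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])
    conn.close()
```

To scrape a real page, call fetch_html() on a site you are permitted to crawl and pass the result to parse_links(). With Requests and BeautifulSoup installed, the same structure applies: requests.get(url).text for step 1 and BeautifulSoup(html, "html.parser").find_all("a", href=True) for step 2.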



Related documentation and resources:

Awesome Python (Chinese edition, curated by jobbole): https://github.com/jobbole/awesome-python-cn

Selenium with Python (Chinese documentation): https://python-selenium-zh.readthedocs.io/zh_CN/latest/

Requests official documentation (Chinese): http://cn.python-requests.org/zh_CN/latest/

Requests quickstart: http://docs.python-requests.org/zh_CN/latest/user/quickstart.html

urllib official documentation: https://docs.python.org/3/library/urllib.html

Python Standard Library documentation: https://docs.python.org/3/library/index.html

Learn Python the Hard Way (Chinese translation): https://www.kancloud.cn/kancloud/learn-python-hard-way/49863

Python 3 tutorial (RUNOOB.COM): http://www.runoob.com/python3/python3-tutorial.html

Python tutorial (Liao Xuefeng's official site): https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000

HTTP tutorial (RUNOOB.COM): http://www.runoob.com/http/http-tutorial.html

Common pip commands: https://blog.csdn.net/ouyanggengcheng/article/details/72821092

XPath tutorial (w3school): http://www.w3school.com.cn/xpath/

Web scraping from beginner to expert: parsing web pages with XPath: https://zhuanlan.zhihu.com/p/25572729

Python scraping tools, part 3: XPath syntax and the lxml library: https://blog.csdn.net/freeking101/article/details/64461574

Python regular expressions (RUNOOB.COM): http://www.runoob.com/python/python-reg-expressions.html

Regular expressions in 30 minutes: http://deerchao.net/tutorials/regex/regex.htm

Beautiful Soup tutorial (Chinese): http://www.pythonclub.org/modules/beautifulsoup/start

Beautiful Soup 4.2.0 documentation (Chinese translation): https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/

MongoDB 64-bit downloads for Windows (all versions): http://dl.mongodb.org/dl/win32/x86_64

MongoDB Manual, installing MongoDB on Windows: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-windows/

ECharts / pyecharts: https://blog.csdn.net/coraline_m/article/details/51418263

            http://pyecharts.org/#/zh-cn/api

            http://echarts.baidu.com/index.html

NumPy: http://www.numpy.org/

            https://www.yiibai.com/numpy/

matplotlib: http://python.jobbole.com/89077/
