Web crawler libraries for static websites

What you need to crawl static websites.

Python 3 libraries:

requests
Connects to websites over HTTP and handles URL details such as query parameters.
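As a rough illustration of how requests is typically used in a crawler, the sketch below defines a small fetch helper and, separately, prepares a request without sending it so the final URL can be inspected offline. The helper names `fetch_html` and `build_request`, the `example.com` URL, and the User-Agent string are assumptions made for this example, not part of the original guide.

```python
import requests

# Hypothetical User-Agent string, used only for this example.
HEADERS = {"User-Agent": "example-crawler/0.1"}


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (performs a network call)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def build_request(url, params=None):
    """Prepare a GET request without sending it, so the final URL can be inspected."""
    return requests.Request("GET", url, params=params, headers=HEADERS).prepare()


# No network access is needed for this part: the request is only prepared.
prepared = build_request("https://example.com/search", params={"q": "python"})
print(prepared.url)
```

Setting a timeout and calling `raise_for_status()` are common precautions so that a slow or failing page does not silently stall or corrupt a crawl.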

Beautiful Soup
A Python library for pulling data out of HTML and XML files. You can navigate the parse tree directly or search it using CSS selectors or regular expressions (re).
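The three access styles mentioned above (navigating the tree, CSS selectors, and regular expressions) might look like the following minimal sketch. The HTML snippet and all variable names are made up for illustration, and the built-in `html.parser` backend is assumed so that no extra parser needs to be installed.

```python
import re
from bs4 import BeautifulSoup

# A small, invented HTML document used only for this example.
html = """
<html><body>
  <h1 id="title">Example page</h1>
  <ul class="links">
    <li><a href="/docs/intro.html">Intro</a></li>
    <li><a href="/docs/api.html">API</a></li>
    <li><a href="https://example.org/">External</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Navigating the tree: follow tag names as attributes.
title = soup.body.h1.get_text()

# 2. Searching with a CSS selector: every <a> inside <ul class="links">.
css_links = [a["href"] for a in soup.select("ul.links a")]

# 3. Searching with a regular expression: only hrefs that start with /docs/.
doc_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"^/docs/"))]

print(title)
print(css_links)
print(doc_links)
```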

urllib
a package that collects several modules for working with URLs:
urllib.request
for opening and reading URLs
urllib.parse
for parsing URLs
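In a crawler, urllib.parse is often used to turn relative links into absolute ones and to decide whether a link stays on the same site. The sketch below shows one way to do that; the helper names `normalize_link` and `is_internal` and the `example.com` URLs are assumptions for illustration only.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_link(base_url, href):
    """Resolve href against base_url and drop any #fragment."""
    absolute = urljoin(base_url, href)
    return urldefrag(absolute).url


def is_internal(base_url, url):
    """Return True when url points at the same host as base_url."""
    return urlparse(url).netloc == urlparse(base_url).netloc


base = "https://example.com/docs/index.html"

page_link = normalize_link(base, "guide.html#install")
image_link = normalize_link(base, "../images/logo.png")
parts = urlparse("https://example.com/docs/page.html?lang=en#top")

print(page_link)
print(image_link)
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
print(is_internal(base, "https://example.org/other"))
```

Stripping fragments before queueing a link helps avoid fetching the same page more than once under slightly different URLs.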

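Typical imports:

import requests                        # HTTP requests
from bs4 import BeautifulSoup as bs    # HTML/XML parsing
import urllib.request                  # opening and reading URLs
import urllib.parse                    # parsing URLs
import re                              # regular expressions

Get started with my GitHub guide: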

https://github.com/HoweZZH/EasyScraping

References:

https://www.crummy.com/software/BeautifulSoup/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
https://realpython.com/blog/python/web-scraping-with-scrapy-and-mongodb/
