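Preparation: in PyCharm (a Python development environment that has to be installed separately), install the packages this project needs. The imports are: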
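import re                             # regular expressions for matching text
from bs4 import BeautifulSoup         # HTML parsing
import urllib.request, urllib.error   # sending requests and handling request errors
import xlwt                           # writing Excel (.xls) files
import pymysql                        # connecting to a MySQL database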
1. Define a function that requests the specified web page (press F12 in the browser to inspect it)
def askURL(url):
    # Send a browser-style User-Agent so the request looks like it comes from a regular browser
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=headers)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
    except Exception as e:
        # An HTTPError carries a status code; a URLError carries a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    # An empty string is returned if the request failed
    return html
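
To try out the request-and-decode pattern from askURL without touching a real site, the sketch below may help. It is only an illustration under stated assumptions: the name `fetch_html`, the `timeout` and `encoding` parameters, the shortened User-Agent, and the `data:` URL used as a stand-in page are all hypothetical and not part of the original code.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical shortened header, for illustration only
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url, timeout=10, encoding="utf-8"):
    """Return the decoded body of `url`, or "" if the request fails."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    try:
        # The with-block closes the response once the body has been read
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status
        print("HTTP error:", e.code, e.reason)
    except urllib.error.URLError as e:
        # The request could not be completed (bad scheme, DNS failure, ...)
        print("URL error:", e.reason)
    return ""


# A data: URL stands in for a real page, so no network access is needed
sample_page = "<html><body>hello</body></html>"
sample_url = "data:text/html;charset=utf-8," + urllib.parse.quote(sample_page)
html = fetch_html(sample_url)
print(html)
```

Catching `HTTPError` before `URLError` matters because `HTTPError` is a subclass of `URLError`; the timeout keeps a stalled connection from blocking the crawler indefinitely.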
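2. Define a data-fetching function that takes the base URL as a parameter and scrapes (that is, parses) the required data from pages under that URL. Here the following fields are extracted for each movie:
"movie detail link", "image link", "Chinese movie title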