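Task:
Scrape the Chinese and English versions of the Zen of Python from the "Hello, Spider-Man!" page (你好,蜘蛛侠!) and print them.
"Hello, Spider-Man!" is a dynamic web page. URL:
https://localprod.pandateacher.com/python-manuscript/hello-spiderman/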
Requirements:
The content must be fetched in two ways:
Using selenium only
Using selenium together with BeautifulSoup
Method 1: selenium only
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://localprod.pandateacher.com/python-manuscript/hello-spiderman/')
time.sleep(2)  # give the page time to load

# Fill in both form fields and submit
teacher = driver.find_element(By.ID, 'teacher')
teacher.send_keys('必须是吴枫呀')
assistant = driver.find_element(By.ID, 'assistant')
assistant.send_keys('都喜欢')
button = driver.find_element(By.CLASS_NAME, 'sub')
button.click()
time.sleep(2)  # wait for the result page to render

# Print the heading and text of every class="content" block
contents = driver.find_elements(By.CLASS_NAME, 'content')
for content in contents:
    print(content.find_element(By.TAG_NAME, 'h1').text)
    print(content.find_element(By.TAG_NAME, 'p').text)
driver.close()
Method 2: selenium together with BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://localprod.pandateacher.com/python-manuscript/hello-spiderman/')
time.sleep(2)  # give the page time to load

# Fill in both form fields and submit
teacher = driver.find_element(By.ID, 'teacher')
teacher.send_keys('必须是吴枫呀')
assistant = driver.find_element(By.ID, 'assistant')
assistant.send_keys('都喜欢')
button = driver.find_element(By.CLASS_NAME, 'sub')
button.click()
time.sleep(2)  # wait for the result page to render

# From here on, grab the rendered page source and parse it with BeautifulSoup
pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')
contents = soup.find_all(class_='content')
for content in contents:
    print(content.find('h1').text)
    print(content.find('p').text)
driver.close()
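The hand-off in method 2 can be tried without a browser, because BeautifulSoup only needs an HTML string, which is what driver.page_source returns. The snippet below is a minimal sketch that parses a made-up fragment. SAMPLE_HTML, its text, and the helper name extract_sections are placeholders; the markup merely assumes that the result page wraps each version in an element with class content containing an h1 and a p, as the selectors above imply.

```python
from bs4 import BeautifulSoup

# Placeholder markup standing in for driver.page_source (assumed structure).
SAMPLE_HTML = """
<div class="content"><h1>Title one</h1><p>First body text.</p></div>
<div class="content"><h1>Title two</h1><p>Second body text.</p></div>
"""


def extract_sections(page_source):
    """Return (heading, paragraph) text pairs for every class="content" block."""
    soup = BeautifulSoup(page_source, 'html.parser')
    return [
        (block.find('h1').get_text(strip=True),
         block.find('p').get_text(strip=True))
        for block in soup.find_all(class_='content')
    ]


for heading, paragraph in extract_sections(SAMPLE_HTML):
    print(heading)
    print(paragraph)
```

Keeping the parsing in a function that takes a plain string means the browser automation and the HTML extraction can be checked separately.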