Python Web Scraping, Level 9 Project: The Zen of Python

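Requirements:
Scrape the Chinese and English versions of the Zen of Python from the web page "Hello, Spider-Man!" (你好,蜘蛛侠!) and print them.
"Hello, Spider-Man!" is a dynamic web page. URL: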
https://localprod.pandateacher.com/python-manuscript/hello-spiderman/

Task requirements:
Retrieve the content in two ways:
Using Selenium only
Using Selenium together with BeautifulSoup

Method 1: Selenium only

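from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Launch a local Chrome browser controlled by Selenium.
driver = webdriver.Chrome()

# Open the page and give the dynamic content time to load.
driver.get('https://localprod.pandateacher.com/python-manuscript/hello-spiderman/')
time.sleep(2)

# Fill in the two form fields on the landing page.
teacher = driver.find_element(By.ID, 'teacher')
teacher.send_keys('必须是吴枫呀')

assistant = driver.find_element(By.ID, 'assistant')
assistant.send_keys('都喜欢')

# Submit the form and wait for the result page to render.
button = driver.find_element(By.CLASS_NAME, 'sub')
button.click()
time.sleep(2)

# Collect every element with class "content"; each has a heading (h1) and a body (p).
contents = driver.find_elements(By.CLASS_NAME, 'content')

for content in contents:
    print(content.find_element(By.TAG_NAME, 'h1').text)
    print(content.find_element(By.TAG_NAME, 'p').text)

# Close the browser and end the WebDriver session.
driver.quit()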
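
The script above pauses with a fixed time.sleep(2), which is either too long or too short depending on how fast the page renders. Selenium ships WebDriverWait for this purpose; the sketch below is only a dependency-free stand-in that shows the same polling idea. The name wait_until and its parameters are hypothetical and are not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(poll_interval)
```

In the scraper, the second time.sleep(2) could then be replaced by passing a lambda that looks up the 'content' elements, since an empty result list is falsy and the loop keeps polling until at least one element appears.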


Method 2: Selenium with BeautifulSoup

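from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup

# Launch a local Chrome browser controlled by Selenium.
driver = webdriver.Chrome()

# Open the page and give the dynamic content time to load.
driver.get('https://localprod.pandateacher.com/python-manuscript/hello-spiderman/')
time.sleep(2)

# Fill in the two form fields on the landing page.
teacher = driver.find_element(By.ID, 'teacher')
teacher.send_keys('必须是吴枫呀')

assistant = driver.find_element(By.ID, 'assistant')
assistant.send_keys('都喜欢')

# Submit the form and wait for the result page to render.
button = driver.find_element(By.CLASS_NAME, 'sub')
button.click()
time.sleep(2)

# From here on, grab the rendered page source and parse it with BeautifulSoup.
pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')
contents = soup.find_all(class_='content')

for content in contents:
    print(content.find('h1').text)
    print(content.find('p').text)

# Close the browser and end the WebDriver session.
driver.quit()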
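
Because the BeautifulSoup step only needs an HTML string, it can be pulled into a function and tried without opening a browser. The following is a minimal sketch under that assumption: extract_sections and SAMPLE_HTML are hypothetical names, and the sample markup is invented to resemble the h1/p structure the script reads, not copied from the real page.

```python
from bs4 import BeautifulSoup


def extract_sections(page_source):
    """Return (heading, paragraph) text pairs for each element with class 'content'."""
    soup = BeautifulSoup(page_source, 'html.parser')
    sections = []
    for block in soup.find_all(class_='content'):
        heading = block.find('h1')
        paragraph = block.find('p')
        # Skip blocks missing either tag instead of raising AttributeError.
        if heading is None or paragraph is None:
            continue
        sections.append((heading.get_text(strip=True), paragraph.get_text(strip=True)))
    return sections


# Invented markup for illustration only; the second block has no <p> tag.
SAMPLE_HTML = """
<div class="content"><h1>Title A</h1><p>Body A</p></div>
<div class="content"><h1>Title B</h1></div>
<div class="content"><h1>Title C</h1><p>Body C</p></div>
"""

if __name__ == '__main__':
    for heading, paragraph in extract_sections(SAMPLE_HTML):
        print(heading)
        print(paragraph)
```

With Selenium in the loop, the same function would be called as extract_sections(driver.page_source) after the form has been submitted.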

