Selenium automated testing with Python: written-test and interview project practice 7, book download

Downloading the smtebooks IT book list

smtebooks is among the most convenient sites for English-language IT books. Its catalog covers Programming & IT, Business, Magazines, Ebooks, History, Medical, Art, Non-fiction, Academic, Textbooks, Cooking, SEO, Science & Math, Travel & Tourism, and more.


Task: crawl all books listed at https://smtebooks.net/category/programming-it.

  • Reference answer
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: free DingTalk group 21745728; QQ groups 144081101, 567351477
# CreateDate: 2018-10-22
import time
import re
from selenium import webdriver
import pandas as pd

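def find_address(browser):
    """Collect book links from the paginated category listing."""
    valids = []
    url_base = 'https://smtebooks.net/Category/programming-it?page='
    # Only pages 1 and 2 are visited here; widen the range to crawl the whole category.
    for page in range(1, 3):
        time.sleep(2)  # pause between page loads
        browser.get(url_base + str(page))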
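        # Assumption: the original pattern lost its HTML context when this page was
        # extracted (a bare r'(.*?)' only yields empty strings). The download script
        # below takes split('/')[2] of each saved path, so paths of the form
        # /book/<id>/<slug> are assumed here; adjust to the site's real markup.
        books = re.findall(r'href="(/book/[^"]+)"', browser.page_source)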
        print(books)
        if not books:
            break  # stop at the first page with no matches
        valids.extend(books)
    return valids

browser = webdriver.Chrome()
results = find_address(browser)

df = pd.DataFrame(results)
df.to_csv('address.csv')
browser.quit()

The book list is fairly large. A copy has already been downloaded and saved as: smtebooks IT类书籍列表 2018-10-22.csv
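
The script above pulls links out of `browser.page_source` with a regular expression. A minimal sketch of that extraction step, runnable without a browser, might look like the following. The `/book/<id>/<slug>` link shape, the `BOOK_LINK` pattern, the `extract_book_paths` helper, and the sample HTML are all assumptions made for illustration and are not taken from the site's real markup.

```python
import re

# Assumption: book links look like href="/book/<id>/<slug>".
BOOK_LINK = re.compile(r'href="(/book/\d+/[^"]+)"')


def extract_book_paths(html):
    """Return the unique book paths found in html, in first-seen order."""
    seen = {}
    for path in BOOK_LINK.findall(html):
        seen.setdefault(path, None)  # dict keys keep insertion order
    return list(seen)


# Hypothetical listing fragment: the same book is linked twice (title and cover).
sample = '''
<a href="/book/101/learning-python">Learning Python</a>
<a href="/book/101/learning-python"><img src="cover.jpg"></a>
<a href="/book/202/go-in-action">Go in Action</a>
<a href="/category/programming-it?page=2">Next</a>
'''

print(extract_book_paths(sample))
# ['/book/101/learning-python', '/book/202/go-in-action']
```

Deduplicating at this stage keeps the saved CSV from listing a book once per link on the page.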

Downloading the IT books from smtebooks

  • Using the book list downloaded above, download the books from smtebooks
  • Reference answer
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Discussion: free DingTalk group 21745728; QQ groups 144081101, 567351477
# CreateDate: 2018-10-20

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

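def download(url, driver):
    """Open the book's getfile page and click its 'Download Book' link."""
    driver.get(url)
    print(url)
    # Use the driver that was passed in rather than the global `browser`.
    driver.find_element(By.LINK_TEXT, 'Download Book').click()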


output = r"d:\down"
options = webdriver.ChromeOptions()
# Skip image loading and save downloads into `output`.
prefs = {"profile.managed_default_content_settings.images": 2,
         "download.default_directory": output}
# Reuse an existing Chrome profile directory.
options.add_argument(r"user-data-dir=C:\Users\andrew\AppData\Local\Google\Chrome\User Data\Default")
options.add_experimental_option("prefs", prefs)
# `options=` replaces the deprecated `chrome_options=` keyword.
browser = webdriver.Chrome(options=options)
browser.maximize_window()
browser.implicitly_wait(25)  # wait up to 25 s for elements to appear

df = pd.read_csv('address2.csv', index_col=0)
print(df.head())

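for i in range(len(df)):
    row = df.iloc[i]
    path = row.iloc[0]  # positional access to the saved link path
    url = 'https://smtebooks.us' + path
    # The third '/'-separated component of the path is used as the getfile id.
    url2 = 'https://smtebooks.net/getfile/' + path.split('/')[2]
    print(url)
    download(url2, browser)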
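
The loop above builds the download URL from the third `/`-separated component of each saved path. A small sketch of that derivation, with a guard for malformed rows, could look like this. The helper name `getfile_url` and the sample path are hypothetical; only the `https://smtebooks.net/getfile/` prefix and the `split('/')[2]` step come from the script.

```python
def getfile_url(book_path, base='https://smtebooks.net/getfile/'):
    """Build the getfile URL from a saved path such as '/book/<id>/<slug>'."""
    parts = book_path.split('/')
    # '/book/101/slug'.split('/') -> ['', 'book', '101', 'slug']
    if len(parts) < 3 or not parts[2]:
        raise ValueError('unexpected book path: %r' % book_path)
    return base + parts[2]


# Hypothetical path, for illustration only.
print(getfile_url('/book/101/learning-python'))
# https://smtebooks.net/getfile/101
```

Raising on a short path makes a bad CSV row fail with a readable message instead of an `IndexError` in the middle of the loop.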

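Note: failures such as the Drive quota being exceeded, the file being too large, or the file being flagged for abuse are not handled.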
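
One way to keep a single failed book from stopping the whole run is to wrap each download and record what went wrong. The following is only a sketch under stated assumptions: `download_all` and `fake_download` are hypothetical names, and `fake_download` stands in for the Selenium call so the example runs without a browser. With Selenium, a missing 'Download Book' link would surface as `NoSuchElementException`; whether the quota and abuse pages show up that way is not confirmed by the source.

```python
def download_all(urls, download_one):
    """Call download_one(url) for each URL and collect failures instead of stopping."""
    failures = []
    for url in urls:
        try:
            download_one(url)
        except Exception as exc:  # e.g. selenium's NoSuchElementException
            failures.append((url, type(exc).__name__))
    return failures


def fake_download(url):
    """Stand-in for the browser step: pretend the second book has no link."""
    if url.endswith('/2'):
        raise LookupError('no "Download Book" link')


urls = ['https://smtebooks.net/getfile/1',
        'https://smtebooks.net/getfile/2',
        'https://smtebooks.net/getfile/3']
failed = download_all(urls, fake_download)
print(failed)
# [('https://smtebooks.net/getfile/2', 'LookupError')]
```

In the real script, `download_one` could be something like `lambda u: download(u, browser)`, and the returned list could be written to a CSV for a later retry.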

Reference:

https://github.com/joseprrm/smtebooks-DownloadAll
