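First, install chromedriver
Reference: https://blog.csdn.net/tymatlab/article/details/78649727
Method 1: download the chromedriver binary directly and add it to the PATH
1. Download chromedriver. Check your Chrome browser version first; mine was 83.
Download URL: https://npm.taobao.org/mirrors/chromedriver/83.0.4103.39/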
$ mv /Users/chelsea/Downloads/chromedriver /usr/local/bin/
$ export PATH=$PATH:/usr/local/bin
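Note that each chromedriver release supports only certain Chrome versions, so the driver has to match your browser. Downloading directly from the official site can also be difficult, so I recommend the mirror at
https://npm.taobao.org/mirrors/chromedriver/
Unzip the download and add its location to the system environment variables. The exact steps differ between operating systems; search for your system name plus the keyword to find a matching tutorial.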
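As a quick sanity check of the setup above, the following is a minimal sketch (not part of the original tutorial) that looks for chromedriver on the PATH and compares major version numbers. The helper names locate_driver, major_version and versions_match are hypothetical, and the version strings in the usage lines are illustrative examples rather than output captured from a real machine.

```python
import re
import shutil


def locate_driver(name="chromedriver", search_path=None):
    """Return the full path of the driver if it is found on PATH, else None."""
    # shutil.which searches the *directories* listed in PATH, which is why
    # PATH must contain /usr/local/bin rather than the binary itself.
    return shutil.which(name, path=search_path)


def major_version(version_text):
    """Extract the major version from text such as '83.0.4103.39'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version, driver_version):
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    print("chromedriver found at:", locate_driver())
    # Illustrative version strings (assumed, not real command output).
    print(versions_match("Google Chrome 83.0.4103.61", "ChromeDriver 83.0.4103.39"))
```

If locate_driver() returns None, the directory holding chromedriver is most likely missing from PATH.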
1. Load the required modules
import urllib
import time
from lxml import etree
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
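2. Build the URL from the keywords
Enter a few keywords by hand and work out how the URL is constructed.
The base PubMed URL is 'https://pubmed.ncbi.nlm.nih.gov/'. After searching for 'pulmonary arterial hypertension','congeital heart disease', the URL becomes 'https://pubmed.ncbi.nlm.nih.gov/?term=%27pulmonary+arterial+hypertension%27%2C%27congeital+heart+disease%27'. Here %27 is the encoded single quote, + stands for a space, and %2C is the encoded comma. So the URL has two parts: a fixed prefix, 'https://pubmed.ncbi.nlm.nih.gov/?term=', and the search keywords joined by the string %2C. For the keywords passed in, we can therefore build the URL that the search returns as follows.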
keyword = '%2C'.join(['pulmonary arterial hypertension','congeital heart disease'])
start_url = 'https://pubmed.ncbi.nlm.nih.gov/?term='
url = start_url + keyword
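The join above leaves the spaces inside each keyword unencoded and relies on the browser to fix them. As a hedged alternative, here is a minimal sketch that encodes the whole query with the standard library's urllib.parse.quote_plus; the function name build_search_url is hypothetical and not from the original post.

```python
from urllib.parse import quote_plus

START_URL = "https://pubmed.ncbi.nlm.nih.gov/?term="


def build_search_url(keywords):
    """Build a PubMed search URL from a list of keyword strings."""
    # Join the keywords with a literal comma, then percent-encode the whole
    # query: quote_plus turns spaces into '+' and ',' into '%2C'.
    return START_URL + quote_plus(",".join(keywords))


if __name__ == "__main__":
    print(build_search_url(["pulmonary arterial hypertension",
                            "congeital heart disease"]))
```

Passing keywords that already carry their own single quotes reproduces the %27 sequences seen in the browser's address bar.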
3. Create the browser object
browser = webdriver.Chrome()
browser.get(url)
Once the three steps above run successfully, a Google Chrome window opens automatically.
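The modules loaded in step 1 include WebDriverWait, expected_conditions, By and TimeoutException, which the three steps do not use yet. The following is a minimal sketch of how they might fit together, assuming we only want to wait until the page's body element exists; the function name fetch_page_source and the 10-second default timeout are my own assumptions, not part of the original steps.

```python
def fetch_page_source(url, timeout=10):
    """Open url in Chrome, wait for the page body, return the HTML or None."""
    # Imported inside the function so this file can still be loaded on a
    # machine where selenium has not been installed yet.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Block until a <body> element is present or the timeout expires.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        return browser.page_source
    except TimeoutException:
        # The page did not load in time.
        return None
    finally:
        # Always close the browser window, even after an error.
        browser.quit()
```

Calling fetch_page_source(url) with the URL built in step 2 should return the page HTML as a string, which can then be handed to lxml's etree for parsing.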
Method 2: install chromedriver with brew
1. Install chromedriver with brew
$ brew install chromedriver
2. After the installation finishes, run again:
from selenium import webdriver
driver = webdriver.Chrome()
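However, this method did not work for me. I tried to fix the homebrew/cask problem shown in the message below, ran into several pitfalls, and still did not succeed.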
(base) Cheng-MacBook-Pro:MacOS chelsea$ brew install chromedriver
Updating Homebrew...
Error: No available formula with the name "chromedriver"
It was migrated from homebrew/core to homebrew/cask.
You can access it again by running:
brew tap homebrew/cask
And then you can install it by running:
brew cask install chromedriver
The record follows. Although I did not solve the problem, I still learned a few things along the way.
(base) Cheng-MacBook-Pro:~ chelsea$ brew tap homebrew/cask
Updating Homebrew...
==> Tapping homebrew/cask
Cloning into '/usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask'...
fatal: unable to access 'https://github.com/Homebrew/homebrew-cask/': LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 54
Error: Failure while executing; `git clone https://github.com/Homebrew/homebrew-cask /usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask` exited with 128.
(base) Cheng-MacBook-Pro:~ chelsea$ ping github.com
PING github.com (13.229.188.59): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
^C
--- github.com ping statistics ---
12 packets transmitted, 0 packets received, 100.0% packet loss
(base) Cheng-MacBook-Pro:~ chelsea$ sudo vim /private/etc/hosts
(base) Cheng-MacBook-Pro:~ chelsea$ ping github.com
PING github.com (192.30.253.112): 56 data bytes
76 bytes from 120.80.173.241: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 51a9 0 0000 01 01 85b9 192.168.100.15 192.30.253.112
Request timeout for icmp_seq 0
76 bytes from 120.80.173.241: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 1681 0 0000 01 01 c0e1 192.168.100.15 192.30.253.112
Request timeout for icmp_seq 1
76 bytes from 120.80.173.241: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 1ea6 0 0000 01 01 b8bc 192.168.100.15 192.30.253.112
Request timeout for icmp_seq 2
76 bytes from 120.80.173.241: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 81c5 0 0000 01 01 559d 192.168.100.15 192.30.253.112
Request timeout for icmp_seq 3
76 bytes from 120.80.173.241: Time to live exceeded
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 43a2 0 0000 01 01 93c0 192.168.100.15 192.30.253.112
^C
--- github.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
Reference material
https://www.jianshu.com/p/dd996cdcc3f7
Alternative IPs (entries for the hosts file)
151.101.185.194 github.global.ssl.fastly.net
192.30.253.112 github.com
151.101.184.133 assets-cdn.github.com
151.101.184.133 avatars0.githubusercontent.com
151.101.112.133 avatars1.githubusercontent.com
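My attempt to work around homebrew-cask: manually download homebrew-cask (which is written in Ruby) from GitHub and put it in the location where the tap would have been cloned.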
(base) Cheng-MacBook-Pro:~ chelsea$ cd /usr/local/Homebrew/Library/Taps/homebrew/
(base) Cheng-MacBook-Pro:homebrew chelsea$ ls
homebrew-core
(base) Cheng-MacBook-Pro:homebrew chelsea$ mv /Users/chelsea/Downloads/homebrew-cask /usr/local/Homebrew/Library/Taps/homebrew/
(base) Cheng-MacBook-Pro:homebrew chelsea$ ls
homebrew-cask homebrew-core
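I started with Method 2 and, after hitting a pile of pitfalls, still could not get chromedriver configured. Along the way I found that the selenium module also supports other browsers, such as the Safari that ships with the Mac, and I found a tutorial for setting it up. It did not work well for me either, though I am not sure whether my setup was at fault: Safari could only be opened once at a time, and it could not be started again until the existing window was closed (in short, no multiple sessions), which is not smart at all. In the end I finished the setup with Method 1, and Chrome does not have this problem.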
https://blog.csdn.net/xqhadoop/article/details/77892796
Enable it:
(base) Cheng-MacBook-Pro:~ chelsea$ safaridriver --enable
Password:
(base) Cheng-MacBook-Pro:~ chelsea$
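So why did I not use Method 1 from the start? Actually I did, but I used it wrongly, mainly because of how I set up the environment.
At first I could not download the file from the official site and did not know a mirror existed, so I downloaded it through a remote server. While doing so I followed other tutorials (mostly written for Windows) and set the environment variable incorrectly, so the driver could not be called. In any case, after stepping into every pitfall I finally understood it.
Tinker while you are young, but it is still best to search widely and learn from the experience and mistakes of those who came before!!!
I also found a fairly good Chinese-language document: http://selenium-python-zh.readthedocs.io/en/latest/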