Installing Selenium and the Google Chrome Driver on CentOS: Steps and Problems Encountered

Introduction

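Selenium is mainly used for automated testing, but it can also be used to scrape data. Because it drives a real browser, pages render as they would for an ordinary visitor, so data can be extracted without running into most of the blocking that plain HTTP scrapers hit. This post walks through installing Selenium and the browser driver on CentOS and records the problems encountered along the way.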

Environment

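CentOS 7.4, Selenium 3.13.0, Google Chrome, Gecko Driver. This post uses Google Chrome as the example; the process for Gecko Driver is similar. Users behind the Great Firewall are advised to use Gecko Driver. Those who have their own proxy can consider the Google Chrome driver.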

Installation steps

  1. Create the google-chrome.repo file.

    vi /etc/yum.repos.d/google-chrome.repo

    Enter the following content in the file:

[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl.google.com/linux/linux_signing_key.pub
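
As a quick sanity check on the repo file above, its contents can be parsed with Python's standard `configparser`, since yum `.repo` files use an INI-style layout. This is only a minimal sketch: the helper name `check_repo` and the list of required keys are assumptions made for illustration, not something yum itself defines.

```python
import configparser

# The repo definition from the step above, inlined so the sketch is self-contained.
REPO_TEXT = """\
[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/x86_64
enabled=1
gpgcheck=1
gpgkey=https://dl.google.com/linux/linux_signing_key.pub
"""

# Keys this sketch expects in every section (an illustrative choice).
REQUIRED_KEYS = ("name", "baseurl", "enabled", "gpgcheck", "gpgkey")


def check_repo(text):
    """Return {section: [missing keys]} for an INI-style repo definition."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: [key for key in REQUIRED_KEYS if key not in parser[section]]
        for section in parser.sections()
    }


print(check_repo(REPO_TEXT))
```

For the file shown above, every section should come back with an empty list of missing keys. A typo in a key name would show up as a missing entry.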

  2. Run the yum update.

yum update

If all goes well, yum lists the packages that need updating.
In my case, the following problem came up during the run:

---> Package fontpackages-filesystem.noarch 0:1.44-8.el7 will be installed
---> Package google-chrome-stable.x86_64 0:68.0.3440.84-1 will be installed
--> Processing Dependency: libappindicator3.so.1()(64bit) for package: google-chrome-stable-68.0.3440.84-1.x86_64
---> Package graphite2.x86_64 0:1.3.10-1.el7_3 will be installed
---> Package lcms2.x86_64 0:2.6-3.el7 will be installed
---> Package libXxf86vm.x86_64 0:1.1.4-1.el7 will be installed
---> Package libgusb.x86_64 0:0.2.9-1.el7 will be installed
---> Package libsoup.x86_64 0:2.56.0-4.el7_4 will be installed
--> Processing Dependency: glib-networking(x86-64) >= 2.38.0 for package: libsoup-2.56.0-4.el7_4.x86_64
---> Package libxshmfence.x86_64 0:1.2-1.el7 will be installed
---> Package mesa-libgbm.x86_64 0:17.0.1-6.20170307.el7 will be installed
---> Package mesa-libglapi.x86_64 0:17.0.1-6.20170307.el7 will be installed
---> Package stix-fonts.noarch 0:1.1.0-5.el7 will be installed
--> Running transaction check
---> Package glib-networking.x86_64 0:2.50.0-1.el7 will be installed
--> Processing Dependency: gsettings-desktop-schemas for package: glib-networking-2.50.0-1.el7.x86_64
---> Package google-chrome-stable.x86_64 0:68.0.3440.84-1 will be installed
--> Processing Dependency: libappindicator3.so.1()(64bit) for package: google-chrome-stable-68.0.3440.84-1.x86_64
--> Running transaction check
---> Package google-chrome-stable.x86_64 0:68.0.3440.84-1 will be installed
--> Processing Dependency: libappindicator3.so.1()(64bit) for package: google-chrome-stable-68.0.3440.84-1.x86_64
---> Package gsettings-desktop-schemas.x86_64 0:3.22.0-1.el7 will be installed
--> Finished Dependency Resolution
Error: Package: google-chrome-stable-68.0.3440.84-1.x86_64 (google-chrome)
Requires: libappindicator3.so.1()(64bit)
You could try using --skip-broken to work around the

First I ran the following command:

yum --enablerepo=extras install epel-release

The output was as follows:

Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Package epel-release-7-9.noarch already installed and latest version
Nothing to do

This shows that epel-release was already installed. So I went on to install the missing library:

yum install libappindicator-gtk3

The same error was still reported, however, so I simply reinstalled epel-release, which solved the problem:

yum --enablerepo=extras reinstall epel-release
yum install libappindicator-gtk3

With that, the error reported during yum update was resolved.
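
When dependency resolution fails like this, the useful part of the long yum output is the `Requires:` line under `Error: Package:`. The sketch below pulls those pairs out of captured output. The function name `missing_requirements` is hypothetical, and the parsing assumes the output layout shown in this post.

```python
import re

# A few lines copied from the yum output shown above.
YUM_OUTPUT = """\
--> Finished Dependency Resolution
Error: Package: google-chrome-stable-68.0.3440.84-1.x86_64 (google-chrome)
Requires: libappindicator3.so.1()(64bit)
"""


def missing_requirements(output):
    """Collect (package, requirement) pairs from yum 'Error: Package' blocks."""
    pairs = []
    package = None
    for line in output.splitlines():
        line = line.strip()
        match = re.match(r"Error: Package: (\S+)", line)
        if match:
            # Remember which package the following Requires: lines belong to.
            package = match.group(1)
        elif line.startswith("Requires:") and package is not None:
            pairs.append((package, line[len("Requires:"):].strip()))
    return pairs


for package, requirement in missing_requirements(YUM_OUTPUT):
    print(package, "needs", requirement)
```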


  3. Install Google Chrome

yum install google-chrome-stable

Unfortunately, another problem appeared. The error output was:

Total size: 52 M
Installed size: 187 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/google-chrome/packages/google-chrome-stable-68.0.3440.84-1.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 7fac5991: NOKEY
Retrieving key from https://dl-ssl.google.com/linux/linux_signing_key.pub

GPG key retrieval failed: [Errno 14] curl#7 - "Failed to connect to 2404:6800:4008:c00::5d: Network is unreachable"

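Judging from the error, the download of the signing key was blocked at the network level: yum tried to reach the key server at an IPv6 address and the network was unreachable. So I downloaded the package directly and installed it locally instead.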

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
yum -y localinstall google-chrome-stable_current_x86_64.rpm

The installation then completed.
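
The address in the GPG error is an IPv6 literal, which hints that this host may simply have no working IPv6 route. The sketch below extracts the address from the error line and reports its IP version using the standard `ipaddress` module. The helper name `failed_address` is made up for this example, and the regular expression assumes the curl message format seen above.

```python
import ipaddress
import re

# The error line copied from the yum output above.
ERROR_LINE = (
    'GPG key retrieval failed: [Errno 14] curl#7 - '
    '"Failed to connect to 2404:6800:4008:c00::5d: Network is unreachable"'
)


def failed_address(error_line):
    """Return the IP address curl could not reach, or None if none is found."""
    match = re.search(r"Failed to connect to (\S+): ", error_line)
    if match is None:
        return None
    try:
        return ipaddress.ip_address(match.group(1))
    except ValueError:
        return None  # a hostname rather than a literal address


address = failed_address(ERROR_LINE)
if address is not None:
    print(address, "is IPv%d" % address.version)
```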


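  4. Install chromedriver
    Download URL: https://chromedriver.storage.googleapis.com/index.html?path=2.41/
    This post uses version 2.41. Download it, unzip it, and place the binary in a directory on the PATH environment variable. If you are lazy, you can also drop it straight into the bin directory of the local Python installation.
    Once it is in place, open a command line and run the following command to check that it prints valid output: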

chromedriver

The output is as follows:

Starting ChromeDriver 2.41.578700 (2f1ed5f9343c13f73144538f15c00b370eda6706) on port 9515
Only local connections are allowed.

This shows that chromedriver started correctly and the installation succeeded.
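
The same check can be scripted. The sketch below uses `shutil.which` to see whether a `chromedriver` binary is on the PATH and parses the version out of the banner line shown above. The helper `parse_driver_version` is a hypothetical name, and the pattern assumes the "Starting ChromeDriver ..." banner format of the 2.41 release.

```python
import re
import shutil

# The first banner line printed by chromedriver in the step above.
BANNER = (
    "Starting ChromeDriver 2.41.578700 "
    "(2f1ed5f9343c13f73144538f15c00b370eda6706) on port 9515"
)


def parse_driver_version(banner):
    """Extract the version number from a 'Starting ChromeDriver ...' line."""
    match = re.search(r"Starting ChromeDriver (\d+(?:\.\d+)+)", banner)
    return match.group(1) if match else None


# shutil.which returns the full path if the binary is on PATH, else None.
print("chromedriver on PATH:", shutil.which("chromedriver"))
print("version in banner:", parse_driver_version(BANNER))
```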

  5. Install selenium
Selenium is a standard Python package, so install it directly with pip.

pip install selenium

  6. Start the local spider program
While the program was starting, the following error appeared:

File "xxx-sy.py", line 384, in
browser = webdriver.Chrome(chrome_options=chrome_options)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.41.578700 (2f1ed5f9343c13f73144538f15c00b370eda6706),platform=Linux 3.10.0-693.5.2.el7.x86_64 x86_64)

So I switched to the command line to test whether the google-chrome command itself works:

google-chrome

The command printed the following:

[29574:29574:0803/145908.944672:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.

The error message shows that, when Chrome is launched as root, the --no-sandbox startup argument has to be added.
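
Since the sandbox complaint only applies when running as root, one option is to add the flag conditionally. A minimal sketch, assuming a Unix-like system where `os.geteuid()` is available; the helper name `sandbox_args` is invented for this example.

```python
import os


def sandbox_args(euid):
    """Return the extra Chrome arguments needed for the given effective UID."""
    # Chrome refuses to start as root (UID 0) unless --no-sandbox is passed.
    return ["--no-sandbox"] if euid == 0 else []


# os.geteuid() is available on Unix-like systems such as CentOS.
print(sandbox_args(os.geteuid()))
```

Running the spider under an ordinary user account is the other way around this error, and it keeps Chrome's sandbox enabled.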

Starting the spider then failed again, with the following output:

Fri, 03 Aug 2018 15:04:51 xx-xx.py[line:183] INFO category, style, create dir:/export/xx/spider/xx/
Traceback (most recent call last):
File "taobao-sexy.py", line 428, in
total_images = spide_one_action(page_url, folder_path)
File "taobao-sexy.py", line 264, in spide_one_action
brw= create_brw()
File "taobao-sexy.py", line 309, in create_brw
brw = webdriver.Chrome(chrome_options=chrome_options)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 251, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/export/home/anaconda3/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.41.578700 (2f1ed5f9343c13f73144538f15c00b370eda6706),platform=Linux 3.10.0-693.5.2.el7.x86_64 x86_64)

After some investigation, it turned out that the following Chrome argument also has to be set at startup:

--disable-dev-shm-usage
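
This flag tells Chrome to stop using /dev/shm for shared memory and use a temporary directory instead, which helps on machines where that mount is small. The original investigation does not say how large /dev/shm was here, so the sketch below is just one way to look. The 512 MiB threshold and both function names are assumptions for illustration, not values documented by Chrome.

```python
import os
import shutil


def shm_total_mib(path="/dev/shm"):
    """Return the total size of the shared-memory mount in MiB."""
    return shutil.disk_usage(path).total / (1024 * 1024)


def looks_small(total_mib, threshold_mib=512):
    """True if the mount is below the threshold (an assumed cut-off)."""
    return total_mib < threshold_mib


# Only query the mount if it exists on this machine.
if os.path.isdir("/dev/shm"):
    size = shm_total_mib()
    print("/dev/shm: %.0f MiB, small: %s" % (size, looks_small(size)))
```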

  7. The complete set of Chrome startup arguments is as follows:

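 from selenium import webdriver

 chrome_options = webdriver.ChromeOptions()
 chrome_options.add_argument('--headless')               # the server has no display
 chrome_options.add_argument('--no-sandbox')             # required when running as root
 chrome_options.add_argument('--disable-gpu')            # turn off GPU hardware acceleration
 chrome_options.add_argument('--disable-dev-shm-usage')  # do not use /dev/shm for shared memory
 browser = webdriver.Chrome(chrome_options=chrome_options)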

With that, you can happily go off and play with a Selenium-driven Chrome.
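
To round this off, here is a sketch of how the arguments might be wrapped for reuse, with the browser always shut down afterwards so that no headless Chrome processes are left behind. The function names `create_browser` and `fetch_title` are hypothetical. The `chrome_options=` keyword matches the Selenium 3.x call seen in the tracebacks above, while newer Selenium releases use `options=` instead.

```python
# The four startup arguments collected in this post.
CHROME_ARGS = [
    "--headless",
    "--no-sandbox",
    "--disable-gpu",
    "--disable-dev-shm-usage",
]


def create_browser(args=CHROME_ARGS):
    """Start Chrome through Selenium with the given command-line arguments."""
    # Imported here so the module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in args:
        options.add_argument(arg)
    # Selenium 3.x keyword; newer releases use `options=` instead.
    return webdriver.Chrome(chrome_options=options)


def fetch_title(url):
    """Open url in headless Chrome and return the page title."""
    browser = create_browser()
    try:
        browser.get(url)
        return browser.title
    finally:
        browser.quit()  # always release the Chrome process
```

A call such as `fetch_title("https://example.com")` (a placeholder URL) would then return the page title, provided Chrome and chromedriver are installed as described above.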

Summary

Quite a few problems came up during this process, and each took some investigation to solve. They are recorded here for future reference.
