Fixing a Scrapy installation error on Mac OS X 10.10: Failed building wheel for lxml

Scrapy is a Python-based web crawling framework. It is lightweight and cross-platform, which makes it well suited to the rapid development of crawler software.

The installation method given in Scrapy's official installation documentation is to install it with pip:

pip install Scrapy

However, running the command above on OS X 10.10 fails because the lxml module cannot be compiled. The error message is:

 Failed building wheel for lxml

A more detailed error message looks similar to:

'CC' can't be find

This error occurs because pip pulls in the lxml module as a dependency when installing Scrapy, and pip's default behavior is to download the source and compile it. If the Mac terminal has no C compiler available (or no environment variable pointing at one), that source build fails.
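
As a quick way to confirm this diagnosis (a minimal sketch, not part of the original post), you can ask Python whether a C compiler is visible on the PATH at all:

from distutils.spawn import find_executable

# Prints the full path of the "cc" compiler driver if one is on the PATH,
# or None if the build toolchain is missing, which is the situation that
# makes pip's source build of lxml fail.
print(find_executable("cc"))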

If you have no need to compile C code on the Mac, you can install a prebuilt binary of lxml instead. The steps are as follows (a quick import check to verify the install appears after the list):

  • Download and install MacPorts

Download the Yosemite build of MacPorts from the MacPorts website and install it.

  • Install the binary version of lxml by running the following command in the terminal

sudo port install py27-lxml

  • Install Scrapy

sudo pip install Scrapy
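
Before moving on, here is the quick import check mentioned above (my own addition, not from the original steps) to confirm that the binary lxml is usable. Note that py27-lxml installs into MacPorts' own python27, so run the check with the interpreter that Scrapy will actually use:

# Sanity check: import the binary lxml and print its version tuples.
# LXML_VERSION and LIBXML_VERSION are documented lxml.etree attributes.
from lxml import etree

print(etree.LXML_VERSION)    # version of lxml itself
print(etree.LIBXML_VERSION)  # version of the libxml2 it is linked against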

Wait for the terminal to print Done; the installation is then complete. You can run a Scrapy performance benchmark with the following command:

scrapy bench

A successful installation should produce output similar to the following:

White-Knight:~ zhengcai$ scrapy bench
2015-05-27 19:54:07+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
2015-05-27 19:54:07+0800 [scrapy] INFO: Optional features available: ssl, http11
2015-05-27 19:54:07+0800 [scrapy] INFO: Overridden settings: {'CLOSESPIDER_TIMEOUT': 10, 'LOG_LEVEL': 'INFO', 'LOGSTATS_INTERVAL': 1}
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-05-27 19:54:07+0800 [scrapy] INFO: Enabled item pipelines:
2015-05-27 19:54:07+0800 [follow] INFO: Spider opened
2015-05-27 19:54:07+0800 [follow] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:09+0800 [follow] INFO: Crawled 106 pages (at 6360 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:09+0800 [follow] INFO: Crawled 195 pages (at 5340 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:11+0800 [follow] INFO: Crawled 298 pages (at 6180 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:11+0800 [follow] INFO: Crawled 394 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:12+0800 [follow] INFO: Crawled 490 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:13+0800 [follow] INFO: Crawled 586 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:14+0800 [follow] INFO: Crawled 674 pages (at 5280 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:16+0800 [follow] INFO: Crawled 770 pages (at 5760 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:17+0800 [follow] INFO: Crawled 850 pages (at 4800 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:18+0800 [follow] INFO: Crawled 939 pages (at 5340 pages/min), scraped 0 items (at 0 items/min)
2015-05-27 19:54:18+0800 [follow] INFO: Closing spider (closespider_timeout)
2015-05-27 19:54:18+0800 [follow] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 345797,
'downloader/request_count': 954,
'downloader/request_method_count/GET': 954,
'downloader/response_bytes': 1905484,
'downloader/response_count': 954,
'downloader/response_status_count/200': 954,
'dupefilter/filtered': 1642,
'finish_reason': 'closespider_timeout',
'finish_time': datetime.datetime(2015, 5, 27, 11, 54, 18, 253375),
'log_count/INFO': 17,
'request_depth_max': 49,
'response_received_count': 954,
'scheduler/dequeued': 954,
'scheduler/dequeued/memory': 954,
'scheduler/enqueued': 17437,
'scheduler/enqueued/memory': 17437,
'start_time': datetime.datetime(2015, 5, 27, 11, 54, 7, 972465)}
2015-05-27 19:54:18+0800 [follow] INFO: Spider closed (closespider_timeout)

Once Scrapy is installed successfully, you can start writing your own spiders. Happy crawling!
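
As a starting point, here is a minimal spider sketch with hypothetical names. It assumes Scrapy 1.0 or later, which accepts plain dicts as items; with the 0.24 series shown above you would declare an Item class instead. Save it as link_spider.py and run scrapy runspider link_spider.py -o links.json:

import scrapy

class LinkSpider(scrapy.Spider):
    name = "link_spider"
    # Hypothetical starting page; replace with the site you want to crawl.
    start_urls = ["http://example.com/"]

    def parse(self, response):
        # Yield every hyperlink on the page as a scraped item.
        for href in response.css("a::attr(href)").extract():
            yield {"link": href}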
