Python Scrapy: Extracting Data with Selector, XPath, and CSS Selectors

The core technique for extracting data from a page is HTML parsing. In Python, the following modules are commonly used for this:

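BeautifulSoup: a very popular HTML parsing library with a simple, easy-to-use API, but relatively slow parsing.
lxml: an XML parsing library built on libxml2, which is written in C; parsing is faster, but the API is more complex.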

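Scrapy combines the strengths of both in its Selector class, which is built on lxml and exposes a simpler API. In Scrapy you extract page data with Selector objects: first select the data you need with an XPath or CSS selector, then extract it. The rest of this article shows how to use Selector objects.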

1. The Selector Object

1.1 Creating the Object

from scrapy.selector import Selector
from scrapy.http import HtmlResponse

html = '''
<html lang="en">
<head>
    <title>Scrapy Study</title>
</head>
<body>
    <h1>Hello World</h1>
    <h2>ayouleyang</h2>
    <b>yangyou</b>
    <ul>
        <li>Python</li>
        <li>Scrapy</li>
        <li>html</li>
    </ul>
</body>
</html>
'''

To build a Selector from a Response object, pass the response to the response parameter of the Selector constructor:

>>> # The first argument of HtmlResponse is the URL; a placeholder is used here and no request is sent
>>> result = HtmlResponse(url='http://www.example.com', body=html, encoding='utf-8')
>>> selector = Selector(response=result)
>>> print(selector)
<Selector xpath=None data='<html lang="en">\n<head>\n    <title>Scrap'>
  <h3>1.2 Selecting Data</h3> 
  <p>Call a Selector object's xpath or css method to select one or more parts of the document:</p> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> selector_h1 <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'//h1'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>selector_h1<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//h1'</span> data<span class="token operator">=</span><span class="token string">'<h1>Hello World</h1>'</span><span class="token operator">></span><span class="token punctuation">]</span>
</code></pre> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> selector_li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'//li'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>selector_li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>Python</li>'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>Scrapy</li>'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>html</li>'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <p>The xpath and css methods return a SelectorList object. SelectorList supports the list interface, so you can iterate over its elements with a for statement:</p> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">for</span> li <span class="token keyword">in</span> selector_li<span class="token punctuation">:</span>
		<span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'./text()'</span><span class="token punctuation">)</span><span class="token punctuation">)</span>

	
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'Python'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'Scrapy'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'html'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
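  <p>SelectorList objects also have xpath and css methods, which run the query on every element in the list and merge the results into a new SelectorList:</p> 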
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> lis <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>css<span class="token punctuation">(</span><span class="token string">'li'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'./text()'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>lis<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'Python'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'Scrapy'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'./text()'</span> data<span class="token operator">=</span><span class="token string">'html'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
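  <h3>1.3 Extracting Data</h3> 
  <p>Call the following methods on a Selector or SelectorList object to extract the selected content:</p> 
  <ul> 
   <li>extract()</li> 
   <li>re()</li> 
   <li>extract_first() (SelectorList only)</li> 
   <li>re_first() (SelectorList only in older versions; recent versions also provide it on Selector)</li> 
  </ul> 
  <p><strong>The extract method</strong></p> 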
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> selector_li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'//li'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>selector_li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>Python</li>'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>Scrapy</li>'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//li'</span> data<span class="token operator">=</span><span class="token string">'<li>html</li>'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>selector_li<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token operator"><</span>li<span class="token operator">></span>Python<span class="token operator"><</span><span class="token operator">/</span>li<span class="token operator">></span>
<span class="token operator">>></span><span class="token operator">></span> 


<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//li/text()'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//li/text()'</span> data<span class="token operator">=</span><span class="token string">'Python'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//li/text()'</span> data<span class="token operator">=</span><span class="token string">'Scrapy'</span><span class="token operator">></span><span class="token punctuation">,</span>
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//li/text()'</span> data<span class="token operator">=</span><span class="token string">'html'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'Python'</span><span class="token punctuation">,</span> <span class="token string">'Scrapy'</span><span class="token punctuation">,</span> <span class="token string">'html'</span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
Python
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
Scrapy
</code></pre> 
  <p><strong>Extracting the title text:</strong></p> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> title <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//title/text()'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>title<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//title/text()'</span> data<span class="token operator">=</span><span class="token string">'Scrapy Study'</span><span class="token operator">></span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>title<span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'Scrapy Study'</span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>title<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
Scrapy Study
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <p><strong>Extracting specific content from ul > li:</strong></p> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> html <span class="token operator">=</span> <span class="token triple-quoted-string string">'''
    <ul>
        <li>Python编程<b>价格:32.00元</b></li>
        <li>精通Scrapy<b>价格:12.00元</b></li>
        <li>html知识<b>价格:52.00元</b></li>
    </ul>
'''</span>

<span class="token operator">>></span><span class="token operator">></span> selector  <span class="token operator">=</span> Selector<span class="token punctuation">(</span>text<span class="token operator">=</span>html<span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li/text()'</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//ul/li/text()'</span> data<span class="token operator">=</span><span class="token string">'Python编程'</span><span class="token operator">></span><span class="token punctuation">,</span>
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//ul/li/text()'</span> data<span class="token operator">=</span><span class="token string">'精通Scrapy'</span><span class="token operator">></span><span class="token punctuation">,</span>
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'.//ul/li/text()'</span> data<span class="token operator">=</span><span class="token string">'html知识'</span><span class="token operator">></span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li/text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'Python编程'</span><span class="token punctuation">,</span> <span class="token string">'精通Scrapy'</span><span class="token punctuation">,</span> <span class="token string">'html知识'</span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li/b/text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'价格:32.00元'</span><span class="token punctuation">,</span> <span class="token string">'价格:12.00元'</span><span class="token punctuation">,</span> <span class="token string">'价格:52.00元'</span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li/b/text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>re<span class="token punctuation">(</span><span class="token string">r'\d+\.\d+'</span><span class="token punctuation">)</span>	<span class="token comment"># extract only the numbers</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">)</span>			
<span class="token punctuation">[</span><span class="token string">'32.00'</span><span class="token punctuation">,</span> <span class="token string">'12.00'</span><span class="token punctuation">,</span> <span class="token string">'52.00'</span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li/b/text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>re_first<span class="token punctuation">(</span><span class="token string">r'\d+\.\d+'</span><span class="token punctuation">)</span>			
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span><span class="token punctuation">(</span>li<span class="token punctuation">)</span>			
<span class="token number">32.00</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token operator">>></span><span class="token operator">></span> li <span class="token operator">=</span> selector<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'.//ul/li[2]/b/text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>re<span class="token punctuation">(</span><span class="token string">r'\d+\.\d+'</span><span class="token punctuation">)</span>	<span class="token comment"># li[2] selects the second li element</span>
<span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">print</span> <span class="token punctuation">(</span>li<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span>	<span class="token comment"># [0] takes the first element of the list</span>
<span class="token number">12.00</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <center> 
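   <h2>2. XPath</h2> 
  </center> 
  <p>XPath (XML Path Language) is a language for locating parts of an XML document. An XML document is a tree of nodes, and Scrapy lets you query HTML documents the same way. For sample XML documents, see the Runoob (菜鸟教程) XPath syntax tutorial.</p> 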
  <h3>2.1 Basic Syntax</h3> 
  <p>First create an HTML document. The examples that follow use it to show what XPath can do.</p> 
  <pre><code class="prism language-html">>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse
>>> html = '''
<span class="token doctype"><!DOCTYPE html></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>Xpath study<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">id</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>images<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片1<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image2.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片2<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image2.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片3<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image3.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image4.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片4<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image4.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image5.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片5<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image5.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span>
'''
>>> response = HtmlResponse(url='http://www.example.com', body=html, encoding='utf-8')  # placeholder URL; no request is sent
</code></pre> 
  <ul> 
   <li>/: describes an absolute path starting from the root node</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> response<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'/html'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'/html'</span> data<span class="token operator">=</span><span class="token string">'<html lang="en">\n<head>\n    <title>Xpath'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <ul> 
   <li>E1/E2: selects all E2 elements among the children of E1</li> 
  </ul> 
  <pre><code class="prism language-html">>>> response.xpath('/html/body/div/a')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span>>Name:图片1<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image2.html<span class="token punctuation">"</span>>Name:图片2<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span>>Name:图片3<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
  <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image4.html<span class="token punctuation">"</span>>Name:图片4<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
  <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image5.html<span class="token punctuation">"</span>>Name:图片5<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
</code></pre> 
  <ul> 
   <li>//E: selects every E in the document, regardless of its position</li> 
  </ul> 
  <pre><code class="prism language-html">>>> name = response.xpath('.//a/text()')
>>> name
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片1<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片2<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片3<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片4<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片5<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> name.extract()
['Name:图片1', 'Name:图片2', 'Name:图片3', 'Name:图片4', 'Name:图片5']
>>> 
</code></pre> 
  <ul> 
   <li>E/text(): selects the text child nodes of E</li> 
  </ul> 
  <pre><code class="prism language-html">>>> name = response.xpath('.//a/text()')
>>> name
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片1<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片2<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片3<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片4<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//a/text()<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>Name:图片5<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> name.extract()
['Name:图片1', 'Name:图片2', 'Name:图片3', 'Name:图片4', 'Name:图片5']
>>> 
</code></pre> 
  <ul> 
   <li>E/*: selects all element child nodes of E</li> 
  </ul> 
  <pre><code class="prism language-html">>>> response.xpath('/html/body/div/a/*')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><br><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><br><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token 
attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image2.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><br><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image3.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token 
punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><br><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image4.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><br><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>/html/body/div/a/*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token 
punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image5.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
</code></pre> 
  <ul> 
   <li>*/E: selects every E among the grandchild nodes</li> 
  </ul> 
  <pre><code class="prism language-html"># Select every img that is a grandchild of a div
>>> response.xpath('//div/*/img')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div/*/img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div/*/img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image2.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div/*/img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image3.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div/*/img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image4.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div/*/img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image5.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
</code></pre> 
  <ul> 
   <li>E/@ATTR: selects the ATTR attribute of E</li> 
  </ul> 
  <pre><code class="prism language-html"># Select the src attribute of every img
>>> response.xpath('//img/@src')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/@src<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image1.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/@src<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image2.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/@src<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image3.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/@src<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image4.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/@src<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image5.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> response.xpath('//img/@src').extract()
['image1.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg', 'image5.jpg']
>>> 
</code></pre> 
  <ul> 
   <li>//@ATTR: selects every ATTR attribute in the document</li> 
  </ul> 
  <pre><code class="prism language-html"># Select every href attribute
>>> response.xpath('//@href')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//@href<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image1.html<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//@href<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image2.html<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//@href<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image3.html<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//@href<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image4.html<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//@href<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image5.html<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
</code></pre> 
  <ul> 
   <li>E/@*: selects all attributes of E</li> 
  </ul> 
  <pre><code class="prism language-html"># Get all attributes of the img under the first a (here there is only one, src)
>>> response.xpath('//a[1]/img/@*')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[1]/img/@*<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>image1.jpg<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
</code></pre> 
  <ul> 
   <li>. : selects the current node; used to write relative paths</li> 
  </ul> 
  <pre><code class="prism language-html"># Get the Selector object for the first a
>>> img = response.xpath('//a')[0]
>>> img
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span>>Name:图片1<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>
>>> 
>>> 
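# Intending to find the img elements inside the first a, we instead get every img in the document, because a path starting with // is absolute and searches from the document root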
>>> img.xpath('//img')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image2.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image3.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image4.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image5.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> 
# Use .// to select every img among the descendants of the current node
>>> img.xpath('.//img')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>.//img<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><img src=<span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span>><span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
</code></pre> 
  <ul> 
   <li>.. : selects the parent of the current node</li> 
  </ul> 
  <pre><code class="prism language-html">>>> response.xpath('//img/..')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/..<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span>>Name:图片1<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/..<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image2.html<span class="token punctuation">"</span>>Name:图片2<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/..<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span>>Name:图片3<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/..<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image4.html<span class="token punctuation">"</span>>Name:图片4<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
 <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//img/..<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image5.html<span class="token punctuation">"</span>>Name:图片5<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>] 
</code></pre> 
  <ul> 
   <li>node[predicate]: a predicate selects one specific node, or the nodes that contain a specific value.</li> 
  </ul> 
  <pre><code class="prism language-html"># Select the 3rd a element under its parent
>>> response.xpath('//a[3]')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[3]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span>>Name:图片3<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> 
# Use the last() function to select the last one
>>> response.xpath('//a[last()]')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[last()]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image5.html<span class="token punctuation">"</span>>Name:图片5<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>>
# Use the position() function to select the first 3
>>> response.xpath('//a[position()<=3]')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[position()<=3]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span>>Name:图片1<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>,
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[position()<=3]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image2.html<span class="token punctuation">"</span>>Name:图片2<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>, 
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//a[position()<=3]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><a href=<span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span>>Name:图片3<br><img s<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> 
# Select all div elements that have an id attribute
>>> response.xpath('//div[@id]')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div[@id]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><div id=<span class="token punctuation">"</span>images<span class="token punctuation">"</span>>\n    <a href=<span class="token punctuation">"</span>image1.ht<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
>>> 
>>> 
# Select all div elements whose id attribute equals images
>>> response.xpath('//div[@id="images"]')
[<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>Selector</span> <span class="token attr-name">xpath</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span>//div[@id=<span class="token punctuation">"</span>images<span class="token punctuation">"</span>]<span class="token punctuation">'</span></span> <span class="token attr-name">data</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">'</span><div id=<span class="token punctuation">"</span>images<span class="token punctuation">"</span>>\n    <a href=<span class="token punctuation">"</span>image1.ht<span class="token punctuation">'</span></span><span class="token punctuation">></span></span>]
</code></pre> 
  <h3>2.2 Common Functions</h3> 
  <p>XPath also provides many functions, covering numbers, strings, times, dates, aggregation, and so on.</p> 
  <ul> 
   <li>string(arg): returns the string value of the argument.</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">from</span> scrapy<span class="token punctuation">.</span>selector <span class="token keyword">import</span> Selector
<span class="token operator">>></span><span class="token operator">></span> html <span class="token operator">=</span> <span class="token string">'<a href="https://blog.csdn.net/ayouleyang/"><b>阿优乐扬</b>的博客</a>'</span>

<span class="token operator">>></span><span class="token operator">></span> sel <span class="token operator">=</span> Selector<span class="token punctuation">(</span>text<span class="token operator">=</span>html<span class="token punctuation">)</span>
<span class="token operator">>></span><span class="token operator">></span> sel
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token boolean">None</span> data<span class="token operator">=</span><span class="token string">'<html><body><a href="https://blog.csdn.n'</span><span class="token operator">></span>

<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'/html/body/a/text()'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'/html/body/a/text()'</span> data<span class="token operator">=</span><span class="token string">'的博客'</span><span class="token operator">></span><span class="token punctuation">]</span>

<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'/html/body/a/b/text()'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'/html/body/a/b/text()'</span> data<span class="token operator">=</span><span class="token string">'阿优乐扬'</span><span class="token operator">></span><span class="token punctuation">]</span>

<span class="token comment"># To get the whole string inside a (阿优乐扬的博客) at once, text() alone is not enough</span>
<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'/html/body/a//text()'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'阿优乐扬'</span><span class="token punctuation">,</span> <span class="token string">'的博客'</span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
<span class="token comment"># In this case, use the string() function</span>
<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'string(/html/body/a)'</span><span class="token punctuation">)</span><span class="token punctuation">.</span>extract<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token string">'阿优乐扬的博客'</span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <ul> 
   <li>contains(str1, str2): tests whether str1 contains str2 and returns a boolean.</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token operator">>></span><span class="token operator">></span> <span class="token keyword">from</span> scrapy<span class="token punctuation">.</span>selector <span class="token keyword">import</span> Selector
<span class="token operator">>></span><span class="token operator">></span> html <span class="token operator">=</span> <span class="token triple-quoted-string string">'''
<div>
	<p class="Nic name">阿优乐扬</p>
	<p class="English name">Youle</p>
</div>
'''</span>
<span class="token operator">>></span><span class="token operator">></span> sel <span class="token operator">=</span> Selector<span class="token punctuation">(</span>text<span class="token operator">=</span>html<span class="token punctuation">)</span>

<span class="token comment"># Select p elements whose class attribute contains "Nic"</span>
<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'//p[contains(@class,"Nic")]'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//p[contains(@class,"Nic")]'</span> data<span class="token operator">=</span><span class="token string">'<p class="Nic name">阿优乐扬</p>'</span><span class="token operator">></span><span class="token punctuation">]</span>

<span class="token comment"># Select p elements whose class attribute contains "name"</span>
<span class="token operator">>></span><span class="token operator">></span> sel<span class="token punctuation">.</span>xpath<span class="token punctuation">(</span><span class="token string">'//p[contains(@class,"name")]'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//p[contains(@class,"name")]'</span> data<span class="token operator">=</span><span class="token string">'<p class="Nic name">阿优乐扬</p>'</span><span class="token operator">></span><span class="token punctuation">,</span> 
<span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'//p[contains(@class,"name")]'</span> data<span class="token operator">=</span><span class="token string">'<p class="English name">Youle</p>'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 

</code></pre> 
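  <center> 
   <h2> 3. CSS Selectors</h2> 
  </center> CSS stands for Cascading Style Sheets, and its selector syntax is a language for locating parts of an HTML document. CSS selectors are simpler to write than XPath but less powerful. In fact, when the css method of a Selector object is called, it internally uses the Python library cssselect to translate the CSS selector expression into an XPath expression and then calls the Selector object's xpath method. 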
  <p>First, create an HTML document and build an HtmlResponse object from it.</p> 
  <pre><code class="prism language-html">from scrapy.selector import Selector
from scrapy.http import HtmlResponse
html = '''
<span class="token doctype"><!DOCTYPE html></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>CSS选择器 study<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span>
<div id="images1" style="width:512px;">
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image1.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片1<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image1.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image2.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片2<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image2.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image3.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片3<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image3.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
<div id="images2" class="pictrue">
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image4.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片4<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image4.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
    <span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image5.html<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>Name:图片5<span class="token tag"><span class="token tag"><span class="token punctuation"><</span>br</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>img</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation">=</span><span class="token punctuation">"</span>image5.jpg<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span>
<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span>
'''
# The first argument is the response URL; a placeholder URL is used here
response = HtmlResponse(url='http://www.example.com', body=html, encoding='utf-8')
</code></pre> 
  <ul> 
   <li>E: selects all E elements</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token comment"># Select all img elements</span>
<span class="token operator">>></span><span class="token operator">></span> response<span class="token punctuation">.</span>css<span class="token punctuation">(</span><span class="token string">'img'</span><span class="token punctuation">)</span>
<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::img'</span> data<span class="token operator">=</span><span class="token string">'<img src="image1.jpg">'</span><span class="token operator">></span><span class="token punctuation">,</span>
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::img'</span> data<span class="token operator">=</span><span class="token string">'<img src="image2.jpg">'</span><span class="token operator">></span><span class="token punctuation">,</span> 
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::img'</span> data<span class="token operator">=</span><span class="token string">'<img src="image3.jpg">'</span><span class="token operator">></span><span class="token punctuation">,</span> 
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::img'</span> data<span class="token operator">=</span><span class="token string">'<img src="image4.jpg">'</span><span class="token operator">></span><span class="token punctuation">,</span> 
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::img'</span> data<span class="token operator">=</span><span class="token string">'<img src="image5.jpg">'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 
  <ul> 
   <li>E1,E2: selects all E1 elements and all E2 elements</li> 
  </ul> 
  <pre><code class="prism language-python"><span class="token comment"># Select all title and div elements</span>
<span class="token operator">>></span><span class="token operator">></span> response<span class="token punctuation">.</span>css<span class="token punctuation">(</span><span class="token string">'title,div'</span><span class="token punctuation">)</span>
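<span class="token punctuation">[</span><span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::title | descendant-or-self::div'</span> data<span class="token operator">=</span><span class="token string">'<title>CSS选择器 study</title>'</span><span class="token operator">></span><span class="token punctuation">,</span>
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::title | descendant-or-self::div'</span> data<span class="token operator">=</span><span class="token string">'<div id="images1" style="width:512px;">\n'</span><span class="token operator">></span><span class="token punctuation">,</span>
 <span class="token operator"><</span>Selector xpath<span class="token operator">=</span><span class="token string">'descendant-or-self::title | descendant-or-self::div'</span> data<span class="token operator">=</span><span class="token string">'<div id="images2" class="pictrue">\n    <'</span><span class="token operator">></span><span class="token punctuation">]</span>
<span class="token operator">>></span><span class="token operator">></span> 
</code></pre> 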
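  <ul> 
   <li>E1 E2: selects E2 elements that are descendants of E1</li> 
  </ul> 
  <pre><code class="prism language-python"># Select img elements that are descendants of a div
>>> response.css('div img')
[<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image1.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image2.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image3.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image4.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image5.jpg">'>]
>>> 
</code></pre> 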
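  <ul> 
   <li>E1>E2: selects E2 elements that are direct children of E1</li> 
  </ul> 
  <pre><code class="prism language-python">>>> response.css('body>div')
[<Selector xpath='descendant-or-self::body/div' data='<div id="images1" style="width:512px;">\n'>,
 <Selector xpath='descendant-or-self::body/div' data='<div id="images2" class="pictrue">\n    <'>]
</code></pre> 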
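  <ul> 
   <li>[ATTR]: selects elements that have the ATTR attribute</li> 
  </ul> 
  <pre><code class="prism language-python">>>> response.css('[style]')
[<Selector xpath='descendant-or-self::*[@style]' data='<div id="images1" style="width:512px;">\n'>]
>>> 
</code></pre> 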
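  <ul> 
   <li>[ATTR=VALUE]: selects elements that have the ATTR attribute with the value VALUE</li> 
  </ul> 
  <pre><code class="prism language-python">>>> response.css('[id="images1"]')
[<Selector xpath="descendant-or-self::*[@id = 'images1']" data='<div id="images1" style="width:512px;">\n'>]
>>> 
</code></pre> 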
  <ul> 
   <li>E:nth-child(n): selects E elements that are the n-th child of their parent</li> 
  </ul> 
  <pre><code class="prism language-python"># Select the first a in each div
>>> response.css('div>a:nth-child(1)')
[<Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image1.html">Name:图片1<br><img s'>,
 <Selector xpath='descendant-or-self::div/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:图片4<br><img s'>]
>>> 
# Select the first a in the second div
>>> response.css('div:nth-child(2)>a:nth-child(1)')
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 1]/a[count(preceding-sibling::*) = 0]' data='<a href="image4.html">Name:图片4<br><img s'>]
>>> 
</code></pre> 
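  <ul> 
   <li>E:first-child: selects E elements that are the first child of their parent</li> 
   <li>E:last-child: selects E elements that are the last child of their parent</li> 
  </ul> 
  <pre><code class="prism language-python"># Select the last a in the first div
>>> response.css('div:first-child>a:last-child')
[<Selector xpath='descendant-or-self::div[count(preceding-sibling::*) = 0]/a[count(following-sibling::*) = 0]' data='<a href="image3.html">Name:图片3<br><img s'>]
>>> 
</code></pre> 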
  <ul> 
   <li>E::text: selects the text nodes of E elements</li> 
  </ul> 
  <pre><code class="prism language-python"># Select the text of every a
>>> response.css('a::text')
[<Selector xpath='descendant-or-self::a/text()' data='Name:图片1'>,
 <Selector xpath='descendant-or-self::a/text()' data='Name:图片2'>,
 <Selector xpath='descendant-or-self::a/text()' data='Name:图片3'>,
 <Selector xpath='descendant-or-self::a/text()' data='Name:图片4'>,
 <Selector xpath='descendant-or-self::a/text()' data='Name:图片5'>]

>>> response.css('a::text').extract()
['Name:图片1', 'Name:图片2', 'Name:图片3', 'Name:图片4', 'Name:图片5']
>>> 
</code></pre> 




  <p>The material above is based on the book 《精通Scrapy网络爬虫》 by 刘硕 (Liu Shuo).</p>
