Raymone_

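Web Scraping Notes: Using Parsing Libraries

Using Parsing Libraries

1. Using XPath

1.1 XPath Overview
1.2 Common XPath Rules
1.3 A First Example
1.4 All Nodes
1.5 Child Nodes
1.6 Parent Nodes
1.7 Attribute Matching
1.8 Getting Text
1.9 Getting Attributes
1.10 Matching Multi-Valued Attributes
1.11 Matching Multiple Attributes
1.12 Selecting by Position
1.13 Selecting by Axis

2. Using Beautiful Soup

2.1 Parsers
2.2 Basic Usage
2.3 Node Selectors
2.4 Method Selectors
2.5 CSS Selectors

3. Using pyquery

3.1 Initialization
3.2 Basic CSS Selectors
3.3 Finding Nodes
3.4 Traversal
3.5 Getting Information
3.6 Node Manipulation
3.7 Pseudo-class Selectors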

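1. Using XPath

XPath (XML Path Language) is a language for locating information in XML documents, and it works equally well for searching HTML documents

1.1 XPath Overview

It offers concise, readable path-selection expressions
It provides more than 100 built-in functions for matching strings, numbers, and times and for processing nodes and sequences
Official site: https://www.w3.org/TR/xpath/

1.2 Common XPath Rules

The most commonly used XPath rules are:
- nodename selects all child nodes of the named node
- / selects direct children of the current node
- // selects descendants of the current node
- . selects the current node
- .. selects the parent of the current node
- @ selects attributes
For example, the following expression selects every title node whose lang attribute has the value eng: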
```
//title[@lang='eng']
```
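The rules above can be tried out with lxml. The sketch below is only an illustration: the bookstore XML and the variable names are made up for this example and do not come from the original notes.

```python
from lxml import etree

# Hypothetical XML used only for illustration
xml = """
<bookstore>
  <book><title lang="eng">Harry Potter</title></book>
  <book><title lang="fra">Le Petit Prince</title></book>
</bookstore>
"""
root = etree.fromstring(xml)

eng_titles = root.xpath("//title[@lang='eng']")   # // plus an attribute predicate
all_langs = root.xpath("//title/@lang")           # @ selects attribute values
parent_tag = eng_titles[0].xpath("..")[0].tag     # .. selects the parent node

print([t.text for t in eng_titles])
print(all_langs)
print(parent_tag)
```

Calling xpath() on an element evaluates relative paths such as .. from that element, which is why the parent lookup starts from the matched title node.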

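1.3 A First Example

A first example:
- First declare a piece of HTML text
- Then call etree.HTML to initialize it, which constructs an object that can be queried with XPath. Note that the last li node is not closed; the etree module repairs this automatically
- Use the tostring method to output the repaired HTML; the result is of type bytes
- Convert it to str with the decode() method
After processing, the closing tag of the li node has been added, and body and html nodes have been inserted automatically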

from lxml import etree

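# Declare a piece of HTML text
text = '''
<div>
    <ul>
         <li class="item-0"><a href="link1.html">first item</a></li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-inactive"><a href="link3.html">third item</a></li>
         <li class="item-1"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a>
     </ul>
 </div>
'''
# Call etree.HTML to initialize it and construct an object that can be queried with XPath
# Note that the last li node is not closed; the etree module repairs this automatically
html = etree.HTML(text)
# tostring outputs the repaired HTML, but as bytes
result = etree.tostring(html)
# decode() converts it to str
print(result.decode('utf-8'))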


    
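<html><body><div>
    <ul>
         <li class="item-0"><a href="link1.html">first item</a></li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-inactive"><a href="link3.html">third item</a></li>
         <li class="item-1"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a>
     </li></ul>
 </div>
</body></html>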

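Parsing a text file directly:
- The content of test.html is exactly the same as the HTML text above
- The output gains a DOCTYPE declaration, which has no effect on parsing
- Open question: what do the extra characters after each node in this output represent?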

from lxml import etree

html = etree.parse('test.html', etree.HTMLParser())
result = etree.tostring(html)
print(result.decode('utf-8'))



    
         first item
         second item
         third item
         fourth item
         fifth item

1.4 所有节点

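Rules beginning with // are generally used to select all nodes that meet a condition. In the example below, * matches every node. The result is a list in which every item is of type Element, followed by the node name:

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//*')
print(result)

[<Element html at 0x...>, <Element body at 0x...>, <Element div at 0x...>, <Element ul at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>, <Element li at 0x...>, <Element a at 0x...>]

A node name can also be specified. The following gets all li nodes:

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li')
print(result)

[<Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>]

1.5 Child Nodes

Use / or // to find the children or descendants of an element. The following selects all direct a children of the li nodes: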

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li/a')
print(result)

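[<Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>]

The following uses // to select the a nodes under the ul node:

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//ul//a')
print(result)

[<Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>, <Element a at 0x...>]

1.6 Parent Nodes

The parent is selected with two dots (..). The following example first selects the a node whose href attribute is link4.html, then gets its parent, and then gets the parent's class attribute: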

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//a[@href="link4.html"]/../@class')
print(result)

['item-1']

The parent can also be obtained with the parent:: axis:

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//a[@href="link4.html"]/parent::*/@class')
print(result)

['item-1']

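1.7 Attribute Matching

As in the example above, use the @ symbol to filter by attribute

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li[@class="item-0"]')
print(result)

[<Element li at 0x...>, <Element li at 0x...>]

1.8 Getting Text

Use the XPath text() function to get the text inside a node. The following example tries to get the text of the li nodes selected above: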

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li[@class="item-0"]/text()')
print(result)

['\r\n     ']

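The output contains only a carriage return, a line feed, and some spaces. This is because text() is preceded by /, which selects direct children. The direct children of the li nodes are a nodes, and the text lives inside those a nodes, so the only match is the line break inside the repaired li node: the closing tag that was added automatically for that li sits on a new line
In other words, the extracted text is the text between the closing tag of the a node and the closing tag of the li node
There are two ways to get the text inside the li nodes: select the a node first and then take its text, or use // to take the text of all descendants. The result of the latter includes that of the former, because it selects the text of every descendant, whereas the former returns only the text inside the child a node

# Select the a node first:
from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li[@class="item-0"]/a/text()')
print(result)

# Use // to get the text of all descendants: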
from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li[@class="item-0"]//text()')
print(result)

['first item', 'fifth item']
['first item', 'fifth item', '\r\n     ']
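The difference between /text(), /a/text(), and //text() is easier to see on a tiny input. This is a minimal sketch; the one-line markup is hypothetical and only mimics the shape of the sample above (an a node followed by a line break inside the li).

```python
from lxml import etree

# Hypothetical markup for illustration only
html = etree.HTML('<ul><li class="item-0"><a href="link1.html">first item</a>\n</li></ul>')

direct = html.xpath('//li[@class="item-0"]/text()')      # text nodes that are direct children of li
via_a = html.xpath('//li[@class="item-0"]/a/text()')     # text inside the child a node
all_text = html.xpath('//li[@class="item-0"]//text()')   # text of every descendant, in document order

print(direct)
print(via_a)
print(all_text)
```

Here direct holds only the line break that follows the a node, while all_text holds both the link text and that line break.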

1.9 Getting Attributes

Attributes are retrieved with @:

from lxml import etree
html = etree.parse('test.html', etree.HTMLParser())
result = html.xpath('//li/a/@href')
print(result)

['link1.html', 'link2.html', 'link3.html', 'link4.html', 'link5.html']

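1.10 Matching Multi-Valued Attributes

When an attribute of a node has several values, the attribute match used so far no longer works
In that case use the contains() function. Its first argument is the attribute name and its second is the attribute value, for example:

from lxml import etree
text = '''
<li class="li li-first"><a href="link.html">first item</a></li>
'''
html = etree.HTML(text)

# The earlier matching approach returns an empty result
result = html.xpath('//li[@class="li"]/a/text()')
print(result)

# Using contains()
result = html.xpath('//li[contains(@class, "li")]/a/text()')
print(result)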

[]
['first item']
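One caveat worth knowing: contains() is a plain substring test, so it also matches class values that merely contain the characters li. The sketch below contrasts it with a whole-token match built from the standard XPath 1.0 functions concat() and normalize-space(). The markup and variable names are hypothetical, and the token-matching expression is a common idiom rather than something taken from the original notes.

```python
from lxml import etree

# Hypothetical markup for illustration only
html = etree.HTML('''
<ul>
  <li class="li li-first"><a>first item</a></li>
  <li class="list-item"><a>second item</a></li>
</ul>
''')

# Substring match: "list-item" also contains the characters "li"
loose = html.xpath('//li[contains(@class, "li")]/a/text()')

# Whole-token match: pad the attribute with spaces and look for " li "
strict = html.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), " li ")]/a/text()'
)

print(loose)
print(strict)
```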

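1.11 Matching Multiple Attributes

To identify a node by several attributes at once, for example requiring both attributes to match or either one of them to match, use XPath operators such as and, or, mod, | (union of two node sets), +, -, *, div, =, !=, <, <=, >, >=

from lxml import etree
text = '''
<li class="li li-first" name="item"><a href="link.html">first item</a></li>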
'''
html = etree.HTML(text)
result = html.xpath('//li[contains(@class, "li") and @name="item"]/a/text()')
print(result)

['first item']

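1.12 Selecting by Position

When selecting, an expression may match several nodes while only one of them is wanted. Pass an index in square brackets to pick it
The brackets accept not only a number but also positional expressions such as last(), last()-2, and position() < 3. Note that indexing starts at 1, so [1] is the first node, unlike in Python code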

from lxml import etree
 
text = '''

    
         first item
         second item
         third item
         fourth item
         fifth item
     
 
'''
html = etree.HTML(text)
result = html.xpath('//li[1]/a/text()')
print(result)
result = html.xpath('//li[last()]/a/text()')
print(result)
result = html.xpath('//li[last()-2]/a/text()')
print(result)
result = html.xpath('//li[position()<3]/a/text()')
print(result)

['first item']
['fifth item']
['third item']
['first item', 'second item']
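A detail that often surprises: a positional predicate such as //li[1] is evaluated relative to each parent, not against the whole document. The following sketch uses hypothetical markup with two lists to show the difference; wrapping the path in parentheses applies the index to the combined result instead.

```python
from lxml import etree

# Hypothetical markup with two lists, for illustration only
html = etree.HTML('''
<div>
  <ul><li>a1</li><li>a2</li></ul>
  <ul><li>b1</li><li>b2</li><li>b3</li></ul>
</div>
''')

per_parent_first = html.xpath('//li[1]/text()')            # first li within each ul
overall_first = html.xpath('(//li)[1]/text()')             # first li in the whole document
per_parent_last = html.xpath('//li[last()]/text()')        # last li within each ul
first_two = html.xpath('//ul[2]/li[position()<3]/text()')  # first two li of the second ul

print(per_parent_first)
print(overall_first)
print(per_parent_last)
print(first_two)
```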

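1.13 Selecting by Axis

XPath provides many axes for node selection, covering children, siblings, parents, ancestors, and more, as in the following example:
- First selection: the ancestor axis followed by two colons and *, which returns all ancestor nodes
- Second selection: the ancestor axis with div after the colons, which returns only the div nodes among the ancestors
- Third selection: the attribute axis, which returns all attribute values
- Fourth selection: the child axis, which returns direct children
- Fifth selection: the descendant axis, which returns descendants
- Sixth selection: the following axis, which returns all nodes that come after the closing tag of the current node (the current node itself is not included)
- Seventh selection: the following-sibling axis, which returns all siblings after the current node (again excluding the current node)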

from lxml import etree
 
text = '''

    
         first item
         second item
         third item
         fourth item
         fifth item
     
 
'''
html = etree.HTML(text)
result = html.xpath('//li[1]/ancestor::*')
print(result)
result = html.xpath('//li[1]/ancestor::div')
print(result)
result = html.xpath('//li[1]/attribute::*')
print(result)
result = html.xpath('//li[1]/child::a[@href="link1.html"]')
print(result)
result = html.xpath('//li[1]/descendant::span')
print(result)
result = html.xpath('//li[1]/following::*[2]')              # [2] is the second node after li[1]; the current node is not counted
print(result)
result = html.xpath('//li[1]/following-sibling::*')         # the current node is not included
print(result)

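[<Element html at 0x...>, <Element body at 0x...>, <Element div at 0x...>, <Element ul at 0x...>]
[<Element div at 0x...>]
['item-0']
[<Element a at 0x...>]
[]
[<Element a at 0x...>]
[<Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>, <Element li at 0x...>]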
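Because Element objects print as opaque addresses, it can help to look at tag names instead. The sketch below is an illustration under assumed markup (the three-item list and the tags() helper are hypothetical). It also shows that following and following-sibling start after the current node.

```python
from lxml import etree

# Hypothetical markup for illustration only
html = etree.HTML('''
<div>
  <ul>
    <li class="item-0"><a href="link1.html"><span>first item</span></a></li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-2"><a href="link3.html">third item</a></li>
  </ul>
</div>
''')

def tags(nodes):
    """Return the tag names of a list of elements."""
    return [node.tag for node in nodes]

ancestors = tags(html.xpath('//li[1]/ancestor::*'))
descendants = tags(html.xpath('//li[1]/descendant::*'))
following = tags(html.xpath('//li[1]/following::*'))
siblings = html.xpath('//li[1]/following-sibling::*/@class')

print(ancestors)
print(descendants)
print(following)
print(siblings)
```

The class list for the siblings does not contain item-0, which confirms that the starting li is excluded from its own following-sibling axis.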

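2. Using Beautiful Soup

Beautiful Soup parses a web page by relying on its structure and attributes, so there is no need to write complicated regular expressions

2.1 Parsers

Beautiful Soup actually depends on a parser when parsing. Besides the HTML parser in the Python standard library, it supports third-party parsers such as lxml
The lxml parser can parse both HTML and XML, is fast, and is tolerant of malformed markup, so it is used throughout the rest of these notes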
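To see what the choice of parser changes, the sketch below feeds the same broken fragment to the standard-library parser and to lxml. The fragment is hypothetical, and the example assumes both the bs4 and lxml packages are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical broken markup: the p and b tags are never closed
broken = '<p class="title"><b>The Dormouse\'s story'

builtin = BeautifulSoup(broken, 'html.parser')   # parser from the standard library
fast = BeautifulSoup(broken, 'lxml')             # third-party parser, needs the lxml package

print(builtin.b.string)
print(fast.b.string)
print(builtin.html is None)    # html.parser does not add an html wrapper
print(fast.html is not None)   # lxml wraps the fragment in html and body
```

Both parsers close the dangling tags, but only lxml adds the surrounding html and body nodes, so the resulting trees differ slightly.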

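2.2 Basic Usage

A first example:
- Declare html: html is not a complete HTML string; neither the body nor the html node is closed
- Instantiate a BeautifulSoup object: html is passed as the first argument and the parser type as the second, and the resulting BeautifulSoup object is assigned to soup. This step also repairs the HTML automatically by adding the missing closing tags
- prettify(): outputs the parsed document as a string with standard indentation
- soup.title.string: selects the title node of the HTML and returns its text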
from bs4 import BeautifulSoup

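# Declare html
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Instantiate the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
print(soup.title.string)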


 
  
   The Dormouse's story
  
 
 
  
   
    The Dormouse's story
   
  
  
   Once upon a time there were three little sisters; and their names were
   
    
   
   ,
   
    Lacie
   
   and
   
    Tillie
   
   ;
and they lived at the bottom of a well.
  
  
   ...
  
 

The Dormouse's story

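2.3 Node Selectors

Selecting elements

The following example walks through how to select elements:
- First print the result of selecting the title node; the output is the title node together with its text
- Then print the type of the title node, which is bs4.element.Tag, an important data structure in Beautiful Soup
- A Tag has attributes such as string, which returns the text content of the node
- The output for the final p selection shows that this style of selection returns only the first matching node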

from bs4 import BeautifulSoup

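# Declare html
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Instantiate the BeautifulSoup object
soup = BeautifulSoup(html, 'lxml')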
print(soup.title)
print(type(soup.title))
print(soup.title.string)
print(soup.head)
print(soup.p)

The Dormouse's story

The Dormouse's story
The Dormouse's story
The Dormouse's story

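Extracting information

Getting the name

print(soup.title.name)

title

Getting attributes

print(soup.p.attrs)          # all attributes
print(soup.p.attrs['name'])  # the name attribute
print(soup.p['name'])        # same as above
print(soup.p['class'])       # the class attribute; a node may have several classes, so a list is returned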

{'class': ['title'], 'name': 'dromouse'}
dromouse
dromouse
['title']
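The difference between single-valued and multi-valued attributes is easy to check in isolation. This is a small sketch with hypothetical markup; it also shows get(), which returns None for a missing attribute instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
soup = BeautifulSoup('<p class="title main" name="dromouse" id="intro">Hi</p>', 'lxml')
p = soup.p

tag_name = p.name            # the tag name, not the name attribute
classes = p['class']         # multi-valued attribute: a list
node_id = p['id']            # single-valued attribute: a str
name_attr = p['name']        # the name attribute itself
missing = p.get('missing')   # None instead of KeyError

print(tag_name, classes, node_id, name_attr, missing)
```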

Nested selection

After obtaining a node, we can keep chaining selections to reach the nodes inside it:

from bs4 import BeautifulSoup

html = """
The Dormouse's story

"""

soup = BeautifulSoup(html, 'lxml')
print(soup.head.title)
print(type(soup.head.title))
print(soup.head.title.string)

The Dormouse's story

The Dormouse's story

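Associated selection

First select a node, then use it as the reference point to select its children, parent, siblings, and so on
Children and descendants:
- the contents attribute returns a list of the direct children
- the children attribute returns an iterator over the direct children
- the descendants attribute returns a generator over all descendants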

from bs4 import BeautifulSoup

html = """

    
        The Dormouse's story
    
    
        
            Once upon a time there were three little sisters; and their names were
            
                Elsie
            
            Lacie 
            and
            Tillie
            and they lived at the bottom of a well.
        
        ...
"""

soup = BeautifulSoup(html, 'lxml')
# The contents attribute
print(soup.p.contents)

['\n            Once upon a time there were three little sisters; and their names were\n            ', 
Elsie
, '\n', Lacie, ' \n            and\n            ', Tillie, '\n            and they lived at the bottom of a well.\n        ']

# The children attribute
print(soup.p.children)
for i, child in enumerate(soup.p.children):
    print(i, child)


0 
            Once upon a time there were three little sisters; and their names were
            
1 
Elsie

2 

3 Lacie
4  
            and
            
5 Tillie
6 
            and they lived at the bottom of a well.

# The descendants attribute
print(soup.p.descendants)
for i, child in enumerate(soup.p.descendants):
    print(i, child)


0 
            Once upon a time there were three little sisters; and their names were
            
1 
Elsie

2 

3 Elsie
4 Elsie
5 

6 

7 Lacie
8 Lacie
9  
            and
            
10 Tillie
11 Tillie
12 
            and they lived at the bottom of a well.
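The three attributes can be compared on a much smaller input. The sketch below uses hypothetical markup and filters with isinstance(..., Tag) so that text nodes are left out of the tag lists.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup for illustration only
soup = BeautifulSoup('<p>one <a><span>two</span></a> three</p>', 'lxml')
p = soup.p

contents = p.contents                                                    # list: text, a, text
child_tags = [c.name for c in p.children if isinstance(c, Tag)]          # direct children only
descendant_tags = [d.name for d in p.descendants if isinstance(d, Tag)]  # all levels

print(len(contents))
print(child_tags)
print(descendant_tags)
```

The span shows up only in descendant_tags, because it is a grandchild of p rather than a direct child.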

Parent and ancestor nodes
- the parent attribute returns the direct parent
- the parents attribute returns all ancestors, as a generator

from bs4 import BeautifulSoup

html = """

    
        The Dormouse's story
    
    
        
            Once upon a time there were three little sisters; and their names were
            
                Elsie
            
        
        ...
"""

soup = BeautifulSoup(html, 'lxml')
# The parent attribute
print(soup.a.parent)


            Once upon a time there were three little sisters; and their names were
            
Elsie

# The parents attribute
print(soup.a.parents)
for i, parent in enumerate(soup.a.parents):
    print(i, parent)


0 
            Once upon a time there were three little sisters; and their names were
            
Elsie


1 

            Once upon a time there were three little sisters; and their names were
            
Elsie


...

2 

The Dormouse's story



            Once upon a time there were three little sisters; and their names were
            
Elsie


...

3 

The Dormouse's story



            Once upon a time there were three little sisters; and their names were
            
Elsie


...

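Sibling nodes
- next_sibling and previous_sibling return the next and the previous sibling of a node, which may be an element or a text node
- next_siblings and previous_siblings return all siblings after and before a node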

from bs4 import BeautifulSoup

html = """

    
        
            Once upon a time there were three little sisters; and their names were
            
                Elsie
            
            Hello
            Lacie 
            and
            Tillie
            and they lived at the bottom of a well.
        
"""

soup = BeautifulSoup(html, 'lxml')
# Sibling attributes
print('Next Sibling', soup.a.next_sibling)
print('Previous Sibling', soup.a.previous_sibling)
print('Next Siblings', list(enumerate(soup.a.next_siblings)))
print('Previous Siblings', list(enumerate(soup.a.previous_siblings)))

Next Sibling 
            Hello
            
Previous Sibling 
            Once upon a time there were three little sisters; and their names were
            
Next Siblings [(0, '\n            Hello\n            '), (1, Lacie), (2, ' \n            and\n            '), (3, Tillie), (4, '\n            and they lived at the bottom of a well.\n        ')]
Previous Siblings [(0, '\n            Once upon a time there were three little sisters; and their names were\n            ')]

Extracting information

Information is extracted the same way as before:

from bs4 import BeautifulSoup

html = """

    
        
            Once upon a time there were three little sisters; and their names were
            BobLacie 
        
"""
soup = BeautifulSoup(html, 'lxml')
print('Next Sibling:')
print(type(soup.a.next_sibling))
print(soup.a.next_sibling)
print(soup.a.next_sibling.string)
print('Parents:')
print(type(soup.a.parents))
print(list(soup.a.parents)[0])
print(list(soup.a.parents)[0].attrs['class'])

Next Sibling:

Lacie
Lacie
Parents:


            Once upon a time there were three little sisters; and their names were
            BobLacie

['story']

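2.4 Method Selectors

Attribute-style selection is fast but not suited to complex queries. Methods such as find_all() and find() accept parameters that make queries much more flexible

find_all()

find_all() returns all elements that match the given conditions. Its API is: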
```
find_all(name, attrs, recursive, text, **kwargs)
```
name: query elements by node name:
- The result is a list whose elements are still of type bs4.element.Tag
- Because they are all Tags, nested queries are still possible

from bs4 import BeautifulSoup

html='''

    
        Hello
    
    
        
            Foo
            Bar
            Jay
        
        
            Foo
            Bar
        
    

'''

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(name='ul'))
print(type(soup.find_all(name='ul')[0]))

[
Foo
Bar
Jay
, 
Foo
Bar
]

# Nested query
for ul in soup.find_all(name='ul'):
    print(ul.find_all(name='li'))
    for li in ul.find_all(name='li'):
        print(li.string)

[Foo
, Bar
, Jay]
Foo
Bar
Jay
[Foo
, Bar]
Foo
Bar

attrs: query by attribute:
- The argument is a dictionary
- Common attributes such as id and class need not go through attrs; pass id='list-1' or class_='element' directly (class is a Python keyword, so a trailing _ is required)

from bs4 import BeautifulSoup

html='''

    
        Hello
    
    
        
            Foo
            Bar
            Jay
        
        
            Foo
            Bar
        
    

'''

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(attrs={'id': 'list-1'}), '\n')
print(soup.find_all(id='list-1'), '\n')
print(soup.find_all(attrs={'class': 'element'}), '\n')
print(soup.find_all(class_='element'))

[
Foo
Bar
Jay
] 

[
Foo
Bar
Jay
] 

[Foo
, Bar
, Jay
, Foo
, Bar] 

[Foo
, Bar
, Jay
, Foo
, Bar]

text:
- The text parameter matches the text of nodes. It accepts a string or a compiled regular expression:

import re
from bs4 import BeautifulSoup

html='''

    
        Hello, this is a link
        Hello, this is a link, too
    

'''

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(text=re.compile('link')))

['Hello, this is a link', 'Hello, this is a link, too']
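The find_all() parameters discussed above can be compared side by side. The sketch below uses hypothetical markup. It passes string= instead of text=, since Beautiful Soup 4.4.0 and later document string as the name of this argument and keep text for backward compatibility.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
html = '''
<ul id="list-1" class="list">
  <li class="element">Foo</li>
  <li class="element">Bar</li>
</ul>
<a>Hello, this is a link</a>
'''
soup = BeautifulSoup(html, 'lxml')

by_name = soup.find_all(name='li')                     # by node name
by_id = soup.find_all(id='list-1')                     # keyword shortcut for an attribute
by_class = soup.find_all(class_='element')             # class_ avoids the Python keyword
by_attrs = soup.find_all(attrs={'class': 'element'})   # same query through attrs
by_text = soup.find_all(string=re.compile('link'))     # matches text nodes, not tags

print([li.string for li in by_name])
print(by_id[0].name)
print(by_class == by_attrs)
print(by_text)
```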

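find(): find() returns a single element, namely the first match
Other query methods:
- find_parents() and find_parent(): the former returns all ancestors, the latter the direct parent
- find_next_siblings() and find_next_sibling(): the former returns all following siblings, the latter the first following sibling
- find_previous_siblings() and find_previous_sibling(): the former returns all preceding siblings, the latter the first preceding sibling
- find_all_next() and find_next(): the former returns all matching nodes after the node, the latter the first matching one
- find_all_previous() and find_previous(): the former returns all matching nodes before the node, the latter the first matching one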
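A few of these methods in action, as a minimal sketch. The markup and the variable names are hypothetical; each call starts from the middle paragraph and looks in a different direction.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
html = '<div id="box"><p id="p1">one</p><p id="p2">two</p><p id="p3">three</p></div>'
soup = BeautifulSoup(html, 'lxml')
middle = soup.find(id='p2')

first_id = soup.find('p')['id']                                  # find() stops at the first match
parent_id = middle.find_parent('div')['id']
next_id = middle.find_next_sibling('p')['id']
prev_id = middle.find_previous_sibling('p')['id']
after_ids = [p['id'] for p in middle.find_all_next('p')]
before_ids = [p['id'] for p in middle.find_all_previous('p')]

print(first_id, parent_id, next_id, prev_id, after_ids, before_ids)
```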

2.5 CSS Selectors

To use CSS selectors, just call the select() method with the CSS selector:

from bs4 import BeautifulSoup

html='''

    
        Hello
    
    
        
            Foo
            Bar
            Jay
        
        
            Foo
            Bar
        
    

'''

soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))    # nodes with class panel-heading inside nodes with class panel
print(soup.select('ul li'))                    # li nodes inside ul nodes
print(soup.select('#list-2 .element'))         # nodes with class element inside the node with id list-2
print(type(soup.select('ul')[0]))

[
Hello
]
[Foo
, Bar
, Jay
, Foo
, Bar]
[Foo
, Bar]

Nested selection:

for ul in soup.select('ul'):
    print(ul.select('li'))

[Foo
, Bar
, Jay]
[Foo
, Bar]

Getting attributes

for ul in soup.select('ul'):
    print(ul['id'])
    print(ul.attrs['id'])

list-1
list-1
list-2
list-2

Getting text: besides the string attribute, there is also a get_text() method

for li in soup.select('li'):
    print('Get Text:', li.get_text())
    print('String:', li.string)

Get Text: Foo
String: Foo
Get Text: Bar
String: Bar
Get Text: Jay
String: Jay
Get Text: Foo
String: Foo
Get Text: Bar
String: Bar
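In the output above string and get_text() agree, but they differ once a node has more than one child. The sketch below uses hypothetical markup to show that string becomes None in that case while get_text() still concatenates all the text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
html = '<ul id="list-1"><li class="element">Foo</li><li class="element"><b>Bar</b> baz</li></ul>'
soup = BeautifulSoup(html, 'lxml')

items = soup.select('#list-1 .element')
first, second = items

print(first.string, first.get_text())     # a single text child: both return it
print(second.string, second.get_text())   # two children: string is None
```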

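3. Using pyquery

3.1 Initialization

As with Beautiful Soup, pyquery is initialized by passing in HTML text to create a PyQuery object. There are several ways to do this: pass a string, a URL, a file name, and so on

String initialization: pass the HTML string directly to the PyQuery class, then pass a CSS selector to the resulting object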

from pyquery import PyQuery as pq

html = '''

    
         first item
         second item
         third item
         fourth item
         fifth item
     
 
'''

doc = pq(html)
print(doc('li'))

first item
         second item
         third item
         fourth item
         fifth item

URL initialization: pass the URL of a web page to PyQuery. It first requests the URL, then uses the returned HTML to complete the initialization:

from pyquery import PyQuery as pq

doc = pq(url='https://www.taobao.com')
print(doc('title'))

# The following code is equivalent
from pyquery import PyQuery as pq
import requests
doc = pq(requests.get('https://www.taobao.com').text)
print(doc('title'))

淘宝网 - 淘！我喜欢

淘宝网 - 淘！我喜欢

File initialization: pass the name of a local file:

from pyquery import PyQuery as pq

doc = pq(filename='demo.html')
print(doc('title'))

This is a Demo

3.2 Basic CSS Selectors

from pyquery import PyQuery as pq
html = '''

    
         first item
         second item
         third item
         fourth item
         fifth item
     
 
'''

doc = pq(html)
print(doc('#container .list li'))
print(type(doc('#container .list li')))

first item
         second item
         third item
         fourth item
         fifth item

3.3 Finding Nodes

Child nodes

Use the find() method to look up descendants; its argument is a CSS selector
Use the children() method to look up direct children

from pyquery import PyQuery as pq
html = '''

    
         first item
         second item
         third item
         fourth item
         fifth item
     
 
'''
doc = pq(html)
items = doc('#container')
print(type(items))
print(items)

# All matching descendants
lis = items.find('ul, li')
print(type(lis))
print(lis)

# Direct children
lis = items.children('.list')
print(type(lis))
print(lis)



    
         first item
         second item
         third item
         fourth item
         fifth item
     
 


         first item
         second item
         third item
         fourth item
         fifth item
     
 first item
         second item
         third item
         fourth item
         fifth item
     


         first item
         second item
         third item
         fourth item
         fifth item

Parent nodes:

Use the parent() method to get the direct parent
Use the parents() method to get the ancestors

from pyquery import PyQuery as pq
html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''
doc = pq(html)
items = doc('.list')

# Direct parent
container = items.parent()
print(type(container))
print(container, '\n')

# All ancestors
parents = items.parents()
print(type(parents))
print(parents, '\n')

# Ancestors with class wrap
parent = items.parents('.wrap')
print(type(parent))
print(parent)



        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
  



    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 

        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
  



    
        
             first item
             second item
             third item
             fourth item
             fifth item

Sibling nodes:

Use the siblings() method to get sibling nodes

from pyquery import PyQuery as pq
html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''
doc = pq(html)
li = doc('.list .item-0.active')

# All siblings
print(li.siblings(), '\n')

# Siblings filtered by a selector
print(li.siblings('.active'))

second item
             first item
             fourth item
             fifth item
          

fourth item

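3.4 Traversal

A pyquery selection may contain a single node or several. A single node can be printed directly or converted to a string; for several nodes, call items() to get a generator and iterate over it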

from pyquery import PyQuery as pq
html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''
doc = pq(html)

# A single node
li = doc('.item-0.active')
print(li)
print(str(li))

# Multiple nodes
lis = doc('li').items()
print(type(lis))
for li in lis:
    print(str(li).strip(), type(li), sep='\n')

third item
             
third item
             

first item

second item

third item

fourth item

fifth item

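3.5 Getting Information

Getting attributes

Once you have a node of type PyQuery, call the attr() method to get an attribute, or use the attr property instead
If several elements are selected, calling attr() directly returns the attribute of the first element only; to get the attribute of every element, iterate over them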

from pyquery import PyQuery as pq

html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''

doc = pq(html)

# A single element
a = doc('.item-0.active a')
print(a, type(a))
print(a.attr('href'))
print(a.attr.href)
print('\n')
# Multiple elements
a = doc('a')
for item in a.items():
    print(item.attr('href'))

third item 
link3.html
link3.html


link2.html
link3.html
link4.html
link5.html

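Getting text

Plain text: the text() method. For a multi-node selection it returns the plain text of all selected nodes joined by spaces into one string, so no iteration is needed
HTML text: the html() method. For a multi-node selection it returns the inner HTML of the first node only, so iterate to get the HTML of every node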

from pyquery import PyQuery as pq

html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''

doc = pq(html)

# A single node
a = doc('.item-0.active a')
print(a)
print(a.text())    # text()
print('\n')
print(a.html())    # html()
print('\n')

# Multiple nodes
li = doc('li')
print(li)
print(li.text())
print('\n')
for item in li.items():
    print(item.html())

third item
third item


third item


first item
             second item
             third item
             fourth item
             fifth item
         
first item second item third item fourth item fifth item


first item
second item
third item
fourth item
fifth item

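3.6 Node Manipulation

pyquery provides a set of methods for modifying nodes dynamically, such as adding a class to a node or removing a node. Some examples follow

addClass and removeClass: the former adds a class, the latter removes one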

from pyquery import PyQuery as pq

html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''

doc = pq(html)
li = doc('.item-0.active')
print(li)
li.removeClass('active')
print(li)
li.addClass('active')
print(li)

third item
             
third item
             
third item

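attr, text, and html: the attr() method operates on attributes; its first argument is the attribute name and its second is the attribute value. The text() and html() methods change the content of a node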

from pyquery import PyQuery as pq

html = '''

     third item

'''

doc = pq(html)
li = doc('.item-0.active')
print(li)
li.attr('name', 'link')
print(li)
li.text('changed item')
print(li)
li.html('changed item')
print(li)

third item

third item

changed item

changed item

remove: removes nodes

from pyquery import PyQuery as pq

html = '''

    Hello, World
    This is a paragraph.
 
'''

doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())
# Keep only Hello, World and drop the content of the p node
wrap.find('p').remove()
print(wrap.text())

Hello, World
This is a paragraph.
Hello, World

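For other methods such as append(), empty(), and prepend(), see the official documentation: http://pyquery.readthedocs.io/en/latest/api.html

3.7 Pseudo-class Selectors

CSS selectors also support a variety of pseudo-class selectors, for example to select the first node, the last node, nodes at odd or even positions, or nodes containing certain text: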

from pyquery import PyQuery as pq

html = '''

    
        
             first item
             second item
             third item
             fourth item
             fifth item
         
     
 
'''

doc = pq(html)
# The first li node
li = doc('li:first-child')
print('First li node', li)
# The last li node
li = doc('li:last-child')
print('Last li node', li)
# The second li node
li = doc('li:nth-child(2)')
print('Second li node', li)
# The li nodes after the third li
li = doc('li:gt(2)')
print('li nodes after the third li', li)
# The li nodes at even positions
li = doc('li:nth-child(2n)')
print('li nodes at even positions', li)
# The li nodes containing the text second
li = doc('li:contains(second)')
print('li nodes containing second', li)

First li node first item
             
Last li node fifth item
         
Second li node second item
             
li nodes after the third li fourth item
             fifth item
         
li nodes at even positions second item
             fourth item
             
li nodes containing second second item

你可能感兴趣的:(爬虫学习)

爬虫学习--1.前导知识 F—— 爬虫爬虫学习开发语言 python
初始爬虫前言引入随着大数据时代的来临，网络爬虫在互联网中的地位将越来越重要。互联网中的数据是海量的，如何自动高效地获取互联网中我们感兴趣的信息并为我们所用是一个重要的问题，而爬虫技术就是为了解决这些问题而生的。我们感兴趣的信息分为不同的类型：如果只是做搜索引擎，那么感兴趣的信息就是互联网中尽可能多的高质量网页；如果要获取某一垂直领域的数据或者有明确的检索需求，那么感兴趣的信息就是根据我们的检索和需
爬虫学习--14.进程与线程 F—— 爬虫-限免爬虫学习 python
什么是进程？电脑中时会有很多单独运行的程序，每个程序有一个独立的进程，而进程之间是相互独立存在的。比如下标中的QQ播放器、小鹅通等等。什么是线程？进程可以简单的理解为一个可以独立运行的程序单位，它是线程的集合，进程就是有一个或多个线程构成的。而线程是进程中的实际运行单位，是操作系统进行运算调度的最小单位。可理解为线程是进程中的一个最小运行单元。什么是多进程?同理，多进程就是指计算机同时执行多个进程
python电影评价分析_用 Python 分析豆瓣电影 TOP250 weixin_39806413 python电影评价分析
用Python分析豆瓣电影TOP250既然要分析豆瓣电影TOP250,那么肯定就要把相关的数据采集下来,比如排名,电影名,导演,主演等信息.那就肯定使用一下爬虫咯,如果还不会的话,欢迎看之前的文章:Python爬虫学习(一)概述Python爬虫学习(二)urllib基础使用Python爬虫学习(三)urllib进阶使用Python爬虫学习(四)正则表达式Python爬虫学习(五)爬取电影排行榜及其
Python爬虫学习——爬取小说章节一大块肥皂 Python爬虫 python 爬虫
之前学了Python好久都没有用，感觉再不继续学就要忘了。。。赶紧再挖个坑继续学习。这个部分会用Python去做爬虫来进行学习，巩固python的知识。爬虫的教程看的是Jack-Cui大佬的文章。这一次是跟着大佬学习：Python3网络爬虫（二）：下载小说的正确姿势（2020年最新版）_Jack-Cui-CSDN博客练习-爬取章节前面的爬虫基础部分就看大佬的上一篇博文，讲的非常棒：Python3网
手机Python爬虫教程：利用手机学习Python爬虫的终极指南一只会写程序的猫 Python 智能手机 python 爬虫
【引言】在数字化时代，手机已经成为人们生活中不可或缺的一部分。而Python爬虫作为一种强大的数据获取工具，也受到越来越多人的关注。但是，是否可以利用手机进行Python爬虫学习呢？本文将介绍如何通过手机学习Python爬虫，为你打开一扇全新的学习之门。【一、手机学习资源】1.《Python爬虫入门教程》（手机应用）这款手机应用程序提供了Python爬虫的基础知识和实例讲解，适合初学者使用。你可以
python 爬虫学习 lally. python 爬虫学习
目录requst库访问HTML语言常用HTML标签结构性标签文本格式化标签超链接与图像列表标签HTML练习BeautifulSoup处理数据requst库访问fromrequestsimport*response=get("https://19j.tv/")print(response)若访问成功，状态码为200，访问失败，则查询状态码，http和https的状态码是一样的http状态码可以采取伪
Python爬虫：从入门到实践来恩1003 Python爬虫 python 爬虫开发语言
Python爬虫学习资料Python爬虫学习资料Python爬虫学习资料在当今数字化信息爆炸的时代，数据已成为企业和个人发展的重要资产。Python爬虫作为一种高效获取网络数据的工具，正逐渐被广大开发者所熟知和应用。无论是市场调研、学术研究，还是数据分析，Python爬虫都能发挥巨大作用。本文将带你从基础概念出发，逐步深入到爬虫的实战应用，助你掌握这一强大的数据获取技能。一、爬虫基础：开启数据获取
python爬虫心得_python爬虫学习心得 weixin_39941721 python爬虫心得
爬虫新手一枚，因为工作原因需要学习相关的东西。发表下这段时间学习的心得，有说得不对的地方欢迎指指点点。一.什么是爬虫在学习爬虫之前只对爬虫有个概念性的认识。通过向服务器发送请求获取服务器传回信息，再根据其提取所需的信息。原理虽然简单，但是涉及的细节非常多，从一个坑爬出来又掉进另一个坑。二.post和getpost和get是两种向服务器发送请求的方式，有些http基础的同学应该都清楚他们的用处，在写
Python大数据之Python爬虫学习总结——day16 数据可视化笨小孩124 Python爬虫学习总结信息可视化 python 大数据
数据可视化Map_地图基础地图知识点:基础示例:实战练习:知识点:自定义模块:制作中国地图data1.txt文件内容python代码示例制作区域地图data2.txt文件内容python代码示例Line_折线图基础折线图实战练习:Bar_柱状图基础柱状图反转以及主题设置Json数据python数据转为json数据知识点:示例:json数据转为python数据知识点:json文件:示例:Map_地图
python爬虫要不要学正则_Python爬虫学习（四）正则表达式 weixin_39583751 python爬虫要不要学正则
经过前面的学习之后，大家现在应该可以顺利地得到一个网页源码字符串，对于Python中的字符串，Python提供了很多操作，大家可以其去尝试提取网页源码字符串中想要的信息。在这里，给大家推荐的是正则表达式!文章最后还有爬取糗事百科的实例哦！什么是正则表达式说白了，正则表达式就是描述我们需要提取的那部分信息的规则的工具。举个栗子，比如，我们想要提取'Stayhungry,123stayfoolish!
爬虫学习4：爬取技能信息夜清寒风爬虫网络爬虫 pycharm 学习 python
爬虫：爬取技能信息（代码和代码流程）代码importtimefromseleniumimportwebdriverfromselenium.webdriver.common.byimportByif__name__=='__main__':fp=open("./honorKing.txt","w",encoding='utf8')#1、urlurl=""#页面url#2、发送请求driver=we
python爬虫学习小叶丶
Python爬虫(1):基本原理Python爬虫(2):Requests的基本用法Python爬虫(3):Requests的高级用法Python爬虫(4):BeautifulSoup的常用方法Python爬虫(5):豆瓣读书练手爬虫Python爬虫(6):煎蛋网全站妹子图爬虫Python爬虫(7):多进程抓取拉钩网十万数据Python爬虫(8):分析Ajax请求爬取果壳网Python爬虫(9):C
爬虫学习笔记-scrapy链接提取器爬取读书网链接写入MySQL数据库 DevCodeMemo 爬虫学习笔记
1.终端运行scrapystartprojectscrapy_read,创建项目2.登录读书网,选择国学(随便点一个)3.复制链接(后面修改为包括其他页)4.创建爬虫文件,并打开5.滑倒下方翻页处,右键2,点击检查,查看到a标签网址,复制6.修改爬虫文件规则allow(正则表达式),'\d'表示数字,'+'表示多个,'\.'使'.'生效7.在parse_item中编写打印,scrapycrawlr
爬虫学习笔记-scrapy爬取电影天堂(双层网址嵌套) DevCodeMemo 爬虫学习笔记
1.终端运行scrapystartprojectmovie,创建项目2.接口查找3.终端cd到spiders,cdscrapy_carhome/scrapy_movie/spiders,运行scrapygenspidermvhttps://dy2018.com/4.打开mv,编写代码,爬取电影名和网址5.用爬取的网址请求,使用meta属性传递name,callback调用自定义的parse_sec
爬虫学习笔记-scrapy爬取当当网 DevCodeMemo 爬虫学习笔记
1.终端运行scrapystartprojectscrapy_dangdang,创建项目2.接口查找3.cd100个案例/Scrapy/scrapy_dangdang/scrapy_dangdang/spiders到文件夹下,创建爬虫程序4.items定义ScrapyDangdangItem的数据结构(要爬取的数据)src,name,price5.爬取src,name,price数据导入items
Python爬虫学习曹博Blog Python python 爬虫学习
1.1搭建爬虫程序开发环境爬取未来七天天气预报frombs4importBeautifulSoupfrombs4importUnicodeDammitimporturllib.requesturl="http://www.weather.com.cn/weather/101120901.shtml"try:headers={"User-Agent":"Mozilla/5.0(WindowsNT10
python爬虫学习day2—百度翻译 2401_82964032 爬虫学习 python 百度
##第零步安装requests库以及了解AJAX请求##第一步打开百度翻译网址，随便输入一个英文单词，我们可以发现网页进行了局部刷新，而非整体性的，因此我们可以猜测，这是一个AJAX请求。##第二步F12打开控制台，点击网络(network)，因为我们已经猜测这是一个AJAX请求，因此我们选择XHR(实现网页得局部刷新)或者叫Fetch/XHR。然后输入一个英文单词，例如write。我们挨个点击，
python爬虫学习day3—KFC肯德基餐厅信息查询 2401_82964032 爬虫学习 beautifulsoup
##第零步安装requests库以及了解AJAX请求##第一步打开肯德基餐厅信息查询(kfc.com.cn)随便输入一个地址后发现页面没有整体刷新，并且点击下一页页面也仍然是局部刷新，因此判断是AJAX请求。##第二步F12打开控制台，点击网络(network)，选择XHR(实现网页得局部刷新)或者叫Fetch/XHR。选择一个地址后，我们可以得到点击后我们可以得到：其url为https://ww
python爬虫学习day1—Books to Scrape 2401_82964032 python beautifulsoup
##第零步安装requests库与BeautifulSoup库，以及学习一点点html知识##第一步导入requests库与BeautifulSoup库importrequestsfrombs4importBeautifulSoup##第三步查看网站是否有反爬机制如果有可以选择伪装浏览器headers={"User-Agent":"自己浏览器的标识"}按F12找到网络（network）然后刷新网页
python爬虫beautifulsoup实例-Python爬虫学习（二）使用Beautiful Soup库 weixin_37988176
（一）使用BeautifulSoup库（默认将HTML转换为utf-8编码）1，安装BeautifulSoup库：pipinstallbeautifulsoup42，简单使用：importrequests;from_socketimporttimeoutfrombs4importBeautifulSoup#使用BeautifulSoup库需要导包#fromaifcimportdatadefgetH
速看，关于Python的17个学习网站，从基础到机器学习【建议收藏】帅帅的Python python 学习机器学习
目录一、基础学习网站Python官方教程Python官方安装包地址PyCharm下载地址anaconda3清华开源下载地址二、爬虫学习网站requests官方学习网站BeautifulSoup文档网站selenium官方学习网站scrapy中文学习网站三、数据分析学习网站numpy官方文档网站pandas官方文档网站sklearn官方文档网站四、数据可视化学习网站matplotlib官方学习网站p
PYthon进阶--网页采集器(基于百度搜索的Python3爬虫程序) 在猴站学算法 python 百度爬虫
简介：基于百度搜索引擎的PYthon3爬虫程序的网页采集器，小白和爬虫学习者都可以学会。运行爬虫程序，输入关键词，即可将所搜出来的网页内容保存在本地。知识点：requests模块的get方法一、此处需要安装第三方库requests:在Pycharm平台终端或者命令提示符窗口中输入以下代码即可安装pipinstallrequests二、抓包分析及编写Python代码1、打开百度搜索进行抓包分析打开百
python爬虫学习步骤和推荐资料 suoge223 python 爬虫学习
学习Python爬虫是一项非常实用的技能，可以帮助你获取网络上的数据，进行信息抓取和分析。以下是一系列学习步骤和对应的参考资料，帮助你入门和深入学习Python爬虫。###学习步骤：####Step1:基础Python编程在学习爬虫之前，首先要确保你对基础的Python语法有一定的了解。参考资料：-[Python官方文档](https://docs.python.org/3/)-[w3school
爬虫学习笔记-scrapy爬取汽车之家 DevCodeMemo 爬虫学习笔记
1.终端运行scrapystartprojectscrapy_carhome,创建项目2.接口查找3.终端cd到spiders,cdscrapy_carhome/scrapy_carhome/spiders,运行scrapygenspideraudihttps://car.autohome.com.cn/price/brand-33.html4.打开audi,编写代码,xpath获取页面车型价格列
Python爬虫学习之scrapy库蜀道之南718 python 爬虫学习笔记 scrapy
一、scrapy库安装pipinstallscrapy-ihttps://pypi.douban.com/simple二、scrapy项目的创建1、创建爬虫项目打开cmd输入scrapystartproject项目的名字注意:项目的名字不允许使用数字开头也不能包含中文2、创建爬虫文件要在spiders文件夹中去创建爬虫文件cd项目的名字\项目的名字\spiderscdscrapy_baidu_09
python中用scrapy框架创建项目小沙弥哥
最近在学scrapy框架进行简单爬虫学习，在此简单回顾一下创建项目流程思路。首先你的安装scrapy运行环境，在此省略，不懂可以百度。第一步：创建项目在运行环境按住shift键，单击右键选择【在此打开命令窗口】，打开cmd命令框，输入命令：scrapystartprojectqsbk,如下图：第二步创建爬虫，根据提示进入qsbk目录下输入“scrapygenspiderqsbk_spider”，成
爬虫学习笔记-scrapy安装及第一个项目创建问题及解决措施 DevCodeMemo 爬虫学习笔记
1.安装scrapypycharm终端运行pipinstallscrapy-ihttps://pypi.douban.com/simple2.终端运行scrapystartprojectscrapy_baidu,创建项目问题1:lxml版本低导致无法找到解决措施:更新或者重新安装lxml3.项目创建成功4.终端cd到项目的spiders文件夹下,cdscrapy_baidu\scrapy_baid
Python大牛写的爬虫学习路线，分享给大家！ IT青年
今天给大家带来我的python爬虫学习路线，供大家参考！第一步，学会自己安装python、库和你的编辑器并设置好它我们学习python的最终目的是要用它来达到我们的目的，它本身是作为工具的存在，我们一定要掌握自己的工具的各类设置，比如安装、环境配置、库的安装，编辑器的设置等等。当然也可以用比如Anaconda来管理你的版本和各种库！为了帮助大家更轻松的学好Python开发，爬虫技术，Python数
爬虫学习：搜狗简易网页采集器 unravel_tom 爬虫学习爬虫学习
#搜狗简易网页采集器importrequests#请求参数动态化keyword=input('请输入关键字:')#如果请求失败，那就是模仿的力度不够，第一次我未加请求头中的headers,导致搜索404headers={'User-Agent':'Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrom
爬虫学习：下厨房的菜谱搜索 unravel_tom 爬虫学习爬虫学习
#下厨房的菜谱搜索(多个请求参数)，注：只支持搜索功能，不具备多页爬取功能importrequests#请求头headers={'User-Agent':'Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/121.0.0.0Safari/537.36Edg/121.0.0.0'}title=i