Spider_Definitive_Guide_ch02_01

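# In this section:
# Parsing complex HTML pages:
# 1--bs.find()  bs.find_all()  tag.get_text()
# find_all(tag/tag_list, attributes_dict, recursive, text, limit, keywords)
#     find(tag/tag_list, attributes_dict, recursive, text, keywords)

# 2--Navigating the tree (by position in the document): usually combined with bs.find() and bs.find_all()
# tag.children   tag.descendants  tag.next_siblings   tag.previous_siblings  tag.parent

# 3--BeautifulSoup objects:
# BeautifulSoup object: the whole parsed document, bs
# Tag object: a single tag; find_all() returns a list of them
# NavigableString object: the text inside a tag, not the tag itself
# Comment object: an HTML comment (<!-- ... -->) in the document
# When parsing complex HTML pages, BeautifulSoup can tell tags apart by the attributes that CSS styling relies on (such as class and id):
#  bs.find()   bs.find_all()   tag.get_text()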
 
# I. Introduction:
import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.pythonscraping.com/pages/warandpeace.html')
bs = BeautifulSoup(html.text, 'html.parser')
# print(bs)
nameList = bs.find_all('span', {'class': 'green'})   # bs.find_all(tag/tag_list, attributes_dict) returns a list of the tags that match
for name in nameList:
    print(name.get_text())                          # tag.get_text() strips the markup; call it last and keep the HTML tag structure until then
Anna
Pavlovna Scherer
Empress Marya
Fedorovna
Prince Vasili Kuragin
Anna Pavlovna
St. Petersburg
the prince
Anna Pavlovna
Anna Pavlovna
the prince
the prince
the prince
Prince Vasili
Anna Pavlovna
Anna Pavlovna
the prince
Wintzingerode
King of Prussia
le Vicomte de Mortemart
Montmorencys
Rohans
Abbe Morio
the Emperor
the prince
Prince Vasili
Dowager Empress Marya Fedorovna
the baron
Anna Pavlovna
the Empress
the Empress
Anna Pavlovna's
Her Majesty
Baron
Funke
The prince
Anna
Pavlovna
the Empress
The prince
Anatole
the prince
The prince
Anna
Pavlovna
Anna Pavlovna
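
The fetch above assumes the request always succeeds. A minimal sketch of a more defensive version follows. The helper names `fetch_soup` and `green_names` and the 10-second timeout are hypothetical choices made for this sketch, not part of the original notes.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Return a BeautifulSoup object for url, or None if the request fails.

    fetch_soup is a hypothetical helper name used only in this sketch.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx responses into exceptions
    except requests.exceptions.RequestException as err:
        print(f'request failed: {err}')
        return None
    return BeautifulSoup(response.text, 'html.parser')


def green_names(soup):
    """Collect the text of every <span class="green"> in soup."""
    return [span.get_text() for span in soup.find_all('span', {'class': 'green'})]
```

With these helpers, the loop above could be written as `soup = fetch_soup('http://www.pythonscraping.com/pages/warandpeace.html')` followed by `green_names(soup)` once `soup` has been checked against `None`.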
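# II. Finding tags by name and attributes:

# bs.find_all() and bs.find()  (the latter behaves like the former with limit=1, but returns a single result or None instead of a list)

# find_all(tag/tag_list, attributes_dict, recursive, text, limit, keywords)
#     find(tag/tag_list, attributes_dict, recursive, text, keywords)

# tag/tag_list (a tag name or a list of tag names) -- e.g. 'span' or ['h1', 'h2', 'p']
# attributes_dict (dictionary of attributes) -- e.g. {'class': 'green'}, or {'class': {'green', 'red'}} to match either value
# recursive -- defaults to True: search all descendants of the tag; with False, only its direct children are searched
# text -- text='exact text to find' matches on text content instead of tag attributes; it returns NavigableString objects, not Tag objects. Beautiful Soup 4.4.0 and later call this argument string.
# limit -- stops after the given number of matches, taken in document order, which may not be the items you actually want.
# keywords -- one or more keyword arguments that further restrict the match, e.g. id='title', class_='green'. (class is a reserved word in Python, so bs spells it class_.)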
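
The parameters listed above can be tried without a network connection. The sketch below runs them against a small made-up HTML string (the markup and variable names are assumptions for illustration only) and uses `string=`, the newer spelling of `text=`.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for this sketch.
html = """
<div id="book">
  <h1 id="title">War and Peace</h1>
  <p><span class="green">Anna</span> spoke to <span class="red">the prince</span>.</p>
  <p><span class="green">the prince</span></p>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
book = soup.find('div', {'id': 'book'})

# A list of tag names matches any of them (logical OR).
headings_and_paragraphs = book.find_all(['h1', 'p'])

# The attribute dictionary and the class_ keyword are two spellings of one filter.
green_by_dict = book.find_all('span', {'class': 'green'})
green_by_keyword = book.find_all('span', class_='green')

# recursive=False looks only at direct children, so the nested spans are missed.
direct_spans = book.find_all('span', recursive=False)

# limit stops after the first N matches in document order.
first_span = book.find_all('span', limit=1)

# string= returns NavigableString objects, not tags.
prince_strings = book.find_all(string='the prince')

print(len(headings_and_paragraphs), len(green_by_dict), len(direct_spans))
print(first_span[0].get_text(), prince_strings)
```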


# Example 1:

titles = bs.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
print([title for title in titles])   # [<h1>War and Peace</h1>, <h2>Chapter 1</h2>]

prince = bs.find(text='the prince')
print(type(prince))   # <class 'bs4.element.NavigableString'>

prince_list = bs.find_all(text='the prince')
print(prince_list)
print([prince for prince in prince_list])
[<h1>War and Peace</h1>, <h2>Chapter 1</h2>]
<class 'bs4.element.NavigableString'>
['the prince', 'the prince', 'the prince', 'the prince', 'the prince', 'the prince', 'the prince']
['the prince', 'the prince', 'the prince', 'the prince', 'the prince', 'the prince', 'the prince']
# Example 2:
allText = bs.find_all(id='title', class_='text')
print(allText)
print([text for text in allText])
[]
[]
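
The empty lists above are what find_all() returns when no element satisfies every filter at once; presumably no tag on that page carries both id="title" and class="text". A minimal sketch on made-up markup (the ids and classes here are assumptions for illustration) shows that keyword filters combine with AND, and that find() returns None rather than an empty list.

```python
from bs4 import BeautifulSoup

# Made-up markup: one element has only the id, one only the class, one has both.
html = """
<h1 id="title">Heading</h1>
<p class="text">Body</p>
<p id="title2" class="text">Both</p>
"""
soup = BeautifulSoup(html, 'html.parser')

only_id = soup.find_all(id='title')                   # one match
only_class = soup.find_all(class_='text')             # two matches
both = soup.find_all(id='title2', class_='text')      # must satisfy both filters
neither = soup.find_all(id='title', class_='text')    # no element has this pair
missing = soup.find(id='title', class_='text')        # find() gives None, not []

print(len(only_id), len(only_class), len(both), neither, missing)
```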
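# III. BeautifulSoup objects:
# 1-BeautifulSoup object: the whole parsed document, bs
# 2-Tag object: a single tag; find_all() returns a list of them
# 3-NavigableString object: the text inside a tag, not the tag itself
# 4-Comment object: an HTML comment (<!-- ... -->) in the document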
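
The four object types can be told apart with isinstance(). The following is a small sketch on a made-up one-line document; the `kinds` list is just an illustrative way to record what each child node is.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Made-up document containing text, a comment, and a nested tag.
html = '<p>Hello<!-- a hidden note --><b>world</b></p>'
soup = BeautifulSoup(html, 'html.parser')

kinds = []
for node in soup.p.children:
    # Comment is a subclass of NavigableString, so it has to be tested first.
    if isinstance(node, Comment):
        kinds.append(('Comment', str(node)))
    elif isinstance(node, NavigableString):
        kinds.append(('NavigableString', str(node)))
    elif isinstance(node, Tag):
        kinds.append(('Tag', node.name))

print(type(soup).__name__)
print(kinds)
```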
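# IV. Navigating the tree: children, descendants, siblings, parents
# find_all() and find() locate tags by name and attributes; tags can also be located by their position in the document:
# 1) One direction only: bs.tag.subtag.anothersubtag
# 2) Navigating the tree: vertically and horizontally

# 1-- Children: .children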
import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html.text, 'html.parser')

for child in bs.find('table',{'id':'giftList'}).children:
    print(child)
    print('--------------------------------------------')

--------------------------------------------

Item Title

Description

Cost

Image

--------------------------------------------


--------------------------------------------

Vegetable Basket

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
Now with super-colorful bell peppers!

$15.00



--------------------------------------------


--------------------------------------------

Russian Nesting Dolls

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

$10,000.52



--------------------------------------------


--------------------------------------------

Fish Painting

If something seems fishy about this painting, it's because it's a fish! Also hand-painted by trained monkeys!

$10,005.00



--------------------------------------------


--------------------------------------------

Dead Parrot

This is an ex-parrot! Or maybe he's only resting?

$0.50



--------------------------------------------


--------------------------------------------

Mystery Box

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. Keep your friends guessing!

$1.50



--------------------------------------------


--------------------------------------------
# 2-- Descendants: .descendants

import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html.text, 'html.parser')

for child in bs.find('table',{'id':'giftList'}).descendants:  # bs.table.tr or bs.tr would also reach the first row, but such paths are less specific and break easily if the page changes
    print(child)
    print('--------------------------------------------')
--------------------------------------------

Item Title

Description

Cost

Image

--------------------------------------------

Item Title

--------------------------------------------

Item Title

--------------------------------------------

Description

--------------------------------------------

Description

--------------------------------------------

Cost

--------------------------------------------

Cost

--------------------------------------------

Image

--------------------------------------------

Image

--------------------------------------------


--------------------------------------------

Vegetable Basket

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
Now with super-colorful bell peppers!

$15.00



--------------------------------------------

Vegetable Basket

--------------------------------------------

Vegetable Basket

--------------------------------------------

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
Now with super-colorful bell peppers!

--------------------------------------------

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!

--------------------------------------------
Now with super-colorful bell peppers!
--------------------------------------------
Now with super-colorful bell peppers!
--------------------------------------------


--------------------------------------------

$15.00

--------------------------------------------

$15.00

--------------------------------------------



--------------------------------------------


--------------------------------------------

--------------------------------------------


--------------------------------------------


--------------------------------------------

Russian Nesting Dolls

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

$10,000.52



--------------------------------------------

Russian Nesting Dolls

--------------------------------------------

Russian Nesting Dolls

--------------------------------------------

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

--------------------------------------------

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
--------------------------------------------
8 entire dolls per set! Octuple the presents!
--------------------------------------------
8 entire dolls per set! Octuple the presents!
--------------------------------------------


--------------------------------------------

$10,000.52

--------------------------------------------

$10,000.52

--------------------------------------------



--------------------------------------------


--------------------------------------------

--------------------------------------------


--------------------------------------------


--------------------------------------------

Fish Painting

If something seems fishy about this painting, it's because it's a fish! Also hand-painted by trained monkeys!

$10,005.00



--------------------------------------------

Fish Painting

--------------------------------------------

Fish Painting

--------------------------------------------

If something seems fishy about this painting, it's because it's a fish! Also hand-painted by trained monkeys!

--------------------------------------------

If something seems fishy about this painting, it's because it's a fish! 
--------------------------------------------
Also hand-painted by trained monkeys!
--------------------------------------------
Also hand-painted by trained monkeys!
--------------------------------------------


--------------------------------------------

$10,005.00

--------------------------------------------

$10,005.00

--------------------------------------------



--------------------------------------------


--------------------------------------------

--------------------------------------------


--------------------------------------------


--------------------------------------------

Dead Parrot

This is an ex-parrot! Or maybe he's only resting?

$0.50



--------------------------------------------

Dead Parrot

--------------------------------------------

Dead Parrot

--------------------------------------------

This is an ex-parrot! Or maybe he's only resting?

--------------------------------------------

This is an ex-parrot! 
--------------------------------------------
Or maybe he's only resting?
--------------------------------------------
Or maybe he's only resting?
--------------------------------------------


--------------------------------------------

$0.50

--------------------------------------------

$0.50

--------------------------------------------



--------------------------------------------


--------------------------------------------

--------------------------------------------


--------------------------------------------


--------------------------------------------

Mystery Box

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. Keep your friends guessing!

$1.50



--------------------------------------------

Mystery Box

--------------------------------------------

Mystery Box

--------------------------------------------

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. Keep your friends guessing!

--------------------------------------------

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. 
--------------------------------------------
Keep your friends guessing!
--------------------------------------------
Keep your friends guessing!
--------------------------------------------


--------------------------------------------

$1.50

--------------------------------------------

$1.50

--------------------------------------------



--------------------------------------------


--------------------------------------------

--------------------------------------------


--------------------------------------------


--------------------------------------------
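
The two listings differ because .children yields only the direct children of a tag, while .descendants walks every nested tag and every text string, so each piece of text appears once per enclosing tag and once on its own. A minimal sketch on a made-up two-row table (written without whitespace between tags so that the counts are easy to check) makes the difference concrete.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up table, built without whitespace so no stray text nodes appear between tags.
html = ('<table id="giftList">'
        '<tr><th>Item Title</th><th>Cost</th></tr>'
        '<tr><td>Dead Parrot</td><td>$0.50</td></tr>'
        '</table>')
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id': 'giftList'})

children = list(table.children)          # the two <tr> rows only
descendants = list(table.descendants)    # rows, cells, and the text inside each cell

child_tags = [node.name for node in children if isinstance(node, Tag)]
descendant_tags = [node.name for node in descendants if isinstance(node, Tag)]

# Dotted navigation follows the first matching tag at each step.
first_header = soup.table.tr.th.get_text()

print(len(children), len(descendants))
print(child_tags, descendant_tags, first_header)
```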
# 3-- Siblings: .next_siblings and .previous_siblings

import requests
from bs4 import BeautifulSoup

html = requests.get('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html.text, 'html.parser')

for sibling in bs.find('table', {'id':'giftList'}).tr.next_siblings:
    print(sibling) 

Vegetable Basket

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
Now with super-colorful bell peppers!

$15.00






Russian Nesting Dolls

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

$10,000.52






Fish Painting

If something seems fishy about this painting, it's because it's a fish! Also hand-painted by trained monkeys!

$10,005.00






Dead Parrot

This is an ex-parrot! Or maybe he's only resting?

$0.50






Mystery Box

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. Keep your friends guessing!

$1.50
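
The blank lines in the output above come from the whitespace strings that sit between the rows: .next_siblings yields text nodes as well as tags, and it never includes the starting tag itself (so the header row is skipped). A minimal sketch on a made-up table (row ids are assumptions for illustration) compares it with find_next_siblings() and find_previous_siblings(), which return only matching tags.

```python
from bs4 import BeautifulSoup

# Made-up table with newlines between rows, as in a typical page source.
html = """
<table id="giftList">
<tr id="header"><th>Item Title</th></tr>
<tr id="gift1"><td>Vegetable Basket</td></tr>
<tr id="gift2"><td>Dead Parrot</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
header = soup.find('table', {'id': 'giftList'}).tr

# next_siblings yields the whitespace strings between rows as well as the rows.
raw_siblings = list(header.next_siblings)

# find_next_siblings returns only the matching tags, in document order.
rows_after_header = header.find_next_siblings('tr')

# find_previous_siblings walks backwards, so the nearest row comes first.
last_row = soup.find('tr', {'id': 'gift2'})
rows_before_last = last_row.find_previous_siblings('tr')

print(len(raw_siblings))
print([row['id'] for row in rows_after_header])
print([row['id'] for row in rows_before_last])
```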



# 4-- Parent: .parent  (used less often)

# Find the price of the item shown in the image '../img/gifts/img1.jpg':
import requests
from bs4 import BeautifulSoup
html = requests.get('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html.text, 'html.parser')

print(bs.find('img',
              {'src':'../img/gifts/img1.jpg'})
      .parent.previous_sibling.get_text())  # go up to the parent cell, then across to the previous sibling cell
$15.00
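
The chained lookup above raises AttributeError if the image is missing, and .previous_sibling may land on a whitespace string on other pages. A minimal sketch of a more forgiving version follows; `price_for_image` is a hypothetical helper name, and the one-row table is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up row that mirrors the layout used above: title, price, image.
html = ('<table id="giftList"><tr>'
        '<td>Vegetable Basket</td>'
        '<td>$15.00</td>'
        '<td><img src="../img/gifts/img1.jpg"></td>'
        '</tr></table>')
soup = BeautifulSoup(html, 'html.parser')


def price_for_image(soup, src):
    """Return the text of the cell just before the image's cell, or None.

    price_for_image is a hypothetical helper used only in this sketch.
    """
    img = soup.find('img', {'src': src})
    if img is None:
        return None
    # find_previous_sibling('td') skips any whitespace strings between cells.
    price_cell = img.parent.find_previous_sibling('td')
    if price_cell is None:
        return None
    return price_cell.get_text().strip()


print(price_for_image(soup, '../img/gifts/img1.jpg'))
print(price_for_image(soup, 'missing.jpg'))
```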
                                                                                                            
