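When I was reading about web scraping a while back, I got stuck right at this point: I never really understood the difference between these two (children and descendants).
Today I went over it again and, borrowing the approach from the post below, printed out the results. Now I have a rough idea of how the two differ.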
http://www.cnblogs.com/chensimin1990/p/6725803.html
--------------------------------
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# name_list = bs_obj.find_all("span", {"class": "green"})
# for name in name_list:
#     print(name.get_text())
# file = open('test.txt', 'w')
# content = ''

# Print every node yielded by .descendants
for child in bs_obj.find("table", {"id": "giftList"}).descendants:
    print(child)
That is the code.
------------------------------------------------
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr>
<th> Item Title </th>
Item Title
<th> Description </th>
Description
<th> Cost </th>
Cost
<th> Image </th>
Image
<tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr>
<td> Vegetable Basket </td>
Vegetable Basket
<td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers!
<td> $15.00 </td>
$15.00
<td> <img src="../img/gifts/img1.jpg"/> </td>
<img src="../img/gifts/img1.jpg"/>
<tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr>
<td> Russian Nesting Dolls </td>
Russian Nesting Dolls
<td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents!
<td> $10,000.52 </td>
$10,000.52
<td> <img src="../img/gifts/img2.jpg"/> </td>
<img src="../img/gifts/img2.jpg"/>
<tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr>
<td> Fish Painting </td>
Fish Painting
<td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td>
If something seems fishy about this painting, it's because it's a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys!
<td> $10,005.00 </td>
$10,005.00
<td> <img src="../img/gifts/img3.jpg"/> </td>
<img src="../img/gifts/img3.jpg"/>
<tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr>
<td> Dead Parrot </td>
Dead Parrot
<td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td>
This is an ex-parrot!
<span class="excitingNote">Or maybe he's only resting?</span>
Or maybe he's only resting?
<td> $0.50 </td>
$0.50
<td> <img src="../img/gifts/img4.jpg"/> </td>
<img src="../img/gifts/img4.jpg"/>
<tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr>
<td> Mystery Box </td>
Mystery Box
<td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing!
<td> $1.50 </td>
$1.50
<td> <img src="../img/gifts/img6.jpg"/> </td>
<img src="../img/gifts/img6.jpg"/>
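That is the output.
Now the same thing with children. (It often gets called a function, but there are no parentheses, so it is really an attribute that you iterate over. The same goes for descendants.)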
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html, 'html.parser')

# name_list = bs_obj.find_all("span", {"class": "green"})
# for name in name_list:
#     print(name.get_text())
# file = open('test.txt', 'w')
# content = ''

# Print every node yielded by .children
for child in bs_obj.find("table", {"id": "giftList"}).children:
    print(child)
The output looks like this:
<tr><th> Item Title </th><th> Description </th><th> Cost </th><th> Image </th></tr>
<tr class="gift" id="gift1"><td> Vegetable Basket </td><td> This vegetable basket is the perfect gift for your health conscious (or overweight) friends! <span class="excitingNote">Now with super-colorful bell peppers!</span> </td><td> $15.00 </td><td> <img src="../img/gifts/img1.jpg"/> </td></tr>
<tr class="gift" id="gift2"><td> Russian Nesting Dolls </td><td> Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> </td><td> $10,000.52 </td><td> <img src="../img/gifts/img2.jpg"/> </td></tr>
<tr class="gift" id="gift3"><td> Fish Painting </td><td> If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span> </td><td> $10,005.00 </td><td> <img src="../img/gifts/img3.jpg"/> </td></tr>
<tr class="gift" id="gift4"><td> Dead Parrot </td><td> This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span> </td><td> $0.50 </td><td> <img src="../img/gifts/img4.jpg"/> </td></tr>
<tr class="gift" id="gift5"><td> Mystery Box </td><td> If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span> </td><td> $1.50 </td><td> <img src="../img/gifts/img6.jpg"/> </td></tr>
------------------
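Now for the explanation:
1. children only yields the direct children of the tag, one level down (here, the rows of the table). It looks as if it goes all the way to the bottom, but that is only because print(child) prints each child's complete markup, including everything nested inside it. Walking down through every level is what descendants does, just as I originally thought.
2. So what exactly does descendants give you?
It walks the whole subtree: each child, then that child's children, and so on down to the text nodes, yielding every node exactly once. Because every node is printed with its full markup, the same text turns up again and again in the output, once for each ancestor that contains it.
Take this tree as an example: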
a
-a1
--a11
--a12
--a13
---a131
----a1311
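With children on a, you get back only a1, printed as is with everything beneath it. With descendants you get a1, a11, a12, a13, a131 and a1311, each exactly once. In the printed output, though, a1311 appears inside a1, again inside a13, again inside a131, and finally on its own.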
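To check the tree example, here is a minimal sketch. The markup is made up for illustration (nested div tags whose id values mirror the names above, with no whitespace between tags so that no text nodes get in the way); it is not taken from the page being scraped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the a / a1 / a11 ... tree above.
html = (
    '<div id="a">'
    '<div id="a1">'
    '<div id="a11"></div>'
    '<div id="a12"></div>'
    '<div id="a13"><div id="a131"><div id="a1311"></div></div></div>'
    '</div>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
a = soup.find("div", {"id": "a"})

# .children: direct children only
children_ids = [node["id"] for node in a.children]

# .descendants: every node in the subtree, once each, in document order
descendant_ids = [node["id"] for node in a.descendants]

# How many of the yielded nodes contain a1311 in their printed markup
printed_with_a1311 = sum('id="a1311"' in str(node) for node in a.descendants)

print(children_ids)        # ['a1']
print(descendant_ids)      # ['a1', 'a11', 'a12', 'a13', 'a131', 'a1311']
print(printed_with_a1311)  # 4
```

So a1311 is yielded a single time, but it is visible in four of the printed nodes (a1, a13, a131 and a1311 itself), which is why the descendants output above looks so repetitive.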
One more thing:
for child in bs_obj.find("table",{"id":"giftList"}).children and
for child in bs_obj.find("table",{"id":"giftList"}) are equivalent. That makes sense when you think about it, and it matches what most people would expect.
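A small sketch to confirm the equivalence, using a tiny hand-written table instead of the real page (the markup here is an assumption for illustration only). It also shows that children is not something you call.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the giftList table.
html = (
    '<table id="giftList">'
    '<tr><th>Item Title</th></tr>'
    '<tr><td>Vegetable Basket</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "giftList"})

via_children = list(table.children)  # explicit .children
via_iteration = list(table)          # iterating the tag itself

# Both approaches yield the very same node objects, in the same order
same_nodes = len(via_children) == len(via_iteration) and all(
    x is y for x, y in zip(via_children, via_iteration)
)

print(same_nodes)                # True
print(len(via_children))         # 2
print(callable(table.children))  # False: it is an iterator, not a function
```

Writing .children explicitly is still a reasonable choice when you want the code to say which level of the tree it is walking.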