首页 > 代码库 > beautifulsoup 的children和descandants

beautifulsoup 的children和descandants

之前看爬虫的时候,看到这里就断了,一直不太理解这2个的区别。

今天重新看,也借助了这位哥们的方法,把结果打印出来,我大概知道了这2者的区别。

http://www.cnblogs.com/chensimin1990/p/6725803.html

 

--------------------------------

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html,html.parser)

# name_list = bs_obj.find_all("span", {"class":"green"})

#for name in name_list:
#    print(name.get_text())  
# file = open(‘test.txt‘,‘w‘)
# content = ‘‘
for child in bs_obj.find("table",{"id":"giftList"}).descendants:
    print(child)

代码是这样的

------------------------------------------------

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>
<th>
Item Title
</th>

Item Title

<th>
Description
</th>

Description

<th>
Cost
</th>

Cost

<th>
Image
</th>

Image



<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>
<td>
Vegetable Basket
</td>

Vegetable Basket

<td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td>

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!

<span class="excitingNote">Now with super-colorful bell peppers!</span>
Now with super-colorful bell peppers!


<td>
$15.00
</td>

$15.00

<td>
<img src="../img/gifts/img1.jpg"/>
</td>


<img src="../img/gifts/img1.jpg"/>




<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>
<td>
Russian Nesting Dolls
</td>

Russian Nesting Dolls

<td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
<span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
8 entire dolls per set! Octuple the presents!


<td>
$10,000.52
</td>

$10,000.52

<td>
<img src="../img/gifts/img2.jpg"/>
</td>


<img src="../img/gifts/img2.jpg"/>




<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>
<td>
Fish Painting
</td>

Fish Painting

<td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td>

If something seems fishy about this painting, it‘s because it‘s a fish!
<span class="excitingNote">Also hand-painted by trained monkeys!</span>
Also hand-painted by trained monkeys!


<td>
$10,005.00
</td>

$10,005.00

<td>
<img src="../img/gifts/img3.jpg"/>
</td>


<img src="../img/gifts/img3.jpg"/>




<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>
<td>
Dead Parrot
</td>

Dead Parrot

<td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td>

This is an ex-parrot!
<span class="excitingNote">Or maybe he‘s only resting?</span>
Or maybe he‘s only resting?


<td>
$0.50
</td>

$0.50

<td>
<img src="../img/gifts/img4.jpg"/>
</td>


<img src="../img/gifts/img4.jpg"/>




<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>
<td>
Mystery Box
</td>

Mystery Box

<td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td>

If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining.
<span class="excitingNote">Keep your friends guessing!</span>
Keep your friends guessing!


<td>
$1.50
</td>

$1.50

<td>
<img src="../img/gifts/img6.jpg"/>
</td>


<img src="../img/gifts/img6.jpg"/>

这是结果

 

用children的函数(?不知道为什么叫函数,感觉没有括号,明明是字段啊...)

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs_obj = BeautifulSoup(html,html.parser)

# name_list = bs_obj.find_all("span", {"class":"green"})

#for name in name_list:
#    print(name.get_text())  
# file = open(‘test.txt‘,‘w‘)
# content = ‘‘
for child in bs_obj.find("table",{"id":"giftList"}).children:
    print(child)

结果是这样的:

<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>


<tr class="gift" id="gift1"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg"/>
</td></tr>


<tr class="gift" id="gift2"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg"/>
</td></tr>


<tr class="gift" id="gift3"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it‘s because it‘s a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg"/>
</td></tr>


<tr class="gift" id="gift4"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he‘s only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg"/>
</td></tr>


<tr class="gift" id="gift5"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg"/>
</td></tr>

------------------

到说明的时候:

1、chidlren并不是只返回子代的第一层,而是到没有子代的那一层,也就是说会穿透所有的,这个我以前以为是descendants干的事。

2、那descendants还留着干嘛呢?

是这么一个作用,他对每一个子代都会遍历一边他所有的后代。

如果我们打个比方:

a

-a1

--a11

--a12

--a13

---a131

----a1311

如果用children,其实就是原样返回,如果用descendants的话,他会在a13的时候返回一次a1311,a131的时候又返回一次a1311。

另外

for child in bs_obj.find("table",{"id":"giftList"}).children 和 
for child in bs_obj.find("table",{"id":"giftList"})是等价的,想想也知道,这个更符合一般人的直觉。

beautifulsoup 的children和descandants