Python Web Scraping with the BeautifulSoup Library (6): Output

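1. Pretty-printed output

The prettify() method returns the BeautifulSoup document as a formatted string, with each tag and each text string on its own line, indented according to its nesting depth.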

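from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, 'lxml')
print(soup.prettify())

<html>
 <body>
  <a href="http://example.com/">
   I linked to
   <i>
    example.com
   </i>
  </a>
 </body>
</html>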

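prettify() is not limited to the whole document; it can also be called on a single tag. The following is a minimal sketch of that, assuming the same markup as above. It uses the built-in 'html.parser' instead of 'lxml' only so that it runs without an extra dependency, and the variable name pretty_i is purely illustrative.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
# Assumption: 'html.parser' is swapped in for 'lxml' to avoid an extra install
soup = BeautifulSoup(markup, 'html.parser')

# Format only the <i> element rather than the whole document
pretty_i = soup.i.prettify()
print(pretty_i)
```

The result is an ordinary string, so it can be written to a file or logged like any other text.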
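2. Compact output

If you only need the markup as a string and do not care about formatting, pass the BeautifulSoup object (or any tag) to str().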

str(soup)
'<html><body><a href="http://example.com/">I linked to <i>example.com</i></a></body></html>'
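As a small illustration of the point above, str() works on an individual tag as well as on the whole document, and encode() gives the same markup as bytes. This sketch assumes the built-in 'html.parser' (so no html/body wrapper is added, unlike with 'lxml'); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
# Assumption: 'html.parser' is used here, so no <html>/<body> wrapper is added
soup = BeautifulSoup(markup, 'html.parser')

compact_doc = str(soup)              # whole document, no added line breaks
compact_tag = str(soup.i)            # just the <i> element
raw_bytes = soup.i.encode('utf-8')   # same markup as a bytes object

print(compact_doc)
print(compact_tag)
print(raw_bytes)
```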

3. HTML special characters

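Beautiful Soup converts HTML entities such as &ldquo; into the corresponding Unicode characters when it parses a document, and str() writes those characters out as-is rather than turning them back into entities.

soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.", "lxml")
str(soup)
'<html><body><p>“Dammit!” he said.</p></body></html>'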
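If entities are wanted in the output, Beautiful Soup's output formatters can help. The sketch below passes formatter="html" to decode(), which asks Beautiful Soup to write Unicode characters as named HTML entities where it can. It assumes the built-in 'html.parser', and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Assumption: 'html.parser' is used so the example needs no extra install
soup = BeautifulSoup("&ldquo;Dammit!&rdquo; he said.", 'html.parser')

# Default output: the entities have become Unicode quotation marks
as_unicode = str(soup)

# formatter="html": Unicode characters are written as named entities where possible
as_entities = soup.decode(formatter="html")

print(as_unicode)
print(as_entities)
```

The same formatter argument is also accepted by prettify() and encode().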

4. Getting all the text inside a tag: get_text()

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'lxml')
print(soup.get_text())
print(soup.i.get_text())

I linked to example.com

example.com

Specifying a separator to place between the text fragments

soup.get_text("|")
'\nI linked to |example.com|\n'

Stripping whitespace from the start and end of each text fragment

soup.get_text("|", strip=True)
'I linked to|example.com'
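When the individual fragments are needed rather than one joined string, the .stripped_strings generator is an alternative to get_text(strip=True). The following is a minimal sketch, assuming the same markup as above and the built-in 'html.parser'; the names fragments and joined are illustrative.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
# Assumption: 'html.parser' is used so the example needs no extra install
soup = BeautifulSoup(markup, 'html.parser')

# Each text fragment with surrounding whitespace removed; empty ones are skipped
fragments = list(soup.stripped_strings)
print(fragments)  # ['I linked to', 'example.com']

# Joining the fragments yourself gives the same result as get_text("|", strip=True)
joined = "|".join(fragments)
print(joined == soup.get_text("|", strip=True))  # True
```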
