from lxml import etree  # lxml parses the HTML so we can query it with XPath
from requests import get

def get_html(url):
    html = get(url)
    if html.status_code == 200:
        print('ok')
        soup = etree.HTML(html.text)
        # Each Baidu result sits in a div whose class contains "result"
        titles = soup.xpath('//div[@id="content_left"]/div[contains(@class, "result")]')
        for t in titles:
            title = t.xpath('h3/a/text()')
            print(''.join(title))  # join the list of text nodes into one string
            try:
                url = t.xpath('h3/a/@href')[0]
                print(url)
            except IndexError:
                print('no href')
    else:
        print('error')
if __name__=='__main__':
url='https://www.baidu.com/s?ie=UTF-8&wd=%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F'
get_html(url)
Sample output (note that the search keyword 正则表达式 is missing from every title: Baidu wraps the highlighted keyword in an <em> tag, and h3/a/text() returns only the <a> element's direct text nodes, not the text inside child elements):

ok
– 语法 | 菜鸟教程
http://www.baidu.com/link?url=8nDy-xnUZC54lLUg-UHQZQtvCdv0P6QUiHknXjOvV45j7q9DlcjGkbpMPRJvgGENwc4BpnwN2rTtxdkKHDW3Kq
- 百度百科
http://www.baidu.com/link?url=pgJa-P2RwiFZaDOwPTHbfOvDkHsCII94HvRzE66OeBM939LUMOMHOcL0kL7ZZxc9CI8_OqnqsTAYDlAyf2-6YMauzGibKrxeFJj-lzeuTg3ThqdwTL2za8NAObeuvrLIg8SRwEpwNxtKaqAZS8XxSq
在线测试
http://www.baidu.com/link?url=Q5QHjwR_8KJH5i-T0IY6Lds97PP_yplc3-9GyIK5tK6s8xmDg6tBPdRF4Mmw2Qqn
– 教程 | 菜鸟教程
http://www.baidu.com/link?url=R8jsoN5M9mJWl1tqWD7n979zhlQ3-rDC1elvQ-fxZLq5qNCVVLTLHU4378EqNGDeLlTYYvKZ3pTyeuvgtx921q
你是如何学会的? - 知乎
http://www.baidu.com/link?url=Z-bVgTi2pKtv9JNoLu4BO28aZ8Qiv_Gm7hC610_cO_qe31Wj6xhpTIBoW2XNWToZA7cj71fgQAFguc3IH7xu-SMrD2CJH4eph9DAqz8LiHS
在线测试 | 菜鸟工具
http://www.baidu.com/link?url=sHP2Egz4EUDpw7Y2SZ82nhteCh_8MF0E_4R1xq_xiKDQ9nbicsgWy92nc18-BAqk
_cherrydreamsover的博客-CSDN博客
http://www.baidu.com/link?url=ARkF-ZZ5ckRhwAMSoTnFdNaMpadxIgZsjrmxde8WqAf2WlISAxptRC0OI43W8B-pn1GQtDspGADltk6l3DhJtZR3N3WbBnUlNFG6UeE62VK
在线测试 - 站长工具
http://www.baidu.com/link?url=b2ucqH4G5wzi-HTyWkt3qkeUsEdQLsoWVApJ5018OnJQnxjG16vtPwtrr_T4lLUh
-脚本之家
http://www.baidu.com/link?url=7bnR471Qbt6rmR9TRCuuLR9Ev7XHesQKuJgEdF0doTK-YROrN9upFTuosmXTDSB-
在线-BeJSON.com
http://www.baidu.com/link?url=urNZGpYU-49XctHorhJC2MEL_FAA4RhsOQkqN0Dv1ksucykh9Qlrq4gaRcwwV7e5sPesLraDhywaOqvjhhHaa_
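The truncated titles can be fixed by switching from text() to the XPath string() function, which concatenates all descendant text of a node. A minimal offline sketch (the HTML snippet below is made up to mimic one Baidu result block, not fetched from Baidu) shows the difference:

```python
from lxml import etree  # same parser as in the script above

# Hypothetical snippet imitating one result block on the Baidu results page
html = ('<div class="result"><h3><a href="http://example.com">'
        '<em>regex</em> tutorial | example</a></h3></div>')
doc = etree.HTML(html)
a = doc.xpath('//div[contains(@class, "result")]/h3/a')[0]

direct = ''.join(a.xpath('text()'))  # direct text nodes only; <em> content is skipped
full = a.xpath('string(.)')          # all descendant text, including the <em>

print(direct)  # → ' tutorial | example'
print(full)    # → 'regex tutorial | example'
```

In the original script, replacing `t.xpath('h3/a/text()')` with `t.xpath('string(h3/a)')` would therefore print each title with the search keyword included.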