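I. What is PyQuery?
PyQuery is another powerful and flexible library for parsing web pages. It exposes a jQuery-style API, so elements are selected with CSS selectors.
Official documentation: http://pyquery.readthedocs.io/en/latest/
II. Basic PyQuery usage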
html = ''' '''
1. Initialization
# Initialize from a string
from pyquery import PyQuery as pq
doc = pq(html)  # html is the sample string defined above
print(doc('li'))

# Initialize from a URL
from pyquery import PyQuery as pq
doc = pq(url='https://cuiqingcai.com')
print(doc('title'))

# Initialize from a file
from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
print(doc('li'))
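2. CSS selectors: selecting elements
from pyquery import PyQuery as pq
doc = pq(html)

# Child elements
items = doc('.list')
lis = items.find('li')             # all descendant <li> elements
lis = items.children()             # direct children only
lis = items.children('.active')    # direct children that match a selector
print(lis)

# Parent elements
items = doc('.list')
container = items.parents()        # all ancestors; use parent() for the direct parent only
print(container)
parent = items.parents('.wrap')    # ancestors that match a selector
print(parent)

# Sibling elements
li = doc('.list .item-0.active')
print(li.siblings())
print(li.siblings('.active'))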
3. CSS selectors: getting attributes
from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.item-0.active a')
print(a)
print(a.attr.href)      # attribute-style access
print(a.attr('href'))   # method-style access, same result
4. Getting text content
from pyquery import PyQuery as pq
doc = pq(html)
a = doc('.item-0.active a')
print(a)
print(a.text())   # text only, with the tags stripped
5. Getting HTML
from pyquery import PyQuery as pq
doc = pq(html)
li = doc('.item-0.active')
print(li)
print(li.html())   # inner HTML of the matched element