Examples of Applying Regular Expressions

Using findall and search flexibly


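# Import the re module
import re

old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'
total_page = 20

# Read the page to parse (the file is assumed to be UTF-8 encoded)
with open('test.html', 'r', encoding='utf-8') as f:
    html = f.read()

# Extract the title
# When exactly one match is expected, search is more efficient than findall,
# because it stops at the first match instead of scanning the whole text
# (assumes the title is wrapped in <title> tags)
title = re.search('<title>(.*?)</title>', html, re.S).group(1)
print(title)

# Extract the links
links = re.findall('href="(.*?)"', html, re.S)
for each in links:
    print(each)

# Extract part of the text: narrow down to the enclosing block first,
# then match inside it (assumes the items are links inside a <ul> list)
text_field = re.findall('<ul>(.*?)</ul>', html, re.S)[0]
the_text = re.findall('">(.*?)</a>', text_field, re.S)
for every_text in the_text:
    print(every_text)

# Use sub to build the URL of each page
# No flags are needed here; the fourth positional argument of re.sub
# is count, so flags would have to be passed as flags=...
for i in range(2, total_page + 1):
    new_link = re.sub(r'pageNum=\d+', 'pageNum=%d' % i, old_url)
    print(new_link)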
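
The "narrow down first, then match" idea from the script can be tried on its own. The snippet below is only a minimal sketch, not the original code: `SNIPPET` and the helper `first_group` are hypothetical names, and the markup is made up for the illustration. It also guards against `re.search` returning `None`, which happens when nothing matches and would make a direct `.group(1)` call fail.

```python
import re

# Hypothetical markup: one link outside the list, two links inside it.
SNIPPET = (
    '<a href="/welcome.html">Welcome</a>\n'
    '<ul>\n'
    '  <li><a href="/1.html">first item</a></li>\n'
    '  <li><a href="/2.html">second item</a></li>\n'
    '</ul>\n'
)


def first_group(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, text, re.S)
    return match.group(1) if match else None


# Matching against the whole snippet also picks up the link outside the list.
everything = re.findall(r'">(.*?)</a>', SNIPPET, re.S)

# Narrowing to the <ul> block first keeps only the list items.
block = first_group(r'<ul>(.*?)</ul>', SNIPPET)
items = re.findall(r'">(.*?)</a>', block, re.S) if block is not None else []

print(everything)  # ['Welcome', 'first item', 'second item']
print(items)       # ['first item', 'second item']
```

Because the small pattern only sees the text inside the block, it can stay short and still avoid unrelated matches elsewhere on the page.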

The HTML file (test.html):

    极客学院爬虫测试
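
The markup of the file is not visible here, so to experiment with the patterns a stand-in page is needed. The sketch below is a hypothetical reconstruction: only the title text, the four URLs and the three list texts are taken from this post, while the tags, the class names, the text of the first link ("Welcome") and the name `SAMPLE_HTML` are assumptions chosen so that the patterns yield the same values as the output listed in this post.

```python
import re

# Hypothetical stand-in for test.html. The title text, the URLs and the list
# texts come from the post; every tag, class name and the "Welcome" link text
# are assumptions.
SAMPLE_HTML = """<html>
<head>
    <title>极客学院爬虫测试</title>
</head>
<body>
    <div class="topic"><a href="http://jikexueyuan.com/welcom.html">Welcome</a>
        <div class="list">
            <ul>
                <li><a href="http://jikexueyuan.com/1.html">这是第一条</a></li>
                <li><a href="http://jikexueyuan.com/2.html">这是第二条</a></li>
                <li><a href="http://jikexueyuan.com/3.html">这是第三条</a></li>
            </ul>
        </div>
    </div>
</body>
</html>
"""

# The patterns assume <title>, <ul> and </a> as delimiters.
title = re.search(r'<title>(.*?)</title>', SAMPLE_HTML, re.S).group(1)
links = re.findall(r'href="(.*?)"', SAMPLE_HTML, re.S)
list_block = re.findall(r'<ul>(.*?)</ul>', SAMPLE_HTML, re.S)[0]
texts = re.findall(r'">(.*?)</a>', list_block, re.S)

print(title)
print(links)
print(texts)
```

Saving `SAMPLE_HTML` as test.html with UTF-8 encoding would give the script a file to read.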

Output:

极客学院爬虫测试
http://jikexueyuan.com/welcom.html
http://jikexueyuan.com/1.html
http://jikexueyuan.com/2.html
http://jikexueyuan.com/3.html
这是第一条
这是第二条
这是第三条
http://www.jikexueyuan.com/course/android/?pageNum=2
http://www.jikexueyuan.com/course/android/?pageNum=3
http://www.jikexueyuan.com/course/android/?pageNum=4
http://www.jikexueyuan.com/course/android/?pageNum=5
http://www.jikexueyuan.com/course/android/?pageNum=6
http://www.jikexueyuan.com/course/android/?pageNum=7
http://www.jikexueyuan.com/course/android/?pageNum=8
http://www.jikexueyuan.com/course/android/?pageNum=9
http://www.jikexueyuan.com/course/android/?pageNum=10
http://www.jikexueyuan.com/course/android/?pageNum=11
http://www.jikexueyuan.com/course/android/?pageNum=12
http://www.jikexueyuan.com/course/android/?pageNum=13
http://www.jikexueyuan.com/course/android/?pageNum=14
http://www.jikexueyuan.com/course/android/?pageNum=15
http://www.jikexueyuan.com/course/android/?pageNum=16
http://www.jikexueyuan.com/course/android/?pageNum=17
http://www.jikexueyuan.com/course/android/?pageNum=18
http://www.jikexueyuan.com/course/android/?pageNum=19
http://www.jikexueyuan.com/course/android/?pageNum=20

Process finished with exit code 0
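
The page URLs in the output can also be produced by a small helper built on `re.sub`. This is a minimal sketch; the function name `page_urls` and its parameters are hypothetical, not part of the original script. One detail worth knowing: the fourth positional argument of `re.sub` is `count`, not `flags`, so both are best passed by keyword.

```python
import re


def page_urls(url, last_page, first_page=2):
    """Return copies of url with pageNum set to each page number in turn."""
    return [
        # count is passed by keyword; a flag such as re.S in this position
        # would be interpreted as a replacement count instead.
        re.sub(r'pageNum=\d+', 'pageNum=%d' % page, url, count=1)
        for page in range(first_page, last_page + 1)
    ]


old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'
urls = page_urls(old_url, 20)

print(urls[0])    # http://www.jikexueyuan.com/course/android/?pageNum=2
print(urls[-1])   # http://www.jikexueyuan.com/course/android/?pageNum=20
print(len(urls))  # 19
```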

