Python 3.2 html.parser self-study notes

Example: extracting the link addresses (href values) from a web page:


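from html.parser import HTMLParser

# Sample page containing six <a> links
page = '''<a href="http://click.union.360buy.com/JdClick/?unionId=75">京东商城</a>
<a href="http://www.letao.com/?source=hao123">乐淘网上鞋城</a>
<a href="http://www.lashou.com/cl_today/w_3001">拉手网团购</a>
<a href="http://www.amazon.cn/?tag=2009hao123famousdaohang">卓越网上购物</a>
<a href="http://www.vancl.com/?source=hao123mp">凡客诚品购物</a>
<a href="http://reg.jiayuan.com/st/?id=3237&amp;url=/st/main.php">世纪佳缘交友</a>'''

class hp(HTMLParser):
    def handle_starttag(self, tag, attr):
        # attr is a list of (name, value) pairs for the start tag
        if tag == 'a':
            for (att, value) in attr:
                if att == 'href':
                    print(value)

yk = hp()
yk.feed(page)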


Output:


http://click.union.360buy.com/JdClick/?unionId=75
http://www.letao.com/?source=hao123
http://www.lashou.com/cl_today/w_3001
http://www.amazon.cn/?tag=2009hao123famousdaohang
http://www.vancl.com/?source=hao123mp
http://reg.jiayuan.com/st/?id=3237&url=/st/main.php
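
Printing inside the handler is fine for a quick look, but the links are often needed later in the program. The following is a minimal sketch, not part of the original notes, that stores the href values in a list instead. The class name `LinkCollector`, the `links` attribute, and the `sample` string are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: stores href values instead of printing them."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                # value is None when an attribute is written without a value
                if name == 'href' and value is not None:
                    self.links.append(value)

# Illustrative input: one link with an href and one anchor without
sample = ('<a href="http://www.letao.com/?source=hao123">乐淘网上鞋城</a>'
          '<a name="top">no href here</a>')

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

Keeping the results on the parser instance means one parser object can be fed a page and then queried, without the handler needing to know what happens to the links afterwards.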

--------------------------------------------------------------------------------------------------------------------------------

Example from the official Python documentation:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Called once for every opening tag, e.g. <h1>
    def handle_starttag(self, tag, attrs):
        print("Encountered a {} start tag".format(tag))

    # Called once for every closing tag, e.g. </h1>
    def handle_endtag(self, tag):
        print("Encountered a {} end tag".format(tag))

page = """<html>
<h1>Title</h1>
<p>I'm a paragraph!</p>
</html>"""

myparser = MyHTMLParser()
myparser.feed(page)


Output after running:


Encountered a html start tag
Encountered a h1 start tag
Encountered a h1 end tag
Encountered a p start tag
Encountered a p end tag
Encountered a html end tag
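
The start and end events arrive in document order, so a simple counter is enough to recover the nesting. As a rough sketch under the assumption that every tag in the input is explicitly closed (an unclosed tag such as a bare `<br>` would throw the count off), the variant below indents each tag by its depth. `TagTreePrinter`, `depth`, and `lines` are hypothetical names, not part of the official example.

```python
from html.parser import HTMLParser

class TagTreePrinter(HTMLParser):
    """Hypothetical helper: records tags indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open tags
        self.lines = []     # one formatted line per start or end tag

    def handle_starttag(self, tag, attrs):
        self.lines.append('  ' * self.depth + '<' + tag + '>')
        self.depth += 1

    def handle_endtag(self, tag):
        # Never go below zero, even if a stray end tag shows up
        self.depth = max(self.depth - 1, 0)
        self.lines.append('  ' * self.depth + '</' + tag + '>')

printer = TagTreePrinter()
printer.feed("<html><h1>Title</h1><p>I'm a paragraph!</p></html>")
printer.close()
print('\n'.join(printer.lines))
```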


---------------------------------------------------------------------------

Example: printing the text between <a> and </a> in the HTML:

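from html.parser import HTMLParser

# Sample page: some plain text followed by six <a> links
page = '''啊阿什顿啊
<a href="http://click.union.360buy.com/JdClick/?unionId=75">京东商城</a>
<a href="http://www.letao.com/?source=hao123">乐淘网上鞋城</a>
<a href="http://www.lashou.com/cl_today/w_3001">拉手网团购</a>
<a href="http://www.amazon.cn/?tag=2009hao123famousdaohang">卓越网上购物</a>
<a href="http://www.vancl.com/?source=hao123mp">凡客诚品购物</a>
<a href="http://reg.jiayuan.com/st/?id=3237&amp;url=/st/main.php">世纪佳缘交友</a>'''

class hp(HTMLParser):
    # Flag: True while the parser is between <a> and </a>
    a_txt = False

    def handle_starttag(self, tag, attr):
        if tag == 'a':
            self.a_txt = True
            #print (dict(attr))

    def handle_endtag(self, tag):
        if tag == 'a':
            self.a_txt = False

    def handle_data(self, data):
        # Only print text that appears inside a link
        if self.a_txt:
            print(data)

yk = hp()
yk.feed(page)
yk.close()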

Output:

京东商城
乐淘网上鞋城
拉手网团购
卓越网上购物
凡客诚品购物
世纪佳缘交友
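
The two link examples can be combined so that each link's text is paired with its address. The sketch below is one possible way to do it, assuming links are not nested inside each other; it uses `dict(attrs)` (as hinted at by the commented-out line in the code above) to look up `href`. The names `LinkTextCollector`, `in_link`, `current_href`, `text_parts`, and `sample` are hypothetical. Text is accumulated in a list because, as far as I know, `handle_data` may deliver the text of one element in more than one piece.

```python
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Hypothetical helper: collects (text, href) pairs for each <a>."""

    def __init__(self):
        super().__init__()
        self.in_link = False       # True between <a> and </a>
        self.current_href = None   # href of the link being parsed, if any
        self.text_parts = []       # text chunks seen inside the current link
        self.links = []            # finished (text, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_link = True
            self.current_href = dict(attrs).get('href')
            self.text_parts = []

    def handle_data(self, data):
        if self.in_link:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.in_link:
            text = ''.join(self.text_parts).strip()
            self.links.append((text, self.current_href))
            self.in_link = False

# Illustrative input: plain text followed by two links
sample = ('intro text '
          '<a href="http://www.letao.com/?source=hao123">乐淘网上鞋城</a> '
          '<a href="http://www.vancl.com/?source=hao123mp">凡客诚品购物</a>')

collector = LinkTextCollector()
collector.feed(sample)
collector.close()
for text, href in collector.links:
    print(text, href)
```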

