The code is as follows:
#coding:utf-8
'''cdays-1-exercise-1.py
@author: U{shengyan<mailto:[email protected]>}
@version:$Id$
@note: uses chardet and urllib2
@see: chardet documentation: http://chardet.feedparser.org/docs/,
      urllib2 reference: http://docs.python.org/lib/module-urllib2.html
'''
import sys
import urllib2
import chardet

def blog_detect(blogurl):
    '''
    Detect the encoding of a blog.
    @param blogurl: URL of the blog to check
    '''
    try:
        fp = urllib2.urlopen(blogurl)   # try to open the given URL
    except Exception, e:                # on failure, report it and return
        print e
        print 'download exception %s' % blogurl
        return 0
    blog = fp.read()                                # read the content
    codedetect = chardet.detect(blog)["encoding"]   # detect the encoding
    print '%s\t<-\t%s' % (blogurl, codedetect)
    fp.close()                                      # close the connection
    return 1

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print 'usage:\n\tpython cdays-1-exercise-1.py http://xxxx.com'
    else:
        blog_detect(sys.argv[1])
In Python 3.x, urllib and urllib2 were merged into a single urllib package, so opening a web page becomes urllib.request.urlopen(url).
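The rename can be absorbed with a guarded import, so that the same script runs under both versions. This is a minimal sketch, assuming only `urlopen` is needed; the `open_url` helper and its `timeout` default are hypothetical and not part of the original script:

```python
# Compatibility import: try the Python 3 location first,
# then fall back to the Python 2 module.
try:
    from urllib.request import urlopen  # Python 3.x
except ImportError:
    from urllib2 import urlopen  # Python 2.x


def open_url(url, timeout=10):
    """Hypothetical helper: open a URL with whichever urlopen was imported."""
    return urlopen(url, timeout=timeout)
```

The rest of the script can then call `open_url(blogurl)` without caring which interpreter is running it.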
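With that solved, one problem remains: chardet is a third-party library, so how is it installed? Download the package from "https://pypi.python.org/pypi/chardet2/2.0.3", extract it, and place it under Python\Lib\site-packages; in my case that is "D:\Python33\Lib\site-packages\chardet2-2.0.3". Then open a command prompt and run "python D:\Python33\Lib\site-packages\chardet2-2.0.3\setup.py install". This fails at first; reading the code shows that setuptools is not installed. To install it, go to "https://pypi.python.org/pypi/setuptools/1.1.6", where the Windows instructions link to "ez_setup.py" at "https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py". Download and run that script to install setuptools. Then run "python D:\Python33\Lib\site-packages\chardet2-2.0.3\setup.py install" again, and chardet installs successfully.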
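To confirm that the install worked before running the script, one option is to ask Python whether the module can be found. This is a minimal sketch, assuming Python 3.4 or later (where `importlib.util.find_spec` is available); the `is_installed` name is hypothetical:

```python
import importlib.util


def is_installed(module_name):
    """Hypothetical helper: True if a top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # Check the two packages discussed above.
    for name in ('setuptools', 'chardet'):
        status = 'found' if is_installed(name) else 'missing'
        print('%s: %s' % (name, status))
```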
The Python 3.x script is then:
#coding:utf-8
'''python 3.x'''
import sys
import urllib.request
import chardet

def blog_detect(blogurl):
    '''Detect the encoding of the page at blogurl.'''
    try:
        fp = urllib.request.urlopen(blogurl)
    except Exception as e:
        print(e)
        print('download exception %s' % blogurl)
        return 0
    blog = fp.read()  # in Python 3.x, read() returns the HTML as bytes
    codedetect = chardet.detect(blog)['encoding']
    print('%s<-%s' % (blogurl, codedetect))
    fp.close()
    return 1

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print('usage:\n\tpython DetectURLCoding.py http://xxx.com')
    else:
        blog_detect(sys.argv[1])
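Because `read()` returns bytes in Python 3.x, the detected encoding is also what is needed to turn the page into text. This is a minimal sketch of that step; the `decode_page` helper, its `fallback` parameter, and the choice of `errors='replace'` are assumptions for illustration, not part of the original script:

```python
try:
    import chardet  # third-party; installed as described above
except ImportError:
    chardet = None  # assumption: fall back to a fixed encoding


def decode_page(raw, fallback='utf-8'):
    """Hypothetical helper: decode bytes using chardet's guess when available."""
    encoding = None
    if chardet is not None:
        # detect() returns a dict; 'encoding' may be None if nothing was guessed
        encoding = chardet.detect(raw)['encoding']
    # Undecodable bytes are replaced rather than raising an exception
    return raw.decode(encoding or fallback, errors='replace')
```

Inside `blog_detect`, this would be called as `text = decode_page(blog)` after `fp.read()`.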