UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 99: invalid start byte

Reference articles:
https://blog.csdn.net/weixin_40930415/article/details/80756828
https://www.cnblogs.com/yunguoxiaoqiao/p/7588725.html
My code:

import urllib.request
url = 'http://www.ip138.com'
'''
proxy_support = urllib.request.ProxyHandler({'http':'163.204.241.174:9999'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
'''
# head = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
req = urllib.request.Request(url)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36')
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
print(html)

Problem:

>>> 
================== RESTART: E:/Python/小甲鱼/爬虫专题/proxy_ip.py ==================
Traceback (most recent call last):
  File "E:/Python/小甲鱼/爬虫专题/proxy_ip.py", line 23, in <module>
    html = response.read().decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 99: invalid start byte

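Solution:
This comes down to an encoding problem, so let me address that first. The site does not serve its pages as UTF-8, so decoding the response as UTF-8 fails: byte 0xb2 is a continuation byte in UTF-8 and cannot begin a character, hence "invalid start byte". Change “utf-8” to “GB2312” in the decode() call and run the script again; it then works.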
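Hard-coding “GB2312” fixes this one site, but the same error returns on any page that uses a different encoding. The following is a minimal sketch of a more tolerant approach, not the method from the referenced articles: try the charset the server declares, then a short list of fallbacks. The function name `decode_html` and the fallback order are assumptions made for illustration.

```python
def decode_html(raw, declared_charset=None, fallbacks=("utf-8", "gb2312", "gbk")):
    """Decode raw bytes, trying the declared charset first, then the fallbacks.

    Returns a (text, encoding_used) tuple. decode_html is a hypothetical
    helper; the fallback list is an assumption suited to Chinese sites.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.extend(fallbacks)

    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding.
            # LookupError: the declared charset name is unknown to Python.
            continue

    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"


# Offline demonstration: bytes encoded as GB2312 are not valid UTF-8,
# so the helper falls through to the second candidate.
sample = "测试".encode("gb2312")
text, used = decode_html(sample)
print(used, text)
```

In the script above it could be called as `html, used = decode_html(response.read(), response.headers.get_content_charset())`. `get_content_charset()` reads the charset from the Content-Type header and returns None when the server does not send one, in which case only the fallbacks are tried.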
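About encodings:
Note: in Python 3, strings (str) are Unicode and the default source-file encoding is UTF-8; in Python 2 the default is ASCII; on a Chinese-locale Windows system the default encoding is GBK.
How Python 3 runs a script:
1. The interpreter locates the source file, reads it into memory using the encoding declared in the file header, and converts it to Unicode.
2. It parses and interprets the source text according to the language grammar.
3. Every string value in the program is stored as Unicode.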
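The points above can be checked with a short, hedged example: a Python 3 str holds Unicode characters, and bytes only appear once an encoding is chosen. The sample text and variable names are illustrative and not taken from the original script.

```python
import sys

text = "测试"                      # a str: two Unicode characters

utf8_bytes = text.encode("utf-8")  # 3 bytes per character here
gbk_bytes = text.encode("gbk")     # 2 bytes per character here

print(sys.getdefaultencoding())    # 'utf-8' on Python 3
print(len(text), len(utf8_bytes), len(gbk_bytes))

# Decoding with the wrong codec reproduces the error from the traceback.
try:
    gbk_bytes.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError as exc:
    mismatch_failed = True
    print(exc)

# Decoding with the codec that produced the bytes restores the text.
roundtrip = gbk_bytes.decode("gbk")
```

The same characters produce different byte sequences under UTF-8 and GBK, which is why the codec passed to decode() has to match the one the server used.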
