Reading and writing GBK, UTF-8, and other encoded input/output streams in Python

Python encoding problems:

Almost everyone who uses Python runs into the following error sooner or later:

Traceback (most recent call last):
  File "amazon_test.py", line 30, in <module>
    print(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
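
The error is raised whenever non-ASCII text is encoded with the ASCII codec, which is what `print` ends up doing when the environment does not declare a richer encoding. A minimal sketch that reproduces the same message in Python 3 follows; the helper name `try_ascii` and the sample string are illustrative assumptions, not part of the original script.

```python
def try_ascii(text):
    """Encode text as ASCII; return the bytes, or the error message on failure."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)


# Hypothetical sample: eight non-ASCII characters, like positions 0-7 above.
sample = '中文编码测试问题'

print(try_ascii('hello'))   # plain ASCII encodes without trouble
print(try_ascii(sample))    # prints the codec error message instead of bytes
```

Catching the exception here is only for demonstration. In real code the fix is to choose the right encoding, not to swallow the error.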

Solution

First, you need a consistent environment:

  • Make sure the locale is a UTF-8 one (output of the locale command):
LANG=zh_CN.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

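To set it up:

# Add to ~/.bashrc (export so that child processes such as python inherit the values)
export LANG=zh_CN.UTF-8
export LANGUAGE=zh_CN:zh:en_US:en
export LC_ALL=en_US.UTF-8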
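  • The first line of a .py file is usually #!/usr/bin/env python
    and the second line is # -*- coding: utf-8 -*- or # coding=utf-8
    Make sure the file itself is saved as UTF-8 (some people set their vim environment to gbk or chinese, so the file may be saved as GBK without them noticing).
    P.S. recommended vimrc settings:
set encoding=utf-8  " vim's internal encoding; newly created files are saved as UTF-8
set termencoding=utf-8 " terminal encoding: text is sent to the terminal as UTF-8 for display
set fileencodings=utf-8,gb18030,gbk,cp936,gb2312 " encodings tried in order when opening a file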
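
To check what Python actually picked up from the environment described above, the encodings can be inspected at runtime. This is a minimal sketch for Python 3; the function name `describe_encodings` is an assumption made for illustration, and the values printed depend entirely on your machine.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derived from the current environment."""
    return {
        # getattr guards against stdin/stdout being replaced by odd objects
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
        'preferred': locale.getpreferredencoding(False),
        'filesystem': sys.getfilesystemencoding(),
        'default': sys.getdefaultencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print('%-10s %s' % (name, value))
```

If `stdout` or `preferred` reports an ASCII-only codec (it may appear as `ANSI_X3.4-1968`), the locale settings have probably not reached the Python process.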

Python 2

  • Decoding input streams

    • Reading a file
    with open(file_path, 'r') as f:
        for line in f:
            # line is a byte string; decode it to unicode before processing
            line = line.decode('your_file_encoding', errors='ignore').strip()
    
    • Standard input
       import sys
       for line in sys.stdin:
           line = line.decode('your_file_encoding', errors='ignore').strip()
    
  • Writing output in a specific encoding

import sys
print >> sys.stdout, line.encode('gb18030', 'ignore')
# or, preferably, write the encoded bytes directly:
sys.stdout.write(line.encode('gb18030', 'ignore') + '\n')
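
Python 2 also ships the `io` module, and `io.open` accepts `encoding` and `errors` and hands back unicode text, so the explicit `.decode()` and `.encode()` calls above can be left to the file object. The following is a minimal sketch, assuming Python 2.7 or Python 3; the function name `transcode`, the file names, and the sample text are all hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def transcode(src_path, dst_path, src_encoding, dst_encoding):
    """Copy a text file line by line, re-encoding it on the way."""
    count = 0
    with io.open(src_path, 'r', encoding=src_encoding, errors='ignore') as src:
        with io.open(dst_path, 'w', encoding=dst_encoding, errors='ignore') as dst:
            for line in src:  # line is already unicode, no .decode() needed
                dst.write(line.strip() + u'\n')
                count += 1
    return count


# Hypothetical demo files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, 'input_gb18030.txt')
dst_file = os.path.join(tmp_dir, 'output_utf8.txt')

with io.open(src_file, 'w', encoding='gb18030') as f:
    f.write(u'你好\n世界\n')

lines_copied = transcode(src_file, dst_file, 'gb18030', 'utf-8')

with io.open(dst_file, 'r', encoding='utf-8') as f:
    result = f.read()

print(lines_copied)
print(result == u'你好\n世界\n')
```

Because the same code runs on both major versions, this form may also make a later move to Python 3 easier.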

Python 3

  • Decoding input streams
    • Reading a file
    with open(file_path, mode='r', encoding='gb18030', errors='ignore') as f:
        for line in f:  # line is already a str (Unicode text)
            pass
    
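    • Standard input
       import io
       import sys
       # Re-wrap the underlying byte stream with the encoding you expect
       sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
       for line in sys.stdin:
           pass

       # Alternatively, on Python 3.7+, reconfigure the existing stream in place:
       import sys
       sys.stdin.reconfigure(encoding='utf-8')
       for line in sys.stdin:
           pass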
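  • Encoding output
    • Writing a file
    with open(file_output, encoding='your_dest_encoding', mode='w') as f:
        f.write(line)  # line is a str; the file object encodes it
    
    • Standard output
    import sys
    import io
    # Re-wrap the underlying byte stream so that text is encoded as UTF-8
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stdout.write(line + '\n')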
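
The Python 3 pieces above can be combined into a small filter that reads one encoding and writes another. This is only a sketch under stated assumptions: the function name `transcode_stream` is hypothetical, and `io.BytesIO` objects stand in for `sys.stdin.buffer` and `sys.stdout.buffer` so that the example runs without piped input.

```python
import io


def transcode_stream(raw_in, raw_out, in_encoding, out_encoding):
    """Read text from one binary stream and write it to another, re-encoded."""
    reader = io.TextIOWrapper(raw_in, encoding=in_encoding, errors='ignore')
    writer = io.TextIOWrapper(raw_out, encoding=out_encoding, errors='ignore',
                              newline='\n')
    for line in reader:  # each line is a str decoded from in_encoding
        writer.write(line.strip() + '\n')
    writer.flush()
    # Detach so the underlying binary streams stay open for the caller.
    reader.detach()
    writer.detach()


# BytesIO objects stand in for sys.stdin.buffer and sys.stdout.buffer.
fake_stdin = io.BytesIO('你好\n世界\n'.encode('gb18030'))
fake_stdout = io.BytesIO()

transcode_stream(fake_stdin, fake_stdout, 'gb18030', 'utf-8')
output_bytes = fake_stdout.getvalue()
print(output_bytes.decode('utf-8') == '你好\n世界\n')
```

In a real script the call would presumably be `transcode_stream(sys.stdin.buffer, sys.stdout.buffer, 'gb18030', 'utf-8')`, run as part of a shell pipeline.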
    
