Python 3 Web Scraping: Persisting Data to a File

When data is saved to a file with json.dumps(), Chinese characters are not displayed correctly

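import json


def write_to_file(content):
    '''
    Persist data to a txt file
    :param content: dict object
    :return:
    '''
    # 'a': append mode; ensure_ascii is left at its default (True) here,
    # so json.dumps() writes Chinese characters as \uXXXX escapes
    with open('maoyanTop100.txt', 'a', encoding='utf8') as f:
        f.write(json.dumps(content) + '\n')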

The file contents look like this:

{"the_index": "21", "image_url": "http://p0.meituan.net/movie/932bdfbef5be3543e6b136246aeb99b8123736.jpg@160w_220h_1e_1c", "title": "\u6307\u73af\u738b3\uff1a\u738b\u8005\u65e0\u654c", "actor": "\u4f0a\u83b1\u8d3e\u00b7\u4f0d\u5fb7,\u4f0a\u6069\u00b7\u9ea6\u514b\u83b1\u6069,\u4e3d\u8299\u00b7\u6cf0\u52d2", "the_time": "2004-03-15", "score": "9.2"}
...

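By default json.dumps() serializes with ensure_ascii=True, which escapes every non-ASCII character (Chinese included) as a \uXXXX sequence so that the output is pure ASCII. To write the actual Chinese characters, pass ensure_ascii=False.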
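A minimal comparison of the two settings, assuming a small illustrative record (the `movie` dict is made up for this sketch, modeled on the file contents above). Both strings decode to the same data; only the textual representation differs.

```python
import json

# Illustrative record, not taken from the scraper itself
movie = {"title": "霸王别姬", "score": "9.6"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes
escaped = json.dumps(movie)

# ensure_ascii=False: characters are emitted as-is
readable = json.dumps(movie, ensure_ascii=False)

print(escaped.isascii())                            # True
print(json.loads(escaped) == json.loads(readable))  # True
```

So the escaped file in the first example is not corrupted, just harder for a human to read.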

Add ensure_ascii=False:

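import json


def write_to_file(content):
    '''
    Persist data to a txt file
    :param content: dict object
    :return:
    '''
    # 'a': append mode; encoding='utf8' together with ensure_ascii=False
    # makes the Chinese text appear as-is in the file
    with open('maoyanTop100.txt', 'a', encoding='utf8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')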
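ensure_ascii=False only controls what json.dumps() returns; the encoding='utf8' argument to open() decides which bytes reach the disk. Without it, open() falls back to the platform's locale-dependent default encoding, which may produce a file in a different encoding (such as GBK) or raise UnicodeEncodeError. The sketch below checks the raw bytes. It assumes a hypothetical `path` parameter and a temporary directory so it can run anywhere; the original function hard-codes 'maoyanTop100.txt'.

```python
import json
import os
import tempfile


def write_to_file(content, path):
    """Append one dict as a JSON line (`path` is a hypothetical parameter for this sketch)."""
    with open(path, "a", encoding="utf8") as f:
        f.write(json.dumps(content, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "maoyanTop100.txt")
    write_to_file({"the_index": "1", "title": "霸王别姬", "score": "9.6"}, path)

    # Read the raw bytes back: the title is stored as UTF-8, not as \uXXXX escapes
    with open(path, "rb") as f:
        raw = f.read()

print("霸王别姬".encode("utf8") in raw)  # True
print(b"\\u" in raw)                    # False
```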

The file contents look like this:

{"the_index": "1", "image_url": "http://p1.meituan.net/movie/20803f59291c47e1e116c11963ce019e68711.jpg@160w_220h_1e_1c", "title": "霸王别姬", "actor": "张国荣,张丰毅,巩俐", "the_time": "1993-01-01", "score": "9.6"}
...
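Because each call writes one JSON object per line, the file can be read back line by line with json.loads(). A minimal sketch, assuming a hypothetical `read_records` helper and a temporary file: it contains one readable line and one escaped line, as the two versions of write_to_file above would produce when appending to the same file, and json.loads() decodes both to ordinary Python strings.

```python
import json
import os
import tempfile


def read_records(path):
    """Parse a file holding one JSON object per line into a list of dicts (hypothetical helper)."""
    records = []
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "maoyanTop100.txt")
    with open(path, "w", encoding="utf8") as f:
        f.write('{"the_index": "1", "title": "霸王别姬"}\n')                    # ensure_ascii=False style
        f.write('{"the_index": "21", "title": "\\u6307\\u73af\\u738b3"}\n')  # default escaped style
    records = read_records(path)

print(len(records))  # 2
```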
