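What is serialization? Serialization is the process of converting data such as dicts and lists into a string (or, with pickle, a sequence of bytes).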
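Purposes of serialization:
1. Persisting custom objects in some storage format;
2. Transferring objects from one place to another (for example, between programs or over a network);
3. Making programs easier to maintain.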
json
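Converting between a dict and a string with dumps and loads:

```python
import json

dic = {"aaa": "bb", "cc": "ddd"}
str_dic = json.dumps(dic)    # dict -> JSON string
print(str_dic)               # {"aaa": "bb", "cc": "ddd"}
print(type(str_dic))         # <class 'str'>

ret = json.loads(str_dic)    # JSON string -> dict
print(ret)                   # {'aaa': 'bb', 'cc': 'ddd'}
print(type(ret))             # <class 'dict'>
```

Writing to a file. You can write the result of dumps yourself:

```python
import json

dic = {"aaa": "bb", "cc": "ddd"}
str_dic = json.dumps(dic)
with open("json_dump", mode="w") as f:
    f.write(str_dic)    # write the variable, not the literal text "str_dic"
```

Alternatively, json.dump can write straight to the file handle:

```python
import json

dic = {"aaa": "bb", "cc": "ddd"}
with open("json_dump", mode="w") as f:
    json.dump(dic, f)
```

Reading from a file:

```python
import json

with open("json_dump") as f:
    print(json.load(f))    # {'aaa': 'bb', 'cc': 'ddd'}
```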
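The two ways of writing a file shown above should produce identical text. The sketch below checks this. It uses io.StringIO, an in-memory text stream, in place of a real file. This is an illustrative assumption that works because json.dump and json.load only need an object with write()/read() methods.

```python
import io
import json

dic = {"aaa": "bb", "cc": "ddd"}

# Way 1: serialize to a str with dumps, then write the str ourselves
buf1 = io.StringIO()
buf1.write(json.dumps(dic))

# Way 2: let json.dump write straight into the file-like object
buf2 = io.StringIO()
json.dump(dic, buf2)

print(buf1.getvalue() == buf2.getvalue())  # True: both hold the same JSON text

# json.load reads from any object that has a read() method
buf2.seek(0)
loaded = json.load(buf2)
print(loaded == dic)  # True
```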
Limitations of the JSON format

JSON object keys must be strings. If a dict uses numbers as keys, dump/dumps converts them to strings, and they come back as strings after loading:

```python
import json

dic = {123: 456, 321: 654}
str_dic = json.dumps(dic)
print(str_dic)    # {"123": 456, "321": 654}
dic_str = json.loads(str_dic)
print(dic_str)    # {'123': 456, '321': 654}
```

Tuples are accepted as values. JSON has no tuple type, though, so they are converted to lists:

```python
import json

dic = {123: 456, 321: (654, 789, 987)}
str_dic = json.dumps(dic)
print(str_dic)    # {"123": 456, "321": [654, 789, 987]}
dic_str = json.loads(str_dic)
print(dic_str)    # {'123': 456, '321': [654, 789, 987]}
```

Tuples cannot be used as keys:

```python
import json

dic = {123: 456, (654, 789, 987): 321}
str_dic = json.dumps(dic)    # raises TypeError: a tuple is not a valid JSON key
```

Dumping a list to a file and loading it back:

```python
import json

lst = ['aaa', 123, 'bbb', 12.456]
with open('json_demo', 'w') as f:
    json.dump(lst, f)
with open('json_demo') as f:
    ret = json.load(f)
print(ret)    # ['aaa', 123, 'bbb', 12.456]
```

The file json_demo now contains: ["aaa", 123, "bbb", 12.456]

Strings in JSON must be enclosed in double quotes. Suppose json_demo contains ['aaa', 123, "bbb", 12.456]. Loading it then fails:

```python
import json

with open('json_demo') as f:
    ret = json.load(f)    # raises json.JSONDecodeError: single quotes are not valid JSON
```

You can call dump several times on the same file, but the result can no longer be loaded:

```python
import json

lst = ['aaa', 123, 'bbb', 12.456]
dic = {123: 456}
with open('json_demo', 'w') as f:
    json.dump(lst, f)
    json.dump(dic, f)
```

The file json_demo now contains: ["aaa", 123, "bbb", 12.456]{"123": 456}

```python
import json

with open('json_demo', 'r') as f:
    ret = json.load(f)    # raises json.JSONDecodeError ("Extra data")
```

To write several values to one file and still be able to read them back, serialize each one with dumps and write one per line:

```python
import json

lst = ['aaa', 123, 'bbb', 12.456]
dic = {123: 456}
with open('json_demo', 'w') as f:
    str_lst = json.dumps(lst)
    str_dic = json.dumps(dic)
    f.write(str_lst + "\n")
    f.write(str_dic + "\n")
```

The file json_demo now contains:

```
["aaa", 123, "bbb", 12.456]
{"123": 456}
```

Read it back line by line with loads:

```python
import json

with open('json_demo', 'r') as f:
    for line in f:
        ret = json.loads(line)
        print(ret)
# ['aaa', 123, 'bbb', 12.456]
# {'123': 456}
```

Chinese (non-ASCII) text: ensure_ascii=False

By default, dumps escapes every non-ASCII character as \uXXXX:

```python
import json

dic = {"aaa": "bbb", "name": "小明"}
str_dic = json.dumps(dic)
print(str_dic)    # {"aaa": "bbb", "name": "\u5c0f\u660e"}
dic_str = json.loads(str_dic)
print(dic_str)    # {'aaa': 'bbb', 'name': '小明'}
```

With ensure_ascii=False the characters are kept as they are:

```python
import json

dic = {"aaa": "bbb", "name": "小明"}
str_dic = json.dumps(dic, ensure_ascii=False)
print(str_dic)    # {"aaa": "bbb", "name": "小明"}
dic_str = json.loads(str_dic)
print(dic_str)    # {'aaa': 'bbb', 'name': '小明'}
```

When writing such text to a file, also open the file with encoding='utf-8':

```python
import json

dic = {"aaa": "bbb", "name": "小明"}
with open('json_demo', 'w', encoding='utf-8') as f:
    json.dump(dic, f, ensure_ascii=False)
```

The file json_demo now contains: {"aaa": "bbb", "name": "小明"}

The other parameters of dumps make the output easier for people to read, at the cost of extra storage space:
- sort_keys sorts the keys.
- indent sets the indentation.
- separators sets the item and key separators.

```python
import json

data = {'username': ['李华', '二愣子'], 'sex': 'male', 'age': 16}
json_dic2 = json.dumps(data, sort_keys=True, indent=4, separators=(',', ':'), ensure_ascii=False)
print(json_dic2)
```

Output:

```
{
    "age":16,
    "sex":"male",
    "username":[
        "李华",
        "二愣子"
    ]
}
```

A set cannot be serialized with dump/dumps.
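The last point is easy to see in practice. JSON has no set type, so dumps raises TypeError when given a set. One common workaround is assumed here; the json module does not prescribe it. Convert the set to a sorted list before dumping, and turn it back into a set after loading. Sorting keeps the output stable, because set iteration order is arbitrary.

```python
import json

tags = {"python", "json", "pickle"}

try:
    json.dumps(tags)
    set_error = None
except TypeError as e:
    # The json encoder has no representation for set objects
    set_error = e
print(type(set_error).__name__)   # TypeError

# Workaround: dump a sorted list, then rebuild the set after loading
text = json.dumps(sorted(tags))
print(text)                       # ["json", "pickle", "python"]
restored = set(json.loads(text))
print(restored == tags)           # True
```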
pickle
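pickle can serialize almost any Python object. The result of dumps is bytes:

```python
import pickle

dic = {111: (1, 2, 3), ("name", "age"): 22}
bytes_dic = pickle.dumps(dic)       # dict -> bytes
print(bytes_dic)
dic_back = pickle.loads(bytes_dic)  # bytes -> dict
print(dic_back)
```

Output from a Python version whose default pickle protocol is 3. Newer versions use a higher default protocol and print different bytes:

```
b'\x80\x03}q\x00(KoK\x01K\x02K\x03\x87q\x01X\x04\x00\x00\x00nameq\x02X\x03\x00\x00\x00ageq\x03\x86q\x04K\x16u.'
{111: (1, 2, 3), ('name', 'age'): 22}
```

Unlike with JSON, both the tuple key and the tuple value survive the round trip unchanged.

Because pickle produces bytes, a file handle passed to dump must be opened in 'wb' mode, and one passed to load in 'rb' mode.

Instances of your own classes can be pickled too:

```python
import pickle

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

xm = Student('XM', 11)
print(pickle.dumps(xm))

ret = pickle.dumps(xm)
xh = pickle.loads(ret)
print(xh.name)    # XM
print(xh.age)     # 11
```

Output of print(pickle.dumps(xm)) under protocol 3:

```
b'\x80\x03c__main__\nStudent\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x02\x00\x00\x00XMq\x04X\x03\x00\x00\x00ageq\x05K\x0bub.'
```

Writing the object to a file:

```python
import pickle

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

xm = Student('XM', 11)
with open('pickle_demo', 'wb') as f:
    pickle.dump(xm, f)
```

Pickling an object stores only a reference to its class (module and class name), not the class definition itself. When deserializing, the class used at write time must therefore be defined (or importable) in the reading program:

```python
import pickle

class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

with open('pickle_demo', 'rb') as f:
    xh = pickle.load(f)
print(xh.name)    # XM
```

pickle handles repeated dump/load calls on one file well. Each load returns the next object, in the order they were dumped:

```python
import pickle

with open('pickle_demo', 'wb') as f:
    pickle.dump({'k1': 'v1'}, f)
    pickle.dump({'k11': 'v1'}, f)
    pickle.dump({'k11': 'v1'}, f)
    pickle.dump({'k12': [1, 2, 3]}, f)
    pickle.dump(['k1', 'v1', 'l1'], f)

with open('pickle_demo', 'rb') as f:
    print(pickle.load(f))
    print(pickle.load(f))
    print(pickle.load(f))
    print(pickle.load(f))
    print(pickle.load(f))
```

Calling pickle.load(f) once more after the last object raises EOFError. So when you don't know how many objects a file holds, read in a loop until EOFError:

```python
import pickle

# pickle_demo was written by the previous example
with open('pickle_demo', 'rb') as f:
    while True:
        try:
            print(pickle.load(f))
        except EOFError:
            break
```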
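The try/except EOFError loop can be wrapped in a small generator. The name iter_pickles is hypothetical and not part of the pickle module. In this sketch, io.BytesIO stands in for a file opened in 'wb'/'rb' mode, since pickle.dump and pickle.load only need a binary file-like object.

```python
import io
import pickle

def iter_pickles(f):
    """Yield every object pickled into the binary file object f, in order (hypothetical helper)."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            # No more pickled objects left in the stream
            return

# An in-memory binary buffer plays the role of a file opened with 'wb' and then 'rb'
buf = io.BytesIO()
for obj in [{'k1': 'v1'}, {'k12': [1, 2, 3]}, ['k1', 'v1', 'l1']]:
    pickle.dump(obj, buf)

buf.seek(0)  # rewind before reading
result = list(iter_pickles(buf))
print(result)  # [{'k1': 'v1'}, {'k12': [1, 2, 3]}, ['k1', 'v1', 'l1']]
```

The caller never has to know in advance how many objects the file holds.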
shelve
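shelve is another serialization tool that ships with Python. It is built on top of pickle and is somewhat simpler to use.

shelve gives us a single open function, which returns a shelf object. You access the shelf by key (keys are strings), much like a dictionary.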
```python
import shelve

f = shelve.open('shelve_demo')
f['key'] = {'k1': (1, 2, 3), 'k2': 'v2'}
f.close()
```

```python
import shelve

f = shelve.open('shelve_demo')
content = f['key']
f.close()
print(content)    # {'k1': (1, 2, 3), 'k2': 'v2'}
```
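Shelf objects also support the with statement (since Python 3.4), which closes the shelf automatically. The sketch below writes to a throwaway temporary directory instead of the shelve_demo file used above; that location is an assumption made only to keep the example self-contained. It also shows that values go through pickle, so tuples and even sets (which json rejects) come back unchanged.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shelve_demo')

    # Writing: the shelf is closed automatically when the with block ends
    with shelve.open(path) as db:
        db['key'] = {'k1': (1, 2, 3), 'k2': 'v2'}
        db['tags'] = {'a', 'b'}

    # Reading it back through a freshly opened shelf
    with shelve.open(path) as db:
        content = db['key']
        tags = db['tags']
        keys = sorted(db.keys())
        has_missing = 'missing' in db

print(content)             # {'k1': (1, 2, 3), 'k2': 'v2'}
print(tags == {'a', 'b'})  # True
print(keys)                # ['key', 'tags']
print(has_missing)         # False
```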
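This module has a limitation: it does not support concurrent writes. Several programs must not write to the same database at the same time, although multiple simultaneous readers are safe. So if we know our application only reads, we can have shelve open the database read-only: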
```python
import shelve

f = shelve.open('shelve_demo', flag='r')    # 'r': open an existing database read-only
existing = f['key']
f.close()
print(existing)
```
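By default, shelve does not notice changes made to a mutable object after it has been read. f['key'] returns a fresh copy, and modifying that copy does not change what is stored. To have such modifications saved, pass writeback=True to shelve.open():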
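```python
import shelve

f1 = shelve.open('shelve_demo')
print(f1['key'])
f1['key']['new_value'] = 'this was not here before'   # changes a temporary copy only; lost
f1.close()

f2 = shelve.open('shelve_demo', writeback=True)        # enable writeback
print(f2['key'])    # the change made through f1 was not saved
f2['key']['new_value'] = 'this was not here before'   # cached object is written back on close()
f2.close()
```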
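writeback has advantages and disadvantages. On the plus side, it reduces the chance of mistakes and makes persistence more transparent to the user. It is not needed in every case, though. With writeback enabled, the shelf caches every entry it reads, which costs extra memory. When the shelf is closed with close(), every object in that cache is written back to the database, which adds extra waiting time. shelve has no way of knowing which cached objects were modified and which were not, so it writes all of them.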
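If the cost of writeback is a concern, the shelve documentation suggests an alternative. Leave writeback off, and explicitly assign the modified object back to its key. The minimal sketch below again assumes a temporary directory for the database file.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shelve_file')

    with shelve.open(path) as db:
        db['key'] = {'k1': 'v1'}

    # writeback is off (the default): modify a copy, then store it back
    with shelve.open(path) as db:
        temp = db['key']                                 # extracts an unpickled copy
        temp['new_value'] = 'this was not here before'   # mutates only the copy
        db['key'] = temp                                 # reassigning persists the change

    with shelve.open(path, flag='r') as db:
        result = db['key']

print(result)  # {'k1': 'v1', 'new_value': 'this was not here before'}
```

Only the entry that is explicitly reassigned gets rewritten. Nothing is cached, so close() has no extra work to do.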