Author: Jin Liang (金良), CSDN blog: http://blog.csdn.net/u012176591
import json
obj = """ {"name": "Wes", "places_lived": ["United States", "Spain", "Germany"], "pet": null, "siblings": [{"name": "Scott", "age": 25, "pet": "Zuko"}, {"name": "Katie", "age": 33, "pet": "Cisco"}] } """
result = json.loads(obj)  # parse the JSON string into Python objects
result  # display the value of result (interactive session)
Output:
{u'name': u'Wes',
u'pet': None,
u'places_lived': [u'United States', u'Spain', u'Germany'],
u'siblings': [{u'age': 25, u'name': u'Scott', u'pet': u'Zuko'},
{u'age': 33, u'name': u'Katie', u'pet': u'Cisco'}]}
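The u'...' prefixes in the output above show the session ran under Python 2. As a minimal sketch, assuming Python 3 instead, the same call returns plain str values; the example also makes the JSON-to-Python type mapping explicit (object to dict, array to list, null to None, integer to int):

```python
import json

# Same document as obj above, wrapped across lines; whitespace between tokens is ignored.
obj = """{"name": "Wes", "places_lived": ["United States", "Spain", "Germany"], "pet": null,
 "siblings": [{"name": "Scott", "age": 25, "pet": "Zuko"},
              {"name": "Katie", "age": 33, "pet": "Cisco"}]}"""

result = json.loads(obj)  # object -> dict, array -> list, null -> None, integer -> int

print(result["pet"])                       # None
print(result["siblings"][1]["name"])       # Katie
print(result["places_lived"][-1])          # Germany
print(type(result["siblings"][0]["age"]))  # <class 'int'>
```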
Below is the actual content of the text file; each outermost dictionary occupies exactly one line of the file:
{ "a": "Mozilla\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\/535.11 (KHTML, like Gecko) Chrome\/17.0.963.78 Safari\/535.11", "c": "US", "nk": 1, "tz": "America\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l": "orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r": "http:\/\/www.facebook.com\/l\/7AQEFzjSi\/1.usa.gov\/wfLQtf", "u": "http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22415991", "t": 1331923247, "hc": 1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }
{ "a": "GoogleMaps\/RochesterNY", "c": "US", "nk": 0, "tz": "America\/Denver", "gr": "UT", "g": "mwszkS", "h": "mwszkS", "l": "bitly", "hh": "j.mp", "r": "http:\/\/www.AwareMap.com\/", "u": "http:\/\/www.monroecounty.gov\/etc\/911\/rss.php", "t": 1331923249, "hc": 1308262393, "cy": "Provo", "ll": [ 40.218102, -111.613297 ] }
Read one line and print it to see exactly what was read:
open(path).readline()
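The printed content follows. Note that its last character is a newline; as shown later, the line parses correctly whether or not that newline is present: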
'{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11 (KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk": 1, "tz": "America\\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l": "orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r": "http:\\/\\/www.facebook.com\\/l\\/7AQEFzjSi\\/1.usa.gov\\/wfLQtf", "u": "http:\\/\\/www.ncbi.nlm.nih.gov\\/pubmed\\/22415991", "t": 1331923247, "hc": 1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'
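A quick way to check the claim about the trailing newline is to parse the same line with and without it. This sketch assumes Python 3 and uses a shortened copy of the first line above (only a few of its fields); it also shows that the JSON escape \/ in the raw text decodes to a plain /:

```python
import json

# Shortened copy of the first line above, as readline() would return it (trailing '\n').
line = '{ "c": "US", "tz": "America\\/New_York", "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'

with_newline = json.loads(line)
without_newline = json.loads(line.rstrip("\n"))

print(with_newline == without_newline)  # True: whitespace around the JSON value is ignored
print(with_newline["tz"])               # America/New_York: the escape \/ decodes to /
```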
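import json
path = 'filename.txt'
with open(path) as f:  # 'with' closes the file once the list is built
    records = [json.loads(line) for line in f]  # each line is one complete JSON object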
Printed result (records[0]):
{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11',
u'al': u'en-US,en;q=0.8',
u'c': u'US',
u'cy': u'Danvers',
u'g': u'A6qOVH',
u'gr': u'MA',
u'h': u'wfLQtf',
u'hc': 1331822918,
u'hh': u'1.usa.gov',
u'l': u'orofrog',
u'll': [42.576698, -70.954903],
u'nk': 1,
u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf',
u't': 1331923247,
u'tz': u'America/New_York',
u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991'}
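Real files of this kind sometimes end with an empty line, which json.loads rejects. A minimal sketch, assuming Python 3, of a hypothetical helper read_json_lines that skips blank lines and reports the line number of any line that fails to parse; io.StringIO stands in for the opened file, and the two records are abbreviated from the file content shown earlier:

```python
import io
import json

def read_json_lines(f):
    """Hypothetical helper: parse one JSON object per line, skipping blank lines."""
    records = []
    for lineno, line in enumerate(f, start=1):
        if not line.strip():        # tolerate empty lines, e.g. a trailing blank line
            continue
        try:
            records.append(json.loads(line))
        except ValueError as e:     # json.JSONDecodeError is a subclass of ValueError
            raise ValueError("line %d: %s" % (lineno, e))
    return records

# Two abbreviated records from the file shown earlier, separated by a blank line.
sample = io.StringIO(
    '{ "c": "US", "tz": "America\\/New_York", "cy": "Danvers" }\n'
    '\n'
    '{ "c": "US", "tz": "America\\/Denver", "cy": "Provo" }\n'
)
records = read_json_lines(sample)
print(len(records))                # 2
print([r["cy"] for r in records])  # ['Danvers', 'Provo']
```

With the real file the call would be `with open(path) as f: records = read_json_lines(f)`.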
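The content of this second file is one large list: the first line begins with a '[' character and the last character of the final line is ']'. The list elements are dictionaries, one per line, and every line except the last ends with a ',' character.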
The JSON file looks like this:
[{"url": "http://home.cnblogs.com/u/panpannju/", "followers": ["tandier", "611154"], "name": "panpannju"},
{"url": "http://home.cnblogs.com/u/429306/", "followers": [], "name": "429306"},
{"url": "http://home.cnblogs.com/u/jkframe/", "followers": ["AleeGreat", "koalaer"], "name": "jkframe"},
{"url": "http://home.cnblogs.com/u/graicesun/", "followers": [], "name": "graicesun"},
{"url": "http://home.cnblogs.com/u/blueshinejason/", "followers": ["overmore"], "name": "blueshinejason"},
{"url": "http://home.cnblogs.com/u/AleeGreat/", "followers": [], "name": "AleeGreat"},
{"url": "http://home.cnblogs.com/u/490449/", "followers": ["superhuake"], "name": "490449"},
{"url": "http://home.cnblogs.com/u/619865/", "followers": [], "name": "619865"},
{"url": "http://home.cnblogs.com/u/holycy/", "followers": ["graicesun"], "name": "holycy"}]
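Because the file as a whole is one well-formed JSON array, it can most likely also be parsed in a single call with json.load, with no per-line preprocessing at all. A minimal sketch, assuming Python 3; io.StringIO stands in for open("data.json") and holds the first three records shown above:

```python
import io
import json

# First three records of the file above; io.StringIO stands in for open("data.json").
text = io.StringIO(
    '[{"url": "http://home.cnblogs.com/u/panpannju/", "followers": ["tandier", "611154"], "name": "panpannju"},\n'
    '{"url": "http://home.cnblogs.com/u/429306/", "followers": [], "name": "429306"},\n'
    '{"url": "http://home.cnblogs.com/u/jkframe/", "followers": ["AleeGreat", "koalaer"], "name": "jkframe"}]'
)

records = json.load(text)  # parse the whole array at once -> list of dicts

print(len(records))             # 3
print(records[2]["followers"])  # ['AleeGreat', 'koalaer']
```

With the real file: `with open("data.json") as f: records = json.load(f)`. This reads the whole file into memory, which is fine for a file of this size.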
Read all lines of the file:
with open("data.json", "r") as text_file:
    lines = text_file.readlines()  # list of lines; all but the last keep their trailing '\n'
Inspect the first line, the last line, and one line in between:
lines[0]  # first line
'[{"url": "http://home.cnblogs.com/u/jinliangjiuzhuang/", "followers": [], "name": "jinliangjiuzhuang"},\n'
lines[-1]  # last line
'{"url": "http://home.cnblogs.com/u/510419/", "followers": [], "name": "510419"}]'
lines[1]  # a line that is neither the first nor the last
'{"url": "http://home.cnblogs.com/u/NelsonWu/", "followers": ["jinger", "346359"], "name": "NelsonWu"},\n'
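The three samples differ only at their ends: the first line has a leading '[' and a trailing ",\n", a middle line ends in ",\n", and the last line ends in ']' with no newline. A short sketch, assuming Python 3, that applies the matching slice to each sample (copied from the output above) and confirms that every result is a standalone JSON object:

```python
import json

first = '[{"url": "http://home.cnblogs.com/u/jinliangjiuzhuang/", "followers": [], "name": "jinliangjiuzhuang"},\n'
middle = '{"url": "http://home.cnblogs.com/u/NelsonWu/", "followers": ["jinger", "346359"], "name": "NelsonWu"},\n'
last = '{"url": "http://home.cnblogs.com/u/510419/", "followers": [], "name": "510419"}]'

pieces = {
    "first": first[1:-2],   # drop '[' at the start and ",\n" at the end
    "middle": middle[:-2],  # drop ",\n" at the end
    "last": last[:-1],      # drop ']' at the end
}
for label, piece in pieces.items():
    print(label, json.loads(piece)["name"])
```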
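The parsing code is below. The function that does the actual JSON parsing is still json.loads(); the difference from before is that each line read from the file is preprocessed first: the leading '[' is removed from the first line, the trailing ']' from the last line, and the trailing ',' plus newline from every line except the last.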
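import json

records = []
for line in lines:
    try:
        if line.startswith('['):        # first line: drop the leading '[' and the trailing ",\n"
            myline = line[1:-2]
        elif line.endswith(']'):        # last line: drop the trailing ']'
            myline = line[:-1]
        else:                           # other lines: drop the trailing ",\n"
            myline = line[:-2]
        lineloads = json.loads(myline)  # parse
    except ValueError:
        print(myline)                   # if parsing fails, print the offending line
    else:
        records.append(lineloads)       # keep the parsed dict, not the raw string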
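The fixed slices assume every line ends in exactly ",\n". A stray space after the comma, or a "\r\n" ending that was not translated to "\n", would make them cut in the wrong place. A hedged alternative sketch, assuming Python 3, with a hypothetical helper parse_array_line (not the author's method) that strips whitespace first and then removes the list punctuation by inspecting the line's ends. The sample lines are abbreviated from the file above (the url field is omitted) and deliberately include such irregular endings:

```python
import json

def parse_array_line(line):
    """Hypothetical helper: turn one line of the one-object-per-line JSON array into a dict."""
    s = line.strip()           # drop surrounding whitespace, including '\n' or '\r\n'
    if s.startswith("["):
        s = s[1:]              # first line: leading '['
    if s.endswith("]"):
        s = s[:-1]             # last line: trailing ']'
    s = s.rstrip()             # any whitespace left before the removed ']'
    if s.endswith(","):
        s = s[:-1]             # every line but the last: trailing ','
    return json.loads(s)

sample_lines = [
    '[{"name": "panpannju", "followers": ["tandier", "611154"]}, \r\n',
    '{"name": "429306", "followers": []},\r\n',
    '{"name": "holycy", "followers": ["graicesun"]}]',
]
records = [parse_array_line(ln) for ln in sample_lines]
print([r["name"] for r in records])  # ['panpannju', '429306', 'holycy']
```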
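Compare this with the input to json.loads in the second part, where each line still carried its trailing newline, while here the newline has been sliced off. Both parse correctly, so a newline at the end of the string passed to json.loads is optional.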