Notes on parsing JSON in Python

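Today I had to parse a very long JSON string and ran into all kinds of problems along the way, so I have collected everything to watch out for here.
To start with, I have a string that was originally very long; I trimmed it down to the following: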

>>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 
'monitors': [{'use': 100, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L,
'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"

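This is evidently not a string produced by a proper json.dumps() call but one produced by str(). The original data structure is built from dicts, lists, strings, and long integers, and it also contains Chinese characters as Unicode. That is: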

>>> origin={"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128, 
"monitors": [{"use": 100, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L, 
"monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}
>>> json.dumps(origin)
'{"product": "\\\\u62c9\\\\u52fe\\\\u7f51", 
"monitors": [{"use": 100, "monitorweight": 10, 
"monitorname": "\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22", 
"monitorurl": "http://oss.lagou.com"}], "downtime": 3.1280000000000001}'
>>> str(origin)
"{'product': u'\\\\u62c9\\\\u52fe\\\\u7f51', 
'monitors': [{'use': 100, 'monitorweight': 10L, 
'monitorname': u'\\\\u804c\\\\u4f4d\\\\u641c\\\\u7d22', 
'monitorurl': u'http://oss.lagou.com'}], 'downtime': 3.128}"
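The two outputs above differ in a way that matters for parsing. As a minimal sketch, assuming Python 3 (where the u prefix and the L suffix no longer appear in str() output) and a shortened, hypothetical version of the data, the round trip through json.dumps() works while the str() form is a Python literal rather than JSON:

```python
import json

# Shortened, hypothetical stand-in for the original structure
origin = {"product": "\u62c9\u52fe\u7f51", "downtime": 3.128,
          "monitors": [{"use": 100, "monitorweight": 10}]}

as_json = json.dumps(origin)   # double-quoted, valid JSON text
as_repr = str(origin)          # single-quoted Python literal text

# Only the json.dumps() output is guaranteed to round-trip
round_tripped = json.loads(as_json)
print(as_json)
print(as_repr)
```

If you control the producer of the string, switching it from str() to json.dumps() avoids every problem listed below.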

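Had the string been produced by json.dumps(), json.loads() would convert it straight back into an object. A string produced by str() runs into a series of problems, summarized in the points below: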

1. Keys and string values inside the JSON text must use double quotes, not single quotes. Single quotes raise: Expecting property name: line 1 column 1 (char 1)

>>> s1="{'a':'a'}";s2='{"a":"a"}'
>>> json.loads(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
    raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
>>> json.loads(s2)
{u'a': u'a'}
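The same check can be written so that it reports the failure instead of printing a traceback. This is a sketch assuming Python 3, where the exception is json.JSONDecodeError (a subclass of ValueError) and the message wording differs slightly from the Python 2.6 output above; try_load is a hypothetical helper name:

```python
import json

def try_load(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return None, str(exc)

value1, err1 = try_load("{'a':'a'}")   # single quotes: rejected
value2, err2 = try_load('{"a":"a"}')   # double quotes: accepted
print(err1)
print(value2)
```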

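2. After str(), keys and values always end up in single quotes, whether they were originally written with single or double quotes, and the string as a whole is displayed in double quotes. The single quotes therefore have to be replaced with double quotes.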

>>> s={"a":"a"};str(s)
"{'a': 'a'}"
>>> s={'a':'a'};str(s)
"{'a': 'a'}"

>>> s={'a':'a'};s1=str(s)
>>> s1
"{'a': 'a'}"
>>> s2=s1.replace('\'','\"')
>>> s2
'{"a": "a"}'
>>> json.loads(s2)
{u'a': u'a'}
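One caveat with the blanket replace: it only works while no value contains a quote character itself. The sketch below, assuming Python 3 and a made-up value with an apostrophe, shows how the swap then produces invalid JSON:

```python
import json

original = {"a": "it's"}           # hypothetical value containing an apostrophe
as_repr = str(original)            # Python switches to double quotes for this value
swapped = as_repr.replace("'", '"')

try:
    json.loads(swapped)
    broke = False
except ValueError:
    # The apostrophe was turned into a quote that ends the string early
    broke = True

print(as_repr)
print(swapped)
print(broke)
```

So the replace is fine for data like the example in this post, but it is not a general conversion.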

3. Unicode strings still carry the u prefix after str(), and it has to be removed.

>>> s={'a':u'拉勾网'}
>>> s
{'a': u'\u62c9\u52fe\u7f51'}
>>> s1=str(s)
>>> s1
"{'a': u'\\u62c9\\u52fe\\u7f51'}"
>>> json.loads(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
    raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
>>>
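Removing the prefix with a plain text replace of u" is risky, because it also hits any string value that happens to end in the letter u. A slightly more careful sketch, assuming Python 3 and a hypothetical helper named strip_u_prefix, removes the u only where it starts a token and is immediately followed by a quote. It is still a heuristic and can misfire on unusual string contents:

```python
import re

def strip_u_prefix(text):
    # Drop 'u' only at a word boundary and directly before a quote character
    return re.sub(r"\bu(?=['\"])", "", text)

sample = '{"a": u"menu"}'              # hypothetical input with a value ending in u
naive = sample.replace('u"', '"')      # also eats the trailing u of "menu"
careful = strip_u_prefix(sample)

print(naive)
print(careful)
```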

4. Long integers still carry the L suffix after str(), and that needs handling as well.

>>> s={"a":10L}
>>> s1=str(s)
>>> s1
"{'a': 10L}"
>>> s1='{"a":10L}'
>>> json.loads(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 193, in JSONObject
    raise ValueError(errmsg("Expecting , delimiter", s, end - 1))
ValueError: Expecting , delimiter: line 1 column 7 (char 7)
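The suffix can be dropped with a regular expression that matches a run of digits followed by L. The following is a sketch assuming Python 3; strip_long_suffix is a hypothetical helper name, and the pattern would also rewrite text such as 10L inside a string value, so it suits data like the example here rather than arbitrary input:

```python
import json
import re

def strip_long_suffix(text):
    # Keep the digits, drop the trailing L of a Python 2 long literal
    return re.sub(r"\b(\d+)L\b", r"\1", text)

cleaned = strip_long_suffix('{"a":10L}')
parsed = json.loads(cleaned)
print(cleaned)
print(parsed)
```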

Finally, back to the complex string from the beginning.

>>> s="{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, 'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com','monitorweight': 10L,'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}"
>>> # replace single quotes with double quotes
>>> s1=s.replace('\'','\"')
>>> s1
'{"product": u"\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": u"http://oss.lagou.com","monitorweight": 10L,"monitorname": u"\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # strip the unicode prefix u
>>> s2=s1.replace('u\"','\"')
>>> s2
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> s3=s2.replace('..L','')  # no effect: str.replace() matches the literal text '..L', not a pattern
>>> s3
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100L, "monitorurl": "http://oss.lagou.com","monitorweight": 10L,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # strip the long-integer suffix L; this needs a regex
>>> import re
>>> s3=re.sub(r'(\d+)L',r'\g<1>',s2)
>>> s3
'{"product": "\\u62c9\\u52fe\\u7f51", "downtime": 3.128, "monitors": [{"use": 100, "monitorurl": "http://oss.lagou.com","monitorweight": 10,"monitorname": "\\u804c\\u4f4d\\u641c\\u7d22"}]}'
>>> # json.loads() finally works
>>> json.loads(s3)
{u'product': u'\u62c9\u52fe\u7f51', u'monitors': [{u'use': 100, u'monitorweight': 10, u'monitorname': u'\u804c\u4f4d\u641c\u7d22', u'monitorurl': u'http://oss.lagou.com'}], u'downtime': 3.1280000000000001}
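Since the input is really the text of a Python literal rather than JSON, another option is to parse it as one. The sketch below assumes Python 3, where ast.literal_eval() accepts the u prefix but not the Python 2 L suffix, so the suffix is stripped first with the same kind of regex as above. parse_py2_repr is a hypothetical helper name, and this is an alternative to the replace chain, not the method used in this post:

```python
import ast
import re

def parse_py2_repr(text):
    # Python 3 rejects the Python 2 long suffix, so drop it before parsing
    cleaned = re.sub(r"\b(\d+)L\b", r"\1", text)
    # literal_eval only evaluates literals (dicts, lists, strings, numbers)
    return ast.literal_eval(cleaned)

s = ("{'product': u'\\u62c9\\u52fe\\u7f51', 'downtime': 3.128, "
     "'monitors': [{'use': 100L, 'monitorurl': u'http://oss.lagou.com', "
     "'monitorweight': 10L, 'monitorname': u'\\u804c\\u4f4d\\u641c\\u7d22'}]}")

data = parse_py2_repr(s)
print(data["product"])
print(data["monitors"][0]["monitorweight"])
```

This avoids touching the quotes at all, so values containing apostrophes or ending in u are left intact.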
