1. Goal

Scrape the JSON text stored on a web page and compute the sum of all the numeric values in it.

2. Overview:

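Fetch the web page data with urllib.
Ignore SSL certificate errors.
Check the format of the fetched data (here it is JSON).
Use Python's json library to turn the data into a dictionary; first use len to see how many key/value pairs it has.
Extract the data you need.
Do some simple processing on the data.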
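The last three steps can be tried offline before touching the network. Below is a minimal sketch that runs them on a small JSON string typed in by hand. It assumes the same layout as the real file: a `note` string plus a `comments` list of `name`/`count` objects. The three records are copied from the first lines of output shown later, and `SAMPLE` is a hypothetical name.

```python
import json

# Hypothetical hand-written sample with the same layout as the real file:
# a "note" string and a "comments" list of {"name", "count"} objects.
SAMPLE = '''
{
  "note": "This file contains the sample data for testing",
  "comments": [
    {"name": "Romina", "count": 97},
    {"name": "Laurie", "count": 97},
    {"name": "Bayli", "count": 90}
  ]
}
'''

# Turn the JSON text into a Python dictionary
raw_info = json.loads(SAMPLE)
print(len(raw_info))        # number of top-level keys: 2

# Pick out the data we need (the list stored under "comments")
raw_comments = raw_info['comments']

# Simple processing: add up every "count"
total = 0
for item in raw_comments:
    total += item['count']
print(total)                # 284
```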

3. Step-by-step walkthrough

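Fetch the web page data with urllib and ignore SSL errors. The goal here is data processing, so it is enough to print the data and take a look at its format. If it looks right, carry on.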

import urllib.request, urllib.parse, urllib.error
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Use urllib to retrieve the web data (JSON)
url = 'http://py4e-data.dr-chuck.net/comments_42.json'
data = urllib.request.urlopen(url, context=ctx).read()
print(data)
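`read()` returns raw bytes, so `print(data)` shows a `b'...'` literal rather than plain text. The sketch below shows how to decode it, assuming the page is UTF-8 (the usual encoding for JSON). A literal bytes value stands in for the download so the example runs offline. Also note that the relaxed SSL context above only matters when the request goes over `https://`; the URL as written uses `http://`.

```python
import json

# Stand-in for urllib.request.urlopen(url, context=ctx).read(), which returns bytes
data = b'{"note": "sample", "comments": [{"name": "Romina", "count": 97}]}'

print(type(data))               # <class 'bytes'>
text = data.decode('utf-8')     # assumption: the server sends UTF-8
print(type(text))               # <class 'str'>

# json.loads accepts either form (bytes input is supported since Python 3.6)
assert json.loads(data) == json.loads(text)
```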

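First look at the JSON as a whole. Even in Sublime it is still a mess (too many brackets). This is where len helps: it tells you how many key/value pairs are actually in it.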

# Continued from above
import json

raw_info = json.loads(data)
print(len(raw_info))
print('note part is -',raw_info['note'])

# Output:
2 
note part is - This file contains the sample data for testing
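The raw text is hard to read even in an editor. Another option, not used in the original, is to let the `json` module re-indent it with `json.dumps(..., indent=2)` and to list the top-level keys directly instead of only counting them. Here is a sketch on a small hypothetical dictionary of the same shape:

```python
import json

# Hypothetical miniature version of raw_info with the same two keys
raw_info = {
    "note": "This file contains the sample data for testing",
    "comments": [{"name": "Romina", "count": 97}, {"name": "Laurie", "count": 97}],
}

# Re-indent the structure so that each nesting level is visible
print(json.dumps(raw_info, indent=2))

# Show the top-level keys themselves, not just how many there are
print(list(raw_info.keys()))    # ['note', 'comments']
```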

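In other words, raw_info, as a dictionary, has only two keys: "note" and "comments", as shown in the figure below. The value of "comments" is itself a smaller JSON structure. The call print('note part is -', raw_info['note']) above confirms this.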

[Figure: raw_info.png, the structure of raw_info]

Note that the numbers we need live inside "comments". Once extracted, "comments" turns out to be a list with 50 elements.

# Check what we extracted
raw_comments = raw_info['comments']

print(type(raw_comments))
print(len(raw_comments))

for item in raw_comments:
    print(item)

Output (first few lines):

<class 'list'>
50
{'name': 'Romina', 'count': 97}
{'name': 'Laurie', 'count': 97}
{'name': 'Bayli', 'count': 90}
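With 50 records, printing every item floods the screen; the excerpt above shows only the first three. The sketch below prints just the first few by slicing the list. `n = 3` is an arbitrary choice, and the list is hypothetical stand-in data.

```python
# Hypothetical stand-in for raw_comments (the real list has 50 entries)
raw_comments = [
    {'name': 'Romina', 'count': 97},
    {'name': 'Laurie', 'count': 97},
    {'name': 'Bayli', 'count': 90},
    {'name': 'Example', 'count': 1},
]

n = 3  # how many records to show; arbitrary
# Slicing never raises, even if the list is shorter than n
for i, item in enumerate(raw_comments[:n]):
    print(i, item)
```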

Each item above is a dictionary. Take the value under the key 'count', which is an int.

for item in raw_comments:
    a = item['count']
    print(a,type(a))

Output (first few lines):
97 <class 'int'>
97 <class 'int'>
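You can confirm that every `count` really is an `int`, rather than eyeballing the first few. The sketch below collects the values into a list and checks them in one pass, using hypothetical stand-in data.

```python
# Hypothetical stand-in for raw_comments
raw_comments = [
    {'name': 'Romina', 'count': 97},
    {'name': 'Laurie', 'count': 97},
    {'name': 'Bayli', 'count': 90},
]

# Collect every 'count' value into a plain list of numbers
counts = [item['count'] for item in raw_comments]
print(counts)                                    # [97, 97, 90]

# all() is True only if every value is an int
print(all(isinstance(c, int) for c in counts))   # True
```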

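Add up all the values and combine the code above into one script. The correct output is 2553. To automate it a bit, you could read the URL with input(), but I only needed to hand in the assignment, so I just typed the URL in by hand.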

import urllib.request, urllib.parse, urllib.error
import ssl
import json

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Use urllib to retrieve the web data (JSON)
url = 'http://py4e-data.dr-chuck.net/comments_42.json'
data = urllib.request.urlopen(url, context=ctx).read()

# Parse the JSON into a dictionary and get the 'comments' list
raw_info = json.loads(data)
raw_comments = raw_info['comments']

# sum up all the counts 
summary = 0
for item in raw_comments:
    summary += item['count']

print(summary)
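The loop can also be written with the built-in `sum()`, and the URL can be read with `input()` as suggested above. Below is a sketch of that more automated version. The function names `sum_counts`, `fetch_and_sum` and `main` are hypothetical. `main()` is not called automatically, so nothing prompts for input unless you call it.

```python
import json
import ssl
import urllib.request


def sum_counts(comments):
    # Same result as the for-loop above: add up the 'count' field of every comment
    return sum(item['count'] for item in comments)


def fetch_and_sum(url):
    # Same relaxed SSL settings as in the script above
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=ctx) as response:
        data = response.read()
    return sum_counts(json.loads(data)['comments'])


def main():
    # Ask for the URL instead of hard-coding it
    url = input('Enter URL: ')
    print(fetch_and_sum(url))
```

Call `main()` and enter `http://py4e-data.dr-chuck.net/comments_42.json`. It should print the same total as the loop version.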

A problem I ran into: when you can't tell by eye what type something is, be sure to check it with the quick and simple type() and len().
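Here is a small helper in that spirit. It is hypothetical and not part of the original. It reports the type and, where it is defined, the length of whatever it is given.

```python
def describe(obj):
    """Return (type name, length) for obj; length is None if len() is not supported."""
    try:
        length = len(obj)
    except TypeError:           # e.g. an int has no len()
        length = None
    return type(obj).__name__, length


print(describe({'note': 'x', 'comments': []}))   # ('dict', 2)
print(describe([{'count': 97}] * 50))            # ('list', 50)
print(describe(97))                              # ('int', None)
```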

python - mini project 1 - Fetching JSON data from a web page and doing simple processing