Reading CSV files

Reading a CSV file from the web
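Python's csv library is designed mainly for local files, but in web scraping many of the files you need are online. There are several ways to deal with this:

  • Download the CSV file to your machine by hand, then have Python open it from that location;

  • Write a Python program that downloads the file, reads it, and then deletes the downloaded copy;

  • Read the file from the web directly into a string, then wrap it in a StringIO object so that it behaves like a file.

The third approach is the most convenient, since nothing has to be saved to or cleaned up from disk. The examples below use it.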
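Why the StringIO wrapper is needed: csv.reader iterates over its input and treats each item as one line, so handing it the downloaded string directly makes it iterate character by character. A minimal offline sketch, where the short hard-coded CSV text is an assumption standing in for the real download:

```python
import csv
from io import StringIO

# Hypothetical sample text standing in for the downloaded file contents
data = "Name,Year\nMonty Python's Flying Circus,1970\nAnother Monty Python Record,1971\n"

# Passing the str directly: csv.reader iterates it character by character,
# so each character is parsed as if it were a separate line
wrong = list(csv.reader(data))

# Wrapping it in StringIO gives a file-like object that yields whole lines
right = list(csv.reader(StringIO(data)))

print(wrong[:3])
print(right)
```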

Both examples read the file at http://pythonscraping.com/files/MontyPythonAlbums.csv.

Method 1: rows as lists (csv.reader)

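from urllib.request import urlopen   # Python 3 (Python 2 used: from urllib import urlopen)
from io import StringIO
import csv

# Download the file and decode it to a str; 'ignore' silently drops any non-ASCII bytes
data = urlopen("http://pythonscraping.com/files/MontyPythonAlbums.csv").read().decode('ascii', 'ignore')
print(data)
print("\n******************************************\n")
dataFile = StringIO(data)           # wrap the string so csv can read it like a file
csvReader = csv.reader(dataFile)    # each row is a list; the header row is included

for row in csvReader:
    print(row)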

The output is as follows (first the raw text, then the parsed rows):

Name,Year
Monty Python's Flying Circus,1970
Another Monty Python Record,1971
Monty Python's Previous Record,1972
The Monty Python Matching Tie and Handkerchief,1973
Monty Python Live at Drury Lane,1974
An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail,1975
Monty Python Live at City Center,1977
The Monty Python Instant Record Collection,1977
Monty Python's Life of Brian,1979
Monty Python's Cotractual Obligation Album,1980
Monty Python's The Meaning of Life,1983
The Final Rip Off,1987
Monty Python Sings,1989
The Ultimate Monty Python Rip Off,1994
Monty Python Sings Again,2014

******************************************

['Name', 'Year']
["Monty Python's Flying Circus", '1970']
['Another Monty Python Record', '1971']
["Monty Python's Previous Record", '1972']
['The Monty Python Matching Tie and Handkerchief', '1973']
['Monty Python Live at Drury Lane', '1974']
['An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail', '1975']
['Monty Python Live at City Center', '1977']
['The Monty Python Instant Record Collection', '1977']
["Monty Python's Life of Brian", '1979']
["Monty Python's Cotractual Obligation Album", '1980']
["Monty Python's The Meaning of Life", '1983']
['The Final Rip Off', '1987']
['Monty Python Sings', '1989']
['The Ultimate Monty Python Rip Off', '1994']
['Monty Python Sings Again', '2014']
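Because csv.reader returns the header ['Name', 'Year'] as an ordinary first row, and every field as a string, a common next step is to skip the header with next() and convert the year to an integer. A sketch assuming the same two-column Name,Year layout; the helper name parse_albums and the inline sample are hypothetical:

```python
import csv
from io import StringIO

def parse_albums(text):
    """Hypothetical helper: return the header and a list of (name, year) tuples."""
    reader = csv.reader(StringIO(text))
    header = next(reader)            # consume ['Name', 'Year'] so it is not treated as data
    albums = []
    for row in reader:
        if not row:                  # csv.reader returns [] for blank lines; skip them
            continue
        name, year = row
        albums.append((name, int(year)))   # csv yields strings; convert the year explicitly
    return header, albums

sample = "Name,Year\nMonty Python Sings,1989\nMonty Python Sings Again,2014\n"
header, albums = parse_albums(sample)
print(header)
print(albums)
```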

Method 2: rows as dictionaries (csv.DictReader)

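from urllib.request import urlopen   # Python 3 (Python 2 used: from urllib import urlopen)
from io import StringIO
import csv

# Download the file and decode it to a str; 'ignore' silently drops any non-ASCII bytes
data = urlopen("http://pythonscraping.com/files/MontyPythonAlbums.csv").read().decode('ascii', 'ignore')

dataFile = StringIO(data)

dictReader = csv.DictReader(dataFile)   # each row becomes a dict keyed by the header row

for row in dictReader:
    print(row)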

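DictReader uses the header row as the dictionary keys, so the header does not appear as a separate row. The output is as follows: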

{'Name': "Monty Python's Flying Circus", 'Year': '1970'}
{'Name': 'Another Monty Python Record', 'Year': '1971'}
{'Name': "Monty Python's Previous Record", 'Year': '1972'}
{'Name': 'The Monty Python Matching Tie and Handkerchief', 'Year': '1973'}
{'Name': 'Monty Python Live at Drury Lane', 'Year': '1974'}
{'Name': 'An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail', 'Year': '1975'}
{'Name': 'Monty Python Live at City Center', 'Year': '1977'}
{'Name': 'The Monty Python Instant Record Collection', 'Year': '1977'}
{'Name': "Monty Python's Life of Brian", 'Year': '1979'}
{'Name': "Monty Python's Cotractual Obligation Album", 'Year': '1980'}
{'Name': "Monty Python's The Meaning of Life", 'Year': '1983'}
{'Name': 'The Final Rip Off', 'Year': '1987'}
{'Name': 'Monty Python Sings', 'Year': '1989'}
{'Name': 'The Ultimate Monty Python Rip Off', 'Year': '1994'}
{'Name': 'Monty Python Sings Again', 'Year': '2014'}
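With DictReader, fields are accessed by column name rather than by position, which keeps the code readable even if the columns are reordered. A small sketch that filters albums by year; the sample text and the 1970s cut-off are illustrative assumptions:

```python
import csv
from io import StringIO

# Hypothetical sample in the same Name,Year layout as the online file
sample = (
    "Name,Year\n"
    "Monty Python's Flying Circus,1970\n"
    "Monty Python's Life of Brian,1979\n"
    "Monty Python's The Meaning of Life,1983\n"
)

reader = csv.DictReader(StringIO(sample))   # keys come from the header row
print(reader.fieldnames)

# Select rows by column name; values are still strings, so convert before comparing
seventies = [row["Name"] for row in reader if 1970 <= int(row["Year"]) < 1980]
print(seventies)
```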
