Reading CSV Files
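Python's csv library is designed mainly for local files, but when you scrape the web, many of the files you need are online. There are a few ways to handle this: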
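Download the CSV file to your machine by hand, then point Python at its location;
Write a Python program that downloads the file, reads it, and then deletes the downloaded copy;
Read the file from the web directly into a string, then wrap the string in a StringIO object so that it behaves like a file.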
Of these, the third option is the best choice, since nothing has to be saved to disk and cleaned up afterwards. The program below uses http://pythonscraping.com/files/MontyPythonAlbums.csv as an example:
#coding:utf-8
from urllib import urlopen
from io import StringIO
import csv

# Read the remote file into a string
data = urlopen("http://pythonscraping.com/files/MontyPythonAlbums.csv").read().decode('ascii', 'ignore')
print data
print "\n******************************************\n"

# Wrap the string so that csv can treat it like a file
dataFile = StringIO(data)
# Each row comes back as a list; the first row is the header
csvReader = csv.reader(dataFile)
for row in csvReader:
    print row
The output is:
Name,Year
Monty Python's Flying Circus,1970
Another Monty Python Record,1971
Monty Python's Previous Record,1972
The Monty Python Matching Tie and Handkerchief,1973
Monty Python Live at Drury Lane,1974
An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail,1975
Monty Python Live at City Center,1977
The Monty Python Instant Record Collection,1977
Monty Python's Life of Brian,1979
Monty Python's Cotractual Obligation Album,1980
Monty Python's The Meaning of Life,1983
The Final Rip Off,1987
Monty Python Sings,1989
The Ultimate Monty Python Rip Off,1994
Monty Python Sings Again,2014

******************************************

['Name', 'Year']
["Monty Python's Flying Circus", '1970']
['Another Monty Python Record', '1971']
["Monty Python's Previous Record", '1972']
['The Monty Python Matching Tie and Handkerchief', '1973']
['Monty Python Live at Drury Lane', '1974']
['An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail', '1975']
['Monty Python Live at City Center', '1977']
['The Monty Python Instant Record Collection', '1977']
["Monty Python's Life of Brian", '1979']
["Monty Python's Cotractual Obligation Album", '1980']
["Monty Python's The Meaning of Life", '1983']
['The Final Rip Off', '1987']
['Monty Python Sings', '1989']
['The Ultimate Monty Python Rip Off', '1994']
['Monty Python Sings Again', '2014']
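As the output shows, csv.reader returns the header as an ordinary first row. If you want to handle it separately from the data, one possible approach is to pull it off the reader with next() before looping. The sketch below is written for Python 3 (the listing above is Python 2); the sample string and the split_header helper are made up for illustration and stand in for the downloaded text.

```python
import csv
from io import StringIO

# Hypothetical sample standing in for the text downloaded from the site
sample = (
    "Name,Year\n"
    "Monty Python's Flying Circus,1970\n"
    "Another Monty Python Record,1971\n"
)


def split_header(text):
    """Return (header, rows) from CSV text, treating the first row as the header."""
    reader = csv.reader(StringIO(text))
    header = next(reader, None)  # None when the text contains no rows at all
    return header, list(reader)  # the remaining rows are the data


header, rows = split_header(sample)
print(header)
for row in rows:
    print(row)
```

Passing None as the second argument to next() keeps the helper from raising StopIteration when the file turns out to be empty.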
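csv.DictReader reads the same data but returns each row as a dictionary keyed by the header fields, so the header row no longer appears in the results:
#coding:utf-8
from urllib import urlopen
from io import StringIO
import csv

# Read the remote file into a string and wrap it in a file-like object
data = urlopen("http://pythonscraping.com/files/MontyPythonAlbums.csv").read().decode('ascii', 'ignore')
dataFile = StringIO(data)
# Each row comes back as a dictionary; the header row supplies the keys
dictReader = csv.DictReader(dataFile)
for row in dictReader:
    print row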
The output is:
{'Name': "Monty Python's Flying Circus", 'Year': '1970'}
{'Name': 'Another Monty Python Record', 'Year': '1971'}
{'Name': "Monty Python's Previous Record", 'Year': '1972'}
{'Name': 'The Monty Python Matching Tie and Handkerchief', 'Year': '1973'}
{'Name': 'Monty Python Live at Drury Lane', 'Year': '1974'}
{'Name': 'An Album of the Soundtrack of the Trailer of the Film of Monty Python and the Holy Grail', 'Year': '1975'}
{'Name': 'Monty Python Live at City Center', 'Year': '1977'}
{'Name': 'The Monty Python Instant Record Collection', 'Year': '1977'}
{'Name': "Monty Python's Life of Brian", 'Year': '1979'}
{'Name': "Monty Python's Cotractual Obligation Album", 'Year': '1980'}
{'Name': "Monty Python's The Meaning of Life", 'Year': '1983'}
{'Name': 'The Final Rip Off', 'Year': '1987'}
{'Name': 'Monty Python Sings', 'Year': '1989'}
{'Name': 'The Ultimate Monty Python Rip Off', 'Year': '1994'}
{'Name': 'Monty Python Sings Again', 'Year': '2014'}
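The listings in this post are Python 2. For readers on Python 3, the same string-to-StringIO approach might look roughly like the sketch below, where urlopen lives in urllib.request and print is a function. The helper names parse_csv_text and fetch_csv_rows are my own, not part of any library, and parsing is kept separate from downloading so that it can be tried on a plain string without network access.

```python
import csv
from io import StringIO
from urllib.request import urlopen


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # StringIO gives the string a file-like interface that csv can iterate over
    return [dict(row) for row in csv.DictReader(StringIO(text))]


def fetch_csv_rows(url):
    """Download a CSV file into memory and parse it without touching the disk."""
    with urlopen(url) as response:
        # Same decoding as the original listing: drop any non-ASCII bytes
        text = response.read().decode("ascii", "ignore")
    return parse_csv_text(text)


# Offline check with a hypothetical two-line sample
print(parse_csv_text("Name,Year\nThe Final Rip Off,1987\n"))
```

Calling fetch_csv_rows("http://pythonscraping.com/files/MontyPythonAlbums.csv") should then return one dictionary per album, assuming the file is still served at that address.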