Parsing CSV Files with Python

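1. Characteristics of CSV:

Each record is stored on its own line of text

Fields are separated by a delimiter (usually a comma)

Only the data itself is stored, with no formatting

It can be read without any special software; a plain text editor is enough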
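To make these properties concrete, here is a minimal sketch that uses a small, hypothetical CSV snippet held in a string. Each line is one record, and splitting a line on the comma yields its fields. The column names and values are invented for illustration.

```python
# A hypothetical CSV snippet: the first line is the header, each later line is one record.
sample = "name,year\nAlpha,1963\nBeta,1964\n"

# splitlines() yields one string per record; split(",") separates the fields.
rows = [line.split(",") for line in sample.splitlines()]

header, records = rows[0], rows[1:]
print(header)   # ['name', 'year']
print(records)  # [['Alpha', '1963'], ['Beta', '1964']]
```

Note that every value comes back as a string: CSV carries no type information, so converting `'1963'` to a number is up to the reader.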


2. Parsing a CSV file manually

# Your task is to read the input DATAFILE line by line, and for the first 10 lines (not including the header)
# split each line on "," and then for each line, create a dictionary
# where the key is the header title of the field, and the value is the value of that field in the row.
# The function parse_file should return a list of dictionaries,
# each data line in the file being a single list entry.
# Field names and values should not contain extra whitespace, like spaces or newline characters.
# You can use the Python string method strip() to remove the extra whitespace.
# You have to parse only the first 10 data lines in this exercise,
# so the returned list should have 10 entries!
import os

DATADIR = ""
DATAFILE = "beatles-diskography.csv"


def parse_file(datafile):
    data = []
    with open(datafile, "r") as f:
        header = f.readline().split(",")   # read the header row
        counter = 0
        for line in f:
            if counter == 10:
                break
            fields = line.split(",")
            entry = {}
            for i, value in enumerate(fields):
                # strip() removes surrounding spaces and the trailing newline
                entry[header[i].strip()] = value.strip()
            data.append(entry)
            counter += 1

    return data


if __name__ == '__main__':
    datafile = os.path.join(DATADIR, DATAFILE)
    print(parse_file(datafile))
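The counter-and-break loop can also be expressed with `itertools.islice`, which stops after the first n data lines, and `zip`, which pairs each header name with the value in the same column. The sketch below is an alternative to the exercise's `parse_file`, not the author's code; the name `parse_lines` and its `limit` parameter are hypothetical, and it reads from any iterable of lines (here an `io.StringIO` standing in for the file) so it runs without beatles-diskography.csv.

```python
import io
from itertools import islice


def parse_lines(lines, limit=10):
    """Hypothetical helper: parse up to `limit` data lines of comma-separated text."""
    lines = iter(lines)
    # The first line is the header; strip whitespace and the newline from each name.
    header = [name.strip() for name in next(lines).split(",")]
    data = []
    # islice stops after `limit` data lines, replacing the manual counter.
    for line in islice(lines, limit):
        values = [value.strip() for value in line.split(",")]
        # zip pairs each header name with the value in the same column.
        data.append(dict(zip(header, values)))
    return data


# In-memory stand-in for the data file (hypothetical rows).
f = io.StringIO("Title , Year\nAlpha,1963\nBeta,1964\nGamma,1965\n")
print(parse_lines(f, limit=2))
# [{'Title': 'Alpha', 'Year': '1963'}, {'Title': 'Beta', 'Year': '1964'}]
```

One behavioral difference: if a row has more fields than the header, `zip` silently drops the extras, whereas indexing `header[i]` in the original raises an `IndexError`.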

3. Parsing a CSV file with the csv module

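A field value may itself contain the delimiter (a comma), in which case the field is enclosed in quotes and a plain split(",") breaks it apart. The csv module handles quoting and similar cases automatically.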
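The problem is easy to reproduce. In the sketch below, a hypothetical line contains a quoted field with a comma inside it: `str.split` cuts it into three pieces, while `csv.reader` honours the quotes and returns two fields.

```python
import csv
import io

# A hypothetical record whose first field contains a comma and is therefore quoted.
line = '"Smith, John",42\n'

naive = line.strip().split(",")
print(naive)   # ['"Smith', ' John"', '42']

# csv.reader accepts any iterable of lines; it removes the quotes and keeps the comma.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['Smith, John', '42']
```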

# -*- coding: UTF-8 -*- 
import os
import pprint
import csv

DATADIR = ""
DATAFILE = "beatles-diskography.csv"


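def parse_csv(datafile):
    data = []
    # Python 3: open in text mode with newline="" (binary "rb" mode only worked in Python 2)
    with open(datafile, "r", newline="") as sd:
        r = csv.DictReader(sd)   # creates one dict per row, keyed by the header field names
        for line in r:
            data.append(line)

    return data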

if __name__ == '__main__':
    datafile = os.path.join(DATADIR, DATAFILE)
    d = parse_csv(datafile)
    pprint.pprint(d)
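The csv-module version above reads every row, whereas the exercise in section 2 asked for only the first 10. A minimal sketch combining `csv.DictReader` with `itertools.islice` covers both; the function name `parse_csv_first` and the sample rows are hypothetical, and `io.StringIO` stands in for a file opened with `newline=""`.

```python
import csv
import io
from itertools import islice


def parse_csv_first(f, limit=10):
    """Hypothetical helper: return up to `limit` rows from a CSV file object as dicts."""
    reader = csv.DictReader(f)  # the header row supplies the dict keys
    # dict() normalizes rows, which are OrderedDicts on Python 3.6/3.7
    return [dict(row) for row in islice(reader, limit)]


sample = io.StringIO('Title,Artist\n"Alpha, Deluxe",Band A\nBeta,Band B\n')
rows = parse_csv_first(sample, limit=10)
print(rows)
# [{'Title': 'Alpha, Deluxe', 'Artist': 'Band A'}, {'Title': 'Beta', 'Artist': 'Band B'}]
```

Unlike the manual version, `DictReader` does not strip whitespace around values. Passing `skipinitialspace=True` ignores spaces immediately after a delimiter, but trailing spaces are kept.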

