Python study notes 10 (CSV files)

Created by westfallon on 8/20

The files used in this article are in the exercise_data folder.

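Reading a plain text file in Python

The traditional way (four steps)

Specify the path
Open the file with the open function
Process the file
Close the resource

Example (open input.txt and print its contents)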

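input_path = "input.txt"  # specify the path
file_reader = open(input_path, 'r', encoding='utf-8')  # open the file (assumes input.txt is UTF-8)
for row in file_reader:  # process the file line by line
    print(row, end='')  # each line already ends with '\n', so don't print another
file_reader.close()  # close the resource

# Output: 代亚群是世界上最好看的人！ ("Dai Yaqun is the most beautiful person in the world!")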

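The with statement (three steps)

Specify the path
Open the file with with
Process the file

Advantages: the file is closed automatically, even if an error occurs, and the code is shorter

Example (same task as above)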

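input_path = "input.txt"
with open(input_path, 'r', encoding='utf-8') as file_reader:  # open the file with with
    for row in file_reader:  # process the file
        print(row, end='')  # print each line as-is (the original split(',') printed lists instead)

# Output: 代亚群是世界上最好看的人！ (same as above)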
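A minimal sketch to see the automatic closing for yourself. It writes a hypothetical two-line file into a temporary directory (tempfile is used only so the sketch does not touch real files), reads it back inside a with block, and checks the file object's closed attribute before and after the block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:  # throwaway folder, deleted afterwards
    input_path = os.path.join(tmp_dir, "input.txt")
    with open(input_path, 'w', encoding='utf-8') as f:
        f.write("line 1\nline 2\n")  # hypothetical contents

    lines = []
    with open(input_path, 'r', encoding='utf-8') as file_reader:
        print(file_reader.closed)  # False: still open inside the block
        for row in file_reader:
            lines.append(row.rstrip('\n'))  # drop each line's trailing newline

    print(file_reader.closed)  # True: closed as soon as the block ended
    print(lines)               # ['line 1', 'line 2']
```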

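Writing to a plain text file in Python (learn it alongside reading)

The traditional way: four steps, the same as for reading

Specify the path
Open the file with the open function
Write to the file
Close the resource

Example (given the list values, write it to output.txt)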

values = ['a', 'b', 'c', 'd', 'e']
output_path = "output.txt"  # specify the path
file_writer = open(output_path, 'w')  # open the file with open()
file_writer.write(str(values))  # write to the file; str() turns the list into its printed form
file_writer.close()  # close the resource

# output.txt now contains: ['a', 'b', 'c', 'd', 'e']

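The with statement: three steps

Specify the path
Open the file with with
Write the content

Example (same task, but writing the values one at a time)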

values = ['a', 'b', 'c', 'd', 'e']
output_path = "output.txt"
with open(output_path, 'w') as file_writer:
    for value in values:
        file_writer.write(value + ',')  # each value followed by a comma

# output.txt now contains: a,b,c,d,e,  (note the trailing comma)
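If the trailing comma is unwanted, str.join puts the separator only between items. A small sketch, writing to a temporary path rather than the real output.txt:

```python
import os
import tempfile

values = ['a', 'b', 'c', 'd', 'e']

with tempfile.TemporaryDirectory() as tmp_dir:  # throwaway folder instead of the real output.txt
    output_path = os.path.join(tmp_dir, "output.txt")
    with open(output_path, 'w') as file_writer:
        file_writer.write(','.join(values))  # separator only between items
    with open(output_path, 'r') as file_reader:
        content = file_reader.read()

print(content)  # a,b,c,d,e
```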

Reading a CSV file with pandas (important)

import pandas as pd

input_path = "input.csv"
data_frame = pd.read_csv(input_path)  # by default the first line of the file becomes the column names
print(data_frame)
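pd.read_csv accepts a file-like object as well as a path, so it can be tried without a file on disk. The sketch below feeds it a few hypothetical rows shaped like supplier_data.csv (the values are made up for illustration) through io.StringIO:

```python
import io
import pandas as pd

# Hypothetical rows shaped like supplier_data.csv
csv_text = """Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00,1/20/14
Supplier Y,50-9501,7009,$250.00,1/30/14
Supplier Z,920-4803,3321,$615.00,2/3/14
"""

data_frame = pd.read_csv(io.StringIO(csv_text))
print(data_frame.shape)          # (3, 5): 3 data rows, 5 columns
print(list(data_frame.columns))  # the header line became the column labels
```

Note that Cost is still text like "$500.00" after reading; it has to be cleaned before numeric comparisons.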

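Three ways to filter specific rows from an input file

1. The value in a row satisfies a condition

Traditional method (you should be able to follow what it does)

After reading the file, use a for loop to check the rows one by one; if a row satisfies the condition, write it to the output file

Example (read supplier_data.csv and write the rows where supplier == "Supplier Z" or cost > 600.0 to output.csv)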

# You don't need to be able to write this code, but you should be able to read it
import csv

input_path = "supplier_data.csv"
output_path = "output.csv"
with open(input_path, 'r', newline='') as input_file:
    with open(output_path, 'w', newline='') as output_file:  # the csv module expects newline=''
        file_reader = csv.reader(input_file)
        file_writer = csv.writer(output_file)
        header = next(file_reader)  # the first row is the header
        file_writer.writerow(header)
        for row_list in file_reader:
            supplier = str(row_list[0]).strip()  # column 0: Supplier Name
            cost = str(row_list[3]).lstrip('$')  # column 3: Cost, e.g. "$500.00" -> "500.00"
            if supplier == "Supplier Z" or float(cost) > 600.0:
                file_writer.writerow(row_list)

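pandas method (you should be able to use it)

Read the file into a DataFrame and work on it directly with loc.
loc means "selection by label". It takes two arguments in square brackets: the first selects rows (index labels) and the second selects columns; : means "all". The row argument can also be a boolean Series, in which case loc keeps the rows where it is True; this is how the filters below work.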
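A minimal sketch of the two kinds of loc arguments, on a small hypothetical DataFrame: a boolean Series as the row selector, and a list of column labels as the column selector.

```python
import pandas as pd

# Hypothetical data; the default index labels are 0, 1, 2
df = pd.DataFrame({'name': ['x', 'y', 'z'], 'cost': [100.0, 700.0, 650.0]})

mask = df['cost'] > 600.0           # boolean Series: False, True, True
expensive = df.loc[mask, :]         # rows where mask is True, all columns
names_only = df.loc[:, ['name']]    # all rows, only the 'name' column

print(expensive.index.tolist())     # [1, 2]
print(names_only.columns.tolist())  # ['name']
```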

Example (same task as above)

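import pandas as pd

input_path = "supplier_data.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path)
data_frame['Cost'] = data_frame['Cost'].str.strip('$').astype(float)  # "$500.00" -> 500.0
# Same condition as the traditional method: an exact supplier name match, or cost > 600
output_data_frame = data_frame.loc[(data_frame['Supplier Name'].str.strip() == 'Supplier Z') |
                                   (data_frame['Cost'] > 600.0), :]
output_data_frame.to_csv(output_path, index=False)  # index=False: don't write the row labels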
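When conditions are combined with | (or), & (and), or ~ (not), each comparison needs its own parentheses, because these operators bind more tightly than == and > in Python. A self-contained sketch of this filter on hypothetical inline rows (not the real supplier_data.csv):

```python
import io
import pandas as pd

# Hypothetical rows shaped like supplier_data.csv (values made up)
csv_text = """Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00,1/20/14
Supplier Y,50-9501,7009,$250.00,1/30/14
Supplier Z,920-4803,3321,$615.00,2/3/14
Supplier Y,50-9505,6650,$725.00,2/3/14
Supplier Z,920-4804,3321,$515.00,1/20/14
"""
data_frame = pd.read_csv(io.StringIO(csv_text))
data_frame['Cost'] = data_frame['Cost'].str.strip('$').astype(float)

# Parentheses are required: without them, | would be applied before == and >
output_data_frame = data_frame.loc[(data_frame['Supplier Name'] == 'Supplier Z') |
                                   (data_frame['Cost'] > 600.0), :]
print(output_data_frame['Invoice Number'].tolist())
# ['920-4803', '50-9505', '920-4804']
```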

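2. The value in a row belongs to a set

Traditional method (you should be able to follow it)

After reading the data, use a for loop to check the rows one by one, using an if ... in ... test to see whether a value is in the set

Example (select the rows of supplier_data.csv whose date is one of ['1/20/14', '1/30/14'])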

import csv

important_date = ['1/20/14', '1/30/14']
input_path = "supplier_data.csv"
output_path = "output.csv"
with open(input_path, 'r', newline='') as input_file:
    with open(output_path, 'w', newline='') as output_file:
        file_reader = csv.reader(input_file)
        file_writer = csv.writer(output_file)
        header = next(file_reader)
        file_writer.writerow(header)
        for row_list in file_reader:
            if row_list[4] in important_date:  # column 4: Purchase Date
                file_writer.writerow(row_list)

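pandas method (you should be able to use it)

Again use a DataFrame and loc, with the isin method to test whether each value is in the set

Example (same task as above)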

# The point of this example is the isin method
import pandas as pd

important_date = ['1/20/14', '1/30/14']
input_path = "supplier_data.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path)
output_data_frame = data_frame.loc[data_frame['Purchase Date'].isin(important_date), :]
output_data_frame.to_csv(output_path, index=False)
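isin compares values exactly, so the strings in the list must match the column's stored values (here plain strings, since read_csv does not parse dates unless asked to). A small sketch with hypothetical dates, also using ~ to keep the rows that are not in the set:

```python
import pandas as pd

# Hypothetical purchase dates, stored as plain strings
df = pd.DataFrame({'Purchase Date': ['1/20/14', '2/3/14', '1/30/14', '2/17/14']})
important_date = ['1/20/14', '1/30/14']

in_set = df['Purchase Date'].isin(important_date)  # True where the value is in the list
print(df.loc[in_set, :].index.tolist())   # [0, 2]
print(df.loc[~in_set, :].index.tolist())  # [1, 3]
```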

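3. The value in a row matches a pattern

The traditional method is not required; here is the pandas method

Again use a DataFrame and loc, with str.startswith to test whether each string begins with a given prefix

Example (select the rows of supplier_data.csv whose Invoice Number starts with 001-)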

# The point of this example is str.startswith
import pandas as pd

input_path = "supplier_data.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path)
output_data_frame = data_frame.loc[data_frame['Invoice Number'].str.startswith("001-"), :]
output_data_frame.to_csv(output_path, index=False)
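For patterns more general than a fixed prefix, pandas also offers str.match, which tests a regular expression against the start of each string. A sketch on hypothetical invoice numbers, where the regex below happens to select the same rows as startswith('001-'):

```python
import pandas as pd

# Hypothetical invoice numbers
df = pd.DataFrame({'Invoice Number': ['001-1001', '50-9501', '001-1002', '920-4803']})

starts = df['Invoice Number'].str.startswith('001-')   # plain prefix test
matches = df['Invoice Number'].str.match(r'001-\d+$')  # regex, anchored at the start of each string

print(df.loc[starts, :].index.tolist())     # [0, 2]
print(starts.tolist() == matches.tolist())  # True on this data
```

If the column can contain missing values, str.startswith returns NaN for them and loc then refuses the mask; passing na=False treats missing values as non-matches.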

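Selecting specific columns

Traditional method (select by column position)

After reading the data, use a for loop to go through the rows one by one and copy the values in the chosen columns to the output

Example (select columns 0 and 3 of supplier_data.csv)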

import csv

my_columns = [0, 3]  # column positions to keep
input_path = "supplier_data.csv"
output_path = "output.csv"
with open(input_path, 'r', newline='') as input_file:
    with open(output_path, 'w', newline='') as output_file:
        file_reader = csv.reader(input_file)
        file_writer = csv.writer(output_file)
        for row_list in file_reader:  # the header row is handled like any other row
            row_list_output = []
            for column in my_columns:
                row_list_output.append(row_list[column])
            file_writer.writerow(row_list_output)
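The loop above selects columns by position. To select by name without pandas, the standard library's csv.DictReader and csv.DictWriter can do it; a sketch using hypothetical in-memory data in place of supplier_data.csv:

```python
import csv
import io

# Hypothetical in-memory input shaped like supplier_data.csv
input_file = io.StringIO(
    "Supplier Name,Invoice Number,Part Number,Cost,Purchase Date\n"
    "Supplier X,001-1001,2341,$500.00,1/20/14\n"
)
output_file = io.StringIO()

my_columns = ['Supplier Name', 'Cost']
file_reader = csv.DictReader(input_file)  # each row becomes a dict keyed by the header
file_writer = csv.DictWriter(output_file, fieldnames=my_columns, extrasaction='ignore')
file_writer.writeheader()
for row_dict in file_reader:
    file_writer.writerow(row_dict)  # keys not listed in fieldnames are ignored

print(output_file.getvalue())
```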

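pandas method (select by column position)

Read the data into a DataFrame and select with iloc.
iloc means "selection by position": it selects by row and column number (row n, column n) and accepts only integers, integer lists, or integer slices.
Difference between iloc and loc: iloc takes integer positions, while loc takes labels (or boolean masks).
Note: in iloc, the slice x : y is half-open, including position x but excluding position y. In loc, a label slice x : y includes both endpoints.

Example (same task as above)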

import pandas as pd

my_columns = [0, 3]
input_path = "supplier_data.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path)
output_data_frame = data_frame.iloc[:, my_columns]  # all rows, the columns at positions 0 and 3
output_data_frame.to_csv(output_path, index=False)
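A quick sketch for checking how slice endpoints behave in iloc and loc, using a hypothetical frame whose index labels are 0 to 4:

```python
import pandas as pd

# Hypothetical frame; index labels 0..4 coincide with positions 0..4
df = pd.DataFrame({'v': [10, 20, 30, 40, 50]})

by_position = df.iloc[1:3]  # positions 1 and 2: the stop position 3 is excluded
by_label = df.loc[1:3]      # labels 1, 2 and 3: the stop label is included

print(by_position['v'].tolist())  # [20, 30]
print(by_label['v'].tolist())     # [20, 30, 40]
```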

pandas method (select by column name) (important, must master)

Read the data into a DataFrame and select with loc

Example (select the Supplier Name and Cost columns of supplier_data.csv)

import pandas as pd

my_columns = ['Supplier Name', 'Cost']
input_path = "supplier_data.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path)
output_data_frame = data_frame.loc[:, my_columns]  # all rows, the named columns
output_data_frame.to_csv(output_path, index=False)

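Deleting rows or columns

The traditional method is not required; here is the pandas method

Read the data into a DataFrame and use the drop method to remove specific rows or columns.
drop takes a label or a list of labels. By default (axis=0) they are looked up in the row index, so with the default integer index the numbers are row numbers. To delete columns, pass the column names as columns=[...] (or use axis=1).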
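A sketch of drop on a small hypothetical DataFrame, deleting rows by index label and a column by name (the two column spellings are equivalent):

```python
import pandas as pd

# Hypothetical data with the default index labels 0, 1, 2
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

no_rows = df.drop([0, 2])           # default axis=0: drop rows labelled 0 and 2
no_cols = df.drop(columns=['b'])    # drop the column labelled 'b'
same_cols = df.drop(['b'], axis=1)  # the same thing, spelled with axis=1

print(no_rows.index.tolist())     # [1]
print(no_cols.columns.tolist())   # ['a', 'c']
print(no_cols.equals(same_cols))  # True
```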

Example (delete rows 0, 1, 2, 16, 17 and 18 of supplier_data_unnecessary_header_footer.csv)

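import pandas as pd

input_path = "supplier_data_unnecessary_header_footer.csv"
output_path = "output.csv"
data_frame = pd.read_csv(input_path, header=None)  # read every line as data
data_frame = data_frame.drop([0, 1, 2, 16, 17, 18])  # drop the extra lines at the top and bottom
data_frame.columns = data_frame.iloc[0]  # use the first remaining row (the real header) as column names
data_frame = data_frame.drop(3)  # that header row (label 3) is still a data row, so drop it too
data_frame.to_csv(output_path, index=False)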
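An alternative worth knowing, sketched on hypothetical inline data: read_csv can skip unwanted lines while reading, via skiprows (lines at the top) and skipfooter (lines at the bottom). skipfooter is only supported by the Python parsing engine, hence engine='python'. The counts 3 and 3 fit this made-up sample; for a real file they depend on how many extra lines it has.

```python
import io
import pandas as pd

# Hypothetical file: 3 junk lines, the header, 2 data rows, 3 junk lines
csv_text = """I don't want this line
I don't want this line either
Nor this one
Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
Supplier X,001-1001,2341,$500.00,1/20/14
Supplier Y,50-9501,7009,$250.00,1/30/14
footer line 1
footer line 2
footer line 3
"""

# skipfooter needs the Python engine; skiprows=3 makes the 4th line the header
data_frame = pd.read_csv(io.StringIO(csv_text), skiprows=3, skipfooter=3, engine='python')
print(data_frame.shape)  # (2, 5)
```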

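Adding column names with pandas

Use case: some data has no header row, so the column names have to be added by hand
Method: specify the column names directly when reading the file with pd.read_csv

Example (give supplier_data_no_header_row.csv column names)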

import pandas as pd

input_path = "supplier_data_no_header_row.csv"
output_path = "output.csv"
my_columns = ['Supplier Name', 'Invoice Number', 'Part Number', 'Cost', 'Purchase Date']
data_frame = pd.read_csv(input_path, header=None, names=my_columns)  # header=None: the first line is data
data_frame.to_csv(output_path, index=False)
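A sketch contrasting the two calls on hypothetical header-less rows. Without names, pandas numbers the columns 0, 1, 2, ...; with names, the given labels are used and the first line is still kept as data. According to the read_csv documentation, passing names already behaves like header=None, so the explicit header=None mainly makes the intent clear.

```python
import io
import pandas as pd

# Hypothetical data rows with no header line
csv_text = """Supplier X,001-1001,2341,$500.00,1/20/14
Supplier Y,50-9501,7009,$250.00,1/30/14
"""
my_columns = ['Supplier Name', 'Invoice Number', 'Part Number', 'Cost', 'Purchase Date']

with_names = pd.read_csv(io.StringIO(csv_text), header=None, names=my_columns)
without_names = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_names.columns.tolist())     # the names given above
print(without_names.columns.tolist())  # [0, 1, 2, 3, 4]
print(len(with_names))                 # 2: the first line was kept as data
```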
