How to Parse CSV Files in Python

CSV files are widely used to store tabular data. Data can easily be exported from database tables or Excel files to CSV, and the format is easy for both humans and programs to read. In this tutorial, we will learn how to parse CSV files in Python.

What is Parsing?

Parsing a file means reading the data from the file into a form the program can work with. The file may be a plain text file or a spreadsheet.

What is a CSV file?

CSV stands for Comma-Separated Values, i.e. the values in each record are separated from each other by commas. CSV files are often created by programs that handle large amounts of data. Spreadsheets and databases can easily export their data as CSV, and other programs can import it.

Let's see how to parse a CSV file. Parsing CSV files in Python is quite easy. Python has a built-in csv module that provides functionality for both reading data from and writing data to CSV files. The module supports a variety of CSV formats (called dialects), which makes it easier to process files produced by different programs.
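As a rough illustration of the formats mentioned above, the sketch below lists the dialects registered in the csv module and reads two small in-memory samples that do not use commas. The sample text and names are made up for this example and are not part of the article's data.

```python
import csv
import io

# Built-in dialect names registered by the csv module
print(csv.list_dialects())

# Hypothetical sample data, tab-separated instead of comma-separated
tab_text = "name\tbranch\tyear\tcgpa\nNikhil\tCOE\t2\t9.0\n"

# The 'excel-tab' dialect tells the reader to split on tabs
tab_rows = list(csv.reader(io.StringIO(tab_text), dialect='excel-tab'))
print(tab_rows)

# A custom delimiter can also be passed directly
semi_text = "name;branch\nSanchit;COE\n"
semi_rows = list(csv.reader(io.StringIO(semi_text), delimiter=';'))
print(semi_rows)
```

Using io.StringIO here only avoids creating files on disk; the same arguments work with a file opened through open().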

Parsing a CSV file in Python

The following code reads a CSV file using Python's built-in csv module.

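import csv

# newline='' lets the csv module handle line endings itself
with open('university_records.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)

    # each row is returned as a list of strings
    for row in reader:
        print(row)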

Output:

[Image: Python Parse CSV File]
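When the first row of the file holds column names, csv.DictReader can be more convenient than csv.reader because each row can be accessed by name. The sketch below is a minimal example on in-memory text; the column names and values are assumptions, since the real contents of university_records.csv are not shown here.

```python
import csv
import io

# Hypothetical file contents; the real university_records.csv may differ
sample = "name,branch,year,cgpa\nNikhil,COE,2,9.0\nSanchit,COE,2,9.1\n"

# DictReader uses the first row as keys for every following row
records = list(csv.DictReader(io.StringIO(sample)))

for record in records:
    print(record['name'], record['cgpa'])
```

All values are still strings, so numeric fields such as cgpa need an explicit float() conversion before any arithmetic.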

Writing a CSV file in Python

To write to a file, we have to open it in write mode ('w') or append mode ('a'). Here, we will append data to the existing CSV file.

import csv

# rows to append; each list is one record
row = ['David', 'MCE', '3', '7.8']
row1 = ['Lisa', 'PIE', '3', '9.1']
row2 = ['Raymond', 'ECE', '2', '8.5']

# 'a' appends to the file; newline='' avoids extra blank lines on Windows
with open('university_records.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)

    writer.writerow(row)
    writer.writerow(row1)
    writer.writerow(row2)

[Image: Python Append To CSV File]
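To make the difference between write mode and append mode concrete, here is a small sketch that creates a file with a header in 'w' mode and then adds rows in 'a' mode using writerows. It writes to a temporary file (a hypothetical records_demo.csv) so that no real data is touched.

```python
import csv
import os
import tempfile

# A temporary path is used so the sketch does not touch real data
path = os.path.join(tempfile.mkdtemp(), 'records_demo.csv')

# 'w' creates (or overwrites) the file; write the header once
with open(path, 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['name', 'branch', 'year', 'cgpa'])

# 'a' keeps existing content; writerows adds several rows in one call
new_rows = [['David', 'MCE', '3', '7.8'], ['Lisa', 'PIE', '3', '9.1']]
with open(path, 'a', newline='') as csv_file:
    csv.writer(csv_file).writerows(new_rows)

# Read the file back to confirm what was written
with open(path, newline='') as csv_file:
    written = list(csv.reader(csv_file))
print(written)
```

Opening the same path in 'w' mode a second time would discard the rows already written, which is why appending is the safer choice for an existing file.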

Parse CSV Files using Pandas library

There is one more way to work with CSV files, and it is a very popular one in data analysis work: using the pandas library.

Pandas is a Python data analysis library. It offers data structures, tools, and operations for working with and manipulating data, which is mostly arranged in one-dimensional or two-dimensional tables.

Uses and Features of pandas Library

  • Pivoting and reshaping of data sets.
  • Data manipulation with indexing using DataFrame objects.
  • Data filtering.
  • Merge and join operations on data sets.
  • Slicing, indexing, and subsetting of large datasets.
  • Missing data handling and data alignment.
  • Row/column insertion and deletion.
  • One-Dimensional different file formats.
  • Reading and writing tools for data in various file formats.
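A few of the features listed above can be sketched in a handful of lines. The example assumes pandas is already installed, and the two small tables are invented for illustration only.

```python
import pandas as pd

# Hypothetical data for illustration only
students = pd.DataFrame({
    'name': ['David', 'Lisa', 'Raymond'],
    'branch': ['MCE', 'PIE', 'ECE'],
    'cgpa': [7.8, 9.1, 8.5],
})
branches = pd.DataFrame({
    'branch': ['MCE', 'PIE', 'ECE'],
    'department': ['Mathematics', 'Production', 'Electronics'],
})

# Filtering: keep rows whose cgpa is above 8
high = students[students['cgpa'] > 8]

# Merge/join: attach the department name to each student
merged = students.merge(branches, on='branch')

# Column insertion and deletion
merged['passed'] = merged['cgpa'] >= 8.0
trimmed = merged.drop(columns=['branch'])

print(high)
print(trimmed)
```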

To work with CSV files using pandas, you first need to install it. Installing pandas is quite simple; the command below installs it using pip.

$ pip install pandas

[Image: Python Install Pandas]

[Image: Python Install Pandas Cmd]

Once the installation is complete, you are good to go.

Reading a CSV file using Pandas Module

Before you can use pandas to import your CSV data, you need to know where the data file is located in your filesystem and what your current working directory is.

I suggest keeping your code and the data file in the same directory or folder, so that you can refer to the file by name alone instead of typing out a full path.
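If you are unsure which directory a relative file name will be resolved against, a quick check with pathlib can help. This is only a sketch; it assumes the data file is called ign.csv, as in the example that follows.

```python
from pathlib import Path

# Current working directory: relative paths are resolved against it
cwd = Path.cwd()
print(cwd)

# A bare file name such as 'ign.csv' is looked up inside the cwd
data_path = cwd / 'ign.csv'
print(data_path)

# Checking first gives a clearer message than a failed read
if data_path.exists():
    print('found', data_path.name)
else:
    print('no such file in', cwd)
```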

import pandas

# read_csv loads the whole file into a DataFrame
result = pandas.read_csv('ign.csv')

print(result)

Output

[Image: Read CSV File using pandas module]
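Once the data is loaded, a few DataFrame attributes give a quick overview of it. The sketch below reads from in-memory text with made-up columns, because the actual contents of ign.csv are not shown in this article.

```python
import io
import pandas as pd

# Hypothetical contents; the real ign.csv is not shown in this article
sample = io.StringIO(
    "title,platform,score\n"
    "Game A,PC,9.0\n"
    "Game B,iPad,8.5\n"
    "Game C,PC,6.0\n"
)

# usecols limits which columns are loaded
df = pd.read_csv(sample, usecols=['title', 'score'])

print(df.shape)    # (rows, columns)
print(df.head(2))  # first two rows
print(df.dtypes)   # inferred column types
```

Unlike the csv module, read_csv infers column types, so the score column is loaded as numbers rather than strings.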

Writing a CSV file using Pandas Module

Writing CSV files using pandas is as simple as reading them. The only new term used is DataFrame.

A pandas DataFrame is a two-dimensional, heterogeneous tabular data structure (data is arranged in a tabular fashion in rows and columns).

A pandas DataFrame consists of three main components: the data, the rows, and the columns. Both axes (rows and columns) are labeled.

from pandas import DataFrame

# each key becomes a column; each list holds that column's values
C = {'Programming language': ['Python', 'Java', 'C++'],
     'Designed by': ['Guido van Rossum', 'James Gosling', 'Bjarne Stroustrup'],
     'Appeared': ['1991', '1995', '1985'],
     'Extension': ['.py', '.java', '.cpp'],
     }

df = DataFrame(C, columns=['Programming language', 'Designed by', 'Appeared', 'Extension'])

# index=False leaves out the row labels; header=True writes the column names
df.to_csv('program_lang.csv', index=False, header=True)

Output

[Image: Python Pandas Write CSV File]
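To inspect what to_csv produces without creating a file, the path can be left out, in which case the CSV text is returned as a string. The following sketch uses a shortened version of the table above and reads the text back in as a simple round-trip check.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Programming language': ['Python', 'Java'],
    'Extension': ['.py', '.java'],
})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back should reproduce the same columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```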

Conclusion

We learned to parse a CSV file using the built-in csv module and the pandas module. There are many other ways to parse files, but they are not as widely used for CSV data.

Libraries such as PlyPlus, PLY, and ANTLR are used for parsing text data in general. Now you know how to use the built-in csv module and the powerful pandas module for reading and writing data in CSV format. The code shown above is basic and straightforward, and should be understandable to anyone familiar with Python, so it needs little further explanation.

However, manipulating complex data with empty or ambiguous entries is not easy. It requires practice and knowledge of the various tools in pandas. CSV is a simple and widely supported way of saving and sharing data, and pandas is an excellent alternative to the csv module. You may find it difficult in the beginning, but it isn't so hard to learn. With a little bit of practice, you will master it.
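As a starting point for handling empty or ambiguous entries, the sketch below shows one possible approach with pandas. The data and the '?' marker for unknown values are assumptions made up for this example.

```python
import io
import pandas as pd

# Hypothetical records with an empty cgpa field and a '?' marker
sample = io.StringIO(
    "name,branch,cgpa\n"
    "David,MCE,7.8\n"
    "Lisa,PIE,\n"
    "Raymond,?,8.5\n"
)

# Empty fields become NaN by default; na_values adds extra markers
df = pd.read_csv(sample, na_values=['?'])

# Count missing entries per column
missing = df.isna().sum()
print(missing)

# Two common options: drop incomplete rows, or fill in a default
complete = df.dropna()
filled = df.fillna({'branch': 'unknown', 'cgpa': df['cgpa'].mean()})
print(complete)
print(filled)
```

Whether to drop or fill missing values depends on the data; filling with the column mean is only one of several reasonable choices.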

Translated from: https://www.journaldev.com/30140/parse-csv-files-in-python
