Python Data Analysis Tech Stack, Chapter 06: Preparing Data with Pandas, 05: Creating DataFrames by importing data from other formats
Pandas can read data from a wide variety of formats using its reader functions (the complete list of supported formats is at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html). The following are some of the most commonly used formats.
The read_csv function reads data from a CSV file into a DataFrame, as shown in the following.
import pandas as pd
titanic = pd.read_csv('titanic.csv')
Reading data from CSV files is one of the most common ways to create a DataFrame. CSV files are plain-text files in which values are separated by commas and each line corresponds to one row of data. If you are working in Jupyter, remember to upload the CSV file using the upload button on the Jupyter home page (Figure 6-1) before calling the read_csv function, so that the file is available at the path you pass in.
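To make the row-per-line idea concrete, here is a minimal sketch that parses a small CSV from an in-memory string instead of a file. The sample rows and column names are made up for illustration and are not taken from the titanic dataset; the sketch relies on read_csv accepting a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a file such as 'titanic.csv'.
csv_text = "name,age,survived\nAlice,29,1\nBob,41,0\n"

# read_csv accepts a path or a file-like object; StringIO avoids needing a file on disk.
sample = pd.read_csv(io.StringIO(csv_text))

# The header line becomes the column labels; each remaining line becomes one row.
print(sample.shape)
print(sample.columns.tolist())
```

The first line of the text supplies the column names, and each of the two remaining lines becomes one row of the DataFrame.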
Pandas supports importing data from both the xls and xlsx file formats using the pd.read_excel function, as shown in the following. Reading Excel files relies on an optional dependency: xlrd for xls files and openpyxl for xlsx files.
titanic_excel = pd.read_excel('titanic.xls')
JSON stands for JavaScript Object Notation. It is a text-based, platform-independent format widely used for exchanging data, for example between a client and a server. Pandas provides the read_json function to read data from a JSON file, as shown in the following.
titanic = pd.read_json('titanic-json.json')
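As a rough illustration of how JSON maps onto a DataFrame, the sketch below parses a small JSON array of records from an in-memory string. The records are invented for this example, and orient="records" is passed explicitly as an assumption about the layout; a real file may be organized differently and need another orient value.

```python
import io

import pandas as pd

# Hypothetical JSON payload: a list of records, one object per row.
json_text = '[{"name": "Alice", "age": 29}, {"name": "Bob", "age": 41}]'

# orient="records" tells pandas that each object in the list is one row.
records = pd.read_json(io.StringIO(json_text), orient="records")

print(records.shape)
print(records.columns.tolist())
```

Each object in the array becomes a row, and the object keys become the column labels.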
We can also import data from a web page using the pd.read_html function. This function parses the tables on the page and returns a list of DataFrame objects, one for each table it finds. In the following example, table[0] is the first table on the given URL.
url = "https://www.w3schools.com/sql/sql_create_table.asp"
table = pd.read_html(url)
table[0]
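The list-of-tables behavior can also be tried without network access. The sketch below feeds read_html a small hand-written HTML snippet (not taken from the page above). It assumes that an HTML parser such as lxml is installed; if none is available, pandas raises ImportError, which the sketch catches so that it still runs.

```python
import io

import pandas as pd

# Hypothetical HTML snippet containing a single table.
html_text = """
<table>
  <thead><tr><th>name</th><th>age</th></tr></thead>
  <tbody>
    <tr><td>Alice</td><td>29</td></tr>
    <tr><td>Bob</td><td>41</td></tr>
  </tbody>
</table>
"""

try:
    # read_html returns a list with one DataFrame per <table> element found.
    tables = pd.read_html(io.StringIO(html_text))
except ImportError:
    # No HTML parser (for example lxml) is installed in this environment.
    tables = []

print(len(tables))
if tables:
    print(tables[0])
```

Even when the page contains only one table, the result is still a list, which is why the first table is accessed by indexing with [0].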
Further reading: See the complete list of formats supported by Pandas and the functions for reading data from them:
https://pandas.pydata.org/pandas-docs/stable/reference/io.html