Python handles data in various formats mainly through two libraries, Pandas and NumPy. We have already covered the important features of these two libraries in the previous chapters. In this chapter, we look at some basic examples of how each library operates on data.
The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. An instance of the ndarray class can be constructed by the different array creation routines described later in the tutorial. The basic ndarray is created using the array function in NumPy as follows −
numpy.array(object, dtype, copy, order, subok, ndmin)
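To illustrate the two properties mentioned above (every item shares one type, and items are reached with a zero-based index), here is a small sketch; the array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Mixed ints and floats are upcast to one common type (float64 here),
# because every item in an ndarray shares the same dtype.
a = np.array([1, 2.5, 3])
print(a.dtype)   # float64

# Items are accessed with a zero-based index: a[0] is the first element.
print(a[0])      # 1.0
print(a[-1])     # 3.0 (negative indices count from the end)
```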
Following are some examples of NumPy data handling.
# more than one dimension
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a)
The output is as follows −
[[1 2]
 [3 4]]
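A short sketch of how the dimensions of that 2D array can be inspected and how a single element is picked out with one zero-based index per axis; it reuses the same array as the example above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
print(a.ndim)    # 2       (number of dimensions)
print(a.shape)   # (2, 2)  (rows, columns)
print(a[1, 0])   # 3       (row 1, column 0, both zero-based)
print(a[0])      # [1 2]   (a whole row)
```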
# minimum dimensions
import numpy as np
a = np.array([1, 2, 3, 4, 5], ndmin=2)
print(a)
The output is as follows −
[[1 2 3 4 5]]
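The effect of ndmin is easiest to see by comparing shapes: NumPy prepends length-1 axes until the requested number of dimensions is reached. A minimal comparison with and without the parameter:

```python
import numpy as np

plain = np.array([1, 2, 3, 4, 5])
padded = np.array([1, 2, 3, 4, 5], ndmin=2)

# ndmin prepends length-1 axes, so the 5 items end up in a single row
print(plain.shape)   # (5,)
print(padded.shape)  # (1, 5)
```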
# dtype parameter
import numpy as np
a = np.array([1, 2, 3], dtype=complex)
print(a)
The output is as follows −
[1.+0.j 2.+0.j 3.+0.j]
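Passing Python's built-in complex as the dtype maps to NumPy's complex128 type. The sketch below checks the resulting dtype and splits the values into their real and imaginary parts.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=complex)
print(a.dtype)  # complex128
print(a.real)   # [1. 2. 3.]
print(a.imag)   # [0. 0. 0.]
```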
Pandas handles data through Series, DataFrame, and Panel. We will see an example of each of these.
Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A pandas Series can be created using the following constructor −
pandas.Series(data, index, dtype, copy)
Here we create a Series from a NumPy array.
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Its output is as follows −
0 a
1 b
2 c
3 d
dtype: object
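The example above relies on the default integer index 0..n-1. The sketch below passes the index and dtype constructor parameters explicitly; the labels 100..103 are arbitrary values chosen for illustration. It uses .loc for label-based lookup and .iloc for position-based lookup.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
# Supply our own labels instead of the default 0..n-1 integer index
s = pd.Series(data, index=[100, 101, 102, 103])
print(s.loc[101])    # b  (label-based lookup)
print(s.iloc[0])     # a  (position-based lookup)

# The dtype parameter forces a type on the values
nums = pd.Series([1, 2, 3], dtype='float64')
print(nums.tolist())  # [1.0, 2.0, 3.0]
```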
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A pandas DataFrame can be created using the following constructor −
pandas.DataFrame(data, index, columns, dtype, copy)
Let us now create a DataFrame with a custom row index from a dictionary of lists.
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
Its output is as follows −
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
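Once the DataFrame exists, values can be reached by row and column label, whole columns can be aggregated, and rows can be filtered with a boolean condition. A brief sketch on the same data (the Age > 30 threshold is an arbitrary example):

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

print(df.loc['rank2', 'Name'])  # Jack  (row label, column label)
print(df['Age'].mean())         # 33.25
print(df[df['Age'] > 30])       # only the rows whose Age exceeds 30
```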
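A Panel is a 3D container of data. The term panel data is derived from econometrics and is partially responsible for the name pandas: pan(el)-da(ta)-s. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the Panel code in this section runs only on older pandas versions; for this kind of 3D data, the pandas developers recommend a DataFrame with a MultiIndex (or the separate xarray package) instead.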
A Panel can be created using the following constructor −
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
In the example below, we create a Panel from a dict of DataFrame objects.
# creating a Panel from a dict of DataFrames (requires pandas older than 0.25)
import pandas as pd
import numpy as np
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)
Its output is as follows −
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2
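Because pd.Panel no longer exists in current pandas, here is one possible way to hold the same two DataFrames in a single DataFrame with a two-level row MultiIndex. This is a sketch built with pd.concat, not the exact layout that the old Panel.to_frame() produced. The seeded generator (seed 0) is an assumption added only to make the run reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
frames = {'Item1': pd.DataFrame(rng.standard_normal((4, 3))),
          'Item2': pd.DataFrame(rng.standard_normal((4, 2)))}

# Stack the DataFrames along the rows; the dict keys become the outer index level.
# Columns are the union 0, 1, 2, so Item2 gets NaN in column 2.
stacked = pd.concat(frames)
print(stacked.shape)          # (8, 3): 2 items x 4 rows, 3 columns
print(stacked.index.nlevels)  # 2
print(stacked.loc['Item1'])   # the original 4 x 3 frame
```

Selecting an outer label with .loc recovers one item, much like indexing a Panel by item did.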
Translated from: https://www.tutorialspoint.com/python_data_science/python_data_operations.htm