Pandas Basics: 50 Practice Exercises
1. Importing the module
import pandas as pd
2. Checking the pandas version
print(pd.__version__)
1.0.1
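Note
pandas data structures: the two core structures are Series (one-dimensional) and DataFrame (two-dimensional). Older releases also offered Panel (three-dimensional), Panel4D (four-dimensional) and PanelND (higher-dimensional), but these have since been removed from the library (Panel as of pandas 0.25), so they are not available in the version 1.0.1 shown above.
A Series is a one-dimensional labeled array that can hold any data type, including integers, strings, floats, and Python objects. Its elements can be located by label.
A DataFrame is a two-dimensional labeled data structure whose data can likewise be located by label, something plain NumPy arrays do not offer.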
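To make label-based lookup concrete, here is a minimal sketch that reads values from a small Series and DataFrame by label. The names `scores` and `table` and their contents are made up for illustration only.

```python
import pandas as pd

# Hypothetical data, used only to illustrate label-based access.
scores = pd.Series([90, 85, 70], index=['math', 'physics', 'art'])
table = pd.DataFrame({'age': [2.5, 3.0], 'visits': [1, 3]}, index=['a', 'b'])

# A Series value is located by its index label.
print(scores['physics'])

# A DataFrame cell is located by a row label and a column label.
print(table.loc['b', 'visits'])
```

A plain NumPy array would need the integer positions instead of the names.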
3. Creating a Series from a list
ListSeries = [1,2,3,4,5,5.6]
LS3 = pd.Series(ListSeries)
LS3
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
4. Creating a Series from an ndarray
import numpy as np
numpy_4 = np.random.randn(5)
numpy_4
index = ['小明','小强','小刚','小王','小斯']
LS4 = pd.Series(numpy_4,index = index)
LS4
小明 2.269755
小强 -1.454366
小刚 0.045759
小王 -0.187184
小斯 1.532779
dtype: float64
5. Creating a Series from a dict
dict5 = {'AA':1,'BB':2,'CC':3,'DD':4,'EE':5}
LS5 = pd.Series(dict5)
LS5
AA 1
BB 2
CC 3
DD 4
EE 5
dtype: int64
6. Changing the index of a Series
Tip: the following exercises cover basic Series operations.
print(LS5)
LS5.index = ['A','B','C','D','E']
LS5
AA 1
BB 2
CC 3
DD 4
EE 5
dtype: int64
A 1
B 2
C 3
D 4
E 5
dtype: int64
7. Concatenating Series vertically
print(LS5)
print(LS3)
# pd.concat works across pandas versions; Series.append was removed in pandas 2.0
LS7 = pd.concat([LS5, LS3])
LS7
LS7_1 = pd.concat([LS3, LS5])
LS7_1
A 1
B 2
C 3
D 4
E 5
dtype: int64
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
A 1.0
B 2.0
C 3.0
D 4.0
E 5.0
dtype: float64
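The concatenated result above keeps the labels of both inputs, which is why letters and integers end up in one index. As a minimal sketch (the names `letters` and `numbers` are illustrative, not from the exercises), `ignore_index=True` discards the old labels and renumbers from 0:

```python
import pandas as pd

letters = pd.Series([1, 2], index=['A', 'B'])
numbers = pd.Series([1.0, 5.6])

# Keep the original labels of both inputs.
kept = pd.concat([letters, numbers])
print(kept.index.tolist())

# Discard the original labels and renumber from 0.
renumbered = pd.concat([letters, numbers], ignore_index=True)
print(renumbered.index.tolist())
```

Mixing an integer Series with a float Series yields a float64 result, as in the exercise output.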
8. Deleting a Series element by index label
print(LS7)
LS8 = LS7.drop('E')
LS8
A 1.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
A 1.0
B 2.0
C 3.0
D 4.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
9. Modifying the Series element at a given index label
print(LS7)
LS7['A'] = 100
LS7
A 1.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
A 100.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
10. Looking up a Series element by index label
LS7['B']
2.0
11. Slicing a Series
print(LS7)
LS7[:3]
A 100.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
A 100.0
B 2.0
C 3.0
dtype: float64
LS7[:-3]
A 100.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
dtype: float64
LS7[1:4]
B 2.0
C 3.0
D 4.0
dtype: float64
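12. Series addition
Tip: arithmetic operations
It helps to first distinguish NaN, None, and null:
NaN: "Not a Number", a floating-point value that marks an undefined or missing number. Note that np.nan == np.nan evaluates to False, because NaN is not equal to anything, itself included.
None: Python's empty object, often used as a placeholder; it is a value in its own right.
Null: "empty", meaning the absence of a value rather than a value. Python has no null keyword; the term comes from other languages and databases, and pandas represents such missing entries as NaN or None.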
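The distinction above can be checked directly. This is a small sketch rather than part of the original exercises; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)

# None is a single object, so an identity check succeeds.
print(None is None)

# pd.isna recognizes both NaN and None as missing.
print(pd.isna(np.nan), pd.isna(None))

# In a numeric Series, None is stored as NaN and the dtype becomes float64.
s = pd.Series([1, None, 3])
print(s.isna().tolist())
```

Because equality does not work for NaN, missing values should be tested with `pd.isna` (or `Series.isna`) rather than `==`.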
print(LS7)
LS7.add(LS7)
A 100.0
B 2.0
C 3.0
D 4.0
E 5.0
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 5.6
dtype: float64
A 200.0
B 4.0
C 6.0
D 8.0
E 10.0
0 2.0
1 4.0
2 6.0
3 8.0
4 10.0
5 11.2
dtype: float64
13. Series subtraction
LS7.sub(LS7)
A 0.0
B 0.0
C 0.0
D 0.0
E 0.0
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
dtype: float64
14. Series multiplication
LS7.mul(LS7)
A 10000.00
B 4.00
C 9.00
D 16.00
E 25.00
0 1.00
1 4.00
2 9.00
3 16.00
4 25.00
5 31.36
dtype: float64
15. Series division
LS7.div(LS7)
A 1.0
B 1.0
C 1.0
D 1.0
E 1.0
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
dtype: float64
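The four operations above combine a Series with itself, so every label has a partner. As a hedged sketch of what happens otherwise (the Series `left` and `right` are invented for this example), pandas aligns on labels first and produces NaN where a label is missing on one side; `fill_value` supplies a substitute:

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['A', 'B', 'C'])
right = pd.Series([10.0, 20.0], index=['B', 'C'])

# Labels are matched first; 'A' has no partner, so the result there is NaN.
plain = left.add(right)
print(plain)

# fill_value substitutes 0 for the missing partner before adding.
filled = left.add(right, fill_value=0)
print(filled)
```

The same `fill_value` parameter is accepted by `sub`, `mul`, and `div`.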
16. Series median
LS7.median()
4.0
17. Series sum
LS7.sum()
134.6
18. Series maximum
LS7.max()
100.0
19. Series minimum
LS7.min()
1.0
20. Creating a DataFrame from a NumPy array [DataFrame operations]
np.random.seed(0)
date_20 = pd.date_range('today',periods = 6)
datas = np.random.randn(6,4)
columns = ['A','B','C','D']
df_20 = pd.DataFrame(datas,index = date_20,columns = columns)
df_20
                                   A         B         C         D
2021-01-21 23:41:06.572078  1.764052  0.400157  0.978738  2.240893
2021-01-22 23:41:06.572078  1.867558 -0.977278  0.950088 -0.151357
2021-01-23 23:41:06.572078 -0.103219  0.410599  0.144044  1.454274
2021-01-24 23:41:06.572078  0.761038  0.121675  0.443863  0.333674
2021-01-25 23:41:06.572078  1.494079 -0.205158  0.313068 -0.854096
2021-01-26 23:41:06.572078 -2.552990  0.653619  0.864436 -0.742165
21. Creating a DataFrame from a dict of lists
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df21 = pd.DataFrame(data, index=labels)
df21
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
22. Checking the data types of a DataFrame
print(df21.dtypes)
print(type(df21))
animal object
age float64
visits int64
priority object
dtype: object
<class 'pandas.core.frame.DataFrame'>
23. Previewing the first 5 rows of a DataFrame
df21.head(5)
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
24. Viewing the last 3 rows of a DataFrame
df21.tail(3)
  animal  age  visits priority
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
25. Viewing the index of a DataFrame
df21.index
Index(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], dtype='object')
26. Viewing the column names of a DataFrame
df21.columns
Index(['animal', 'age', 'visits', 'priority'], dtype='object')
27. Viewing the values of a DataFrame
df21.values
array([['cat', 2.5, 1, 'yes'],
       ['cat', 3.0, 3, 'yes'],
       ['snake', 0.5, 2, 'no'],
       ['dog', nan, 3, 'yes'],
       ['dog', 5.0, 2, 'no'],
       ['cat', 2.0, 3, 'no'],
       ['snake', 4.5, 1, 'no'],
       ['cat', nan, 1, 'yes'],
       ['dog', 7.0, 2, 'no'],
       ['dog', 3.0, 1, 'no']], dtype=object)
28. Viewing summary statistics of a DataFrame
df21.describe()
            age     visits
count  8.000000  10.000000
mean   3.437500   1.900000
std    2.007797   0.875595
min    0.500000   1.000000
25%    2.375000   1.000000
50%    3.000000   2.000000
75%    4.625000   2.750000
max    7.000000   3.000000
29. Transposing a DataFrame
df21.T
            a    b      c    d    e    f      g    h    i    j
animal    cat  cat  snake  dog  dog  cat  snake  cat  dog  dog
age       2.5    3    0.5  NaN    5    2    4.5  NaN    7    3
visits      1    3      2    3    2    3      1    1    2    1
priority  yes  yes     no  yes   no   no     no  yes   no   no
30. Sorting a DataFrame by a column
df21.sort_values(by = 'visits')
  animal  age  visits priority
a    cat  2.5       1      yes
g  snake  4.5       1       no
h    cat  NaN       1      yes
j    dog  3.0       1       no
c  snake  0.5       2       no
e    dog  5.0       2       no
i    dog  7.0       2       no
b    cat  3.0       3      yes
d    dog  NaN       3      yes
f    cat  2.0       3       no
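The exercise above sorts by one column in ascending order. As a minimal sketch of two common variations (the frame `pets` is a small stand-in invented here, not the exercise data), `by` also accepts a list of columns and `ascending` a matching list of flags:

```python
import numpy as np
import pandas as pd

pets = pd.DataFrame(
    {'age': [2.5, np.nan, 0.5, 5.0], 'visits': [1, 3, 1, 3]},
    index=['a', 'b', 'c', 'd'],
)

# Sort by visits descending, breaking ties by age ascending.
# Rows whose sort key is NaN are placed last by default (na_position='last').
ordered = pets.sort_values(by=['visits', 'age'], ascending=[False, True])
print(ordered)
```

`sort_values` returns a new DataFrame and leaves `pets` unchanged.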
31. Slicing DataFrame rows
print(df21)
df21[1:3]
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
  animal  age  visits priority
b    cat  3.0       3      yes
c  snake  0.5       2       no
32. Selecting from a DataFrame by label (single column)
df21['age']
a 2.5
b 3.0
c 0.5
d NaN
e 5.0
f 2.0
g 4.5
h NaN
i 7.0
j 3.0
Name: age, dtype: float64
df21.age
a 2.5
b 3.0
c 0.5
d NaN
e 5.0
f 2.0
g 4.5
h NaN
i 7.0
j 3.0
Name: age, dtype: float64
33. Selecting from a DataFrame by label (multiple columns)
df21[['age','visits']]
   age  visits
a  2.5       1
b  3.0       3
c  0.5       2
d  NaN       3
e  5.0       2
f  2.0       3
g  4.5       1
h  NaN       1
i  7.0       2
j  3.0       1
34. Selecting from a DataFrame by position
df21.iloc[1:3]
  animal  age  visits priority
b    cat  3.0       3      yes
c  snake  0.5       2       no
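Position-based and label-based selection differ in how they treat the end of a slice. The following sketch, using a small stand-in frame named `pets` (an assumption of this example), contrasts `iloc` with `loc`:

```python
import pandas as pd

pets = pd.DataFrame(
    {'animal': ['cat', 'cat', 'snake', 'dog'], 'visits': [1, 3, 2, 3]},
    index=['a', 'b', 'c', 'd'],
)

# iloc slices by integer position and excludes the end position (3, i.e. 'd').
by_position = pets.iloc[1:3]
print(by_position.index.tolist())

# loc slices by label and includes the end label 'd'.
by_label = pets.loc['b':'d']
print(by_label.index.tolist())

# Both accept a column selector as the second argument.
print(pets.iloc[0, 1], pets.loc['a', 'visits'])
```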
35. Copying a DataFrame
df35 = df21.copy()
df35
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
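Why copy at all? Plain assignment only creates a second name for the same object, while `copy()` produces independent data. A minimal sketch with illustrative names (`original`, `alias`, `duplicate`):

```python
import pandas as pd

original = pd.DataFrame({'visits': [1, 3, 2]}, index=['a', 'b', 'c'])

alias = original             # second name for the same object
duplicate = original.copy()  # independent copy of the data

# Modify the original in place.
original.loc['a', 'visits'] = 100

print(alias.loc['a', 'visits'])      # follows the change
print(duplicate.loc['a', 'visits'])  # keeps the old value
```

This is why the later exercises start from a copy before filling or dropping values.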
36. Checking whether DataFrame elements are missing
a = df35.isnull()
a
   animal    age  visits  priority
a   False  False   False     False
b   False  False   False     False
c   False  False   False     False
d   False   True   False     False
e   False  False   False     False
f   False  False   False     False
g   False  False   False     False
h   False   True   False     False
i   False  False   False     False
j   False  False   False     False
a.describe()
        animal    age  visits  priority
count       10     10      10        10
unique       1      2       1         1
top      False  False   False     False
freq        10      8      10        10
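A boolean mask like the one above is usually summarized rather than read cell by cell. As a hedged sketch on a small stand-in frame (`pets` is invented for this example), summing the mask counts missing values per column, and `any(axis=1)` picks out the affected rows:

```python
import numpy as np
import pandas as pd

pets = pd.DataFrame(
    {'animal': ['cat', 'dog', 'cat'], 'age': [2.5, np.nan, np.nan]},
    index=['a', 'b', 'c'],
)

mask = pets.isnull()

# True counts as 1 when summed, so this gives missing values per column.
missing_per_column = mask.sum()
print(missing_per_column)

# any(axis=1) marks rows that have at least one missing value.
rows_with_missing = pets[mask.any(axis=1)]
print(rows_with_missing.index.tolist())
```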
37. Adding a column
print(df35)
num = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], index=df35.index)
df35['No.'] = num
df35
  animal  age  visits priority
a    cat  2.5       1      yes
b    cat  3.0       3      yes
c  snake  0.5       2       no
d    dog  NaN       3      yes
e    dog  5.0       2       no
f    cat  2.0       3       no
g  snake  4.5       1       no
h    cat  NaN       1      yes
i    dog  7.0       2       no
j    dog  3.0       1       no
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  3.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  NaN       3      yes    3
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
h    cat  NaN       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
38. Modifying a DataFrame value by position
# Work on a copy (df38 is a name introduced here) so that df35 keeps its values for the later exercises.
df38 = df35.copy()
df38.iat[1, 1] = 2
df38
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  2.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  NaN       3      yes    3
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
h    cat  NaN       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
39. Modifying a DataFrame value by label
df38.loc['f', 'age'] = 1.5
df38
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  2.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  NaN       3      yes    3
e    dog  5.0       2       no    4
f    cat  1.5       3       no    5
g  snake  4.5       1       no    6
h    cat  NaN       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
40. Computing the mean of a DataFrame
# numeric_only=True skips the text columns; pandas 2.0 and later raise a TypeError without it.
df38.mean(numeric_only=True)
age 3.25
visits 1.90
No. 4.50
dtype: float64
41. Summing a DataFrame column
df38['visits'].sum()
19
42. Converting strings to lowercase
string = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca',
                    np.nan, 'CABA', 'dog', 'cat'])
print(string)
string.str.lower()
0 A
1 B
2 C
3 Aaba
4 Baca
5 NaN
6 CABA
7 dog
8 cat
dtype: object
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
43. Converting strings to uppercase
string.str.upper()
0 A
1 B
2 C
3 AABA
4 BACA
5 NaN
6 CABA
7 DOG
8 CAT
dtype: object
44. Filling missing values
df44 = df35.copy()
print(df44)
df44.fillna(value=3)
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  3.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  NaN       3      yes    3
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
h    cat  NaN       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  3.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  3.0       3      yes    3
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
h    cat  3.0       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
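Filling every gap with one constant is rarely what is wanted across mixed columns. As a minimal sketch on an invented frame (`pets` and the fill values are assumptions of this example), `fillna` also accepts a dict that maps each column to its own fill value:

```python
import numpy as np
import pandas as pd

pets = pd.DataFrame(
    {'age': [2.0, np.nan, 4.0], 'priority': ['yes', None, 'no']},
    index=['a', 'b', 'c'],
)

# A dict maps each column to its own fill value; here age uses the column mean.
filled = pets.fillna({'age': pets['age'].mean(), 'priority': 'unknown'})
print(filled)

# fillna returns a new object, so the original still has its missing values.
missing_in_original = pets['age'].isna().sum()
print(missing_in_original)
```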
45. Dropping rows that contain missing values
df45 = df35.copy()
print(df45)
df45.dropna(how='any')
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  3.0       3      yes    1
c  snake  0.5       2       no    2
d    dog  NaN       3      yes    3
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
h    cat  NaN       1      yes    7
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
  animal  age  visits priority  No.
a    cat  2.5       1      yes    0
b    cat  3.0       3      yes    1
c  snake  0.5       2       no    2
e    dog  5.0       2       no    4
f    cat  2.0       3       no    5
g  snake  4.5       1       no    6
i    dog  7.0       2       no    8
j    dog  3.0       1       no    9
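The `how` argument controls how strict the drop is. The sketch below, on a small invented frame named `pets`, compares `how='any'`, `how='all'`, and the `subset` parameter that limits the check to chosen columns:

```python
import numpy as np
import pandas as pd

pets = pd.DataFrame(
    {'age': [2.5, np.nan, np.nan], 'visits': [1.0, 3.0, np.nan]},
    index=['a', 'b', 'c'],
)

# how='any' drops a row if any value is missing.
any_rows = pets.dropna(how='any').index.tolist()
print(any_rows)

# how='all' drops a row only if every value is missing.
all_rows = pets.dropna(how='all').index.tolist()
print(all_rows)

# subset restricts the check to the listed columns.
subset_rows = pets.dropna(subset=['visits']).index.tolist()
print(subset_rows)
```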
46. Merging DataFrames on a specified column
left = pd.DataFrame({'key': ['foo1', 'foo2'], 'one': [1, 2]})
right = pd.DataFrame({'key': ['foo2', 'foo3'], 'two': [4, 5]})
print(left)
print(right)
pd.merge(left, right, on='key')
    key  one
0  foo1    1
1  foo2    2
    key  two
0  foo2    4
1  foo3    5
    key  one  two
0  foo2    2    4
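`pd.merge` performs an inner join by default, so only keys present in both frames survive. As a short sketch reusing the same two frames, `how='outer'` keeps every key instead:

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo1', 'foo2'], 'one': [1, 2]})
right = pd.DataFrame({'key': ['foo2', 'foo3'], 'two': [4, 5]})

# Default how='inner': only keys found in both frames survive.
inner = pd.merge(left, right, on='key')
print(inner)

# how='outer': keep every key; cells without a match become NaN.
outer = pd.merge(left, right, on='key', how='outer')
print(outer)
```

`how='left'` and `how='right'` keep all keys from one side only.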
47. DataFrame file operations
df35.to_csv('animal.csv')
print("Write succeeded.")
Write succeeded.
48. Reading a CSV file
df_animal = pd.read_csv('animal.csv')
df_animal
  Unnamed: 0 animal  age  visits priority  No.
0          a    cat  2.5       1      yes    0
1          b    cat  3.0       3      yes    1
2          c  snake  0.5       2       no    2
3          d    dog  NaN       3      yes    3
4          e    dog  5.0       2       no    4
5          f    cat  2.0       3       no    5
6          g  snake  4.5       1       no    6
7          h    cat  NaN       1      yes    7
8          i    dog  7.0       2       no    8
9          j    dog  3.0       1       no    9
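The `Unnamed: 0` column appears because `to_csv` writes the index as the first, unnamed column and `read_csv` reads it back as ordinary data. A minimal sketch of restoring it with `index_col=0` follows; it uses an in-memory buffer and a small invented frame (`pets`) instead of `animal.csv`.

```python
import io
import pandas as pd

pets = pd.DataFrame({'animal': ['cat', 'dog'], 'visits': [1, 3]}, index=['a', 'b'])

# Write to an in-memory buffer so the sketch leaves nothing on disk.
buffer = io.StringIO()
pets.to_csv(buffer)

# Default read: the saved index comes back as an 'Unnamed: 0' column.
buffer.seek(0)
plain = pd.read_csv(buffer)
print(plain.columns.tolist())

# index_col=0 turns the first column back into the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.index.tolist())
```

Alternatively, `to_csv(..., index=False)` omits the index when it carries no information.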
49. Writing to Excel
df35.to_excel('animal.xlsx', sheet_name='Sheet1')
print("Write succeeded.")
Write succeeded.
50. Reading from Excel
pd.read_excel('animal.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
  Unnamed: 0 animal  age  visits priority  No.
0          a    cat  2.5       1      yes    0
1          b    cat  3.0       3      yes    1
2          c  snake  0.5       2       no    2
3          d    dog  NaN       3      yes    3
4          e    dog  5.0       2       no    4
5          f    cat  2.0       3       no    5
6          g  snake  4.5       1       no    6
7          h    cat  NaN       1      yes    7
8          i    dog  7.0       2       no    8
9          j    dog  3.0       1       no    9