Common Basic Pandas Operations: Part 1


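The data table used in these experiments is shown below; leave a comment if you would like a copy to practice with.
[Image 1: the data table used in the experiments]
All experiments in this article were run in Jupyter Notebook.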

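Reading a CSV file with Pandas and displaying it

import pandas as pd
# Read the CSV data file
food_info = pd.read_csv('food_info.csv')
# Check the object's type; DataFrame is one of pandas' core data structures
print(type(food_info))
# View the data
food_info[0:10]  # show only the first 10 rows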

NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 1001 BUTTER WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 51.368 21.021 3.043 215.0
1 1002 BUTTER WHIPPED WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 50.489 23.426 3.012 219.0
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28 99.48 0.00 0.00 0.0 0.00 ... 3069.0 840.0 2.80 1.8 73.0 8.6 61.924 28.732 3.694 256.0
3 1004 CHEESE BLUE 42.41 353 21.40 28.74 5.11 2.34 0.0 0.50 ... 721.0 198.0 0.25 0.5 21.0 2.4 18.669 7.778 0.800 75.0
4 1005 CHEESE BRICK 41.11 371 23.24 29.68 3.18 2.79 0.0 0.51 ... 1080.0 292.0 0.26 0.5 22.0 2.5 18.764 8.598 0.784 94.0
5 1006 CHEESE BRIE 48.42 334 20.75 27.68 2.70 0.45 0.0 0.45 ... 592.0 174.0 0.24 0.5 20.0 2.3 17.410 8.013 0.826 100.0
6 1007 CHEESE CAMEMBERT 51.80 300 19.80 24.26 3.68 0.46 0.0 0.46 ... 820.0 241.0 0.21 0.4 18.0 2.0 15.259 7.023 0.724 72.0
7 1008 CHEESE CARAWAY 39.28 376 25.18 29.20 3.28 3.06 0.0 NaN ... 1054.0 271.0 NaN NaN NaN NaN 18.584 8.275 0.830 93.0
8 1009 CHEESE CHEDDAR 37.10 406 24.04 33.82 3.71 1.33 0.0 0.28 ... 994.0 263.0 0.78 0.6 24.0 2.9 19.368 8.428 1.433 102.0
9 1010 CHEESE CHESHIRE 37.65 387 23.37 30.60 3.60 4.78 0.0 NaN ... 985.0 233.0 NaN NaN NaN NaN 19.475 8.671 0.870 103.0

10 rows × 36 columns
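
For readers without the data file, the same steps can be tried on an in-memory sample. This is a minimal sketch: the three rows are copied from the output above, while the names csv_text and sample are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical three-row sample laid out like food_info.csv
csv_text = """NDB_No,Shrt_Desc,Water_(g),Energ_Kcal
1001,BUTTER WITH SALT,15.87,717
1002,BUTTER WHIPPED WITH SALT,15.87,717
1003,BUTTER OIL ANHYDROUS,0.24,876
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))  # the DataFrame class
print(sample[0:2])   # positional row slice: rows 0 and 1 only
```

The first line of the text becomes the column names, and the rows receive the default integer labels 0, 1, 2.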

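Checking the data type of each column in the loaded CSV file

print(food_info.dtypes)
# Returns the dtype of every column; the three most common are int64, float64, and object (usually strings)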
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object
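
Most columns in the listing above are float64, which is what pandas uses for a numeric column that contains missing values. The sketch below illustrates this on two rows taken from the table; the names sample and with_gap are made up for illustration, and depending on the pandas version the text column may be reported as object or as a dedicated string dtype.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "NDB_No": [1008, 1009],
    "Shrt_Desc": ["CHEESE CARAWAY", "CHEESE CHEDDAR"],
    "Energ_Kcal": [376, 406],
    "Sugar_Tot_(g)": [np.nan, 0.28],
})
print(sample.dtypes)

# A missing value cannot live in a plain int64 column,
# so integers mixed with NaN are stored as float64
with_gap = pd.Series([376, np.nan])
print(with_gap.dtype)
```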

Displaying the first five rows of the table

# Display the table
food_info.head()  # shows the first 5 rows by default
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 1001 BUTTER WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 51.368 21.021 3.043 215.0
1 1002 BUTTER WHIPPED WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 50.489 23.426 3.012 219.0
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28 99.48 0.00 0.00 0.0 0.00 ... 3069.0 840.0 2.80 1.8 73.0 8.6 61.924 28.732 3.694 256.0
3 1004 CHEESE BLUE 42.41 353 21.40 28.74 5.11 2.34 0.0 0.50 ... 721.0 198.0 0.25 0.5 21.0 2.4 18.669 7.778 0.800 75.0
4 1005 CHEESE BRICK 41.11 371 23.24 29.68 3.18 2.79 0.0 0.51 ... 1080.0 292.0 0.26 0.5 22.0 2.5 18.764 8.598 0.784 94.0

5 rows × 36 columns

Displaying the first few rows of the table

# To show a different number of rows, pass that number to head
food_info.head(2)  # show two rows
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 1001 BUTTER WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 51.368 21.021 3.043 215.0
1 1002 BUTTER WHIPPED WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 50.489 23.426 3.012 219.0

2 rows × 36 columns

Displaying the last five rows of the table

# Just as the first rows can be viewed, so can the last rows
food_info.tail()  # 5 rows by default
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
8613 83110 MACKEREL SALTED 43.00 305 18.50 25.10 13.40 0.00 0.0 0.0 ... 157.0 47.0 2.38 25.2 1006.0 7.8 7.148 8.320 6.210 95.0
8614 90240 SCALLOP (BAY&SEA) CKD STMD 70.25 111 20.54 0.84 2.97 5.41 0.0 0.0 ... 5.0 2.0 0.00 0.0 2.0 0.0 0.218 0.082 0.222 41.0
8615 90480 SYRUP CANE 26.00 269 0.00 0.00 0.86 73.14 0.0 73.2 ... 0.0 0.0 0.00 0.0 0.0 0.0 0.000 0.000 0.000 0.0
8616 90560 SNAIL RAW 79.20 90 16.10 1.40 1.30 2.00 0.0 0.0 ... 100.0 30.0 5.00 0.0 0.0 0.1 0.361 0.259 0.252 50.0
8617 93600 TURTLE GREEN RAW 78.50 89 19.80 0.50 1.20 0.00 0.0 0.0 ... 100.0 30.0 0.50 0.0 0.0 0.1 0.127 0.088 0.170 50.0

5 rows × 36 columns

Displaying the last few rows of the table

food_info.tail(2)  # show only the last two rows
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
8616 90560 SNAIL RAW 79.2 90 16.1 1.4 1.3 2.0 0.0 0.0 ... 100.0 30.0 5.0 0.0 0.0 0.1 0.361 0.259 0.252 50.0
8617 93600 TURTLE GREEN RAW 78.5 89 19.8 0.5 1.2 0.0 0.0 0.0 ... 100.0 30.0 0.5 0.0 0.0 0.1 0.127 0.088 0.170 50.0

2 rows × 36 columns
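
The head and tail calls above can be checked on a small frame. This is a sketch with a made-up seven-row DataFrame named df; note that tail keeps the original row labels rather than renumbering from zero.

```python
import pandas as pd

# Hypothetical seven-row frame standing in for food_info
df = pd.DataFrame({"NDB_No": [1001, 1002, 1003, 1004, 1005, 1006, 1007]})

first_five = df.head()   # n defaults to 5
first_two = df.head(2)
last_two = df.tail(2)

print(len(first_five))               # 5
print(first_two["NDB_No"].tolist())  # [1001, 1002]
print(last_two["NDB_No"].tolist())   # [1006, 1007]
print(last_two.index.tolist())       # [5, 6]
```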

Viewing all column names of the CSV file

# Display the column names and the type of the columns object
print(food_info.columns, type(food_info.columns))
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
       'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
       'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
       'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
       'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
       'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
       'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
       'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
       'Cholestrl_(mg)'],
      dtype='object') <class 'pandas.core.indexes.base.Index'>

Checking the shape of the loaded CSV data

# Check how many rows and columns the data has
print(food_info.shape)
(8618, 36)
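
Because shape is an ordinary tuple of (rows, columns), it can be unpacked directly. A small sketch on a made-up frame named df:

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame
df = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003],
    "Energ_Kcal": [717, 717, 876],
})

n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2
print(len(df))         # 3, the number of rows
print(df.size)         # 6, rows times columns
```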

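Indexing into the data

# Data read by pandas from a CSV file can also be indexed and sliced, using the loc indexer
food_info.loc[0]  # returns the row labeled 0, i.e. the first data row (the header line of the CSV became the column names, not a row)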
NDB_No                         1001
Shrt_Desc          BUTTER WITH SALT
Water_(g)                     15.87
Energ_Kcal                      717
Protein_(g)                    0.85
Lipid_Tot_(g)                 81.11
Ash_(g)                        2.11
Carbohydrt_(g)                 0.06
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.06
Calcium_(mg)                     24
Iron_(mg)                      0.02
Magnesium_(mg)                    2
Phosphorus_(mg)                  24
Potassium_(mg)                   24
Sodium_(mg)                     643
Zinc_(mg)                      0.09
Copper_(mg)                       0
Manganese_(mg)                    0
Selenium_(mcg)                    1
Vit_C_(mg)                        0
Thiamin_(mg)                  0.005
Riboflavin_(mg)               0.034
Niacin_(mg)                   0.042
Vit_B6_(mg)                   0.003
Vit_B12_(mcg)                  0.17
Vit_A_IU                       2499
Vit_A_RAE                       684
Vit_E_(mg)                     2.32
Vit_D_mcg                       1.5
Vit_D_IU                         60
Vit_K_(mcg)                       7
FA_Sat_(g)                   51.368
FA_Mono_(g)                  21.021
FA_Poly_(g)                   3.043
Cholestrl_(mg)                  215
Name: 0, dtype: object
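
Note that loc looks rows up by label, not by position. The two coincide only while the frame still has its default 0, 1, 2, ... index. A minimal sketch, using a made-up three-row frame named df, of how they diverge after sorting and how iloc selects by position instead:

```python
import pandas as pd

df = pd.DataFrame({
    "Shrt_Desc": ["BUTTER WITH SALT", "CHEESE BLUE", "SNAIL RAW"],
    "Water_(g)": [15.87, 42.41, 79.20],
})

# Sorting reorders the rows but every row keeps its label
by_water = df.sort_values("Water_(g)", ascending=False)

print(by_water.loc[0, "Shrt_Desc"])    # label 0: BUTTER WITH SALT
print(by_water.iloc[0]["Shrt_Desc"])   # first position: SNAIL RAW
```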

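Slicing the data

# Slicing the CSV data
food_info.loc[0:5]  # loc slices by label and includes both endpoints, so this returns 6 rows (labels 0 through 5), one more than food_info.head()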
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 1001 BUTTER WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 51.368 21.021 3.043 215.0
1 1002 BUTTER WHIPPED WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 50.489 23.426 3.012 219.0
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28 99.48 0.00 0.00 0.0 0.00 ... 3069.0 840.0 2.80 1.8 73.0 8.6 61.924 28.732 3.694 256.0
3 1004 CHEESE BLUE 42.41 353 21.40 28.74 5.11 2.34 0.0 0.50 ... 721.0 198.0 0.25 0.5 21.0 2.4 18.669 7.778 0.800 75.0
4 1005 CHEESE BRICK 41.11 371 23.24 29.68 3.18 2.79 0.0 0.51 ... 1080.0 292.0 0.26 0.5 22.0 2.5 18.764 8.598 0.784 94.0
5 1006 CHEESE BRIE 48.42 334 20.75 27.68 2.70 0.45 0.0 0.45 ... 592.0 174.0 0.24 0.5 20.0 2.3 17.410 8.013 0.826 100.0

6 rows × 36 columns

food_info.loc[0:9:2]  # similar to Python slicing: food_info.loc[start:stop:step]; here, the even-labeled rows among the first 10
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 1001 BUTTER WITH SALT 15.87 717 0.85 81.11 2.11 0.06 0.0 0.06 ... 2499.0 684.0 2.32 1.5 60.0 7.0 51.368 21.021 3.043 215.0
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28 99.48 0.00 0.00 0.0 0.00 ... 3069.0 840.0 2.80 1.8 73.0 8.6 61.924 28.732 3.694 256.0
4 1005 CHEESE BRICK 41.11 371 23.24 29.68 3.18 2.79 0.0 0.51 ... 1080.0 292.0 0.26 0.5 22.0 2.5 18.764 8.598 0.784 94.0
6 1007 CHEESE CAMEMBERT 51.80 300 19.80 24.26 3.68 0.46 0.0 0.46 ... 820.0 241.0 0.21 0.4 18.0 2.0 15.259 7.023 0.724 72.0
8 1009 CHEESE CHEDDAR 37.10 406 24.04 33.82 3.71 1.33 0.0 0.28 ... 994.0 263.0 0.78 0.6 24.0 2.9 19.368 8.428 1.433 102.0

5 rows × 36 columns
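
The difference between label slicing and positional slicing is easy to verify. The sketch below uses a made-up ten-row frame named df with the default integer index:

```python
import pandas as pd

df = pd.DataFrame({"NDB_No": list(range(1001, 1011))})

print(len(df.loc[0:5]))    # 6: label slice, end label included
print(len(df.iloc[0:5]))   # 5: positional slice, end excluded
print(len(df[0:5]))        # 5: plain [] with an integer slice is positional

# A step works the same way as in a Python slice
print(df.loc[0:9:2].index.tolist())  # [0, 2, 4, 6, 8]
```

If the goal is "the first n rows", head(n) or iloc[0:n] is the safer choice, since loc[0:n] returns n + 1 rows on a default index.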

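Selecting data by column name

# Rows can be selected by index label; a whole column can likewise be selected by its column name
food_info['NDB_No'][0:5]  # show only the first 5 values of the NDB_No column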
0    1001
1    1002
2    1003
3    1004
4    1005
Name: NDB_No, dtype: int64
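
A single column name returns a Series, while a list of names returns a DataFrame. The sketch below, on a made-up three-row frame named df, also shows loc selecting rows and a column in one step, which avoids chaining two indexing operations:

```python
import pandas as pd

df = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003],
    "Shrt_Desc": ["BUTTER WITH SALT", "BUTTER WHIPPED WITH SALT", "BUTTER OIL ANHYDROUS"],
    "Energ_Kcal": [717, 717, 876],
})

one_column = df["NDB_No"]                    # a Series
two_columns = df[["NDB_No", "Energ_Kcal"]]   # a DataFrame

# Rows by label and column by name in a single loc call
first_two_ids = df.loc[0:1, "NDB_No"]
print(first_two_ids.tolist())  # [1001, 1002]
```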

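Getting all column names that end with (g)

# Task: collect every column name that ends with (g)
columns = food_info.columns  # columns is an Index object, not a list; tolist() converts it to a list
print(columns.tolist(), type(columns.tolist()))
print('*' * 100)
g_columns = []
for this_column in columns:
    # Keep only names ending in "(g)"; "(mg)" and "(mcg)" do not match
    if this_column.endswith('(g)'):
        g_columns.append(this_column)
print(g_columns)
food_info[g_columns][0:5]  # 10 columns end with (g); show only the first 5 rows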
['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)'] <class 'list'>
****************************************************************************************************
['Water_(g)', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)']
Water_(g) Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g)
0 315.87 0.85 81.11 2.11 0.06 0.0 0.06 51.368 21.021 3.043
1 315.87 0.85 81.11 2.11 0.06 0.0 0.06 50.489 23.426 3.012
2 300.24 0.28 99.48 0.00 0.00 0.0 0.00 61.924 28.732 3.694
3 342.41 21.40 28.74 5.11 2.34 0.0 0.50 18.669 7.778 0.800
4 341.11 23.24 29.68 3.18 2.79 0.0 0.51 18.764 8.598 0.784
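
The loop above can also be written more compactly. The following sketch shows two equivalent alternatives, a list comprehension and the vectorized string accessor on the Index; the five column names are a made-up subset chosen for illustration.

```python
import pandas as pd

# Hypothetical subset of the column names
columns = pd.Index(["NDB_No", "Water_(g)", "Iron_(mg)", "Vit_B12_(mcg)", "FA_Sat_(g)"])

# Option 1: list comprehension with str.endswith
g_columns = [name for name in columns if name.endswith("(g)")]
print(g_columns)  # ['Water_(g)', 'FA_Sat_(g)']

# Option 2: boolean mask built with the .str accessor of the Index
g_columns_vectorized = columns[columns.str.endswith("(g)")].tolist()
print(g_columns_vectorized)  # ['Water_(g)', 'FA_Sat_(g)']
```

Matching on the full suffix "(g)" rather than on "g)" is what keeps the milligram and microgram columns out of the result.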

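Basic arithmetic on CSV data with Pandas

# Basic arithmetic on the data (+, -, *, /): an operation with a constant is applied to every value; an operation between two columns is applied element by element at matching positions
# Convert the Iron_(mg) column to grams
(food_info['Iron_(mg)'] / 1000)[0:5]  # show only the first five values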
0    0.00002
1    0.00016
2    0.00000
3    0.00031
4    0.00043
Name: Iron_(mg), dtype: float64
# Add 100 to every value in the Protein_(g) column (this modifies food_info in place)
food_info['Protein_(g)'] += 100
food_info['Protein_(g)'][0:5]  # show only the first five rows
0    100.85
1    100.85
2    100.28
3    121.40
4    123.24
Name: Protein_(g), dtype: float64
# Subtract 10 from every value in the Lipid_Tot_(g) column (this modifies food_info in place)
food_info['Lipid_Tot_(g)'] -= 10
food_info['Lipid_Tot_(g)'][0:5]  # show only the first five rows
0    71.11
1    71.11
2    89.48
3    18.74
4    19.68
Name: Lipid_Tot_(g), dtype: float64
# Multiply Lipid_Tot_(g) by Carbohydrt_(g), element by element
new_lipid = food_info['Lipid_Tot_(g)'] * food_info['Carbohydrt_(g)']
new_lipid[0:5]
0     4.2666
1     4.2666
2     0.0000
3    43.8516
4    54.9072
dtype: float64
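
In-place operators such as += and -= change the stored column, so re-running such a notebook cell applies the change again each time. A minimal sketch of a non-mutating alternative, assuming a made-up two-row frame named df: results are kept in new variables or new columns and the source columns stay as loaded.

```python
import pandas as pd

df = pd.DataFrame({
    "Iron_(mg)": [0.02, 0.16],
    "Lipid_Tot_(g)": [81.11, 81.11],
    "Carbohydrt_(g)": [0.06, 0.06],
})

# Store the converted values in a new column instead of overwriting
df["Iron_(g)"] = df["Iron_(mg)"] / 1000

# Arithmetic returns a new Series; the original column is untouched
lipid_minus_10 = df["Lipid_Tot_(g)"] - 10

# Column-by-column multiplication pairs values that share a row label
product = lipid_minus_10 * df["Carbohydrt_(g)"]

print(df["Lipid_Tot_(g)"].tolist())  # still [81.11, 81.11]
print(product.round(4).tolist())
```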

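Sorting by the values of a column

food_info.sort_values('Water_(g)', inplace=True)  # inplace=True sorts food_info itself and returns None; no new DataFrame is created
food_info.sort_values('Water_(g)', inplace=False)[0:5]  # inplace=False (the default) returns a new sorted DataFrame; show only the first 5 rows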
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) ... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
676 4544 SHORTENING HOUSEHOLD LARD&VEG OIL 400.0 900 0.0 90.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 1.0 0.0 0.0 21.5 40.3 44.4 10.9 56.0
664 4520 FAT MUTTON TALLOW 400.0 902 0.0 90.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 2.8 0.7 28.0 0.0 47.3 40.6 7.8 102.0
665 4528 OIL WALNUT 400.0 884 0.0 90.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.4 0.0 0.0 15.0 9.1 22.8 63.3 0.0
666 4529 OIL ALMOND 400.0 884 0.0 90.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 39.2 0.0 0.0 7.0 8.2 69.9 17.4 0.0
667 4530 OIL APRICOT KERNEL 400.0 884 0.0 90.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 4.0 NaN NaN NaN 6.3 60.0 29.3 0.0

5 rows × 36 columns
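
The behavior of sort_values can be checked on a small example. This sketch uses a made-up three-row frame named df and also shows ascending=False and reset_index, which are not used in the original cells:

```python
import pandas as pd

df = pd.DataFrame({
    "Shrt_Desc": ["CHEESE BLUE", "BUTTER OIL ANHYDROUS", "SNAIL RAW"],
    "Water_(g)": [42.41, 0.24, 79.20],
})

ascending = df.sort_values("Water_(g)")                    # new DataFrame
descending = df.sort_values("Water_(g)", ascending=False)  # new DataFrame
print(ascending.index.tolist())    # [1, 0, 2]
print(descending.index.tolist())   # [2, 0, 1]

# With inplace=True the call returns None and df itself is reordered
returned = df.sort_values("Water_(g)", inplace=True)
print(returned)                    # None
print(df.index.tolist())           # [1, 0, 2]

# Rows keep their old labels; reset_index renumbers them from 0
tidy = df.reset_index(drop=True)
print(tidy.index.tolist())         # [0, 1, 2]
```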

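Sorting a column by its index labels, returning the index and the values

food_info['Water_(g)'].sort_index(ascending=False)[0:5]  # sort one column by index label in descending order; show only the first five rows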
8617    478.50
8616    479.20
8615    426.00
8614    470.25
8613    443.00
Name: Water_(g), dtype: float64
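
Since sort_index orders by label rather than by value, it is also a simple way to restore the original row order after a sort by value. A small sketch with a made-up Series named water whose labels are out of order:

```python
import pandas as pd

# Hypothetical column left in value-sorted order, labels shuffled
water = pd.Series([79.20, 0.24, 42.41], index=[2, 0, 1], name="Water_(g)")

restored = water.sort_index()                  # labels 0, 1, 2
reversed_order = water.sort_index(ascending=False)

print(restored.index.tolist())        # [0, 1, 2]
print(restored.tolist())              # [0.24, 42.41, 79.2]
print(reversed_order.index.tolist())  # [2, 1, 0]
```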
