(6) pandas Basics 1 - Python Data Analysis and Machine Learning in Practice (Study Notes)

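Original article. Last updated: 2018-05-2

1. Reading data with pandas
2. Indexing and computation in pandas

Course source: python数据分析与机器学习实战 (Python Data Analysis and Machine Learning in Practice), 唐宇迪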

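1. Reading data with pandas

1.1 Using the read_csv function

The file food_info.csv lists nutritional indicators (vitamins, minerals, and so on) for a range of foods. A CSV file is a text file whose fields are separated by commas.
Use pandas.read_csv to read the data in the file.
The result, food_info, is a DataFrame.
Use the dtypes attribute to see the data type of each column. object is the type used here for strings.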

In [10]: import pandas
food_info=pandas.read_csv("food_info.csv")

type(food_info)
Out[13]: pandas.core.frame.DataFrame

food_info.dtypes
Out[14]: 
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object
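
As a minimal sketch of the same steps without needing the real file, the example below reads a small CSV from an in-memory string. The three sample rows and the names csv_text and sample are made up for illustration; only the column names come from food_info.csv.

```python
import io
import pandas as pd

# Hypothetical three-row sample that mimics a few columns of food_info.csv.
csv_text = """NDB_No,Shrt_Desc,Water_(g),Energ_Kcal
1001,BUTTER WITH SALT,15.87,717
1002,BUTTER WHIPPED WITH SALT,15.87,717
1003,BUTTER OIL ANHYDROUS,0.24,876
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# dtypes is a Series that maps each column name to its inferred type.
print(type(sample))
print(sample.dtypes)
```

Whole numbers are inferred as int64 and decimals as float64. Depending on the pandas version, the text column may be reported as object or as a dedicated string type.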

Use help(pandas.read_csv) to learn more about pandas.read_csv: its general usage, parameters, and so on.



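Aside:
pandas.read_csv takes the file path as its argument. Here it is a relative path, resolved against the current working directory. How do you find out what the current working directory is?
Use os.getcwd() to get the current working directory.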

import os

os.getcwd()
Out[11]: 'C:\\Users\\Administrator'

Place the file in 'C:\Users\Administrator'.
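
Instead of moving the file, you can also build the absolute path yourself and check whether the file is where pandas will look for it. This is only a sketch: file_name and absolute_path are illustrative names, and the file does not need to exist for the code to run.

```python
import os

# File name taken from the notes; it does not have to exist for this sketch.
file_name = "food_info.csv"

# A relative path is resolved against the current working directory.
absolute_path = os.path.join(os.getcwd(), file_name)

print(absolute_path)
# True only if the file really is in the current working directory.
print(os.path.exists(absolute_path))
```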

1.2 Using the head function

.head() displays the first 5 rows of the data read from food_info.csv.

import pandas as pd

food_info = pd.read_csv("food_info.csv")

food_info.head()
Out[19]: 
   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
0    1001          BUTTER WITH SALT      15.87         717         0.85   
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85   
2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28   
3    1004               CHEESE BLUE      42.41         353        21.40   
4    1005              CHEESE BRICK      41.11         371        23.24   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
0          81.11     2.11            0.06           0.0           0.06   
1          81.11     2.11            0.06           0.0           0.06   
2          99.48     0.00            0.00           0.0           0.00   
3          28.74     5.11            2.34           0.0           0.50   
4          29.68     3.18            2.79           0.0           0.51   

       Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
0       ...          2499.0      684.0        2.32        1.5      60.0   
1       ...          2499.0      684.0        2.32        1.5      60.0   
2       ...          3069.0      840.0        2.80        1.8      73.0   
3       ...           721.0      198.0        0.25        0.5      21.0   
4       ...          1080.0      292.0        0.26        0.5      22.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
0          7.0      51.368       21.021        3.043           215.0  
1          7.0      50.489       23.426        3.012           219.0  
2          8.6      61.924       28.732        3.694           256.0  
3          2.4      18.669        7.778        0.800            75.0  
4          2.5      18.764        8.598        0.784            94.0  

[5 rows x 36 columns]

What if you only want to display the first 3 rows?
.head(3) displays the first 3 rows.

food_info.head(3)
Out[20]: 
   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
0    1001          BUTTER WITH SALT      15.87         717         0.85   
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85   
2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
0          81.11     2.11            0.06           0.0           0.06   
1          81.11     2.11            0.06           0.0           0.06   
2          99.48     0.00            0.00           0.0           0.00   

       Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
0       ...          2499.0      684.0        2.32        1.5      60.0   
1       ...          2499.0      684.0        2.32        1.5      60.0   
2       ...          3069.0      840.0        2.80        1.8      73.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
0          7.0      51.368       21.021        3.043           215.0  
1          7.0      50.489       23.426        3.012           219.0  
2          8.6      61.924       28.732        3.694           256.0  

[3 rows x 36 columns]

1.3 Using the tail function

By default, tail() displays the last 5 rows.

import pandas as pd

food_info = pd.read_csv("food_info.csv")

food_info.tail()
Out[21]: 
      NDB_No                   Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
8613   83110             MACKEREL SALTED      43.00         305        18.50   
8614   90240  SCALLOP (BAY&SEA) CKD STMD      70.25         111        20.54   
8615   90480                  SYRUP CANE      26.00         269         0.00   
8616   90560                   SNAIL RAW      79.20          90        16.10   
8617   93600            TURTLE GREEN RAW      78.50          89        19.80   

      Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
8613          25.10    13.40            0.00           0.0            0.0   
8614           0.84     2.97            5.41           0.0            0.0   
8615           0.00     0.86           73.14           0.0           73.2   
8616           1.40     1.30            2.00           0.0            0.0   
8617           0.50     1.20            0.00           0.0            0.0   

       Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
8613       ...           157.0       47.0        2.38       25.2    1006.0   
8614       ...             5.0        2.0        0.00        0.0       2.0   
8615       ...             0.0        0.0        0.00        0.0       0.0   
8616       ...           100.0       30.0        5.00        0.0       0.0   
8617       ...           100.0       30.0        0.50        0.0       0.0   

      Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
8613          7.8       7.148        8.320        6.210            95.0  
8614          0.0       0.218        0.082        0.222            41.0  
8615          0.0       0.000        0.000        0.000             0.0  
8616          0.1       0.361        0.259        0.252            50.0  
8617          0.1       0.127        0.088        0.170            50.0  

[5 rows x 36 columns]
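
The head and tail calls above can be tried on a small frame. The sketch below uses a made-up ten-row DataFrame named toy in place of food_info.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for food_info.
toy = pd.DataFrame({"value": range(10)})

first_five = toy.head()     # default n=5
first_three = toy.head(3)   # first 3 rows
last_five = toy.tail()      # default n=5
last_two = toy.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```

Both methods return a new DataFrame that keeps the original row labels, which is why tail shows labels such as 8613 to 8617 for food_info.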

1.4 Using the columns attribute

Use the columns attribute to print the header row of the file, that is, the column names.

food_info.columns
Out[22]: 
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
       'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
       'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
       'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
       'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
       'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
       'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
       'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
       'Cholestrl_(mg)'],
      dtype='object')

1.5 Using the shape attribute

shape tells you how many rows and columns the DataFrame has.

food_info.shape
Out[23]: (8618, 36)

This dataset has 8618 samples in total, and each sample has 36 indicators.
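
Because shape is a plain (rows, columns) tuple, it can be unpacked directly. A small sketch with a made-up frame named toy:

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute (no parentheses) holding (number of rows, number of columns).
n_rows, n_cols = toy.shape

print(n_rows, n_cols)
```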

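2. Indexing and computation in pandas

2.1 Selecting data with loc (rows)

For label-based indexing of DataFrame rows, pandas provides the special indexing operator loc. It lets you select a subset of rows and columns from a DataFrame by axis label, using NumPy-like notation.

import pandas as pd
food_info = pd.read_csv("food_info.csv")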

food_info.loc[0]
Out[24]: 
NDB_No                         1001
Shrt_Desc          BUTTER WITH SALT
Water_(g)                     15.87
Energ_Kcal                      717
Protein_(g)                    0.85
Lipid_Tot_(g)                 81.11
Ash_(g)                        2.11
Carbohydrt_(g)                 0.06
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.06
Calcium_(mg)                     24
Iron_(mg)                      0.02
Magnesium_(mg)                    2
Phosphorus_(mg)                  24
Potassium_(mg)                   24
Sodium_(mg)                     643
Zinc_(mg)                      0.09
Copper_(mg)                       0
Manganese_(mg)                    0
Selenium_(mcg)                    1
Vit_C_(mg)                        0
Thiamin_(mg)                  0.005
Riboflavin_(mg)               0.034
Niacin_(mg)                   0.042
Vit_B6_(mg)                   0.003
Vit_B12_(mcg)                  0.17
Vit_A_IU                       2499
Vit_A_RAE                       684
Vit_E_(mg)                     2.32
Vit_D_mcg                       1.5
Vit_D_IU                         60
Vit_K_(mcg)                       7
FA_Sat_(g)                   51.368
FA_Mono_(g)                  21.021
FA_Poly_(g)                   3.043
Cholestrl_(mg)                  215
Name: 0, dtype: object

food_info.loc[0] returns the row labeled 0, which is the first data row of the file. Row labels start at 0.

food_info.loc[6] returns the row labeled 6.

food_info.loc[6]
Out[26]: 
NDB_No                         1007
Shrt_Desc          CHEESE CAMEMBERT
Water_(g)                      51.8
Energ_Kcal                      300
Protein_(g)                    19.8
Lipid_Tot_(g)                 24.26
Ash_(g)                        3.68
Carbohydrt_(g)                 0.46
Fiber_TD_(g)                      0
Sugar_Tot_(g)                  0.46
Calcium_(mg)                    388
Iron_(mg)                      0.33
Magnesium_(mg)                   20
Phosphorus_(mg)                 347
Potassium_(mg)                  187
Sodium_(mg)                     842
Zinc_(mg)                      2.38
Copper_(mg)                   0.021
Manganese_(mg)                0.038
Selenium_(mcg)                 14.5
Vit_C_(mg)                        0
Thiamin_(mg)                  0.028
Riboflavin_(mg)               0.488
Niacin_(mg)                    0.63
Vit_B6_(mg)                   0.227
Vit_B12_(mcg)                   1.3
Vit_A_IU                        820
Vit_A_RAE                       241
Vit_E_(mg)                     0.21
Vit_D_mcg                       0.4
Vit_D_IU                         18
Vit_K_(mcg)                       2
FA_Sat_(g)                   15.259
FA_Mono_(g)                   7.023
FA_Poly_(g)                   0.724
Cholestrl_(mg)                   72
Name: 6, dtype: object

If the label is beyond the last row of the file, a KeyError is raised:

food_info.loc[8620]

KeyError: 'the label [8620] is not in the [index]'
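
If a missing label should not stop the program, the KeyError can be caught. The helper get_row_or_none below is a hypothetical name introduced only for this sketch; it is not part of pandas.

```python
import pandas as pd

# Hypothetical frame whose row labels are 0, 1, 2.
toy = pd.DataFrame({"value": [10, 20, 30]})

def get_row_or_none(frame, label):
    """Return the row with the given label, or None if the label is absent."""
    try:
        return frame.loc[label]
    except KeyError:
        return None

present = get_row_or_none(toy, 1)    # label exists
missing = get_row_or_none(toy, 99)   # label does not exist

print(present["value"], missing)
```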

Aside:
Common dtype values in a DataFrame:

  • object - for string values
  • int - for integer values
  • float - for floating-point values
  • datetime - for time values
  • bool - for Boolean values

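loc also accepts a slice of labels or a list of labels.
For example, to take rows 3, 4, 5, and 6 (the end label of a loc slice is included):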

food_info.loc[3:6]
Out[28]: 
   NDB_No         Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
3    1004       CHEESE BLUE      42.41         353        21.40   
4    1005      CHEESE BRICK      41.11         371        23.24   
5    1006       CHEESE BRIE      48.42         334        20.75   
6    1007  CHEESE CAMEMBERT      51.80         300        19.80   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
3          28.74     5.11            2.34           0.0           0.50   
4          29.68     3.18            2.79           0.0           0.51   
5          27.68     2.70            0.45           0.0           0.45   
6          24.26     3.68            0.46           0.0           0.46   

       Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
3       ...           721.0      198.0        0.25        0.5      21.0   
4       ...          1080.0      292.0        0.26        0.5      22.0   
5       ...           592.0      174.0        0.24        0.5      20.0   
6       ...           820.0      241.0        0.21        0.4      18.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
3          2.4      18.669        7.778        0.800            75.0  
4          2.5      18.764        8.598        0.784            94.0  
5          2.3      17.410        8.013        0.826           100.0  
6          2.0      15.259        7.023        0.724            72.0  

[4 rows x 36 columns]

For example, to take rows 2, 5, and 10:

two_five_ten = [2, 5, 10]

food_info.loc[two_five_ten]  # equivalent to food_info.loc[[2, 5, 10]]
Out[33]: 
    NDB_No             Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
2     1003  BUTTER OIL ANHYDROUS       0.24         876         0.28   
5     1006           CHEESE BRIE      48.42         334        20.75   
10    1011          CHEESE COLBY      38.20         394        23.76   

    Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
2           99.48     0.00            0.00           0.0           0.00   
5           27.68     2.70            0.45           0.0           0.45   
10          32.11     3.36            2.57           0.0           0.52   

       Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
2        ...          3069.0      840.0        2.80        1.8      73.0   
5        ...           592.0      174.0        0.24        0.5      20.0   
10       ...           994.0      264.0        0.28        0.6      24.0   

    Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
2           8.6      61.924       28.732        3.694           256.0  
5           2.3      17.410        8.013        0.826           100.0  
10          2.7      20.218        9.280        0.953            95.0  

[3 rows x 36 columns]
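
One detail worth checking for yourself: a loc slice includes both end labels, unlike a normal Python slice. The sketch below compares the two on a made-up frame named toy.

```python
import pandas as pd

letters = list("abcdefgh")

# Hypothetical frame with row labels 0 to 7.
toy = pd.DataFrame({"value": letters})

# loc slices by label and includes both ends: labels 3, 4, 5, 6.
label_slice = toy.loc[3:6]

# A plain Python list slice excludes the end: positions 3, 4, 5.
list_slice = letters[3:6]

# A list of labels selects exactly those rows, in the order given.
picked = toy.loc[[2, 5, 7]]

print(label_slice["value"].tolist())
print(list_slice)
print(picked.index.tolist())
```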

2.2 Selecting data by column name (columns)

How do you extract a single column?
food_info[column_name]: pass the column name directly inside the brackets, as follows:

ndb_col=food_info["NDB_No"]

ndb_col
Out[36]: 
0        1001
1        1002
2        1003
3        1004
4        1005
5        1006
6        1007
7        1008
8        1009
9        1010
10       1011
11       1012
12       1013
13       1014
14       1015
15       1016
16       1017
17       1018
18       1019
19       1020
20       1021
21       1022
22       1023
23       1024
24       1025
25       1026
26       1027
27       1028
28       1029
29       1030
 
8588    43544
8589    43546
8590    43550
8591    43566
8592    43570
8593    43572
8594    43585
8595    43589
8596    43595
8597    43597
8598    43598
8599    44005
8600    44018
8601    44048
8602    44055
8603    44061
8604    44074
8605    44110
8606    44158
8607    44203
8608    44258
8609    44259
8610    44260
8611    48052
8612    80200
8613    83110
8614    90240
8615    90480
8616    90560
8617    93600
Name: NDB_No, Length: 8618, dtype: int64

This is all the data in the column named "NDB_No".



How do you extract two columns?
food_info[[name1, name2]]: pass the column names as a list, as follows:

food_info[["Zinc_(mg)","Copper_(mg)"]]
Out[37]: 
      Zinc_(mg)  Copper_(mg)
0          0.09        0.000
1          0.05        0.016
2          0.01        0.001
3          2.66        0.040
4          2.60        0.024
5          2.38        0.019
6          2.38        0.021
7          2.94        0.024
8          3.43        0.056
9          2.79        0.042
10         3.07        0.042
11         0.40        0.029
12         0.33        0.040
13         0.47        0.030
14         0.51        0.033
15         0.38        0.028
16         0.51        0.019
17         3.75        0.036
18         2.88        0.032
19         3.50        0.025
20         1.14        0.080
21         3.90        0.036
22         3.90        0.032
23         2.10        0.021
24         3.00        0.032
25         2.92        0.011
26         2.46        0.022
27         2.76        0.025
28         3.61        0.034
29         2.81        0.031
        ...          ...
8588       3.30        0.377
8589       0.05        0.040
8590       0.05        0.030
8591       1.15        0.116
8592       5.03        0.200
8593       3.83        0.545
8594       0.08        0.035
8595       3.90        0.027
8596       4.10        0.100
8597       3.13        0.027
8598       0.13        0.000
8599       0.02        0.000
8600       0.09        0.037
8601       0.21        0.026
8602       2.77        0.571
8603       0.41        0.838
8604       0.05        0.028
8605       0.03        0.023
8606       0.10        0.112
8607       0.02        0.020
8608       1.49        0.854
8609       0.19        0.040
8610       0.10        0.038
8611       0.85        0.182
8612       1.00        0.250
8613       1.10        0.100
8614       1.55        0.033
8615       0.19        0.020
8616       1.00        0.400
8617       1.00        0.250

[8618 rows x 2 columns]
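
The two forms of column selection return different types, which the sketch below makes visible. The frame toy and its two rows are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame using column names from food_info.
toy = pd.DataFrame({
    "NDB_No": [1001, 1002],
    "Zinc_(mg)": [0.09, 0.05],
    "Copper_(mg)": [0.000, 0.016],
})

one_col = toy["Zinc_(mg)"]                      # a single name gives a Series
two_cols = toy[["Zinc_(mg)", "Copper_(mg)"]]    # a list of names gives a DataFrame
one_col_frame = toy[["Zinc_(mg)"]]              # a one-item list still gives a DataFrame

print(type(one_col).__name__, type(two_cols).__name__, one_col_frame.shape)
```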

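How do you find the columns whose names end with (g)?

First get all the column names, then check which ones end with (g). tolist() converts the column Index into a plain Python list.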

col_names=food_info.columns.tolist()

col_names
Out[39]: 
['NDB_No',
 'Shrt_Desc',
 'Water_(g)',
 'Energ_Kcal',
 'Protein_(g)',
 'Lipid_Tot_(g)',
 'Ash_(g)',
 'Carbohydrt_(g)',
 'Fiber_TD_(g)',
 'Sugar_Tot_(g)',
 'Calcium_(mg)',
 'Iron_(mg)',
 'Magnesium_(mg)',
 'Phosphorus_(mg)',
 'Potassium_(mg)',
 'Sodium_(mg)',
 'Zinc_(mg)',
 'Copper_(mg)',
 'Manganese_(mg)',
 'Selenium_(mcg)',
 'Vit_C_(mg)',
 'Thiamin_(mg)',
 'Riboflavin_(mg)',
 'Niacin_(mg)',
 'Vit_B6_(mg)',
 'Vit_B12_(mcg)',
 'Vit_A_IU',
 'Vit_A_RAE',
 'Vit_E_(mg)',
 'Vit_D_mcg',
 'Vit_D_IU',
 'Vit_K_(mcg)',
 'FA_Sat_(g)',
 'FA_Mono_(g)',
 'FA_Poly_(g)',
 'Cholestrl_(mg)']

Then go through the col_names list: which elements end with (g)?

gram_columns = []

for c in col_names:
    if c.endswith("(g)"):  # equivalent to c[-3:] == "(g)"
        gram_columns.append(c)
gram_df = food_info[gram_columns]
gram_df.head(3)

Out[43]: 
   Water_(g)  Protein_(g)  Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  \
0      15.87         0.85          81.11     2.11            0.06   
1      15.87         0.85          81.11     2.11            0.06   
2       0.24         0.28          99.48     0.00            0.00   

   Fiber_TD_(g)  Sugar_Tot_(g)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  
0           0.0           0.06      51.368       21.021        3.043  
1           0.0           0.06      50.489       23.426        3.012  
2           0.0           0.00      61.924       28.732        3.694  
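
The same filter can be written more compactly as a list comprehension. This is just an alternative sketch of the loop above, run on a made-up one-row frame named toy.

```python
import pandas as pd

# Hypothetical one-row frame using column names from food_info.
toy = pd.DataFrame({
    "Water_(g)": [15.87],
    "Iron_(mg)": [0.02],
    "Protein_(g)": [0.85],
    "Energ_Kcal": [717],
})

# Keep only the column names that end with "(g)".
gram_columns = [c for c in toy.columns if c.endswith("(g)")]

# Select those columns as a new DataFrame.
gram_df = toy[gram_columns]

print(gram_columns)
```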

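2.3 Arithmetic in pandas

One of the most important features of pandas is that it can do arithmetic on objects with different indexes. When two objects are added and their index labels differ, the index of the result is the union of the two indexes. For users with database experience, this is like an automatic outer join on the index labels. The examples in this section use columns of the same DataFrame, so their indexes already match.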
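
To make the alignment rule concrete, here is a small sketch with two made-up Series, s1 and s2, whose indexes only partly overlap. Labels present in just one of them produce NaN in the result.

```python
import pandas as pd

# Two hypothetical Series that share only the labels "b" and "c".
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Values are matched by label, not by position.
total = s1 + s2

print(total)
```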

Convert a column measured in mg into a column measured in g.
For example, first print the "Iron_(mg)" data.

food_info["Iron_(mg)"]
Out[58]: 
0        0.02
1        0.16
2        0.00
3        0.31
4        0.43
5        0.50
6        0.33
7        0.64
8        0.16
9        0.21
10       0.76
11       0.07
12       0.16
13       0.15
14       0.13
15       0.14
16       0.38
17       0.44
18       0.65
19       0.23
20       0.52
21       0.24
22       0.17
23       0.13
24       0.72
25       0.44
26       0.20
27       0.22
28       0.23
29       0.41
 
8588     9.00
8589     0.30
8590     0.10
8591     1.63
8592    34.82
8593     2.28
8594     0.17
8595     0.17
8596     4.86
8597     0.25
8598     0.23
8599     0.13
8600     0.11
8601     0.68
8602     7.83
8603     3.11
8604     0.30
8605     0.18
8606     0.80
8607     0.04
8608     3.87
8609     0.05
8610     0.38
8611     5.20
8612     1.50
8613     1.40
8614     0.58
8615     3.60
8616     3.50
8617     1.40
Name: Iron_(mg), Length: 8618, dtype: float64

food_info[column_name] / 1000 converts mg to g. As in NumPy, adding, subtracting, multiplying, or dividing by a single number applies the operation to every element.

div_1000=food_info["Iron_(mg)"]/1000

div_1000
Out[60]: 
0       0.00002
1       0.00016
2       0.00000
3       0.00031
4       0.00043
5       0.00050
6       0.00033
7       0.00064
8       0.00016
9       0.00021
10      0.00076
11      0.00007
12      0.00016
13      0.00015
14      0.00013
15      0.00014
16      0.00038
17      0.00044
18      0.00065
19      0.00023
20      0.00052
21      0.00024
22      0.00017
23      0.00013
24      0.00072
25      0.00044
26      0.00020
27      0.00022
28      0.00023
29      0.00041
  
8588    0.00900
8589    0.00030
8590    0.00010
8591    0.00163
8592    0.03482
8593    0.00228
8594    0.00017
8595    0.00017
8596    0.00486
8597    0.00025
8598    0.00023
8599    0.00013
8600    0.00011
8601    0.00068
8602    0.00783
8603    0.00311
8604    0.00030
8605    0.00018
8606    0.00080
8607    0.00004
8608    0.00387
8609    0.00005
8610    0.00038
8611    0.00520
8612    0.00150
8613    0.00140
8614    0.00058
8615    0.00360
8616    0.00350
8617    0.00140
Name: Iron_(mg), Length: 8618, dtype: float64

Other similar exercises:

add_100 = food_info["Iron_(mg)"] + 100
sub_100 = food_info["Iron_(mg)"] - 100
mult_2 = food_info["Iron_(mg)"]*2
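
The scalar operations above can be checked on a few values. The sketch below uses a made-up three-element Series named iron_mg; the values are the first three shown for "Iron_(mg)" earlier.

```python
import pandas as pd

# First three "Iron_(mg)" values from the output above.
iron_mg = pd.Series([0.02, 0.16, 0.0], name="Iron_(mg)")

iron_g = iron_mg / 1000    # every element is divided by 1000
doubled = iron_mg * 2      # every element is multiplied by 2
shifted = iron_mg + 100    # 100 is added to every element

print(iron_g.tolist())
print(doubled.tolist())
```

The result keeps the same length, index, and name as the original Series.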

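Two columns can also be combined, assuming they have the same length. Arithmetic between them is applied element by element, pairing the values in the same row, as follows: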

food_info["Water_(g)"]
Out[61]: 
0       15.87
1       15.87
2        0.24
3       42.41
4       41.11
5       48.42
6       51.80
7       39.28
8       37.10
9       37.65
10      38.20
11      79.79
12      79.64
13      81.01
14      81.24
15      82.48
16      54.44
17      41.56
18      55.22
19      37.92
20      13.44
21      41.46
22      33.19
23      48.42
24      41.01
25      50.01
26      48.38
27      53.78
28      45.54
29      41.77
 
8588     2.00
8589    76.70
8590    83.10
8591     1.30
8592     5.00
8593     2.80
8594    81.60
8595    59.60
8596    14.50
8597    49.90
8598    21.70
8599     0.00
8600    23.90
8601    55.50
8602     9.00
8603     4.20
8604    84.40
8605    53.00
8606    54.66
8607    28.24
8608     6.80
8609    10.40
8610     6.84
8611     8.20
8612    81.90
8613    43.00
8614    70.25
8615    26.00
8616    79.20
8617    78.50
Name: Water_(g), Length: 8618, dtype: float64

food_info["Energ_Kcal"]
Out[62]: 
0       717
1       717
2       876
3       353
4       371
5       334
6       300
7       376
8       406
9       387
10      394
11       98
12       97
13       72
14       81
15       72
16      342
17      357
18      264
19      389
20      466
21      356
22      413
23      327
24      373
25      300
26      318
27      254
28      301
29      368

8588    389
8589     91
8590     68
8591    465
8592    401
8593    429
8594     73
8595    179
8596    377
8597    280
8598    688
8599    884
8600    279
8601    257
8602    319
8603    356
8604     62
8605    179
8606    181
8607    287
8608    365
8609    351
8610    350
8611    370
8612     73
8613    305
8614    111
8615    269
8616     90
8617     89
Name: Energ_Kcal, Length: 8618, dtype: int64

water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]

water_energy
Out[64]: 
0       11378.79
1       11378.79
2         210.24
3       14970.73
4       15251.81
5       16172.28
6       15540.00
7       14769.28
8       15062.60
9       14570.55
10      15050.80
11       7819.42
12       7725.08
13       5832.72
14       6580.44
15       5938.56
16      18618.48
17      14836.92
18      14578.08
19      14750.88
20       6263.04
21      14759.76
22      13707.47
23      15833.34
24      15296.73
25      15003.00
26      15384.84
27      13660.12
28      13707.54
29      15371.36
  
8588      778.00
8589     6979.70
8590     5650.80
8591      604.50
8592     2005.00
8593     1201.20
8594     5956.80
8595    10668.40
8596     5466.50
8597    13972.00
8598    14929.60
8599        0.00
8600     6668.10
8601    14263.50
8602     2871.00
8603     1495.20
8604     5232.80
8605     9487.00
8606     9893.46
8607     8104.88
8608     2482.00
8609     3650.40
8610     2394.00
8611     3034.00
8612     5978.70
8613    13115.00
8614     7797.75
8615     6994.00
8616     7128.00
8617     6986.50
Length: 8618, dtype: float64
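
A two-row sketch of the same multiplication, using the values of rows 0 and 2 from the output above (the frame name toy is made up):

```python
import pandas as pd

# Rows 0 and 2 of the two columns shown above.
toy = pd.DataFrame({
    "Water_(g)": [15.87, 0.24],
    "Energ_Kcal": [717, 876],
})

# Values in the same row are multiplied together.
water_energy = toy["Water_(g)"] * toy["Energ_Kcal"]

print(water_energy)
```

Multiplying a float64 column by an int64 column gives a float64 result.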

Now look at the shape of food_info:

food_info.shape
Out[65]: (8618, 36)

So there are 8618 rows and 36 columns. How do we add a new column?
Example: the existing column "Iron_(mg)" is in mg; add a new column in g.

iron_grams = food_info["Iron_(mg)"] / 1000

food_info["Iron_(g)"] = iron_grams

food_info["Iron_(g)"]
Out[68]: 
0       0.00002
1       0.00016
2       0.00000
3       0.00031
4       0.00043
5       0.00050
6       0.00033
7       0.00064
8       0.00016
9       0.00021
10      0.00076
11      0.00007
12      0.00016
13      0.00015
14      0.00013
15      0.00014
16      0.00038
17      0.00044
18      0.00065
19      0.00023
20      0.00052
21      0.00024
22      0.00017
23      0.00013
24      0.00072
25      0.00044
26      0.00020
27      0.00022
28      0.00023
29      0.00041
  
8588    0.00900
8589    0.00030
8590    0.00010
8591    0.00163
8592    0.03482
8593    0.00228
8594    0.00017
8595    0.00017
8596    0.00486
8597    0.00025
8598    0.00023
8599    0.00013
8600    0.00011
8601    0.00068
8602    0.00783
8603    0.00311
8604    0.00030
8605    0.00018
8606    0.00080
8607    0.00004
8608    0.00387
8609    0.00005
8610    0.00038
8611    0.00520
8612    0.00150
8613    0.00140
8614    0.00058
8615    0.00360
8616    0.00350
8617    0.00140
Name: Iron_(g), Length: 8618, dtype: float64

Printing food_info.shape shows that it changed from (8618, 36) to (8618, 37): food_info gained one column.

food_info.shape
Out[69]: (8618, 37)
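
The same before-and-after check can be reproduced on a small made-up frame named toy:

```python
import pandas as pd

# Hypothetical frame with a single mg column.
toy = pd.DataFrame({"Iron_(mg)": [0.02, 0.16, 0.0]})
shape_before = toy.shape

# Assigning to a column name that does not exist yet appends a new column.
toy["Iron_(g)"] = toy["Iron_(mg)"] / 1000
shape_after = toy.shape

print(shape_before, shape_after)
```

Assigning to a name that already exists would overwrite that column instead of adding one.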

How do you find the maximum of a column?
Use food_info[column_name].max() to get the maximum value of a column.

max_calories=food_info["Energ_Kcal"].max()

max_calories
Out[71]: 902

Other related exercises:

normalized_calories = food_info["Energ_Kcal"] / max_calories
normalized_protein = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
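
Dividing a column by its own maximum rescales it so that the largest value becomes 1. A sketch on three made-up calorie values (the Series name calories is illustrative), assuming the values are not negative:

```python
import pandas as pd

# Three calorie values taken from the rows shown earlier.
calories = pd.Series([717, 876, 353], name="Energ_Kcal")

# Divide every element by the column maximum.
normalized = calories / calories.max()

print(normalized)
```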
