1. Introduction to Pandas

One of the biggest advantages pandas has over NumPy is the ability to store mixed data types in rows and columns. Many tabular datasets contain a range of data types. pandas DataFrames handle mixed types effortlessly because each column keeps its own dtype, whereas a NumPy array holds a single dtype for all of its elements. pandas DataFrames also handle missing values gracefully, representing them as NaN ("not a number"). A common complaint with NumPy is that it lacks a general-purpose missing-value marker (NaN is a floating-point value, so it cannot be stored in an integer array), and people end up having to find and replace these values manually.
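A minimal sketch of both points, using a small made-up DataFrame rather than food_info.csv. Each pandas column keeps its own dtype and records the missing value as NaN, while a NumPy array coerces everything to one common dtype.

```python
import numpy as np
import pandas

# Made-up table: one text column and one numeric column with a missing value
df = pandas.DataFrame({
    "name": ["butter", "oil", "cheese"],
    "kcal": [717, 876, np.nan],   # np.nan marks the missing value
})
print(df.dtypes)                       # each column keeps its own dtype; kcal is float64
print(df["kcal"].isnull().tolist())    # [False, False, True]

# The same kind of data squeezed into one NumPy array gets a single common dtype
arr = np.array([["butter", 717], ["oil", 876]])
print(arr.dtype.kind)                  # U (a Unicode string dtype): the numbers became strings
```

Note that the integer column became float64 once NaN was added, which is exactly the limitation of NaN being a float value.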

2. Reading a CSV file: pandas.read_csv()

input

import pandas
food_info = pandas.read_csv("food_info.csv")
print(type(food_info))

output

<class 'pandas.core.frame.DataFrame'>
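Since food_info.csv itself is not included here, the following is a self-contained sketch that feeds read_csv an in-memory string instead of a file path. The three rows are taken from the head() output in section 3, trimmed to three columns; the name csv_text is only for illustration.

```python
import io
import pandas

# A tiny in-memory stand-in for food_info.csv (3 rows, 3 columns)
csv_text = """NDB_No,Shrt_Desc,Water_(g)
1001,BUTTER WITH SALT,15.87
1002,BUTTER WHIPPED WITH SALT,15.87
1003,BUTTER OIL ANHYDROUS,0.24
"""
# read_csv accepts a file path or any file-like object; the first line becomes the column names
sample = pandas.read_csv(io.StringIO(csv_text))
print(type(sample))    # <class 'pandas.core.frame.DataFrame'>
print(sample.shape)    # (3, 3)
```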

3. Displaying the first few rows of a DataFrame: XXXX.head()

By default, the DataFrame method head() returns the first 5 rows; pass a number to get a different count, e.g. head(3) for the first 3 rows.

input


print(food_info.head(3))     # First 3 rows

output

   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
0    1001          BUTTER WITH SALT      15.87         717         0.85   
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85   
2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
0          81.11     2.11            0.06           0.0           0.06   
1          81.11     2.11            0.06           0.0           0.06   
2          99.48     0.00            0.00           0.0           0.00   

        ...        Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
0       ...          2499.0      684.0        2.32        1.5      60.0   
1       ...          2499.0      684.0        2.32        1.5      60.0   
2       ...          3069.0      840.0        2.80        1.8      73.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
0          7.0      51.368       21.021        3.043           215.0  
1          7.0      50.489       23.426        3.012           219.0  
2          8.6      61.924       28.732        3.694           256.0  

[3 rows x 36 columns]
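A quick sketch of the default versus an explicit row count, on a made-up 8-row DataFrame rather than food_info:

```python
import pandas

# Made-up 8-row DataFrame used only to illustrate head()
df = pandas.DataFrame({"x": range(8)})

print(len(df.head()))             # 5: the default is n=5
print(df.head(3)["x"].tolist())   # [0, 1, 2]
```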

4. Displaying the dimensions (rows and columns): XXXX.shape

input

dimensions = food_info.shape 
print(dimensions)

output

(8618, 36)

input

num_rows = dimensions[0]     # The number of rows, 8618.
print(num_rows)
num_cols = dimensions[1]     # The number of columns, 36.
print(num_cols)

output

8618
36
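Because shape is a (rows, columns) tuple, it can also be unpacked in one step. A minimal sketch on a made-up 4x2 DataFrame:

```python
import pandas

# Made-up 4x2 DataFrame to illustrate .shape
df = pandas.DataFrame({"a": [1, 2, 3, 4], "b": [5.0, 6.0, 7.0, 8.0]})

num_rows, num_cols = df.shape    # unpack the (rows, columns) tuple
print(num_rows, num_cols)        # 4 2
print(len(df) == num_rows)       # True: len() of a DataFrame counts rows
```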

5. Selecting rows from a DataFrame: XXXX.loc[N]

input

num_rows = food_info.shape[0]                        # number of rows
last_rows = food_info.loc[num_rows-5:num_rows-1]     # last 5 rows; a .loc slice includes both end labels
print(last_rows)
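A sketch of single-row and slice selection with .loc, on a made-up 10-row DataFrame. It assumes the default 0-based integer index, which is what read_csv gives food_info; with a different index, .loc labels no longer match positions, and iloc or tail is the safer choice.

```python
import pandas

# Made-up 10-row DataFrame with the default 0..9 integer index
df = pandas.DataFrame({"kcal": [10 * i for i in range(10)]})

row = df.loc[3]                      # a single label -> one row, returned as a Series
print(row["kcal"])                   # 30

n = df.shape[0]
last_rows = df.loc[n - 5:n - 1]      # label slice: both endpoints are included
print(last_rows.index.tolist())      # [5, 6, 7, 8, 9]
print(last_rows.equals(df.tail(5)))  # True: tail(5) gives the same rows
```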

6. Checking data types: XXXX.dtypes

The common pandas column types are:

object - for representing string values (and other mixed Python objects).
int64 - for representing integer values.
float64 - for representing floating-point values.
datetime64 - for representing time values.
bool - for representing Boolean values.

input

print(food_info.dtypes)

output

NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object
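A minimal sketch with one column of each common type, on made-up data. select_dtypes is used here to pick columns by type; note that the text column's dtype name may differ between pandas versions (object in pandas 2.x).

```python
import pandas

# Made-up DataFrame with one column of each common dtype
df = pandas.DataFrame({
    "count": [1, 2],                                           # int64
    "weight": [0.5, 1.25],                                     # float64
    "flag": [True, False],                                     # bool
    "when": pandas.to_datetime(["2020-01-01", "2020-06-30"]),  # datetime64
    "label": ["a", "b"],                                       # object in pandas 2.x
})
print(df.dtypes)

# select_dtypes picks columns by type; "number" excludes bool and datetime columns
print(df.select_dtypes(include="number").columns.tolist())    # ['count', 'weight']
```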

7. Selecting columns: XXXX['XXXX']

input

saturated_fat = food_info['FA_Sat_(g)']          # a single column name returns a Series
cholesterol = food_info['Cholestrl_(mg)']

columns = ["Zinc_(mg)", "Copper_(mg)"]
zinc_copper = food_info[columns]                 # a list of names returns a DataFrame
selenium_thiamin = food_info[['Selenium_(mcg)', 'Thiamin_(mg)']]
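The difference between a single name and a list of names shows up in the returned type. A sketch on a made-up two-column DataFrame that reuses two column names from food_info:

```python
import pandas

# Made-up values; only the column names come from food_info
df = pandas.DataFrame({"Zinc_(mg)": [1.0, 2.0], "Copper_(mg)": [3.0, 4.0]})

one = df["Zinc_(mg)"]                       # single name -> Series
many = df[["Zinc_(mg)", "Copper_(mg)"]]     # list of names -> DataFrame
print(type(one).__name__, type(many).__name__)   # Series DataFrame
print(df[["Zinc_(mg)"]].shape)              # (2, 1): a one-item list still gives a DataFrame
```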

8. Selecting specific columns

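XXXX.tolist(): in this example, converts the column index (the column labels) into a list
XXXX.endswith('YY'): returns True if the string ends with 'YY', otherwise False
XXXX.startswith('YY'): returns True if the string starts with 'YY', otherwise False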

input

col_names = food_info.columns.tolist()
gram_columns = []
for c in col_names:
    if c.endswith('(g)'):
        gram_columns.append(c)
gram_df = food_info[gram_columns]
print(gram_df)
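The loop above can also be written as a list comprehension, or replaced by the vectorized string method on the column index. A sketch of both on a made-up DataFrame with gram and milligram columns:

```python
import pandas

# Made-up DataFrame mixing gram and milligram columns
df = pandas.DataFrame({
    "Water_(g)": [1.0], "Protein_(g)": [2.0],
    "Sodium_(mg)": [3.0], "Zinc_(mg)": [4.0],
})

# 1) list comprehension: same logic as the loop, in one line
gram_columns = [c for c in df.columns if c.endswith("(g)")]

# 2) vectorized: Index.str.endswith returns a boolean mask over the column labels
gram_df = df.loc[:, df.columns.str.endswith("(g)")]

print(gram_columns)               # ['Water_(g)', 'Protein_(g)']
print(gram_df.columns.tolist())   # ['Water_(g)', 'Protein_(g)']
```

The vectorized form avoids building the intermediate list by hand and keeps the columns in their original order.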
