#2.1.3 Introduction To Pandas.md

1. Introduction to Pandas

One of the biggest advantages pandas has over NumPy is the ability to store mixed data types across columns. Many tabular datasets contain several data types. A pandas DataFrame lets each column keep its own type, whereas a regular NumPy array holds a single data type for all of its elements. Pandas also handles missing values gracefully: it represents them with NaN (the floating-point "not a number" value) and provides methods to detect and fill them. A common complaint about NumPy is that its standard arrays have no missing-value marker for non-float data, so people end up finding and replacing such values manually.
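A small sketch of both points, using a hypothetical three-row table (the names and numbers are illustrative only, not taken from food_info.csv). The DataFrame keeps a separate dtype per column and records the gap as NaN. A NumPy array built from the same mixed rows has to pick one dtype for everything, so it coerces every value to a string.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a string column, an integer column and a float column
# with one missing entry.
df = pd.DataFrame({
    "name": ["butter", "cheese", "milk"],
    "kcal": [717, 402, 42],
    "water_g": [15.87, np.nan, 88.13],
})

print(df.dtypes)                     # each column keeps its own dtype
print(df["water_g"].isna().sum())    # count missing values in one column
print(df["water_g"].fillna(0.0))     # replace NaN with 0.0

# The same mixed rows in a plain NumPy array: NumPy chooses a single common
# dtype, which here is a Unicode string type, so numbers become text.
arr = np.array([["butter", 717, 15.87], ["cheese", 402, np.nan]])
print(arr.dtype)
```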

2. Reading a CSV file: pandas.read_csv()

input
import pandas
food_info = pandas.read_csv("food_info.csv")
print(type(food_info))
output
<class 'pandas.core.frame.DataFrame'>
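If food_info.csv is not at hand, you can still try read_csv. The function accepts any file-like object as well as a path, so this sketch wraps a short CSV string in io.StringIO. The three columns and two rows are copied from the head() output below. The variable names csv_text and food_sample are only for illustration.

```python
import io
import pandas

# A tiny stand-in for food_info.csv (values copied from the head() output).
csv_text = """NDB_No,Shrt_Desc,Energ_Kcal
1001,BUTTER WITH SALT,717
1002,BUTTER WHIPPED WITH SALT,717
"""

# read_csv parses the text exactly as it would parse a file on disk.
food_sample = pandas.read_csv(io.StringIO(csv_text))
print(type(food_sample))   # <class 'pandas.core.frame.DataFrame'>
print(food_sample)
```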

3. Displaying the first few rows of a DataFrame: XXXX.head()

  • head() returns the first 5 rows of a DataFrame by default. Pass a number, e.g. head(3), to get a different count.
input

print(food_info.head(3))     # First 3 rows

output
   NDB_No                 Shrt_Desc  Water_(g)  Energ_Kcal  Protein_(g)  \
0    1001          BUTTER WITH SALT      15.87         717         0.85   
1    1002  BUTTER WHIPPED WITH SALT      15.87         717         0.85   
2    1003      BUTTER OIL ANHYDROUS       0.24         876         0.28   

   Lipid_Tot_(g)  Ash_(g)  Carbohydrt_(g)  Fiber_TD_(g)  Sugar_Tot_(g)  \
0          81.11     2.11            0.06           0.0           0.06   
1          81.11     2.11            0.06           0.0           0.06   
2          99.48     0.00            0.00           0.0           0.00   

        ...        Vit_A_IU  Vit_A_RAE  Vit_E_(mg)  Vit_D_mcg  Vit_D_IU  \
0       ...          2499.0      684.0        2.32        1.5      60.0   
1       ...          2499.0      684.0        2.32        1.5      60.0   
2       ...          3069.0      840.0        2.80        1.8      73.0   

   Vit_K_(mcg)  FA_Sat_(g)  FA_Mono_(g)  FA_Poly_(g)  Cholestrl_(mg)  
0          7.0      51.368       21.021        3.043           215.0  
1          7.0      50.489       23.426        3.012           219.0  
2          8.6      61.924       28.732        3.694           256.0  

[3 rows x 36 columns]
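The sketch below shows the default and an explicit n on a hypothetical 10-row frame. It also shows the mirror method tail(), which returns rows from the end.

```python
import pandas as pd

# Hypothetical 10-row frame used only to show how head() and tail() slice rows.
df = pd.DataFrame({"row": range(10)})

print(df.head())    # first 5 rows (default n=5)
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
```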

4. Dimensions and row/column counts: XXXX.shape

input
dimensions = food_info.shape 
print(dimensions)
output
(8618, 36)
input
num_rows = dimensions[0]     # The number of rows, 8618.
print(num_rows)
num_cols = dimensions[1]     # The number of columns, 36.
print(num_cols)
output
8618
36
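Because shape is always a (rows, columns) tuple, you can unpack it in one step instead of indexing [0] and [1]. Here is a sketch on a hypothetical 4x2 frame. It also notes the len() equivalents.

```python
import pandas as pd

# Hypothetical 4x2 frame.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5.0, 6.0, 7.0, 8.0]})

num_rows, num_cols = df.shape   # tuple unpacking
print(num_rows, num_cols)       # 4 2

# len(df) gives the row count; len(df.columns) gives the column count.
assert len(df) == num_rows
assert len(df.columns) == num_cols
```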

5. Selecting rows from a DataFrame: XXXX.loc[N]

input
num_rows = food_info.shape[0]                      # number of rows
last_rows = food_info.loc[num_rows-5:num_rows-1]   # last 5 rows (a loc slice includes its end label)
print(last_rows)
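The loc slice above works because read_csv assigns a default integer index 0..n-1, so row labels coincide with positions. The sketch below uses a hypothetical 10-row frame to compare three ways of getting the last 5 rows. The label-based loc slice includes its end label. The position-based iloc slice, like ordinary Python slicing, excludes its end. The shortcut tail(5) returns the same rows.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
df = pd.DataFrame({"value": range(10, 20)})
n = df.shape[0]

last_loc = df.loc[n - 5:n - 1]   # labels 5..9, end label included
last_iloc = df.iloc[-5:]         # positions 5..9, Python-style slice
last_tail = df.tail(5)           # shortcut for the last 5 rows

# A single label returns one row as a Series indexed by column name.
print(df.loc[3])
```

If the index is not the default (for example, after filtering or after set_index), iloc or tail is the safer choice, because loc matches labels, not positions.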

6. Checking column data types: XXXX.dtypes

  • object - for representing string values (and other Python objects).
  • int - for representing integer values (e.g. int64).
  • float - for representing floating-point values (e.g. float64).
  • datetime - for representing date and time values (datetime64).
  • bool - for representing Boolean values.
input
print(food_info.dtypes)
output
NDB_No               int64
Shrt_Desc           object
Water_(g)          float64
Energ_Kcal           int64
Protein_(g)        float64
Lipid_Tot_(g)      float64
Ash_(g)            float64
Carbohydrt_(g)     float64
Fiber_TD_(g)       float64
Sugar_Tot_(g)      float64
Calcium_(mg)       float64
Iron_(mg)          float64
Magnesium_(mg)     float64
Phosphorus_(mg)    float64
Potassium_(mg)     float64
Sodium_(mg)        float64
Zinc_(mg)          float64
Copper_(mg)        float64
Manganese_(mg)     float64
Selenium_(mcg)     float64
Vit_C_(mg)         float64
Thiamin_(mg)       float64
Riboflavin_(mg)    float64
Niacin_(mg)        float64
Vit_B6_(mg)        float64
Vit_B12_(mcg)      float64
Vit_A_IU           float64
Vit_A_RAE          float64
Vit_E_(mg)         float64
Vit_D_mcg          float64
Vit_D_IU           float64
Vit_K_(mcg)        float64
FA_Sat_(g)         float64
FA_Mono_(g)        float64
FA_Poly_(g)        float64
Cholestrl_(mg)     float64
dtype: object
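The sketch below builds a hypothetical frame with one column of each type from the list above, then uses select_dtypes to keep only some of them. Passing include="number" keeps integer and float columns. Boolean columns are not counted as numeric.

```python
import pandas as pd

# Hypothetical frame with one column per dtype family.
df = pd.DataFrame({
    "name": ["butter", "milk"],                             # object (string)
    "kcal": [717, 42],                                      # integer
    "water_g": [15.87, 88.13],                              # float
    "is_dairy": [True, True],                               # bool
    "updated": pd.to_datetime(["2020-01-01", "2020-06-30"]),  # datetime64
})

print(df.dtypes)

# Keep only the numeric (integer and float) columns.
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())   # ['kcal', 'water_g']
```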

7. Selecting columns: XXXX['XXXX']

input
# A single column name returns a Series
saturated_fat = food_info['FA_Sat_(g)']
cholesterol = food_info['Cholestrl_(mg)']

# A list of column names returns a DataFrame
columns = ["Zinc_(mg)", "Copper_(mg)"]
zinc_copper = food_info[columns]
selenium_thiamin = food_info[['Selenium_(mcg)', 'Thiamin_(mg)']]
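The result type depends on what goes inside the brackets: a bare name or a list. This sketch checks that on a hypothetical two-column frame that reuses two food_info column names with made-up values. A list holding a single name still produces a one-column DataFrame.

```python
import pandas as pd

# Hypothetical frame; column names borrowed from food_info, values invented.
df = pd.DataFrame({"Zinc_(mg)": [0.09, 0.05], "Copper_(mg)": [0.0, 0.016]})

zinc = df["Zinc_(mg)"]                    # one name -> Series
zinc_df = df[["Zinc_(mg)"]]               # list with one name -> DataFrame
both = df[["Zinc_(mg)", "Copper_(mg)"]]   # list of names -> DataFrame

print(type(zinc).__name__)      # Series
print(type(zinc_df).__name__)   # DataFrame
print(both.shape)               # (2, 2)
```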

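8. Selecting columns that match a name pattern

  • XXXX.tolist(): here, converts the column Index into a Python list
  • XXXX.endswith('YY'): returns True if the string ends with 'YY'
  • XXXX.startswith('YY'): returns True if the string starts with 'YY'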
input
col_names = food_info.columns.tolist()   # column names as a plain list
gram_columns = []
for c in col_names:
    if c.endswith('(g)'):                # keep columns measured in grams
        gram_columns.append(c)
gram_df = food_info[gram_columns]
print(gram_df)
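The same selection can be written without an explicit loop. The sketch below uses a hypothetical empty frame that only carries a few food_info column names. It shows two alternatives. The first is the vectorized string method Index.str.endswith, whose boolean result is used as a column mask. The second is DataFrame.filter with a regular expression. In the regex the parentheses are escaped because they are metacharacters.

```python
import pandas as pd

# Hypothetical frame: only the column names matter here.
df = pd.DataFrame(columns=["NDB_No", "Water_(g)", "Protein_(g)", "Calcium_(mg)", "FA_Sat_(g)"])

# Option 1: boolean mask from a vectorized string method on the column Index.
gram_mask = df.columns.str.endswith("(g)")
gram_df = df.loc[:, gram_mask]

# Option 2: keep columns whose names match a regex (names ending in "(g)").
gram_df2 = df.filter(regex=r"\(g\)$")

print(gram_df.columns.tolist())   # ['Water_(g)', 'Protein_(g)', 'FA_Sat_(g)']
```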
