#2.1.4 Data Manipulation with pandas.md

1. Normalization: column / maximum value

While there are many ways to normalize data, one of the simplest is to divide every value in a column by that column's maximum value. Provided the column contains no negative values and its maximum is greater than zero, the normalized values range from 0 to 1. To calculate the maximum value of a column, we use the Series.max() method.

input
max_protein = food_info["Protein_(g)"].max()
normalized_protein = food_info["Protein_(g)"] / max_protein
print(normalized_protein.head(5))

output

0 0.009624 
1 0.009624 
2 0.003170 
3 0.242301 
4 0.263134 
Name: Protein_(g), dtype: float64
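The snippet above assumes food_info is already loaded. As a minimal self-contained sketch, the same max-based normalization can be tried on a small stand-in DataFrame; the four protein values below are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for food_info; the values are invented for illustration.
food_info = pd.DataFrame({"Protein_(g)": [0.0, 5.0, 10.0, 20.0]})

# Series.max() returns the largest value in the column (20.0 here).
max_protein = food_info["Protein_(g)"].max()

# Dividing a Series by a scalar divides every element by that scalar.
normalized_protein = food_info["Protein_(g)"] / max_protein

print(normalized_protein.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

The row holding the column maximum always maps to 1.0, and the other rows scale proportionally.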

2. Adding and subtracting columns

food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
food_info["Normalized_Fat"] = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
food_info["Norm_Nutr_Index"] = 2*food_info["Normalized_Protein"]  + (-0.75*food_info["Normalized_Fat"])
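Arithmetic between columns is applied element by element, row for row. The sketch below reproduces the weighted index from the lines above on a hypothetical three-row DataFrame (the numbers are invented so the arithmetic is easy to check by hand).

```python
import pandas as pd

# Hypothetical stand-in for food_info; values chosen for easy mental arithmetic.
food_info = pd.DataFrame({
    "Protein_(g)": [10.0, 20.0, 5.0],
    "Lipid_Tot_(g)": [40.0, 10.0, 20.0],
})

# Normalize each column by its own maximum.
food_info["Normalized_Protein"] = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
food_info["Normalized_Fat"] = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()

# Combine the two columns row by row: reward protein, penalize fat.
food_info["Norm_Nutr_Index"] = (
    2 * food_info["Normalized_Protein"] - 0.75 * food_info["Normalized_Fat"]
)

print(food_info["Norm_Nutr_Index"].tolist())  # [0.25, 1.8125, 0.125]
```

For the first row, for instance, the index is 2 * 0.5 - 0.75 * 1.0 = 0.25.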

3. Creating a new column

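# normalized_protein comes from section 1; normalized_fat is computed the same way
normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
food_info["Normalized_Protein"] = normalized_protein
food_info["Normalized_Fat"] = normalized_fat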

4. Sorting in ascending or descending order: DataFrame.sort_values('YY', ascending=True)

food_info.sort_values("Norm_Nutr_Index", inplace=True, ascending=False)
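  • inplace=True: no new object is created; the original object is modified directly, and the call returns None.
  • inplace=False (the default): the original object is left unchanged, and a new sorted object is returned.
  • ascending:
    Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of the by argument.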
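The difference between the two inplace settings, and the list form of ascending, can be sketched as follows. The DataFrame and its Name and Group columns are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical data; Name and Group are illustrative columns, not from food_info.
df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Group": [1, 2, 1, 2],
    "Norm_Nutr_Index": [0.25, 1.8125, 0.125, 0.5],
})

# inplace=False (default): a new sorted DataFrame is returned, df keeps its order.
sorted_copy = df.sort_values("Norm_Nutr_Index", ascending=False)
print(sorted_copy["Name"].tolist())  # ['b', 'd', 'a', 'c']
print(df["Name"].tolist())           # ['a', 'b', 'c', 'd']

# A list of bools gives one sort direction per column named in `by`:
# Group ascending, then Norm_Nutr_Index descending within each group.
multi = df.sort_values(["Group", "Norm_Nutr_Index"], ascending=[True, False])
print(multi["Name"].tolist())        # ['a', 'c', 'b', 'd']

# inplace=True: df itself is reordered and the call returns None.
result = df.sort_values("Norm_Nutr_Index", ascending=False, inplace=True)
print(result)                        # None
print(df["Name"].tolist())           # ['b', 'd', 'a', 'c']
```

Because the inplace=True call returns None, assigning its result back to a variable (for example df = df.sort_values(..., inplace=True)) would discard the DataFrame.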
