pandas exercise: the apply function

Step 1. Read the data and name it `student`.

```
>>> import pandas as pd, numpy as np
>>> student = pd.read_csv('D:/Data/PythonPractise/pandasdata/student-mat.csv')
>>> student.head()
  school sex  age address famsize Pstatus  ...  Walc  health  absences  G1  G2  G3
0     GP   F   18       U     GT3       A  ...     1       3         6   5   6   6
1     GP   F   17       U     GT3       T  ...     1       3         4   5   5   6
2     GP   F   15       U     LE3       T  ...     3       3        10   7   8  10
3     GP   F   15       U     GT3       T  ...     1       5         2  15  14  15
4     GP   F   16       U     GT3       T  ...     2       5         4   6  10  10

[5 rows x 33 columns]
```

Step 2. Slice the columns from 'school' to 'guardian'. A `.loc` label slice includes its end label, so 'guardian' is kept. Calling `.copy()` makes `stud_alcoh` an independent DataFrame, so the column assignments in later steps do not trigger a `SettingWithCopyWarning`.

```
>>> stud_alcoh = student.loc[:, 'school':'guardian'].copy()
>>> stud_alcoh.head()
  school sex  age address famsize  ...  Fedu     Mjob      Fjob  reason guardian
0     GP   F   18       U     GT3  ...     4  at_home   teacher  course   mother
1     GP   F   17       U     GT3  ...     1  at_home     other  course   father
2     GP   F   15       U     LE3  ...     1  at_home     other   other   mother
3     GP   F   15       U     GT3  ...     2   health  services    home   mother
4     GP   F   16       U     GT3  ...     3    other     other    home   father
```

Step 3. Create a lambda function that capitalizes strings (converts them to upper case).

```
>>> captalizer = lambda x: x.upper()
```

Step 4. Make the values in the 'Fjob' column upper case.

```
>>> stud_alcoh['Fjob'].apply(captalizer)
0     TEACHER
1       OTHER
2       OTHER
3    SERVICES
4       OTHER
...
```

Step 5. Print the last few rows of the dataset.

```
>>> stud_alcoh.tail()
    school sex  age address famsize  ...  Fedu      Mjob      Fjob  reason guardian
390     MS   M   20       U     LE3  ...     2  services  services  course    other
391     MS   M   17       U     LE3  ...     1  services  services  course   mother
392     MS   M   21       R     GT3  ...     1     other     other  course    other
393     MS   M   18       R     LE3  ...     2  services     other  course   mother
394     MS   M   19       U     LE3  ...     1     other   at_home  course   father

[5 rows x 12 columns]
```

Step 6. Notice that the DataFrame is still lower case: step 4 only returned a new Series and did not modify `stud_alcoh`. Fix this by assigning the result back to the column.

```
>>> stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)
>>> stud_alcoh.tail()
    school sex  age address famsize  ...  Fedu      Mjob      Fjob  reason guardian
390     MS   M   20       U     LE3  ...     2  services  SERVICES  course    other
391     MS   M   17       U     LE3  ...     1  services  SERVICES  course   mother
392     MS   M   21       R     GT3  ...     1     other     OTHER  course    other
393     MS   M   18       R     LE3  ...     2  services     OTHER  course   mother
394     MS   M   19       U     LE3  ...     1     other   AT_HOME  course   father
```

Step 7. Create a function named `majority` that returns a boolean, and store its results in a new column named `legal_drinker` (True when the student is older than 17, i.e. has reached the age of majority).

```
>>> def majority(x):
...     if x > 17:
...         return True
...     else:
...         return False
...
>>> stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)
>>> stud_alcoh.head()
  school sex  age address  ...      Fjob  reason guardian  legal_drinker
0     GP   F   18       U  ...   TEACHER  course   mother           True
1     GP   F   17       U  ...     OTHER  course   father          False
2     GP   F   15       U  ...     OTHER   other   mother          False
3     GP   F   15       U  ...  SERVICES    home   mother          False
4     GP   F   16       U  ...     OTHER    home   father          False

[5 rows x 13 columns]
```

Step 8. Multiply every number in the dataset by 10.

```
>>> def times10(x):
...     if type(x) is int:
...         return 10 * x
...     return x
...
>>> stud_alcoh.applymap(times10).head(10)
  school sex  age address  ...      Fjob      reason guardian  legal_drinker
0     GP   F  180       U  ...   TEACHER      course   mother           True
1     GP   F  170       U  ...     OTHER      course   father          False
2     GP   F  150       U  ...     OTHER       other   mother          False
3     GP   F  150       U  ...  SERVICES        home   mother          False
4     GP   F  160       U  ...     OTHER        home   father          False
5     GP   M  160       U  ...     OTHER  reputation   mother          False
6     GP   M  160       U  ...     OTHER        home   mother          False
7     GP   F  170       U  ...   TEACHER        home   mother          False
8     GP   M  150       U  ...     OTHER        home   mother          False
9     GP   M  150       U  ...     OTHER        home   mother          False

[10 rows x 13 columns]
```

Only the integer columns (such as `age`) change. Strings are returned unchanged, and the booleans in `legal_drinker` are skipped because `type(True)` is `bool`, not `int`. In pandas 2.1 and later, `applymap` is deprecated in favor of the equivalent `DataFrame.map`.

# Extension: the difference between apply, applymap and map

1. Use `apply()` to operate on a DataFrame row by row or column by column. (On a Series, as in steps 4 and 7, `apply()` calls the function on each element.)
2. Use `applymap()` to operate on every individual element of a DataFrame; the result is a DataFrame.
3. Use `map()` to operate on every element of a Series.

For example, step 4 above, which upper-cased every value in the Fjob column, can also be done with `map`:

```
>>> stud_alcoh.Fjob.map(captalizer)
```

Exercise: convert DataFrame columns from str to int. Here both columns are converted at once with `applymap(int)`.

```
>>> data1 = pd.DataFrame({'a':['1','2','3'],'b':['4','5','6']})
>>> data1
   a  b
0  1  4
1  2  5
2  3  6
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null object
b    3 non-null object
dtypes: object(2)
memory usage: 128.0+ bytes
>>> data1 = data1.applymap(int)
>>> data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
a    3 non-null int64
b    3 non-null int64
dtypes: int64(2)
memory usage: 128.0 bytes
```
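For steps 1 and 2, here is a minimal sketch of the label-based column slice on a small hypothetical DataFrame. The values and the extra `traveltime` column are made up and were not read from student-mat.csv. It shows that a `.loc` slice keeps its end label, and that after `.copy()` a change to the slice leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for student-mat.csv (made-up rows, reduced columns)
student = pd.DataFrame(
    [["GP", "F", 18, "mother", 1], ["GP", "F", 17, "father", 1]],
    columns=["school", "sex", "age", "guardian", "traveltime"],
)

# .loc slices by label and, unlike positional slicing, includes the end label
stud_alcoh = student.loc[:, "school":"guardian"].copy()

# Because of .copy(), modifying the slice does not modify `student`
stud_alcoh["sex"] = stud_alcoh["sex"].str.lower()

print(stud_alcoh.columns.tolist())
```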
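Steps 4 and 7 can also be written without `apply`, using pandas' vectorized operations: the `.str.upper()` string accessor and a plain comparison. This is a swapped-in alternative rather than the technique the exercise is practising, shown as a sketch on a small hypothetical stand-in for `stud_alcoh`.

```python
import pandas as pd

# Hypothetical stand-in for stud_alcoh (not the real student-mat.csv data)
stud_alcoh = pd.DataFrame({
    "age": [18, 17, 15, 15, 16],
    "Fjob": ["teacher", "other", "other", "services", "other"],
})

# Step 4 without apply: the .str accessor upper-cases every string in the column
stud_alcoh["Fjob"] = stud_alcoh["Fjob"].str.upper()

# Step 7 without apply: the comparison is evaluated element-wise and yields a bool Series
stud_alcoh["legal_drinker"] = stud_alcoh["age"] > 17

print(stud_alcoh)
```

The comparison in particular runs as a single NumPy operation over the whole column instead of one Python function call per row.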
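In step 8, the check `type(x) is int` skips booleans only because `type(True)` is `bool`. The sketch below makes that exclusion explicit and also accepts NumPy integer scalars, in case values arrive as `np.int64` rather than Python `int`. It picks `DataFrame.map` on pandas 2.1+ and falls back to `applymap` on older versions; the small DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

def times10(x):
    # bool is a subclass of int, so rule it out before the integer check
    if isinstance(x, bool):
        return x
    # accept both Python ints and NumPy integer scalars such as np.int64
    if isinstance(x, (int, np.integer)):
        return 10 * x
    # strings and any other values pass through unchanged
    return x

# Hypothetical sample with a string, an integer and a boolean column
df = pd.DataFrame({
    "school": ["GP", "GP"],
    "age": [18, 17],
    "legal_drinker": [True, False],
})

# DataFrame.map is the pandas >= 2.1 name; older versions only have applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(times10)
print(result)
```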
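To make the three-way distinction in the extension section concrete, here is a small sketch on a hypothetical two-column DataFrame of grades. It assumes nothing about your pandas version beyond using `hasattr` to choose between `DataFrame.map` (2.1+) and `applymap`.

```python
import pandas as pd

# Hypothetical grade columns
df = pd.DataFrame({"G1": [5, 5, 7], "G2": [6, 5, 8]})

# DataFrame.apply: the function receives a whole column (axis=0, the default) ...
col_max = df.apply(lambda col: col.max())
# ... or a whole row when axis=1
row_sum = df.apply(lambda row: row["G1"] + row["G2"], axis=1)

# Element-wise on a DataFrame: applymap, renamed DataFrame.map in pandas 2.1
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)

# Series.map: element-wise on one Series; it also accepts a dict as a lookup table.
# Values missing from the dict ("other" here) become NaN.
labels = pd.Series(["at_home", "teacher", "other"]).map(
    {"at_home": "home", "teacher": "school"}
)

print(col_max, row_sum, doubled, labels, sep="\n\n")
```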
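For the str-to-int exercise, `applymap(int)` calls `int()` once per cell. Here is a sketch of some alternatives on the same `data1`: converting a single column with `Series.map`, converting every column with `astype(int)`, and `pd.to_numeric(..., errors="coerce")`, which turns unparseable strings into NaN instead of raising. The `messy` Series is a hypothetical example.

```python
import pandas as pd

data1 = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", "5", "6"]})

# Convert only one column: Series.map(int) calls int() on each string in 'a'
one_col = data1.copy()
one_col["a"] = one_col["a"].map(int)

# Convert every column at once with astype
all_cols = data1.astype(int)

# pd.to_numeric with errors="coerce" maps strings it cannot parse to NaN
messy = pd.Series(["7", "8", "x"])
parsed = pd.to_numeric(messy, errors="coerce")

print(one_col.dtypes, all_cols.dtypes, parsed, sep="\n\n")
```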
