PySpark study notes (1): changing a column's dtype

First, inspect the columns:

df.printSchema()
root
 |-- Id: string (nullable = true)
 |-- groupId: string (nullable = true)
 |-- matchId: string (nullable = true)
 |-- assists: string (nullable = true)
 |-- boosts: string (nullable = true)
 |-- damageDealt: string (nullable = true)
 |-- DBNOs: string (nullable = true)
 |-- headshotKills: string (nullable = true)
 |-- heals: string (nullable = true)
 |-- killPlace: string (nullable = true)
 |-- killPoints: string (nullable = true)
 |-- kills: string (nullable = true)
 |-- killStreaks: string (nullable = true)
 |-- longestKill: string (nullable = true)
 |-- maxPlace: string (nullable = true)
 |-- numGroups: string (nullable = true)
 |-- revives: string (nullable = true)
 |-- rideDistance: string (nullable = true)
 |-- roadKills: string (nullable = true)
 |-- swimDistance: string (nullable = true)
 |-- teamKills: string (nullable = true)
 |-- vehicleDestroys: string (nullable = true)
 |-- walkDistance: string (nullable = true)
 |-- weaponsAcquired: string (nullable = true)
 |-- winPoints: string (nullable = true)
 |-- winPlacePerc: string (nullable = true)

The dtype of kills is string.

Following the official docs, try to change it:

df.kills.astype("int")
Out[29]: Column

Checking the column's dtype again, nothing changed:

df.select("kills").dtypes
Out[34]: [('kills', 'string')]

A working approach: astype (an alias for cast) returns a new Column rather than modifying the DataFrame in place, so assign the result back with withColumn:

df = df.withColumn("kills", df.kills.astype("int"))
df.select("kills").dtypes
Out[36]: [('kills', 'int')]

Success.
