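The main ways to change the dtype of DataFrame columns are:
1) Series/df.astype('float64'): the most frequently used (works on both DataFrame and Series)
2) Series/df.infer_objects(): converts 'object' columns to 'float64/int...' dtypes where possible (works on both DataFrame and Series)
3) The older predecessor of infer_objects(): Series/df.convert_objects(convert_numeric=True). Not recommended; it is deprecated and was removed in pandas 1.0.
(Difference between old and new: given a 200-row object column with 196 numeric rows and 3 rows of "?", the old method converts the numeric rows from object to float64; the new method does not convert the column and leaves it as object, as horsepower shows in the Method 2 output.)
4) For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric (I have not used to_numeric yet).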
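For the mixed column described in item 3, pd.to_numeric with errors='coerce' is one way to get behaviour close to the old convert_objects(convert_numeric=True). The following is a minimal sketch; the small Series is made-up sample data standing in for a column such as horsepower, not the real cars data.

```python
import pandas as pd

# Hypothetical sample: an object column of numbers with one "?" placeholder
s = pd.Series([130, 165, "?", 150], dtype="object")

# errors="coerce" turns anything that cannot be parsed into NaN
converted = pd.to_numeric(s, errors="coerce")

print(converted.dtype)
print(converted.isna().sum())
```

Because NaN is a float, the result is float64 even when every valid value is an integer.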
help() output:
a) infer_objects(self): Attempt to infer better dtypes for object columns.
The inference rules are the same as during normal Series/DataFrame construction
help(pd.DataFrame.infer_objects)
b) astype(self, dtype, copy=True, errors='raise', **kwargs): Cast a pandas object to a specified dtype ``dtype``. (Series/DataFrame )
help(pd.DataFrame.astype)
c) to_numeric(arg, errors='raise', downcast=None): Convert argument to a numeric type.
Parameters: arg : list, tuple, 1-d array, or Series (a DataFrame is not accepted)
help(pd.to_numeric)
d) ........
pandas.DataFrame.astype : Cast argument to a specified dtype.
pandas.to_datetime : Convert argument to datetime.
pandas.to_timedelta : Convert argument to timedelta.
numpy.ndarray.astype : Cast a numpy array to a specified type. ........
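Since to_numeric only takes 1-d input, a common workaround for several columns is to apply it column by column with DataFrame.apply. This is a sketch under assumptions: the frame and its column names a and b are invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose columns hold numbers stored as strings
df = pd.DataFrame(
    {"a": ["1", "2", "x"], "b": ["1.5", "2.5", "3.5"]},
    dtype="object",
)

# apply() calls pd.to_numeric once per column; extra keyword
# arguments (here errors="coerce") are forwarded to it
out = df.apply(pd.to_numeric, errors="coerce")

print(out.dtypes)
```

The unparseable "x" in column a becomes NaN, so both columns end up as float64.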
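import pandas as pd
# Load the Excel file. Watch out for escape characters in the path; prefixing it with r'' is safest. .parse('Sheet1') reads the data from that sheet.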
cars2_xlsx = pd.ExcelFile(r'C:\Users\admin\Desktop\cars2.xlsx')
cars = cars2_xlsx.parse('Sheet1')
cars.info()
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg 200 non-null float64
cylinders 200 non-null int64
displacement 200 non-null int64
horsepower 200 non-null object
weight 200 non-null int64
acceleration 200 non-null float64
model 200 non-null int64
origin 200 non-null int64
car 200 non-null object
dtypes: float64(2), int64(5), object(2)
memory usage: 14.1+ KB
Method 1):
cars[['mpg', 'cylinders']] = cars[['mpg', 'cylinders']].astype('object')
cars.info()
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg 200 non-null object
cylinders 200 non-null object
displacement 200 non-null int64
horsepower 200 non-null object
weight 200 non-null int64
acceleration 200 non-null float64
model 200 non-null int64
origin 200 non-null int64
car 200 non-null object
dtypes: float64(1), int64(4), object(4)
memory usage: 14.1+ KB
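astype also accepts a dict that maps column names to target dtypes, which avoids selecting and reassigning columns when each one needs a different dtype. A minimal sketch, using a tiny made-up frame that only borrows the column names from cars:

```python
import pandas as pd

# Hypothetical two-row stand-in for the cars data
df = pd.DataFrame(
    {"mpg": [18.0, 15.0], "cylinders": [8, 8], "weight": [3504, 3693]}
)

# Columns not listed in the dict keep their current dtype
df = df.astype({"mpg": "object", "cylinders": "float64"})

print(df.dtypes)
```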
Method 2):
cars = cars.infer_objects()
cars.info()
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg 200 non-null float64
cylinders 200 non-null int64
displacement 200 non-null int64
horsepower 200 non-null object
weight 200 non-null int64
acceleration 200 non-null float64
model 200 non-null int64
origin 200 non-null int64
car 200 non-null object
dtypes: float64(2), int64(5), object(2)
memory usage: 14.1+ KB
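The output above shows mpg and cylinders restored while horsepower stays object. The same behaviour can be reproduced on a small scale; the frame below is invented sample data, with a "?" placed in horsepower to mimic the mixed column.

```python
import pandas as pd

# Hypothetical frame: every column is forced to object on construction
df = pd.DataFrame(
    {"cylinders": [8, 8, 4], "horsepower": [130, "?", 95]},
    dtype="object",
)

# infer_objects() only performs "soft" conversion: a column is changed
# when all of its values already share one better dtype
inferred = df.infer_objects()

print(inferred.dtypes)
```

cylinders becomes int64 because it holds only integers, whereas horsepower mixes integers with a string and therefore remains object.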