Changing DataFrame column dtypes: Series/DataFrame.astype(...) and .infer_objects()

  • Overview:

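There are two main ways to change the dtype of DataFrame columns, plus a legacy method and the type-specific converters:

1)    Series/df.astype('float64')    "used most often"  (works on both DataFrame and Series)

2)   Series/df.infer_objects() : converts 'object' columns to 'float64/int...' where the values allow it (works on both DataFrame and Series)

3)   The legacy predecessor of infer_objects(): Series/df.convert_objects(convert_numeric=True)    "not recommended"; it is deprecated and was removed in pandas 1.0

    (Old vs. new: given an object column of 200 rows in which 196 rows are numbers and 3 rows are "?", the old method converts the 196 numeric rows from object to float64; the new method does not. infer_objects() leaves such a column as object, and astype('float64') raises an error on the "?" values.)

4)  For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric (I have not used to_numeric yet).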
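The difference between the approaches listed above can be seen on a small object column that contains a "?" placeholder. This is only a minimal sketch: the Series `s` and its values are made up for illustration and are not taken from the cars data. Using `pd.to_numeric(..., errors='coerce')` as a stand-in for the old `convert_objects(convert_numeric=True)` behaviour is an assumption about what one would typically reach for today.

```python
import pandas as pd

# Hypothetical object column: mostly numbers, with one "?" placeholder.
s = pd.Series([130, 165, "?", 150], dtype="object")

# astype('float64') raises, because "?" cannot be parsed as a float.
try:
    s.astype("float64")
    astype_failed = False
except ValueError:
    astype_failed = True

# infer_objects() only does a soft conversion, so the mixed column stays object.
inferred = s.infer_objects()

# to_numeric(errors='coerce') converts what it can and turns "?" into NaN.
coerced = pd.to_numeric(s, errors="coerce")

print(astype_failed)         # True
print(inferred.dtype)        # object
print(coerced.dtype)         # float64
print(coerced.isna().sum())  # 1
```

With `errors='coerce'` the unparseable entries are silently replaced by NaN, so it may be worth counting the NaN values afterwards to see how much data was affected.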

       

help() output:

a)    infer_objects(self):    Attempt to infer better dtypes for object columns.

      The inference rules are the same as during normal Series/DataFrame construction

       help(pd.DataFrame.infer_objects)

b)    astype(self, dtype, copy=True, errors='raise', **kwargs):    Cast a pandas object to a specified dtype ``dtype``.      (Series/DataFrame)

        help(pd.DataFrame.astype)     

c)    to_numeric(arg, errors='raise', downcast=None):     Convert argument to a numeric type.

       Parameters:   arg : list, tuple, 1-d array, or Series  (a DataFrame is not accepted)

        help(pd.to_numeric)

d)    ........

       pandas.DataFrame.astype : Cast argument to a specified dtype.
       pandas.to_datetime : Convert argument to datetime.
       pandas.to_timedelta : Convert argument to timedelta.

       numpy.ndarray.astype : Cast a numpy array to a specified type.  ........
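As a rough illustration of the help entries above, the sketch below casts a single column with a `{column: dtype}` mapping passed to `astype`, and works around the 1-d restriction of `pd.to_numeric` by applying it column by column with `DataFrame.apply`. The DataFrame `df` and its column names `a`, `b` and `when` are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical frame whose columns are all stored as object (strings).
df = pd.DataFrame(
    {
        "a": ["1", "2", "3"],
        "b": ["1.5", "2.5", "x"],
        "when": ["2020-01-01", "2020-01-02", "2020-01-03"],
    },
    dtype="object",
)

# astype accepts a {column: dtype} mapping; columns not listed keep their dtype.
step1 = df.astype({"a": "int64"})

# to_numeric takes a 1-d argument, so apply it to each column in turn.
# Extra keyword arguments given to apply are forwarded to to_numeric.
numeric = df[["a", "b"]].apply(pd.to_numeric, errors="coerce")

# to_datetime converts a Series of date strings to a datetime64 column.
when = pd.to_datetime(df["when"])

print(step1.dtypes["a"], step1.dtypes["b"])      # int64 object
print(numeric.dtypes["a"], numeric.dtypes["b"])  # int64 float64
```

Passing a mapping to `astype` keeps the conversion explicit per column, while the `apply` form is convenient when several columns need the same lenient numeric parsing.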


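  • Example code:
import pandas as pd

# Load the Excel file. Mind the backslash escapes: it is best to prefix the path with r''.
# .parse('Sheet1') reads the data of that sheet into a DataFrame.
cars2_xlsx = pd.ExcelFile(r'C:\Users\admin\Desktop\cars2.xlsx')

cars = cars2_xlsx.parse('Sheet1')
cars.info()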

RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg             200 non-null float64
cylinders       200 non-null int64
displacement    200 non-null int64
horsepower      200 non-null object
weight          200 non-null int64
acceleration    200 non-null float64
model           200 non-null int64
origin          200 non-null int64
car             200 non-null object
dtypes: float64(2), int64(5), object(2)
memory usage: 14.1+ KB

Method 1): astype

cars[['mpg', 'cylinders']] = cars[['mpg', 'cylinders']].astype('object')

cars.info()

RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg             200 non-null object
cylinders       200 non-null object
displacement    200 non-null int64
horsepower      200 non-null object
weight          200 non-null int64
acceleration    200 non-null float64
model           200 non-null int64
origin          200 non-null int64
car             200 non-null object
dtypes: float64(1), int64(4), object(4)
memory usage: 14.1+ KB   

Method 2): infer_objects

cars = cars.infer_objects()

cars.info()

RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
mpg             200 non-null float64
cylinders       200 non-null int64
displacement    200 non-null int64
horsepower      200 non-null object
weight          200 non-null int64
acceleration    200 non-null float64
model           200 non-null int64
origin          200 non-null int64
car             200 non-null object
dtypes: float64(2), int64(5), object(2)
memory usage: 14.1+ KB
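Since cars2.xlsx is not available to readers, the two methods can be replayed on a tiny in-memory table. This is a sketch under assumptions: the three rows below are invented stand-ins for the cars data, and the last step (using `pd.to_numeric` to find the entries that keep `horsepower` as object) is an addition that the original walkthrough does not perform.

```python
import pandas as pd

# Hypothetical miniature of the cars table; the values are made up.
cars = pd.DataFrame(
    {
        "mpg": [18.0, 15.0, 25.0],
        "cylinders": [8, 8, 4],
        "horsepower": [130, "?", 95],  # mixed ints and "?" gives object
    }
)

# Method 1: cast two columns to object.
cars[["mpg", "cylinders"]] = cars[["mpg", "cylinders"]].astype("object")
after_astype = cars.dtypes.copy()

# Method 2: infer_objects restores float64/int64 where every value fits;
# horsepower stays object because of the "?" entry.
cars = cars.infer_objects()
after_infer = cars.dtypes.copy()

# Locate the entries that block the conversion: coercion turns them into NaN.
as_number = pd.to_numeric(cars["horsepower"], errors="coerce")
bad_rows = cars.loc[as_number.isna(), "horsepower"]

print(after_astype["mpg"], after_astype["cylinders"])  # object object
print(after_infer["mpg"], after_infer["cylinders"])    # float64 int64
print(after_infer["horsepower"])                       # object
print(bad_rows.tolist())                               # ['?']
```

Once the offending rows are known, one option is to assign `as_number` back to the column, accepting NaN for the placeholder rows.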
