Pandas: One Way to Fix "OverflowError: Python int too large to convert to C long"

Sometimes, when working with a DataFrame, we need to convert a column whose dtype (data type) is object to int. If the values are large, the conversion fails with the following error:

In [1]: data['itemId'] = data['itemId'].astype(int)
Out[1]: Traceback (most recent call last):

  File "", line 1, in <module>
    data['itemId'] = data['itemId'].astype(int)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\util\_decorators.py", line 177, in wrapper
    return func(*args, **kwargs)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\generic.py", line 4997, in astype
    **kwargs)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\internals.py", line 3714, in astype
    return self.apply('astype', dtype=dtype, **kwargs)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\internals.py", line 3581, in apply
    applied = getattr(b, f)(**kwargs)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\internals.py", line 575, in astype
    **kwargs)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\internals.py", line 664, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)

  File "D:\Anaconda3 5.2.0\lib\site-packages\pandas\core\dtypes\cast.py", line 709, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

  File "pandas\_libs\lib.pyx", line 456, in pandas._libs.lib.astype_intsafe

  File "pandas/_libs/src\util.pxd", line 142, in util.set_value_at_unsafe

OverflowError: Python int too large to convert to C long

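This happens because Python's int has no upper bound, whereas C integer types have a fixed width. Some functions are written in C and are not adapted for large integers, so passing a value that exceeds the range of the C type (here, long) raises an error. On Windows, which is where the traceback above comes from, a C long is 32 bits wide even on a 64-bit system, and with the pandas/NumPy versions shown astype(int) targets that 32-bit type, so any value above 2,147,483,647 overflows.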
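To make the size limits concrete, here is a minimal sketch that compares an unbounded Python int with the fixed ranges of NumPy's 32-bit and 64-bit integer types. The value of item_id is a made-up example, not taken from the original data.

```python
import numpy as np

# Python ints grow as needed, so this is perfectly valid.
big = 2**100

# Fixed-width integer types have hard upper limits.
int32_max = np.iinfo(np.int32).max
int64_max = np.iinfo(np.int64).max

# Hypothetical item ID, chosen only for illustration.
item_id = 3_000_000_000

fits_int32 = item_id <= int32_max
fits_int64 = item_id <= int64_max

print("int32 max:", int32_max)
print("int64 max:", int64_max)
print("fits in int32:", fits_int32)
print("fits in int64:", fits_int64)
```

An ID like this one is too large for a 32-bit integer but fits comfortably in a 64-bit one, which is why naming the 64-bit dtype explicitly avoids the overflow.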

In pandas, this problem mostly comes from not knowing the pandas dtypes well enough.
So the conversion should be written like this, with the 64-bit dtype named explicitly:

In [1]: data['itemId'] = data['itemId'].astype('int64')

In [2]: data['itemId'].dtype
Out[2]: dtype('int64')
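As a self-contained illustration of the same fix, the sketch below builds a small DataFrame with an object column of large integers and converts it with an explicit 'int64'. The DataFrame contents are invented for this example; only the column name itemId comes from the session above.

```python
import pandas as pd

# Hypothetical data: large IDs stored in an object-dtype column.
data = pd.DataFrame(
    {"itemId": pd.Series([10**12, 2 * 10**12, 3 * 10**12], dtype=object)}
)
dtype_before = str(data["itemId"].dtype)

# Name the 64-bit dtype explicitly instead of relying on what
# plain `int` maps to on the current platform.
data["itemId"] = data["itemId"].astype("int64")
dtype_after = str(data["itemId"].dtype)

print(dtype_before, "->", dtype_after)
print(data["itemId"].tolist())
```

Because 'int64' is always 64 bits wide, the result does not depend on the operating system, as long as every value fits in the int64 range.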

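Note that object usually indicates strings or mixed types (for example, a column that contains both strings and numbers). Columns like that generally need a conversion function of your own.
Also, astype() generally works only when the whole column holds values of one consistent, convertible type; otherwise it raises an error.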
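One possible shape for such a custom function is sketched below. The helper name to_int_or_na and the sample values are hypothetical, and storing the result in pandas' nullable 'Int64' dtype is my own choice rather than something the original post prescribes. Values that cannot be parsed become missing instead of raising.

```python
import pandas as pd


def to_int_or_na(value):
    """Hypothetical helper: return int(value), or pd.NA if that fails."""
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return pd.NA


# Made-up mixed column: an int, a numeric string, junk text, and None.
mixed = pd.Series([10**12, "456", "abc", None], dtype=object)

# Convert element by element, then store the result in the nullable
# Int64 dtype so that missing entries do not force a float column.
converted = pd.Series(
    pd.array([to_int_or_na(v) for v in mixed], dtype="Int64"),
    index=mixed.index,
)

print(converted)
```

Note that int() truncates floats (12.7 becomes 12) and rejects strings such as "12.5", so the helper would need adjusting if the column can contain such values.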

