A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_in

One day, while tinkering with pandas, I ran into this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

In other words, a value is being assigned to a copy of a slice taken from a DataFrame, and pandas suggests assigning through .loc instead.

So I dug into it, and the behavior seemed odd.

In [233]: import pandas as pd

In [234]: A = pd.DataFrame([[1,2,3], [7,8,9],[14,15,16]], columns = ['a', 'b', 'c'])

In [235]: A
Out[235]:
    a   b   c
0   1   2   3
1   7   8   9
2  14  15  16

Assign the first column of A to B. B is a Series, and modifying one element of B turns out to modify A as well:

In [236]: B = A['a']

In [237]: B
Out[237]:
0     1
1     7
2    14
Name: a, dtype: int64

In [238]: type(B)
Out[238]: pandas.core.series.Series

In [239]: B[0] = 3

In [240]: B
Out[240]:
0     3
1     7
2    14
Name: a, dtype: int64

In [241]: A
Out[241]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16
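
If the goal is to edit B without touching A, one option is to take an explicit copy first. This is a minimal sketch reusing the A from above; the .copy() call is my addition and was not part of the original session.

```python
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [7, 8, 9], [14, 15, 16]], columns=["a", "b", "c"])

# .copy() gives B its own data, so edits to B cannot reach A.
B = A["a"].copy()
B.loc[0] = 3

print(B.loc[0])       # value stored in B
print(A.loc[0, "a"])  # value stored in A
```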

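Next, assign the first and second columns of A to C. Is C a copy of a slice of A? C is a DataFrame. Modifying the first row of its first column triggers the warning; C is modified, but A is not: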

In [243]: C = A[['a', 'b']]

In [244]: C
Out[244]:
    a   b
0   3   2
1   7   8
2  14  15

In [245]: type(C)
Out[245]: pandas.core.frame.DataFrame

In [246]: C['a'][0]
Out[246]: 3

In [247]: C['a'][0] =5
c:\program files\python36\lib\site-packages\IPython\core\interactiveshell.py:2910: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  exec(code_obj, self.user_global_ns, self.user_ns)

In [248]: C
Out[248]:
    a   b
0   5   2
1   7   8
2  14  15

In [249]: A
Out[249]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16

Now build D with A's loc indexer. D has the same contents as C, yet the same operation produces no warning:

In [250]: D = A.loc[:, ['a','b']]

In [251]: D
Out[251]:
    a   b
0   3   2
1   7   8
2  14  15

In [252]: type(D)
Out[252]: pandas.core.frame.DataFrame

In [253]: D['a'][0]
Out[253]: 3

In [254]: D['a'][0] = 5

In [255]: D
Out[255]:
    a   b
0   5   2
1   7   8
2  14  15

In [256]: A
Out[256]:
    a   b   c
0   3   2   3
1   7   8   9
2  14  15  16

So what is the difference between C and D? I tried one way of avoiding the warning for C, which is to run

C = C.copy()

before modifying the value. After that, the warning no longer appears.
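
Put together, the workaround might look like the sketch below. It assumes the same A as above and also replaces the chained C['a'][0] = 5 with a single .loc assignment, which is my own substitution rather than what the session above ran.

```python
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [7, 8, 9], [14, 15, 16]], columns=["a", "b", "c"])

# Take an explicit, independent copy of the two columns.
C = A[["a", "b"]].copy()

# One .loc call sets the value directly on C.
C.loc[0, "a"] = 5

print(C)
print(A)  # A keeps its original values
```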

I then read the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The gist is as follows:

In [339]: dfmi = pd.DataFrame([list('abcd'),
   .....:                      list('efgh'),
   .....:                      list('ijkl'),
   .....:                      list('mnop')],
   .....:                     columns=pd.MultiIndex.from_product([['one','two'],
   .....:                                                         ['first','second']]))
   .....: 

In [340]: dfmi
Out[340]: 
    one          two       
  first second first second
0     a      b     c      d
1     e      f     g      h
2     i      j     k      l
3     m      n     o      p

If you assign through loc,

dfmi.loc[:,('one','second')] = value

pandas treats this as a single call to the __setitem__ method of loc:

# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)

If you assign like this instead,

dfmi['one']['second'] = value

pandas effectively executes the following:

# becomes
dfmi.__getitem__('one').__setitem__('second', value)

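That is, __getitem__ is called first and returns a DataFrame, and then __setitem__ is called on that returned object. Two separate indexing calls are made, which is known as chained indexing, and it is slower than a single loc call.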
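
To make the two calls visible, the chained form can be split into two statements by hand. This is only an illustrative sketch: the name tmp and the fill value "z" are mine, and the warning is silenced purely to keep the demo output clean.

```python
import warnings

import pandas as pd

dfmi = pd.DataFrame(
    [list("abcd"), list("efgh"), list("ijkl"), list("mnop")],
    columns=pd.MultiIndex.from_product([["one", "two"], ["first", "second"]]),
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # demo only; do not hide the warning in real code
    tmp = dfmi["one"]    # step 1: __getitem__ returns an intermediate DataFrame
    tmp["second"] = "z"  # step 2: __setitem__ acts on that intermediate object

print(tmp)
print(dfmi)  # whether this shows the change is what pandas does not promise
```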

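pandas does not warn you merely because you spent a little extra time, though. It warns because it cannot guarantee whether the object returned by the first call is a view or a copy; that depends on the memory layout of the underlying array. If it is a view, everything works as intended. If it is a copy, assigning to it leaves the original unchanged. pandas cannot tell whether __setitem__ will modify dfmi or a temporary object that is discarded right afterwards, so it is best to use loc. The documentation puts it this way: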

What’s up with the SettingWithCopy warning? We don’t usually throw warnings around when you do something that might cost a few extra milliseconds! But it turns out that assigning to the product of chained indexing has inherently unpredictable results.

Outside of simple cases, it’s very hard to predict whether it will return a view or a copy (it depends on the memory layout of the array, about which pandas makes no guarantees), and therefore whether the __setitem__ will modify dfmi or a temporary object that gets thrown out immediately afterward. That’s what SettingWithCopy is warning you about!
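
Applied to dfmi, the single-call form looks like the sketch below. The fill value "z" is an arbitrary placeholder of mine, standing in for value in the snippets above.

```python
import pandas as pd

dfmi = pd.DataFrame(
    [list("abcd"), list("efgh"), list("ijkl"), list("mnop")],
    columns=pd.MultiIndex.from_product([["one", "two"], ["first", "second"]]),
)

# One indexing call: the target of the assignment is dfmi itself,
# so there is no intermediate object that could be a copy.
dfmi.loc[:, ("one", "second")] = "z"

print(dfmi)
```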

 

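Back to my own problem. Judging from the runs above, C is a copy of a slice of A, because changing C had no effect on A. Then why is there still a warning? My guess is that pandas still treats C as the kind of "temporary object that gets thrown out" mentioned above, whereas B is a view of a slice of A, so no warning was raised. Note also that C['a'][0] = 5 is itself a chained assignment, and the pandas documentation says this check can produce false positives.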
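
One way to probe the view-versus-copy question directly is to ask NumPy whether the underlying buffers overlap. This is only a rough sketch: the variable names b_shares and c_shares are mine, and the result can differ between pandas versions. In newer versions with Copy-on-Write, sharing memory does not by itself mean that a write passes through to A.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [7, 8, 9], [14, 15, 16]], columns=["a", "b", "c"])

B = A["a"]         # single column, a Series
C = A[["a", "b"]]  # list of columns, a DataFrame

# np.shares_memory reports whether two arrays overlap in memory.
b_shares = np.shares_memory(B.to_numpy(), A["a"].to_numpy())
c_shares = np.shares_memory(C["a"].to_numpy(), A["a"].to_numpy())

print("B shares memory with A:", b_shares)
print("C shares memory with A:", c_shares)
```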
