pandas reindex explained: method='nearest', 'bfill', 'ffill', plus limit, fill_value and tolerance

yinhaibo@yinhaibo-OptiPlex-9020:~$ ipython
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd                                                     

In [2]: import numpy as np                                                      

In [3]: a = np.random.randint(0, 1, 16\)                                        
  File "", line 1
    a = np.random.randint(0, 1, 16\)
                                    ^
SyntaxError: unexpected character after line continuation character


In [4]: a = np.random.randint(0, 1, 16)                                         

In [5]: a = np.random.randint(0, 1, 16).reshape(4, 4)                           

In [6]: a                                                                       
Out[6]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [7]: a = np.random.randint(0, 2, 16).reshape(4, 4)                           

In [8]: a                                                                       
Out[8]: 
array([[0, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0]])

In [9]: a = np.random.randint(0, 2, 16).reshape(4, 4)                           

In [10]: a                                                                      
Out[10]: 
array([[1, 0, 0, 1],
       [1, 0, 0, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0]])

In [11]: b = pd.DateFrame(data=a columns=['a', 'b', 'v', 'd'])                  
  File "", line 1
    b = pd.DateFrame(data=a columns=['a', 'b', 'v', 'd'])
                                  ^
SyntaxError: invalid syntax


In [12]: b = pd.DateFrame(data=a, columns=['a', 'b', 'v', 'd'])                 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-7de6edc58341> in <module>
----> 1 b = pd.DateFrame(data=a, columns=['a', 'b', 'v', 'd'])

AttributeError: module 'pandas' has no attribute 'DateFrame'

In [13]: b = pd.DataFrame(data=a, columns=['a', 'b', 'v', 'd'])                 

In [14]: b                                                                      
Out[14]: 
   a  b  v  d
0  1  0  0  1
1  1  0  0  0
2  0  0  1  0
3  1  0  0  0

In [15]: index = pd.DatetimeIndex(start='20180606', periods=4, freq='5S')       

In [16]: index                                                                  
Out[16]: 
DatetimeIndex(['2018-06-06 00:00:00', '2018-06-06 00:00:05',
               '2018-06-06 00:00:10', '2018-06-06 00:00:15'],
              dtype='datetime64[ns]', freq='5S')
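
Note: the `pd.DatetimeIndex(start=..., periods=..., freq=...)` form used in In [15] was deprecated in later pandas releases and is no longer accepted by current ones. A minimal sketch of the same index built with `pd.date_range`, assuming a reasonably recent pandas (the lowercase `'5s'` alias is used here because newer releases warn about the uppercase `'S'`; the name `dense` is only illustrative):

```python
import pandas as pd

# Four timestamps starting at 2018-06-06 00:00:00, spaced 5 seconds apart.
index = pd.date_range(start="20180606", periods=4, freq="5s")

# The dense 1-second target index that In [22] builds for reindexing.
dense = pd.date_range(start=index[0], end=index[3], freq="s")

print(index)
print(len(dense))  # 16 labels, from 00:00:00 through 00:00:15
```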

In [17]: pd['c'] =  index                                                       
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-bca98fd5f016> in <module>
----> 1 pd['c'] =  index

TypeError: 'module' object does not support item assignment

In [18]: b['c'] =  index                                                        

In [19]: b                                                                      
Out[19]: 
   a  b  v  d                   c
0  1  0  0  1 2018-06-06 00:00:00
1  1  0  0  0 2018-06-06 00:00:05
2  0  0  1  0 2018-06-06 00:00:10
3  1  0  0  0 2018-06-06 00:00:15

In [20]: b.index = index                                                        

In [21]: b                                                                      
Out[21]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [22]: index = pd.DatetimeIndex(start=index[0], end=index[3], freq='S')       

In [23]: cc = b.reindex(index=index, method='nearest')                          

In [24]: cc                                                                     
Out[24]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:01  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:02  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:03  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:04  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:06  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:07  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:08  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:09  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:11  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:12  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:13  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:14  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15
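
In Out[24] every new label takes the row of the closest original label: 00:00:02 is 2 s from 00:00:00 and 3 s from 00:00:05, so it copies the first row, while 00:00:03 copies the second. The session only tries `method='nearest'`; the sketch below compares it with `'ffill'` and `'bfill'` on a small made-up Series (the values 10/20/30 and the names `src`, `s`, `dense`, `filled` are illustrative, not from the session):

```python
import pandas as pd

# Three samples, 5 seconds apart.
src = pd.date_range("2018-06-06", periods=3, freq="5s")
s = pd.Series([10, 20, 30], index=src)

# Target index at 1-second resolution: 11 labels (0 s .. 10 s).
dense = pd.date_range(src[0], src[-1], freq="s")

filled = pd.DataFrame({
    "ffill": s.reindex(dense, method="ffill"),      # carry the last earlier value forward
    "bfill": s.reindex(dense, method="bfill"),      # take the next later value
    "nearest": s.reindex(dense, method="nearest"),  # take the closest label's value
})
print(filled)
```

With `'ffill'` the value changes exactly at each original timestamp, with `'bfill'` it changes right after it, and with `'nearest'` it changes halfway between two original timestamps.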

In [25]: 5b                                                                     
  File "", line 1
    5b
     ^
SyntaxError: invalid syntax


In [26]: b                                                                      
Out[26]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [27]: b.reindex?                                                                                                                                                                                         

In [28]: cc                                                                                                                                                                                                 
Out[28]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:01  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:02  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:03  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:04  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:06  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:07  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:08  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:09  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:11  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:12  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:13  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:14  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [29]: b                                                                                                                                                                                                  
Out[29]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [30]: dd = b.reindex(index=index, method='nearest', limit=1)                                                                                                                                             

In [31]: dd                                                                                                                                                                                                 
Out[31]: 
                       a    b    v    d                   c
2018-06-06 00:00:00  1.0  0.0  0.0  1.0 2018-06-06 00:00:00
2018-06-06 00:00:01  1.0  0.0  0.0  1.0 2018-06-06 00:00:00
2018-06-06 00:00:02  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:03  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:04  1.0  0.0  0.0  0.0 2018-06-06 00:00:05
2018-06-06 00:00:05  1.0  0.0  0.0  0.0 2018-06-06 00:00:05
2018-06-06 00:00:06  1.0  0.0  0.0  0.0 2018-06-06 00:00:05
2018-06-06 00:00:07  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:08  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:09  0.0  0.0  1.0  0.0 2018-06-06 00:00:10
2018-06-06 00:00:10  0.0  0.0  1.0  0.0 2018-06-06 00:00:10
2018-06-06 00:00:11  0.0  0.0  1.0  0.0 2018-06-06 00:00:10
2018-06-06 00:00:12  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:13  NaN  NaN  NaN  NaN                 NaT
2018-06-06 00:00:14  1.0  0.0  0.0  0.0 2018-06-06 00:00:15
2018-06-06 00:00:15  1.0  0.0  0.0  0.0 2018-06-06 00:00:15
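
Two things happen in Out[31]. First, `limit=1` caps how many consecutive labels may be filled from one original row; with `method='nearest'` this session shows one label filled on each side of every original timestamp. Second, the integer columns turn into floats, because NaN cannot be stored in an integer column. The sketch below shows the same two effects with `'ffill'`, where the direction of filling is easiest to follow (the data and names are made up for illustration):

```python
import pandas as pd

src = pd.date_range("2018-06-06", periods=2, freq="5s")
s = pd.Series([1, 2], index=src)                  # integer values
dense = pd.date_range(src[0], src[-1], freq="s")  # 6 labels: 0 s .. 5 s

# Forward-fill at most one label after each original timestamp.
limited = s.reindex(dense, method="ffill", limit=1)
print(limited)
print(limited.dtype)  # float64, since the unfilled labels hold NaN
```

Here 00:00:01 is filled from 00:00:00, while 00:00:02 through 00:00:04 stay NaN because the limit is already used up.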

In [32]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value=miss)                                                                                                                            
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-e742a3d0e458> in <module>
----> 1 dd = b.reindex(index=index, method='nearest', limit=1, fill_value=miss)

NameError: name 'miss' is not defined

In [33]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss')                                                                                                                          

In [34]: dd                                                                                                                                                                                                 
Out[34]: 
                        a     b     v     d                   c
2018-06-06 00:00:00     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:01     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:02  miss  miss  miss  miss                 NaT
2018-06-06 00:00:03  miss  miss  miss  miss                 NaT
2018-06-06 00:00:04     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:05     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:06     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:07  miss  miss  miss  miss                 NaT
2018-06-06 00:00:08  miss  miss  miss  miss                 NaT
2018-06-06 00:00:09     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:10     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:11     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:12  miss  miss  miss  miss                 NaT
2018-06-06 00:00:13  miss  miss  miss  miss                 NaT
2018-06-06 00:00:14     1     0     0     0 2018-06-06 00:00:15
2018-06-06 00:00:15     1     0     0     0 2018-06-06 00:00:15
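
`fill_value` only replaces the placeholder written to labels that stay unfilled. In Out[34] the string `'miss'` lands in the integer columns, while the datetime column `c` still shows NaT in this session. A type-compatible fill value avoids mixing strings into numeric columns; the following is a minimal sketch on made-up data, without any `method`, so every label absent from the original index receives the fill value:

```python
import pandas as pd

src = pd.date_range("2018-06-06", periods=2, freq="5s")
s = pd.Series([1, 2], index=src)
dense = pd.date_range(src[0], src[-1], freq="s")  # 6 labels: 0 s .. 5 s

# No method given: labels missing from the original index get fill_value.
zero_filled = s.reindex(dense, fill_value=0)
print(zero_filled)
```

Because 0 is itself an integer, the result keeps an integer dtype instead of being upcast to float.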

In [35]: b.reindex?                                                                                                                                                                                         

In [36]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss',  tolerance=2)                                                                                                            

In [37]: dd                                                                                                                                                                                                 
Out[37]: 
                        a     b     v     d                   c
2018-06-06 00:00:00     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:01  miss  miss  miss  miss                 NaT
2018-06-06 00:00:02  miss  miss  miss  miss                 NaT
2018-06-06 00:00:03  miss  miss  miss  miss                 NaT
2018-06-06 00:00:04  miss  miss  miss  miss                 NaT
2018-06-06 00:00:05     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:06  miss  miss  miss  miss                 NaT
2018-06-06 00:00:07  miss  miss  miss  miss                 NaT
2018-06-06 00:00:08  miss  miss  miss  miss                 NaT
2018-06-06 00:00:09  miss  miss  miss  miss                 NaT
2018-06-06 00:00:10     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:11  miss  miss  miss  miss                 NaT
2018-06-06 00:00:12  miss  miss  miss  miss                 NaT
2018-06-06 00:00:13  miss  miss  miss  miss                 NaT
2018-06-06 00:00:14  miss  miss  miss  miss                 NaT
2018-06-06 00:00:15     1     0     0     0 2018-06-06 00:00:15

In [38]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss',  tolerance=5)                                                                                                            

In [39]: dd                                                                                                                                                                                                 
Out[39]: 
                        a     b     v     d                   c
2018-06-06 00:00:00     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:01  miss  miss  miss  miss                 NaT
2018-06-06 00:00:02  miss  miss  miss  miss                 NaT
2018-06-06 00:00:03  miss  miss  miss  miss                 NaT
2018-06-06 00:00:04  miss  miss  miss  miss                 NaT
2018-06-06 00:00:05     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:06  miss  miss  miss  miss                 NaT
2018-06-06 00:00:07  miss  miss  miss  miss                 NaT
2018-06-06 00:00:08  miss  miss  miss  miss                 NaT
2018-06-06 00:00:09  miss  miss  miss  miss                 NaT
2018-06-06 00:00:10     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:11  miss  miss  miss  miss                 NaT
2018-06-06 00:00:12  miss  miss  miss  miss                 NaT
2018-06-06 00:00:13  miss  miss  miss  miss                 NaT
2018-06-06 00:00:14  miss  miss  miss  miss                 NaT
2018-06-06 00:00:15     1     0     0     0 2018-06-06 00:00:15

In [40]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss',  tolerance=0)                                                                                                            

In [41]: dd                                                                                                                                                                                                 
Out[41]: 
                        a     b     v     d                   c
2018-06-06 00:00:00     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:01  miss  miss  miss  miss                 NaT
2018-06-06 00:00:02  miss  miss  miss  miss                 NaT
2018-06-06 00:00:03  miss  miss  miss  miss                 NaT
2018-06-06 00:00:04  miss  miss  miss  miss                 NaT
2018-06-06 00:00:05     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:06  miss  miss  miss  miss                 NaT
2018-06-06 00:00:07  miss  miss  miss  miss                 NaT
2018-06-06 00:00:08  miss  miss  miss  miss                 NaT
2018-06-06 00:00:09  miss  miss  miss  miss                 NaT
2018-06-06 00:00:10     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:11  miss  miss  miss  miss                 NaT
2018-06-06 00:00:12  miss  miss  miss  miss                 NaT
2018-06-06 00:00:13  miss  miss  miss  miss                 NaT
2018-06-06 00:00:14  miss  miss  miss  miss                 NaT
2018-06-06 00:00:15     1     0     0     0 2018-06-06 00:00:15
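
Out[37], Out[39] and Out[41] are identical: with `tolerance` set to 2, 5 or 0, only exact matches survive. A likely explanation (an assumption, the session itself does not confirm it) is that a bare integer tolerance on a DatetimeIndex is read as nanoseconds, so all three values are far smaller than the 1-second spacing. Passing a `pd.Timedelta` states the unit explicitly; a minimal sketch on made-up data:

```python
import pandas as pd

src = pd.date_range("2018-06-06", periods=2, freq="5s")
s = pd.Series([1, 2], index=src)
dense = pd.date_range(src[0], src[-1], freq="s")  # 6 labels: 0 s .. 5 s

# Accept the nearest original label only if it is within the given distance.
within_1s = s.reindex(dense, method="nearest", tolerance=pd.Timedelta("1s"))
within_2s = s.reindex(dense, method="nearest", tolerance=pd.Timedelta("2s"))
print(within_1s)  # 00:00:02 and 00:00:03 are 2 s away from any sample, so NaN
print(within_2s)  # every label is within 2 s of a sample, so nothing is NaN
```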

In [42]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss',  tolerance=[])                                                                                                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-2b2a93af1f9e> in <module>
----> 1 dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss',  tolerance=[])

~/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    185         @wraps(func)
    186         def wrapper(*args, **kwargs):
--> 187             return func(*args, **kwargs)
    188 
    189         if not PY2:

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in reindex(self, *args, **kwargs)
   3564         kwargs.pop('axis', None)
   3565         kwargs.pop('labels', None)
-> 3566         return super(DataFrame, self).reindex(**kwargs)
   3567 
   3568     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3499         if index is not None:
   3500             frame = frame._reindex_index(index, method, copy, level,
-> 3501                                          fill_value, limit, tolerance)
   3502 
   3503         return frame

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   3507         new_index, indexer = self.index.reindex(new_index, method=method,
   3508                                                 level=level, limit=limit,
-> 3509                                                 tolerance=tolerance)
   3510         return self._reindex_with_indexers({0: [new_index, indexer]},
   3511                                            copy=copy, fill_value=fill_value,

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in reindex(self, target, method, level, limit, tolerance)
   3620                     indexer = self.get_indexer(target, method=method,
   3621                                                limit=limit,
-> 3622                                                tolerance=tolerance)
   3623                 else:
   3624                     if method is not None or limit is not None:

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   3248             indexer = self._get_fill_indexer(target, method, limit, tolerance)
   3249         elif method == 'nearest':
-> 3250             indexer = self._get_nearest_indexer(target, limit, tolerance)
   3251         else:
   3252             if tolerance is not None:

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in _get_nearest_indexer(self, target, limit, tolerance)
   3331         if tolerance is not None:
   3332             indexer = self._filter_indexer_tolerance(target, indexer,
-> 3333                                                      tolerance)
   3334         return indexer
   3335 

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in _filter_indexer_tolerance(self, target, indexer, tolerance)
   3336     def _filter_indexer_tolerance(self, target, indexer, tolerance):
   3337         distance = abs(self.values[indexer] - target)
-> 3338         indexer = np.where(distance <= tolerance, indexer, -1)
   3339         return indexer
   3340 

ValueError: operands could not be broadcast together with shapes (16,) (0,) 
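
The traceback ends in the elementwise comparison `distance <= tolerance`, which pits 16 distances against an empty list, hence the broadcast error. A list-like `tolerance` is expected to carry one entry per label of the new index. A minimal sketch under that assumption, on made-up data (`per_label` is a hypothetical name):

```python
import pandas as pd

src = pd.date_range("2018-06-06", periods=2, freq="5s")
s = pd.Series([1, 2], index=src)
dense = pd.date_range(src[0], src[-1], freq="s")  # 6 labels: 0 s .. 5 s

# One tolerance per target label: 2 s for the first three, 1 s for the rest.
per_label = pd.to_timedelta([2, 2, 2, 1, 1, 1], unit="s")
result = s.reindex(dense, method="nearest", tolerance=per_label)
print(result)
```

Only 00:00:03 ends up NaN here: its nearest sample is 2 s away, but its own tolerance is 1 s.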

In [43]: dd = b.reindex(index=index, method='nearest', limit=1, fill_value='miss')                                                                                                                          

In [44]: dd                                                                                                                                                                                                 
Out[44]: 
                        a     b     v     d                   c
2018-06-06 00:00:00     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:01     1     0     0     1 2018-06-06 00:00:00
2018-06-06 00:00:02  miss  miss  miss  miss                 NaT
2018-06-06 00:00:03  miss  miss  miss  miss                 NaT
2018-06-06 00:00:04     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:05     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:06     1     0     0     0 2018-06-06 00:00:05
2018-06-06 00:00:07  miss  miss  miss  miss                 NaT
2018-06-06 00:00:08  miss  miss  miss  miss                 NaT
2018-06-06 00:00:09     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:10     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:11     0     0     1     0 2018-06-06 00:00:10
2018-06-06 00:00:12  miss  miss  miss  miss                 NaT
2018-06-06 00:00:13  miss  miss  miss  miss                 NaT
2018-06-06 00:00:14     1     0     0     0 2018-06-06 00:00:15
2018-06-06 00:00:15     1     0     0     0 2018-06-06 00:00:15

In [45]: dd = b.reindex(index=index, method='nearest', limit=10, fill_value='miss')                                                                                                                         

In [46]: dd                                                                                                                                                                                                 
Out[46]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:01  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:02  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:03  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:04  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:06  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:07  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:08  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:09  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:11  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:12  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:13  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:14  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [47]: dd = b.reindex(index=index, method='nearest',  fill_value='miss')                                                                                                                                  

In [48]: dd                                                                                                                                                                                                 
Out[48]: 
                     a  b  v  d                   c
2018-06-06 00:00:00  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:01  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:02  1  0  0  1 2018-06-06 00:00:00
2018-06-06 00:00:03  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:04  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:05  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:06  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:07  1  0  0  0 2018-06-06 00:00:05
2018-06-06 00:00:08  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:09  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:10  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:11  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:12  0  0  1  0 2018-06-06 00:00:10
2018-06-06 00:00:13  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:14  1  0  0  0 2018-06-06 00:00:15
2018-06-06 00:00:15  1  0  0  0 2018-06-06 00:00:15

In [49]:                                                                                                                                                                                                    
