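Note: This article is a study log on data processing with Python. It records the errors I ran into while reproducing the book's examples and the places where my results differ from the book; simple operations are not repeated here. The material mainly follows Python for Data Analysis (《利用Python进行数据分析》) by Wes McKinney, published by China Machine Press (机械工业出版社).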
Init signature:
Series(self, data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
Docstring:
One-dimensional ndarray with axis labels (including time series). Labels need not be unique but must be any hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
Operations between Series (+, -, /, *, **) align values based on their associated index values; they need not be the same length. The result index will be the sorted union of the two indexes.
Parameters:
data : array-like, dict, or scalar value
    Contains data stored in Series
index : array-like or Index (1d)
    Values must be unique and hashable, same length as data. Index object (or other iterable of same length as data). Will default to RangeIndex(len(data)) if not provided. If both a dict and index sequence are used, the index will override the keys found in the dict.
dtype : numpy.dtype or None
    If None, dtype will be inferred
copy : boolean, default False
    Copy input data
What is a hashable type?
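To answer the question above: in Python, an object is hashable if `hash()` works on it and its hash value never changes during its lifetime. Immutable built-ins such as str, int and tuple are hashable; mutable containers such as list and dict are not. The sketch below is only an illustration; the helper name `is_hashable` and the sample data are made up for this note.

```python
import pandas as pd

def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for this note)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable("a"))        # True: strings are immutable
print(is_hashable(("x", 1)))   # True: a tuple of hashable items
print(is_hashable(["x", 1]))   # False: lists are mutable
print(is_hashable({"k": 1}))   # False: dicts are mutable

# Hashable labels such as strings can be used as a Series index.
s = pd.Series([4, 7, -5], index=["d", "b", "a"])
print(s["b"])                  # 7
```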
Init signature:
DataFrame(self, data=None, index=None, columns=None, dtype=None, copy=False)
Docstring:
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects
index : Index or array-like
    Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided
columns : Index or array-like
    Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided
dtype : dtype, default None
    Data type to force, otherwise infer
copy : boolean, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input
P116 Series
Series.index is displayed differently from the book:
obj = Series([4,7,-5,3])
obj
Out[12]:
0 4
1 7
2 -5
3 3
dtype: int64
obj.values
Out[15]: array([ 4, 7, -5, 3], dtype=int64)
obj.index
Out[16]: RangeIndex(start=0, stop=4, step=1)
obj2 = Series([4,7,-5,3],index=['d','b','a','c'])
obj2
Out[19]:
d 4
b 7
a -5
c 3
dtype: int64
obj2.index
Out[20]: Index([u'd', u'b', u'a', u'c'], dtype='object')
P120 Displaying the columns of a DataFrame
"""
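frame2['state'] gives the same result as frame2.state
frame2['year'] gives the same result as frame2.year
frame2['debt'] gives the same result as frame2.debt
frame2['pop'] does NOT give the same result as frame2.pop, because pop is also the name of a DataFrame method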
"""
frame2['state']
Out[39]:
one ohio
two ohio
three ohio
four Nevada
five Nevada
Name: state, dtype: object
frame2.state
Out[40]:
one ohio
two ohio
three ohio
four Nevada
five Nevada
Name: state, dtype: object
frame2['year']
Out[41]:
one 2000
two 2001
three 2002
four 2000
five 2001
Name: year, dtype: int64
frame2.year
Out[42]:
one 2000
two 2001
three 2002
four 2000
five 2001
Name: year, dtype: int64
frame2['debt']
Out[43]:
one NaN
two -1.2
three NaN
four -1.5
five -1.7
Name: debt, dtype: float64
frame2.debt
Out[44]:
one NaN
two -1.2
three NaN
four -1.5
five -1.7
Name: debt, dtype: float64
frame2['pop']
Out[45]:
one 1.5
two 1.7
three 3.6
four 2.4
five 2.9
Name: pop, dtype: float64
frame2.pop
Out[46]:
<bound method DataFrame.pop of year state pop debt
one 2000 ohio 1.5 NaN
two 2001 ohio 1.7 -1.2
three 2002 ohio 3.6 NaN
four 2000 Nevada 2.4 -1.5
five 2001 Nevada 2.9 -1.7>
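The mismatch for `pop` comes from a name clash: `DataFrame.pop` is an existing method, and attribute access finds the method before the column. A minimal sketch, assuming a two-row stand-in for frame2 rather than the book's full data:

```python
import pandas as pd

# Small stand-in for frame2 (assumed data, not the book's full table).
frame2 = pd.DataFrame(
    {"year": [2000, 2001], "state": ["Ohio", "Ohio"], "pop": [1.5, 1.7]},
    index=["one", "two"],
)

# Bracket access always returns the column.
print(frame2["pop"].tolist())               # [1.5, 1.7]

# Attribute access resolves to the DataFrame.pop method instead.
print(callable(frame2.pop))                 # True

# Columns whose names do not clash with attributes work either way.
print(frame2.year.equals(frame2["year"]))   # True
```

Bracket access is the safer habit whenever a column name might coincide with a method or attribute.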
P122 The del statement
"""
del frame2.column_name does not work;
use del frame2['column_name'] instead
"""
frame2
Out[48]:
year state pop debt eastern
one 2000 ohio 1.5 NaN False
two 2001 ohio 1.7 -1.2 False
three 2002 ohio 3.6 NaN False
four 2000 Nevada 2.4 -1.5 False
five 2001 Nevada 2.9 -1.7 False
del frame2.eastern
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-49-1f2f896bbb30> in <module>()
----> 1 del frame2.eastern
AttributeError: eastern
del frame2['eastern']
frame2
Out[51]:
year state pop debt
one 2000 ohio 1.5 NaN
two 2001 ohio 1.7 -1.2
three 2002 ohio 3.6 NaN
four 2000 Nevada 2.4 -1.5
five 2001 Nevada 2.9 -1.7
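The same behavior can be reproduced in a few lines. This is a sketch with a made-up two-row frame; the exact wording of the AttributeError message may differ between versions.

```python
import pandas as pd

frame2 = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
frame2["eastern"] = False

try:
    del frame2.eastern            # attribute-style deletion of a column fails
except AttributeError as exc:
    print("AttributeError:", exc)

del frame2["eastern"]             # item-style deletion removes the column in place
print(list(frame2.columns))       # ['year', 'pop']
```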
Signature: obj3.reindex(index=None, **kwargs)
Docstring:
Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
Parameters:
index : array-like, optional (can be specified in order, or as keywords)
    New labels / index to conform to. Preferably an Index object to avoid duplicating data
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
    * default: don't fill gaps
    * pad / ffill: propagate last valid observation forward to next valid
    * backfill / bfill: use next valid observation to fill gap
    * nearest: use nearest valid observations to fill gap
copy : boolean, default True
    Return a new object, even if the passed indexes are the same
level : int or name
    Broadcast across a level, matching Index values on the passed MultiIndex level
fill_value : scalar, default np.NaN
    Value to use for missing values. Defaults to NaN, but can be any "compatible" value
limit : int, default None
    Maximum number of consecutive elements to forward or backward fill
tolerance : optional
    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
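A short sketch of how `method` and `fill_value` behave, using a small Series in the spirit of the book's reindex example (the exact values here are illustrative):

```python
import pandas as pd

obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# Without a fill method, labels missing from the old index become NaN.
plain = obj3.reindex(range(6))

# method="ffill" propagates the last valid observation forward.
filled = obj3.reindex(range(6), method="ffill")

# fill_value substitutes a constant for the missing labels.
const = obj3.reindex(range(6), fill_value="none")

print(int(plain.isna().sum()))   # 3
print(filled.tolist())           # ['blue', 'blue', 'purple', 'purple', 'yellow', 'yellow']
print(const.tolist())            # ['blue', 'none', 'purple', 'none', 'yellow', 'none']
```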
Type: property
Docstring:
A primarily label-location based indexer, with integer position fallback.
.ix[] supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
.ix is the most general indexer and will support any of the inputs in .loc and .iloc. .ix also supports floating point label schemes. .ix is exceptionally useful when dealing with mixed positional and label based hierarchical indexes.
However, when an axis is integer based, ONLY label based access and not positional access is supported. Thus, in such cases, it's usually better to be explicit and use .iloc or .loc.
Signature: data.drop(labels, axis=0, level=None, inplace=False, errors='raise')
Docstring:
Return new object with labels in requested axis removed.
Parameters:
labels : single label or list-like
axis : int or axis name
level : int or level name, default None
    For MultiIndex
inplace : bool, default False
    If True, do operation inplace and return None.
errors : {'ignore', 'raise'}, default 'raise'
    If 'ignore', suppress error and existing labels are dropped.
Returns:
dropped : type of caller
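A minimal sketch of the `axis` and `errors` parameters described above. The frame is assumed to be the plain 4x4 `data` frame from the book's examples, before any values are modified; the label "missing" is made up.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

rows_dropped = data.drop(["Colorado", "Ohio"])    # axis=0 (default): drop rows by label
cols_dropped = data.drop("two", axis=1)           # axis=1: drop a column
ignored = data.drop("missing", errors="ignore")   # unknown label is silently skipped

print(rows_dropped.index.tolist())     # ['Utah', 'New York']
print(cols_dropped.columns.tolist())   # ['one', 'three', 'four']
print(data.shape)                      # (4, 4): the original is unchanged
```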
Signature: obj.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)
Docstring:
Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values.
Parameters:
axis : {0 or 'index', 1 or 'columns'}, default 0
    index to direct ranking
method : {'average', 'min', 'max', 'first', 'dense'}
    * average: average rank of group
    * min: lowest rank in group
    * max: highest rank in group
    * first: ranks assigned in order they appear in the array
    * dense: like 'min', but rank always increases by 1 between groups
numeric_only : boolean, default None
    Include only float, int, boolean data. Valid only for DataFrame or Panel objects
na_option : {'keep', 'top', 'bottom'}
    * keep: leave NA values where they are
    * top: smallest rank if ascending
    * bottom: smallest rank if descending
ascending : boolean, default True
    False for ranks by high (1) to low (N)
pct : boolean, default False
    Computes percentage rank of data
Returns:
ranks : same type as caller
This note is mainly about the method parameter of rank. The rank function assigns each element its position in the sorted order:
obj
Out[206]:
0 7
1 -5
2 7
3 4
4 2
5 0
6 4
dtype: int64
"""
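If method is not given, it defaults to average. obj contains two 4s and two 7s; in sorted order they occupy ranks 4, 5, 6 and 7 in turn:
1) With average, the two 4s each get rank (4+5)/2 = 4.5 and the two 7s each get (6+7)/2 = 6.5. (If there were three 4s occupying ranks 4, 5 and 6, each of them would get (4+5+6)/3 = 5.)
2) With max, both 4s get the larger of their two ranks, which is 5; likewise both 7s get 7.
3) With min, both 4s get the smaller of their two ranks, which is 4; likewise both 7s get 6.
4) With first, ties are broken by order of appearance: the value that appears earlier in the original Series takes the lower rank, and the one that appears later takes the higher rank.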
"""
obj.rank()
Out[207]:
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
obj.rank(method='max')
Out[208]:
0 7.0
1 1.0
2 7.0
3 5.0
4 3.0
5 2.0
6 5.0
dtype: float64
obj.rank(method='min')
Out[209]:
0 6.0
1 1.0
2 6.0
3 4.0
4 3.0
5 2.0
6 4.0
dtype: float64
obj.rank(method='first')
Out[210]:
0 6.0
1 1.0
2 7.0
3 4.0
4 3.0
5 2.0
6 5.0
dtype: float64
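The four methods above, plus `dense` (listed in the docstring but not shown in the session), can be compared side by side. This sketch rebuilds `obj` from the values printed above:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Collect the ranks produced by each tie-breaking method.
ranks = {m: obj.rank(method=m).tolist()
         for m in ["average", "min", "max", "first", "dense"]}

for method, values in ranks.items():
    print(method, values)
# average [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
# min     [6.0, 1.0, 6.0, 4.0, 3.0, 2.0, 4.0]
# max     [7.0, 1.0, 7.0, 5.0, 3.0, 2.0, 5.0]
# first   [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
# dense   [5.0, 1.0, 5.0, 4.0, 3.0, 2.0, 4.0]
```

With `dense`, tied values share a rank as with `min`, but the next distinct value gets the next integer, so no ranks are skipped.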
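P132 Selecting rows and columns of a DataFrame
The book says that obj[val] selects a single column or a group of columns from a DataFrame. The lookup goes by the actual column names; a plain integer cannot be used as a positional index:
data['one'] # look up by column name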
Out[60]:
Ohio 0
Colorado 0
Utah 8
New York 12
Name: one, dtype: int32
data[0] # use an integer in place of the column name
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-61-c0c8b06be82d> in <module>()
----> 1 data[0] # use an integer in place of the column name
E:\Enthought\hzk\User\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1990 return self._getitem_multilevel(key)
1991 else:
-> 1992 return self._getitem_column(key)
1993
1994 def _getitem_column(self, key):
E:\Enthought\hzk\User\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
1997 # get column
1998 if self.columns.is_unique:
-> 1999 return self._get_item_cache(key)
2000
2001 # duplicate columns & possible reduce dimensionality
E:\Enthought\hzk\User\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
1343 res = cache.get(item)
1344 if res is None:
-> 1345 values = self._data.get(item)
1346 res = self._box_item_values(item, values)
1347 cache[item] = res
E:\Enthought\hzk\User\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath)
3223
3224 if not isnull(item):
-> 3225 loc = self.items.get_loc(item)
3226 else:
3227 indexer = np.arange(len(self.items))[isnull(self.items)]
E:\Enthought\hzk\User\lib\site-packages\pandas\indexes\base.pyc in get_loc(self, key, method, tolerance)
1876 return self._engine.get_loc(key)
1877 except KeyError:
-> 1878 return self._engine.get_loc(self._maybe_cast_indexer(key))
1879
1880 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)()
KeyError: 0
data['Ohio'] # try to select a row by its row index label
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-62-5dd4df56835a> in <module>()
----> 1 data['Ohio'] # try to select a row by its row index label
E:\Enthought\hzk\User\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1990 return self._getitem_multilevel(key)
1991 else:
-> 1992 return self._getitem_column(key)
1993
1994 def _getitem_column(self, key):
E:\Enthought\hzk\User\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
1997 # get column
1998 if self.columns.is_unique:
-> 1999 return self._get_item_cache(key)
2000
2001 # duplicate columns & possible reduce dimensionality
E:\Enthought\hzk\User\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
1343 res = cache.get(item)
1344 if res is None:
-> 1345 values = self._data.get(item)
1346 res = self._box_item_values(item, values)
1347 cache[item] = res
E:\Enthought\hzk\User\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath)
3223
3224 if not isnull(item):
-> 3225 loc = self.items.get_loc(item)
3226 else:
3227 indexer = np.arange(len(self.items))[isnull(self.items)]
E:\Enthought\hzk\User\lib\site-packages\pandas\indexes\base.pyc in get_loc(self, key, method, tolerance)
1876 return self._engine.get_loc(key)
1877 except KeyError:
-> 1878 return self._engine.get_loc(self._maybe_cast_indexer(key))
1879
1880 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4027)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3891)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)()
KeyError: 'Ohio'
data[:3] # a slice like this returns the selected rows
Out[63]:
one two three four
Ohio 0 0 0 0
Colorado 0 5 6 7
Utah 8 9 10 11
data[2:3] # same as above
Out[64]:
one two three four
Utah 8 9 10 11
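To summarize the behavior seen above: `data[label]` looks up columns, `data[slice]` selects rows, and `.loc` / `.iloc` make the intent explicit. A sketch, assuming the plain 4x4 frame from the book before any values were set to 0:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

col = data["one"]                   # a column label selects one column
cols = data[["one", "three"]]       # a list of labels selects several columns
rows = data[:3]                     # a slice selects rows by position

row_by_label = data.loc["Ohio"]     # explicit label-based row access
col_by_position = data.iloc[:, 0]   # explicit position-based column access

try:
    data["Ohio"]                    # row labels are not searched by data[...]
except KeyError as exc:
    print("KeyError:", exc)

print(row_by_label.tolist())        # [0, 1, 2, 3]
print(col_by_position.equals(col))  # True
```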
P133 A mildly interesting phenomenon
This has to do with how computers store floating-point numbers:
s1
Out[82]:
a 7.3
c -2.5
d 3.4
e 1.5
dtype: float64
s1.a
Out[83]: 7.2999999999999998
s1['a']
Out[84]: 7.2999999999999998
s1.c
Out[85]: -2.5
s1.d
Out[86]: 3.3999999999999999
s1.e
Out[87]: 1.5
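The values 7.3 and 3.4 have no exact binary representation, while -2.5 and 1.5 do. The session above apparently printed scalars with 17 significant digits, which exposes the rounding; newer Python and NumPy versions usually print the shortest string that round-trips instead. A sketch that reproduces the digits with plain Python floats:

```python
from decimal import Decimal

# 17 significant digits are enough to reveal the stored binary value.
print("%.17g" % 7.3)    # 7.2999999999999998
print("%.17g" % 3.4)    # 3.3999999999999999
print("%.17g" % -2.5)   # -2.5 (exactly representable in binary)
print("%.17g" % 1.5)    # 1.5  (exactly representable in binary)

# Decimal shows the exact value of the stored double.
print(Decimal(7.3))
```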
P134 The add function
The book's example is not a good one; it does not show the full behavior of add:
df1 = DataFrame(arange(20).reshape((5,4)),columns=list('abcd'))
df2 = DataFrame(arange(24).reshape((4,6)),columns=list('abcdef'))
df1
Out[121]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
df2
Out[122]:
a b c d e f
0 0 1 2 3 4 5
1 6 7 8 9 10 11
2 12 13 14 15 16 17
3 18 19 20 21 22 23
df1+df2
Out[123]:
a b c d e f
0 0.0 2.0 4.0 6.0 NaN NaN
1 10.0 12.0 14.0 16.0 NaN NaN
2 20.0 22.0 24.0 26.0 NaN NaN
3 30.0 32.0 34.0 36.0 NaN NaN
4 NaN NaN NaN NaN NaN NaN
df1.add(df2,fill_value=0)
Out[124]:
a b c d e f
0 0.0 2.0 4.0 6.0 4.0 5.0
1 10.0 12.0 14.0 16.0 10.0 11.0
2 20.0 22.0 24.0 26.0 16.0 17.0
3 30.0 32.0 34.0 36.0 22.0 23.0
4 16.0 17.0 18.0 19.0 NaN NaN
df2.add(df1,fill_value=0)
Out[125]:
a b c d e f
0 0.0 2.0 4.0 6.0 4.0 5.0
1 10.0 12.0 14.0 16.0 10.0 11.0
2 20.0 22.0 24.0 26.0 16.0 17.0
3 30.0 32.0 34.0 36.0 22.0 23.0
4 16.0 17.0 18.0 19.0 NaN NaN
Use introspection to view the parameters of add:
Signature: df1.add(other, axis='columns', level=None, fill_value=None)
Docstring:
Addition of dataframe and other, element-wise (binary operator add). Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs.
Parameters:
other : Series, DataFrame, or constant
axis : {0, 1, 'index', 'columns'}
    For Series input, axis to match Series index on
fill_value : None or float value, default None
    Fill missing (NaN) values with this value. If both DataFrame locations are missing, the result will be missing
level : int or name
    Broadcast across a level, matching Index values on the passed MultiIndex level
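Note the sentence "If both DataFrame locations are missing, the result will be missing". When two DataFrames of shape (5, 4) and (4, 6) are added, neither of them has a value at positions (5, 5) and (5, 6) (counting rows and columns from 1, that is, row label 4 and columns e and f), so the result there is still NaN.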
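The three cases (value only in df2, value only in df1, value in neither) can be checked directly. This sketch rebuilds df1 and df2 with an explicit `np.arange`, since the session above relied on `arange` already being in the namespace:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape((5, 4)), columns=list("abcd"))
df2 = pd.DataFrame(np.arange(24).reshape((4, 6)), columns=list("abcdef"))

result = df1.add(df2, fill_value=0)

print(result.loc[0, "e"])   # 4.0: only df2 has a value, df1 is treated as 0
print(result.loc[4, "a"])   # 16.0: only df1 has a value, df2 is treated as 0
print(result.loc[4, "e"])   # nan: missing in both inputs, so nothing to fill
```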
P139 The order function and the sort_* functions
Warning: the order function is deprecated.
obj.order()
-c:1: FutureWarning: order is deprecated, use sort_values(...)
Out[186]:
2 -3
3 2
0 4
1 7
dtype: int64
obj.sort_values()
Out[187]:
2 -3
3 2
0 4
1 7
dtype: int64
Warning: the by argument of sort_index is deprecated.
frame.sort_index(by='b')
-c:1: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)
Out[194]:
a b c
2 0 -3 2
3 1 2 1
0 0 4 4
1 1 7 3
frame.sort_values(by='b')
Out[195]:
a b c
2 0 -3 2
3 1 2 1
0 0 4 4
1 1 7 3
The by parameter is no longer documented for sort_index (it still appears in the signature as by=None, but not in the parameter list):
Signature: frame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)
Docstring:
Sort object by labels (along an axis)
Parameters:
axis : index, columns to direct sorting
level : int or level name or list of ints or list of level names
    if not None, sort on values in specified index level(s)
ascending : boolean, default True
    Sort ascending vs. descending
inplace : bool
    if True, perform operation in-place
kind : {quicksort, mergesort, heapsort}
    Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position : {'first', 'last'}
    first puts NaNs at the beginning, last puts NaNs at the end
sort_remaining : bool
    if true and sorting by level and index is multilevel, sort by other levels too (in order) after sorting by specified level
Returns:
sorted_obj : DataFrame
The sort_values function:
Signature: frame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Docstring:
Sort by the values along either axis
Parameters:
by : string name or list of names which refer to the axis items
axis : index, columns to direct sorting
ascending : bool or list of bool
    Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
inplace : bool
    if True, perform operation in-place
kind : {quicksort, mergesort, heapsort}
    Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
na_position : {'first', 'last'}
    first puts NaNs at the beginning, last puts NaNs at the end
Returns:
sorted_obj : DataFrame
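A sketch of the replacement calls: `sort_values` for sorting by values and `sort_index` for sorting by labels. The frame is rebuilt from the output shown earlier in this section.

```python
import pandas as pd

obj = pd.Series([4, 7, -3, 2])
frame = pd.DataFrame({"a": [0, 1, 0, 1], "b": [4, 7, -3, 2], "c": [4, 3, 2, 1]})

by_value = obj.sort_values()                  # replaces obj.order()
by_b = frame.sort_values(by="b")              # replaces frame.sort_index(by='b')
by_a_then_b = frame.sort_values(by=["a", "b"])
cols_desc = frame.sort_index(axis=1, ascending=False)   # sort column labels

print(by_value.index.tolist())       # [2, 3, 0, 1]
print(by_b.index.tolist())           # [2, 3, 0, 1]
print(by_a_then_b.index.tolist())    # [2, 0, 3, 1]
print(cols_desc.columns.tolist())    # ['c', 'b', 'a']
```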
P145 import web
The import triggers a FutureWarning, but it still works:
import pandas.io.data as web
E:\Enthought\hzk\User\lib\site-packages\pandas\io\data.py:35: FutureWarning:
The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
FutureWarning)
The function call from the book fails to read any data:
for ticker in ['APPL','IBM','MSFT','GOOG']:
    all_data[ticker]=web.get_data_yahoo(ticker,'1/1/2010','1/1/2011')
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-254-4cf257dad11f> in <module>()
1 for ticker in ['APPL','IBM','MSFT','GOOG']:
----> 2 all_data[ticker]=web.get_data_yahoo(ticker,'1/1/2010','1/1/2011')
3
E:\Enthought\hzk\User\lib\site-packages\pandas\io\data.pyc in get_data_yahoo(symbols, start, end, retry_count, pause, adjust_price, ret_index, chunksize, interval)
438 raise ValueError("Invalid interval: valid values are 'd', 'w', 'm' and 'v'")
439 return _get_data_from(symbols, start, end, interval, retry_count, pause,
--> 440 adjust_price, ret_index, chunksize, 'yahoo')
441
442
E:\Enthought\hzk\User\lib\site-packages\pandas\io\data.pyc in _get_data_from(symbols, start, end, interval, retry_count, pause, adjust_price, ret_index, chunksize, source)
379 # If a single symbol, (e.g., 'GOOG')
380 if isinstance(symbols, (compat.string_types, int)):
--> 381 hist_data = src_fn(symbols, start, end, interval, retry_count, pause)
382 # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
383 elif isinstance(symbols, DataFrame):
E:\Enthought\hzk\User\lib\site-packages\pandas\io\data.pyc in _get_hist_yahoo(sym, start, end, interval, retry_count, pause)
222 '&g=%s' % interval +
223 '&ignore=.csv')
--> 224 return _retry_read_url(url, retry_count, pause, 'Yahoo!')
225
226
E:\Enthought\hzk\User\lib\site-packages\pandas\io\data.pyc in _retry_read_url(url, retry_count, pause, name)
199
200 raise IOError("after %d tries, %s did not "
--> 201 "return a 200 for url %r" % (retry_count, name, url))
202
203
IOError: after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.finance.yahoo.com/table.csv?s=APPL&a=0&b=1&c=2010&d=0&e=1&f=2011&g=d&ignore=.csv'
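It can be changed as follows (note that the failing call above also spelled the ticker as 'APPL' rather than 'AAPL', which may have been the actual cause of the error):
for ticker in ['AAPL','IBM','MSFT','GOOG']:
    all_data[ticker]=web.DataReader(ticker,'yahoo','1/1/2000','1/1/2010')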
P147 Counting values
The result differs slightly from the book:
pd.value_counts(obj.values,sort=False)
Out[292]:
a 3 """the order here differs"""
c 3
b 2
d 1
dtype: int64
pd.value_counts(obj.values)
Out[293]:
c 3
a 3
b 2
d 1
dtype: int64
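The counts themselves match the book; only the order of the entries differs, and the order of tied counts (a and c both occur 3 times) is not something to rely on. A sketch that assumes `obj` holds the book's nine letters and sorts by label to get a stable order:

```python
import pandas as pd

# Assumed to match the book's example Series.
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

counts = obj.value_counts()     # sorted by count, descending
print(counts.to_dict())

# Sorting by label afterwards gives an order that does not depend on ties.
by_label = counts.sort_index()
print(by_label.index.tolist())  # ['a', 'b', 'c', 'd']
print(by_label.tolist())        # [3, 2, 3, 1]
```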
Signature: data.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Docstring:
Return object with labels on given axis omitted where alternately any or all of the data are missing
Parameters:
axis : {0 or 'index', 1 or 'columns'}, or tuple/list thereof
    Pass tuple or list to drop on multiple axes
how : {'any', 'all'}
    * any : if any NA values are present, drop that label
    * all : if all values are NA, drop that label
thresh : int, default None
    int value : require that many non-NA values
subset : array-like
    Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include
inplace : boolean, default False
    If True, do operation inplace and return None.
Returns:
dropped : DataFrame
Signature: df.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Docstring:
Fill NA/NaN values using the specified method
Parameters:
value : scalar, dict, Series, or DataFrame
    Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
    Method to use for filling holes in reindexed Series
    pad / ffill: propagate last valid observation forward to next valid
    backfill / bfill: use NEXT valid observation to fill gap
axis : {0, 1, 'index', 'columns'}
inplace : boolean, default False
    If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
    If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled.
downcast : dict, default is None
    a dict of item->dtype of what to downcast if possible, or the string 'infer' which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
Returns:
filled : DataFrame
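A small sketch of the `how`, `thresh` and dict-valued `value` parameters from the two docstrings above. The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1.0, 6.5, 3.0],
    [1.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, 6.5, 3.0],
])

any_dropped = df.dropna()              # drop rows containing any NaN
all_dropped = df.dropna(how="all")     # drop only rows that are entirely NaN
thresh_kept = df.dropna(thresh=2)      # keep rows with at least 2 non-NaN values

# A dict gives one fill value per column; column 2 is not listed, so it is untouched.
filled = df.fillna({0: 0.0, 1: -1.0})

print(len(any_dropped), len(all_dropped), len(thresh_kept))   # 1 3 2
print(filled.isna().sum().tolist())                           # [0, 0, 2]
```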
P150 df.ix
Note:
# the slice stop here is 5 (positional slice, end excluded)
df[:5]
Out[40]:
0 1 2
0 -0.085390 NaN NaN
1 0.502172 NaN NaN
2 -1.382911 NaN NaN
3 0.037798 NaN 0.535017
4 0.358564 NaN 0.036123
# the slice stop here is 4 (label-based slice, end included)
df.ix[:4]
Out[41]:
0 1 2
0 -0.085390 NaN NaN
1 0.502172 NaN NaN
2 -1.382911 NaN NaN
3 0.037798 NaN 0.535017
4 0.358564 NaN 0.036123
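The difference is that `df[:5]` slices by position and excludes the end, while a label-based slice includes the end label. Since `.ix` has been removed from newer pandas versions, this sketch uses `.loc` and `.iloc` on a made-up frame with a default integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(21).reshape((7, 3)))

by_position = df[:5]     # positional slice, end excluded: rows 0..4
by_label = df.loc[:4]    # label slice, end included: rows 0..4
by_iloc = df.iloc[:4]    # positional slice, end excluded: rows 0..3

print(by_position.equals(by_label))   # True
print(len(by_iloc))                   # 4
```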
P155 Sorting by level
The inner level is level 1 and the outer level is level 0:
frame
Out[96]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
frame.sortlevel(1)
Out[97]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
b 1 6 7 8
a 2 3 4 5
b 2 9 10 11
frame.sortlevel(0)
Out[98]:
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
frame.sortlevel(1,axis=1)
Out[99]:
state Colorado Ohio
color Green Green Red
key1 key2
a 1 2 0 1
2 5 3 4
b 1 8 6 7
2 11 9 10
frame.sortlevel(0,axis=1)
Out[100]:
state Colorado Ohio
color Green Green Red
key1 key2
a 1 2 0 1
2 5 3 4
b 1 8 6 7
2 11 9 10
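In newer pandas versions `sortlevel` is no longer available, and `sort_index(level=...)` does the same job. A sketch that rebuilds the frame shown above and sorts by the inner row level and the outer column level:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

by_inner = frame.sort_index(level=1)          # rows: sort by key2, then key1
by_cols = frame.sort_index(level=0, axis=1)   # columns: sort by state, then color

print(by_inner.index.tolist())
# [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
print(by_cols.columns.tolist())
# [('Colorado', 'Green'), ('Ohio', 'Green'), ('Ohio', 'Red')]
```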