Python DataFrame work and study notes, part 1:

 I: Understanding apply with axis=0/1:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

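1. axis=0: the function is applied column by column. Each column is passed in as a Series indexed by the DataFrame's row index, and the result holds one value per column.

2. axis=1: the function is applied row by row. Each row is passed in as a Series indexed by the DataFrame's column labels, and the result holds one value per row.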

>>> df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Example 1, axis=0:

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

Example 2, axis=1:

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
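
To make the index claim concrete, here is a minimal sketch that records which index each passed Series carries. The helper record_and_sum and the dict seen are hypothetical names introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

seen = {}  # maps the name of each Series passed in to its index labels

def record_and_sum(s):
    # Hypothetical helper: remember the index of the Series, then sum it.
    seen[s.name] = list(s.index)
    return s.sum()

# axis=0: one call per column; each Series is indexed by the row labels.
col_sums = df.apply(record_and_sum, axis=0)
axis0_seen = dict(seen)

# axis=1: one call per row; each Series is indexed by the column labels.
seen.clear()
row_sums = df.apply(record_and_sum, axis=1)
axis1_seen = dict(seen)

print(axis0_seen)
print(axis1_seen)
```

With axis=0 the recorded keys are the column names and the values are the row labels; with axis=1 it is the other way round.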

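 II: The loc, at, iloc, and iat accessors:

loc, at: access by label (at returns a single scalar value)

Single row label: ['a']      Lists of row and column labels: [['row_name',...],['colname',...]]      MultiIndex row tuple plus column label: [(row1_name,row1_1_name),'col_name']   

Row range (label slice): [row1_name:row2_name], both the start and the stop are included

iloc, iat: access by integer position, e.g. [1,2] (iat returns a single scalar value)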
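
The access forms above can be sketched as follows. The sample frame (snake names, max_speed, shield) and the small MultiIndex frame are assumptions made up for illustration; only the accessor calls themselves are pandas API.

```python
import pandas as pd

# Assumed sample data, not from the notes above.
df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

row = df.loc["viper"]                            # one row label -> Series
block = df.loc[["cobra", "viper"], ["shield"]]   # label lists -> DataFrame
span = df.loc["cobra":"viper"]                   # label slice, stop included
cell = df.at["viper", "shield"]                  # single value by label

pos_cell = df.iloc[1, 1]    # single value by position
pos_span = df.iloc[0:1]     # position slice, stop excluded
fast_cell = df.iat[1, 1]    # scalar-only access by position

# A MultiIndex row is addressed with a tuple of labels.
mi = pd.DataFrame(
    {"col_name": [10, 20]},
    index=pd.MultiIndex.from_tuples([("r1", "a"), ("r1", "b")]),
)
mi_value = mi.loc[("r1", "b"), "col_name"]
```

Note the asymmetry between the two slices: the label slice keeps both endpoints, while the positional slice follows normal Python slicing and drops the stop.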

III: Miscellaneous notes

1. Select the rows where the boolean mask is True:

df.loc[[False, False, True]]

2. Filter rows by a condition (the part highlighted in red in the original post can be omitted):

df.loc[df['shield'] > 6, ['max_speed']]
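
A minimal sketch of notes 1 and 2 on an assumed sample frame (the data is invented for illustration):

```python
import pandas as pd

# Assumed sample data with a 'shield' and a 'max_speed' column.
df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# A boolean list keeps the rows whose entry is True.
mask_rows = df.loc[[False, False, True]]

# A boolean Series built from a condition, plus a column list.
fast = df.loc[df["shield"] > 6, ["max_speed"]]

# Without the column list, every column is kept.
all_cols = df.loc[df["shield"] > 6]
```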

3. A DataFrame's row and column indexes default to integers starting from 0; both can be set explicitly.

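4. From the pandas documentation on the old .ix indexer (deprecated in pandas 0.20 and removed in 1.0): However, when an axis is integer based, ONLY label based access and not positional access is supported. Thus, in such cases, it’s usually better to be explicit and use .iloc or .loc.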
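
Why being explicit matters on an integer-labeled axis can be seen in a small sketch. The Series below is invented for illustration; its integer labels are deliberately out of order.

```python
import pandas as pd

# Integer labels that do not match the positions.
s = pd.Series(["a", "b", "c"], index=[2, 0, 1])

by_label = s.loc[0]      # the element whose label is 0
by_position = s.iloc[0]  # the first element

print(by_label, by_position)
```

Here the same integer 0 selects two different elements depending on the accessor, which is exactly the ambiguity the quote warns about.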

5. Convert a DataFrame to a numpy.ndarray (since pandas 0.24 the documentation recommends df.to_numpy() over df.values):

df.values
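
A short sketch of both spellings, on an invented two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

arr = df.to_numpy()   # recommended spelling in pandas >= 0.24
legacy = df.values    # older attribute, same contents here

print(type(arr), arr.shape)
```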

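6. Append rows to a DataFrame with df.append() (deprecated in pandas 1.4 and removed in 2.0; use pd.concat() instead). Note that df.add() performs element-wise addition; it does not add rows or columns.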
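
A minimal sketch of adding rows and columns in current pandas. pd.concat and assign are swapped in here as the replacements for append; the frames are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
new_row = pd.DataFrame({"A": [5], "B": [6]})

# Add rows: concatenate along the row axis and renumber the index.
longer = pd.concat([df, new_row], ignore_index=True)

# Add a column: assign returns a new DataFrame with the extra column.
wider = df.assign(C=[7, 8])

# df.add is arithmetic, not structural: it adds 10 to every element.
summed = df.add(10)
```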

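7. Time series data indexed by time:

Extract rows at a particular time of day: at_time

Extract rows within a time-of-day range: DataFrame.between_time(start_time, end_time, include_start=True, include_end=True). In pandas 2.0 and later the two include_* flags are replaced by a single inclusive parameter.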
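
Both methods can be sketched on a small invented time series. The call to between_time relies on the default of including both endpoints, so it avoids the version-dependent keyword arguments.

```python
import pandas as pd

# Assumed sample data: four timestamps, 12 hours apart.
idx = pd.to_datetime(
    [
        "2018-04-09 00:00",
        "2018-04-09 12:00",
        "2018-04-10 00:00",
        "2018-04-10 12:00",
    ]
)
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

# Rows whose time of day is exactly 12:00, on any date.
noon = ts.at_time("12:00")

# Rows whose time of day lies between 00:00 and 06:00 (endpoints included).
early = ts.between_time("00:00", "06:00")
```

Both methods match on the time of day only, so rows from different dates can be selected together.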

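8. Python's equivalent of the ternary operator is the conditional expression: a = b if b > a else a. It can serve as the function passed to combine() to merge two DataFrames column by column: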

s1 if s1.sum() < s2.sum() else s2
>>> import pandas as pd
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, lambda s1, s2: s1 if s1.sum() < s2.sum() else s2)
   A  B
0  0  3
1  0  3

9. Besides filtering, where() can also fill in the values that do not satisfy the condition.
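
A minimal sketch of both uses of where(), on an invented Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Filtering: values failing the condition become NaN.
kept = s.where(s > 2)

# Filling: values failing the condition are replaced by the second argument.
filled = s.where(s > 2, 0)

print(filled.tolist())  # [0, 0, 3, 4]
```

Note that where() keeps the original shape; rows are masked rather than dropped.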

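10. DataFrame.to_json() (also available on Series) and pandas.read_json() share the orient parameter, which sets the JSON layout being written or read: 'split', 'records', 'index', 'columns', 'values', 'table'. The allowed values differ between Series and DataFrame.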
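
A small round-trip sketch, assuming an invented two-column frame. The JSON text is wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string to read_json.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# 'split' keeps index, columns, and data as separate keys.
text = df.to_json(orient="split")
back = pd.read_json(StringIO(text), orient="split")

# 'records' writes one JSON object per row and drops the index.
records = json.loads(df.to_json(orient="records"))

print(records)  # [{'A': 1, 'B': 3}, {'A': 2, 'B': 4}]
```

The same orient value has to be used on both sides, otherwise read_json interprets the layout differently from how it was written.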

