df.at['20190102', 'A']
File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 449, in pandas._libs.index.DatetimeEngine.get_loc
File "pandas/_libs/index.pyx", line 455, in pandas._libs.index.DatetimeEngine._date_check_type
KeyError: '20190102'
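The traceback above shows that `.at` requires the exact index label: in this pandas version a plain date string is not accepted by a DatetimeIndex. A minimal sketch, assuming the `df` built earlier in this tutorial from `pd.date_range('20190101', periods=6)`:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# .at wants the actual label object, so convert the string to a Timestamp first
value = df.at[pd.Timestamp('20190102'), 'A']
```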
Selection by position
Selecting a row
df.iloc[3]
Out[71]:
A -0.485980
B -1.281454
C 0.354063
D -1.418858
Name: 2019-01-04 00:00:00, dtype: float64
The arguments to the iloc[] method must be integer positions, not labels.
Selecting specific rows and columns
df.iloc[3:5, 0:2]
Out[72]:
A B
2019-01-04 -0.485980 -1.281454
2019-01-05 -1.122717 -2.789041
df.iloc[:,:]
Out[73]:
A B C D
2019-01-01 0.671622 0.785726 0.392435 0.874692
2019-01-02 -2.420703 -1.116208 -0.346070 0.785941
2019-01-03 1.364425 -0.947641 2.386880 0.585372
2019-01-04 -0.485980 -1.281454 0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-06 0.221597 -0.753038 -1.741256 0.287280
df.iloc[[1, 2, 4], [0, 2]]
Out[74]:
A C
2019-01-02 -2.420703 -0.346070
2019-01-03 1.364425 2.386880
2019-01-05 -1.122717 -0.791812
df[df.A > 0]
Out[77]:
A B C D
2019-01-01 0.671622 0.785726 0.392435 0.874692
2019-01-03 1.364425 -0.947641 2.386880 0.585372
2019-01-06 0.221597 -0.753038 -1.741256 0.287280
Filtering values that satisfy a condition
df[df > 0]
Out[78]:
A B C D
2019-01-01 0.671622 0.785726 0.392435 0.874692
2019-01-02 NaN NaN NaN 0.785941
2019-01-03 1.364425 NaN 2.386880 0.585372
2019-01-04 NaN NaN 0.354063 NaN
2019-01-05 NaN NaN NaN NaN
2019-01-06 0.221597 NaN NaN 0.287280
Values that do not satisfy the condition are replaced with NaN.
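As a sketch, the boolean mask `df[df > 0]` is equivalent to `DataFrame.where`, which additionally lets you choose a replacement value other than NaN (the data below is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

masked = df[df > 0]           # non-positive entries become NaN
same = df.where(df > 0)       # identical result
filled = df.where(df > 0, 0)  # replace failing entries with 0 instead of NaN
```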
Filtering with the isin() method
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
df2
Out[88]:
A B C D E
2019-01-01 0.671622 0.785726 0.392435 0.874692 one
2019-01-02 -2.420703 -1.116208 -0.346070 0.785941 one
2019-01-03 1.364425 -0.947641 2.386880 0.585372 two
2019-01-04 -0.485980 -1.281454 0.354063 -1.418858 three
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345 four
2019-01-06 0.221597 -0.753038 -1.741256 0.287280 three
df2['E'].isin(['two', 'four'])
Out[89]:
2019-01-01 False
2019-01-02 False
2019-01-03 True
2019-01-04 False
2019-01-05 True
2019-01-06 False
Freq: D, Name: E, dtype: bool
df2[df2['E'].isin(['two', 'four'])]
Out[90]:
A B C D E
2019-01-03 1.364425 -0.947641 2.386880 0.585372 two
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345 four
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)
s
Out[122]:
2019-01-01 1.0
2019-01-02 3.0
2019-01-03 5.0
2019-01-04 NaN
2019-01-05 6.0
2019-01-06 8.0
Freq: D, dtype: float64
s = s.shift(2)
s
Out[125]:
2019-01-01 NaN
2019-01-02 NaN
2019-01-03 1.0
2019-01-04 3.0
2019-01-05 5.0
2019-01-06 NaN
Freq: D, dtype: float64
Here the values of s are shifted forward by two positions; the vacated slots are automatically filled with NaN.
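In newer pandas versions, `shift` also accepts a `fill_value` argument to fill the vacated slots with something other than NaN. A small sketch:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods=6)
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)

shifted = s.shift(2)                # the first two values become NaN
filled = s.shift(2, fill_value=0)   # fill the vacated slots with 0 instead
```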
For operations between objects of different dimensions, pandas automatically aligns and broadcasts:
df.sub(s, axis='index')
Out[128]:
A B C D F
2019-01-01 NaN NaN NaN NaN NaN
2019-01-02 NaN NaN NaN NaN NaN
2019-01-03 0.364425 -1.947641 1.386880 4.0 1.0
2019-01-04 -3.485980 -4.281454 -2.645937 2.0 0.0
2019-01-05 -6.122717 -7.789041 -5.791812 0.0 -1.0
2019-01-06 NaN NaN NaN NaN NaN
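The subtraction above broadcasts the Series down every column because `axis='index'` matches it against the row labels; rows where the Series is NaN become NaN across all columns. A minimal sketch of the same alignment (with made-up data, smaller than the tutorial's frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]},
                  index=list('xyz'))
s = pd.Series([10.0, np.nan, 30.0], index=list('xyz'))

# Subtract s from every column, matching on the row index;
# the row where s is NaN becomes NaN in every column.
result = df.sub(s, axis='index')
```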
Apply
The apply() method applies a function along an axis of the data:
Cumulative sum
df.apply(np.cumsum)
Out[130]:
A B C D F
2019-01-01 0.000000 0.000000 0.392435 5 NaN
2019-01-02 -2.420703 -1.116208 0.046365 10 1.0
2019-01-03 -1.056278 -2.063849 2.433245 15 3.0
2019-01-04 -1.542258 -3.345303 2.787307 20 6.0
2019-01-05 -2.664975 -6.134345 1.995495 25 10.0
2019-01-06 -2.443377 -6.887383 0.254239 30 15.0
Here apply() is used to call np.cumsum; df.cumsum() gives the same result directly:
df.cumsum()
Out[133]:
A B C D F
2019-01-01 0.000000 0.000000 0.392435 5.0 NaN
2019-01-02 -2.420703 -1.116208 0.046365 10.0 1.0
2019-01-03 -1.056278 -2.063849 2.433245 15.0 3.0
2019-01-04 -1.542258 -3.345303 2.787307 20.0 6.0
2019-01-05 -2.664975 -6.134345 1.995495 25.0 10.0
2019-01-06 -2.443377 -6.887383 0.254239 30.0 15.0
Custom functions
Combined with a custom function, apply() enables more elaborate processing:
df.apply(lambda x: x.max() - x.min())
Out[134]:
A 3.785129
B 2.789041
C 4.128136
D 0.000000
F 4.000000
dtype: float64
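apply() works column-wise by default; passing `axis=1` applies the function row-wise instead. A small sketch with made-up data (the function name `spread` is my own):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def spread(x):
    """Range (max minus min) of a column or row."""
    return x.max() - x.min()

by_column = df.apply(spread)          # default: one value per column
by_row = df.apply(spread, axis=1)     # one value per row
```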
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
'foo', 'bar', 'foo', 'foo'],
'B': ['one', 'one', 'two', 'three',
'two', 'two', 'one', 'three'],
'C': np.random.randn(8),
'D': np.random.randn(8)})
df
Out[166]:
A B C D
0 foo one -1.252153 0.172863
1 bar one 0.238547 -0.648980
2 foo two 0.756975 0.195766
3 bar three -0.933405 -0.320043
4 foo two -0.310650 -1.388255
5 bar two 1.568550 -1.911817
6 foo one -0.340290 -2.141259
Group by column A and aggregate with sum:
df.groupby('A').sum()
Out[167]:
C D
A
bar 0.873692 -2.880840
foo -1.817027 -5.833961
Column B is non-numeric, so sum skips it here. (In recent pandas versions, you may need to pass numeric_only=True to get this behavior.)
Group by columns A and B and aggregate with sum:
df.groupby(['A', 'B']).sum()
Out[168]:
C D
A B
bar one 0.238547 -0.648980
three -0.933405 -0.320043
two 1.568550 -1.911817
foo one -1.592443 -1.968396
three -0.670909 -2.673075
two 0.446325 -1.192490
The result now has a multi-level index (MultiIndex).
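A MultiIndexed result can be addressed with a tuple of labels, or flattened back into ordinary columns with reset_index(). A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'two'],
                   'C': [1.0, 2.0, 3.0, 4.0]})
grouped = df.groupby(['A', 'B']).sum()

row = grouped.loc[('bar', 'one'), 'C']   # select one group with a tuple
flat = grouped.reset_index()             # back to ordinary columns
```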
Reshaping
Stack
Python's built-in zip() function pairs up corresponding elements of its arguments into tuples:
tuples = list(zip(['bar', 'bar', 'baz', 'baz',
'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two',
'one', 'two', 'one', 'two']))
tuples
Out[172]:
[('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('qux', 'two')]
## Create a two-level index
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index
Out[174]:
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
names=['first', 'second'])
## Create the DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
df
Out[176]:
A B
first second
bar one -0.501215 -0.947993
two -0.828914 0.232167
baz one 1.245419 1.006092
two 1.016656 -0.441073
foo one 0.479037 -0.500034
two -1.113097 0.591696
qux one -0.014760 -0.320735
two -0.648743 1.499899
## Take a slice of the DataFrame
df2 = df[:4]
df2
Out[179]:
A B
first second
bar one -0.501215 -0.947993
two -0.828914 0.232167
baz one 1.245419 1.006092
two 1.016656 -0.441073
The stack() method compresses the columns into the index, turning the two-dimensional data into a one-dimensional Series:
stacked = df2.stack()
stacked
Out[181]:
first second
bar one A -0.501215
B -0.947993
two A -0.828914
B 0.232167
baz one A 1.245419
B 1.006092
two A 1.016656
B -0.441073
dtype: float64
The inverse operation is the unstack() method:
stacked.unstack()
Out[182]:
A B
first second
bar one -0.501215 -0.947993
two -0.828914 0.232167
baz one 1.245419 1.006092
two 1.016656 -0.441073
stacked.unstack(1)
Out[183]:
second one two
first
bar A -0.501215 -0.828914
B -0.947993 0.232167
baz A 1.245419 1.016656
B 1.006092 -0.441073
stacked.unstack(0)
Out[184]:
first bar baz
second
one A -0.501215 1.245419
B -0.947993 1.006092
two A -0.828914 1.016656
B 0.232167 -0.441073
unstack() operates on the last index level by default; a different level can be given as an argument.
Pivot tables
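The level argument can also be the level's name rather than its position, and stack()/unstack() round-trip back to the original frame. A sketch using a smaller version of the indexed frame above:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')],
    names=['first', 'second'])
df2 = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['A', 'B'])

stacked = df2.stack()
by_name = stacked.unstack('second')   # same as unstack(1)
roundtrip = stacked.unstack()         # stack then unstack restores df2
```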
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
'B': ['A', 'B', 'C'] * 4,
'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
'D': np.random.randn(12),
'E': np.random.randn(12)})
df
Out[190]:
A B C D E
0 one A foo -0.933264 -2.387490
1 one B foo -0.288101 0.023214
2 two C foo 0.594490 0.418505
3 three A bar 0.450683 1.939623
4 one B bar 0.243897 -0.965783
5 one C bar -0.705494 -0.078283
6 two A foo 1.560352 0.419907
7 three B foo 0.199453 0.998711
8 one C foo 1.426861 -1.108297
9 one A bar -0.570951 -0.022560
10 two B bar -0.350937 -1.767804
11 three C bar 0.983465 0.065792
The pivot_table() method makes it easy to rearrange rows and columns:
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
Out[191]:
C bar foo
A B
one A -0.570951 -0.933264
B 0.243897 -0.288101
C -0.705494 1.426861
three A 0.450683 NaN
B NaN 0.199453
C 0.983465 NaN
two A NaN 1.560352
B -0.350937 NaN
C NaN 0.594490
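pivot_table() aggregates with the mean by default; the `aggfunc` and `fill_value` arguments change the aggregation and fill missing cells. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two'],
                   'C': ['bar', 'foo', 'bar', 'foo'],
                   'D': [1.0, 2.0, 3.0, 4.0]})

# Sum instead of mean, and put 0 in any missing cell instead of NaN
table = pd.pivot_table(df, values='D', index='A', columns='C',
                       aggfunc='sum', fill_value=0)
```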
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
"raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
df
Out[255]:
id raw_grade
0 1 a
1 2 b
2 3 b
3 4 a
4 5 a
5 6 e
df['grade'] = df['raw_grade'].astype('category')
df['grade']
Out[257]:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
Renaming the categories via the .cat accessor:
df["grade"].cat.categories = ["very good", "good", "very bad"]
df['grade']
Out[269]:
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, dtype: category
Categories (3, object): [very good, good, very bad]
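Note: in recent pandas versions, assigning to `.cat.categories` directly is deprecated (and later removed); `rename_categories` is the supported route. A sketch of the same renaming:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'b', 'a', 'a', 'e']).astype('category')

# rename_categories returns a new Series; it does not modify s in place
renamed = s.cat.rename_categories(['very good', 'good', 'very bad'])
```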
Re-setting the categories:
df['grade'] = df['grade'].cat.set_categories(["very bad", "bad", "medium","good", "very good"])
df['grade']
Out[271]:
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
Sorting
df.sort_values(by="grade")
Out[272]:
id raw_grade grade
5 6 e very bad
1 2 b good
2 3 b good
0 1 a very good
3 4 a very good
4 5 a very good
Grouping
df.groupby("grade").size()
Out[273]:
grade
very bad 1
bad 0
medium 0
good 2
very good 3
dtype: int64
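Notice that the empty categories (bad, medium) still appear with a count of 0. In newer pandas, groupby accepts an `observed` argument to control whether unused categories are shown (and its default has been changing across versions). A sketch:

```python
import pandas as pd

grades = pd.Categorical(['very bad', 'good', 'good'],
                        categories=['very bad', 'bad', 'good'])
df = pd.DataFrame({'grade': grades})

all_cats = df.groupby('grade', observed=False).size()   # includes empty 'bad'
seen_only = df.groupby('grade', observed=True).size()   # only observed groups
```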
Gotcha: a pandas Series cannot be used directly in a boolean context, because its truth value is ambiguous:
>>> if pd.Series([False, True, False]):
...     print("I was true")
Traceback (most recent call last):
...
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
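The fix is to say explicitly what "truthy" means for the whole Series:

```python
import pandas as pd

s = pd.Series([False, True, False])

if s.any():            # True if at least one element is True
    at_least_one = True

all_true = s.all()     # False: not every element is True
is_empty = s.empty     # False: the Series has elements
```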
Spring
[Inversion of Control (IoC) / Dependency Injection (DI)]:
The container controls the relationships between components, instead of the application code wiring them up directly, as in the traditional approach. This is the idea behind "inversion of control": control moves from the application code to an external container, and that transfer of control is the "inversion".
In short: objects are created by the container (for example, the Spring container) rather than instantiated with new inside the application code.
Web
[Single sign-on (SSO)]: SSO is defined as: across multiple application systems, a user
The Bellman-Ford algorithm (named after its inventors Richard Bellman and Lester Ford) solves the single-source shortest path problem: given a weighted directed graph G and a source vertex s, find, for every vertex v in G, the shortest path from s to v. It is sometimes called the Moore-Bellman-Ford algorithm, since Edward F. Moore also contributed to its development.
Compared with Dijkstra's
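A minimal sketch of the algorithm in Python (the function name and graph representation are my own, not from the text): relax every edge |V|-1 times, then make one more pass over the edges; any further improvement signals a negative-weight cycle.

```python
def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths on a weighted digraph.

    edges: list of (u, v, weight) tuples, vertices numbered 0..num_vertices-1.
    Returns the list of shortest distances from source,
    or raises ValueError if a negative-weight cycle is reachable.
    """
    INF = float('inf')
    dist = [INF] * num_vertices
    dist[source] = 0

    # Shortest paths use at most |V|-1 edges, so |V|-1 relaxation rounds suffice
    for _ in range(num_vertices - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:        # early exit: nothing changed this round
            break

    # One extra pass: any improvement now means a negative cycle exists
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist
```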
Microsoft .NET Framework 3.5 Service Pack 1 (full package)
http://www.microsoft.com/zh-cn/download/details.aspx?id=25150
Microsoft .NET Framework 3.5 Service Pack 1 is a cumulative update that contains many updates based on .NET Framewo
public final class ViewStub extends View
java.lang.Object
android.view.View
android.view.ViewStub
Class summary: ViewStub is a hidden view that occupies no memory; it can be used to lazily inflate a layout resource at runtime. When the ViewSt