pandas Study Notes, Part 2
import numpy as np
import pandas as pd
date=pd.date_range('20200408',periods=6)
df=pd.DataFrame(np.arange(24).reshape((6,4)),index=date,columns=['a','b','c','d'])
date
DatetimeIndex(['2020-04-08', '2020-04-09', '2020-04-10', '2020-04-11',
               '2020-04-12', '2020-04-13'],
              dtype='datetime64[ns]', freq='D')
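`pd.date_range` builds a `DatetimeIndex` from a start date plus either a count (`periods`) or an end date, stepping by `freq` (daily by default). The sketch below shows those three ways of describing an index. The alternative dates and the `'2D'` step are illustrative choices, not part of the original notes.

```python
import pandas as pd

# Six consecutive days starting on 2020-04-08 (daily frequency is the default)
by_periods = pd.date_range('20200408', periods=6)

# The same index, described by start and end dates instead of a count
by_end = pd.date_range(start='2020-04-08', end='2020-04-13')

# A different step: three timestamps, two days apart
every_other_day = pd.date_range('2020-04-08', periods=3, freq='2D')

print(by_periods.equals(by_end))                        # True
print([d.strftime('%m-%d') for d in every_other_day])   # ['04-08', '04-10', '04-12']
```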
df
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
print(df['a'],df.a)
2020-04-08     0
2020-04-09     4
2020-04-10     8
2020-04-11    12
2020-04-12    16
2020-04-13    20
Freq: D, Name: a, dtype: int32 2020-04-08     0
2020-04-09     4
2020-04-10     8
2020-04-11    12
2020-04-12    16
2020-04-13    20
Freq: D, Name: a, dtype: int32
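`df['a']` and `df.a` return the same Series here. A small sketch of where attribute access stops working: it rebuilds the same frame, and the extra column names `'col 2'` and `'sum'` are purely illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# For a simple column name, bracket and attribute access give the same Series
same = df['a'].equals(df.a)

# A name that is not a valid Python identifier is only reachable with brackets
df['col 2'] = df['a'] * 2          # writing df.col 2 would be a syntax error

# A name that clashes with a DataFrame method is shadowed by that method
df['sum'] = 0
attr_is_method = callable(df.sum)  # df.sum is DataFrame.sum, not the new column
```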
df[0:3]
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
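An integer slice inside `[]` selects rows by position, with the end excluded, so `df[0:3]` returns three rows. The sketch below, using the same reconstructed frame, checks that against the explicit `.iloc` form and contrasts it with a label slice via `.loc`, which includes both endpoints.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Positional row slice: rows 0, 1, 2 (position 3 is excluded)
by_position = df[0:3]

# The explicit positional accessor gives the same rows
same_as_iloc = by_position.equals(df.iloc[0:3])

# Label slice: both the start label and the end label are included
by_label = df.loc['2020-04-08':'2020-04-10']
same_rows = by_label.equals(by_position)
```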
Use loc to select purely by label
df.loc['20200408']
a    0
b    1
c    2
d    3
Name: 2020-04-08 00:00:00, dtype: int32
df.loc[:,['a','b']]
             a   b
2020-04-08   0   1
2020-04-09   4   5
2020-04-10   8   9
2020-04-11  12  13
2020-04-12  16  17
2020-04-13  20  21
df.loc['20200408',['a','b']]
a    0
b    1
Name: 2020-04-08 00:00:00, dtype: int32
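A few more `.loc` patterns on the same frame, as a sketch: a single cell, label ranges on both axes (inclusive at both ends), and a boolean Series standing in for the row labels. The specific labels and the `> 10` condition are arbitrary illustrations.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# One cell: row label, column label
cell = df.loc[pd.Timestamp('2020-04-09'), 'c']

# Label ranges are inclusive on both ends, for rows and for columns
block = df.loc['2020-04-09':'2020-04-11', 'b':'c']

# A boolean Series can replace the row labels
big_a = df.loc[df['a'] > 10, ['a', 'd']]
```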
Use iloc to select purely by integer position
df.iloc[3,1]
13
df.iloc[3:5,1:3]
             b   c
2020-04-11  13  14
2020-04-12  17  18
df.iloc[[1,3,5],[0,3]]
             a   d
2020-04-09   4   7
2020-04-11  12  15
2020-04-13  20  23
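`.iloc` follows NumPy-style positional rules. This sketch, again on the reconstructed frame, adds negative positions, step slicing, and the scalar-only `.iat` accessor to the calls above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

cell = df.iloc[3, 1]            # 4th row, 2nd column
last_row = df.iloc[-1]          # negative positions count from the end
fast_cell = df.iat[3, 1]        # scalar-only positional accessor
every_other = df.iloc[::2, :2]  # step slicing: rows 0, 2, 4; columns 0, 1
```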
Use ix for mixed label + position selection (ix is deprecated, and it was removed entirely in pandas 1.0)
df.ix[:3,['a','c']]
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
"""Entry point for launching an IPython kernel.
            a   c
2020-04-08  0   2
2020-04-09  4   6
2020-04-10  8  10
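Because `.ix` no longer exists in current pandas, `df.ix[:3, ['a','c']]` (rows by position, columns by label) has to be rewritten with `.loc` or `.iloc` alone. Below are three possible rewrites, as a sketch. Which one reads best is a matter of taste; all three produce the same frame.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Old form: df.ix[:3, ['a', 'c']]

# 1) Turn the column labels into positions, then use .iloc for both axes
col_pos = df.columns.get_indexer(['a', 'c'])   # positions 0 and 2
via_iloc = df.iloc[:3, col_pos]

# 2) Turn the row positions into labels, then use .loc for both axes
via_loc = df.loc[df.index[:3], ['a', 'c']]

# 3) Select columns by label first, then rows by position
via_chain = df[['a', 'c']].iloc[:3]
```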
Data filtering
df[df<8]
              a    b    c    d
2020-04-08  0.0  1.0  2.0  3.0
2020-04-09  4.0  5.0  6.0  7.0
2020-04-10  NaN  NaN  NaN  NaN
2020-04-11  NaN  NaN  NaN  NaN
2020-04-12  NaN  NaN  NaN  NaN
2020-04-13  NaN  NaN  NaN  NaN
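Passing a boolean DataFrame to `[]` filters cell by cell: failing cells become NaN, which is why the integers above turn into floats. This sketch checks that this matches `DataFrame.where`, shows the inverse with `DataFrame.mask`, and contrasts it with a boolean *Series*, which keeps or drops whole rows instead.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Cell-level filter: cells failing the condition become NaN
masked = df[df < 8]
same_as_where = masked.equals(df.where(df < 8))

# The inverse: blank out the cells that DO satisfy the condition
inverse = df.mask(df < 8)

# Row-level filter: a boolean Series keeps whole rows, no NaN introduced
rows = df[df['a'] < 8]

# None of the above modifies df itself
unchanged = df.iloc[0].tolist() == [0, 1, 2, 3]
```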
df[df<8] returns a new DataFrame and leaves df itself unchanged:
df
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
df[df.a<3]=0
df
             a   b   c   d
2020-04-08   0   0   0   0
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
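`df[df.a<3]=0` uses a boolean Series to pick rows and overwrites every column in them. A sketch of the explicit `.loc` equivalent, plus a variant that restricts the assignment to chosen columns; `make_frame` is a hypothetical helper that just rebuilds the example frame so each case starts fresh.

```python
import numpy as np
import pandas as pd

def make_frame():
    """Hypothetical helper: rebuilds the 6x4 example frame from these notes."""
    dates = pd.date_range('20200408', periods=6)
    return pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Form used in the notes: rows chosen by a boolean Series, every column set to 0
df1 = make_frame()
df1[df1.a < 3] = 0

# The explicit .loc equivalent
df2 = make_frame()
df2.loc[df2.a < 3, :] = 0

# .loc can also limit the assignment to selected columns
df3 = make_frame()
df3.loc[df3.a < 10, ['c', 'd']] = -1
```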
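df.loc[df.a<5,'b']=1  # originally df.b[df.a<5]=1: chained assignment, which recent pandas warns about and which does not modify df under copy-on-write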
df
             a   b   c   d
2020-04-08   0   1   0   0
2020-04-09   4   1   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
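Conditionally updating one column can be done in a single `.loc` step (rows by mask, column by label), or by rebuilding the whole column. A sketch comparing `.loc` with `np.where` and `Series.mask`, all starting from a fresh copy of the original frame. The comparison is on values, since the three routes are not guaranteed to keep an identical dtype.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
base = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Single-step .loc assignment, no chained indexing
df_loc = base.copy()
df_loc.loc[df_loc['a'] < 5, 'b'] = 1

# np.where(condition, value_if_true, value_if_false) builds a new column
df_where = base.copy()
df_where['b'] = np.where(df_where['a'] < 5, 1, df_where['b'])

# Series.mask replaces values where the condition is True
df_mask = base.copy()
df_mask['b'] = df_mask['b'].mask(df_mask['a'] < 5, 1)
```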
df['e']=np.nan
df
             a   b   c   d   e
2020-04-08   0   1   0   0 NaN
2020-04-09   4   1   6   7 NaN
2020-04-10   8   9  10  11 NaN
2020-04-11  12  13  14  15 NaN
2020-04-12  16  17  18  19 NaN
2020-04-13  20  21  22  23 NaN
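Assigning a scalar such as `np.nan` broadcasts it to every row. Assigning a Series instead aligns it on the index labels rather than by position, and any label missing from the Series becomes NaN. A short sketch; the column names `'f'` and `'g'` and their values are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# A scalar is broadcast to every row; np.nan makes the column float64
df['e'] = np.nan

# A Series is matched to rows by index label
df['f'] = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Labels absent from the Series are filled with NaN
df['g'] = pd.Series([100], index=[pd.Timestamp('2020-04-10')])
```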