Learning pandas (Part 2)

import numpy as np
import pandas as pd
# Define the initial data: six consecutive dates as the index, four integer columns
date=pd.date_range('20200408',periods=6)
df=pd.DataFrame(np.arange(24).reshape((6,4)),index=date,columns=['a','b','c','d'])
date
DatetimeIndex(['2020-04-08', '2020-04-09', '2020-04-10', '2020-04-11',
               '2020-04-12', '2020-04-13'],
              dtype='datetime64[ns]', freq='D')
df
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
# Print a single column (bracket access and attribute access return the same Series)
print(df['a'],df.a)
2020-04-08     0
2020-04-09     4
2020-04-10     8
2020-04-11    12
2020-04-12    16
2020-04-13    20
Freq: D, Name: a, dtype: int32 2020-04-08     0
2020-04-09     4
2020-04-10     8
2020-04-11    12
2020-04-12    16
2020-04-13    20
Freq: D, Name: a, dtype: int32
# Select rows by position with a slice (end position excluded)
df[0:3]
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
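
The column and row selections above can be sketched in one place. This is a minimal example that rebuilds the same frame; the variable names (`same_column`, `two_columns`, `first_three`, `by_label`) are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# Bracket access and attribute access return the same column.
same_column = df['a'].equals(df.a)

# A list of column names returns a DataFrame instead of a Series.
two_columns = df[['a', 'b']]

# Slicing with [] selects rows: by position the end is excluded ...
first_three = df[0:3]
# ... while a label slice includes the end label.
by_label = df['20200408':'20200410']

print(same_column)
print(two_columns.shape, first_three.shape, by_label.shape)
```

Attribute access (`df.a`) only works when the column name is a valid Python identifier and does not clash with a DataFrame method, so bracket access is the safer default.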

Label-based selection with loc

# Select a single row by its label
df.loc['20200408']
a    0
b    1
c    2
d    3
Name: 2020-04-08 00:00:00, dtype: int32
# Select all rows of columns a and b
df.loc[:,['a','b']]
             a   b
2020-04-08   0   1
2020-04-09   4   5
2020-04-10   8   9
2020-04-11  12  13
2020-04-12  16  17
2020-04-13  20  21
df.loc['20200408',['a','b']]
a    0
b    1
Name: 2020-04-08 00:00:00, dtype: int32
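
One detail worth seeing in code: label slices passed to loc include both endpoints, unlike ordinary Python slices. The sketch below rebuilds the same frame; `row_slice` and `value` are illustrative names, and the explicit `pd.Timestamp` is just one way to name a single row label.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# Label slices are inclusive on both ends, for rows and for columns.
row_slice = df.loc['20200408':'20200410', 'a':'c']

# A single row label plus a single column label returns a scalar.
value = df.loc[pd.Timestamp('2020-04-09'), 'b']

print(row_slice)
print(value)
```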

Position-based selection with iloc

# Value at row position 3, column position 1 (0-based, so the fourth row and second column)
df.iloc[3,1]
13
# Select several rows and columns by position (end positions excluded)
df.iloc[3:5,1:3]
             b   c
2020-04-11  13  14
2020-04-12  17  18
# Select specific row and column positions
df.iloc[[1,3,5],[0,3]]
             a   d
2020-04-09   4   7
2020-04-11  12  15
2020-04-13  20  23
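
To make the positional rules concrete, here is a small sketch comparing iloc with the equivalent loc call. It rebuilds the same frame; `last_row`, `block`, and `same_block` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# Negative positions count from the end, as with Python lists.
last_row = df.iloc[-1]

# Positional slices exclude the end position ...
block = df.iloc[3:5, 1:3]
# ... so the matching label slice has to name the last row and column explicitly.
same_block = df.loc['20200411':'20200412', 'b':'c']

print(last_row.tolist())
print(block.values.tolist() == same_block.values.tolist())
```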

Mixed label and position selection with ix (ix is deprecated and was removed in pandas 1.0)

df.ix[:3,['a','c']]
D:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
            a   c
2020-04-08  0   2
2020-04-09  4   6
2020-04-10  8  10
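
Since ix is no longer available in current pandas, the same mixed selection (first three rows by position, columns a and c by label) has to go through loc or iloc. The sketch below shows two possible rewrites on the same frame; `col_positions`, `via_iloc`, and `via_loc` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# Option 1: translate the column labels into positions, then use iloc.
col_positions = df.columns.get_indexer(['a', 'c'])
via_iloc = df.iloc[:3, col_positions]

# Option 2: translate the row positions into labels, then use loc.
via_loc = df.loc[df.index[:3], ['a', 'c']]

print(via_iloc)
print(via_loc)
```

Both variants keep the selection in a single indexing call, which matters if the result is later used for assignment.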

Filtering data

# Keep the values smaller than 8; every other cell becomes NaN
df[df<8]
              a    b    c    d
2020-04-08  0.0  1.0  2.0  3.0
2020-04-09  4.0  5.0  6.0  7.0
2020-04-10  NaN  NaN  NaN  NaN
2020-04-11  NaN  NaN  NaN  NaN
2020-04-12  NaN  NaN  NaN  NaN
2020-04-13  NaN  NaN  NaN  NaN
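
The output above turns integers into floats because NaN needs a floating-point column. The sketch below contrasts a whole-frame mask with a row filter on one column, and shows `DataFrame.where` with a replacement value; `masked`, `rows_only`, and `filled` are illustrative names, and -1 is an arbitrary placeholder.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# A boolean frame masks cell by cell: non-matching cells become NaN.
masked = df[df < 8]

# A boolean Series filters rows: only rows where column a is below 8 are kept.
rows_only = df[df['a'] < 8]

# where() does the same cell-wise masking but lets you choose the replacement.
filled = df.where(df < 8, -1)

print(masked.dtypes)
print(rows_only)
print(filled)
```

None of these calls modifies df; each returns a new object.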
df
             a   b   c   d
2020-04-08   0   1   2   3
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
# Find the rows where column a is smaller than 3 and set every value in those rows to 0
df[df.a<3]=0
df
             a   b   c   d
2020-04-08   0   0   0   0
2020-04-09   4   5   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
# Where column a is smaller than 5, set column b to 1 (one loc call avoids chained assignment)
df.loc[df.a<5,'b']=1
df
             a   b   c   d
2020-04-08   0   1   0   0
2020-04-09   4   1   6   7
2020-04-10   8   9  10  11
2020-04-11  12  13  14  15
2020-04-12  16  17  18  19
2020-04-13  20  21  22  23
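
A minimal sketch of conditional assignment through one loc call per step, run on a fresh copy of the frame. Doing the selection and the assignment in a single indexing operation avoids chained indexing (for example `df.b[mask] = 1`), which pandas warns about and which does not update the original frame when Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# Rows where a < 3: set every column to 0.
df.loc[df['a'] < 3, :] = 0

# Rows where a < 5: set only column b to 1.
df.loc[df['a'] < 5, 'b'] = 1

print(df)
```

The first argument of loc is the row condition and the second is the column selection, so the intent of each assignment stays visible in one line.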
# Add a new column filled with NaN
df['e']=np.nan
df
             a   b   c   d    e
2020-04-08   0   1   0   0  NaN
2020-04-09   4   1   6   7  NaN
2020-04-10   8   9  10  11  NaN
2020-04-11  12  13  14  15  NaN
2020-04-12  16  17  18  19  NaN
2020-04-13  20  21  22  23  NaN
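
Assigning to a column name that does not exist yet creates the column. The sketch below, on a fresh copy of the original frame, adds an empty column, a computed column, and then fills the empty one; the column name `f` and the fill value 0 are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame used in this post.
date = pd.date_range('20200408', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=date, columns=['a', 'b', 'c', 'd'])

# A scalar is broadcast to every row; NaN makes the new column float64.
df['e'] = np.nan

# A new column can also be computed from existing ones.
df['f'] = df['a'] + df['b']

# Fill the placeholder values later by assigning the result back.
df['e'] = df['e'].fillna(0)

print(df)
```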
