A DataFrame has both a row index and a column index, and can be viewed as a dictionary of Series.
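import numpy as np
import pandas as pd

# 3 x 4 table of random integers in [0, 100) with labelled rows and columns
df = pd.DataFrame(np.random.randint(100, size=12).reshape(3, 4),
                  index=['one', 'two', 'three'],
                  columns=['a', 'b', 'c', 'd'])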
print(df)
============================
a b c d
one 35 35 17 50
two 53 4 51 23
three 82 12 51 97
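The dictionary-of-Series view can be checked directly. The following is a minimal sketch that uses fixed values in place of the random ones above; the name `frame` and the data are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Fixed values stand in for the random integers (assumption, for reproducibility).
frame = pd.DataFrame(
    {"a": [35, 53, 82], "b": [35, 4, 12]},
    index=["one", "two", "three"],
)

# Iterating a DataFrame yields its column names, as iterating a dict yields its keys.
column_names = list(frame)

# items() yields (column name, Series) pairs; each Series shares the row index.
columns_as_dict = {name: series for name, series in frame.items()}
col_a = columns_as_dict["a"]

# Building a DataFrame from that dict of Series gives back the same table.
rebuilt = pd.DataFrame(columns_as_dict)
```

Each value in `columns_as_dict` is a Series indexed by 'one', 'two', 'three', which is why selecting one column of a DataFrame returns a Series.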
# Select columns by name: a single column returns a Series, multiple columns return a DataFrame
data1 = df['a']
data2 = df[['a','c']]
print(data1,type(data1))
print(data2,type(data2))
============================
one 35
two 53
three 82
Name: a, dtype: int32 <class 'pandas.core.series.Series'>
a c
one 35 17
two 53 51
three 82 51 <class 'pandas.core.frame.DataFrame'>
# Select rows by index label with loc: a single row returns a Series, multiple rows return a DataFrame
data3 = df.loc['one']
data4 = df.loc[['one','two']]
print(data3,type(data3))
print(data4,type(data4))
============================
a 35
b 35
c 17
d 50
Name: one, dtype: int32 <class 'pandas.core.series.Series'>
a b c d
one 35 35 17 50
two 53 4 51 23 <class 'pandas.core.frame.DataFrame'>
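Whether the result is a Series or a DataFrame depends on whether a single label or a list is passed, not on how many items the list holds. A minimal sketch, assuming the fixed values below in place of the random data (the name `frame` is illustrative):

```python
import pandas as pd

# Fixed copy of the table printed above (assumption, for reproducibility).
frame = pd.DataFrame(
    [[35, 35, 17, 50], [53, 4, 51, 23], [82, 12, 51, 97]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

col_series = frame["a"]          # single label -> Series
col_frame = frame[["a"]]         # one-element list -> DataFrame with one column
row_series = frame.loc["one"]    # single label -> Series indexed by column names
row_frame = frame.loc[["one"]]   # one-element list -> DataFrame with one row
```

Passing a one-element list is a convenient way to keep the two-dimensional shape when only one row or column is needed.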
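df[ ] selects columns by default; put the column names inside the brackets. (For this reason, columns are usually given explicit names rather than the default integer labels, to avoid confusion with the row index.) Selecting a single column returns a Series; selecting multiple columns returns a DataFrame. Every name must exist among the DataFrame's columns, otherwise a KeyError is raised.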
data1 = df['a']
data2 = df[['b','c']]
print(data1)
print(data2)
============================
one 35
two 53
three 82
Name: a, dtype: int32
b c
one 35 17
two 4 51
three 12 51
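When df[ ] is given a slice of integers, it selects rows by position. Only slicing is supported: a single integer such as df[0] is looked up as a column name and fails here. The result is always a DataFrame, even when only one row is selected. df[ ] also cannot select a single row by its index label (df['one']).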
data3 = df[:1]
print(data3,type(data3))
============================
a b c d
one 35 35 17 50 <class 'pandas.core.frame.DataFrame'>
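The rules for df[ ] can be checked with a short sketch. It assumes fixed values in place of the random data; `frame` and the helper `raises_key_error` are hypothetical names introduced only for this illustration. It also shows one case not covered above: a slice of row labels is accepted by df[ ] and includes both endpoints.

```python
import pandas as pd

# Fixed copy of the table printed above (assumption, for reproducibility).
frame = pd.DataFrame(
    [[35, 35, 17, 50], [53, 4, 51, 23], [82, 12, 51, 97]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

first_row = frame[:1]             # positional slice -> rows, still a DataFrame
label_rows = frame["one":"two"]   # label slice -> rows, both endpoints included


def raises_key_error(key):
    """Return True if frame[key] raises KeyError (hypothetical helper)."""
    try:
        frame[key]
    except KeyError:
        return True
    return False


int_fails = raises_key_error(0)           # 0 is looked up as a column name
row_label_fails = raises_key_error("one") # "one" is a row label, not a column
missing_col_fails = raises_key_error("x") # no column named "x"
```

Because df[ ] switches between columns and rows depending on what it receives, loc and iloc are usually the clearer choice for row selection.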
df1 = pd.DataFrame(np.random.randint(100,size = 16).reshape(4,4),
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df1)
============================
a b c d
one 81 16 59 87
two 16 7 66 70
three 18 28 68 59
four 50 87 98 73
df1.loc[['one','three'],['a','b']]
============================
a b
one 81 16
three 18 28
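How labels missing from the index are handled depends on the pandas version. Older releases filled them with NaN when they were passed to loc in a list; since pandas 1.0, loc raises a KeyError instead, and reindex is the way to get the NaN-filled result shown here.
df1.reindex(index=['one','two','five'], columns=['a','b','x'])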
============================
a b x
one 81.0 16.0 NaN
two 16.0 7.0 NaN
five NaN NaN NaN
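The NaN-filling behavior described above depends on the pandas version. A minimal sketch, assuming pandas 1.0 or later and fixed values in place of the random data (the name `frame` is illustrative): passing a missing label to loc in a list raises KeyError, while reindex produces the NaN-padded table.

```python
import pandas as pd

# Fixed copy of df1 as printed above (assumption, for reproducibility).
frame = pd.DataFrame(
    [[81, 16, 59, 87], [16, 7, 66, 70], [18, 28, 68, 59], [50, 87, 98, 73]],
    index=["one", "two", "three", "four"],
    columns=["a", "b", "c", "d"],
)

# Assumes pandas >= 1.0: a list containing a missing label makes loc raise.
try:
    frame.loc[["one", "two", "five"], ["a", "b"]]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex accepts missing labels and fills the new cells with NaN.
padded = frame.reindex(index=["one", "two", "five"], columns=["a", "b", "x"])
```

The existing columns become float in `padded` because NaN cannot be stored in an integer column.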
df1.loc['one':'three','a':'c']
============================
a b c
one 81 16 59
two 16 7 66
three 18 28 68
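Unlike df.loc[ ], df.iloc[ ] locates rows and columns by integer position, counting from 0, and its slices include the start but exclude the end. Otherwise it is used in much the same way as loc.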
df1 = pd.DataFrame(np.random.randint(100,size = 16).reshape(4,4),
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df1)
============================
a b c d
one 81 16 59 87
two 16 7 66 70
three 18 28 68 59
four 50 87 98 73
df1.iloc[[1,2],[1,2]]
============================
b c
two 69 36
three 35 45
As with lists, negative indices are also allowed.
df1.iloc[:,-1]
============================
one 29
two 72
three 37
four 99
Name: d, dtype: int32
df1.iloc[1:3,1:3]
============================
b c
two 69 36
three 35 45
print(df1.iloc[::2])
============================
a b c d
one 40 19 37 29
three 74 35 45 37
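The difference in slice endpoints between loc and iloc is easy to trip over. A minimal sketch, assuming fixed values in place of the random data (the name `frame` is illustrative): a loc slice includes its end label, while an iloc slice stops before its end position.

```python
import pandas as pd

# Fixed 4 x 4 table (assumption, for reproducibility).
frame = pd.DataFrame(
    [[81, 16, 59, 87], [16, 7, 66, 70], [18, 28, 68, 59], [50, 87, 98, 73]],
    index=["one", "two", "three", "four"],
    columns=["a", "b", "c", "d"],
)

by_label = frame.loc["one":"three", "a":"c"]  # includes 'three' and 'c'
by_position = frame.iloc[0:3, 0:3]            # stops before position 3
single_cell = frame.iloc[1:2, 1:2]            # 1 x 1 DataFrame: row 'two', column 'b'
last_col = frame.iloc[:, -1]                  # negative index -> last column
every_other = frame.iloc[::2]                 # step of 2 -> rows 'one' and 'three'
```

Here `by_label` and `by_position` select the same 3 x 3 block, even though the numeric slice ends at 3 and the label slice ends at the third label.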
Boolean indexing works much as it does for a Series. For example:
df = pd.DataFrame(np.random.randint(100,size = 16).reshape(4,4),
index = ['one','two','three','four'],
columns = ['a','b','c','d'])
print(df)
============================
a b c d
one 3 94 79 46
two 43 46 79 60
three 54 56 77 24
four 85 24 59 73
df > 50
==============================
a b c d
one False True True False
two False False True True
three True True True False
four True False True True
df[df > 50]
==============================
a b c d
one NaN 94.0 79 NaN
two NaN NaN 79 60.0
three 54.0 56.0 77 NaN
four 85.0 NaN 59 73.0
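As shown, applying a boolean mask to the whole DataFrame keeps the original value where the condition is True and returns NaN where it is False.
Boolean indexing can also be applied to specific rows or columns, for example: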
df[df['a'] > 50]
==============================
a b c d
three 54 56 77 24
four 85 24 59 73
df.loc[['one','three']] > 50
==============================
a b c d
one False True True False
three True True True False
df[df.loc[['one','three']] > 50]
==============================
a b c d
one NaN 94.0 79.0 NaN
two NaN NaN NaN NaN
three 54.0 56.0 77.0 NaN
four NaN NaN NaN NaN
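The two kinds of boolean indexing shown above behave differently: a boolean DataFrame masks individual cells, while a boolean Series filters whole rows. A minimal sketch, assuming fixed values in place of the random data (the name `frame` is illustrative); combining conditions with `&` and passing a mask to loc are additions beyond the original examples.

```python
import pandas as pd

# Fixed copy of the table printed above (assumption, for reproducibility).
frame = pd.DataFrame(
    [[3, 94, 79, 46], [43, 46, 79, 60], [54, 56, 77, 24], [85, 24, 59, 73]],
    index=["one", "two", "three", "four"],
    columns=["a", "b", "c", "d"],
)

mask = frame > 50
masked = frame[mask]               # cell-wise: False cells become NaN
same_as_where = frame.where(mask)  # equivalent spelling of the same operation

rows = frame[frame["a"] > 50]      # row-wise: keeps rows where column a > 50

# Each condition needs its own parentheses when combined with & or |.
rows_and = frame[(frame["a"] > 50) & (frame["d"] > 50)]

# loc takes a boolean row mask together with a column selection.
rows_cols = frame.loc[frame["a"] > 50, ["b", "c"]]
```

Row filtering keeps the original integer values, whereas cell-wise masking introduces NaN for the cells that fail the condition.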
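Exercise: create a 4 × 4 DataFrame of random integers from 0 to 99, then use indexing to get the following:
① all values in columns b and c
② the third and fourth rows
③ rows two and one, in that order
④ the values greater than 50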
data = np.random.randint(100, size = 16).reshape((4,4))
inx = ['one','two','three','four']
col = list('abcd')
df = pd.DataFrame(data,index=inx,columns=col)
print(df)
print('-'*10)
print(df[['b','c']])
print('-'*10)
print(df.iloc[2:4])
print('-'*10)
print(df.loc[['two','one']])
print('-'*10)
b = df > 50
print(df[b])
==============================
a b c d
one 20 23 74 94
two 39 32 7 39
three 84 6 32 75
four 53 47 46 25
----------
b c
one 23 74
two 32 7
three 6 32
four 47 46
----------
a b c d
three 84 6 32 75
four 53 47 46 25
----------
a b c d
two 39 32 7 39
one 20 23 74 94
----------
a b c d
one NaN NaN 74.0 94.0
two NaN NaN NaN NaN
three 84.0 NaN NaN 75.0
four 53.0 NaN NaN NaN