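When iterating over a DataFrame to read or modify element values, loc and iloc are the usual ways to locate an element. This post compares how the two are used inside loops and how they differ.
loc: selects rows by the actual values in the row index (columns are accessed by name, and rows can also be selected by a boolean condition).
iloc: selects rows by integer position (rows and columns are both addressed by number; column names cannot be used).
Note: loc stands for "location", and the i in iloc stands for "integer". iloc takes integer positions (single integers, lists of integers, slices) or a boolean array, never labels.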
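To make the distinction concrete, here is a minimal sketch on a small made-up DataFrame (the labels 'x', 'y', 'z' and the data are hypothetical, chosen only for illustration). It shows loc selecting by label and by boolean condition, and iloc selecting the same cell by position.

```python
import pandas as pd

# Hypothetical data, for illustration only
df_demo = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=['x', 'y', 'z'])

by_label = df_demo.loc['y', 'B']                    # row label 'y', column label 'B'
by_position = df_demo.iloc[1, 1]                    # second row, second column (same cell)
by_condition = df_demo.loc[df_demo['A'] > 1, 'B']   # boolean mask on the rows

print(by_label, by_position)
print(by_condition.tolist())
```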
Official documentation for loc: access is primarily label based (row and column labels).
.loc is primarily label based, but may also be used with a boolean
array.
Official documentation for iloc: access is primarily based on integer position.
.iloc is primarily integer position based (from 0 to length-1 of the
axis), but may also be used with a boolean array. .iloc will raise
IndexError if a requested indexer is out-of-bounds, except slice
indexers which allow out-of-bounds indexing. (this conforms with
Python/NumPy slice semantics). Allowed inputs are:
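A rough sketch of the kinds of input .iloc accepts (an integer, a list of integers, a slice, a boolean array) and of the out-of-bounds behaviour quoted above. The DataFrame df_demo is a made-up example, not data from this post.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame holding the values 0..11
df_demo = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

single_row = df_demo.iloc[1]                              # integer -> one row as a Series
picked_rows = df_demo.iloc[[2, 0]]                        # list of integers
sliced = df_demo.iloc[0:2, 1:3]                           # slices, end position excluded
masked = df_demo.iloc[np.array([True, False, True])]      # boolean array

# A slice that runs past the end is clipped, as with Python lists
clipped = df_demo.iloc[1:100]

# A single out-of-bounds integer raises IndexError
try:
    df_demo.iloc[100]
    raised = False
except IndexError:
    raised = True

print(clipped.shape, raised)
```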
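import numpy as np
import pandas as pd

# Use dates as the index
df = pd.DataFrame(abs(np.random.randn(10, 4)), index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
i = 1
# Iterate over the date index
for d in df.index:
    # Set column A to 0 on every even-numbered row (2nd, 4th, ...)
    if i % 2 == 0:
        df.loc[df.index == d, 'A'] = 0
    i += 1
# Set column A to 100 on the row with the smallest index value
df.loc[df.index == df.index.min(), ['A']] = 100
print(df)
pre = 100
# Where column A is 0, fill it with the value from the row above
for d in df.index:
    if df.loc[d, 'A'] == 0.0:
        df.loc[d, 'A'] = pre
    pre = df.loc[d, 'A']
print(df)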
Output:
A B C D
date
2023-01-01 100.000000 1.532303 1.667700 0.799870
2023-01-02 0.000000 0.183089 1.239692 1.321370
2023-01-03 0.741102 0.046467 1.132106 0.019921
2023-01-04 0.000000 1.051709 0.236322 1.521744
2023-01-05 0.385593 0.533345 0.762100 0.654683
2023-01-06 0.000000 1.244160 0.433445 1.050108
2023-01-07 0.243023 2.255967 0.165955 0.287973
2023-01-08 0.000000 0.454799 1.382565 0.732341
2023-01-09 0.497839 0.371737 0.366683 0.524772
2023-01-10 0.000000 0.677077 0.542580 1.384272
A B C D
date
2023-01-01 100.000000 1.532303 1.667700 0.799870
2023-01-02 100.000000 0.183089 1.239692 1.321370
2023-01-03 0.741102 0.046467 1.132106 0.019921
2023-01-04 0.741102 1.051709 0.236322 1.521744
2023-01-05 0.385593 0.533345 0.762100 0.654683
2023-01-06 0.385593 1.244160 0.433445 1.050108
2023-01-07 0.243023 2.255967 0.165955 0.287973
2023-01-08 0.243023 0.454799 1.382565 0.732341
2023-01-09 0.497839 0.371737 0.366683 0.524772
2023-01-10 0.497839 0.677077 0.542580 1.384272
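The same result can usually be reached without an explicit loop. The following is a minimal sketch under the assumption that 0 is only ever a placeholder for "missing" in column A; df_demo and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with a date index
df_demo = pd.DataFrame({'A': [0.5, 0.3, 0.74, 0.9, 0.39, 0.2]},
                       index=pd.date_range('2023-01-01', periods=6))

# Zero out every second row (2nd, 4th, ...) by selecting its labels
df_demo.loc[df_demo.index[1::2], 'A'] = 0
# Set the row with the smallest index value to 100
df_demo.loc[df_demo.index.min(), 'A'] = 100

# Turn the zeros into NaN, then forward-fill from the row above
filled = df_demo['A'].mask(df_demo['A'] == 0).ffill()
print(filled.tolist())
```

On large frames this kind of vectorized form tends to be much faster than assigning cell by cell with loc inside a loop.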
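df = pd.DataFrame(abs(np.random.randn(10, 4)), columns=list('ABCD'))
print(df)
df.index.name = 'no'
# Default integer index
for i in df.index:
    df.loc[df.index == i, ['A', 'C']] = i
# Fill column D with NaN
df['D'] = np.nan
# Set D on rows with an odd index label (the 2nd, 4th, ... rows)
for i in df.index:
    if i % 2:
        df.loc[df.index == i, ['D']] = i
print(df)
# Backward fill: each NaN takes the next valid value below it
# (bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas)
df1 = df['D'].bfill()
print(df1)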
Output:
A B C D
0 1.355061 1.784947 0.530280 0.343836
1 0.591961 1.587958 0.700280 0.096845
2 0.945876 1.036163 0.903821 0.161356
3 1.144042 1.162818 0.148023 1.971303
4 0.424846 0.960678 0.891586 1.687668
5 0.441317 2.275049 0.168477 0.297483
6 0.791475 0.894168 1.309116 1.826531
7 0.349400 0.878078 1.748874 2.238486
8 0.501033 0.608020 0.346233 2.553355
9 0.795990 1.267664 0.565392 1.510390
A B C D
no
0 0.0 1.784947 0.0 NaN
1 1.0 1.587958 1.0 1.0
2 2.0 1.036163 2.0 NaN
3 3.0 1.162818 3.0 3.0
4 4.0 0.960678 4.0 NaN
5 5.0 2.275049 5.0 5.0
6 6.0 0.894168 6.0 NaN
7 7.0 0.878078 7.0 7.0
8 8.0 0.608020 8.0 NaN
9 9.0 1.267664 9.0 9.0
no
0 1.0
1 1.0
2 3.0
3 3.0
4 5.0
5 5.0
6 7.0
7 7.0
8 9.0
9 9.0
Name: D, dtype: float64
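In the D column output above, each NaN takes the value of the row below it, which is backward filling. The sketch below contrasts forward and backward filling on a small made-up Series so the direction of each is easy to see.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with gaps
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.ffill()    # each NaN takes the previous valid value
backward = s.bfill()   # each NaN takes the next valid value

print(forward.tolist())
print(backward.tolist())
```

Note that a leading NaN survives ffill() and a trailing NaN survives bfill(), because there is no value on that side to copy from.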
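With iloc, a date index and the default integer index behave the same way, because iloc only accepts positional arguments and ignores the index labels.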
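df = pd.DataFrame(abs(np.random.randn(10, 4)), index=pd.date_range('1/1/2023', periods=10),
                  columns=list('ABCD'))
df.index.name = 'date'
# Loop over row positions
for i in range(df.shape[0]):
    # Rows at even positions (0, 2, 4, ...): zero out columns 2 and 3, i.e. C and D
    if i % 2 == 0:
        df.iloc[[i], [2, 3]] = 0
# Set C and D in the first row to 100
df.iloc[[0], [2, 3]] = 100
# print(df.iloc[[0],[3]].values[0][0])
print(df)
for i in range(df.shape[0]):
    # Where C is 0, fill it with the value from the row above
    if df.iloc[[i], [2]].values[0][0] == 0.0:
        df.iloc[[i], [2]] = pre_c
    # Keep the scalar, not the 1x1 DataFrame that df.iloc[[i], [2]] returns
    pre_c = df.iloc[[i], [2]].values[0][0]
    # Where D is 0, fill it with the value from the row above
    if df.iloc[[i], [3]].values[0][0] == 0.0:
        df.iloc[[i], [3]] = pre_d
    pre_d = df.iloc[[i], [3]].values[0][0]
print(df)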
Output:
A B C D
date
2023-01-01 1.040645 0.369780 100.000000 100.000000
2023-01-02 1.850851 1.422875 0.066909 1.137934
2023-01-03 0.321779 0.376273 0.000000 0.000000
2023-01-04 0.316248 1.198039 1.707555 0.539617
2023-01-05 0.350327 0.144577 0.000000 0.000000
2023-01-06 0.396593 1.054268 0.791154 0.898749
2023-01-07 0.685409 1.286553 0.000000 0.000000
2023-01-08 0.366570 0.997236 1.534733 0.689972
2023-01-09 0.417907 0.823729 0.000000 0.000000
2023-01-10 1.316604 0.867192 0.514058 0.945503
A B C D
date
2023-01-01 1.040645 0.369780 100.000000 100.000000
2023-01-02 1.850851 1.422875 0.066909 1.137934
2023-01-03 0.321779 0.376273 0.066909 1.137934
2023-01-04 0.316248 1.198039 1.707555 0.539617
2023-01-05 0.350327 0.144577 1.707555 0.539617
2023-01-06 0.396593 1.054268 0.791154 0.898749
2023-01-07 0.685409 1.286553 0.791154 0.898749
2023-01-08 0.366570 0.997236 1.534733 0.689972
2023-01-09 0.417907 0.823729 1.534733 0.689972
2023-01-10 1.316604 0.867192 0.514058 0.945503
Note the data types involved in df.iloc[[0],[3]]:
print(type(df.iloc[[0],[3]]))
print(type(df.iloc[[0],[3]].values[0]))
print(type(df.iloc[[0],[3]].values[0][0]))
These are, respectively, a DataFrame, a NumPy array, and a float (numpy.float64).
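The return type depends on whether each indexer is a list or a plain integer. As a minimal sketch on a made-up frame (df_demo is hypothetical), plain integers give the scalar directly, which avoids the .values[0][0] unwrapping. It also shows one way to use a column name together with iloc, by translating the name to a position with columns.get_loc.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x4 frame holding the values 0.0..7.0
df_demo = pd.DataFrame(np.arange(8, dtype=float).reshape(2, 4), columns=list('ABCD'))

frame = df_demo.iloc[[0], [3]]    # list, list -> 1x1 DataFrame
series = df_demo.iloc[0, [3]]     # int, list  -> Series
scalar = df_demo.iloc[0, 3]       # int, int   -> scalar
same_scalar = df_demo.iat[0, 3]   # iat is the scalar-only positional accessor

# iloc does not take column names, so look up the position of 'D' first
col_pos = df_demo.columns.get_loc('D')
by_name = df_demo.iloc[0, col_pos]

print(type(frame).__name__, type(series).__name__, type(scalar).__name__)
```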
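Comparison (using the DataFrame with the default integer index named 'no' from the loc example above):
print(df.iloc[0:2,1:3])
print(df.loc[0:2,['C','D']])
iloc: the row slice 0:2 excludes position 2, so only rows 0 and 1 are returned. The column slice 1:3 covers positions 1 and 2 (columns B and C); position 3 (column D) is excluded.
loc: the row slice 0:2 includes the row labelled 2.
Result: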
B C
no
0 1.141945 0.0
1 1.010452 1.0
C D
no
0 0.0 NaN
1 1.0 1.0
2 2.0 NaN
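With the default index the labels and positions coincide, which can hide the difference. The sketch below uses made-up integer labels (10, 20, 30, 40) that do not match the positions, so it is clear that iloc slices by position with the end excluded while loc slices by label with the end included.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions
df_demo = pd.DataFrame({'A': [1, 2, 3, 4]}, index=[10, 20, 30, 40])

by_position = df_demo.iloc[0:2]   # positions 0 and 1, end excluded
by_label = df_demo.loc[10:30]     # labels 10 through 30, end included

print(by_position.index.tolist())
print(by_label.index.tolist())
```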