keys: the column (or list of columns) named by keys is set as the index
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['David', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
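The example above passes a single column label. As a minimal sketch, assuming the same kind of toy data, passing a list of labels to keys produces a MultiIndex with one level per column. The variable names multi, index_names and remaining_columns are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    [['Alice', 'Math', 93], ['Bob', 'Physics', 98]],
    columns=['Name', 'Subject', 'Score'],
)

# A list of column labels builds a MultiIndex, one level per label
multi = data.set_index(keys=['Name', 'Subject'])

index_names = list(multi.index.names)
remaining_columns = list(multi.columns)
print(index_names)
print(remaining_columns)
```

Only Score stays behind as a regular column, because both key columns are moved into the index.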
drop: whether to delete the column that becomes the index from the columns. Defaults to True
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['David', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
print('\n')
data2 = data.set_index(keys='Name', drop=False)
print(data2)
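Instead of comparing the printed frames by eye, the effect of drop can be checked directly on the columns. This is a small sketch on assumed toy data; dropped and kept are illustrative names.

```python
import pandas as pd

data = pd.DataFrame(
    [['Alice', 'Math', 93], ['Bob', 'Physics', 98]],
    columns=['Name', 'Subject', 'Score'],
)

dropped = data.set_index('Name')           # drop=True is the default
kept = data.set_index('Name', drop=False)  # Name stays as a column too

print(list(dropped.columns))
print(list(kept.columns))
```

With drop=False the values of Name exist twice, once in the index and once as an ordinary column.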
append: whether to keep the existing index. With True, the new index is appended to the existing one. Defaults to False
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['David', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
print('\n')
data2 = data.set_index(keys='Name', append=True)
print(data2)
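With append=True the result carries a MultiIndex made of the old index plus the new one. A minimal sketch, assuming the default integer index as the starting point:

```python
import pandas as pd

data = pd.DataFrame(
    [['Alice', 'Math', 93], ['Bob', 'Physics', 98]],
    columns=['Name', 'Subject', 'Score'],
)

appended = data.set_index('Name', append=True)

# The original integer index becomes level 0, Name becomes level 1
level_count = appended.index.nlevels
level_names = list(appended.index.names)
first_label = appended.index[0]
print(level_count)
print(level_names)
print(first_label)
```

The old index has no name, so its level name shows up as None.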
inplace: whether to modify the original DataFrame instead of returning a new one. Defaults to False
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['David', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
print('\n')
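# With inplace=True, set_index changes data itself and returns None
data2 = data.set_index(keys='Name', inplace=True)
print(data2)  # prints None
print(data)   # data now uses Name as its index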
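A common slip is to keep using the return value of an inplace call. The sketch below, on assumed toy data, contrasts the two styles; result and reassigned are illustrative names, and plain reassignment is shown only as one widely used alternative.

```python
import pandas as pd

data = pd.DataFrame(
    [['Alice', 'Math', 93], ['Bob', 'Physics', 98]],
    columns=['Name', 'Subject', 'Score'],
)
original = data.copy()

# In-place: data is modified, nothing useful is returned
result = data.set_index('Name', inplace=True)
print(result)
print(data.index.name)

# Reassignment: the source frame is left untouched
reassigned = original.set_index('Name')
print(list(original.columns))
```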
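verify_integrity: whether to check the new index for duplicates. Defaults to False. If duplicates are found, a ValueError is raised. Setting it to True can slow the program down, so use it with care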
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['Chris', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
print('\n')
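# 'Chris' appears twice in Name, so verify_integrity=True raises ValueError
try:
    data2 = data.set_index(keys='Name', verify_integrity=True)
    print(data2)
except ValueError as err:
    print(err)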
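If you would rather inspect duplicates yourself than rely on an exception, the column can be checked before it becomes the index. This is a sketch on assumed data, not the only way to do it; names_unique, duplicates and raised are illustrative names.

```python
import pandas as pd

data = pd.DataFrame(
    [['Alice', 'Math', 93], ['Chris', 'Chemistry', 96], ['Chris', 'Biology', 90]],
    columns=['Name', 'Subject', 'Score'],
)

# Check uniqueness up front and list the offending values
names_unique = data['Name'].is_unique
duplicates = data.loc[data['Name'].duplicated(keep=False), 'Name'].tolist()
print(names_unique)
print(duplicates)

# verify_integrity=True reports the same problem as an exception
try:
    data.set_index('Name', verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```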
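Note: in the printed DataFrame, Name sits on a different row from the other column names, which makes it look as if they belong to two different levels. They do not. If we use reset_index to turn Name back into a column and ask for it to be inserted at a lower column level (col_level=2 in the code below), Name still ends up on the same single level as the other column names, because the columns have only one level (for what the parameters mean, see the detailed guide to reset_index):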
import pandas as pd
data = pd.DataFrame([['Alice', 'Math', 93], ['Bob', 'Physics', 98], ['Chris', 'Chemistry', 96], ['Chris', 'Biology', 90]],
columns=['Name', 'Subject', 'Score'])
print(data)
print('\n')
data1 = data.set_index(keys='Name')
print(data1)
print('\n')
data2 = data1.reset_index(col_level=2)
print(data2)
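col_level only has a visible effect when the columns really are a MultiIndex. The sketch below builds such a frame by hand; the top-level label 'Info' and the variable names are made up for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([('Info', 'Subject'), ('Info', 'Score')])
data = pd.DataFrame(
    [['Math', 93], ['Physics', 98]],
    index=pd.Index(['Alice', 'Bob'], name='Name'),
    columns=columns,
)

# Default col_level=0: Name goes into the first column level
top = data.reset_index()
# col_level=1: Name goes into the second level, the first is filled with ''
lower = data.reset_index(col_level=1)

print(top.columns.tolist())
print(lower.columns.tolist())
```

The empty string in the other level comes from the default value of col_fill.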