pandas.read_csv() parameter notes: names

From the official pandas documentation:


names : array-like, default None

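List of column names to use for the result. If the data file has no header row, you should explicitly pass header=None. Duplicates in this list are not allowed unless mangle_dupe_cols=True. (That wording comes from older pandas releases; mangle_dupe_cols was removed in pandas 2.0, and current versions reject duplicate names with a ValueError.)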
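A minimal sketch of the headerless case, reading two of the sample rows from an in-memory io.StringIO buffer instead of a real file (the comma separator and the column labels are assumptions taken from the sample data below). It also shows that duplicate entries in names are rejected; the ValueError is what recent pandas releases (1.x/2.x) raise, and older releases behaved differently.

```python
import io

import pandas as pd

# Hypothetical headerless CSV: every line is a data record.
headerless = io.StringIO(
    "37,Male,4,Life Sciences,Divorced,5993,No\n"
    "54,Female,4,Life Sciences,Divorced,10502,No\n"
)
cols = ["Age", "Gender", "Education", "EducationField",
        "MaritalStatus", "Income", "OverTime"]

# header=None: no line is a header, so names supplies every column label.
df = pd.read_csv(headerless, header=None, names=cols)
print(df)

# Duplicate labels in names are refused before any parsing happens.
duplicate_error = None
try:
    pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None, names=["a", "a"])
except ValueError as exc:
    duplicate_error = exc
    print("rejected:", exc)
```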


Sample data (train.csv, comma-separated):

Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime
37,Male,4,Life Sciences,Divorced,5993,No
54,Female,4,Life Sciences,Divorced,10502,No
34,Male,3,Life Sciences,Single,6074,Yes
39,Female,1,Life Sciences,Married,12742,No
28,Male,3,Medical,Divorced,2596,No
24,Female,1,Medical,Married,4162,Yes
29,Male,5,Other,Single,3983,No
36,Male,2,Medical,Married,7596,No
33,Female,4,Medical,Married,2622,No
import pandas as pd  

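1.1  names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6'] (header not set)

When names is passed and header is not, pandas behaves as if header=None: the first line of the file is not treated as a header, so the original column titles end up as an ordinary data row (row 0 in the output).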

data = pd.read_csv('./train.csv',
                   names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6']
                   )

print(data.head(5))

Output:

  new_0   new_1      new_2           new_3          new_4   new_5     new_6
0   Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
1    37    Male          4   Life Sciences       Divorced    5993        No
2    54  Female          4   Life Sciences       Divorced   10502        No
3    34    Male          3   Life Sciences         Single    6074       Yes
4    39  Female          1   Life Sciences        Married   12742        No
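To check that 1.1 really behaves like header=None, the sketch below reads the same text both ways and compares the resulting frames. It assumes train.csv is comma-separated and embeds only its first three lines with io.StringIO; CSV_TEXT and new_names are illustrative names, not part of the original code.

```python
import io

import pandas as pd

# In-memory stand-in for the first three lines of ./train.csv (assumed layout).
CSV_TEXT = (
    "Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime\n"
    "37,Male,4,Life Sciences,Divorced,5993,No\n"
    "54,Female,4,Life Sciences,Divorced,10502,No\n"
)
new_names = [f"new_{i}" for i in range(7)]

# names only (case 1.1) versus names plus an explicit header=None (case 1.2).
implicit = pd.read_csv(io.StringIO(CSV_TEXT), names=new_names)
explicit = pd.read_csv(io.StringIO(CSV_TEXT), header=None, names=new_names)

print(implicit.equals(explicit))   # same frame either way
print(implicit.iloc[0].tolist())   # the old header line is now data row 0
print(implicit["new_0"].dtype)     # not an integer dtype: "Age" sits among the numbers
```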

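1.2  header=None,  names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6']

Passing header=None explicitly gives the same result as 1.1: the original header line is still read as data.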

data = pd.read_csv('./train.csv',
                   header=None,
                   names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6']
                   )

print(data.head(5))

Output:

  new_0   new_1      new_2           new_3          new_4   new_5     new_6
0   Age  Gender  Education  EducationField  MaritalStatus  Income  OverTime
1    37    Male          4   Life Sciences       Divorced    5993        No
2    54  Female          4   Life Sciences       Divorced   10502        No
3    34    Male          3   Life Sciences         Single    6074       Yes
4    39  Female          1   Life Sciences        Married   12742        No
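If the goal is to replace the file's existing header rather than keep it as a data row, the pandas documentation says to pass header=0 explicitly together with names. A sketch under the same assumptions as above (comma-separated data, first three lines embedded in memory):

```python
import io

import pandas as pd

# In-memory stand-in for the first three lines of ./train.csv (assumed layout).
CSV_TEXT = (
    "Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime\n"
    "37,Male,4,Life Sciences,Divorced,5993,No\n"
    "54,Female,4,Life Sciences,Divorced,10502,No\n"
)
new_names = [f"new_{i}" for i in range(7)]

# header=0: line 0 is the header row; names then overrides its labels,
# so the old header line is consumed instead of becoming a data row.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=0, names=new_names)
print(df)
```

Because the header line is no longer mixed into the data, Age, Education and Income are parsed as integers again.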

1.3  header=2,  names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6']

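With header=2 (0-based), line 2 of the file, i.e. its third line (the 54,Female,... record), is used as the header row, and the DataFrame data starts on the line after it. Because names is also given, its labels override that header row, so the 54,Female,... record is consumed as the header and does not appear in the output; lines 0 and 1 above it are discarded.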

data = pd.read_csv('./train.csv',
                   header=2,
                   names=['new_0','new_1','new_2','new_3','new_4','new_5','new_6']
                   )

print(data.head(5))

Output:

   new_0   new_1  new_2          new_3     new_4  new_5 new_6
0     34    Male      3  Life Sciences    Single   6074   Yes
1     39  Female      1  Life Sciences   Married  12742    No
2     28    Male      3        Medical  Divorced   2596    No
3     24  Female      1        Medical   Married   4162   Yes
4     29    Male      5          Other    Single   3983    No
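Another way to see what header=2 does: it yields the same rows as skipping the first three lines and declaring no header at all. The sketch embeds the full sample as CSV text (an assumption about how train.csv is laid out) and compares the two reads.

```python
import io

import pandas as pd

# In-memory stand-in for ./train.csv (assumed comma-separated).
CSV_TEXT = (
    "Age,Gender,Education,EducationField,MaritalStatus,Income,OverTime\n"
    "37,Male,4,Life Sciences,Divorced,5993,No\n"
    "54,Female,4,Life Sciences,Divorced,10502,No\n"
    "34,Male,3,Life Sciences,Single,6074,Yes\n"
    "39,Female,1,Life Sciences,Married,12742,No\n"
    "28,Male,3,Medical,Divorced,2596,No\n"
    "24,Female,1,Medical,Married,4162,Yes\n"
    "29,Male,5,Other,Single,3983,No\n"
    "36,Male,2,Medical,Married,7596,No\n"
    "33,Female,4,Medical,Married,2622,No\n"
)
new_names = [f"new_{i}" for i in range(7)]

# header=2: lines 0-1 are dropped, line 2 is the (overridden) header.
via_header = pd.read_csv(io.StringIO(CSV_TEXT), header=2, names=new_names)

# Same rows another way: skip lines 0-2 and treat nothing as a header.
via_skip = pd.read_csv(io.StringIO(CSV_TEXT), header=None, skiprows=3,
                       names=new_names)

print(via_header.equals(via_skip))
print(via_header.head(5))
```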
