pandas | Reading and Saving Data
- Read an Excel file: pandas.read_excel()
- Save to an Excel file: DataFrame.to_excel() (a DataFrame method; there is no pandas.to_excel() function)
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None,
squeeze=False, dtype=None, engine=None, converters=None, true_values=None,
false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True,
verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None,
skip_footer=0, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)
import pandas as pd
help(pd.read_excel)
Help on function read_excel in module pandas.io.excel._base:
read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True)
Read an Excel file into a pandas DataFrame.
Supports `xls`, `xlsx`, `xlsm`, `xlsb`, `odf`, `ods` and `odt` file extensions
read from a local filesystem or URL. Supports an option to read
a single sheet or a list of sheets.
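- Common parameters:
- io: path to the Excel file. On Windows, right-click the file, find its location under "Properties", and append the file name to get the full path. Watch the slash direction: Windows shows backslashes, which Python treats as escape characters ★★★★★
- sheet_name: name of the worksheet. When omitted, the first worksheet is read.
- Less common parameters:
- index_col: use the given column as the index, e.g. index_col=1
- names: column names, passed as a list
- header: row to use as the column names; defaults to the first row (header=0). A list such as header=[1,2] builds multi-level column names.
- usecols: read only the specified columns, e.g. usecols=["A","B"] (a list of column names)
- skiprows: skip the first few rows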
Common parameters of pd.read_excel()
io
Location of the file to read.
import pandas as pd
data1 = pd.read_excel('C:/Users/yyz/Desktop/python数据分析基础/data/泰坦尼克数据.xlsx')
data1.head()
| | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
data2 = pd.read_excel('C:\\Users\\yyz\\Desktop\\python数据分析基础\\data\\泰坦尼克数据.xlsx')
data3 = pd.read_excel(r'C:\Users\yyz\Desktop\python数据分析基础\data\泰坦尼克数据.xlsx')
data3.head()
| | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
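The three path spellings used so far (forward slashes, doubled backslashes, and a raw string) all point at the same file. A minimal sketch that checks this with the standard library's pathlib; the shortened path and the file name data.xlsx are placeholders for illustration, not files from this post.

```python
from pathlib import PureWindowsPath

# Three ways to spell one Windows path (hypothetical file name).
forward = 'C:/Users/yyz/Desktop/data.xlsx'
escaped = 'C:\\Users\\yyz\\Desktop\\data.xlsx'
raw = r'C:\Users\yyz\Desktop\data.xlsx'

# The escaped and raw spellings are the very same Python string.
same_string = escaped == raw

# PureWindowsPath treats / and \ as equivalent separators.
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)

# Without escaping or the r prefix, \t and \n inside a path
# silently turn into a tab and a newline.
broken = 'C:\temp\new.xlsx'
has_tab = '\t' in broken
has_newline = '\n' in broken
```

This is why a path copied straight from Windows Explorer can fail: any backslash followed by a letter such as t or n changes the string before pandas ever sees it.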
import os
os.chdir('C:/Users/yyz/Desktop/python数据分析基础/data/')
data4 = pd.read_excel('泰坦尼克数据.xlsx')
data4.head()
| | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
sheet_name
Worksheet to read: either the sheet name or its position, where 0 means the first sheet.
data5 = pd.read_excel('泰坦尼克数据.xlsx',sheet_name='Sheet1')
data5.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 1 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
| 2 | 53 | 1 | 1 | Harper, Mrs. Henry Sleeper (Myna Haxtun) | female | 49.0 | 1 | 0 | PC 17572 | 76.7292 | D33 | C |
| 3 | 98 | 1 | 1 | Greenfield, Mr. William Bertram | male | 23.0 | 0 | 1 | PC 17759 | 63.3583 | D10 D12 | C |
| 4 | 195 | 1 | 1 | Brown, Mrs. James Joseph (Margaret Tobin) | female | 44.0 | 0 | 0 | PC 17610 | 27.7208 | B4 | C |
data6 = pd.read_excel('泰坦尼克数据.xlsx',sheet_name=1)
data6.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 1 | 32 | 1 | 1 | Spencer, Mrs. William Augustus (Marie Eugenie) | female | NaN | 1 | 0 | PC 17569 | 146.5208 | B78 | C |
| 2 | 53 | 1 | 1 | Harper, Mrs. Henry Sleeper (Myna Haxtun) | female | 49.0 | 1 | 0 | PC 17572 | 76.7292 | D33 | C |
| 3 | 98 | 1 | 1 | Greenfield, Mr. William Bertram | male | 23.0 | 0 | 1 | PC 17759 | 63.3583 | D10 D12 | C |
| 4 | 195 | 1 | 1 | Brown, Mrs. James Joseph (Margaret Tobin) | female | 44.0 | 0 | 0 | PC 17610 | 27.7208 | B4 | C |
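Besides a single name or position, sheet_name=None reads every worksheet and returns a dict keyed by sheet name. A rough sketch on a tiny two-sheet workbook built in memory; the sheet names first and second and the data are made up, and an Excel engine such as openpyxl is assumed to be installed.

```python
import io
import pandas as pd

# Build a small two-sheet workbook in memory (made-up data and sheet names).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame({'x': [1, 2]}).to_excel(writer, sheet_name='first', index=False)
    pd.DataFrame({'y': [3, 4]}).to_excel(writer, sheet_name='second', index=False)
workbook_bytes = buffer.getvalue()

# Select the same sheet by name and by position (0 is the first sheet).
by_name = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name='second')
by_position = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=1)

# sheet_name=None returns {sheet name: DataFrame} for all sheets.
all_sheets = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=None)
sheet_names = list(all_sheets)
```

Reading by position is convenient, but it silently returns a different sheet if someone reorders the workbook, so the name is usually the safer choice.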
Less common parameters of pd.read_excel()
index_col
Column to use as the index; by default no column is used.
data7 = pd.read_excel('泰坦尼克数据.xlsx',index_col='乘客ID')
data7
| 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 11 columns
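Setting index_col while reading should give the same result as reading first and calling set_index afterwards. A small sketch of that equivalence; it uses pd.read_csv on in-memory text only to stay self-contained (index_col is accepted by both readers), and the column names id, name and age are made up.

```python
import io
import pandas as pd

csv_text = "id,name,age\n1,Ann,22\n2,Bob,38\n"

# Set the index while reading...
indexed = pd.read_csv(io.StringIO(csv_text), index_col='id')

# ...or read first and set the index afterwards.
indexed_later = pd.read_csv(io.StringIO(csv_text)).set_index('id')

# With the id column as index, rows can be looked up by label.
bob_age = indexed.loc[2, 'age']
remaining_columns = list(indexed.columns)
```

The index column no longer counts as a data column, which matches the "11 columns" reported above for a 12-column sheet.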
names
Specify the column names.
data8 = pd.read_excel('泰坦尼克数据.xlsx',
names=['变量1','变量2','变量3','变量4','变量5','变量6','变量7','变量8','变量9','变量10','变量11','变量12'])
data8.head()
| | 变量1 | 变量2 | 变量3 | 变量4 | 变量5 | 变量6 | 变量7 | 变量8 | 变量9 | 变量10 | 变量11 | 变量12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
data8.columns = ['乘客ID', '是否存活', '票类', '姓名', '性别', '年龄', '乘客兄弟姐妹个数',
'乘客父母/孩子的个数', '票号','票价', '仓位', '登船港口']
data8.head()
| | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
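One caveat when carrying the names habit over to pd.read_csv: there header defaults to 'infer', and passing names alone makes the original header line show up as the first data row. Passing header=0 as well replaces the existing names, which is what read_excel does with its header=0 default. A small sketch with made-up data; col1 and col2 are hypothetical names.

```python
import io
import pandas as pd

csv_text = "id,name\n1,Ann\n2,Bob\n"
new_names = ['col1', 'col2']  # hypothetical replacement names

# names alone: the original header line is kept as the first data row.
kept_header = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names plus header=0: the original header line is discarded and replaced.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

first_row_when_kept = kept_header.iloc[0].tolist()
```

A quick check of the row count after reading is an easy way to notice that a header line slipped into the data.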
usecols
Select which columns to read.
data7 = pd.read_excel('泰坦尼克数据.xlsx',usecols=['姓名','性别','年龄'])
data7.head()
| | 姓名 | 性别 | 年龄 |
| --- | --- | --- | --- |
| 0 | Braund, Mr. Owen Harris | male | 22.0 |
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 |
| 2 | Heikkinen, Miss. Laina | female | 26.0 |
| 3 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 |
| 4 | Allen, Mr. William Henry | male | 35.0 |
data8 = pd.read_excel('泰坦尼克数据.xlsx')[['姓名','性别','年龄']]
data8.head()
| | 姓名 | 性别 | 年龄 |
| --- | --- | --- | --- |
| 0 | Braund, Mr. Owen Harris | male | 22.0 |
| 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 |
| 2 | Heikkinen, Miss. Laina | female | 26.0 |
| 3 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 |
| 4 | Allen, Mr. William Henry | male | 35.0 |
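The two approaches above agree when the columns are listed in file order, but they can differ otherwise: pd.read_csv documents that the element order of usecols is ignored, while [[...]] selection keeps the order you write. A sketch with made-up columns a, b, c, using read_csv on in-memory text; read_excel is expected to behave the same way, but that is an assumption worth checking on your own file.

```python
import io
import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# usecols only filters columns; the result keeps the file's column order.
via_usecols = pd.read_csv(io.StringIO(csv_text), usecols=['c', 'a'])

# Selecting after reading returns the columns in the order requested.
via_selection = pd.read_csv(io.StringIO(csv_text))[['c', 'a']]

usecols_order = list(via_usecols.columns)
selection_order = list(via_selection.columns)
```

usecols avoids loading unwanted columns at all, so it is the cheaper option on wide files; reorder afterwards if the order matters.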
header
Row that holds the column names.
data9 = pd.read_excel('泰坦尼克数据.xlsx',header=1)
data9.head()
| | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1.1 | 0.1 | A/5 21171 | 7.25 | Unnamed: 10 | S |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 1 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 2 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 3 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 4 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
data10 = pd.read_excel('泰坦尼克数据.xlsx',header=None)
data10.head()
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| 1 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 2 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 4 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
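The odd column names in the header=1 output (1.1, 0.1) come from pandas renaming duplicate labels when a data row is promoted to the header. A small sketch that reproduces both effects on made-up data, using pd.read_csv on in-memory text to stay self-contained.

```python
import io
import pandas as pd

csv_text = "id,flag,count\n1,1,0\n2,0,5\n"

# header=1: the second line becomes the header; the repeated label "1"
# is renamed with a ".1" suffix to keep column names unique.
second_row_header = pd.read_csv(io.StringIO(csv_text), header=1)
promoted_names = list(second_row_header.columns)

# header=None: no line is treated as a header; columns are numbered
# 0, 1, 2 and the original header line becomes the first data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
numbered_names = list(no_header.columns)
first_data_row = no_header.iloc[0].tolist()
```

header=None is the right choice for files that truly have no header line; combine it with names to label the columns.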
skiprows
Skip the first few rows: use this when the leading rows are blank or hold data that should not be read.
data11 = pd.read_excel(r'C:\Users\yyz\Desktop\python数据分析基础\data\泰坦尼克数据.xlsx',
skiprows=1)
data11.head()
| | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1.1 | 0.1 | A/5 21171 | 7.25 | Unnamed: 10 | S |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 1 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 2 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 3 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 4 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
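On the Titanic sheet skiprows=1 throws away the real header, so the parameter is more useful on files that start with junk lines. A sketch of that case with made-up text, again via pd.read_csv on in-memory data; skiprows also accepts a list of 0-based row numbers to drop specific rows.

```python
import io
import pandas as pd

# Two note lines precede the real header (made-up content).
with_preamble = "report generated 2020\nsource: demo\nid,name\n1,Ann\n2,Bob\n"
skipped = pd.read_csv(io.StringIO(with_preamble), skiprows=2)

# A list skips specific rows: here row 1, a units line under the header.
with_units = "id,name\n(int),(str)\n1,Ann\n2,Bob\n"
no_units = pd.read_csv(io.StringIO(with_units), skiprows=[1])

skipped_columns = list(skipped.columns)
```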
Common parameters of DataFrame.to_excel()
import pandas as pd
data1 = pd.read_excel('C:/Users/yyz/Desktop/python数据分析基础/data/泰坦尼克数据.xlsx')
data1.head()
| | 乘客ID | 是否存活 | 票类 | 姓名 | 性别 | 年龄 | 乘客兄弟姐妹个数 | 乘客父母/孩子的个数 | 票号 | 票价 | 仓位 | 登船港口 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
data1.info()
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 乘客ID 891 non-null int64
1 是否存活 891 non-null int64
2 票类 891 non-null int64
3 姓名 891 non-null object
4 性别 891 non-null object
5 年龄 714 non-null float64
6 乘客兄弟姐妹个数 891 non-null int64
7 乘客父母/孩子的个数 891 non-null int64
8 票号 891 non-null object
9 票价 891 non-null float64
10 仓位 204 non-null object
11 登船港口 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
result1 = data1.pivot_table('姓名',
                            index='性别',
                            columns='登船港口',
                            aggfunc='count',
                            margins=True)
result2 = data1.pivot_table('姓名',
                            index='性别',
                            columns='票类',
                            aggfunc='count',
                            margins=True)
result1.head()
| 登船港口 | C | Q | S | All |
| --- | --- | --- | --- | --- |
| 性别 | | | | |
| female | 73 | 36 | 203 | 312 |
| male | 95 | 41 | 441 | 577 |
| All | 168 | 77 | 644 | 889 |
result2
| 票类 | 1 | 2 | 3 | All |
| --- | --- | --- | --- | --- |
| 性别 | | | | |
| female | 94 | 76 | 144 | 314 |
| male | 122 | 108 | 347 | 577 |
| All | 216 | 184 | 491 | 891 |
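# Default settings: the row index is written and the sheet is named "Sheet1"
result1.to_excel('C:/Users/yyz/Desktop/保存数据1.xlsx')
# index=False leaves out the row index (here the 性别 labels)
result2.to_excel('C:/Users/yyz/Desktop/保存数据2.xlsx',index=False)
# sheet_name sets the worksheet name
result2.to_excel('C:/Users/yyz/Desktop/保存数据3.xlsx',sheet_name='汇总')
# pd.ExcelWriter writes several DataFrames into one workbook
with pd.ExcelWriter('C:/Users/yyz/Desktop/保存数据4.xlsx') as writer:
    result1.to_excel(writer,sheet_name='第1个表')
    result2.to_excel(writer,sheet_name='第2个表')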
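For a pivot table the row index carries the group labels, so index=False drops information rather than just a counter column. A rough sketch of the difference, round-tripping a small table through an in-memory workbook; the helper round_trip is hypothetical, the four counts are copied from result1 above, and an Excel engine such as openpyxl is assumed to be installed.

```python
import io
import pandas as pd

# A small table with a labelled index, shaped like the pivot tables above.
table = pd.DataFrame({'C': [73, 95], 'S': [203, 441]},
                     index=pd.Index(['female', 'male'], name='sex'))

def round_trip(frame, **to_excel_kwargs):
    """Write frame to an in-memory workbook and read it back (hypothetical helper)."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer) as writer:
        frame.to_excel(writer, **to_excel_kwargs)
    return pd.read_excel(io.BytesIO(buffer.getvalue()))

with_index = round_trip(table)                    # index written as a column
without_index = round_trip(table, index=False)    # labels are lost

columns_with_index = list(with_index.columns)
columns_without_index = list(without_index.columns)
```

index=False is a good fit for frames with a plain 0..n-1 index, such as the raw data read earlier, and a poor fit for grouped or pivoted results.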
pandas | Reading CSV and TXT Files
- Large datasets are usually stored as csv or txt files.
- To read them, use pandas.read_csv(); its parameters are:
pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None,
header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None,
mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None,
false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None,
na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True,
parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None,
dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer',
thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, doublequote=True,
escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True,
warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False,
float_precision=None)
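- Common parameters:
- filepath_or_buffer: path to the file, the same as the io parameter when reading Excel
- sep: delimiter, a comma by default. Other special characters: ※
- carriage return: \r
- line feed: \n
- tab: \t
- a whitespace character: \s
- one or more whitespace characters: \s+
- encoding: usually utf-8 or gbk
- The remaining parameters work like those of pd.read_excel()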
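The sep and encoding options above can be tried without any files on disk. A minimal sketch on made-up in-memory data: one text whose columns are separated by a varying mix of spaces and tabs, and one byte string encoded as gbk.

```python
import io
import pandas as pd

# Columns separated by irregular runs of spaces and tabs (made-up data).
text = "name   age\tcity\nAnn 22    Paris\nBob\t38  Rome\n"
whitespace_df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Bytes written in gbk must be read back with encoding='gbk'.
gbk_bytes = "姓名,年龄\n张三,22\n".encode('gbk')
gbk_df = pd.read_csv(io.BytesIO(gbk_bytes), encoding='gbk')

whitespace_columns = list(whitespace_df.columns)
gbk_columns = list(gbk_df.columns)
```

Writing the pattern as a raw string (r'\s+') keeps Python from interpreting the backslash. When a file produces a decoding error or garbled Chinese text with the default encoding, trying encoding='gbk' is a common first step for files exported from Chinese-language Windows software.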
import pandas as pd
help(pd.read_csv)
Help on function read_csv in module pandas.io.parsers:
read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]], sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal: str = '.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
data1 = pd.read_csv('C:/Users/yyz/Desktop/python数据分析基础/data/titanic_train.csv')
data1.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
data2 = pd.read_csv('C:/Users/yyz/Desktop/python数据分析基础/data/titanic_train.txt',sep='\t')
data2.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
data2.sample(10).to_csv('C:/Users/yyz/Desktop/导出txt文件.txt',sep=':')
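A file saved with a custom separator has to be read back with the same sep, and because to_csv writes the row index by default, index_col=0 restores it. A small round-trip sketch on made-up data, writing to an in-memory buffer instead of the Desktop path above.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [22, 38]})

# Write with ':' as the separator; the index is written as the first field.
buffer = io.StringIO()
df.to_csv(buffer, sep=':')
saved_text = buffer.getvalue()

# Read back with the same separator and use the first field as the index.
restored = pd.read_csv(io.StringIO(saved_text), sep=':', index_col=0)
restored_columns = list(restored.columns)
```

Pass index=False to to_csv if the row numbers are not worth keeping, as with a random sample.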
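Tips for typing code faster
- Microsoft Pinyin: Settings → General → turn on "中文输入时使用英文标点" (use English punctuation while typing Chinese)
- Sogou Pinyin: Settings → Common → turn on "中文时使用英文标点" (use English punctuation in Chinese mode)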