Python Data Cleaning Tools, Methods, and Process (Part 2: File I/O for Data Cleaning, Reading CSV, Excel, and MySQL Data)

Table of Contents

  • 3 File I/O
    • 3.1 Reading and Writing CSV Files
    • 3.2 Reading and Writing Excel Files
    • 3.3 Interacting with a MySQL Database

3 File I/O

3.1 Reading and Writing CSV Files

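  • pandas has more than ten built-in functions for reading different data sources; the most common are CSV and Excel
  • read_csv reads a CSV file and returns a DataFrame
  • Use English file names for CSV files where possible
  • read_csv has many parameters for fine control, but the defaults are often enough
  • Pay attention to the encoding when reading CSV files; common encodings are utf-8, gbk, gb2312, and gb18030
  • to_csv saves a DataFrame quickly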
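The encoding advice above can be turned into a small fallback helper that tries several encodings in turn. This is a minimal sketch, not a pandas feature: `read_csv_any_encoding` is a hypothetical name, and the demo file is written to a temporary directory so the example runs on its own. Note that a successful decode does not prove the text is correct (bytes from one encoding can occasionally decode as garbage in another), so it is still worth spot-checking a few rows.

```python
import os
import tempfile

import pandas as pd


def read_csv_any_encoding(path, encodings=('utf-8', 'gbk', 'gb18030')):
    """Hypothetical helper: try each encoding in order, return (DataFrame, encoding)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # this encoding failed; try the next one
    raise ValueError(f'none of {encodings} could decode {path}') from last_error


# Write a small GBK-encoded demo file; its bytes are not valid UTF-8
path = os.path.join(tempfile.mkdtemp(), 'gbk_demo.csv')
with open(path, 'w', encoding='gbk') as f:
    f.write('col,text\n1,中文\n')

df, used = read_csv_any_encoding(path)
print(used)  # gbk
```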
import numpy as np

import pandas as pd

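import os  # used to change the working directory

os.getcwd()  # current working directory

'D:\\code\\jupyter\\course'

os.chdir('D:\\code\\jupyter\\course\\代码和数据')  # switch to the folder that holds the data files

baby = pd.read_csv('sam_tianchi_mum_baby.csv', encoding='utf-8')  # utf-8 is the default. read_csv uses the first row as the header (column labels), and the row index starts at 0 by default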

baby.head(5)

	user_id 	birthday 	gender
0 	2757 	20130311 	1
1 	415971 	20121111 	0
2 	1372572 	20120130 	1
3 	10339332 	20110910 	0
4 	10642245 	20130213 	0
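The comment above says read_csv takes the first row as the header and numbers rows from 0. The sketch below, using a few rows copied from the sample output and an in-memory string instead of the real file, shows how `header=None` and `index_col` change that default behaviour.

```python
import io

import pandas as pd

raw = 'user_id,birthday,gender\n2757,20130311,1\n415971,20121111,0\n'

# Default: the first row becomes the column labels; rows get a RangeIndex 0, 1, ...
df_default = pd.read_csv(io.StringIO(raw))

# header=None: there is no header row, so the first line is read as data
df_nohead = pd.read_csv(io.StringIO(raw), header=None)

# index_col: use an existing column as the row index instead of 0, 1, ...
df_indexed = pd.read_csv(io.StringIO(raw), index_col='user_id')

print(list(df_default.columns))  # ['user_id', 'birthday', 'gender']
print(df_nohead.shape)           # (3, 3)
print(list(df_indexed.index))    # [2757, 415971]
```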

order = pd.read_csv('meal_order_info.csv', encoding='gbk')  # utf-8 raises a decoding error here, so try gbk

order.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 945 entries, 0 to 944
Data columns (total 21 columns):
info_id               945 non-null int64
emp_id                945 non-null int64
number_consumers      945 non-null int64
mode                  0 non-null float64
dining_table_id       945 non-null int64
dining_table_name     945 non-null int64
expenditure           945 non-null int64
dishes_count          945 non-null int64
accounts_payable      945 non-null int64
use_start_time        945 non-null object
check_closed          0 non-null float64
lock_time             936 non-null object
cashier_id            0 non-null float64
pc_id                 0 non-null float64
order_number          0 non-null float64
org_id                945 non-null int64
print_doc_bill_num    0 non-null float64
lock_table_info       0 non-null float64
order_status          945 non-null int64
phone                 945 non-null int64
name                  945 non-null object
dtypes: float64(7), int64(11), object(3)
memory usage: 155.1+ KB

order = pd.read_csv('meal_order_info.csv', encoding='gbk', dtype={'info_id': str, 'emp_id': str})  # read these numeric ID columns as strings

order.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 945 entries, 0 to 944
Data columns (total 21 columns):
info_id               945 non-null object
emp_id                945 non-null object
number_consumers      945 non-null int64
mode                  0 non-null float64
dining_table_id       945 non-null int64
dining_table_name     945 non-null int64
expenditure           945 non-null int64
dishes_count          945 non-null int64
accounts_payable      945 non-null int64
use_start_time        945 non-null object
check_closed          0 non-null float64
lock_time             936 non-null object
cashier_id            0 non-null float64
pc_id                 0 non-null float64
order_number          0 non-null float64
org_id                945 non-null int64
print_doc_bill_num    0 non-null float64
lock_table_info       0 non-null float64
order_status          945 non-null int64
phone                 945 non-null int64
name                  945 non-null object
dtypes: float64(7), int64(9), object(5)
memory usage: 155.1+ KB
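Reading ID columns as strings matters most when the IDs carry leading zeros, which an integer column silently drops. The sketch below uses hypothetical zero-padded employee codes (the real emp_id values above have no leading zeros) to show the difference.

```python
import io

import pandas as pd

# Hypothetical zero-padded IDs, e.g. employee codes stored as text in the source system
raw = 'emp_id,amount\n001442,165\n001095,321\n'

as_int = pd.read_csv(io.StringIO(raw))                          # emp_id inferred as integer
as_str = pd.read_csv(io.StringIO(raw), dtype={'emp_id': str})   # emp_id kept as text

print(as_int['emp_id'].tolist())  # [1442, 1095]  leading zeros lost
print(as_str['emp_id'].tolist())  # ['001442', '001095']
```

Only the listed columns are affected; `amount` is still inferred as an integer in both cases.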

order.head(10)

	info_id 	emp_id 	number_consumers 	mode 	dining_table_id 	dining_table_name 	expenditure 	dishes_count 	accounts_payable 	use_start_time 	... 	lock_time 	cashier_id 	pc_id 	order_number 	org_id 	print_doc_bill_num 	lock_table_info 	order_status 	phone 	name
0 	417 	1442 	4 	NaN 	1501 	1022 	165 	5 	165 	2016/8/1 11:05:36 	... 	2016/8/1 11:11:46 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880641 	苗宇怡
1 	301 	1095 	3 	NaN 	1430 	1031 	321 	6 	321 	2016/8/1 11:15:57 	... 	2016/8/1 11:31:55 	NaN 	NaN 	NaN 	328 	NaN 	NaN 	1 	18688880174 	赵颖
2 	413 	1147 	6 	NaN 	1488 	1009 	854 	15 	854 	2016/8/1 12:42:52 	... 	2016/8/1 12:54:37 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880276 	徐毅凡
3 	415 	1166 	4 	NaN 	1502 	1023 	466 	10 	466 	2016/8/1 12:51:38 	... 	2016/8/1 13:08:20 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880231 	张大鹏
4 	392 	1094 	10 	NaN 	1499 	1020 	704 	24 	704 	2016/8/1 12:58:44 	... 	2016/8/1 13:07:16 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880173 	孙熙凯
5 	381 	1243 	4 	NaN 	1487 	1008 	239 	7 	239 	2016/8/1 13:15:42 	... 	2016/8/1 13:23:42 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880441 	沈晓雯
6 	429 	1452 	4 	NaN 	1501 	1022 	699 	15 	699 	2016/8/1 13:17:37 	... 	2016/8/1 13:34:18 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880651 	苗泽坤
7 	433 	1109 	8 	NaN 	1490 	1011 	511 	14 	511 	2016/8/1 13:38:27 	... 	2016/8/1 13:50:16 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880212 	李达明
8 	569 	1143 	6 	NaN 	1488 	1009 	326 	9 	326 	2016/8/1 17:06:20 	... 	2016/8/1 17:18:20 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880272 	陈有浩
9 	655 	1268 	8 	NaN 	1492 	1013 	263 	10 	263 	2016/8/1 17:32:27 	... 	2016/8/1 17:44:27 	NaN 	NaN 	NaN 	330 	NaN 	NaN 	1 	18688880466 	沈丹丹

10 rows × 21 columns

baby = pd.read_csv('baby_trade_history.csv', nrows=100)  # read only the first 100 rows

baby

	user_id 	auction_id 	cat_id 	cat1 	property 	buy_mount 	day
0 	786295544 	41098319944 	50014866 	50022520 	21458:86755362;13023209:3593274;10984217:21985... 	2 	20140919
1 	532110457 	17916191097 	50011993 	28 	21458:11399317;1628862:3251296;21475:137325;16... 	1 	20131011
2 	249013725 	21896936223 	50012461 	50014815 	21458:30992;1628665:92012;1628665:3233938;1628... 	1 	20131011
3 	917056007 	12515996043 	50018831 	50014815 	21458:15841995;21956:3494076;27000458:59723383... 	2 	20141023
4 	444069173 	20487688075 	50013636 	50008168 	21458:30992;13658074:3323064;1628665:3233941;1... 	1 	20141103
5 	152298847 	41840167463 	121394024 	50008168 	21458:3408353;13023209:727117752;22009:2741771... 	1 	20141103
6 	513441334 	19909384116 	50010557 	50008168 	25935:21991;1628665:29784;22019:34731;22019:20... 	1 	20121212
7 	297411659 	13540124907 	50010542 	50008168 	21458:60020529;25935:31381;1633959:27247291;16... 	1 	20121212
8 	82830661 	19948600790 	50013874 	28 	21458:11580;21475:137325 	1 	20121101
9 	475046636 	10368360710 	203527 	28 	22724:40168;22729:40278;21458:21817;2770200:24... 	1 	20121101
10 	734147966 	15307958346 	50018202 	38 	21458:3270827;7361532:28710594;7397093:7536994... 	2 	20121101
11 	68547330 	21162876126 	50012365 	122650008 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20121123
12 	697081418 	15898050723 	50013636 	50008168 	21458:19726868;1633959:179425852;13836282:1290... 	1 	20121123
13 	377550424 	15771663914 	50015841 	28 	1628665:3233941;1628665:3233942;3914866:11580;... 	1 	20121123
14 	88313935 	22532727492 	50013711 	50008168 	1628665:3233941;1628665:3233942;22019:3340598;... 	1 	20131005
15 	25918750 	16078389250 	50012359 	122650008 	21458:3405407;1633959:6186201;1628366:32799;81... 	1 	20131005
16 	350288528 	35086271572 	50010544 	50008168 	21458:61813;25935:21991;1628665:3233938;162866... 	1 	20131129
17 	348090113 	17436967558 	50009540 	50014815 	21458:21910;3110425:30696849;2191928:75373546;... 	1 	20131129
18 	1635282280 	36153356431 	50013207 	50008168 	1628665:29784;1628665:29799;2904342:31004;2201... 	1 	20131129
19 	530850018 	22058239899 	50024147 	28 	21458:205007542;43307470:5543413;2339128:62147... 	1 	20140210
20 	749507708 	19171641742 	50018860 	28 	21458:3602856;1628665:3233941;1628665:3233942;... 	1 	20140210
21 	201088567 	38564176352 	50013207 	50008168 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20140502
22 	469517728 	8232924597 	211122 	38 	21458:21782;36786:42781029;13023102:6999219;22... 	6 	20140502
23 	691367866 	17712372914 	121434042 	50014815 	21458:49341152;8021059:5525523;6851452:1398669... 	1 	20140804
24 	77193822 	35537441586 	50006520 	50014815 	22277:6262384;21458:30992;1628665:3233941;1628... 	2 	20140804
25 	605678021 	15502618744 	50010555 	50008168 	25935:31381;1628665:3233941;1628665:3233942;16... 	1 	20130226
26 	47702620 	26481508332 	121412034 	50014815 	21458:49341152;11057903:4036007;130475532:7537... 	1 	20140918
27 	763560371 	40945285800 	50012365 	122650008 	21458:30992;1628665:3233939;22007:30338;22007:... 	1 	20150201
28 	408028533 	35838498718 	50012442 	50008168 	21458:3596449;6811831:3446999;13023209:3446999... 	1 	20141009
29 	53566371 	27177784760 	121394024 	50008168 	21458:42090508;1628665:3233941;1628665:3233942... 	1 	20141009
... 	... 	... 	... 	... 	... 	... 	...
70 	113473924 	15486726090 	50014250 	28 	21458:30015090;1633959:43047819;1627584:28619;... 	1 	20120905
71 	117887031 	10956228163 	50012451 	50008168 	1628665:3233942;1628665:3233938;1628665:133527... 	1 	20120905
72 	468447138 	15550398428 	50012442 	50008168 	1628665:3233936;1628665:29782;1627349:11462;16... 	1 	20120905
73 	348660284 	10896577394 	50014250 	28 	1628665:29796;1628665:108579;1627584:11580;116... 	1 	20130525
74 	129642523 	23703880889 	50012364 	122650008 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20130525
75 	1708761610 	18560026026 	50016030 	50008168 	21458:30992;1628665:3233941;25935:31381;22019:... 	1 	20130525
76 	908702885 	15515470575 	50023591 	50022520 	21458:26309047;1633959:39224289;11697064:31617... 	1 	20130312
77 	151915451 	17305821144 	211122 	38 	21458:21782;36786:42781029;6933553:3313169;130... 	2 	20140104
78 	745002413 	36815797313 	50023645 	28 	1628665:82340;21475:11488282;21458:56610575;49... 	1 	20140104
79 	1046234868 	10799142007 	50023591 	50022520 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20121109
80 	810362779 	16933071954 	50010545 	50008168 	21458:57430303;1633959:2477;23150:45030;25935:... 	1 	20121109
81 	119784861 	20796936076 	50140021 	50008168 	21458:120325094;22019:2026;22019:34731;22019:3... 	1 	20121129
82 	277184180 	17734463967 	50010555 	50008168 	21458:3482405;8697758:26247633;1633959:3336498... 	1 	20121129
83 	648623529 	16590447919 	50010555 	50008168 	25935:31381;1628665:3233938;1628665:82340;1628... 	1 	20121129
84 	1085938456 	39009925227 	50013207 	50008168 	21458:21599;1628665:3233941;22121:30905;122217... 	1 	20140827
85 	2214390386 	40856437695 	50013636 	50008168 	21458:216724052;36628385:3480253;1628665:32339... 	1 	20140827
86 	346816172 	37132432638 	50013636 	50008168 	21458:216291676;13023209:3583497;35044286:1242... 	1 	20140827
87 	654037597 	13775864723 	50011993 	28 	21458:116116655;1633959:3276615;1628862:50276;... 	1 	20130513
88 	1667892062 	16767168507 	50158020 	50008168 	1628665:131622;13395135:21671;22019:3340598;22... 	1 	20130513
89 	277279277 	18024521052 	211122 	38 	21458:33516;33480:3238774;2653417:7353464;3359... 	12 	20130513
90 	1721792494 	36154660054 	50008845 	28 	21458:3400531;5653832:7049425;13023209:7049425... 	1 	20140312
91 	56549058 	26930668292 	50003700 	28 	21458:3351431;123273479:31526;1628665:3233941;... 	1 	20140312
92 	696527486 	37269469522 	50011993 	28 	21458:118564374;13023209:547499553;122218042:3... 	1 	20140718
93 	643153890 	17954181229 	50003700 	28 	123273479:41376163;21475:135183931;1628665:323... 	1 	20140718
94 	362976947 	39676108316 	50012375 	50022520 	1628665:29796;1628665:29799;21967:29774;21967:... 	1 	20140718
95 	1097191176 	39095838474 	50015841 	28 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20150203
96 	1107237181 	18979330679 	121382039 	50014815 	21458:3371752;13023209:282182273;21475:3779036... 	1 	20150203
97 	1090130969 	38473204110 	50012364 	122650008 	21458:30992;1628665:29778;1628665:29799;22007:... 	1 	20140604
98 	373997473 	24898348642 	50012442 	50008168 	1628665:29778;22009:29800;122217965:3227750;12... 	1 	20140604
99 	59135448 	20494104463 	50012375 	50022520 	21458:7902780;13023209:43969797;2397831:165611... 	1 	20140929

100 rows × 7 columns
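nrows is handy for a quick look at a large file. When the whole file must be processed but does not fit comfortably in memory, read_csv's `chunksize` parameter returns an iterator of smaller DataFrames instead. The sketch below uses a few rows from the sample above in an in-memory string and sums buy_mount chunk by chunk.

```python
import io

import pandas as pd

raw = ('user_id,buy_mount\n'
       '786295544,2\n532110457,1\n249013725,1\n917056007,2\n444069173,1\n')

# nrows: read only the first rows
first_two = pd.read_csv(io.StringIO(raw), nrows=2)

# chunksize: iterate over DataFrames of at most 2 rows each
total = 0
n_chunks = 0
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    total += chunk['buy_mount'].sum()  # aggregate each chunk, then let it go
    n_chunks += 1

print(len(first_two))  # 2
print(total)           # 7
print(n_chunks)        # 3
```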

pd.set_option('display.max_columns', 20)  # show at most 20 columns

pd.set_option('display.max_rows', 100)  # show at most 100 rows

baby

	user_id 	auction_id 	cat_id 	cat1 	property 	buy_mount 	day
0 	786295544 	41098319944 	50014866 	50022520 	21458:86755362;13023209:3593274;10984217:21985... 	2 	20140919
1 	532110457 	17916191097 	50011993 	28 	21458:11399317;1628862:3251296;21475:137325;16... 	1 	20131011
2 	249013725 	21896936223 	50012461 	50014815 	21458:30992;1628665:92012;1628665:3233938;1628... 	1 	20131011
3 	917056007 	12515996043 	50018831 	50014815 	21458:15841995;21956:3494076;27000458:59723383... 	2 	20141023
4 	444069173 	20487688075 	50013636 	50008168 	21458:30992;13658074:3323064;1628665:3233941;1... 	1 	20141103
5 	152298847 	41840167463 	121394024 	50008168 	21458:3408353;13023209:727117752;22009:2741771... 	1 	20141103
6 	513441334 	19909384116 	50010557 	50008168 	25935:21991;1628665:29784;22019:34731;22019:20... 	1 	20121212
7 	297411659 	13540124907 	50010542 	50008168 	21458:60020529;25935:31381;1633959:27247291;16... 	1 	20121212
8 	82830661 	19948600790 	50013874 	28 	21458:11580;21475:137325 	1 	20121101
9 	475046636 	10368360710 	203527 	28 	22724:40168;22729:40278;21458:21817;2770200:24... 	1 	20121101
10 	734147966 	15307958346 	50018202 	38 	21458:3270827;7361532:28710594;7397093:7536994... 	2 	20121101
11 	68547330 	21162876126 	50012365 	122650008 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20121123
12 	697081418 	15898050723 	50013636 	50008168 	21458:19726868;1633959:179425852;13836282:1290... 	1 	20121123
13 	377550424 	15771663914 	50015841 	28 	1628665:3233941;1628665:3233942;3914866:11580;... 	1 	20121123
14 	88313935 	22532727492 	50013711 	50008168 	1628665:3233941;1628665:3233942;22019:3340598;... 	1 	20131005
15 	25918750 	16078389250 	50012359 	122650008 	21458:3405407;1633959:6186201;1628366:32799;81... 	1 	20131005
16 	350288528 	35086271572 	50010544 	50008168 	21458:61813;25935:21991;1628665:3233938;162866... 	1 	20131129
17 	348090113 	17436967558 	50009540 	50014815 	21458:21910;3110425:30696849;2191928:75373546;... 	1 	20131129
18 	1635282280 	36153356431 	50013207 	50008168 	1628665:29784;1628665:29799;2904342:31004;2201... 	1 	20131129
19 	530850018 	22058239899 	50024147 	28 	21458:205007542;43307470:5543413;2339128:62147... 	1 	20140210
20 	749507708 	19171641742 	50018860 	28 	21458:3602856;1628665:3233941;1628665:3233942;... 	1 	20140210
21 	201088567 	38564176352 	50013207 	50008168 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20140502
22 	469517728 	8232924597 	211122 	38 	21458:21782;36786:42781029;13023102:6999219;22... 	6 	20140502
23 	691367866 	17712372914 	121434042 	50014815 	21458:49341152;8021059:5525523;6851452:1398669... 	1 	20140804
24 	77193822 	35537441586 	50006520 	50014815 	22277:6262384;21458:30992;1628665:3233941;1628... 	2 	20140804
25 	605678021 	15502618744 	50010555 	50008168 	25935:31381;1628665:3233941;1628665:3233942;16... 	1 	20130226
26 	47702620 	26481508332 	121412034 	50014815 	21458:49341152;11057903:4036007;130475532:7537... 	1 	20140918
27 	763560371 	40945285800 	50012365 	122650008 	21458:30992;1628665:3233939;22007:30338;22007:... 	1 	20150201
28 	408028533 	35838498718 	50012442 	50008168 	21458:3596449;6811831:3446999;13023209:3446999... 	1 	20141009
29 	53566371 	27177784760 	121394024 	50008168 	21458:42090508;1628665:3233941;1628665:3233942... 	1 	20141009
30 	69873877 	40133707057 	50010555 	50008168 	21458:30992;25935:31381;1628665:3233941;162866... 	1 	20141017
31 	1609185254 	42001753405 	121394024 	50008168 	21458:30992;1628665:3233942;1628665:3233936;16... 	1 	20141228
32 	1746148145 	41181827319 	50012365 	122650008 	21458:621749996;13023209:12868;122217803:30916... 	1 	20141228
33 	256475742 	39059292616 	121452056 	50008168 	1628665:29784;1628665:29782;122217801:50793479... 	1 	20140711
34 	405194127 	15462429573 	50007011 	50008168 	21458:35624651;1633959:7320293;1628665:3233941... 	1 	20120819
35 	938309370 	14149079479 	50023669 	28 	21458:4204704;11820090:105550653;11644036:2861... 	1 	20120819
36 	84258337 	14653740604 	50016704 	50022520 	21458:3394654;5261331:4377028;1633959:4377028;... 	1 	20120819
37 	14466144 	17610665576 	50011993 	28 	21458:104000;21475:137325 	1 	20130327
38 	177724549 	14228645401 	50018824 	38 	21475:108284;6933666:96059;33595:16453265;2145... 	1 	20130327
39 	727823869 	39674261411 	121466023 	50008168 	21458:14332755;1628665:3233941;1628665:3233942... 	2 	20140813
40 	659020106 	40484992676 	50011993 	28 	21458:16162126;13023209:10551667;122218042:605... 	1 	20140813
41 	46277938 	40070019945 	50006602 	50008168 	21458:29563;10984217:21985;13023209:3488197;21... 	1 	20140813
42 	827091396 	18678458676 	50010566 	50008168 	21458:46906;13023209:158751187;25935:21991;320... 	1 	20140911
43 	18100946 	38451267766 	121540027 	28 	21458:215485914;125501489:19689726;11945782:78... 	1 	20140911
44 	725813399 	40519533209 	50010544 	50008168 	21458:32270;13023209:669513679;25935:21991;162... 	1 	20140911
45 	1054852159 	19063296909 	50006235 	50008168 	1628665:3233941;21475:17106236;21475:17106365;... 	2 	20140703
46 	262519726 	19051046285 	121398041 	28 	11666049:40203;21458:3961150;17472269:13302841... 	1 	20140703
47 	87207277 	14234909614 	121470030 	50014815 	21458:30992;1628665:3233941;1628665:3233942;16... 	1 	20140703
48 	1053602675 	20252281923 	50013636 	50008168 	21458:216724052;1628665:29798;1628665:29796;25... 	1 	20140220
49 	103125167 	18426669796 	50018438 	50014815 	21458:46896;1628665:3233941;1628665:3233942;21... 	16 	20140220
50 	886492677 	19668429343 	50016704 	50022520 	21458:3662539;5261385:3351834;13023209:3351834... 	1 	20140628
51 	115566151 	14778919435 	50013187 	28 	1628665:3233938;1628665:29796;1628665:133527;1... 	1 	20140113
52 	55544814 	4917672059 	50015727 	50014815 	21458:4540492;1633959:58840623;7107736:3227806... 	4 	20131106
53 	1714403831 	22443564698 	50014129 	28 	21458:57737100;12102318:7282254;11945782:78135... 	1 	20131106
54 	723975586 	8096949165 	50023591 	50022520 	1628665:3233941;1628665:29798;1628665:3233938;... 	1 	20120911
55 	66451440 	9258781845 	50013636 	50008168 	21458:11580;1628665:3233941;1628665:3233936;16... 	1 	20120911
56 	47342027 	14066344263 	50013636 	50008168 	21458:21599;13585028:3416646;1628665:3233942;1... 	1 	20120911
57 	354780072 	17851314047 	50016704 	50022520 	21458:3394654;5261331:237777686;1633959:237777... 	1 	20130725
58 	1660751516 	12496195786 	50024842 	50008168 	1628665:3233941;25935:21991;13545112:43704;135... 	2 	20130725
59 	1981826945 	40793811285 	50010538 	50008168 	21458:37946447;13023209:696649694;25935:21990;... 	1 	20150108
60 	61003275 	36738992094 	50018831 	50014815 	21458:21899;7255169:61035386;7368343:7327107;1... 	3 	20150108
61 	848482116 	42178787281 	50010538 	50008168 	21458:31340;13023209:25581424;25935:21991;1628... 	1 	20150119
62 	405014302 	43130926446 	50012777 	50014815 	21458:46850;1628665:3233939;1628665:92012;1628... 	1 	20150119
63 	806635728 	38985185626 	121452056 	50008168 	21458:9398440;1628665:29784;122217801:3265977;... 	1 	20140615
64 	1970876909 	20197969079 	211122 	38 	6940834:29865;21458:3270820;1629375:3253542;32... 	1 	20141017
65 	605724983 	19747694834 	50006520 	50014815 	21458:30992 	12 	20141017
66 	2148300507 	41694440222 	50010549 	50008168 	115931637:36783070;25935:21990;1628665:3233941... 	1 	20141112
67 	818595619 	36424612559 	50013636 	50008168 	21458:99466824;13023209:3334185;120198214:3334... 	1 	20141112
68 	442760655 	36611607467 	50016704 	50022520 	1628665:3233941;1628665:29790;1628665:3233936;... 	1 	20141228
69 	1026379511 	19281156237 	50012375 	50022520 	21458:3731805;1633959:14267607;2397831:1656121... 	1 	20120905
70 	113473924 	15486726090 	50014250 	28 	21458:30015090;1633959:43047819;1627584:28619;... 	1 	20120905
71 	117887031 	10956228163 	50012451 	50008168 	1628665:3233942;1628665:3233938;1628665:133527... 	1 	20120905
72 	468447138 	15550398428 	50012442 	50008168 	1628665:3233936;1628665:29782;1627349:11462;16... 	1 	20120905
73 	348660284 	10896577394 	50014250 	28 	1628665:29796;1628665:108579;1627584:11580;116... 	1 	20130525
74 	129642523 	23703880889 	50012364 	122650008 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20130525
75 	1708761610 	18560026026 	50016030 	50008168 	21458:30992;1628665:3233941;25935:31381;22019:... 	1 	20130525
76 	908702885 	15515470575 	50023591 	50022520 	21458:26309047;1633959:39224289;11697064:31617... 	1 	20130312
77 	151915451 	17305821144 	211122 	38 	21458:21782;36786:42781029;6933553:3313169;130... 	2 	20140104
78 	745002413 	36815797313 	50023645 	28 	1628665:82340;21475:11488282;21458:56610575;49... 	1 	20140104
79 	1046234868 	10799142007 	50023591 	50022520 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20121109
80 	810362779 	16933071954 	50010545 	50008168 	21458:57430303;1633959:2477;23150:45030;25935:... 	1 	20121109
81 	119784861 	20796936076 	50140021 	50008168 	21458:120325094;22019:2026;22019:34731;22019:3... 	1 	20121129
82 	277184180 	17734463967 	50010555 	50008168 	21458:3482405;8697758:26247633;1633959:3336498... 	1 	20121129
83 	648623529 	16590447919 	50010555 	50008168 	25935:31381;1628665:3233938;1628665:82340;1628... 	1 	20121129
84 	1085938456 	39009925227 	50013207 	50008168 	21458:21599;1628665:3233941;22121:30905;122217... 	1 	20140827
85 	2214390386 	40856437695 	50013636 	50008168 	21458:216724052;36628385:3480253;1628665:32339... 	1 	20140827
86 	346816172 	37132432638 	50013636 	50008168 	21458:216291676;13023209:3583497;35044286:1242... 	1 	20140827
87 	654037597 	13775864723 	50011993 	28 	21458:116116655;1633959:3276615;1628862:50276;... 	1 	20130513
88 	1667892062 	16767168507 	50158020 	50008168 	1628665:131622;13395135:21671;22019:3340598;22... 	1 	20130513
89 	277279277 	18024521052 	211122 	38 	21458:33516;33480:3238774;2653417:7353464;3359... 	12 	20130513
90 	1721792494 	36154660054 	50008845 	28 	21458:3400531;5653832:7049425;13023209:7049425... 	1 	20140312
91 	56549058 	26930668292 	50003700 	28 	21458:3351431;123273479:31526;1628665:3233941;... 	1 	20140312
92 	696527486 	37269469522 	50011993 	28 	21458:118564374;13023209:547499553;122218042:3... 	1 	20140718
93 	643153890 	17954181229 	50003700 	28 	123273479:41376163;21475:135183931;1628665:323... 	1 	20140718
94 	362976947 	39676108316 	50012375 	50022520 	1628665:29796;1628665:29799;21967:29774;21967:... 	1 	20140718
95 	1097191176 	39095838474 	50015841 	28 	1628665:3233941;1628665:3233942;1628665:323393... 	1 	20150203
96 	1107237181 	18979330679 	121382039 	50014815 	21458:3371752;13023209:282182273;21475:3779036... 	1 	20150203
97 	1090130969 	38473204110 	50012364 	122650008 	21458:30992;1628665:29778;1628665:29799;22007:... 	1 	20140604
98 	373997473 	24898348642 	50012442 	50008168 	1628665:29778;22009:29800;122217965:3227750;12... 	1 	20140604
99 	59135448 	20494104463 	50012375 	50022520 	21458:7902780;13023209:43969797;2397831:165611... 	1 	20140929
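pd.set_option changes the display settings for the rest of the session. If wider output is only needed for one cell, pd.option_context applies the same settings temporarily and restores the previous values when the with-block ends (pd.reset_option can also restore a default explicitly). A minimal sketch:

```python
import pandas as pd

before = pd.get_option('display.max_rows')

# Settings apply only inside the with-block
with pd.option_context('display.max_rows', 100, 'display.max_columns', 20):
    inside = pd.get_option('display.max_rows')

after = pd.get_option('display.max_rows')
print(inside)           # 100
print(after == before)  # True
```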



baby.to_csv('al.csv', encoding='utf-8', index=False)  # utf-8 is the default and could be omitted; read the file back with utf-8. index=False: do not write the row index
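A round trip (write, then read back and compare) is a cheap way to confirm the file was saved the way you expect. The sketch below writes a tiny DataFrame built from values in the order sample above to a temporary file. It uses `utf-8-sig` rather than plain `utf-8`: that variant prepends a byte-order mark, which typically lets Excel on Windows open a CSV with Chinese text without garbling it. Whether you need it depends on who will open the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'info_id': [417, 301], 'name': ['苗宇怡', '赵颖']})

path = os.path.join(tempfile.mkdtemp(), 'al.csv')
# utf-8-sig adds a BOM; index=False keeps the 0, 1, ... row index out of the file
df.to_csv(path, encoding='utf-8-sig', index=False)

# Reading with the same encoding strips the BOM again
back = pd.read_csv(path, encoding='utf-8-sig')
print(back.equals(df))  # True
```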

3.2 Reading and Writing Excel Files

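  • read_excel reads an Excel file and returns a DataFrame
  • Its parameters are largely the same as read_csv's, but you also have to choose the worksheet (sheet_name)
  • There are many parameters for fine control, but the defaults are often enough
  • Unlike CSV, Excel files record their own text encoding, so there is usually no encoding to choose; current pandas versions of read_excel do not accept an encoding argument
  • to_excel saves a DataFrame as an .xlsx file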
df1 = pd.read_excel('meal_order_detail.xlsx', sheet_name='meal_order_detail1')  # select the worksheet by name

df1.head(10)

	detail_id 	order_id 	dishes_id 	logicprn_name 	parent_class_name 	dishes_name 	itemis_add 	counts 	amounts 	cost 	place_order_time 	discount_amt 	discount_reason 	kick_back 	add_inprice 	add_info 	bar_code 	picture_file 	emp_id
0 	2956 	417 	610062 	NaN 	NaN 	蒜蓉生蚝 	0 	1 	49 	NaN 	2016-08-01 11:05:36 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/104001.jpg 	1442
1 	2958 	417 	609957 	NaN 	NaN 	蒙古烤羊腿\r\n\r\n\r\n 	0 	1 	48 	NaN 	2016-08-01 11:07:07 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/202003.jpg 	1442
2 	2961 	417 	609950 	NaN 	NaN 	大蒜苋菜 	0 	1 	30 	NaN 	2016-08-01 11:07:40 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/303001.jpg 	1442
3 	2966 	417 	610038 	NaN 	NaN 	芝麻烤紫菜 	0 	1 	25 	NaN 	2016-08-01 11:11:11 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/105002.jpg 	1442
4 	2968 	417 	610003 	NaN 	NaN 	蒜香包 	0 	1 	13 	NaN 	2016-08-01 11:11:30 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/503002.jpg 	1442
5 	1899 	301 	610019 	NaN 	NaN 	白斩鸡 	0 	1 	88 	NaN 	2016-08-01 11:15:57 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/204002.jpg 	1095
6 	1902 	301 	609991 	NaN 	NaN 	香烤牛排\r\n 	0 	1 	55 	NaN 	2016-08-01 11:19:12 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/201001.jpg 	1095
7 	1906 	301 	609983 	NaN 	NaN 	干锅田鸡 	0 	1 	88 	NaN 	2016-08-01 11:22:21 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/205003.jpg 	1095
8 	1907 	301 	609981 	NaN 	NaN 	桂圆枸杞鸽子汤 	0 	1 	48 	NaN 	2016-08-01 11:22:53 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/205001.jpg 	1095
9 	1908 	301 	610030 	NaN 	NaN 	番茄有机花菜 	0 	1 	32 	NaN 	2016-08-01 11:23:56 	NaN 	NaN 	NaN 	0 	NaN 	NaN 	caipu/304004.jpg 	1095

df1 = pd.read_excel('meal_order_detail.xlsx', sheet_name=0)  # or by position: 0 is the first worksheet

import os

os.getcwd()

'D:\\code\\jupyter\\course

df1.to_excel('asdf.xlsx', index=False, sheet_name='one')  # sheet_name sets the worksheet name
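Each call of to_excel with a file path rewrites the whole workbook, so calling it twice on the same path leaves only the last sheet. To put several DataFrames into one workbook, write them through a single pd.ExcelWriter; reading with sheet_name=None then returns every sheet as a dict. This sketch assumes the openpyxl package is installed (pandas uses it for .xlsx files) and uses a few values from the order detail sample above.

```python
import os
import tempfile

import pandas as pd

orders = pd.DataFrame({'order_id': [417, 301], 'amounts': [49, 88]})
dishes = pd.DataFrame({'dishes_id': [610062, 610019], 'dishes_name': ['蒜蓉生蚝', '白斩鸡']})

path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')

# One writer, several sheets in the same workbook
with pd.ExcelWriter(path) as writer:
    orders.to_excel(writer, sheet_name='orders', index=False)
    dishes.to_excel(writer, sheet_name='dishes', index=False)

# sheet_name=None reads all sheets into {sheet name: DataFrame}
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))            # ['orders', 'dishes']
print(sheets['dishes'].shape)  # (2, 2)
```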

3.3 Interacting with a MySQL Database

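  • Create the connection with sqlalchemy
  • You need the database connection parameters, such as the host IP address, user name, and password
  • Read data with pandas' read_sql function; the result is a DataFrame
  • Save data with the DataFrame's to_sql method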
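When a query depends on user-supplied values, pass them through read_sql's `params` argument instead of formatting them into the SQL string. To keep the sketch runnable without a MySQL server, it swaps in an in-memory SQLite database from Python's standard library, which pandas accepts directly; the table and rows mimic the employee sample below. SQLite uses the `?` placeholder, while PyMySQL uses `%s`.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for MySQL so the example runs anywhere
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employee (employeeid INTEGER, employeename TEXT, status INTEGER)')
conn.executemany('INSERT INTO employee VALUES (?, ?, ?)',
                 [(8, '王晓华', 1), (11, '王敏', 0), (12, '林耀坤', 1)])

# The value 1 is bound by the driver, not pasted into the SQL text
active = pd.read_sql('SELECT * FROM employee WHERE status = ? ORDER BY employeeid',
                     con=conn, params=(1,))
print(active['employeeid'].tolist())  # [8, 12]
conn.close()
```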

import pandas as pd

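import pymysql  # MySQL driver used by the 'mysql+pymysql' URL below

from sqlalchemy import create_engine

conn = create_engine('mysql+pymysql://root:123456@localhost:3306/meeting')  # URL format: mysql+pymysql://user:password@host:port/database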

sql = 'select * from employee'

df1 = pd.read_sql(sql,con = conn)

df1.head()

	employeeid 	employeename 	username 	phone 	email 	status 	departmentid 	password 	role
0 	8 	王晓华 	wangxh 	13671075406 	wang@qq.com 	1 	1 	1 	1
1 	9 	林耀坤 	li56 	13671075406 	yang@qq.com 	1 	2 	1 	2
2 	10 	熊杰文 	xiongjw 	134555555 	xiong@qq.com 	1 	3 	1 	2
3 	11 	王敏 	wangmin 	1324554321 	wangm@qq.com 	0 	4 	1 	2
4 	12 	林耀坤 	linyk 	1547896765 	kun@qq.com 	1 	7 	1 	2

def query(table):

    host = 'localhost'

    user = 'root'

    password = '123456'

    database = 'meeting'

    port = 3306

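    # Creating an engine on every call is simple; for repeated queries, create it once and reuse it
    conn = create_engine('mysql+pymysql://{}:{}@{}:{}/{}'.format(user,password,host,port,database))

    sql = 'select * from ' + table  # the table name is concatenated into the SQL, so pass only trusted names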

    result = pd.read_sql(sql,con = conn)

    return result

df2 = query('meetingroom')

df2

	roomid 	roomnum 	roomname 	capacity 	status 	description
0 	5 	101 	第一会议室 	15 	0 	公共会议室
1 	6 	102 	第二会议室 	5 	0 	管理部门会议室
2 	7 	103 	第三会议室 	12 	0 	市场部专用会议室
3 	8 	401 	第四会议室 	15 	0 	公共会议室
4 	9 	201 	第五会议室 	15 	0 	最大会议室
5 	10 	601 	第六会议室 	12 	0 	需要提前三天预定
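Because query() concatenates the table name into the SQL and a table name cannot be a bound parameter, a common guard is an allow-list. The sketch below is a hypothetical variant (`query_table` and `ALLOWED_TABLES` are illustrative names) that also takes the connection as an argument so one connection or engine can be reused; an in-memory SQLite database again stands in for MySQL.

```python
import sqlite3

import pandas as pd

# Hypothetical allow-list of tables this helper may read
ALLOWED_TABLES = {'employee', 'meetingroom'}


def query_table(table, con):
    """Return a whole table as a DataFrame, refusing names not on the allow-list."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f'table not allowed: {table!r}')
    return pd.read_sql(f'SELECT * FROM {table}', con=con)


con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE meetingroom (roomid INTEGER, roomname TEXT, capacity INTEGER)')
con.executemany('INSERT INTO meetingroom VALUES (?, ?, ?)',
                [(5, '第一会议室', 15), (6, '第二会议室', 5)])

rooms = query_table('meetingroom', con)
print(rooms.shape)  # (2, 3)

try:
    query_table('meetingroom; DROP TABLE meetingroom', con)
    blocked = False
except ValueError as err:
    blocked = True
    print(err)
```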

import os

os.chdir(r'D:\code\jupyter\course\代码和数据')  # raw string so the backslashes are not read as escape sequences

df = pd.read_csv('baby_trade_history.csv')

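try:

    df.to_sql('testdf', con=conn, index=False, if_exists='replace')  # 'replace' drops the table if it exists and recreates it

except Exception as e:  # catch Exception instead of a bare except, and report the cause

    print('error:', e)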
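The if_exists argument decides what to_sql does when the target table already exists. The sketch below, again using in-memory SQLite in place of the MySQL engine and two rows from the baby trade sample, shows the three modes: 'replace' recreates the table, 'append' adds rows (chunksize sends them in batches), and the default 'fail' raises ValueError.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL engine used above
con = sqlite3.connect(':memory:')
df = pd.DataFrame({'user_id': [786295544, 532110457], 'buy_mount': [2, 1]})

# 'replace': drop the table if it exists, then create it from df
df.to_sql('testdf', con=con, index=False, if_exists='replace')

# 'append': insert into the existing table; chunksize writes rows in batches
df.to_sql('testdf', con=con, index=False, if_exists='append', chunksize=1000)

# Default if_exists='fail': raises ValueError because testdf already exists
try:
    df.to_sql('testdf', con=con, index=False)
    failed = False
except ValueError as err:
    failed = True
    print(err)

n_rows = pd.read_sql('SELECT COUNT(*) AS n FROM testdf', con=con)['n'].iloc[0]
print(n_rows)  # 4
```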


Welcome to the data cleaning series, Python Data Cleaning Tools, Methods, and Process:

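  • Part 1: Common tools for data cleaning: numpy, pandas
  • Part 2: File I/O for data cleaning: reading CSV, Excel, and MySQL data
  • Part 3: DataFrame operations: filtering, adding and deleting, finding and modifying, reshaping, and hierarchical indexing
  • Part 4: Data transformation: date handling, higher-order functions, and string processing
  • Part 5: Data statistics: grouped operations, aggregate functions, groupby objects and apply, pivot tables and crosstabs
  • Part 6: Data preprocessing (1): handling duplicates and missing values
  • Part 7: Data preprocessing (2): handling outliers and discretization
  • Part 8: Summary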
