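Background
The day before yesterday I finally managed to export the user's entire database. Today the user reported that the exported data could not be imported. My first question was what error had been shown, and the answer was a single line: 'Job "SYSTEM"."SYS_IMPORT_FULL_03" stopped due to fatal error at xxxx elapsed 0 03:01:06', with no other message at all. That is very little information to work with, which makes this kind of problem troublesome to handle.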
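Problem Analysis
Step 1: Add trace information first
We have to rely on ourselves here. Fortunately we know a few tracing techniques; for the method, see my earlier post "How to Diagnose Oracle Data Pump" (how to add diagnostic information to Data Pump). Here is the resulting trace file, orcl_dm00_10856.trc, at the point where the error occurs: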
KUPM:13:38:03.378: Log message received from worker DG,KUPC$C_1_20141208103757,KUPC$A_1_103800203000000,MCP,510,Y
KUPM:13:38:03.378: . . 导入了 "ZLHIS"."体检套餐计价" 576.4 KB 20817 行
KUPM:13:38:03.378: ****OUT DISPATCH, request type=3031, response type =2041
*** 2014-12-08 13:38:04.392
KUPM:13:38:04.392: Client count is: 1
KUPM:13:38:04.392: In check_workers...
KUPM:13:38:04.392: Live worker count is: 1
KUPM:13:38:04.392: In set_longops
KUPM:13:38:04.392: Work so far is: 107805.74195098876953125
KUPM:13:38:04.392: Checking for resumable waits
KUPM:13:39:05.435: Client count is: 1
*** 2014-12-08 13:39:05.435
KUPM:13:39:05.435: In check_workers...
KUPM:13:39:05.435: Live worker count is: 0
KUPM:13:39:05.435: worker id is:
KUPM:13:39:05.435: Worker error is: 0
KUPM:13:39:05.435: Exited main loop...
KUPM:13:39:05.435: Returned to MAIN
KUPV:13:39:05.435: Update request for job: SYSTEM.SYS_IMPORT_FULL_03, func: 1
KUPM:13:39:05.435: Entered state: STOPPING
KUPM:13:39:05.435: keeping master because job is restartable
KUPM:13:39:05.435: Final job_info_flags = 1
KUPM:13:39:05.435: Log message received from MCP
KUPM:13:39:05.435: 作业 "SYSTEM"."SYS_IMPORT_FULL_03" 因致命错误于 星期一 12月 8 13:39:05 2014 elapsed 0 03:01:06 停止
KUPM:13:39:05.498: In RESPOND_TO_START
KUPM:13:39:05.498: In check_workers...
KUPM:13:39:05.498: Live worker count is: 0
KUPM:13:39:05.498: worker id is:
KUPM:13:39:05.498: Worker error is: 0
The result was very disappointing: no useful error message here either.
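The trace does carry one weak signal: the live worker count falls from 1 to 0 shortly before the job enters the STOPPING state, which suggests the worker process died. A minimal Python sketch for spotting such drops in a long trace file, assuming the "Live worker count is:" line format shown in the excerpt above (the function name is made up for illustration):

```python
import re

# Matches Data Pump master trace lines such as
# "KUPM:13:39:05.435: Live worker count is: 0"
WORKER_COUNT = re.compile(
    r"^KUPM:(\d{2}:\d{2}:\d{2}\.\d{3}): Live worker count is: (\d+)"
)


def worker_count_drops(lines):
    """Return (timestamp, previous, current) for each decrease in the count."""
    drops = []
    previous = None
    for line in lines:
        match = WORKER_COUNT.match(line.strip())
        if not match:
            continue
        current = int(match.group(2))
        if previous is not None and current < previous:
            drops.append((match.group(1), previous, current))
        previous = current
    return drops


sample = [
    "KUPM:13:38:04.392: Live worker count is: 1",
    "KUPM:13:38:04.392: In set_longops",
    "KUPM:13:39:05.435: Live worker count is: 0",
    "KUPM:13:39:05.498: Live worker count is: 0",
]
print(worker_count_drops(sample))
```

The timestamp of the drop is a useful starting point when looking for a matching entry in the alert log.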
Step 2: Check the alert log
At this point I thought of checking the alert log, which might offer some hints. Sure enough, it contained very valuable information:
Tue Dec 09 02:00:00 2014
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Tue Dec 09 02:00:12 2014
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x0] [PC:0x14575B408, kpodpals()+5174]
ERROR: Unable to normalize symbol name for the following short stack (at offset 213):
dbgexProcessError()+200<-dbgeExecuteForError()+65<-dbgePostErrorKGE()+2269<-dbkePostKGE_kgsf()+77<-kgeade()+562<-kgerelv()+151<-kgerev()+45<-kgerec5()+60<-sss_xcpt_EvalFilterEx()+1862<-sss_xcpt_EvalFilter()+174<-.1.4_5+59<-0000000077C985A8<-0000000077CA9D0D<-0000000077C991AF<-0000000077CD1278<-kpodpals()+5174<-kpodpp()+4946<-opiodr()+1631<-kpoodr()+699<-xupirtrc()+2833<-upirtrc()+117<-kpurcsc()+150<-kpudpxp_ctxPrepare()+19418<-OCIDirPathPrepare()+11<-kupd_initDirPath()+1048<-kupdls()+2863<-spefcifa()+3937<-spefmccallstd()+532<-pextproc()+47<-PGOSF493_peftrusted()+134<-psdexsp()+297<-rpiswu2()+3039<-psdextp()+951<-pefccal()+785<-pefcal()+225<-pevm_FCAL()+164<-pfrinstr_FCAL()+69<-pfrrun_no_tool()+77<-pfrrun()+1241<-plsql_run()+903<-peicnt()+328<-kkxexe()+616<-opiexe()+20959<-kpoal8()+2397<-opiodr()+1631<-kpoodr()+699<-xupirtrc()+2833<-upirtrc()+117<-kpurcsc()+150<-kpuexec()+10984
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_dw00_5304.trc (incident=8636):
ORA-07445: 出现异常错误: 核心转储 [kpodpals()+5174] [ACCESS_VIOLATION] [ADDR:0x0] [PC:0x14575B408] [UNABLE_TO_READ] []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\incident\incdir_8636\orcl_dw00_5304_i8636.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Dec 09 02:00:17 2014
Dumping diagnostic data in directory=[cdmp_20141209020017], requested by (instance=1, osid=5304 (DW00)), summary=[incident=8636].
Tue Dec 09 02:00:19 2014
Sweep [inc][8636]: completed
Sweep [inc2][8636]: completed
Tue Dec 09 02:00:55 2014
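Reading a large alert log by eye is slow. The following is a minimal Python sketch that pulls out ORA-07445 and ORA-00600 entries together with the nearest preceding timestamp and trace file. It assumes the line layout seen in the excerpt above; the function name and tuple layout are illustrative only.

```python
import re

# Timestamp lines look like "Tue Dec 09 02:00:12 2014"
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# "Errors in file <path>.trc  (incident=NNNN):"
TRACE_FILE = re.compile(r"^Errors in file (.+?\.trc)")
# Internal errors; the first bracketed argument names the failing function
INTERNAL_ERROR = re.compile(r"^(ORA-07445|ORA-00600):.*?\[([^\]]*)\]")


def find_internal_errors(lines):
    """Return (timestamp, code, first_argument, trace_file) tuples."""
    hits = []
    timestamp = None
    trace_file = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # A new timestamp starts a new log entry
            timestamp, trace_file = line, None
            continue
        match = TRACE_FILE.match(line)
        if match:
            trace_file = match.group(1)
            continue
        match = INTERNAL_ERROR.match(line)
        if match:
            hits.append((timestamp, match.group(1), match.group(2), trace_file))
    return hits
```

For the excerpt above this would report the ORA-07445 raised in kpodpals()+5174 at 02:00:12 and point to the DW00 worker trace file.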
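With this information in hand, I will not go through the rest of the log analysis, since it turned up nothing else of value. Let's focus on ORA-07445 [kpodpals()+5174]. Core-dump errors of this kind are caused by an Oracle bug in the vast majority of cases (roughly 99%), and a search of Oracle's official support site did indeed turn up a matching document:
ORA-7445 [kpodpals] During DataPump Import (Doc ID 1096837.1)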
SYMPTOMS
You perform a DataPump import and this breaks with errors:
#> impdp system/password directory=dpu dumpfile=a_table.dmp table_exists_action=replace
Import: Release 10.2.0.1.0 - Production on Wednesday, 21 April, 2010 9:21:43
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_FULL_01": system/******** directory=dpu
dumpfile=a_table.dmp table_exists_action=replace
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
ORA-39014: One or more workers have prematurely exited.
ORA-39029: worker 1 with process name "DW01" prematurely terminated
ORA-31672: Worker process DW01 died unexpectedly.
Job "SYSTEM"."SYS_IMPORT_FULL_01" stopped due to fatal error at 09:23:32
CAUSE
This is addressed in Bug 9626756. A no-name column "<space>" is included in the table definition.
The imported table is defined as:
create table a_table
(
id number,
" " varchar2(10), -- " " means "<one space>"
text varchar2(10)
);
SOLUTION
1. Don't use columns like "<space>" in the source database
- OR -
2. If a table has such columns, then exclude the table during import with:
exclude=table:\"IN ('A_TABLE')\"
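So the cause is a table with a column whose name is a single space. What a trap; who creates a table like that? Next we need to check whether such a table really exists in our system.
Resolution
Step 1: Query the table columns
Querying the column names confirmed it: there really is a blank-named column, in the table TX_诊疗项目. This table is not one shipped with our system; it was created earlier, when the channel partner migrated the user from the old system, to hold the user's old-system data. What a headache.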
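The exact query used is not reproduced in this post. As a hedged sketch of one way to run such a check, the Python snippet below wraps a dictionary query against DBA_TAB_COLUMNS. It relies on the Oracle behavior that TRIM of an all-space string yields NULL, and it assumes the connected user can read that view; the function name is hypothetical, and the cursor can come from any DB-API 2.0 driver.

```python
# Assumption: DBA_TAB_COLUMNS is readable by the connected user.
# In Oracle, TRIM(' ') is NULL, so this predicate matches column
# names that consist entirely of spaces.
BLANK_COLUMN_SQL = """
SELECT owner, table_name, column_name
  FROM dba_tab_columns
 WHERE TRIM(column_name) IS NULL
 ORDER BY owner, table_name
"""


def find_blank_columns(cursor):
    """Run the query on a DB-API 2.0 cursor; return sorted (owner, table) pairs."""
    cursor.execute(BLANK_COLUMN_SQL)
    rows = cursor.fetchall()
    # A table may have several blank-named columns; report each table once
    return sorted({(owner, table) for owner, table, _column in rows})
```

Every table returned is a candidate for the EXCLUDE list, or for renaming the column in the source database before the next export.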
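Step 2: Exclude the table and re-import
There are two ways to solve this: 1. adjust or rebuild the table in the production database; 2. exclude the problem table during the import. After discussion we decided on the second approach and excluded the table: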
impdp system/xxxxx DIRECTORY=dp full=y DUMPFILE=wzyfull20141205b_01.dmp logfile=impdp1209.log trace=4a0300 exclude=TABLE:\"IN \(\'ZLHIS.TX_诊疗项目\'\)\",SCHEMA:\"IN\(\'SYS\',\'SYSTEM\',\'OUTLN\',\'MGMT_VIEW\',\'FLOWS_FILES\',\'MDSYS\',\'ORDSYS\',\'EXFSYS\',\'DBSNMP\',\'WMSYS\',\'WKSYS\',\'WK_TEST\',\'CTXSYS\',\'ANONYMOUS\',\'SYSMAN\',\'XDB\',\'WKPROXY\',\'ORDPLUGINS\',\'FLOWS_030000\',\'OWBSYS\',\'SI_INFORMTN_SCHEMA\',\'OLAPSYS\',\'SCOTT\',\'ORACLE_OCM\'\)\"
After a long wait of more than 3 hours, the data was imported successfully.
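The backslash escaping in the command above is easy to get wrong. One common alternative is to put the parameters in a Data Pump parameter file and pass it with PARFILE, where the quotes need no shell escaping. A minimal Python sketch that generates such a file follows; the function name is hypothetical, the values are copied from the command above, and the schema list is shortened for brevity.

```python
def build_parfile(directory, dumpfile, logfile, exclude_tables, exclude_schemas):
    """Return the text of an impdp parameter file for a full import."""

    def in_list(names):
        # Render names as a quoted, comma-separated SQL IN list
        return ", ".join("'{}'".format(name) for name in names)

    lines = [
        "DIRECTORY={}".format(directory),
        "DUMPFILE={}".format(dumpfile),
        "LOGFILE={}".format(logfile),
        "FULL=Y",
    ]
    if exclude_tables:
        lines.append('EXCLUDE=TABLE:"IN ({})"'.format(in_list(exclude_tables)))
    if exclude_schemas:
        lines.append('EXCLUDE=SCHEMA:"IN ({})"'.format(in_list(exclude_schemas)))
    return "\n".join(lines) + "\n"


print(build_parfile(
    "dp",
    "wzyfull20141205b_01.dmp",
    "impdp1209.log",
    ["ZLHIS.TX_诊疗项目"],          # value as used in the command above
    ["SYS", "SYSTEM", "OUTLN"],     # shortened list, for illustration only
))
```

Saved under a name of your choice, for example imp_exclude.par (a made-up name), the file would be used as impdp system/xxxxx parfile=imp_exclude.par.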
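Key Takeaways
Data Pump trace logging: add the trace parameter when exporting or importing to generate trace log files
ORA-7445 [kpodpals]: Bug 9626756. A table contains a nameless column whose name consists entirely of spaces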