sqlplus应当是DBA 1.0时代使用最为频繁的管理工具,经常有经验丰富的老DBA会提到自己敲过几万次的sqlplus:),但有的时候这个吃饭家伙也会不好用,偶尔还会出现Segmentation fault错误,亦或者彻底hang住。在这里我介绍几种应对sqlplus无法正常使用的应对方法: 1.出现Segmentation fault,这种情况下一般是sqlplus 2进制文件被损坏了,可以通过重新build一个sqlplus来解决问题
[oracle@rh2 bin]$ sqlplus Segmentation fault /* 使用$ORACLE_HOME/sqlplus/lib目录下的make文件,编译一个新的sqlplus */ [oracle@rh2 ~]$ make -f $ORACLE_HOME/sqlplus/lib/ins_sqlplus.mk newsqlplus Linking /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus gcc -o /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus -m64 -L/s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/ -L/s01/oracle/product/11.2.0/dbhome_1/lib/stubs/ /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/s0afimai.o -lsqlplus -lclntsh `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lztkg11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/ldflags` -lncrypt11 -lnsgr11 -lnzjs11 -ln11 -lnl11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -Wl,-rpath,/s01/oracle/product/11.2.0/dbhome_1/lib -lm -lpthread `cat /s01/oracle/product/11.2.0/dbhome_1/lib/sysliblist` -ldl -lm -lpthread -L/s01/oracle/product/11.2.0/dbhome_1/lib /bin/chmod 755 /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus rm -f /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus mv -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus /bin/chmod 751 /s01/oracle/product/11.2.0/dbhome_1/bin/sqlplus rm -f /s01/oracle/product/11.2.0/dbhome_1/sqlplus/lib/libsqlplus.so rm -rf /s01/oracle/product/11.2.0/dbhome_1/sqlplus/bin/sqlplus [oracle@rh2 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:38:21 2011 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options2.出现sqlplus之后hang住的现象,hang的原因存在多种可能: 1)instance hanging数据库实例hang住,这种情况下sqlplus无法正常登陆到正hang的实例,而登陆到其他实例是可以的;若在10g以后版本中可以使用-prelim选项登陆实例,使用该选项登陆后无法执行普通的SQL语句,但可以使用oradebug内部调试工具,通过oradebug收集必要的hanganalyze信息后,可以进一步判断hang住的原因并决定下一步的操作。
[oracle@rh2 ~]$ sqlplus / as sysdba
.............................we suspend here!!!
[oracle@rh2 ~]$ sqlplus -prelim / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Wed May 11 21:46:27 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump hanganalyze 4;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ora_23436.trc -- where dump resides
将以上trc文件提交给Oracle Support或者资深的Oracle技术人员,以便他们分析出实例hang住的原因,通过调整参数或者修复bug可以避免再次出现类似的状况。 2)一执行sqlplus就出现挂起现象,甚至没有登陆任何数据库。一般这种情况是在读取sqlplus 2进制文件或其相关的共享库文件(.so文件)时遇到了问题,或者是在实际system call系统调用execve("sqlplus")时遇到了错误,一般我们可以使用系统跟踪工具strace(Linux)或truss(Unix)工具来分析这种挂起现象:
/* Unix */ truss -o sqlplus_hang.log sqlplus /* Linux */ strace -o sqlplus_hang.log sqlplus head -10 sqlplus_hang.log execve("/s01/db_1/bin/sqlplus", ["sqlplus"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)可以看到以上strace记录中发现了调用execve函数(execve() executes the program pointed to by filename)运行sqlplus程序时出现了ENOEXEC错误,该ENOEXEC错误代码说明我们正在执行一个格式无效的可执行文件,具体的解释如下:
This error indicates that a request has been made to execute a file which, although it has the appropriate permissions, does not start with a valid magic number. A magic number is the first two bytes in a file, used to determine what type of file it is. You tried to execute a file that is not in a valid executable format. The most common format for binary programs under linux is called ELF. Note that your shell will run ascii files that have the executable bit set as a shell script (ie run it as shell commands). You can reproduce this by doing $ dd if=/dev/random of=myfile bs=1k count=1 $ chmod +x myfile $ ./myfile zsh: exec format error: ./myfile Note that there is a very slight possibility that you could create a valid program that does something bad to your system!! Note, you can have user defined ways of running programs using Linux's binfmt_misc. See /usr/src/linux/Documentation/binfmt_misc.txtto be continued ............