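I. Confirm the on-site faults and sort them into categories. 2012/10/23~2012/10/24
II. Handle each fault category, then track and report back. Known fault categories:
1. Incorrect LABEL settings prevent the system from being entered or from booting. 2012/10/23: fixed 2 machines; under observation.
2. Cannot log in through IE. 2012/10/23: changed the mysql configuration and raised the log level; the focus is on finding what causes the data to be scrambled.
-----------------------------------------------------tracked by technical support------------------------------------------
3. BIOS upgrade: Advantech hardware staff to determine the problem:
3.1. Identify newly deployed projects and track whether they show hard disk problems.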
Regarding II:
Approach: pick 2 machines from each category for repair, then use them to replace nearby industrial PCs.
Disk fault:
Problem description:
/1: ******* FILE SYSTEM WAS MODIFIED ****
/1: ******* REBOOT LINUX *****
/1:106352/2050272 files (0.2% non-contiguous), 597367/2048287 blocks
192.168.0.208
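what: disk fault, the system cannot be entered  why: the disk is read-only, and the read-only state is on the system partition  how: repair with fsck, then modify the fstab configuration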
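To find which machines are in the read-only state described above, the mount table can be scanned for partitions mounted with the ro option. The following is a minimal sketch, assuming the mount table has the usual /proc/mounts layout (device, mount point, type, options, dump, pass). The function name read_only_mounts is hypothetical and not part of any existing tool.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is the content of /proc/mounts (an assumption about
    the input format): one mount per line, whitespace-separated fields.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Compare whole options so "errors=remount-ro" is not a match.
        if "ro" in options:
            result.append(mount_point)
    return result
```

On a live machine the input would be the text read from /proc/mounts. If "/" appears in the result, the system partition is read-only and the fsck plus fstab fix applies.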
192.168.10.206
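what: cannot log in through IE; the message shown is
The number of logged-in users exceeds the configured maximum. Please contact the administrator to raise the maximum login count, or log in later!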
TSMIS-2.1-1
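why: the database is corrupted and its values were changed.
how: raise the log level and find out why the database values were changed. Quick fix: use Xiao Dongguang's upgrade package.
[mysqld]---enable logging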
log=/var/log/mysqld_common.log
log-error=/var/log/mysqld_err.log
log-bin=/var/log/mysqld_bin.bin
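Problem encountered:
Resolution:
[Reposted from] http://blog.haohtml.com/archives/9202
Today I suddenly got word that mysql on one of the servers in the machine room would not start. I first checked the mysql error log and found the following errors at the end: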
020101 00:42:21 mysqld started
/usr/local/mysql/libexec/mysqld: File './mysql-bin.index' not found (Errcode: 13)
020101 0:42:21 [ERROR] Aborting
020101 0:42:21 [Note] /usr/local/mysql/libexec/mysqld: Shutdown complete
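The message says ./mysql-bin.index cannot be found (mysql has binary logging enabled). The file does exist in the database root directory, so this is probably a file permission problem (Errcode 13 is "permission denied"). The database root directory has mode 700, with both owner and group set to root; the directory permissions were probably changed by accident the last time the database was moved.
Fix: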
chgrp -R mysql /usr/local/mysql/data && chown -R mysql /usr/local/mysql/data
chgrp -R mysql /usr/share/TSMIS/mysql/test && chown -R mysql /usr/share/TSMIS/mysql/test
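Before and after running the commands above, it can help to check who owns the data directory and what its mode is. The following is a minimal sketch using only the Python standard library on a Unix system. The function names owner_and_mode and needs_chown are hypothetical, and the default owner "mysql" is taken from the commands above.

```python
import os
import pwd
import stat


def owner_and_mode(path):
    """Return (owner name, permission bits as an octal string) for path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # the uid has no passwd entry
    return owner, format(stat.S_IMODE(st.st_mode), "o")


def needs_chown(path, expected_owner="mysql"):
    """Return True when path is not owned by expected_owner."""
    return owner_and_mode(path)[0] != expected_owner
```

For the case described above, owner_and_mode("/usr/local/mysql/data") would be expected to report root and 700 before the fix, and needs_chown would return True.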
--
- 090211 0:23:19 [ERROR] Could not use /var/lib/mysql/testbox-04-slow.log for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
- 090211 0:23:20 InnoDB: Started; log sequence number 0 43656
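mysql does not have sufficient permissions on /var/log. Create a new directory, for example /var/log/mysqllogdir, and point all generated logs there.
Also required:
--> comment out default-character-set=utf8, otherwise the logs cannot be read
afterwards, restore the option to default-character-set=utf8
Once that is done, run
show variables like 'log_bin'; to confirm that logging is on. When it is on, the output looks like this: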
mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
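When many machines have to be checked, the table printed by the mysql client can be parsed rather than read by eye. The following is a minimal sketch, assuming the two-column table layout shown above. The function name parse_variables is hypothetical.

```python
def parse_variables(output):
    """Turn 'show variables' table output into a {name: value} dict."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the prompt line and the +---+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2:
            rows.append(cells)
    # The first table row is the header (Variable_name / Value).
    return dict(rows[1:])
```

A machine passes the check when parse_variables(output).get("log_bin") equals "ON".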
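Then apply the web upgrade package.
--------------------------------Resolution steps:
1. Turn on all logging, especially the common (general query) log and the binary log.
2. For each machine with the web fault, record the number of reboots within 10 minutes and the reason for each reboot, to confirm whether database corruption is directly related to rebooting.
-----------------build a separate package for each machine type, to make upgrading easier--------------------------------
Web upgrade package:
goldway.war does not need to change; the configuration-related tables are rebuilt
Back up the data in the database and the log information
Enable the binary log and modify my.cnf (create the log directory, update the log settings in my.cnf, and change the default character set setting)
Upgrade package for machines where the system cannot be entered: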
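Step 2 can be sketched as a small script that pulls reboot times out of the system log and finds the largest number of reboots inside any 10-minute window. This is only a sketch under stated assumptions: a "syslogd ...: restart." line is treated as the marker of a boot, the year is passed in because syslog timestamps omit it, month names are parsed in the C/English locale, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def restart_times(log_text, year):
    """Collect timestamps of 'syslogd ...: restart.' lines, sorted."""
    times = []
    for line in log_text.splitlines():
        if "syslogd" in line and line.rstrip().endswith("restart."):
            stamp = " ".join(line.split()[:3])  # e.g. "Sep 22 08:01:04"
            times.append(
                datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            )
    return sorted(times)


def max_restarts_in_window(times, window=timedelta(minutes=10)):
    """Largest number of restarts that fall within one sliding window."""
    best = 0
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans <= window.
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best
```

Comparing these reboot times with the timestamps of the mysqld repair messages would show whether the corruption follows the reboots.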
Version: '5.0.77' socket: '/usr/share/TSMIS/mysql/mysql.sock' port: 3306 Source distribution
120923 13:06:42 [Note] Found 69 of 70 rows when repairing './test/functionality'
120923 13:06:42 [Note] Found 130 of 173 rows when repairing './test/role_functionality'
Normally the table has 173 rows (the default data), but the repair left it with only 130 rows.
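The row loss can be read directly from the mysqld log. The following is a minimal sketch that extracts every "Found X of Y rows when repairing" note and computes how many rows each repair dropped. It assumes the message format shown in the log lines above; the function name rows_lost is hypothetical.

```python
import re

# Matches notes such as:
# [Note] Found 130 of 173 rows when repairing './test/role_functionality'
REPAIR = re.compile(r"Found (\d+) of (\d+) rows when repairing '([^']+)'")


def rows_lost(log_text):
    """Return a list of (table, found, expected, lost) tuples."""
    report = []
    for line in log_text.splitlines():
        match = REPAIR.search(line)
        if match:
            found = int(match.group(1))
            expected = int(match.group(2))
            report.append((match.group(3), found, expected, expected - found))
    return report
```

For the two log lines above this reports 1 lost row for ./test/functionality and 43 lost rows for ./test/role_functionality.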
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Host name conflict, retrying with <TSMIS-5>
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:213 on eth0.
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.10.178 on eth0.
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:5a07 on eth1.
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.6.33 on eth1.
Sep 23 13:06:41 TSMIS avahi-daemon[2550]: Registering HINFO record with values 'I686'/'LINUX'.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Withdrawing address record for fe80::222:46ff:fe14:213 on eth0.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Withdrawing address record for 192.168.10.178 on eth0.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Withdrawing address record for 192.168.6.33 on eth1.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Host name conflict, retrying with <TSMIS-6>
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:213 on eth0.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.10.178 on eth0.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:5a07 on eth1.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.6.33 on eth1.
Sep 23 13:06:42 TSMIS avahi-daemon[2550]: Registering HINFO record with values 'I686'/'LINUX'.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Withdrawing address record for fe80::222:46ff:fe14:213 on eth0.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Withdrawing address record for 192.168.10.178 on eth0.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Withdrawing address record for 192.168.6.33 on eth1.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Host name conflict, retrying with <TSMIS-7>
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:213 on eth0.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.10.178 on eth0.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Registering new address record for fe80::222:46ff:fe14:5a07 on eth1.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Registering new address record for 192.168.6.33 on eth1.
Sep 23 13:06:43 TSMIS avahi-daemon[2550]: Registering HINFO record with values 'I686'/'LINUX'.
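Subtracting gives 2304; the next value is 83886081, then 67108863; subtracting left from right gives 256
---Permission problems (failed logins and related issues) are ultimately triggered by the database repair
privilege_functionality table
role_functionality table. ---rows end up missing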
120922 08:24:05 mysqld started
Then, from 120922 08:24:10 to 120922 08:24:11, the ftp repair was carried out; the log shows:
found 2 of 6 rows when repairing the table.
a -o -A --
auto-repair rather than -r,--r
『----Investigation: the last data record is at 07:58:25; the reboot that followed was caused by an IP conflict』
(12591|2946448272)LM_ERROR,Sat Sep 22 2012 07:58:06.274104,CallBack.cpp,3449: 2012-09-22 07:58:05.210 | VN=1081,DT=0,PT=1,PN=1,RN=1,TN=1,VT=0,Plate=无车牌
(12591|2946448272)LM_ERROR,Sat Sep 22 2012 07:58:26.397562,CallBack.cpp,3449: 2012-09-22 07:58:25.315 | VN=1082,DT=0,PT=1,PN=1,RN=1,TN=1,VT=0,Plate=无车牌
Corresponding control log: it shows that the reboot was not caused by a process
1 2012 15:15:45.897007,ProcessControl.cpp,190: SpawnAllChildren() App diskMaintain is already running
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.494150,../GDW_ConfigManager.cpp,136: NOW Initial
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.544336,../GDW_ConfigManager.cpp,143: Get the handle of sql
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.545242,../GDW_ConfigManager.cpp,151: initial sql
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.545368,../GDW_ConfigManager.cpp,179: Load
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.545429,../GDW_ConfigManager.cpp,136: NOW Initial
(2703|3086395088)LM_DEBUG,Sat Sep 22 2012 08:01:30.545486,../GDW_ConfigManager.cpp,143: Get the handle of sql
Corresponding message log
o action taken because this happened within the defend interval
Sep 21 16:41:00 TSMIS last message repeated 4 times
Sep 22 07:29:05 TSMIS ipwatchd[2805]: MAC address 0:22:46:14:9d:66 causes IP conflict with address 192.168.10.208 set on interface eth0 - passive mode - reply not sent
Sep 22 07:36:41 TSMIS last message repeated 2 times
Sep 22 07:43:46 TSMIS ipwatchd[2805]: MAC address 0:22:46:14:9d:66 causes IP conflict with address 192.168.10.208 set on interface eth0 - passive mode - reply not sent
Sep 22 07:52:35 TSMIS ipwatchd[2805]: MAC address 0:22:46:14:9d:66 causes IP conflict with address 192.168.10.208 set on interface eth0 - passive mode - reply not sent
Sep 22 07:59:35 TSMIS ipwatchd[2805]: MAC address 0:22:46:14:9d:66 causes IP conflict with address 192.168.10.208 set on interface eth0 - passive mode - reply not sent
Sep 22 08:01:04 TSMIS syslogd 1.4.1: restart.
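To see which device keeps colliding with which address, the ipwatchd lines can be tallied per MAC address, IP address and interface. The following is a minimal sketch, assuming the message wording shown in the log above. Lines of the form "last message repeated N times" are not counted, so the totals are a lower bound. The function name conflict_counts is hypothetical.

```python
import re
from collections import Counter

# Matches the ipwatchd conflict message as it appears in the log above.
CONFLICT = re.compile(
    r"ipwatchd\[\d+\]: MAC address (\S+) causes IP conflict "
    r"with address (\S+) set on interface (\S+)"
)


def conflict_counts(log_text):
    """Count conflict lines per (mac, ip, interface) tuple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CONFLICT.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

A single MAC address dominating the tally, as 0:22:46:14:9d:66 does here, points at one other device configured with the same IP.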
The machine keeps rebooting, unless load_watchdog.log watchdog_daemon
Some machines have a read-only disk
cmdService=192.168.11.249
dataManage=10.83.59.71
One machine is missing etc