MySQL: 诡异的MySQL server has gone away及其解决

在Mysql执行show status,通常更关注缓存效果、进程数等,往往忽略了两个值:

Variable_name Value
Aborted_clients 3792
Aborted_connects 376

通常只占query的0.0x%,所以并不为人所重视。而且在传统Web应用上,query错误对用户而言影响并不大,只是重新刷新一下页面就OK了。最近的基础改造中,把很多应用作为service运行,无法提示用户重新刷新,这种情况下,可能就会影响到服务的品质。

通过程序脚本的日志跟踪,主要报错信息为“MySQL server has gone away”。官方的解释是:

The most common reason for the MySQL server has gone away error is that the server timed out and closed the connection.

Some other common reasons for the MySQL server has gone away error are:

  • You (or the db administrator) has killed the running thread with a KILL statement or a mysqladmin kill command.

  • You tried to run a query after closing the connection to the server. This indicates a logic error in the application that should be corrected.

  • A client application running on a different host does not have the necessary privileges to connect to the MySQL server from that host.

  • You got a timeout from the TCP/IP connection on the client side. This may happen if you have been using the commands: mysql_options(..., MYSQL_OPT_READ_TIMEOUT,...) or mysql_options(..., MYSQL_OPT_WRITE_TIMEOUT,...). In this case increasing the timeout may help solve the problem.

  • You have encountered a timeout on the server side and the automatic reconnection in the client is disabled (the reconnect flag in the MYSQL structure is equal to 0).

  • You are using a Windows client and the server had dropped the connection (probably because wait_timeout expired) before the command was issued.

    The problem on Windows is that in some cases MySQL doesn't get an error from the OS when writing to the TCP/IP connection to the server, but instead gets the error when trying to read the answer from the connection.

    In this case, even if the reconnect flag in the MYSQL structure is equal to 1, MySQL does not automatically reconnect and re-issue the query as it doesn't know if the server did get the original query or not.

    The solution to this is to either do a mysql_ping on the connection if there has been a long time since the last query (this is what MyODBC does) or set wait_timeout on the mysqld server so high that it in practice never times out.

  • You can also get these errors if you send a query to the server that is incorrect or too large. If mysqld receives a packet that is too large or out of order, it assumes that something has gone wrong with the client and closes the connection. If you need big queries (for example, if you are working with big BLOB columns), you can increase the query limit by setting the server's max_allowed_packet variable, which has a default value of 1MB. You may also need to increase the maximum packet size on the client end. More information on setting the packet size is given in Section A.1.2.9, “Packet too large”.

    An INSERT or REPLACE statement that inserts a great many rows can also cause these sorts of errors. Either one of these statements sends a single request to the server irrespective of the number of rows to be inserted; thus, you can often avoid the error by reducing the number of rows sent per INSERT or REPLACE.

  • You also get a lost connection if you are sending a packet 16MB or larger if your client is older than 4.0.8 and your server is 4.0.8 and above, or the other way around.

  • It is also possible to see this error if hostname lookups fail (for example, if the DNS server on which your server or network relies goes down). This is because MySQL is dependent on the host system for name resolution, but has no way of knowing whether it is working — from MySQL's point of view the problem is indistinguishable from any other network timeout.

    You may also see the MySQL server has gone away error if MySQL is started with the --skip-networking option.

    Another networking issue that can cause this error occurs if the MySQL port (default 3306) is blocked by your firewall, thus preventing any connections at all to the MySQL server.

  • You can also encounter this error with applications that fork child processes, all of which try to use the same connection to the MySQL server. This can be avoided by using a separate connection for each child process.

  • You have encountered a bug where the server died while executing the query.


据此分析,可能原因有3:

1,Mysql服务端与客户端版本不匹配。

2,Mysql服务端配置有缺陷或者优化不足

3,需要改进程序脚本

通过更换多个服务端与客户端版本,发现只能部分减少报错,并不能完全解决。排除1。

对服务端进行了彻底的优化,也未能达到理想效果。在timeout的取值设置上,从经验值的10,到PHP默认的60,进行了多次尝试。而Mysql官方默认值(8小时)明显是不可能的。从而对2也进行了排除。(更多优化的经验分享,将在以后整理提供)

针对3对程序代码进行分析,发现程序中大量应用了类似如下的代码(为便于理解,用原始api描述):

$conn=mysql_connect( ... ... );

... ... ... ...

if(!$conn){ //reconnect

$conn=mysql_connect( ... ... );

}

mysql_query($sql, $conn);

这段代码的含义,与Mysql官方建议的方法思路相符[ If you have a script, you just have to issue the query again for the client to do an automatic reconnection. ]。在实际分析中发现,if(!$conn)并不是可靠的,程序通过了if(!$conn)的检验后,仍然会返回上述错误。

对程序进行了改写:

if(!conn){ // connect ...}

elseif(!mysql_ping($conn)){ // reconnect ... }

mysql_query($sql, $conn);

经实际观测,MySQL server has gone away的报错基本解决。

BTW: 附带一个关于 reconnect 的疑问,

在php4x+client3x+mysql4x的旧环境下,reconnet的代码:

$conn=mysql_connect(...) 可以正常工作。

但是,在php5x+client4x+mysql4x的新环境下,$conn=mysql_connect(...)返回的$conn有部分情况下不可用。需要书写为:

mysql_close($conn);

$conn=mysql_connect(...);

返回的$conn才可以正常使用。原因未明。未做深入研究,也未见相关讨论。或许mysql官方的BUG汇报中会有吧。

~~呵呵~~

你可能感兴趣的:(MySQL: 诡异的MySQL server has gone away及其解决)