Pitfalls to Watch for in Server Timer Handling
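This morning, during a routine check of the servers, I traced the server processes with strace and found that several of them were deadlocked. Following up with gdb gave this backtrace: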
```
(gdb) bt
#0  0x00ff9410 in __kernel_vsyscall ()
#1  0x004d593e in __lll_mutex_lock_wait () from /lib/libc.so.6
#2  0x00465b38 in _L_lock_14080 () from /lib/libc.so.6
#3  0x00464df4 in free () from /lib/libc.so.6
#4  0x006c7691 in operator delete () from /usr/lib/libstdc++.so.6
#5  0x08059cfb in __gnu_cxx::new_allocator<std::_List_node<TTimeEvent> >::deallocate (this=0x98e0064, __p=0x98e1218)
    at /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/ext/new_allocator.h:94
#6  0x08059d20 in std::_List_base<TTimeEvent, std::allocator<TTimeEvent> >::_M_put_node (this=0x98e0064, __p=0x98e1218)
    at /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/bits/stl_list.h:320
#7  0x08059d81 in std::list<TTimeEvent, std::allocator<TTimeEvent> >::_M_erase (this=0x98e0064, __position=
    {_M_node = 0x98e1218}) at /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/bits/stl_list.h:1150
#8  0x08059db3 in std::list<TTimeEvent, std::allocator<TTimeEvent> >::pop_front (this=0x98e0064)
    at /usr/lib/gcc/i386-redhat-linux/4.1.1/../../../../include/c++/4.1.1/bits/stl_list.h:747
#9  0x08059334 in CTimerManager::Process (this=0x98e0058) at src/timermanager.cpp:168
#10 0x080597dd in Process (nSigNo=14) at src/timermanager.cpp:199
#11 <signal handler called>
#12 0x004612ba in _int_free () from /lib/libc.so.6
#13 0x00464e00 in free () from /lib/libc.so.6
#14 0x006c7691 in operator delete () from /usr/lib/libstdc++.so.6
#15 0x006a424d in std::string::_Rep::_M_destroy () from /usr/lib/libstdc++.so.6
#16 0x0069e40f in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf ()
   from /usr/lib/libstdc++.so.6
#17 0x0069fd7f in std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_stringstream ()
   from /usr/lib/libstdc++.so.6
#18 0x080524ea in CDBMoudle::Insert (this=0x98e6e80, tFixkey=@0xbfe1c6ec) at src/dbmoudle.cpp:59
#19 0x08051718 in CConnectionTask::ProcFixContent (this=0x98e6510) at src/connectiontask.cpp:218
#20 0x0805196e in CConnectionTask::HandleRead (this=0x98e6510) at src/connectiontask.cpp:86
#21 0x08051a0f in CConnectionTask::Handle (this=0x98e6510, nEvent=1) at src/connectiontask.cpp:52
#22 0x080585af in IServer::Run (this=0xbfe1c7e0) at src/server.cpp:133
#23 0x08055328 in main (argc=2, argv=0xbfe1c8b4) at src/main.cpp:19
```
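At frame #13 the main path called free() and was inside its libc internals (#12 _int_free), holding malloc's internal lock. Before that call returned, it was interrupted by the timer manager's SIGALRM (nSigNo=14), and the timer processing (CTimerManager::Process, #9) also called free() when it popped an event off its std::list (frames #8 to #3). That second free() waits for the same lock (#1 __lll_mutex_lock_wait), but the lock is held by the very thread that is now stuck inside the handler, so it can never be released: a deadlock. The root cause is that the malloc/free family is not reentrant (not async-signal-safe) and therefore must not be called from a signal handler. There is a related article on this here.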
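To make the lock reasoning concrete, here is a toy model in C++. It is a simplification under stated assumptions, not glibc's actual code: an std::atomic<bool> stands in for malloc's internal lock, and the hypothetical functions try_acquire_arena, release_arena and nested_free_could_proceed replay the order of events from the backtrace on a single thread.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in (assumption, not glibc's code) for the non-recursive lock
// that free() takes internally before touching the heap.
static std::atomic<bool> g_arena_locked{false};

// Returns true if the lock was taken, false if someone already holds it.
static bool try_acquire_arena()
{
    bool expected = false;
    return g_arena_locked.compare_exchange_strong(expected, true);
}

static void release_arena()
{
    g_arena_locked.store(false);
}

// Replays the backtrace on one thread: the main path enters free() and holds
// the lock (#13, #12); SIGALRM interrupts it and the handler's free() (#3)
// needs the same lock. Returns whether that nested free() could proceed.
bool nested_free_could_proceed()
{
    bool outer = try_acquire_arena();  // main path: free() takes the lock
    assert(outer);

    // <signal handler called>: the handler runs on this same thread
    bool inner = try_acquire_arena();  // handler: free() wants the lock again

    // A real free() does not give up here; it waits (#1 __lll_mutex_lock_wait)
    // for a release that only this now-blocked thread could perform.
    if (inner)
        release_arena();
    release_arena();                   // main path finishes its free()
    return inner;
}
```

The nested acquisition fails in the model; in the real process, free() waits instead of failing, and that wait is where the server hung.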
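I first considered minimizing object construction and destruction in the server code, but on reflection that treats the symptom rather than the cause: it would mean being careful everywhere while writing code, and the deadlock might show up at point A today and at point B tomorrow. Moreover, since the server is written in C++, constructing and destroying some local objects is unavoidable, and their destructors can free memory implicitly, as the std::stringstream destructor does in frames #14 to #17.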
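The following sketch shows a C++ local object releasing heap memory without any explicit free() in sight. It replaces the global operator new/delete with counting versions and builds a string in a local std::stringstream, much as frame #17 shows CDBMoudle::Insert destroying one; the counter g_delete_calls, the function build_query_and_count_frees and the SQL text are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <sstream>
#include <string>

// Counts calls to the replaced global operator delete (hypothetical helper).
static int g_delete_calls = 0;

void* operator new(std::size_t size)
{
    if (void* p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept
{
    if (p)
        ++g_delete_calls;
    std::free(p);
}

// Hypothetical stand-in for code like CDBMoudle::Insert: it builds a query in
// a local stringstream and never calls delete or free() explicitly.
int build_query_and_count_frees()
{
    int before = g_delete_calls;
    {
        std::stringstream ss;
        ss << "INSERT INTO t VALUES ('" << std::string(100, 'x') << "')";
        std::string sql = ss.str();
        assert(!sql.empty());
    }   // ss and sql are destroyed here; their buffers go through operator delete
    return g_delete_calls - before;
}
```

Every such destruction is a hidden call into free(), so if a timer signal can arrive at any point, avoiding these calls everywhere is not practical.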
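So the approach shifted to keeping the timer handling itself as simple as possible. The strategy I have come up with for now is this: when a timer fires, the handler only sets a flag instead of calling the corresponding processing function; the server's main loop then checks whether the flag is set and, if so, calls the processing function there.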
In other words, the original approach was:
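```cpp
// Called as the signal handler when the timer expires
void timer_handler(int signo)
{
    // Timer processing runs inside the signal handler itself
    dosomething();
}

while (1)
{
    // server main loop
}
```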
After the change, the approach is:
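```cpp
#include <csignal>

// Flag shared by the signal handler and the main loop; volatile sig_atomic_t
// is the type that can safely be written from a signal handler
volatile std::sig_atomic_t g_alarm = 0;

// Called as the signal handler when the timer expires: it only sets the flag
void timer_handler(int signo)
{
    g_alarm = 1;
}

while (1)
{
    // server main loop
    if (g_alarm)
    {
        g_alarm = 0;   // clear before processing so a timer firing meanwhile is not lost
        // Timer processing, now outside the signal handler
        dosomething();
    }
}
```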
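A self-contained, runnable sketch of the flag-based model follows, assuming Linux/POSIX. It installs the handler with sigaction and arms a periodic setitimer(ITIMER_REAL) timer, which delivers SIGALRM (signal 14 on Linux, matching nSigNo=14 in the backtrace). The names on_alarm, do_timer_work, g_timer_runs and run_flag_based_loop are hypothetical, and pause() merely stands in for whatever blocking wait the real server loop performs.

```cpp
#include <cassert>
#include <csignal>
#include <sys/time.h>
#include <unistd.h>

// Set by the handler, consumed by the main loop.
static volatile std::sig_atomic_t g_alarm = 0;

// The handler only sets the flag: no malloc/free, no locks.
static void on_alarm(int /*signo*/)
{
    g_alarm = 1;
}

// Hypothetical stand-in for the real timer work (CTimerManager::Process in the
// backtrace); here it only counts how often it ran.
static int g_timer_runs = 0;
static void do_timer_work()
{
    ++g_timer_runs;
}

// Runs a toy main loop until the timer work has run `runs` times,
// driven by a 10 ms periodic timer.
int run_flag_based_loop(int runs)
{
    assert(runs > 0);
    g_timer_runs = 0;

    struct sigaction sa = {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, nullptr);

    struct itimerval tv = {};
    tv.it_interval.tv_usec = 10000;  // re-arm every 10 ms
    tv.it_value.tv_usec = 10000;     // first expiry after 10 ms
    setitimer(ITIMER_REAL, &tv, nullptr);

    while (g_timer_runs < runs)
    {
        pause();  // stand-in for the server's blocking wait; returns after a signal
        if (g_alarm)
        {
            g_alarm = 0;      // clear first so a tick during the work is not lost
            do_timer_work();  // normal context: malloc/free are safe here
        }
    }

    struct itimerval off = {};
    setitimer(ITIMER_REAL, &off, nullptr);  // disarm the timer
    return g_timer_runs;
}
```

If a tick arrives between the flag check and the next pause(), the flag simply stays set and is handled after the following tick, so the periodic timer guarantees progress in this sketch.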
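That is roughly how the model changes. It is the best way I can think of to handle the problem for now; if anyone has a better approach, suggestions are welcome.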