Thoughts prompted by tpp.c:63: __pthread_tpp_change_priority failed

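This morning, while testing a program, a colleague found that it reported the error tpp.c:63: __pthread_tpp_change_priority failed. He looked it up, found that it was related to the thread mutex (pthread_mutex_t), and proposed a hypothesis about the cause. We discussed it for a while. Some points still did not make sense to me, so I asked him for a few links and read through them.

Since I wrote this part of the program, I wanted to find out what I had done wrong. A Google search turned up some explanations (see Note 1).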

The figure below shows the structure of my program that uses the mutex:

[Figure 1: program structure]

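As I understood it, the two threads would take turns running. My colleague said they would not, and we argued about this point. After reading some material, though, I found that there really are cases where the threads do not alternate.

So I ran a test on my PC and on an ARM platform.

The test is as follows:

Code: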

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>   /* sleep(), usleep() */

pthread_mutex_t   sMutex;

/* Thread 1: hold the mutex for one second, print, then release it. */
void * test_pthread1(void * arg)
{
    while(1)
    {
        pthread_mutex_lock(&sMutex);
        sleep(1);
        printf("%s: %d\n", __func__, __LINE__);
        pthread_mutex_unlock(&sMutex);
#ifdef T_SLEEP
        usleep(1);   /* give the other thread a chance to take the mutex */
#endif
    }
    return NULL;   /* not reached */
}

/* Thread 2: identical to thread 1. */
void * test_pthread2(void * arg)
{
    while(1)
    {
        pthread_mutex_lock(&sMutex);
        sleep(1);
        printf("%s: %d\n", __func__, __LINE__);
        pthread_mutex_unlock(&sMutex);
#ifdef T_SLEEP
        usleep(1);   /* give the other thread a chance to take the mutex */
#endif
    }
    return NULL;   /* not reached */
}

int main(int argc, char **argv)
{
    pthread_t  sPthread1;
    pthread_t  sPthread2;

#ifdef SET_ATTRIBUTE
    /* Recursive mutex: initialise the attribute object before setting the type. */
    pthread_mutexattr_t  sTemp;
    pthread_mutexattr_init(&sTemp);
    pthread_mutexattr_settype(&sTemp, PTHREAD_MUTEX_RECURSIVE_NP);
    pthread_mutex_init(&sMutex, &sTemp);
#else
    /* Default ("fast") mutex. */
    pthread_mutex_init(&sMutex, NULL);
#endif

    pthread_create(&sPthread1, NULL, test_pthread1, NULL);
    pthread_create(&sPthread2, NULL, test_pthread2, NULL);

    pthread_join(sPthread1, NULL);
    pthread_join(sPthread2, NULL);

    return 0;
}

Embedded hardware/software platform:
System:
Linux version 2.6.32.20 (root@slax) (gcc version 4.2.3) #562 Fri Jun 15 12:21:42 CST 2012
CPU:
Processor       : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 66.15
Features        : swp half fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5

Hardware        : Atmel AT91SAM9260-EK
Revision        : 0000
Serial          : 0000000000000000
GCC version:
Using built-in specs.
Target: arm-none-linux-gnueabi
Configured with: /scratch/paul/arm/src/gcc-4.2/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=arm-none-linux-gnueabi --enable-shared --enable-threads --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --with-gnu-as --with-gnu-ld --prefix=/opt/codesourcery --enable-languages=c,c++ --enable-symvers=gnu --enable-__cxa_atexit --with-versuffix=CodeSourcery Sourcery G++ Lite 2007q1-10 --with-pkgversion=CodeSourcery Sourcery G++ Lite 2007q1-10 --with-bugurl=https://support.codesourcery.com/GNUToolchain/ --disable-nls --with-sysroot=/opt/codesourcery/arm-none-linux-gnueabi/libc --with-build-sysroot=/scratch/paul/arm/install/arm-none-linux-gnueabi/libc --enable-poison-system-directories --with-build-time-tools=/scratch/paul/arm/install/arm-none-linux-gnueabi/bin --with-build-time-tools=/scratch/paul/arm/install/arm-none-linux-gnueabi/bin
Thread model: posix
gcc version 4.2.0 20070413 (prerelease) (CodeSourcery Sourcery G++ Lite 2007q1-10)
Libc version:
GNU C Library stable release version 2.7, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.2.3.
Compiled on a Linux >>2.6.24.4<< system on 2008-04-22.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        Native POSIX Threads Library by Ulrich Drepper et al
        Support for some architectures added on, not maintained in glibc core.
        BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
PC hardware/software platform:
System:
Linux version 2.6.27.5-117.fc10.i686 ([email protected]) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Tue Nov 18 12:19:59 EST 2008
CPU:
processor     : 0
vendor_id     : GenuineIntel
cpu family     : 6
model          : 22
model name     : Intel(R) Celeron(R) CPU          420  @ 1.60GHz
stepping     : 1
cpu MHz          : 1595.996
cache size     : 512 KB
fdiv_bug     : no
hlt_bug          : no
f00f_bug     : no
coma_bug     : no
fpu          : yes
fpu_exception     : yes
cpuid level     : 10
wp          : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss nx constant_tsc up arch_perfmon pebs bts pni ssse3
bogomips     : 3191.99
clflush size     : 64
power management:
Gcc:
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-cpu=generic --build=i386-redhat-linux
Thread model: posix
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)
Libc version:
GNU C Library stable release version 2.9, by Roland McGrath et al.
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.3.2 20081105 (Red Hat 4.3.2-7).
Compiled on a Linux >>2.6.18-92.1.10.el5<< system on 2008-11-13.
Available extensions:
     The C stubs add-on version 2.1.2.
     crypt add-on version 2.1 by Michael Glad and others
     GNU Libidn by Simon Josefsson
     Native POSIX Threads Library by Ulrich Drepper et al
     BIND-8.2.3-T5B
     RT using linux kernel aio
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

Two-thread mutex test:
With no macros defined
ARM platform:
root@Vancount:tmp# ./arm_test_mutex
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
PC platform:
[tt@vacount lin]$ ./test_mutex
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14

With the macro SET_ATTRIBUTE defined
#define SET_ATTRIBUTE
ARM platform:
root@Vancount:tmp# ./arm_test_mutex
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26
test_pthread2: 26

PC platform:
[tt@vacount lin]$ ./test_mutex
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26

With the macro T_SLEEP defined
#define T_SLEEP
ARM platform:
root@Vancount:tmp# ./arm_test_mutex
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14

PC platform:
[tt@vacount lin]$ ./test_mutex
test_pthread1: 14
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
test_pthread1: 14
test_pthread2: 26
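(I also ran the same test with semaphores; the results were the same as with the mutex.)

The multithreaded results on the PC and on the ARM platform differ considerably. Judging by which thread prints first, on the PC the thread created first runs first, while on ARM the thread created later runs first. I do not know whether the different GCC versions cause this; the kernel and glibc versions also differ, and POSIX does not specify which of two newly created threads runs first. As for switching between the threads: on the ARM platform, if usleep() is not called after unlocking, only one of the two threads ever runs, because the thread that just released the mutex re-acquires it before the other thread gets a chance.

Be careful when testing programs across different platforms!

On a single-core CPU, thread switching on Linux sometimes does not behave the way we imagine.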

The source fragment in tpp.c where the error occurs (see the note for the full source):

61:   assert (new_prio == -1
62:          || (new_prio >= __sched_fifo_min_prio
63:              && new_prio <= __sched_fifo_max_prio));
64:   assert (previous_prio == -1
65:          || (previous_prio >= __sched_fifo_min_prio
66:              && previous_prio <= __sched_fifo_max_prio));
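In one post I saw the author's debugging notes: the failure is at line 63, where the value of new_prio is too large and makes the assert fail. So the problem shows up inside a function of the NPTL thread library.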

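The solution I saw is to set the mutex attributes, changing the mutex from the default fast type to the recursive type, which makes the problem go away. Note that the reply quoted below suspects that the real culprit is a missing pthread_mutexattr_init() call before the type is set.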

> The initialisation of the mutex is being done as follows:
>
> ---
> pthread_mutexattr_t mutexattr;
>
Just guessing: no pthread_mutexattr_init()!

> // Set the mutex as a recursive mutex
> pthread_mutexattr_settype(&mutexattr, PTHREAD_MUTEX_RECURSIVE_NP);

(……)


Note 1:

Some links:

http://sourceware.org/ml/libc-help/2008-05/msg00071.html
A detailed description of the problem.
http://sourceware.org/ml/libc-help/2008-05/msg00072.html
http://sourceware.org/bugzilla/show_bug.cgi?id=3610
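Answers to the problem.
The posts above say that the cause lies in the mutex attributes: the default mutex type is "fast", and locking such a mutex repeatedly can cause problems. Changing it to the recursive type avoids this.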
http://topic.csdn.net/u/20100317/10/ffb0994d-ea2d-49b4-94c6-ecf8aef1ca31.html
A discussion about mutexes.
http://www.xxlinux.com/linux/article/development/soft/20090424/16485.html
A fairly good post on Linux multithreaded programming.
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/tpp.c?annotate=1.1.2.2&cvsroot=glibc
URL of the tpp.c source; tpp.c is a file in NPTL (Native POSIX Thread Library), which is part of libc.
