1. Modify the following code in configure:
armv7*)
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: ok (${host_cpu})" >&5
$as_echo "ok (${host_cpu})" >&6; }
ARCH_MAX="arm"
;;
Change it to:
armv7*|arm)
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: ok (${host_cpu})" >&5
$as_echo "ok (${host_cpu})" >&6; }
ARCH_MAX="arm"
;;
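The effect of the extra pattern can be checked in isolation. The sketch below is a simplified stand-in for the case statement in configure, not the real script: the function name classify_host_cpu is hypothetical, and the real configure handles many more CPU types. It shows that a bare "arm" (what --host=arm-linux yields as host_cpu) is only accepted once "|arm" is added to the pattern.

```shell
#!/bin/sh
# Simplified stand-in for the host_cpu check in configure (hypothetical helper).
classify_host_cpu() {
    host_cpu=$1
    case "$host_cpu" in
        armv7*|arm)
            # Matches armv7l, armv7-a, ... and, after the change, plain "arm".
            echo "ok ($host_cpu)"
            ;;
        *)
            echo "no ($host_cpu)"
            ;;
    esac
}

classify_host_cpu armv7l
classify_host_cpu arm
classify_host_cpu mips
```

Without the "|arm" alternative, the second call would fall through to the default branch.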
2. Override gcc_version:
./configure:gcc_version=`${CC} --version \
./configure:gcc_version=4.6.3
CC=arm-hisiv100nptl-linux-gcc ./configure --host=arm-linux --prefix=/home/kongjun/mywork/hi_test/valgrind/arm_valgrind
Build procedure:
1.
wget http://valgrind.org/downloads/valgrind-3.9.0.tar.bz2
tar xvf valgrind-3.9.0.tar.bz2
cd valgrind-3.9.0
apt-get install automake
./autogen.sh
2.
3.
4.
I. Memcheck tool
1. Types of errors it can detect:
1) Illegal read or write errors
--read-var-info=<yes|no> [default: no]: when this option is enabled the application runs more slowly, but Memcheck can describe errors in more detail. For example:
--read-var-info=no:
==15516== Uninitialised byte(s) found during client check request
==15516== at 0x400633: croak (varinfo1.c:28)
==15516== by 0x4006B2: main (varinfo1.c:55)
==15516== Address 0x60103b is 7 bytes inside data symbol "global_i2"
==15516==
==15516== Uninitialised byte(s) found during client check request
==15516== at 0x400633: croak (varinfo1.c:28)
==15516== by 0x4006BC: main (varinfo1.c:56)
==15516== Address 0x7fefffefc is on thread 1's stack
--read-var-info=yes:
==15522== Uninitialised byte(s) found during client check request
==15522== at 0x400633: croak (varinfo1.c:28)
==15522== by 0x4006B2: main (varinfo1.c:55)
==15522== Location 0x60103b is 0 bytes inside global_i2[7],
==15522== a global variable declared at varinfo1.c:41
==15522==
==15522== Uninitialised byte(s) found during client check request
==15522== at 0x400633: croak (varinfo1.c:28)
==15522== by 0x4006BC: main (varinfo1.c:56)
==15522== Location 0x7fefffefc is 0 bytes inside local var "local"
==15522== declared at varinfo1.c:46, in frame #1 of thread 1
2) Use of uninitialized values
--track-origins=yes gives more detailed error information, in particular for uses of uninitialised variables.
3) Use of uninitialized or unaddressable values in system calls
Code example:
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
  char* arr  = malloc(10);           /* never initialised */
  int*  arr2 = malloc(sizeof(int));  /* never initialised */
  write( 1 /* stdout */, arr, 10 );
  exit(arr2[0]);
}
Valgrind reports:
Syscall param write(buf) points to uninitialised byte(s)
at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
at 0x259852B0: malloc (vg_replace_malloc.c:130)
by 0x80483F1: main (a.c:5)
Syscall param exit(error_code) contains uninitialised byte(s)
at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
by 0x8048426: main (a.c:8)
4) Illegal frees
Valgrind tracks the memory the program allocates with malloc/new, so it knows exactly whether a given free/delete is legal. Below is an example of a double free.
Invalid free()
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
5) When a heap block is freed with an inappropriate deallocation function
Below is the report for a block that was allocated with new[] but released with free:
Mismatched free() / delete / delete []
at 0x40043249: free (vg_clientfuncs.c:171)
by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
6) Overlapping source and destination blocks
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492== at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492== by 0x804865A: main (overlap.c:40)
7) Memory leak detection
     Pointer chain            AAA Category    BBB Category
     -------------            ------------    ------------
(1)  RRR ------------> BBB                    DR
(2)  RRR ---> AAA ---> BBB    DR              IR
(3)  RRR               BBB                    DL
(4)  RRR      AAA ---> BBB    DL              IL
(5)  RRR ------?-----> BBB                    (y)DR, (n)DL
(6)  RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
(7)  RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
(8)  RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
(9)  RRR      AAA -?-> BBB    DL              (y)IL, (n)DL
Pointer chain legend:
- RRR: a root set node or DR block
- AAA, BBB: heap blocks
- --->: a start-pointer
- -?->: an interior-pointer
Category legend:
- DR: Directly reachable
- IR: Indirectly reachable
- DL: Directly lost
- IL: Indirectly lost
- (y)XY: it's XY if the interior-pointer is a real pointer
- (n)XY: it's XY if the interior-pointer is not a real pointer
- (_)XY: it's XY in either case
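--show-reachable=yes: cases 1, 2, 4 and 9 are reported only when this is set; they are hidden by default.
--leak-check=full: Memcheck gives more detailed information about each definitely lost or possibly lost block, including where it was allocated.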
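2. Memcheck command-line options:
1) --leak-check=<no|summary|yes|full> [default: summary]
With summary, only a final leak summary is printed; with yes or full, detailed information about each leak is printed.
2) --leak-resolution=<low|med|high> [default: high]
Controls how backtraces are compared when leak reports from different sources are merged. With low, reports are merged when two stack entries match; with med, four entries must match; high requires the whole backtrace to match. This option does not affect Memcheck's ability to find leaks, only how the results are presented.
3) --show-reachable=<yes|no> [default: no]
If set to yes, all blocks are reported, giving the complete picture of memory allocation.
4) --track-origins=<yes|no> [default: no]
If set to yes, Memcheck can trace an uninitialised value (from the heap or the stack) back to where it came from, at the cost of slower execution and higher memory use.
5) --freelist-vol=<number> [default: 20000000]
<number> is in bytes. The larger the value, the more likely Memcheck is to detect invalid accesses to memory that has already been freed.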
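The options above are usually combined on one command line. The sketch below is only an illustration: ./your_application is a placeholder for your own binary, the variable names are made up for this example, and the particular combination of options is one reasonable choice rather than a recommendation from the Valgrind manual. If valgrind or the binary is missing, the script just prints the command it would have run.

```shell
#!/bin/sh
# Placeholder binary; override with APP=/path/to/binary.
APP=${APP:-./your_application}

# Options discussed above: full leak details, reachable blocks too,
# origins of uninitialised values, and variable-level location info.
MEMCHECK_OPTS="--leak-check=full --show-reachable=yes --track-origins=yes --read-var-info=yes"

if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    # MEMCHECK_OPTS is intentionally unquoted so it splits into separate options.
    valgrind --tool=memcheck $MEMCHECK_OPTS "$APP"
else
    echo "would run: valgrind --tool=memcheck $MEMCHECK_OPTS $APP"
fi
```

Expect the run to be noticeably slower with --track-origins=yes and --read-var-info=yes enabled, as noted above.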
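Cachegrind:
Cachegrind simulates the CPU's first-level and last-level (LL) caches, collects statistics about how the application uses the CPU while it runs, and finally prints both detailed and summary data.
1. Abbreviations used for the CPU statistics: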
I cache reads (Ir, which equals the number of instructions executed), I1 cache read misses (I1mr) and LL cache instruction read misses (ILmr).
D cache reads (Dr, which equals the number of memory reads), D1 cache read misses (D1mr), and LL cache data read misses (DLmr).
D cache writes (Dw, which equals the number of memory writes), D1 cache write misses (D1mw), and LL cache data write misses (DLmw).
Conditional branches executed (Bc) and conditional branches mispredicted (Bcm).
Indirect branches executed (Bi) and indirect branches mispredicted (Bim).
Note that D1 total misses is given by D1mr + D1mw, and that LL total misses is given by ILmr + DLmr + DLmw.
2. How to run it:
valgrind --tool=cachegrind your_application
The program prints the following statistics:
==31751== I refs: 27,742,716
==31751== I1 misses: 276
==31751== LLi misses: 275
==31751== I1 miss rate: 0.0%
==31751== LLi miss rate: 0.0%
==31751==
==31751== D refs: 15,430,290 (10,955,517 rd + 4,474,773 wr)
==31751== D1 misses: 41,185 ( 21,905 rd + 19,280 wr)
==31751== LLd misses: 23,085 ( 3,987 rd + 19,098 wr)
==31751== D1 miss rate: 0.2% ( 0.1% + 0.4%)
==31751== LLd miss rate: 0.1% ( 0.0% + 0.4%)
==31751==
==31751== LL misses: 23,360 ( 4,262 rd + 19,098 wr)
==31751== LL miss rate: 0.0% ( 0.0% + 0.4%)
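The totals in this summary can be rebuilt from their parts, which is a quick way to check that the abbreviations are understood correctly. The sketch below hard-codes the figures from the output above (with the commas removed); the variable names are made up for this example.

```shell
#!/bin/sh
# Figures copied from the Cachegrind summary above.
d_refs=15430290       # D refs
d1_miss_rd=21905      # D1mr
d1_miss_wr=19280      # D1mw
il_miss=275           # ILmr (LLi misses)
dl_miss_rd=3987       # DLmr
dl_miss_wr=19098      # DLmw

# D1 misses = D1mr + D1mw; LL misses = ILmr + DLmr + DLmw.
d1_misses=$((d1_miss_rd + d1_miss_wr))
ll_misses=$((il_miss + dl_miss_rd + dl_miss_wr))

# Shell arithmetic is integer-only, so use awk for the percentage.
d1_rate=$(LC_ALL=C awk -v m="$d1_misses" -v r="$d_refs" 'BEGIN { printf "%.2f", 100 * m / r }')

echo "D1 misses:    $d1_misses"
echo "LL misses:    $ll_misses"
echo "D1 miss rate: ${d1_rate}%"
```

The two sums match the "D1 misses" and "LL misses" lines of the report.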
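Cachegrind also writes more detailed results to an output file. The default file name is cachegrind.out.<pid>, where <pid> is the ID of the current process. A more readable name can be chosen with the --cachegrind-out-file option. This file is the input to cg_annotate.
3. cg_annotate:
cg_annotate <filename>
cg_annotate first prints the following summary information: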
I1 cache: 65536 B, 64 B, 2-way associative
D1 cache: 65536 B, 64 B, 2-way associative
LL cache: 262144 B, 64 B, 8-way associative
Command: concord vg_to_ucode.c
Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Threshold: 99%
Chosen for annotation:
Auto-annotation: off
It then prints the following function-by-function details:
--------------------------------------------------------------------------------
       Ir I1mr ILmr        Dr  D1mr  DLmr        Dw   D1mw   DLmw  file:function
--------------------------------------------------------------------------------
8,821,482    5    5 2,242,702 1,621    73 1,794,230      0      0  getc.c:_IO_getc
5,222,023    4    4 2,276,334    16    12   875,959      1      1  concord.c:get_word
2,649,248    2    2 1,344,810 7,326 1,385         .      .      .  vg_main.c:strcmp
2,521,927    2    2   591,215     0     0   179,398      0      0  concord.c:hash
2,242,740    2    2 1,046,612   568    22   448,548      0      0  ctype.c:tolower
1,496,937    4    4   630,874 9,000 1,400   279,388      0      0  concord.c:insert
  897,991   51   51   897,831    95    30        62      1      1  ???:???
  598,068    1    1   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__flockfile
  598,068    0    0   299,034     0     0   149,517      0      0  ../sysdeps/generic/lockfile.c:__funlockfile
  598,024    4    4   213,580    35    16   149,506      0      0  vg_clientmalloc.c:malloc
  446,587    1    1   215,973 2,167   430   129,948 14,057 13,957  concord.c:add_existing
  341,760    2    2   128,160     0     0   128,160      0      0  vg_clientmalloc.c:vg_trap_here_WRAPPER
  320,782    4    4   150,711   276     0    56,027     53     53  concord.c:init_hash_table
  298,998    1    1   106,785     0     0    64,071      1      1  concord.c:create
  149,518    0    0   149,516     0     0         1      0      0  ???:tolower@@GLIBC_2.0
  149,518    0    0   149,516     0     0         1      0      0  ???:fgetc@@GLIBC_2.0
   95,983    4    4    38,031     0     0    34,409  3,152  3,150  concord.c:new_word_node
   85,440    0    0    42,720     0     0    21,360      0      0  vg_clientmalloc.c:vg_bogus_epilogue
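Note: in the data above, a dot in a column means that the event did not occur in that function. An entry of ???:??? means that the file name could not be determined from the debug info; if the program was compiled without -g, there will be many such unknown entries.
4. Line-by-line counts:
cg_annotate <filename> concord.c prints per-line statistics for concord.c, as follows: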
--------------------------------------------------------------------------------
-- User-annotated source: concord.c
--------------------------------------------------------------------------------
     Ir I1mr ILmr     Dr D1mr DLmr     Dw D1mw DLmw
      .    .    .      .    .    .      .    .    .  void init_hash_table(char *file_name, Word_Node *table[])
      3    1    1      .    .    .      1    0    0  {
      .    .    .      .    .    .      .    .    .      FILE *file_ptr;
      .    .    .      .    .    .      .    .    .      Word_Info *data;
      1    0    0      .    .    .      1    1    1      int line = 1, i;
      .    .    .      .    .    .      .    .    .
      5    0    0      .    .    .      3    0    0      data = (Word_Info *) create(sizeof(Word_Info));
      .    .    .      .    .    .      .    .    .
  4,991    0    0  1,995    0    0    998    0    0      for (i = 0; i < TABLE_SIZE; i++)
  3,988    1    1  1,994    0    0    997   53   52          table[i] = NULL;
      .    .    .      .    .    .      .    .    .
      .    .    .      .    .    .      .    .    .      /* Open file, check it. */
      6    0    0      1    0    0      4    0    0      file_ptr = fopen(file_name, "r");
      2    0    0      1    0    0      .    .    .      if (!(file_ptr)) {
      .    .    .      .    .    .      .    .    .          fprintf(stderr, "Couldn't open '%s'.\n", file_name);
      1    1    1      .    .    .      .    .    .          exit(EXIT_FAILURE);
      .    .    .      .    .    .      .    .    .      }
      .    .    .      .    .    .      .    .    .
165,062    1    1 73,360    0    0 91,700    0    0      while ((line = get_word(data, line, file_ptr)) != EOF)
146,712    0    0 73,356    0    0 73,356    0    0          insert(data->word, data->line, table);
      .    .    .      .    .    .      .    .    .
      4    0    0      1    0    0      2    0    0      free(data);
      4    0    0      1    0    0      2    0    0      fclose(file_ptr);
      3    0    0      2    0    0      .    .    .  }
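5. cg_diff file1 file2
Compares two Cachegrind output files. This is useful for measuring the performance of some feature, making a change, and then comparing the results before and after.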
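A before/after comparison could be scripted roughly as follows. This is a sketch under assumptions: ./your_application is a placeholder, the output file names are arbitrary, and the rebuild step between the two runs is left as a comment. cg_diff writes the difference to standard output, so it is redirected to a file that cg_annotate can read.

```shell
#!/bin/sh
APP=${APP:-./your_application}   # placeholder binary
BEFORE=cachegrind.out.before     # arbitrary, readable file names
AFTER=cachegrind.out.after
DIFF=cachegrind.out.diff

if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    valgrind --tool=cachegrind --cachegrind-out-file="$BEFORE" "$APP"
    # ... apply the change and rebuild the application here ...
    valgrind --tool=cachegrind --cachegrind-out-file="$AFTER" "$APP"
    cg_diff "$BEFORE" "$AFTER" > "$DIFF"
    cg_annotate "$DIFF"
else
    echo "would compare $BEFORE and $AFTER for $APP"
fi
```

Giving both runs explicit names avoids having to look up the process IDs in the default cachegrind.out.<pid> names.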
6. Cachegrind command-line options:
--cache-sim=no|yes [yes]
Specifies whether to collect cache access and miss counts.
--branch-sim=no|yes [no]
Specifies whether to collect branch instruction and misprediction counts.
7. cg_annotate command-line options:
--show=A,B,C [default: all, using order in cachegrind.out.<pid>]
Specifies which event columns to show, e.g. --show=D1mr,DLmr or --show=DLmr,DLmw.
--sort=A,B,C [default: order in cachegrind.out.<pid>]
Specifies the events by which the function-by-function details are sorted.
--threshold=X [default: 0.1%]
Filters the output so that only entries above the threshold are shown. Sets the threshold for the function-by-function summary. A function is shown if it accounts for more than X% of the counts for the primary sort event. If auto-annotating, also affects which files are annotated.
Note: thresholds can be set for more than one of the events by appending any events for the --sort option with a colon and a number (no spaces, though). E.g. if you want to see each function that covers more than 1% of LL read misses or 1% of LL write misses, use this option:
--sort=DLmr:1,DLmw:1
--auto=<no|yes> [default: no]
When enabled, automatically annotates every file that is mentioned in the function-by-function summary that can be found. Also gives a list of those that couldn't be found.
--context=N [default: 8]
Print N lines of context before and after each annotated line. Avoids printing large sections of source files that were not executed. Use a large number (e.g. 100000) to show all source lines.
-I<dir> --include=<dir> [default: none]
Specifies the search path for source files; give -I/--include several times to add more directories.
Callgrind:
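1. Profiling a specific code fragment precisely:
Start the program with --instr-atstart=no so that Callgrind collects no profiling data at startup. When you are ready to measure the code of interest, run callgrind_control -i on in another terminal. For exact boundaries, place the macro CALLGRIND_START_INSTRUMENTATION (from valgrind/callgrind.h) just before the fragment and CALLGRIND_STOP_INSTRUMENTATION just after it.
2. Dumping statistics for specific functions (Callgrind command-line options):
--dump-before=function: dump the statistics to a file before entering the function;
--dump-after=function: dump the statistics to a file after leaving the function;
--zero-before=function: reset all counters to zero before entering the function. Placing the macro CALLGRIND_ZERO_STATS in the code resets the counters at a more precise point.
These options can be given several times to name several functions.
3. --cache-sim=yes makes Callgrind simulate cache behaviour, which yields additional cache statistics.
--branch-sim=yes adds branch prediction statistics, which helps expose performance problems such as those caused by inefficient switch statements.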
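Put together, a run that starts with instrumentation off and switches it on later might look like the sketch below. The binary ./your_application, the function name hot_function and the five-second delay are all placeholders chosen for this example; in practice callgrind_control -i on is typed by hand in a second terminal at the moment of interest.

```shell
#!/bin/sh
APP=${APP:-./your_application}   # placeholder binary
FUNC=${FUNC:-hot_function}       # placeholder function name

# Start with instrumentation off, dump when FUNC returns,
# and collect cache and branch statistics as well.
CALLGRIND_OPTS="--instr-atstart=no --dump-after=$FUNC --cache-sim=yes --branch-sim=yes"

if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    valgrind --tool=callgrind $CALLGRIND_OPTS "$APP" &
    sleep 5                      # arbitrary: wait until the program reaches the part to measure
    callgrind_control -i on      # switch instrumentation on for the running program
    wait                         # let the profiled program finish
else
    echo "would run: valgrind --tool=callgrind $CALLGRIND_OPTS $APP"
fi
```

When the start and stop points must be exact, the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros in the source are the better tool; the timed approach here is only approximate.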
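4. Callgrind command-line options:
1) --callgrind-out-file=<file>
Writes the profile data to the given file instead of a file named by the default naming rule.
2) --dump-line=<no|yes> [default: yes]
Event counts are collected at source-line granularity; this requires the program to be compiled with -g.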
3) --collect-systime=<no|yes> [default: no]
This specifies whether information for system call times should be collected.
5. callgrind_annotate command-line options (most are the same as for cg_annotate; the two below are specific to callgrind_annotate):
1) --inclusive=<yes|no> [default: no]
When computing costs, adds the cost of callees to the cost of their caller.
2) --tree=<none|caller|calling|both> [default: none]
Print for each function their callers, the called functions or both.
Helgrind:
1. --track-lockorders=no|yes [default: yes]
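Whether to check the order in which locks are acquired while the program runs. If you do not care about this class of problem for now, you can turn it off.
2. --read-var-info=yes
Gives more detailed information about where variables are declared.
Environment: Ubuntu 11.10
Compiler: hisiv200 GCC 4.4.1
Code: valgrind 3.7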
//============================================================
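The hisiv200 toolchain is in /home/tomren/hisi-linux
Use absolute paths when setting up the toolchain, not relative ones! At first I used ~/hisi-linux/... and was told that the compiler could not be found.
(I did not notice the compiler-not-found message at the time, because running the individual commands by hand compiled the files successfully.)
The command is: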
CC='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-gcc' LD='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ld' CXX='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-c++' AR='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ar' CFLAGS="-I/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/include/" LDFLAGS="-L/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/lib" ./configure --host=armv7-a-linux --build=i686-pc-linux-gnu --prefix=/home/tomren/valgrind
The same command, split across lines only to make it easier to read:
CC='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-gcc' \
LD='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ld' \
CXX='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-c++' \
AR='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ar' \
CFLAGS="-I/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/include/" \
LDFLAGS="-L/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/lib" \
./configure --host=armv7-a-linux --build=i686-pc-linux-gnu --prefix=/home/tomren/valgrind
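--host is the platform the compiled binaries will run on.
--build is the platform the compiler itself runs on.
--prefix is the directory that make install installs into.
If CFLAGS or LDFLAGS need several paths, give each path its own -I or -L flag, separated by spaces. GCC treats the argument of a single -I or -L as one directory and does not split it on colons.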
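The long command line is easier to maintain as a small script that derives every tool path from one absolute toolchain directory. The sketch below uses the paths from this post; the variable names TOOLCHAIN, CROSS and SYSROOT are introduced here only for illustration. It runs configure only when both configure and the cross compiler are actually present.

```shell
#!/bin/sh
# Absolute paths, as used in this post; adjust for your own toolchain.
TOOLCHAIN=/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux
CROSS="$TOOLCHAIN/bin/arm-hisiv200-linux-gnueabi-"
SYSROOT="$TOOLCHAIN/arm-hisiv200-linux-gnueabi/target/usr"

CC="${CROSS}gcc"
LD="${CROSS}ld"
CXX="${CROSS}c++"
AR="${CROSS}ar"
# One -I / -L flag per directory; add more flags separated by spaces if needed.
CFLAGS="-I$SYSROOT/include/"
LDFLAGS="-L$SYSROOT/lib"
export CC LD CXX AR CFLAGS LDFLAGS

if [ -x ./configure ] && [ -x "$CC" ]; then
    ./configure --host=armv7-a-linux --build=i686-pc-linux-gnu --prefix=/home/tomren/valgrind
else
    echo "skipping: ./configure or $CC not found"
fi
```

Because the variables are exported, configure sees them exactly as it does when they are written in front of the command.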
//============================================================
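Getting it to compile felt partly like luck; I got there by trial and error.
Counting all the time spent, it took me a little over two days, so I hope this helps anyone who needs to use valgrind on the same chip.
//============================================================
Before this working build I also tried the command below, because the CFLAGS in our CodeBlocks project set these parameters.
With the same settings here, the build reported errors. I do not know the cause; after removing them it compiled normally.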
CC='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-gcc' LD='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ld' CXX='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-c++' AR='/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-ar' CFLAGS="-fpic -shared -fvisibility=hidden -marm -mcpu=cortex-a9 -mfpu=vfp -mfloat-abi=softfp -march=armv7-a -mtune=cortex-a9 -fsingle-precision-constant -fsigned-char -I/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/include/:/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/include/c++/4.4.1" LDFLAGS="-L/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/target/usr/lib:/home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/arm-hisiv200-linux-gnueabi/lib" ./configure --host=armv7-a-linux --build=i686-pc-linux-gnu --prefix=/home/tomren/valgrind
//============================================================
I also changed one place in the configure file.
Right after this statement:
gcc_version=`${CC} --version \
| head -n 1 \
| $SED 's/i686-apple-darwin10//' \
| $SED 's/i686-apple-darwin11//' \
| $SED 's/^[^0-9]*\([0-9.]*\).*$/\1/'`
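I added the following line, which forcibly assigns gcc_version a second time.
gcc_version=4.4.1
The 4.4.1 in gcc_version is what the target compiler's gcc --version reports.
I added this line because the gcc_version computed by the first command causes trouble later on: when it is printed further down, it is 200.
(When I printed gcc_version myself it was 4.4.1; I do not know why it is 200 when used later.)
So by default configure fails with: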
checking dependency style of /home/tomren/hisi-linux/x86-arm/arm-hisiv200-linux/bin/arm-hisiv200-linux-gnueabi-gcc... gcc3
checking for diff -u... yes
checking for a supported version of gcc... no (200)
configure: error: please use gcc >= 3.0 or clang >= 2.9
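One possible explanation for the 200, offered as an assumption rather than something verified on this toolchain: GCC normally starts the first line of its --version output with the program name, and the final sed expression keeps the first run of digits it finds. If the line begins with arm-hisiv200-linux-gnueabi-gcc, that first run is the 200 inside "hisiv200". The sketch below replays the pipeline from configure on two made-up version banners; both banner strings and the function name extract_version are hypothetical.

```shell
#!/bin/sh
# Same pipeline as in configure, with $SED replaced by plain sed.
extract_version() {
    printf '%s\n' "$1" \
        | head -n 1 \
        | sed 's/i686-apple-darwin10//' \
        | sed 's/i686-apple-darwin11//' \
        | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
}

# Hypothetical banners; the real output of your compiler may differ.
native_banner='gcc (GCC) 4.4.1'
cross_banner='arm-hisiv200-linux-gnueabi-gcc (hypothetical package string) 4.4.1'

echo "native: $(extract_version "$native_banner")"
echo "cross:  $(extract_version "$cross_banner")"
```

For the native banner the pipeline yields 4.4.1, while for the cross banner it yields 200, which matches the "no (200)" in the configure output above. If this is the cause, hard-coding gcc_version as described is a workable fix.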
//============================================================
End :)