An Introduction to SystemTap Errors

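Common SystemTap errors fall roughly into two categories.
I. Errors raised during the parse and semantic-analysis phases
These errors occur while SystemTap parses the stp script and translates it into C code.
Examples
1. Parse error. It takes this form: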

parse error: expected foo, saw bar

For example, a probe with no handler body triggers a parse error.

[root@db-172-16-3-150 share]# stap -e 'probe vfs.read                                
probe vfs.write'
parse error: expected one of '. , ( ? ! { = +='
        saw: keyword at :2:1
     source: probe vfs.write
             ^
parse error: expected one of '. , ( ? ! { = +='
        saw: EOF
2 parse errors.
Pass 1: parse failed.  [man error::pass1]

Adding the handlers fixes the error:

[root@db-172-16-3-150 share]# stap -e 'probe vfs.read {}                                
probe vfs.write {}'


2. Privilege error

parse error: embedded code in unprivileged script

For example, the script contains embedded C code, %{ ... %}, but stap was run without the -g option.

[root@db-172-16-3-150 share]# stap -e '   
function square:long (i:long) %{
  STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;
%}
probe begin {
  i=square(9)
  println(i)
  exit()
}'
parse error: embedded code in unprivileged script; need stap -g
        saw: embedded-code at :2:31
     source: function square:long (i:long) %{
                                           ^
1 parse error.
Pass 1: parse failed.  [man error::pass1]

Running with the -g option fixes the error.

[root@db-172-16-3-150 share]# stap -g -e '
function square:long (i:long) %{
  STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;
%}
probe begin {
  i=square(9)
  println(i)
  exit()
}'
81


3. Type mismatch error

semantic error: type mismatch for identifier 'foo' ... string vs. long

For example:

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  a = 10
  a = execname()
  println("a is:", a)
  exit()
}'
semantic error: type mismatch (long vs. string): identifier 'a' at :3:3
        source:   a = 10
                  ^
semantic error: type was first inferred here (string): identifier 'a' at :3:3
        source:   a = 10
                  ^
Pass 2: analysis failed.  [man error::pass2]

a is first assigned 10, which makes it a long, and is then assigned execname(), which is a string, so the types conflict.
Using one type consistently fixes it.

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  a = 10
  a = pid()     
  println("a is:", a)
  exit()
}'
a is:23014


4. This error is reported when the type of a variable cannot be inferred.

semantic error: unresolved type for identifier 'foo'

For example, println is passed a variable that was never assigned.

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  println("v is:", v)
  exit()
}'
WARNING: never-assigned local variable 'v' : identifier 'v' at :3:20
 source:   println("v is:", v)
                            ^
semantic error: unresolved type : identifier 'v' at :3:20
        source:   println("v is:", v)
                                   ^
semantic error: unresolved type : identifier 'println' at :3:3
        source:   println("v is:", v)
                  ^
Pass 2: analysis failed.  [man error::pass2]

Assigning the variable first resolves it:

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  v = 100
  println("v is:", v)
  exit()
}'
v is:100


5. When the target of an assignment is not a valid variable or array element, the following error is reported.

semantic error: Expecting symbol or array index expression.

For example:

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  println("hello") = 1
  exit()
}'
semantic error: Expecting symbol or array index expression: identifier 'println' at :3:3
        source:   println("hello") = 1
                  ^
Pass 2: analysis failed.  [man error::pass2]


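6. A function is called with a different number of arguments than it declares,
or an array is indexed with an inconsistent number of indexes.

while searching for arity N function, semantic error: unresolved function call

For example:
Function argument count mismatch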

[root@db-172-16-3-150 share]# stap -e '
function add:long (a:long, b:long) {
  return a+b
}
global arr
probe begin {
  println("add(10): ", add(10))
  exit()
}'
WARNING: mismatched arity-2 function found: identifier 'add' at :2:10
 source: function add:long (a:long, b:long) {
                  ^
semantic error: unresolved arity-1 function: identifier 'add' at :7:24
        source:   println("add(10): ", add(10))
                                       ^
Pass 2: analysis failed.  [man error::pass2]

Array index count mismatch

[root@db-172-16-3-150 share]# stap -e '
global arr
probe begin {
  arr[1,2,3]="hello"
  println("arr: ", arr[1,2])
  exit()
}'
semantic error: inconsistent arity (3 vs 2): identifier 'arr' at :5:20
        source:   println("arr: ", arr[1,2])
                                   ^
semantic error: arity 3 first inferred here: identifier 'arr' at :4:3
        source:   arr[1,2,3]="hello"
                  ^
Pass 2: analysis failed.  [man error::pass2]


7. An array is used without being declared as a global variable.

semantic error: array locals not supported, missing global declaration?

For example:

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  arr[1,2]= "hello"
  exit()
}'
semantic error: unresolved arity-2 global array arr, missing global declaration?: identifier 'arr' at :3:3
        source:   arr[1,2]= "hello"
                  ^
Pass 2: analysis failed.  [man error::pass2]
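
A minimal sketch of the fix: the same assignment, with arr declared as a global before the probe. The script is held in a shell variable named script (a name chosen here for illustration) and is only run when stap is found on the PATH.

```shell
# Same assignment as above, but arr is declared global first.
script='
global arr
probe begin {
  arr[1,2] = "hello"
  println(arr[1,2])
  exit()
}'

# Run only where SystemTap is installed; stap normally needs root
# (or membership of the stapdev/stapusr groups) to load the module.
if command -v stap >/dev/null 2>&1; then
  stap -e "$script"
fi
```

With the global declaration in place, the script should pass the analysis phase and print the stored string.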


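8. An array must not be modified inside a foreach loop that iterates over it; doing so reports the following error. The restriction keeps each handler fast and limits the performance impact of stap.

semantic error: variable foo modified during foreach iteration

For example: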

[root@db-172-16-3-150 share]# stap -e '
global arr
probe begin {
  arr[1]="a"
  arr[2]="b"
  foreach(idx in arr) 
    arr[idx]="new"
  exit()
}'
semantic error: variable 'arr' modified during 'foreach' iteration: identifier 'arr' at :7:5
        source:     arr[idx]="new"
                    ^
Pass 2: analysis failed.  [man error::pass2]
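
One possible workaround, sketched here under the assumption that a second global array is acceptable: record the new values in a scratch array during the first loop, then copy them back in a second loop that iterates over the scratch array instead. The name pending is invented for this example.

```shell
# Two-pass update: arr is never modified while it is being iterated.
script='
global arr, pending
probe begin {
  arr[1] = "a"
  arr[2] = "b"
  // Pass 1: iterate over arr, write only to pending.
  foreach (idx in arr)
    pending[idx] = "new"
  // Pass 2: iterate over pending, write only to arr.
  foreach (idx in pending)
    arr[idx] = pending[idx]
  foreach (idx in arr)
    println(idx, ": ", arr[idx])
  exit()
}'

# Run only where SystemTap is installed (root is normally required).
if command -v stap >/dev/null 2>&1; then
  stap -e "$script"
fi
```

Each foreach loop now writes only to an array other than the one it iterates over, which is what the analysis pass checks for.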


9. When the event does not exist or cannot be found in the tapset library, the following error is reported.

semantic error: probe point mismatch at position N, while resolving probe point foo

For example:

[root@db-172-16-3-150 share]# stap -e '
probe test {
}'
semantic error: while resolving probe point: identifier 'test' at :2:7
        source: probe test {
                      ^
semantic error: probe point mismatch  (alternatives: __nd_syscall __nfs __scheduler __signal __tcpmib __vm _linuxmib _nfs _signal _sunrpc _syscall _vfs begin begin(number) end end(number) error error(number) generic ioblock ioblock_trace ioscheduler ioscheduler_trace ipmib irq_handler java(number) java(string) kernel kprobe kprocess linuxmib module(string) nd_syscall netdev netfilter never nfs nfsd perf process process(number) process(string) procfs procfs(string) scheduler scsi signal socket softirq stap staprun sunrpc syscall tcp tcpmib timer tty udp vfs vm workqueue): identifier 'test' at :2:7
        source: probe test {
                      ^
Pass 2: analysis failed.  [man error::pass2]


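10. When the function named in a probe does not exist, the following error is reported. For example, kernel.function("test") where no function named test exists.

semantic error: no match for probe point, while resolving probe point foo

For example: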

[root@db-172-16-3-150 share]# stap -e '
probe kernel.function("test") {
}'
semantic error: while resolving probe point: identifier 'kernel' at :2:7
        source: probe kernel.function("test") {
                      ^
semantic error: no match (similar functions: bs, del, dget, dput, eat)
Pass 2: analysis failed.  [man error::pass2]


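11. Reading the value of a context variable (target variable) at the probe point from a handler can fail because the value is not accessible, the variable does not exist, and so on:

semantic error: unresolved target-symbol expression

For example, $$vars shows which target variables are available at vfs.read: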

[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
  println($$vars)
  exit()
}'
file=0xffff8818169bc140 buf=0x7fff453edb70 count=0x2004 pos=0xffff88141aa27f48 ret=?

Reading a target variable that does not exist reports an error:

[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
  println($abc)  
  exit()
}'
semantic error: unable to find local 'abc', [man error::dwarf] dieoffset 0x125bd59 in kernel, near pc 0xffffffff81181610 in vfs_read fs/read_write.c (alternatives: $file $buf $count $pos $ret): identifier '$abc' at :3:11
        source:   println($abc)
                          ^
Pass 2: analysis failed.  [man error::pass2]

Or the value of the variable cannot be read at that address:

[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
  println($ret)
  exit()
}'
semantic error: not accessible at this address [man error::dwarf] (0xffffffff81181610, dieoffset: 0x125bdbd): identifier '$ret' at :3:11
        source:   println($ret)
                          ^
Pass 2: analysis failed.  [man error::pass2]

This error may also be the result of compiler optimization of the generated code.


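12. When the installed kernel-debuginfo package does not match the running kernel version, or a probe needs the debuginfo of some package and the installed debuginfo package is a different version, errors like the following can occur.

semantic error: libdwfl failure

For example: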

[root@db-172-16-3-150 share]# uname -r 
2.6.32-358.el6.x86_64
[root@db-172-16-3-150 share]# rpm -qa|grep kernel-debuginfo
kernel-debuginfo-2.6.32-358.23.2.el6.centos.plus.x86_64
kernel-debuginfo-common-x86_64-2.6.32-358.23.2.el6.centos.plus.x86_64
[root@db-172-16-3-150 share]# stap -e '         
probe vfs.read {
  println($$vars)
  exit()
}'
semantic error: while resolving probe point: identifier 'kernel' at /opt/systemtap/share/systemtap/tapset/linux/vfs.stp:768:18
        source: probe vfs.read = kernel.function("vfs_read")
                                 ^
semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/2.6.32-358.el6.x86_64/build'
semantic error: while resolving probe point: identifier 'vfs' at :2:7
        source: probe vfs.read {
                      ^
semantic error: no match
Pass 2: analysis failed.  [man error::pass2]

Installing the kernel-debuginfo package that matches the running kernel fixes it.

[root@db-172-16-3-150 share]# yum install -y kernel-debuginfo-2.6.32-358.el6.x86_64
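
The mismatch can be detected before running stap. The helper below is a hypothetical sketch (the function name is invented here): it reads a package list on standard input and succeeds only if it contains the kernel-debuginfo package for the given kernel release. It assumes the name-version-release.arch naming seen in the rpm -qa output above.

```shell
# Hypothetical helper: succeed if the package list on stdin contains
# exactly kernel-debuginfo-<release>, where <release> is the first argument.
has_matching_debuginfo() {
  grep -qxF "kernel-debuginfo-$1"
}

# Typical use on an RPM-based system (assumes rpm is available):
#   rpm -qa | has_matching_debuginfo "$(uname -r)" \
#     || echo "kernel-debuginfo does not match the running kernel"
```

The match is on the whole line, so kernel-debuginfo-2.6.32-358.23.2.el6.centos.plus.x86_64 does not count as a match for a 2.6.32-358.el6.x86_64 kernel.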

The example in item 13 below reports a similar error if a debuginfo package of a different version is installed.

rpm -ivh coreutils-debuginfo.x86_64 0:8.4-19.el6_4.2 
[root@db-172-16-3-150 share]# rpm -qa|grep coreutils
coreutils-debuginfo-8.4-19.el6_4.2.x86_64
coreutils-libs-8.4-19.el6.x86_64
coreutils-8.4-19.el6.x86_64
policycoreutils-2.0.83-19.30.el6.x86_64
[root@db-172-16-3-150 share]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at :1:7
        source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}
                      ^
semantic error: no match
Pass 2: analysis failed.  [man error::pass2]


13. When a probe needs the debuginfo of a package and that debuginfo package is not installed, an error like the following is reported.

semantic error: cannot find foo debuginfo

For example:

[root@db-172-16-3-150 pg93]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at :1:7
        source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}
                      ^
semantic error: no match
Pass 2: analysis failed.  [man error::pass2]

Installing the matching debuginfo package resolves it.
First find the package that owns /bin/ls:

[root@db-172-16-3-150 pg93]# rpm -qf /bin/ls
coreutils-8.4-19.el6.x86_64

Then install the debuginfo package that corresponds to coreutils.

[root@db-172-16-3-150 pg93]# yum install -y coreutils-debuginfo-8.4-19.el6.x86_64


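II. Errors and warnings raised after the module is generated, while it runs in the kernel
These occur at run time, while staprun interacts with the kernel through the module and collects data.
Examples
1. A summary of how many errors occurred and how many probes were skipped during the run.

WARNING: Number of errors: N, skipped probes: M

For example: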

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  error("1.error funn\n")
}
probe end {
  printf("2.end probe\n")
}
probe error {
  printf("3.error probe\n")
}'
ERROR: 1.error funn
3.error probe
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]


2. Division by zero

division by 0

For example:

[root@db-172-16-3-150 share]# stap -e '
probe begin {
  println(10/0)
  exit()
}'
ERROR: division by 0 near operator '/' at :3:13
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]


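3. When a statistics (aggregate) variable holds no elements and an operator other than @count or @sum is applied to it (@avg, @min, @max), the following error is reported.

aggregate element not found

For example, @count and @sum return 0 on an empty aggregate: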

[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
  println(@count(s))   
  exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at :2:8
 source: global s
                ^
0
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
  println(@sum(s))  
  exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at :2:8
 source: global s
                ^
0
@avg, @min and @max report an error:
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
  println(@avg(s))
  exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at :2:8
 source: global s
                ^
ERROR: empty aggregate near identifier '@avg' at :4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  Try again with another '--vp 00001' option.
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
  println(@min(s))
  exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at :2:8
 source: global s
                ^
ERROR: empty aggregate near identifier '@min' at :4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  Try again with another '--vp 00001' option.
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
  println(@max(s))
  exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at :2:8
 source: global s
                ^
ERROR: empty aggregate near identifier '@max' at :4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  Try again with another '--vp 00001' option.
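
Since @count is safe on an empty aggregate, one way to avoid the error is to test it before calling @avg, @min or @max. The sketch below is illustrative only; the values fed into s are made up.

```shell
# Guard @avg with @count so an empty aggregate is never averaged.
script='
global s
probe begin {
  if (@count(s) > 0)
    println("avg: ", @avg(s))
  else
    println("s is empty")
  // Add two samples, then apply the same guard again.
  s <<< 10
  s <<< 30
  if (@count(s) > 0)
    println("avg: ", @avg(s))
  exit()
}'

# Run only where SystemTap is installed (root is normally required).
if command -v stap >/dev/null 2>&1; then
  stap -e "$script"
fi
```

The first guard takes the else branch because s is still empty; the second one reaches @avg only after samples have been added.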


4. When an array holds more entries than the size it was declared with, the following errors are reported.

aggregation overflow
Array overflow

For example:

[root@db-172-16-3-150 share]# stap -e '
global arr[10]
probe timer.ms(1) {
  arr[gettimeofday_ms()] <<< gettimeofday_ms()
}
probe timer.s(1) {
  foreach (i in arr) {
    println(@count(arr[i]))
  }
}'
ERROR: Array overflow, check size limit (10) near identifier 'arr' at :4:3
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Fix: raise the default limit with -D MAXMAPENTRIES=n (this applies to arrays declared without an explicit size), or declare a larger size with global arr[n].

5. Function call nesting exceeds the limit

MAXNESTING exceeded

For example:

[root@db-172-16-3-150 share]# stap -e '
> function fibonacci(i) {
>     if (i < 1) error ("bad number")
>     if (i == 1) return 1
>     if (i == 2) return 2
>     return fibonacci (i-1) + fibonacci (i-2)
> }
> probe begin {
>   println(fibonacci(10))
>   exit()
> }
> '
89
[root@db-172-16-3-150 share]# stap -e '
function fibonacci(i) {
    if (i < 1) error ("bad number")
    if (i == 1) return 1
    if (i == 2) return 2
    return fibonacci (i-1) + fibonacci (i-2)
}
probe begin {
  println(fibonacci(100))
  exit()
}
'
ERROR: MAXNESTING exceeded near identifier 'fibonacci' at :2:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Fix: raise the allowed nesting depth with -D MAXNESTING=n.

6. The number of statements executed by a handler exceeds the limit

MAXACTION exceeded

For example:

[root@db-172-16-3-150 share]# stap -e '
> probe begin {
>   for(i=0;i<10000;i++) {
>   }
>   exit()
> }'
ERROR: MAXACTION exceeded near keyword at :3:3
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

Fix: raise the limit with -D MAXACTION=n.
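
As an illustration, the same loop can be rerun with a higher limit. The value 100000 is an assumption picked to leave plenty of headroom for 10000 iterations, not a recommended setting; a large limit also lets a runaway handler run longer.

```shell
# The loop from the example above, plus a message once it completes.
script='
probe begin {
  for (i = 0; i < 10000; i++) {
  }
  println("loop finished")
  exit()
}'

# Run only where SystemTap is installed (root is normally required).
# -D MAXACTION=100000 raises the per-handler statement limit (assumed value).
if command -v stap >/dev/null 2>&1; then
  stap -D MAXACTION=100000 -e "$script"
fi
```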

7. The address does not exist, or the data at the specified address cannot be read for some other reason.

kernel/user string copy fault at ADDR

For example:

[root@db-172-16-3-150 share]# stap -e '
> probe begin {
>   println(user_string(123))
>   exit()
> }'
ERROR: user string copy fault -14 at 000000000000007b near identifier 'user_string_n' at /opt/systemtap/share/systemtap/tapset/uconversions.stp:120:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
[root@db-172-16-3-150 share]# stap -e '
probe begin {
  println(kernel_string(123))
  exit()
}'
ERROR: kernel string copy fault at 0x000000000000007b near identifier 'kernel_string' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:18:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
[root@db-172-16-3-150 share]# stap -e '
probe begin {
  println(kernel_int(123))   
  exit()
}'
ERROR: kernel int copy fault at 0x000000000000007b near identifier 'kernel_int' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:198:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]
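
If a bad address is expected now and then, the fault can be caught inside the script instead of ending the run. This is a sketch that assumes a SystemTap version with try/catch support; the message text printed by the handler is invented for the example.

```shell
# Catch the copy fault raised by user_string() on an invalid address.
script='
probe begin {
  try {
    println(user_string(123))
  } catch (msg) {
    // msg holds the text of the run-time error.
    println("caught: ", msg)
  }
  exit()
}'

# Run only where SystemTap is installed (root is normally required).
if command -v stap >/dev/null 2>&1; then
  stap -e "$script"
fi
```

With the catch block in place the fault is handled by the script, so the probe can carry on and call exit() normally.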


8. Error while dereferencing a context pointer variable.

pointer dereference fault
There was a fault encountered during a pointer dereference operation such as a target variable evaluation.


[References]
1.  https://sourceware.org/systemtap/SystemTap_Beginners_Guide/errors.html
2.  https://sourceware.org/systemtap/SystemTap_Beginners_Guide/runtimeerror.html
3.  https://sourceware.org/systemtap/wiki/TipExhaustedResourceErrors
