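The previous post, ftrace (1), introduced the purpose of each file under ftrace's debug directory. This post covers several commonly used tracers that can be configured.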
function
Traces all functions in the kernel.
function_graph
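Similar to the function tracer, except that function_graph presents the function call relationships in a form that is easier to read.
The output is laid out much like C code.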
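As a minimal sketch of how one of these tracers might be selected, the helper below checks available_tracers before writing to current_tracer. It assumes tracefs is mounted at /sys/kernel/debug/tracing (the path used later in this post) and that the shell runs as root. The TRACING variable and the select_tracer name are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Directory holding the ftrace control files (assumed mount point; override if needed).
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

# select_tracer NAME (hypothetical helper)
# Switch to tracer NAME only if the running kernel lists it in available_tracers.
select_tracer() {
    name="$1"
    # available_tracers holds a space-separated list of tracer names.
    for t in $(cat "$TRACING/available_tracers"); do
        if [ "$t" = "$name" ]; then
            echo "$name" > "$TRACING/current_tracer"
            return 0
        fi
    done
    echo "tracer '$name' is not available in this kernel" >&2
    return 1
}
```

For example, "select_tracer function_graph" followed by "cat $TRACING/trace" would show the call graph if the kernel was built with that tracer.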
irqsoff
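Traces the code that runs while interrupts are disabled, and saves the longest interrupts-off time in tracing_max_latency.
It is generally used to debug system latency. Enabling the latency-format option makes the trace output easier to read.
Example: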
# echo 0 > options/function-trace
# echo irqsoff > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: run_timer_softirq
# => ended at: run_timer_softirq
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 0d.s2 0us+: _raw_spin_lock_irq <-run_timer_softirq
<idle>-0 0dNs3 17us : _raw_spin_unlock_irq <-run_timer_softirq
<idle>-0 0dNs3 17us+: trace_hardirqs_on <-run_timer_softirq
<idle>-0 0dNs3 25us : <stack trace>
=> _raw_spin_unlock_irq
=> run_timer_softirq
=> __do_softirq
=> call_softirq
=> do_softirq
=> irq_exit
=> smp_apic_timer_interrupt
=> apic_timer_interrupt
=> rcu_idle_exit
=> cpu_idle
=> rest_init
=> start_kernel
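The headline number in a latency trace is the "# latency:" line of the header. As a small sketch, assuming the trace has been saved to a file (for example with "cat trace > irqsoff.txt"), the hypothetical helpers below pull that value out and compare it with a limit. The function names are illustrative only.

```shell
#!/bin/sh
# max_latency_us FILE (hypothetical helper)
# Print the microsecond value from the "# latency:" header line of a saved
# latency trace.
max_latency_us() {
    awk '$1 == "#" && $2 == "latency:" { print $3; exit }' "$1"
}

# latency_over FILE LIMIT_US (hypothetical helper)
# Succeed (exit status 0) only if the recorded latency is greater than LIMIT_US.
latency_over() {
    lat=$(max_latency_us "$1")
    [ -n "$lat" ] && [ "$lat" -gt "$2" ]
}
```

A script could then run something like "latency_over irqsoff.txt 100 && echo too slow" to flag traces worth a closer look.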
preemptoff
Similar to irqsoff, but traces the code that runs while preemption is disabled.
Example:
# echo 0 > options/function-trace
# echo preemptoff > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: preemptoff
#
# preemptoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 46 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: sshd-1991 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: do_IRQ
# => ended at: do_IRQ
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
sshd-1991 1d.h. 0us+: irq_enter <-do_IRQ
sshd-1991 1d..1 46us : irq_exit <-do_IRQ
sshd-1991 1d..1 47us+: trace_preempt_on <-do_IRQ
sshd-1991 1d..1 52us : <stack trace>
=> sub_preempt_count
=> irq_exit
=> do_IRQ
=> ret_from_intr
preemptirqsoff
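Similar to the two tracers above: traces the code that runs while interrupts and/or preemption are disabled, and records the longest such period.
Example: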
# echo 0 > options/function-trace
# echo preemptirqsoff > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 100 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: ls-2230 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: ata_scsi_queuecmd
# => ended at: ata_scsi_queuecmd
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
ls-2230 3d... 0us+: _raw_spin_lock_irqsave <-ata_scsi_queuecmd
ls-2230 3...1 100us : _raw_spin_unlock_irqrestore <-ata_scsi_queuecmd
ls-2230 3...1 101us+: trace_preempt_on <-ata_scsi_queuecmd
ls-2230 3...1 111us : <stack trace>
=> sub_preempt_count
=> _raw_spin_unlock_irqrestore
=> ata_scsi_queuecmd
=> scsi_dispatch_cmd
=> scsi_request_fn
=> __blk_run_queue_uncond
=> __blk_run_queue
=> blk_queue_bio
=> generic_make_request
=> submit_bio
=> submit_bh
=> ext3_bread
=> ext3_dir_bread
=> htree_dirblock_to_tree
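The irqsoff, preemptoff and preemptirqsoff examples all use the same sequence of writes. A minimal sketch that wraps that sequence in one function follows. It assumes tracefs is mounted at /sys/kernel/debug/tracing and a root shell; the TRACING variable and the run_latency_trace name are hypothetical.

```shell
#!/bin/sh
# Directory holding the ftrace control files (assumed mount point; override if needed).
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

# run_latency_trace TRACER CMD [ARGS...] (hypothetical helper)
# Run CMD under the given latency tracer and print the resulting trace.
run_latency_trace() {
    tracer="$1"
    shift
    echo 0 > "$TRACING/options/function-trace"  # keep function tracing off
    echo "$tracer" > "$TRACING/current_tracer"  # select the latency tracer
    echo 1 > "$TRACING/tracing_on"              # start recording
    echo 0 > "$TRACING/tracing_max_latency"     # reset the recorded maximum
    "$@"                                        # the workload to observe
    echo 0 > "$TRACING/tracing_on"              # stop recording
    cat "$TRACING/trace"
}
```

With this in place, "run_latency_trace preemptirqsoff ls -ltr" would reproduce the session above in one line.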
wakeup
Traces and records the maximum latency between the moment the highest-priority task is woken up and the moment it actually gets scheduled.
Example:
# echo 0 > options/function-trace
# echo wakeup > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# chrt -f 5 sleep 1
# echo 0 > tracing_on
# cat trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 15 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: kworker/3:1H-312 (uid:0 nice:-20 policy:0 rt_prio:0)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 3dNs7 0us : 0:120:R + [003] 312:100:R kworker/3:1H
<idle>-0 3dNs7 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
<idle>-0 3d..3 15us : __schedule <-schedule
<idle>-0 3d..3 15us : 0:120:R ==> [003] 312:100:R kworker/3:1H
wakeup_rt
Traces and records the same maximum wakeup latency as wakeup, but only for RT tasks.
Example:
# echo 0 > options/function-trace
# echo wakeup_rt > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# chrt -f 5 sleep 1
# echo 0 > tracing_on
# cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 5 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: sleep-2389 (uid:0 nice:0 policy:1 rt_prio:5)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 3d.h4 0us : 0:120:R + [003] 2389: 94:R sleep
<idle>-0 3d.h4 1us+: ttwu_do_activate.constprop.87 <-try_to_wake_up
<idle>-0 3d..3 5us : __schedule <-schedule
<idle>-0 3d..3 5us : 0:120:R ==> [003] 2389: 94:R sleep
nop
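The "trace nothing" tracer. Writing nop to current_tracer removes whichever tracer is active.
For any of the latency tracers above, function trace output can be turned on with echo 1 > options/function-trace. Because doing so adds latency of its own, we usually leave it off.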
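After a latency-tracing session it can be handy to put things back the way they were. The sketch below is one possible cleanup routine, assuming tracefs at /sys/kernel/debug/tracing and a root shell. The TRACING variable and the reset_tracing name are hypothetical; the routine also assumes you want function-trace back at its default of enabled.

```shell
#!/bin/sh
# Directory holding the ftrace control files (assumed mount point; override if needed).
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

# reset_tracing (hypothetical helper)
# Remove the active tracer, re-enable the function-trace option and clear the buffer.
reset_tracing() {
    echo nop > "$TRACING/current_tracer"        # "trace nothing"
    echo 1 > "$TRACING/options/function-trace"  # back to the default setting
    echo > "$TRACING/trace"                     # writing to trace clears the buffer
}
```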
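Besides the tracers above, another important feature is event tracing, which ftrace has supported since 2.6.30. It is not selected through current_tracer; it is configured through the /sys/kernel/debug/tracing/events directory.
When debugging latency, the function tracer itself may add to the system latency. In that case we can disable function tracing and debug with event tracing instead, which reduces the latency introduced by tracing.
It is a compromise.
For example:
Suppose we notice a latency problem and want to debug it. The first idea is to use the wakeup_rt tracer. The trace log looks like this: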
/sys/kernel/debug/tracing # echo wakeup_rt > current_tracer
/sys/kernel/debug/tracing # echo 0 > options/function-trace
/sys/kernel/debug/tracing # echo 1 > tracing_on
/sys/kernel/debug/tracing # echo 0 > tracing_max_latency
/sys/kernel/debug/tracing # chrt -f 5 sleep 1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 4.0.0
# --------------------------------------------------------------------
# latency: 271 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: watchdog/0-11 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 0dnh4 7us+: 0:120:R + [000] 11: 0:R watchdog/0
<idle>-0 0dnh4 34us!: ttwu_do_activate.constprop.98 <-try_to_wake_up
<idle>-0 0d..3 248us+: __schedule <-schedule
<idle>-0 0d..3 265us : 0:120:R ==> [000] 11: 0:R watchdog/0
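This does give us the timing, but only for the span from the wakeup call to schedule. Because the function-trace option is off, no other function information is printed. If we enabled
function-trace here, it would itself introduce a large latency, so that approach is not viable. Yet without function information it is hard to pinpoint what is causing the latency. This is where event tracing comes in.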
/sys/kernel/debug/tracing # echo wakeup_rt > current_tracer
/sys/kernel/debug/tracing # echo 0 > options/function-trace
/sys/kernel/debug/tracing # echo 1 > events/enable
/sys/kernel/debug/tracing # echo 1 > tracing_on
/sys/kernel/debug/tracing # echo 0 > tracing_max_latency
/sys/kernel/debug/tracing # chrt -f 5 sleep 1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat trace
# tracer: wakeup_rt
#
# wakeup_rt latency trace v1.1.5 on 4.0.0
# --------------------------------------------------------------------
# latency: 772 us, #12/12, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 1dnh4 11us+: 0:120:R + [001] 12: 0:R watchdog/1
<idle>-0 1dnh4 60us+: ttwu_do_activate.constprop.98 <-try_to_wake_up
<idle>-0 1dnh4 90us+: sched_wakeup: comm=watchdog/1 pid=12 prio=0 success=1 target_cpu=001
<idle>-0 1dnh2 156us+: hrtimer_expire_exit: hrtimer=ffff80007efd8068
<idle>-0 1dnh3 186us+: hrtimer_start: hrtimer=ffff80007efd8068 function=watchdog_timer_fn expires=21160150000000 softexpires=21160150000000
<idle>-0 1dnh2 280us!: irq_handler_exit: irq=3 ret=handled
<idle>-0 1dn.3 493us+: hrtimer_cancel: hrtimer=ffff80007efd7f30
<idle>-0 1dn.3 545us+: hrtimer_start: hrtimer=ffff80007efd7f30 function=tick_sched_timer expires=21156160000000 softexpires=21156160000000
<idle>-0 1.n.2 609us+: rcu_utilization: Start context switch
<idle>-0 1.n.2 673us+: rcu_utilization: End context switch
<idle>-0 1d..3 748us+: __schedule <-schedule
<idle>-0 1d..3 758us : 0:120:R ==> [001] 12: 0:R watchdog/1
As the output shows, besides the wakeup_rt entries there is now a lot of event information, which makes it much easier to locate the problem.
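Writing to events/enable switches on every event at once. Individual events can also be toggled through their own enable file under events/<subsystem>/<event>/. The sketch below enables just a chosen few, assuming tracefs at /sys/kernel/debug/tracing and a root shell. The TRACING variable and the set_events name are hypothetical; sched_wakeup and sched_switch are used in the usage note only as examples of events that commonly exist.

```shell
#!/bin/sh
# Directory holding the ftrace control files (assumed mount point; override if needed).
TRACING="${TRACING:-/sys/kernel/debug/tracing}"

# set_events VALUE SUBSYSTEM:EVENT... (hypothetical helper)
# Write VALUE (1 to enable, 0 to disable) to the enable file of each listed event.
set_events() {
    value="$1"
    shift
    for ev in "$@"; do
        subsystem="${ev%%:*}"   # text before the first colon
        event="${ev#*:}"        # text after the first colon
        f="$TRACING/events/$subsystem/$event/enable"
        if [ -e "$f" ]; then
            echo "$value" > "$f"
        else
            echo "no such event: $ev" >&2
            return 1
        fi
    done
}
```

For instance, "set_events 1 sched:sched_wakeup sched:sched_switch" before starting the wakeup_rt run would limit the extra output to scheduler events.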
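Stack tracing is another ftrace feature. The kernel stack has a fixed size, and a kernel developer who overlooks this can easily cause a stack overflow, which leads to a kernel panic.
The stack tracer exists to help debug stack usage: it records the deepest stack usage seen so far and prints how much stack each function in that call chain occupies.
It is built into the kernel with CONFIG_STACK_TRACER.
Usage example: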
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
After running it for a few minutes
# cat stack_max_size
2928
# cat stack_trace
Depth Size Location (18 entries)
----- ---- --------
0) 2928 224 update_sd_lb_stats+0xbc/0x4ac
1) 2704 160 find_busiest_group+0x31/0x1f1
2) 2544 256 load_balance+0xd9/0x662
3) 2288 80 idle_balance+0xbb/0x130
4) 2208 128 __schedule+0x26e/0x5b9
5) 2080 16 schedule+0x64/0x66
6) 2064 128 schedule_timeout+0x34/0xe0
7) 1936 112 wait_for_common+0x97/0xf1
8) 1824 16 wait_for_completion+0x1d/0x1f
9) 1808 128 flush_work+0xfe/0x119
10) 1680 16 tty_flush_to_ldisc+0x1e/0x20
11) 1664 48 input_available_p+0x1d/0x5c
12) 1616 48 n_tty_poll+0x6d/0x134
13) 1568 64 tty_poll+0x64/0x7f
14) 1504 880 do_select+0x31e/0x511
15) 624 400 core_sys_select+0x177/0x216
16) 224 96 sys_select+0x91/0xb9
17) 128 128 system_call_fastpath+0x16/0x1b
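When reading a stack_trace dump, the entry with the largest Size column is usually the first thing to look at. As a small sketch, assuming the dump has been saved to a file (for example with "cat stack_trace > stack.txt"), the hypothetical helper below prints that entry. The largest_frame name is illustrative only.

```shell
#!/bin/sh
# largest_frame FILE (hypothetical helper)
# Print "SIZE LOCATION" for the entry with the largest Size column in a saved
# stack_trace dump. Entry lines look like: "14) 1504 880 do_select+0x31e/0x511".
largest_frame() {
    awk '$1 ~ /^[0-9]+\)$/ && $3 + 0 > max { max = $3 + 0; loc = $4 }
         END { if (loc != "") print max, loc }' "$1"
}
```

On the dump above this would report the do_select entry with its 880 bytes.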
(End)
Reference: kernel documentation, Documentation/trace/ftrace.txt