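I have been working with program analysis for quite a while and still only half understand it. My advisor suggested that we study the Clang Static Analyzer (CSA), starting with writing a custom checker, so I set out to learn Clang. If you spot mistakes in this post, corrections are very welcome.
CSA is part of Clang, and Clang is part of LLVM, so learning CSA inevitably means learning Clang and LLVM as well.
I won't introduce LLVM/Clang at length here and will just point to a blog post instead.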
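Depending on context, "LLVM" can mean different things (this list is not exhaustive):
The LLVM infrastructure: a complete collection of compiler projects, including front ends, back ends, the optimizer, the assembler, the JIT engine, and so on.
An LLVM-based compiler: a compiler built partly or entirely on LLVM.
The LLVM back end: the code optimization and target code generation parts, which together with Clang form a complete compiler.
The LLVM project itself.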
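Clang lets you hook into the compilation process and obtain detailed information about the data structures produced at each stage, including the AST and the CFG. One application of Clang tools is finding defects in programs automatically, with far more warnings than the compiler alone provides. For example, clang-tidy inspects the syntax used in a program to find style problems and unsafe or potentially non-portable constructs.
CSA is a symbolic execution tool for finding program defects. The actual behavior of a real program depends on external factors such as input values, random numbers, and the behavior of library components. The analyzer engine represents unknown values as symbolic values and performs symbolic computation on them, which lets it work out the conditions on those symbolic values under which the program goes wrong.
As a result, the Clang Static Analyzer can find deep bugs that occur only on rare program paths, which manual testing or automated test suites may miss. When it finds a bug, the analyzer draws the whole path leading to it and shows which way each conditional branch went.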
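To make "a bug on a rare path" concrete, here is a minimal sketch of a function with a deliberate defect that only shows up for one combination of inputs. The function name and the file name rare_path.c are made up for illustration. I would expect a run such as clang --analyze -Xanalyzer -analyzer-checker=core.NullDereference rare_path.c to report the last line, but treat that as an assumption rather than captured output.

```c
#include <stddef.h>

/*
 * Returns the first element of values, or -1 when the caller asks for
 * the default. Deliberate defect for the analyzer to find: when
 * count == 0 and use_default == 0, p is NULL and gets dereferenced.
 */
int first_or_default(const int *values, size_t count, int use_default) {
    const int *p = values;
    if (count == 0)
        p = NULL;      /* path A: empty input */
    if (use_default)
        return -1;     /* path B: never touches p */
    return *p;         /* null dereference on path A when path B is not taken */
}
```

A test suite that only calls the function with non-empty input, or always asks for the default, never hits the bad path. A path-sensitive analysis treats count and use_default as symbols and explores the combination count == 0, use_default == 0 on its own.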
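The source is in the LLVM GitHub repository. I use LLVM 12.0.0 and installed it by downloading and unpacking a prebuilt tarball from the GitHub release page (I downloaded clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz; building from source is too much trouble, and a tarball is much simpler). After unpacking it and setting up the environment variables, run clang --version to check the version. The blog post mentioned above already has Clang usage examples, so I won't repeat them all. Part of my LLVM directory looks like this (for reference):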
llvm
-- bin          # binary executables (ELF on Linux)
   -- clang
   -- clang++
   -- clang-tidy
   -- clang-format
   -- clang-scan-deps
   ...
-- include      # all directories
   -- c++
   -- clang
   -- clang-c
   -- clang-tidy
   -- llvm
   ...
-- lib
   ...
-- libexec      # Perl scripts
   -- c++-analyzer
   -- ccc-analyzer
-- share        # all directories
   -- clang
   -- man
   -- opt-viewer
   -- scan-build
   -- scan-view
Here is a quick look at basic Clang usage, following the blog post mentioned above. My example code, main.c, is as follows:
#include <stdio.h>

int main(void){
    int i, n;
    scanf("%d",&n);
    if (n > 10)
        printf("hello world\n");
    else
        printf("no\n");
    for (i = 0; i < 10; ++i){
        n = i + 1;
        n += 2;
    }
    return 0;
}
Run clang -E -Xclang -dump-tokens main.c for lexical analysis. The output (showing only the tokens that come from main.c itself) is:
int 'int' [StartOfLine] Loc=<main.c:3:1>
identifier 'main' [LeadingSpace] Loc=<main.c:3:5>
l_paren '(' Loc=<main.c:3:9>
void 'void' Loc=<main.c:3:10>
r_paren ')' Loc=<main.c:3:14>
l_brace '{' Loc=<main.c:3:15>
int 'int' [StartOfLine] [LeadingSpace] Loc=<main.c:4:5>
identifier 'i' [LeadingSpace] Loc=<main.c:4:9>
comma ',' Loc=<main.c:4:10>
identifier 'n' [LeadingSpace] Loc=<main.c:4:12>
semi ';' Loc=<main.c:4:13>
identifier 'scanf' [StartOfLine] [LeadingSpace] Loc=<main.c:5:5>
l_paren '(' Loc=<main.c:5:10>
string_literal '"%d"' Loc=<main.c:5:11>
comma ',' Loc=<main.c:5:15>
amp '&' Loc=<main.c:5:16>
identifier 'n' Loc=<main.c:5:17>
r_paren ')' Loc=<main.c:5:18>
semi ';' Loc=<main.c:5:19>
if 'if' [StartOfLine] [LeadingSpace] Loc=<main.c:6:5>
l_paren '(' [LeadingSpace] Loc=<main.c:6:8>
identifier 'n' Loc=<main.c:6:9>
greater '>' [LeadingSpace] Loc=<main.c:6:11>
numeric_constant '10' [LeadingSpace] Loc=<main.c:6:13>
r_paren ')' Loc=<main.c:6:15>
identifier 'printf' [StartOfLine] [LeadingSpace] Loc=<main.c:7:9>
l_paren '(' Loc=<main.c:7:15>
string_literal '"hello world\n"' Loc=<main.c:7:16>
r_paren ')' Loc=<main.c:7:31>
semi ';' Loc=<main.c:7:32>
else 'else' [StartOfLine] [LeadingSpace] Loc=<main.c:8:5>
identifier 'printf' [StartOfLine] [LeadingSpace] Loc=<main.c:9:9>
l_paren '(' Loc=<main.c:9:15>
string_literal '"no\n"' Loc=<main.c:9:16>
r_paren ')' Loc=<main.c:9:22>
semi ';' Loc=<main.c:9:23>
for 'for' [StartOfLine] [LeadingSpace] Loc=<main.c:10:5>
l_paren '(' [LeadingSpace] Loc=<main.c:10:9>
identifier 'i' Loc=<main.c:10:10>
equal '=' [LeadingSpace] Loc=<main.c:10:12>
numeric_constant '0' [LeadingSpace] Loc=<main.c:10:14>
semi ';' Loc=<main.c:10:15>
identifier 'i' [LeadingSpace] Loc=<main.c:10:17>
less '<' [LeadingSpace] Loc=<main.c:10:19>
numeric_constant '10' [LeadingSpace] Loc=<main.c:10:21>
semi ';' Loc=<main.c:10:23>
plusplus '++' [LeadingSpace] Loc=<main.c:10:25>
identifier 'i' Loc=<main.c:10:27>
r_paren ')' Loc=<main.c:10:28>
l_brace '{' Loc=<main.c:10:29>
identifier 'n' [StartOfLine] [LeadingSpace] Loc=<main.c:11:9>
equal '=' [LeadingSpace] Loc=<main.c:11:11>
identifier 'i' [LeadingSpace] Loc=<main.c:11:13>
plus '+' [LeadingSpace] Loc=<main.c:11:15>
numeric_constant '1' [LeadingSpace] Loc=<main.c:11:17>
semi ';' Loc=<main.c:11:18>
identifier 'n' [StartOfLine] [LeadingSpace] Loc=<main.c:12:9>
plusequal '+=' [LeadingSpace] Loc=<main.c:12:11>
numeric_constant '2' [LeadingSpace] Loc=<main.c:12:14>
semi ';' Loc=<main.c:12:15>
r_brace '}' [StartOfLine] [LeadingSpace] Loc=<main.c:13:5>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.c:14:5>
numeric_constant '0' [LeadingSpace] Loc=<main.c:14:12>
semi ';' Loc=<main.c:14:13>
r_brace '}' [StartOfLine] Loc=<main.c:15:1>
eof '' Loc=<main.c:15:2>
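To show roughly what produces a dump like this, here is a toy tokenizer in C. It is only a sketch: the enum and function names are hypothetical, it treats keywords such as int as plain identifiers, and it does not handle string literals, comments, or most multi-character operators. It is not how Clang's lexer is implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds; the names loosely mirror the dump, but this enum is made up. */
enum tok_kind { TOK_EOF, TOK_IDENTIFIER, TOK_NUMERIC_CONSTANT, TOK_PUNCT };

/*
 * Read one token starting at *cursor, copy its spelling into text
 * (at most cap - 1 characters are kept) and advance *cursor past it.
 */
enum tok_kind next_token(const char **cursor, char *text, size_t cap) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p))
        p++;                                   /* skip leading whitespace */

    const char *start = p;
    enum tok_kind kind;
    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        kind = TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        kind = TOK_NUMERIC_CONSTANT;
    } else {
        /* Only the two-character operators used in main.c ("++", "+=")
           are recognized; any other punctuation is one character. */
        if (p[0] == '+' && (p[1] == '+' || p[1] == '='))
            p += 2;
        else
            p += 1;
        kind = TOK_PUNCT;
    }

    size_t len = (size_t)(p - start);
    if (cap > 0) {
        if (len >= cap)
            len = cap - 1;                     /* truncate long spellings */
        memcpy(text, start, len);
        text[len] = '\0';
    }
    *cursor = p;
    return kind;
}
```

Calling next_token repeatedly on the string "n += 2;" yields an identifier, the punctuation +=, a numeric constant, the punctuation ;, and then TOK_EOF, which matches the order of the entries for line 12 of main.c in the dump.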
Run clang -fsyntax-only -Xclang -ast-dump main.c to view the AST. The output is:
-FunctionDecl 0x7d0ffd8 <main.c:3:1, line:15:1> line:3:5 main 'int (void)'
`-CompoundStmt 0x7d10890 <col:15, line:15:1>
|-DeclStmt 0x7d10190 <line:4:5, col:13>
| |-VarDecl 0x7d10090 <col:5, col:9> col:9 used i 'int'
| `-VarDecl 0x7d10110 <col:5, col:12> col:12 used n 'int'
|-CallExpr 0x7d102f0 <line:5:5, col:18> 'int'
| |-ImplicitCastExpr 0x7d102d8 <col:5> 'int (*)(const char *restrict, ...)' <FunctionToPointerDecay>
| | `-DeclRefExpr 0x7d101a8 <col:5> 'int (const char *restrict, ...)' Function 0x7d045b8 'scanf' 'int (const char *restrict, ...)'
| |-ImplicitCastExpr 0x7d10338 <col:11> 'const char *' <NoOp>
| | `-ImplicitCastExpr 0x7d10320 <col:11> 'char *' <ArrayToPointerDecay>
| | `-StringLiteral 0x7d10208 <col:11> 'char [3]' lvalue "%d"
| `-UnaryOperator 0x7d10248 <col:16, col:17> 'int *' prefix '&' cannot overflow
| `-DeclRefExpr 0x7d10228 <col:17> 'int' lvalue Var 0x7d10110 'n' 'int'
|-IfStmt 0x7d105a0 <line:6:5, line:9:22> has_else
| |-BinaryOperator 0x7d103a8 <line:6:9, col:13> 'int' '>'
| | |-ImplicitCastExpr 0x7d10390 <col:9> 'int' <LValueToRValue>
| | | `-DeclRefExpr 0x7d10350 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
| | `-IntegerLiteral 0x7d10370 <col:13> 'int' 10
| |-CallExpr 0x7d10480 <line:7:9, col:31> 'int'
| | |-ImplicitCastExpr 0x7d10468 <col:9> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
| | | `-DeclRefExpr 0x7d103c8 <col:9> 'int (const char *, ...)' Function 0x7cff6f0 'printf' 'int (const char *, ...)'
| | `-ImplicitCastExpr 0x7d104c0 <col:16> 'const char *' <NoOp>
| | `-ImplicitCastExpr 0x7d104a8 <col:16> 'char *' <ArrayToPointerDecay>
| | `-StringLiteral 0x7d10428 <col:16> 'char [13]' lvalue "hello world\n"
| `-CallExpr 0x7d10548 <line:9:9, col:22> 'int'
| |-ImplicitCastExpr 0x7d10530 <col:9> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
| | `-DeclRefExpr 0x7d104d8 <col:9> 'int (const char *, ...)' Function 0x7cff6f0 'printf' 'int (const char *, ...)'
| `-ImplicitCastExpr 0x7d10588 <col:16> 'const char *' <NoOp>
| `-ImplicitCastExpr 0x7d10570 <col:16> 'char *' <ArrayToPointerDecay>
| `-StringLiteral 0x7d104f8 <col:16> 'char [4]' lvalue "no\n"
|-ForStmt 0x7d10828 <line:10:5, line:13:5>
| |-BinaryOperator 0x7d10610 <line:10:10, col:14> 'int' '='
| | |-DeclRefExpr 0x7d105d0 <col:10> 'int' lvalue Var 0x7d10090 'i' 'int'
| | `-IntegerLiteral 0x7d105f0 <col:14> 'int' 0
| |-<<<NULL>>>
| |-BinaryOperator 0x7d10688 <col:17, col:21> 'int' '<'
| | |-ImplicitCastExpr 0x7d10670 <col:17> 'int' <LValueToRValue>
| | | `-DeclRefExpr 0x7d10630 <col:17> 'int' lvalue Var 0x7d10090 'i' 'int'
| | `-IntegerLiteral 0x7d10650 <col:21> 'int' 10
| |-UnaryOperator 0x7d106c8 <col:25, col:27> 'int' prefix '++'
| | `-DeclRefExpr 0x7d106a8 <col:27> 'int' lvalue Var 0x7d10090 'i' 'int'
| `-CompoundStmt 0x7d10808 <col:29, line:13:5>
| |-BinaryOperator 0x7d10778 <line:11:9, col:17> 'int' '='
| | |-DeclRefExpr 0x7d106e0 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
| | `-BinaryOperator 0x7d10758 <col:13, col:17> 'int' '+'
| | |-ImplicitCastExpr 0x7d10740 <col:13> 'int' <LValueToRValue>
| | | `-DeclRefExpr 0x7d10700 <col:13> 'int' lvalue Var 0x7d10090 'i' 'int'
| | `-IntegerLiteral 0x7d10720 <col:17> 'int' 1
| `-CompoundAssignOperator 0x7d107d8 <line:12:9, col:14> 'int' '+=' ComputeLHSTy='int' ComputeResultTy='int'
| |-DeclRefExpr 0x7d10798 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
| `-IntegerLiteral 0x7d107b8 <col:14> 'int' 2
`-ReturnStmt 0x7d10880 <line:14:5, col:12>
`-IntegerLiteral 0x7d10860 <col:12> 'int' 0
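In older LLVM versions you could view the control-flow graph of main.c with clang -cc1 -analyze -cfg-dump main.c, but that no longer seems to work in this version. Instead, call one of CSA's debug checkers: clang -cc1 -analyze -analyzer-checker=debug.ViewCFG main.c shows the CFG graphically, and debug.DumpCFG prints it as text.
However, my test code includes stdio.h. clang -cc1 invokes the compiler front end directly and bypasses the driver, so the default system header search paths are not set up and the result is fatal error: 'stdio.h' file not found. The header directories have to be added with -I. On my machine stdio.h is in /usr/include. stddef.h is also needed; it lives in $LLVM_DIR/lib/clang/12.0.0/include, where $LLVM_DIR is the LLVM installation directory.
The final command is: clang -cc1 -I /usr/include -I $LLVM_DIR/lib/clang/12.0.0/include -analyze -analyzer-checker=debug.DumpCFG main.c
The CFG printed to the terminal is: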
int main()
[B9 (ENTRY)]
Succs (1): B8
[B1]
1: 0
2: return [B1.1];
Preds (1): B4
Succs (1): B0
[B2]
1: i
2: ++[B2.1]
Preds (1): B3
Succs (1): B4
[B3]
1: i
2: [B3.1] (ImplicitCastExpr, LValueToRValue, int)
3: 1
4: [B3.2] + [B3.3]
5: n
6: [B3.5] = [B3.4]
7: n
8: 2
9: [B3.7] += [B3.8]
Preds (1): B4
Succs (1): B2
[B4]
1: i
2: [B4.1] (ImplicitCastExpr, LValueToRValue, int)
3: 10
4: [B4.2] < [B4.3]
T: for (...; [B4.4]; ...)
Preds (2): B2 B5
Succs (2): B3 B1
[B5]
1: 0
2: i
3: [B5.2] = [B5.1]
Preds (2): B6 B7
Succs (1): B4
[B6]
1: printf
2: [B6.1] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
3: "no\n"
4: [B6.3] (ImplicitCastExpr, ArrayToPointerDecay, char *)
5: [B6.4] (ImplicitCastExpr, NoOp, const char *)
6: [B6.2]([B6.5])
Preds (1): B8
Succs (1): B5
[B7]
1: printf
2: [B7.1] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
3: "hello world\n"
4: [B7.3] (ImplicitCastExpr, ArrayToPointerDecay, char *)
5: [B7.4] (ImplicitCastExpr, NoOp, const char *)
6: [B7.2]([B7.5])
Preds (1): B8
Succs (1): B5
[B8]
1: int i;
2: int n;
3: __isoc99_scanf
4: [B8.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
5: "%d"
6: [B8.5] (ImplicitCastExpr, ArrayToPointerDecay, char *)
7: [B8.6] (ImplicitCastExpr, NoOp, const char *)
8: n
9: &[B8.8]
10: [B8.4]([B8.7], [B8.9])
11: n
12: [B8.11] (ImplicitCastExpr, LValueToRValue, int)
13: 10
14: [B8.12] > [B8.13]
T: if [B8.14]
Preds (1): B9
Succs (2): B7 B6
[B0 (EXIT)]
Preds (1): B1
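The CFG blocks are numbered in reverse order: the entry block has the highest number (B9) and the exit block is B0. I redrew the output as a flowchart. Clang split int i, n; into the two statements int i; and int n;, so I merged them back by hand in the flowchart nodes. How to parse Clang's textual CFG output automatically still needs more study.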
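As a small step toward working with the dump programmatically, the successor lists can be copied by hand into a table and explored with a depth-first search. The sketch below does exactly that for the ten blocks printed above. The struct layout and the function names are my own choices, not anything Clang provides.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 10
#define MAX_SUCCS 2

/* One basic block: just its successor list (index = block number). */
struct block {
    int nsuccs;
    int succs[MAX_SUCCS];
};

/* Copied by hand from the "Succs" lines of the DumpCFG output. */
static const struct block cfg[NUM_BLOCKS] = {
    [9] = {1, {8}},      /* ENTRY */
    [8] = {2, {7, 6}},   /* if (n > 10) */
    [7] = {1, {5}},      /* printf("hello world\n") */
    [6] = {1, {5}},      /* printf("no\n") */
    [5] = {1, {4}},      /* i = 0 */
    [4] = {2, {3, 1}},   /* i < 10 */
    [3] = {1, {2}},      /* loop body */
    [2] = {1, {4}},      /* ++i */
    [1] = {1, {0}},      /* return 0 */
    [0] = {0, {0}},      /* EXIT */
};

/* Depth-first search: mark every block reachable from `from`. */
static void mark_reachable(int from, int seen[NUM_BLOCKS]) {
    assert(from >= 0 && from < NUM_BLOCKS);
    if (seen[from])
        return;
    seen[from] = 1;
    for (int i = 0; i < cfg[from].nsuccs; i++)
        mark_reachable(cfg[from].succs[i], seen);
}

/* Number of blocks reachable from `from`, including `from` itself. */
int count_reachable(int from) {
    int seen[NUM_BLOCKS] = {0};
    int count = 0;
    mark_reachable(from, seen);
    for (int b = 0; b < NUM_BLOCKS; b++)
        count += seen[b];
    return count;
}

/* A block lies on a cycle if it can be reached again from a successor. */
int on_cycle(int block) {
    for (int i = 0; i < cfg[block].nsuccs; i++) {
        int seen[NUM_BLOCKS] = {0};
        mark_reachable(cfg[block].succs[i], seen);
        if (seen[block])
            return 1;
    }
    return 0;
}
```

With this table, every block is reachable from B9, and B4, B3 and B2 are the only blocks on a cycle, which is the for loop in main.c.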
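LLVM IR uses static single assignment (SSA) form. That does not mean Clang has to construct full SSA for local variables itself: LLVM uses a trick that moves this work out of the front end. Clang emits each local variable as a stack slot created with alloca and accessed through load and store, and the mem2reg pass later promotes those slots to SSA registers. This trick is specific to LLVM. The generated IR has the following properties:
Instructions are organized in three-address form
An unlimited number of registers is assumed to be available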
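LLVM IR has three representations (equivalent in essence and convertible to one another):
Text format: a human-readable .ll file, obtained with clang -S -emit-llvm main.c, where -S means "Only run preprocess and compilation steps".
In-memory format: the IR as represented by classes such as Function and Instruction.
Bitcode format: a binary .bc file, obtained with clang -c -emit-llvm main.c, where -c means "Only run preprocess, compile, and assemble steps"; it is suited to fast loading by a JIT compiler.
Here is the text-format dump: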
; ModuleID = 'main.c'
source_filename = "main.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1
@.str.1 = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1
@.str.2 = private unnamed_addr constant [4 x i8] c"no\0A\00", align 1
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
%1 = alloca i32, align 4
%2 = alloca i32, align 4
%3 = alloca i32, align 4
store i32 0, i32* %1, align 4
%4 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %3)
%5 = load i32, i32* %3, align 4
%6 = icmp sgt i32 %5, 10
br i1 %6, label %7, label %9
7: ; preds = %0
%8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.1, i64 0, i64 0))
br label %11
9: ; preds = %0
%10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.2, i64 0, i64 0))
br label %11
11: ; preds = %9, %7
store i32 0, i32* %2, align 4
br label %12
12: ; preds = %20, %11
%13 = load i32, i32* %2, align 4
%14 = icmp slt i32 %13, 10
br i1 %14, label %15, label %23
15: ; preds = %12
%16 = load i32, i32* %2, align 4
%17 = add nsw i32 %16, 1
store i32 %17, i32* %3, align 4
%18 = load i32, i32* %3, align 4
%19 = add nsw i32 %18, 2
store i32 %19, i32* %3, align 4
br label %20
20: ; preds = %15
%21 = load i32, i32* %2, align 4
%22 = add nsw i32 %21, 1
store i32 %22, i32* %2, align 4
br label %12, !llvm.loop !2
23: ; preds = %12
ret i32 0
}
declare dso_local i32 @__isoc99_scanf(i8*, ...) #1
declare dso_local i32 @printf(i8*, ...) #1
attributes #0 = { noinline nounwind optnone uwtable "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.0"}
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.mustprogress"}
I will add an explanation of the IR later.
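Until then, one way to read the loop body in the dump is to rewrite n = i + 1; n += 2; in C so that every temporary is assigned exactly once and every statement has at most two operands. This is only an illustration of the three-address, single-assignment idea; the function name and the temporaries t1 and t2 are made up, and the comments point at the add instructions in the dump above.

```c
/*
 * The loop body of main.c, n = i + 1; n += 2;, written in a
 * three-address, assign-once style. Returns the final value of n.
 */
int loop_body_three_address(int i) {
    const int t1 = i + 1;   /* corresponds to %17 = add nsw i32 %16, 1 */
    const int t2 = t1 + 2;  /* corresponds to %19 = add nsw i32 %18, 2 */
    return t2;
}
```

In the unoptimized IR the values still travel through the stack slot %3 with store and load between the two additions; once those slots are promoted to registers, what remains is essentially the two add instructions chained as above.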
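CSA is part of Clang, so once LLVM and Clang are installed CSA is already there, along with a number of built-in checkers. Run clang -cc1 -analyzer-checker-help to list them. The official documentation also provides a checker list that you can compare against.
In the LLVM project, the file Checkers.td under clang/include/clang/StaticAnalyzer/Checkers holds the checker descriptions.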
OVERVIEW: Clang Static Analyzer Checkers List
USAGE: -analyzer-checker <CHECKER or PACKAGE,...>
CHECKERS:
core.CallAndMessage Check for logical errors for function calls and Objective-C message expressions (e.g., uninitialized arguments, null function pointers)
core.DivideZero Check for division by zero
core.NonNullParamChecker Check for null pointers passed as arguments to a function whose arguments are references or marked with the 'nonnull' attribute
core.NullDereference Check for dereferences of null pointers
core.StackAddressEscape Check that addresses to stack memory do not escape the function
core.UndefinedBinaryOperatorResult
Check for undefined results of binary operators
core.VLASize Check for declarations of VLA of undefined or zero size
core.uninitialized.ArraySubscript
Check for uninitialized values used as array subscripts
core.uninitialized.Assign Check for assigning uninitialized values
core.uninitialized.Branch Check for uninitialized values used as branch conditions
core.uninitialized.CapturedBlockVariable
Check for blocks that capture uninitialized values
core.uninitialized.UndefReturn Check for uninitialized values being returned to the caller
cplusplus.InnerPointer Check for inner pointers of C++ containers used after re/deallocation
cplusplus.Move Find use-after-move bugs in C++
cplusplus.NewDelete Check for double-free and use-after-free problems. Traces memory managed by new/delete.
cplusplus.NewDeleteLeaks Check for memory leaks. Traces memory managed by new/delete.
cplusplus.PlacementNew Check if default placement new is provided with pointers to sufficient storage capacity
cplusplus.PureVirtualCall Check pure virtual function calls during construction/destruction
deadcode.DeadStores Check for values stored to variables that are never read afterwards
fuchsia.HandleChecker A Checker that detect leaks related to Fuchsia handles
nullability.NullPassedToNonnull
Warns when a null pointer is passed to a pointer which has a _Nonnull type.
nullability.NullReturnedFromNonnull
Warns when a null pointer is returned from a function that has _Nonnull return type.
nullability.NullableDereferenced
Warns when a nullable pointer is dereferenced.
nullability.NullablePassedToNonnull
Warns when a nullable pointer is passed to a pointer which has a _Nonnull type.
nullability.NullableReturnedFromNonnull
Warns when a nullable pointer is returned from a function that has _Nonnull return type.
optin.cplusplus.UninitializedObject
Reports uninitialized fields after object construction
optin.cplusplus.VirtualCall Check virtual function calls during construction/destruction
optin.mpi.MPI-Checker Checks MPI code
optin.osx.OSObjectCStyleCast Checker for C-style casts of OSObjects
optin.osx.cocoa.localizability.EmptyLocalizationContextChecker
Check that NSLocalizedString macros include a comment for context
optin.osx.cocoa.localizability.NonLocalizedStringChecker
Warns about uses of non-localized NSStrings passed to UI methods expecting localized NSStrings
optin.performance.GCDAntipattern
Check for performance anti-patterns when using Grand Central Dispatch
optin.performance.Padding Check for excessively padded structs.
optin.portability.UnixAPI Finds implementation-defined behavior in UNIX/Posix functions
osx.API Check for proper uses of various Apple APIs
osx.MIG Find violations of the Mach Interface Generator calling convention
osx.NumberObjectConversion Check for erroneous conversions of objects representing numbers into numbers
osx.OSObjectRetainCount Check for leaks and improper reference count management for OSObject
osx.ObjCProperty Check for proper uses of Objective-C properties
osx.SecKeychainAPI Check for proper uses of Secure Keychain APIs
osx.cocoa.AtSync Check for nil pointers used as mutexes for @synchronized
osx.cocoa.AutoreleaseWrite Warn about potentially crashing writes to autoreleasing objects from different autoreleasing pools in Objective-C
osx.cocoa.ClassRelease Check for sending 'retain', 'release', or 'autorelease' directly to a Class
osx.cocoa.Dealloc Warn about Objective-C classes that lack a correct implementation of -dealloc
osx.cocoa.IncompatibleMethodTypes
Warn about Objective-C method signatures with type incompatibilities
osx.cocoa.Loops Improved modeling of loops using Cocoa collection types
osx.cocoa.MissingSuperCall Warn about Objective-C methods that lack a necessary call to super
osx.cocoa.NSAutoreleasePool Warn for suboptimal uses of NSAutoreleasePool in Objective-C GC mode
osx.cocoa.NSError Check usage of NSError** parameters
osx.cocoa.NilArg Check for prohibited nil arguments to ObjC method calls
osx.cocoa.NonNilReturnValue Model the APIs that are guaranteed to return a non-nil value
osx.cocoa.ObjCGenerics Check for type errors when using Objective-C generics
osx.cocoa.RetainCount Check for leaks and improper reference count management
osx.cocoa.RunLoopAutoreleaseLeak
Check for leaked memory in autorelease pools that will never be drained
osx.cocoa.SelfInit Check that 'self' is properly initialized inside an initializer method
osx.cocoa.SuperDealloc Warn about improper use of '[super dealloc]' in Objective-C
osx.cocoa.UnusedIvars Warn about private ivars that are never used
osx.cocoa.VariadicMethodTypes Check for passing non-Objective-C types to variadic collection initialization methods that expect only Objective-C types
osx.coreFoundation.CFError Check usage of CFErrorRef* parameters
osx.coreFoundation.CFNumber Check for proper uses of CFNumber APIs
osx.coreFoundation.CFRetainRelease
Check for null arguments to CFRetain/CFRelease/CFMakeCollectable
osx.coreFoundation.containers.OutOfBounds
Checks for index out-of-bounds when using 'CFArray' API
osx.coreFoundation.containers.PointerSizedValues
Warns if 'CFArray', 'CFDictionary', 'CFSet' are created with non-pointer-size values
security.FloatLoopCounter Warn on using a floating point value as a loop counter (CERT: FLP30-C, FLP30-CPP)
security.insecureAPI.DeprecatedOrUnsafeBufferHandling
Warn on uses of unsecure or deprecated buffer manipulating functions
security.insecureAPI.UncheckedReturn
Warn on uses of functions whose return values must be always checked
security.insecureAPI.bcmp Warn on uses of the 'bcmp' function
security.insecureAPI.bcopy Warn on uses of the 'bcopy' function
security.insecureAPI.bzero Warn on uses of the 'bzero' function
security.insecureAPI.decodeValueOfObjCType
Warn on uses of the '-decodeValueOfObjCType:at:' method
security.insecureAPI.getpw Warn on uses of the 'getpw' function
security.insecureAPI.gets Warn on uses of the 'gets' function
security.insecureAPI.mkstemp Warn when 'mkstemp' is passed fewer than 6 X's in the format string
security.insecureAPI.mktemp Warn on uses of the 'mktemp' function
security.insecureAPI.rand Warn on uses of the 'rand', 'random', and related functions
security.insecureAPI.strcpy Warn on uses of the 'strcpy' and 'strcat' functions
security.insecureAPI.vfork Warn on uses of the 'vfork' function
unix.API Check calls to various UNIX/Posix functions
unix.Malloc Check for memory leaks, double free, and use-after-free problems. Traces memory managed by malloc()/free().
unix.MallocSizeof Check for dubious malloc arguments involving sizeof
unix.MismatchedDeallocator Check for mismatched deallocators.
unix.Vfork Check for proper usage of vfork
unix.cstring.BadSizeArg Check the size argument passed into C string functions for common erroneous patterns
unix.cstring.NullArg Check for null pointers being passed as arguments to C string functions
valist.CopyToSelf Check for va_lists which are copied onto itself.
valist.Uninitialized Check for usages of uninitialized (or already released) va_lists.
valist.Unterminated Check for va_lists which are not released by a va_end call.
webkit.NoUncountedMemberChecker
Check for no uncounted member variables.
webkit.RefCntblBaseVirtualDtor Check for any ref-countable base class having virtual destructor.
webkit.UncountedLambdaCapturesChecker
Check uncounted lambda captures.
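The source code of these checkers is under lib/StaticAnalyzer. The debug.DumpCFG checker was used earlier to view the program's CFG.
To check a single source file (a small single-file program), run clang --analyze -Xanalyzer -analyzer-checker= followed by the checker or package name and then the source file.
Here I try out the DivideZero checker (core.DivideZero), whose help text says "Check for division by zero". Its source is probably DivZeroChecker.cpp (my own guess, which may be wrong). The official test case is: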
int fooPR10616 (int qX ) {
  int a, c, d;
  d = (qX-1);
  while ( d != 0 ) {
    d = c - (c/d) * d;
  }
  return (a % (qX-1)); // expected-warning {{Division by zero}}
}
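According to the comment, CSA should report a warning on (a % (qX-1)). The only divisions in this function are c / d and a % (qX - 1). The former sits inside the while (d != 0) loop, so d is known to be non-zero there and it escapes. The latter runs after the loop; on the path where the loop body is never entered, d (that is, qX-1) is 0, so the modulus divides by zero. I ran clang -cc1 -analyze -analyzer-checker=core.DivideZero div-zero.c, and CSA reported: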
div-zero.c:9:13: warning: Division by zero [core.DivideZero]
return (a % (qX-1)); // expected-warning {{Division by zero}}
~~^~~~~~~~
1 warning generated.
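For comparison, here is a minimal sketch of how the flagged statement could be guarded so that the divisor is checked before use. The function name guarded_mod and the choice to return 0 when the divisor is zero are assumptions made for this example, not part of the official test case.

```c
/*
 * Guarded variant of the final statement of fooPR10616.
 * Returning 0 for a zero divisor is an arbitrary choice for this sketch.
 */
int guarded_mod(int a, int qX) {
    int divisor = qX - 1;
    if (divisor == 0)
        return 0;           /* the path the analyzer complained about */
    return a % divisor;     /* divisor is known to be non-zero here */
}
```

On the path that reaches the % operator, the constraint divisor != 0 has already been recorded, which is the same reason c / d inside the while loop is not reported.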
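Reference documents (some are based on versions around 4.0.0; I use 12.0.0, where some of the code has changed and the corresponding libraries differ accordingly):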
clang-developer-manual
How to Write a Checker in 24 Hours(slide)
clang-analyzer-guide-v0.1-pdf
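First, everything related to CSA lives in the following directories of the LLVM project (12.0.0 in my case):
include/clang/StaticAnalyzer
lib/StaticAnalyzer: contains the source code of the existing checkers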
test/Analysis
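Here I use the MainCallChecker example from the clang-analyzer-guide. The goal of this checker is to find code that violates the following rule: main shall not be used within a program. In other words, main must not be called recursively (a program always contains main, but nothing inside main should call main again). This looks easy: it seems enough to find function call statements and check whether the callee's name is main (AST matching). But that overlooks function pointers.
typedef int (*main_t)(int, char**);
int main (int argc , char** argv) {
    main_t foo = main ;
    int exit_code = foo(argc, argv); // actually calls main ()!
    return exit_code ;
}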
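In the code above, main_t is a user-defined function pointer type for functions that return int and take an int as the first parameter and a char** as the second. The program defines a function pointer variable foo that points to main and then calls main through foo. So main is called and a warning should be raised, but simple AST matching (a syntax-based check) cannot detect this.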
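The limitation can be shown with a toy model of a syntax-based check. The sketch below assumes the callee spellings at each call site have already been collected into an array; the function name and that input format are hypothetical and have nothing to do with the real Clang AST API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical syntax-only check: given the callee names as they are
 * spelled at each call site, report whether any call is spelled "main".
 */
int has_call_spelled_main(const char *const *callees, size_t count) {
    for (size_t i = 0; i < count; i++)
        if (strcmp(callees[i], "main") == 0)
            return 1;   /* a direct call such as main(argc, argv) */
    return 0;           /* calls through other names are not seen */
}
```

For the program above the only call site is spelled foo, so a check like this stays silent even though foo points to main. A path-sensitive checker instead asks which function the callee value refers to at the call.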
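I won't paste the checker code here, since the official demo provides it. There are three ways to run a checker you have written.
Static: link it statically into Clang. This requires modifying Checkers.td and rebuilding Clang, which is cumbersome, but after the rebuild the custom checker is built in.
Dynamic: compile the custom checker as a separate module into a shared library (.so file) and load it at invocation time with clang -cc1 -load Checker.so (where Checker.so is the custom checker). This is the approach I chose.
Standalone: write a separate program based on LibTooling, so that the finished checker is not integrated back into Clang and runs as an independent program.
Here are some of the official MainCallChecker demos.
There is also another custom Clang plugin project, which includes both a standalone version and a dynamically loaded version of the plugin.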