A First Taste of Compiling with LLVM

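For everyday developers, the LLVM project offers a growing set of tools beyond the compiler itself. One example is the Clang Static Analyzer, a Clang subproject for static code checking that can reuse a project's existing Makefile to produce analysis reports in HTML.
For developers interested in compiler technology, LLVM has several advantages: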

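  • Modern design
    • LLVM's design is highly modular, which keeps its code clearer and makes problems easier to track down.
  • Language-independent intermediate code
    • On one hand, LLVM can link code written in different languages together, and it can integrate closely with IDEs.
    • On the other hand, distributing intermediate code instead of object code lets the target system reach its full potential without hurting debuggability (that is, object code is generated on the target system for its own hardware, yet line-level debugging can still go through the intermediate code).
  • Usable as tools and libraries
    • LLVM's tools make it fairly easy to implement an optimizing compiler or VM for a new programming language, or to add better optimization and debugging features to an existing one.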

Compilation flow

#import <UIKit/UIKit.h>
#import "AppDelegate.h"

int main(int argc, char * argv[]) {
    NSString * appDelegateClassName;
    @autoreleasepool {
        // Setup code that might create autoreleased objects goes here.
        appDelegateClassName = NSStringFromClass([AppDelegate class]);
    }
    return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}

cd into the project directory and run the following command:

clang -ccc-print-phases main.m

This prints the phases a source file goes through: 0 input, 1 preprocessor, 2 compiler, 3 backend, 4 assembler, 5 linker, 6 bind-arch.

              +- 0: input, "main.m", objective-c
            +- 1: preprocessor, {0}, objective-c-cpp-output
         +- 2: compiler, {1}, ir
      +- 3: backend, {2}, assembler
   +- 4: assembler, {3}, object
+- 5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

This gives an overview of the key steps in the whole process.
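
Each phase in the listing can also be run on its own by stopping clang at that stage. The following is a minimal sketch, not the author's exact workflow: it assumes a `clang` on the PATH and uses a hypothetical plain C file, `hello.c`, instead of `main.m` so that no SDK headers are needed.

```shell
#!/bin/sh
# Sketch: run the pipeline one stage at a time on a tiny C file.
# hello.c and the temporary directory are illustrative assumptions.
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#define ANSWER 42
int answer(void) { return ANSWER; }
EOF

if command -v clang >/dev/null 2>&1; then
    clang -E "$workdir/hello.c" -o "$workdir/hello.i"              # phase 1: preprocess only
    clang -S -emit-llvm "$workdir/hello.i" -o "$workdir/hello.ll"  # phase 2: compile to LLVM IR
    clang -S "$workdir/hello.ll" -o "$workdir/hello.s"             # phase 3: backend, IR to assembly
    clang -c "$workdir/hello.s" -o "$workdir/hello.o"              # phase 4: assemble to an object file
    ls -l "$workdir"
else
    echo "clang not found; skipping" >&2
fi
```

The linker phase is left out here because the sketch has no `main` function; each intermediate file can be opened to see what that stage produced.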
To see how the Objective-C code is implemented underneath, the following command rewrites it into C++ source (main.cpp):

clang -rewrite-objc main.m

To see the commands clang runs internally, use the -### option, which prints them without executing them:

clang -### main.m -o main
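
As a small illustration of what `-###` reports, the sketch below captures the driver's plan and picks out the frontend invocation. It assumes a `clang` on the PATH; `answer.c` is a hypothetical stand-in for `main.m`.

```shell
#!/bin/sh
# Sketch: -### prints the planned commands to stderr and does not run them.
# answer.c is an illustrative file, not part of the original project.
workdir=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$workdir/answer.c"

if command -v clang >/dev/null 2>&1; then
    driver_plan=$(clang -### -c "$workdir/answer.c" -o "$workdir/answer.o" 2>&1)
    # The compile step shows up as a "-cc1" frontend invocation.
    echo "$driver_plan" | grep -- '-cc1'
else
    echo "clang not found; skipping" >&2
fi
```

Because nothing is executed, no `answer.o` is created; the output is only the command lines the driver would have run.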

To follow the whole process step by step, start with -E, which shows what clang does during preprocessing:

clang -E main.m

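Running the command prints the preprocessed source to the console. Its tail is shown below; note that this output comes from a command-line main.m that imports Foundation and calls NSLog, not the UIApplicationMain template shown earlier: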


# 1 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/FoundationLegacySwiftCompatibility.h" 1 3
# 187 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 2 3
# 9 "main.m" 2

int main(int argc, const char * argv[]) {
    @autoreleasepool {

        NSLog(@"Hello, World!");
    }
    return 0;
}

Preprocessing covers macro expansion and header inclusion.
The following directives are also handled at this stage:

  • "#define"
  • "#include"
  • "#ifndef"
  • "#pragma"
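
To make the effect of these directives visible, here is a minimal sketch that preprocesses a tiny file twice, once as is and once with a macro defined on the command line. It assumes a `clang` on the PATH; `config.c`, `GREETING`, `LEVEL` and `VERBOSE` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: watch #define and #ifndef being resolved by the preprocessor.
# All file and macro names here are illustrative assumptions.
workdir=$(mktemp -d)
cat > "$workdir/config.c" <<'EOF'
#define GREETING "Hello, World!"
#ifndef VERBOSE
#define LEVEL 0
#else
#define LEVEL 1
#endif
const char *greeting = GREETING;
int level = LEVEL;
EOF

if command -v clang >/dev/null 2>&1; then
    # -P drops the "# 1 ..." line markers so only the expanded code remains.
    plain=$(clang -E -P "$workdir/config.c")
    # Defining VERBOSE on the command line selects the #else branch.
    verbose=$(clang -E -P -DVERBOSE "$workdir/config.c")
    printf 'without -DVERBOSE:\n%s\n\nwith -DVERBOSE:\n%s\n' "$plain" "$verbose"
else
    echo "clang not found; skipping" >&2
fi
```

In both outputs the macros are gone and only their expansions remain, which is the same substitution that happens to `main.m` before lexing starts.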

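After preprocessing, clang performs lexical analysis, splitting the code into individual tokens such as parentheses, braces, equals signs, and string literals.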

clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

The token dump printed to the console:

annot_module_include '#import 

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        /'      Loc=
int 'int'    [StartOfLine]  Loc=
identifier 'main'    [LeadingSpace] Loc=
l_paren '('     Loc=
int 'int'       Loc=
identifier 'argc'    [LeadingSpace] Loc=
comma ','       Loc=
const 'const'    [LeadingSpace] Loc=
char 'char'  [LeadingSpace] Loc=
star '*'     [LeadingSpace] Loc=
identifier 'argv'    [LeadingSpace] Loc=
l_square '['        Loc=
r_square ']'        Loc=
r_paren ')'     Loc=
l_brace '{'  [LeadingSpace] Loc=
at '@'   [StartOfLine] [LeadingSpace]   Loc=
identifier 'autoreleasepool'        Loc=
l_brace '{'  [LeadingSpace] Loc=
identifier 'NSLog'   [StartOfLine] [LeadingSpace]   Loc=
l_paren '('     Loc=
at '@'      Loc=
string_literal '"Hello, World!"'        Loc=
r_paren ')'     Loc=
semi ';'        Loc=
r_brace '}'  [StartOfLine] [LeadingSpace]   Loc=
return 'return'  [StartOfLine] [LeadingSpace]   Loc=
numeric_constant '0'     [LeadingSpace] Loc=
semi ';'        Loc=
r_brace '}'  [StartOfLine]  Loc=
eof ''      Loc=
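
The first column of each dump line is the token kind, so the dump can be summarized with standard text tools. A rough sketch, assuming a `clang` on the PATH and a hypothetical one-line C file `tokens.c`:

```shell
#!/bin/sh
# Sketch: count how many tokens of each kind the lexer produced.
# tokens.c is an illustrative file, not part of the original project.
workdir=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$workdir/tokens.c"

if command -v clang >/dev/null 2>&1; then
    # -dump-tokens writes to stderr, so redirect it into the pipe.
    token_counts=$(clang -fsyntax-only -Xclang -dump-tokens "$workdir/tokens.c" 2>&1 \
        | awk '{ print $1 }' | sort | uniq -c | sort -rn)
    echo "$token_counts"
else
    echo "clang not found; skipping" >&2
fi
```

The same pipeline works on `main.m`, though the counts there are dominated by whatever the imported headers contribute unless modules are enabled.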

Next comes parsing, which checks that the syntax is valid and then assembles all the nodes into an abstract syntax tree (AST).

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

Output of the command:

TranslationUnitDecl 0x7f9bbd83d008 <>  
|-TypedefDecl 0x7f9bbd83d8c0 <>  implicit __int128_t '__int128'
| `-BuiltinType 0x7f9bbd83d5a0 '__int128'
|-TypedefDecl 0x7f9bbd83d930 <>  implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7f9bbd83d5c0 'unsigned __int128'
|-TypedefDecl 0x7f9bbd83d9d8 <>  implicit SEL 'SEL *'
| `-PointerType 0x7f9bbd83d990 'SEL *' imported
|   `-BuiltinType 0x7f9bbd83d800 'SEL'
|-TypedefDecl 0x7f9bbd83dab8 <>  implicit id 'id'
| `-ObjCObjectPointerType 0x7f9bbd83da60 'id' imported
|   `-ObjCObjectType 0x7f9bbd83da30 'id' imported
|-TypedefDecl 0x7f9bbd83db98 <>  implicit Class 'Class'
| `-ObjCObjectPointerType 0x7f9bbd83db40 'Class' imported
|   `-ObjCObjectType 0x7f9bbd83db10 'Class' imported
|-ObjCInterfaceDecl 0x7f9bbd83dbf0 <>  implicit Protocol
|-TypedefDecl 0x7f9bbd83df90 <>  implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7f9bbd83dd60 'struct __NSConstantString_tag'
|   `-Record 0x7f9bbd83dcc0 '__NSConstantString_tag'
|-TypedefDecl 0x7f9bbd87e648 <>  implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7f9bbd87e600 'char *'
|   `-BuiltinType 0x7f9bbd83d0a0 'char'
|-TypedefDecl 0x7f9bbd87e958 <>  implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7f9bbd87e900 'struct __va_list_tag [1]' 1 
|   `-RecordType 0x7f9bbd87e740 'struct __va_list_tag'
|     `-Record 0x7f9bbd87e6a0 '__va_list_tag'
|-ImportDecl 0x7f9bbdc1f198  col:1 implicit Foundation
`-FunctionDecl 0x7f9bbdc1f470  line:10:5 main 'int (int, const char **)'
  |-ParmVarDecl 0x7f9bbdc1f1f0  col:14 argc 'int'
  |-ParmVarDecl 0x7f9bbdc1f320  col:33 argv 'const char **':'const char **'
  `-CompoundStmt 0x7f9bbdc45e18 
    |-ObjCAutoreleasePoolStmt 0x7f9bbdc45dd0 
    | `-CompoundStmt 0x7f9bbdc45db8 
    |   `-CallExpr 0x7f9bbdc45d78  'void'
    |     |-ImplicitCastExpr 0x7f9bbdc45d60  'void (*)(id, ...)' 
    |     | `-DeclRefExpr 0x7f9bbdc45c58  'void (id, ...)' Function 0x7f9bbdc1f5b8 'NSLog' 'void (id, ...)'
    |     `-ImplicitCastExpr 0x7f9bbdc45da0  'id':'id' 
    |       `-ObjCStringLiteral 0x7f9bbdc45ce0  'NSString *'
    |         `-StringLiteral 0x7f9bbdc45cb8  'char [14]' lvalue "Hello, World!"
    `-ReturnStmt 0x7f9bbdc45e08 
      `-IntegerLiteral 0x7f9bbdc45de8  'int' 0
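
A full dump is long, so it can help to look only at the node kinds of interest. The sketch below filters the dump with grep; it assumes a `clang` on the PATH, and `answer.c` is a hypothetical plain C file used so that no SDK headers are needed.

```shell
#!/bin/sh
# Sketch: dump the AST of a tiny function and keep only a few node kinds.
# answer.c is an illustrative file, not part of the original project.
workdir=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$workdir/answer.c"

if command -v clang >/dev/null 2>&1; then
    ast_nodes=$(clang -fsyntax-only -Xclang -ast-dump "$workdir/answer.c" 2>/dev/null \
        | grep -E 'FunctionDecl|CompoundStmt|ReturnStmt|IntegerLiteral')
    echo "$ast_nodes"
else
    echo "clang not found; skipping" >&2
fi
```

The surviving lines show the same nesting as in the listing for `main.m`: a FunctionDecl containing a CompoundStmt, which contains a ReturnStmt with an IntegerLiteral.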

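Once these steps are done, IR generation can begin. CodeGen walks the abstract syntax tree from the top down and translates it step by step into LLVM IR. The IR is both the output of the frontend and the input to the backend.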

clang -S -fobjc-arc -emit-llvm main.m -o main.ll

Result:

; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx12.0.0"

%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }

@__CFConstantStringClassReference = external global [0 x i32]
@.str = private unnamed_addr constant [14 x i8] c"Hello, World!\00", section "__TEXT,__cstring,cstring_literals", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* @__CFConstantStringClassReference, i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0), i64 13 }, section "__DATA,__cfstring", align 8 #0

; Function Attrs: noinline optnone ssp uwtable
define i32 @main(i32 %0, i8** %1) #1 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = call i8* @llvm.objc.autoreleasePoolPush() #2
  notail call void (i8*, ...) @NSLog(i8* bitcast (%struct.__NSConstantString_tag* @_unnamed_cfstring_ to i8*))
  call void @llvm.objc.autoreleasePoolPop(i8* %6)
  ret i32 0
}

; Function Attrs: nounwind
declare i8* @llvm.objc.autoreleasePoolPush() #2

declare void @NSLog(i8*, ...) #3

; Function Attrs: nounwind
declare void @llvm.objc.autoreleasePoolPop(i8*) #2

attributes #0 = { "objc_arc_inert" }
attributes #1 = { noinline optnone ssp uwtable "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 12, i32 1]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 13.0.0 (clang-1300.0.29.30)"}

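At this point LLVM performs its optimizations. The optimization level (-O1, -O3, -Os) can also be set in Xcode's build settings, and you can write passes of your own.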

 clang -O3 -S -fobjc-arc -emit-llvm main.m -o main_O3.ll

Result:

; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx12.0.0"

%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }

@__CFConstantStringClassReference = external global [0 x i32]
@.str = private unnamed_addr constant [14 x i8] c"Hello, World!\00", section "__TEXT,__cstring,cstring_literals", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* @__CFConstantStringClassReference, i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([14 x i8], [14 x i8]* @.str, i32 0, i32 0), i64 13 }, section "__DATA,__cfstring", align 8 #0

; Function Attrs: ssp uwtable
define i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #1 {
  %3 = tail call i8* @llvm.objc.autoreleasePoolPush() #2
  notail call void (i8*, ...) @NSLog(i8* bitcast (%struct.__NSConstantString_tag* @_unnamed_cfstring_ to i8*)), !clang.arc.no_objc_arc_exceptions !9
  tail call void @llvm.objc.autoreleasePoolPop(i8* %3) #2
  ret i32 0
}

; Function Attrs: nounwind
declare i8* @llvm.objc.autoreleasePoolPush() #2

declare void @NSLog(i8*, ...) local_unnamed_addr #3

; Function Attrs: nounwind
declare void @llvm.objc.autoreleasePoolPop(i8*) #2

attributes #0 = { "objc_arc_inert" }
attributes #1 = { ssp uwtable "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 12, i32 1]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 1, !"Objective-C Garbage Collection", i8 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 13.0.0 (clang-1300.0.29.30)"}
!9 = !{}
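
One visible difference between the two listings is that the stack slots (`alloca`) of the unoptimized `main` are gone at -O3. The sketch below measures that on a smaller function; it assumes a `clang` on the PATH, and `add.c` is a hypothetical file used only for illustration.

```shell
#!/bin/sh
# Sketch: compare the number of alloca lines in the IR at -O0 and -O3.
# add.c is an illustrative file, not part of the original project.
workdir=$(mktemp -d)
printf 'int add(int a, int b) { return a + b; }\n' > "$workdir/add.c"

if command -v clang >/dev/null 2>&1; then
    clang -O0 -S -emit-llvm "$workdir/add.c" -o "$workdir/add_O0.ll"
    clang -O3 -S -emit-llvm "$workdir/add.c" -o "$workdir/add_O3.ll"
    # grep -c prints the number of matching lines (0 when there are none).
    o0_allocas=$(grep -c 'alloca' "$workdir/add_O0.ll" || true)
    o3_allocas=$(grep -c 'alloca' "$workdir/add_O3.ll" || true)
    echo "alloca lines at -O0: $o0_allocas, at -O3: $o3_allocas"
else
    echo "clang not found; skipping" >&2
fi
```

At -O0 each parameter is spilled to its own stack slot, while the optimized IR keeps the values in SSA registers, so the second count drops to zero.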


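A pass is one node in LLVM's optimization work. Each pass performs some optimization or transformation, and together the passes carry out all of LLVM's optimizations and transformations.
If Bitcode is enabled, LLVM optimizes the code further, and the resulting bitcode can then be consumed by the backend for each target architecture.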

clang -emit-llvm -c main.m -o main.bc


Generating assembly

clang -S -fobjc-arc main.m -o main.s

Assembly output:

.section __TEXT,__text,regular,pure_instructions
.build_version macos, 12, 0 sdk_version 12, 1
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc

## %bb.0:

pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq    %rsp, %rbp
.cfi_def_cfa_register %rbp
subq    $32, %rsp
movl    $0, -4(%rbp)
movl    %edi, -8(%rbp)
movq    %rsi, -16(%rbp)
callq   _objc_autoreleasePoolPush
movq    %rax, -24(%rbp)                 ## 8-byte Spill
leaq    L__unnamed_cfstring_(%rip), %rdi
movb    $0, %al
callq   _NSLog
movq    -24(%rbp), %rdi                 ## 8-byte Reload
callq   _objc_autoreleasePoolPop
xorl    %eax, %eax
addq    $32, %rsp
popq    %rbp
retq
.cfi_endproc
                                    ## -- End function
.section    __TEXT,__cstring,cstring_literals

L_.str: ## @.str
.asciz "Hello, World!"

.section    __DATA,__cfstring
.p2align    3                               ## @_unnamed_cfstring_

L__unnamed_cfstring_:
.quad __CFConstantStringClassReference
.long 1992 ## 0x7c8
.space 4
.quad L_.str
.quad 13 ## 0xd

.section    __DATA,__objc_imageinfo,regular,no_dead_strip

L_OBJC_IMAGE_INFO:
.long 0
.long 64

.subsections_via_symbols

Generating the object file

clang -fmodules -c main.m -o main.o

Result:

(The object file is binary, so printing it as text is mostly unreadable. The recognizable fragments are section names such as __text, __cstring, __cfstring, __objc_imageinfo, __compact_unwind and __eh_frame, linker options for frameworks such as Foundation and CoreFoundation, and symbol names such as _main, _NSLog, _objc_autoreleasePoolPush, _objc_autoreleasePoolPop and ___CFConstantStringClassReference.)
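
Since an object file is binary, a symbol-table tool is usually a more readable way to inspect it than printing it as text. A minimal sketch, assuming `clang` and `nm` are both on the PATH; `answer.c` is a hypothetical plain C file standing in for `main.m`.

```shell
#!/bin/sh
# Sketch: list the symbols of an object file with nm.
# answer.c is an illustrative file, not part of the original project.
workdir=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$workdir/answer.c"

if command -v clang >/dev/null 2>&1 && command -v nm >/dev/null 2>&1; then
    clang -c "$workdir/answer.c" -o "$workdir/answer.o"
    # Defined functions are marked T; undefined references are marked U.
    symbols=$(nm "$workdir/answer.o")
    echo "$symbols"
else
    echo "clang or nm not found; skipping" >&2
fi
```

Run against `main.o`, the same command would be expected to list `_main` as defined and `_NSLog` and the autorelease pool functions as undefined, to be resolved by the linker.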

Generating the executable

clang main.o -o main


Run it:

./main

Output:

main[99036:29849196] Hello, World!

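The complete build goes through the following steps:
- Write the build information to auxiliary files and create the .app bundle structure.
- Process the packaging information.
- Run the CocoaPods pre-build script that checks the Pods Manifest.lock.
- Compile the .m files, using CompileC and the clang command.
- Link the frameworks the program needs.
- Compile xib files.
- Copy resource files.
- Compile image assets.
- Process Info.plist.
- Run the CocoaPods scripts.
- Copy the standard libraries.
- Create the .app file and sign it.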

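clang command-line options
- -x: specifies the language of the input files that follow, for example Objective-C.
- -arch: specifies the architecture to compile for, for example armv7.
- -f: options beginning with -f turn compiler features on or off, including ones that affect diagnostics and code analysis, for example -fobjc-arc and -fmodules.
- -W: options beginning with -W customize compiler warnings, for example -Wall. The comma-separated forms -Wl, -Wa and -Wp pass arguments on to the linker, assembler and preprocessor.
- -D: options beginning with -D define preprocessor macros, which can be used for conditional compilation.
- -I: adds a directory to the header search path.
- -F: adds a directory to the framework search path (the framework to link is named with -framework).
- -c: runs preprocessing, compilation and assembly, without linking.
- -o: writes the result to the specified file.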
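
Several of these options are typically combined in one invocation. The following sketch is illustrative only: it assumes a `clang` on the PATH, and the file names `scaled.txt` and `scale.h` and the macros `SCALE` and `FACTOR` are hypothetical.

```shell
#!/bin/sh
# Sketch: combine -x, -I, -D, -c and -o in a single compile command.
# All file and macro names here are illustrative assumptions.
workdir=$(mktemp -d)
mkdir -p "$workdir/include"
printf '#define SCALE 3\n' > "$workdir/include/scale.h"
cat > "$workdir/scaled.txt" <<'EOF'
#include "scale.h"
int scaled(void) { return SCALE * FACTOR; }
EOF

if command -v clang >/dev/null 2>&1; then
    # -x c : treat scaled.txt as C source despite its extension
    # -I   : add the include directory to the header search path
    # -D   : define FACTOR as 2 for the preprocessor
    # -c   : stop after assembling, without linking
    # -o   : choose the output file name
    clang -x c -I "$workdir/include" -DFACTOR=2 -c "$workdir/scaled.txt" -o "$workdir/scaled.o"
    ls -l "$workdir/scaled.o"
else
    echo "clang not found; skipping" >&2
fi
```

Without `-x c` the driver would not recognize the `.txt` file as source, and without `-I` or `-D` the include and the `FACTOR` macro would fail to resolve.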


