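1. Analyzing the main function
Many of my earlier analyses started from the main.m file, but I never looked into why an app's launch goes through this file in the first place. To settle that question, create a new single-view project; the generated main.m looks like this: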
#import <UIKit/UIKit.h>
#import "AppDelegate.h"
int main(int argc, char * argv[]) {
NSString * appDelegateClassName;
@autoreleasepool {
// Setup code that might create autoreleased objects goes here.
appDelegateClassName = NSStringFromClass([AppDelegate class]);
}
return UIApplicationMain(argc, argv, nil, appDelegateClassName);
}
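Anyone who has learned C/C++ knows that main is the entry point of the whole program. Those C/C++ programs, however, were command-line programs, not apps.
The source shows that main calls UIApplicationMain, and that the arguments of UIApplicationMain include the name of the AppDelegate class, so this class gets used. Roughly speaking, main hands AppDelegate to UIApplicationMain, and that is what starts the app.
This raises a question.
From earlier experiments with method swizzling I knew about two methods, load and initialize. The usual explanation is that load is called once, and only once, when the class is loaded, while initialize is called when the class is first initialized, and that is about all it says. initialize needs no test here: it is only sent when the class is first used, which in this project happens after main has started. Whether load runs before or after main needs to be verified, so add a load method to ViewController: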
+(void)load
{
NSLog(@"-----viewcontroller's load");
}
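Set breakpoints in both load and main, then run.
It turns out that ViewController's load method is called first and main is called afterwards. The call stacks at both breakpoints contain dyld frames, so it is fairly safe to conclude that dyld runs before main executes, that is, before the app starts. The library involved can even be identified by name: libdyld.dylib.
One more point is worth checking. Since main.m is a mixed-language file, can a C/C++ function in this file be called before main? This is common in C/C++, so let's try it: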
void func1(){
printf("func1's load");
}
__attribute__((constructor)) void func2(){
printf("func2's load");
}
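Add these to main.m outside the main function and set breakpoints. func1 is never called, while func2 is called after load and before main. The difference is the __attribute__((constructor)) annotation on the function. According to the GNU (GCC) documentation, a function declared with __attribute__((constructor)) runs before main, and a function declared with __attribute__((destructor)) runs after main has finished.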
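The ordering can be reproduced outside an app project with a small C sketch. The names events, record_event, plain_function, early_hook and late_hook are made up for this illustration, and the attributes are GCC/Clang extensions rather than standard C.

```c
#include <stdio.h>

/* Hypothetical event log used only for this illustration. */
static const char *events[8];
static int event_count;

/* Remember the name of a function that has run. */
static void record_event(const char *name) {
    if (event_count < 8) {
        events[event_count++] = name;
    }
}

/* An ordinary function: it never runs unless something calls it. */
void plain_function(void) {
    record_event("plain_function");
}

/* Runs before main() because of the constructor attribute. */
__attribute__((constructor)) static void early_hook(void) {
    record_event("early_hook");
}

/* Runs after main() has returned, or when exit() is called. */
__attribute__((destructor)) static void late_hook(void) {
    fprintf(stderr, "late_hook ran after %d recorded event(s)\n", event_count);
}
```

If main() inspects events on entry, it should find only early_hook recorded: plain_function is never called, and late_hook has not run yet.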
At this point a rough summary is possible:
The app loading flow is roughly: app launch -> dyld loading -> +load methods called -> constructor-type C/C++ functions called -> main called.
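2. Analyzing dyld
The library files that iOS development can reference come with the suffixes .a, .dylib, .tbd and .framework, and fall into two kinds: static libraries and dynamic libraries.
Static library: copied into the executable at link time; if it is used several times, there are several copies.
Dynamic library: not copied at link time. The system loads it into memory when the program runs, loads it only once, and shares it between programs (for example system libraries such as UIKit.framework), which saves memory.
All of this concerns linking; for the full compile-and-run pipeline see the earlier post 查漏补缺 ("filling in the gaps"). Combined with the app loading flow above, the "dyld loading" step must be where the required libraries are brought into memory. Without it there would be no libraries to reference and the app could not run.
So what exactly is dyld?
dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. Once the kernel has finished preparing the process, it hands over to dyld for the remaining work. dyld is also open source: anyone can download the source from Apple's website to read how it works and to learn the details of how the system loads dynamic libraries. The latest version I downloaded at the time of writing is dyld-832.7.1.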
Judging by the call order in the stack, the very first call is _dyld_start, so search the source for _dyld_start and find the corresponding arm64 assembly:
#if __arm64__ && !TARGET_OS_SIMULATOR
.text
.align 2
.globl __dyld_start
__dyld_start:
mov x28, sp
and sp, x28, #~15 // force 16-byte alignment of stack
mov x0, #0
mov x1, #0
stp x1, x0, [sp, #-16]! // make aligned terminating frame
mov fp, sp // set up fp to point to terminating frame
sub sp, sp, #16 // make room for local variables
#if __LP64__
ldr x0, [x28] // get app's mh into x0
ldr x1, [x28, #8] // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
add x2, x28, #16 // get argv into x2
#else
ldr w0, [x28] // get app's mh into x0
ldr w1, [x28, #4] // get argc into x1 (kernel passes 32-bit int argc as 64-bits on stack to keep alignment)
add w2, w28, #8 // get argv into x2
#endif
adrp x3,___dso_handle@page
add x3,x3,___dso_handle@pageoff // get dyld's mh in to x4
mov x4,sp // x5 has &startGlue
// call dyldbootstrap::start(app_mh, argc, argv, dyld_mh, &startGlue)
bl __ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm
mov x16,x0 // save entry point address in x16
#if __LP64__
ldr x1, [sp]
#else
ldr w1, [sp]
#endif
cmp x1, #0
b.ne Lnew
// LC_UNIXTHREAD way, clean up stack and jump to result
#if __LP64__
add sp, x28, #8 // restore unaligned stack pointer without app mh
#else
add sp, x28, #4 // restore unaligned stack pointer without app mh
#endif
#if __arm64e__
braaz x16 // jump to the program's entry point
#else
br x16 // jump to the program's entry point
#endif
// LC_MAIN case, set up stack for call to main()
Lnew: mov lr, x1 // simulate return address into _start in libdyld.dylib
#if __LP64__
ldr x0, [x28, #8] // main param1 = argc
add x1, x28, #16 // main param2 = argv
add x2, x1, x0, lsl #3
add x2, x2, #8 // main param3 = &env[0]
mov x3, x2
Lapple: ldr x4, [x3]
add x3, x3, #8
#else
ldr w0, [x28, #4] // main param1 = argc
add x1, x28, #8 // main param2 = argv
add x2, x1, x0, lsl #2
add x2, x2, #4 // main param3 = &env[0]
mov x3, x2
Lapple: ldr w4, [x3]
add x3, x3, #4
#endif
cmp x4, #0
b.ne Lapple // main param4 = apple
#if __arm64e__
braaz x16
#else
br x16
#endif
#endif // __arm64__ && !TARGET_OS_SIMULATOR
// When iOS 10.0 simulator runs on 10.11, abort_with_payload() does not exist,
// so it falls back and uses dyld_fatal_error().
#if TARGET_OS_SIMULATOR
.text
.align 2
.globl _dyld_fatal_error
_dyld_fatal_error:
#if __arm64__ || __arm64e__
brk #3
#else
int3
#endif
nop
#endif
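There is a lot here, so pick out the key part. The mangled C++ symbol __ZN13dyldbootstrap5startEPKN5dyld311MachOLoadedEiPPKcS3_Pm stands out immediately, and the comment above it says call dyldbootstrap::start. That matches the second frame in the call stack. No branch (b) instruction precedes this one, so nothing can jump around it and dyldbootstrap::start is always called. Continue searching and find the start function in namespace dyldbootstrap: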
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
// Emit kdebug tracepoint to indicate dyld bootstrap has started
dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);
// if kernel had to slide dyld, we need to fix up load sensitive locations
// we have to do this before using any global variables
rebaseDyld(dyldsMachHeader);
// kernel sets up env pointer to be just past end of agv array
const char** envp = &argv[argc+1];
// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;
// set up random value for stack canary
__guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(argc, argv, envp, apple);
#endif
_subsystem_init(apple);
// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = appsMachHeader->getSlide();
return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
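The pointer arithmetic that start() uses to derive envp and apple from argv can be tried in isolation. The sketch below assumes the layout described in the comments of start() (argv entries, a NULL, environment entries, a NULL, then the apple entries); the function names find_envp and find_apple are hypothetical.

```c
#include <stddef.h>

/* The environment block starts just past the NULL that terminates argv. */
const char **find_envp(int argc, const char **argv) {
    return &argv[argc + 1];
}

/* Skip the environment entries and their terminating NULL to reach the
 * apple[] block, mirroring the while loop in dyldbootstrap::start. */
const char **find_apple(const char **envp) {
    const char **apple = envp;
    while (*apple != NULL) {
        ++apple;
    }
    return apple + 1;
}
```

For a made-up vector {"app", "-v", NULL, "HOME=/tmp", "LANG=C", NULL, "executable_path=/app", NULL} with argc equal to 2, find_envp yields the slot holding "HOME=/tmp" and find_apple the slot holding "executable_path=/app".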
The last call in start() is dyld::_main. The _main function is in dyld2.cpp and it is very long, so let's go through it section by section:
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue)
{
// Step 1: set up the runtime environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( const char* mainExeCdHashStr = _simple_getenv(apple, "executable_cdhash") ) {
unsigned bufferLenUsed;
if ( hexStringToBytes(mainExeCdHashStr, mainExecutableCDHashBuffer, sizeof(mainExecutableCDHashBuffer), bufferLenUsed) )
// record the cdhash of the main Mach-O executable
mainExecutableCDHash = mainExecutableCDHashBuffer;
}
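// get architecture information about the current device
getHostInfo(mainExecutableMH, mainExecutableSlide);
// remember the macho_header of the main Mach-O executable
sMainExecutableMachHeader = mainExecutableMH;
// remember the slide of the main Mach-O executable
sMainExecutableSlide = mainExecutableSlide;
// set the platform ID in the all image infos so debuggers can tell the process type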
{
__block bool platformFound = false;
((dyld3::MachOFile*)mainExecutableMH)->forEachSupportedPlatform(^(dyld3::Platform platform, uint32_t minOS, uint32_t sdk) {
if (platformFound) {
halt("MH_EXECUTE binaries may only specify one platform");
}
gProcessInfo->platform = (uint32_t)platform;
platformFound = true;
});
if (gProcessInfo->platform == (uint32_t)dyld3::Platform::unknown) {
// There were no platforms found in the binary. This may occur on macOS for alternate toolchains and old binaries.
// It should never occur on any of our embedded platforms.
#if TARGET_OS_OSX
gProcessInfo->platform = (uint32_t)dyld3::Platform::macOS;
#else
halt("MH_EXECUTE binaries must specify a minimum supported OS version");
#endif
}
}
// Remove interim apple[0] transition code from dyld
if (!sExecPath) sExecPath = apple[0];
#if TARGET_OS_IPHONE && !TARGET_OS_SIMULATOR
// kernel is not passing a real path for main executable
if ( strncmp(sExecPath, "/var/containers/Bundle/Application/", 35) == 0 ) {
if ( char* newPath = (char*)malloc(strlen(sExecPath)+10) ) {
strcpy(newPath, "/private");
strcat(newPath, sExecPath);
sExecPath = newPath;
}
}
#endif
if ( sExecPath[0] != '/' ) {
// have relative path, use cwd to make absolute
char cwdbuff[MAXPATHLEN];
if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
// maybe use static buffer to avoid calling malloc so early...
char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
strcpy(s, cwdbuff);
strcat(s, "/");
strcat(s, sExecPath);
sExecPath = s;
}
}
// Remember short name of process for later logging
// get the process name
sExecShortName = ::strrchr(sExecPath, '/');
if ( sExecShortName != NULL )
++sExecShortName;
else
sExecShortName = sExecPath;
#if TARGET_OS_OSX && __has_feature(ptrauth_calls)
// on Apple Silicon macOS, only Apple signed ("platform binary") arm64e can be loaded
sOnlyPlatformArm64e = true;
// internal builds, or if boot-arg is set, then non-platform-binary arm64e slices can be run
if ( const char* abiMode = _simple_getenv(apple, "arm64e_abi") ) {
if ( strcmp(abiMode, "all") == 0 )
sOnlyPlatformArm64e = false;
}
#endif
// configure process restrictions
configureProcessRestrictions(mainExecutableMH, envp);
#if TARGET_OS_OSX
if ( !gLinkContext.allowEnvVarsPrint && !gLinkContext.allowEnvVarsPath && !gLinkContext.allowEnvVarsSharedCache ) {
pruneEnvironmentVariables(envp, &apple);
// set again because envp and apple may have changed or moved
// set the context again
setContext(mainExecutableMH, argc, argv, envp, apple);
}
else
#endif
{
// check environment variables
checkEnvironmentVariables(envp);
defaultUninitializedFallbackPaths(envp);
}
//--------------- end of step 1 ------------
// load shared cache
// Step 2: load the shared cache
// check whether the shared region is disabled; on iOS the shared cache must be used
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
#if TARGET_OS_SIMULATOR
if ( sSharedCacheOverrideDir)
mapSharedCache(mainExecutableSlide);
#else
mapSharedCache(mainExecutableSlide);
#endif
}
// ...
try {
// add dyld itself to UUID list
addDyldImageToUUIDList();
// ...
// set the message recorded in crash logs
CRSetCrashLogMessage(sLoadingCrashMessage);
// instantiate ImageLoader for main executable
// Step 3: instantiate the main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
// ...
// Now that shared cache is loaded, setup an versioned dylib overrides
#if SUPPORT_VERSIONED_PATHS
checkVersionedPaths();
#endif
// ...
// load any inserted libraries
// Step 4: load inserted dynamic libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
// link main executable
// Step 5: link the main executable
gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
// ASLR: images are loaded at a randomized offset (the slide), so internal pointers must be rebased by that slide
if ( mainExcutableAlreadyRebased ) {
// previous link() on main executable has already adjusted its internal pointers for ASLR
// work around that by rebasing by inverse amount
sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
}
#endif
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
gLinkContext.bindFlat = true;
gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
// Step 6: link inserted dynamic libraries
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
image->setNeverUnloadRecursive();
}
if ( gLinkContext.allowInterposing ) {
// only INSERTED libraries can interpose
// register interposing info after all inserted libraries are bound so chaining works
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->registerInterposing(gLinkContext);
}
}
}
if ( gLinkContext.allowInterposing ) {
// dyld should support interposition even without DYLD_INSERT_LIBRARIES
for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
ImageLoader* image = sAllImages[i];
if ( image->inSharedCache() )
continue;
image->registerInterposing(gLinkContext);
}
}
// apply interposing to initial set of images
for(int i=0; i < sImageRoots.size(); ++i) {
sImageRoots[i]->applyInterposing(gLinkContext);
}
ImageLoader::applyInterposingToDyldCache(gLinkContext);
// Bind and notify for the main executable now that interposing has been registered
uint64_t bindMainExecutableStartTime = mach_absolute_time();
sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
uint64_t bindMainExecutableEndTime = mach_absolute_time();
ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
gLinkContext.notifyBatch(dyld_image_state_bound, false);
// Bind and notify for the inserted images now interposing has been registered
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
}
}
// do weak binding only after all inserted images linked
// Step 7: perform weak symbol binding
sMainExecutable->weakBind(gLinkContext);
gLinkContext.linkingMainExecutable = false;
sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);
CRSetCrashLogMessage("dyld: launch, running initializers");
#if SUPPORT_OLD_CRT_INITIALIZATION
// Old way is to run initializers via a callback from crt1.o
if ( ! gRunInitializersOldWay )
initializeMainExecutable();
#else
// run all initializers (this happens before main() is entered)
// Step 8: run initializers
initializeMainExecutable();
#endif
// notify any montoring proccesses that this process is about to enter main()
notifyMonitoringDyldMain();
if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
}
ARIADNEDBG_CODE(220, 1);
// find entry point for main executable
// Step 9: find the entry point and return it
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
if ( result != 0 ) {
// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
else
halt("libdyld.dylib support not present for LC_MAIN");
}
else {
// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
*startGlue = 0;
}
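Nine steps in total:
1. Set up the runtime environment;
2. Load the shared cache;
3. Instantiate the main executable;
4. Load inserted dynamic libraries;
5. Link the main executable;
6. Link inserted dynamic libraries;
7. Perform weak symbol binding;
8. Run initializers;
9. Find the entry point and return it (i.e. main()).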
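3. Tracing the key functions
With the rough steps of dyld::_main understood, the next job is to look more closely at some of the key functions, following the call order seen in the stack:
1. Set up the runtime environment:
// get architecture information about the host device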
static void getHostInfo(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide)
{
#if CPU_SUBTYPES_SUPPORTED
#if __ARM_ARCH_7K__
sHostCPU = CPU_TYPE_ARM;
sHostCPUsubtype = CPU_SUBTYPE_ARM_V7K;
#elif __ARM_ARCH_7A__
sHostCPU = CPU_TYPE_ARM;
sHostCPUsubtype = CPU_SUBTYPE_ARM_V7;
#elif __ARM_ARCH_6K__
sHostCPU = CPU_TYPE_ARM;
sHostCPUsubtype = CPU_SUBTYPE_ARM_V6;
#elif __ARM_ARCH_7F__
sHostCPU = CPU_TYPE_ARM;
sHostCPUsubtype = CPU_SUBTYPE_ARM_V7F;
#elif __ARM_ARCH_7S__
sHostCPU = CPU_TYPE_ARM;
sHostCPUsubtype = CPU_SUBTYPE_ARM_V7S;
#elif __ARM64_ARCH_8_32__
sHostCPU = CPU_TYPE_ARM64_32;
sHostCPUsubtype = CPU_SUBTYPE_ARM64_32_V8;
#elif __arm64e__
sHostCPU = CPU_TYPE_ARM64;
sHostCPUsubtype = CPU_SUBTYPE_ARM64E;
#elif __arm64__
sHostCPU = CPU_TYPE_ARM64;
sHostCPUsubtype = CPU_SUBTYPE_ARM64_V8;
#else
struct host_basic_info info;
mach_msg_type_number_t count = HOST_BASIC_INFO_COUNT;
mach_port_t hostPort = mach_host_self();
kern_return_t result = host_info(hostPort, HOST_BASIC_INFO, (host_info_t)&info, &count);
if ( result != KERN_SUCCESS )
throw "host_info() failed";
sHostCPU = info.cpu_type;
sHostCPUsubtype = info.cpu_subtype;
mach_port_deallocate(mach_task_self(), hostPort);
#if __x86_64__
// host_info returns CPU_TYPE_I386 even for x86_64. Override that here so that
// we don't need to mask the cpu type later.
sHostCPU = CPU_TYPE_X86_64;
#if !TARGET_OS_SIMULATOR
sHaswell = (sHostCPUsubtype == CPU_SUBTYPE_X86_64_H);
// x86_64h: Fall back to the x86_64 slice if an app requires GC.
if ( sHaswell ) {
if ( isGCProgram(mainExecutableMH, mainExecutableSlide) ) {
// When running a GC program on a haswell machine, don't use and 'h slices
sHostCPUsubtype = CPU_SUBTYPE_X86_64_ALL;
sHaswell = false;
gLinkContext.sharedRegionMode = ImageLoader::kDontUseSharedRegion;
}
}
#endif
#endif
#endif
#endif
}
This function evidently obtains the CPU architecture information.
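Most branches of getHostInfo() are decided by the preprocessor, so on ARM targets the answer is already fixed when dyld is compiled. A reduced sketch of the same idea follows; the function name compiled_arch_name and the returned labels are invented for illustration, while the macros are ones that Clang/GCC predefine.

```c
/* Return a label for the architecture this file was compiled for. */
const char *compiled_arch_name(void) {
#if defined(__arm64e__)
    return "arm64e";
#elif defined(__arm64__) || defined(__aarch64__)
    return "arm64";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

Only in the fallback branch does the real getHostInfo() ask the kernel at run time (through host_info()); the sketch leaves that part out.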
// set up state and context information
static void setContext(const macho_header* mainExecutableMH, int argc, const char* argv[], const char* envp[], const char* apple[])
{
gLinkContext.loadLibrary = &libraryLocator;
gLinkContext.terminationRecorder = &terminationRecorder;
gLinkContext.flatExportFinder = &flatFindExportedSymbol;
gLinkContext.coalescedExportFinder = &findCoalescedExportedSymbol;
gLinkContext.getCoalescedImages = &getCoalescedImages;
gLinkContext.undefinedHandler = &undefinedHandler;
gLinkContext.getAllMappedRegions = &getMappedRegions;
gLinkContext.bindingHandler = NULL;
gLinkContext.notifySingle = &notifySingle;
gLinkContext.notifyBatch = &notifyBatch;
gLinkContext.removeImage = &removeImage;
gLinkContext.registerDOFs = dyld3::Loader::dtraceUserProbesEnabled() ? &registerDOFs : NULL;
gLinkContext.clearAllDepths = &clearAllDepths;
gLinkContext.printAllDepths = &printAllDepths;
gLinkContext.imageCount = &imageCount;
gLinkContext.setNewProgramVars = &setNewProgramVars;
gLinkContext.inSharedCache = &inSharedCache;
gLinkContext.setErrorStrings = &setErrorStrings;
#if SUPPORT_OLD_CRT_INITIALIZATION
gLinkContext.setRunInitialzersOldWay= &setRunInitialzersOldWay;
#endif
gLinkContext.findImageContainingAddress = &findImageContainingAddress;
gLinkContext.addDynamicReference = &addDynamicReference;
#if SUPPORT_ACCELERATE_TABLES
gLinkContext.notifySingleFromCache = &notifySingleFromCache;
gLinkContext.getPreInitNotifyHandler= &getPreInitNotifyHandler;
gLinkContext.getBoundBatchHandler = &getBoundBatchHandler;
#endif
gLinkContext.bindingOptions = ImageLoader::kBindingNone;
gLinkContext.argc = argc;
gLinkContext.argv = argv;
gLinkContext.envp = envp;
gLinkContext.apple = apple;
gLinkContext.progname = (argv[0] != NULL) ? basename(argv[0]) : "";
gLinkContext.programVars.mh = mainExecutableMH;
gLinkContext.programVars.NXArgcPtr = &gLinkContext.argc;
gLinkContext.programVars.NXArgvPtr = &gLinkContext.argv;
gLinkContext.programVars.environPtr = &gLinkContext.envp;
gLinkContext.programVars.__prognamePtr=&gLinkContext.progname;
gLinkContext.mainExecutable = NULL;
gLinkContext.imageSuffix = NULL;
gLinkContext.dynamicInterposeArray = NULL;
gLinkContext.dynamicInterposeCount = 0;
gLinkContext.prebindUsage = ImageLoader::kUseAllPrebinding;
gLinkContext.sharedRegionMode = ImageLoader::kUseSharedRegion;
}
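This is the important part of setting up the runtime environment: callback functions, arguments, flags and so on. The callbacks are all implemented inside dyld itself. For example, loadLibrary() actually points to libraryLocator(), which is responsible for loading dynamic libraries.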
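The pattern behind setContext() is a struct of function pointers that is filled in once and called through later. A heavily reduced sketch, in which every name (mini_link_context, mini_library_locator, mini_set_context) is a hypothetical stand-in and not dyld's real type or function:

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for dyld's LinkContext. */
struct mini_link_context {
    int (*load_library)(const char *path);  /* plays the role of loadLibrary */
    int argc;
    const char **argv;
};

/* Hypothetical locator: pretends every non-empty path can be loaded. */
static int mini_library_locator(const char *path) {
    return (path != NULL && path[0] != '\0') ? 1 : 0;
}

/* Fill in the context once, before anything else uses it. */
void mini_set_context(struct mini_link_context *ctx, int argc, const char **argv) {
    ctx->load_library = &mini_library_locator;
    ctx->argc = argc;
    ctx->argv = argv;
}
```

Code that only sees the context calls ctx->load_library(path) without knowing which function sits behind it, which is how ImageLoader code can call back into dyld2.cpp.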
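2. Load the shared cache:
The main call is simply mapSharedCache(mainExecutableSlide), which takes the mainExecutableSlide from the previous step as its argument and maps the shared cache.
3. Instantiate the main executable: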
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
// try mach-o loader
// if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
addImage(image);
return (ImageLoaderMachO*)image;
// }
// throw "main executable not a known format";
}
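This step instantiates an ImageLoader. instantiateFromLoadedImage() calls ImageLoaderMachO::instantiateMainExecutable() to create the ImageLoader for the main executable. The main arguments are the macho_header, mainExecutableSlide, the path of the main executable, and the link context. In other words, once it has been wrapped, the main Mach-O is simply an image.
4. Load inserted dynamic libraries: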
static void loadInsertedDylib(const char* path)
{
unsigned cacheIndex;
try {
LoadContext context;
context.useSearchPaths = false;
context.useFallbackPaths = false;
context.useLdLibraryPath = false;
context.implicitRPath = false;
context.matchByInstallName = false;
context.dontLoad = false;
context.mustBeBundle = false;
context.mustBeDylib = true;
context.canBePIE = false;
context.origin = NULL; // can't use @loader_path with DYLD_INSERT_LIBRARIES
context.rpath = NULL;
load(path, context, cacheIndex);
}
catch (const char* msg) {
if ( gLinkContext.allowInsertFailures )
dyld::log("dyld: warning: could not load inserted library '%s' into hardened process because %s\n", path, msg);
else
halt(dyld::mkstringf("could not load inserted library '%s' because %s\n", path, msg));
}
catch (...) {
halt(dyld::mkstringf("could not load inserted library '%s'\n", path));
}
}
The library is located by its path, with a number of load options set in context, and is loaded into memory. load(path, context, cacheIndex) likewise returns an image of type ImageLoader.
5. Link the main executable:
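This step calls link() to apply dynamic fixups to the instantiated main executable so that the binary becomes runnable. link() internally calls ImageLoader::link(), and the source shows that this step mainly does the following:
recursiveLoadLibraries(): loads all dependent libraries into memory according to the LC_LOAD_DYLIB load commands;
recursiveUpdateDepth(): recursively refreshes the depth of the dependent libraries;
recursiveRebase(): because of ASLR, the main executable and its dependent libraries must be rebased recursively;
recursiveBind(): binds symbols for the image and the dynamic libraries it depends on. In the version shown below this is skipped inside link() while the main executable is being linked, and is done later in _main once interposing has been registered;
weakBind(): if the image being linked is not the main binary, weak symbols are bound at this point. For the main binary, weak binding happens after link() has finished, as covered in step 7 below;
recursiveGetDOFSections() and context.registerDOFs(): register the DOF (DTrace Object Format) sections.
The source of ImageLoader::link() is as follows: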
void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{
//dyld::log("ImageLoader::link(%s) refCount=%d, neverUnload=%d\n", imagePath, fDlopenReferenceCount, fNeverUnload);
// clear error strings
(*context.setErrorStrings)(0, NULL, NULL, NULL);
uint64_t t0 = mach_absolute_time();
this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);
// we only do the loading step for preflights
if ( preflightOnly )
return;
uint64_t t1 = mach_absolute_time();
context.clearAllDepths();
this->updateDepth(context.imageCount());
__block uint64_t t2, t3, t4, t5;
{
dyld3::ScopedTimer(DBG_DYLD_TIMING_APPLY_FIXUPS, 0, 0, 0);
t2 = mach_absolute_time();
this->recursiveRebaseWithAccounting(context);
context.notifyBatch(dyld_image_state_rebased, false);
t3 = mach_absolute_time();
if ( !context.linkingMainExecutable )
this->recursiveBindWithAccounting(context, forceLazysBound, neverUnload);
t4 = mach_absolute_time();
if ( !context.linkingMainExecutable )
this->weakBind(context);
t5 = mach_absolute_time();
}
// interpose any dynamically loaded images
if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_APPLY_INTERPOSING, 0, 0, 0);
this->recursiveApplyInterposing(context);
}
// now that all fixups are done, make __DATA_CONST segments read-only
if ( !context.linkingMainExecutable )
this->recursiveMakeDataReadOnly(context);
if ( !context.linkingMainExecutable )
context.notifyBatch(dyld_image_state_bound, false);
uint64_t t6 = mach_absolute_time();
if ( context.registerDOFs != NULL ) {
std::vector<DOFInfo> dofs;
this->recursiveGetDOFSections(context, dofs);
context.registerDOFs(dofs);
}
uint64_t t7 = mach_absolute_time();
// clear error strings
(*context.setErrorStrings)(0, NULL, NULL, NULL);
fgTotalLoadLibrariesTime += t1 - t0;
fgTotalRebaseTime += t3 - t2;
fgTotalBindTime += t4 - t3;
fgTotalWeakBindTime += t5 - t4;
fgTotalDOF += t7 - t6;
// done with initial dylib loads
fgNextPIEDylibAddress = 0;
}
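What rebasing amounts to can be shown with a few lines of C. This is only a sketch of the idea: it assumes the locations that need fixing are already collected in an array, whereas the real dyld finds them by walking the image's rebase information. The names apply_slide and pointer_slots are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the load slide to every stored pointer value. Addresses recorded at
 * link time assume a slide of 0, so each one must be shifted by the slide
 * that ASLR chose for this launch. */
void apply_slide(uintptr_t *pointer_slots, size_t count, intptr_t slide) {
    for (size_t i = 0; i < count; ++i) {
        pointer_slots[i] += (uintptr_t)slide;
    }
}
```

Applying the negated slide afterwards restores the original values, which is the same trick _main uses when it rebases by -mainExecutableSlide.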
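dyld's link() function also calls addImage(image) and addRootImage(image). What they mainly do is push_back the image onto sAllImages or sImageRoots (both are vectors, not stacks), so sAllImages is the container that holds every loaded image.
6. Link inserted dynamic libraries: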
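This is similar to linking the main Mach-O: loop over the inserted dylib images in sAllImages and call link() on each. As with the main executable, setNeverUnloadRecursive() is called on each image; in addition, registerInterposing(gLinkContext) is called when interposing is allowed, and images that live in the shared cache skip registerInterposing(gLinkContext).
Finally, each member of sImageRoots, i.e. the initial root images, has applyInterposing called on it.
7. Perform weak symbol binding: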
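weakBind() first uses getCoalescedImages() to gather the weak symbols of all dynamic libraries into one list, then calls initializeCoalIterator() to sort the weak symbols that need binding, and then calls incrementCoalIterator(), which reads the weak_bind_off and weak_bind_size fields of the dyld_info_command structure to determine the offset and size of the weak symbol data. Finally the weak symbols are bound. This is not a focus of the dyld flow analysis; for the definition of weak symbols see the earlier post 查漏补缺 ("filling in the gaps").
8. Run initializers: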
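This step is carried out by initializeMainExecutable(). dyld initializes the dynamic libraries first and then the main executable. The function first calls runInitializers(), which in turn calls processInitializers() and then recursiveInitialization(). A global search finds recursiveInitialization() in ImageLoader.cpp: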
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
recursive_lock lock_info(this_thread);
recursiveSpinLock(lock_info);
if ( fState < dyld_image_state_dependents_initialized-1 ) {
uint8_t oldState = fState;
// break cycles
fState = dyld_image_state_dependents_initialized-1;
try {
// initialize lower level libraries first
for(unsigned int i=0; i < libraryCount(); ++i) {
ImageLoader* dependentImage = libImage(i);
if ( dependentImage != NULL ) {
// don't try to initialize stuff "above" me yet
if ( libIsUpward(i) ) {
uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
uninitUps.count++;
}
else if ( dependentImage->fDepth >= fDepth ) {
dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
}
}
}
// record termination order
if ( this->needsTermination() )
context.terminationRecorder(this);
// let objc know we are about to initialize this image
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
// initialize this image
bool hasInitializers = this->doInitialization(context);
// let anyone know we finished initializing this image
fState = dyld_image_state_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_initialized, this, NULL);
if ( hasInitializers ) {
uint64_t t2 = mach_absolute_time();
timingInfo.addTime(this->getShortName(), t2-t1);
}
}
catch (const char* msg) {
// this image is not initialized
fState = oldState;
recursiveSpinUnLock();
throw;
}
}
recursiveSpinUnLock();
}
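The shape of recursiveInitialization() is "dependencies first, then this image, with a marker to break cycles". The following miniature reproduces only that shape; the struct mini_image, its state values and mini_recursive_init are hypothetical, and locking, upward dependencies and notifications are left out.

```c
#include <stddef.h>

#define MINI_MAX_DEPS 4

/* Hypothetical miniature image record. */
struct mini_image {
    const char *name;
    struct mini_image *deps[MINI_MAX_DEPS];
    size_t dep_count;
    int state;  /* 0 = untouched, 1 = in progress, 2 = initialized */
};

/* Initialize dependencies first, then the image itself. Marking the image
 * "in progress" before recursing breaks dependency cycles. Names of
 * initialized images are appended to order[] (the caller must provide
 * enough room); the new number of entries is returned. */
size_t mini_recursive_init(struct mini_image *img, const char **order, size_t used) {
    if (img->state != 0) {
        return used;  /* already initialized, or currently on the call stack */
    }
    img->state = 1;
    for (size_t i = 0; i < img->dep_count; ++i) {
        used = mini_recursive_init(img->deps[i], order, used);
    }
    order[used++] = img->name;  /* stands in for running this image's initializers */
    img->state = 2;
    return used;
}
```

With an app that depends on libobjc and libSystem, and libobjc depending on libSystem, the recorded order is libSystem, libobjc, app, which matches the statement that libraries are initialized before the main executable.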
In recursiveInitialization() we find context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo); and the line right after it, bool hasInitializers = this->doInitialization(context);, is also important. Start with notifySingle: a global search finds the notifySingle() function in dyld2.cpp:
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
//dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
if ( handlers != NULL ) {
dyld_image_info info;
info.imageLoadAddress = image->machHeader();
info.imageFilePath = image->getRealPath();
info.imageFileModDate = image->lastModified();
for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
const char* result = (*it)(state, 1, &info);
if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
//fprintf(stderr, " image rejected by handler=%p\n", *it);
// make copy of thrown string so that later catch clauses can free it
const char* str = strdup(result);
throw str;
}
}
}
if ( state == dyld_image_state_mapped ) {
// Save load addr + UUID for images from outside the shared cache
// Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
if (!image->inSharedCache()
|| (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
dyld_uuid_info info;
if ( image->getUUID(info.imageUUID) ) {
info.imageLoadAddress = image->machHeader();
addNonSharedCacheImageUUID(info);
}
}
}
if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
uint64_t t0 = mach_absolute_time();
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
uint64_t t1 = mach_absolute_time();
uint64_t t2 = mach_absolute_time();
uint64_t timeInObjC = t1-t0;
uint64_t emptyTime = (t2-t1)*100;
if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
timingInfo->addTime(image->getShortName(), timeInObjC);
}
}
// mach message csdlc about dynamically unloaded images
if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
notifyKernel(*image, false);
const struct mach_header* loadAddress[] = { image->machHeader() };
const char* loadPath[] = { image->getPath() };
notifyMonitoringDyld(true, 1, loadAddress, loadPath);
}
}
Inside notifySingle() we find (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());. sNotifyObjCInit is clearly a function pointer; to find out when it is assigned, search globally for sNotifyObjCInit:
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
// record functions to call
sNotifyObjCMapped = mapped;
sNotifyObjCInit = init;
sNotifyObjCUnmapped = unmapped;
// call 'mapped' function with all images mapped so far
try {
notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
}
catch (const char* msg) {
// ignore request to abort during registration
}
// call 'init' function on all images already init'ed (below libSystem)
for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
ImageLoader* image = *it;
if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
}
}
}
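The assignment is in registerObjCNotifiers in dyld2.cpp. But when is registerObjCNotifiers called? Searching for it shows that _dyld_objc_notify_register in dyldAPIs.cpp calls registerObjCNotifiers. Searching further for _dyld_objc_notify_register does find an implementation in dyldAPIsInLibSystem.cpp, but that is only a version-dependent wrapper; in essence it still ends in an assignment to _objcNotifyInit, much like the analysis above. No caller of _dyld_objc_notify_register can be found in the dyld source, so the only option left is a symbolic breakpoint.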
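The behaviour of registerObjCNotifiers() (store the callback, then replay it for images that were initialized before registration) can be modelled in a few lines. All names below, such as mini_register_notifier and mini_image_initialized, are hypothetical and only mirror the structure of the dyld code.

```c
#include <stddef.h>

typedef void (*mini_init_notifier)(const char *image_path);

#define MINI_MAX_IMAGES 8

/* Hypothetical record of images that are already initialized. */
static const char *mini_initialized_images[MINI_MAX_IMAGES];
static size_t mini_initialized_count;

/* The stored callback; NULL until someone registers (cf. sNotifyObjCInit). */
static mini_init_notifier mini_notify_init;

/* Store the callback, then replay it for images initialized earlier. */
void mini_register_notifier(mini_init_notifier init) {
    mini_notify_init = init;
    for (size_t i = 0; i < mini_initialized_count; ++i) {
        (*mini_notify_init)(mini_initialized_images[i]);
    }
}

/* Record an image as initialized and notify the callback if one is set. */
void mini_image_initialized(const char *image_path) {
    if (mini_initialized_count < MINI_MAX_IMAGES) {
        mini_initialized_images[mini_initialized_count++] = image_path;
    }
    if (mini_notify_init != NULL) {
        (*mini_notify_init)(image_path);
    }
}

/* Example callback: counts how often it was invoked. */
int mini_seen_count;
void mini_count_calls(const char *image_path) {
    (void)image_path;
    ++mini_seen_count;
}
```

An image reported before registration triggers no callback at first, is replayed once mini_register_notifier(mini_count_calls) runs, and every later image triggers the callback directly.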
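In any app project, add a symbolic breakpoint on _dyld_objc_notify_register. When it is hit, enter the bt command in lldb and look at the call stack:
The caller of _dyld_objc_notify_register is _objc_init in libobjc.A.dylib. This library is familiar: it is the one analyzed in the previous posts. A global search for _objc_init finds it in objc-os.mm: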
void _objc_init(void)
{
static bool initialized = false;
if (initialized) return;
initialized = true;
// fixme defer initialization until an objc-using image is found?
environ_init();
tls_init();
static_init();
runtime_init();
exception_init();
cache_init();
_imp_implementationWithBlock_init();
_dyld_objc_notify_register(&map_images, load_images, unmap_image);
#if __OBJC2__
didCallDyldNotifyRegister = true;
#endif
}
Sure enough, it calls _dyld_objc_notify_register. Compare the arguments with the declaration:
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
_dyld_objc_notify_init init,
_dyld_objc_notify_unmapped unmapped);
The init parameter is the load_images function that was passed in, so in the end the sNotifyObjCInit function pointer points to load_images in objc-runtime-new.mm of libobjc.A.dylib. Here is the source of load_images:
void
load_images(const char *path __unused, const struct mach_header *mh)
{
if (!didInitialAttachCategories && didCallDyldNotifyRegister) {
didInitialAttachCategories = true;
loadAllCategories();
}
// Return without taking locks if there are no +load methods here.
if (!hasLoadMethods((const headerType *)mh)) return;
recursive_mutex_locker_t lock(loadMethodLock);
// Discover load methods
{
mutex_locker_t lock2(runtimeLock);
prepare_load_methods((const headerType *)mh);
}
// Call +load methods (without runtimeLock - re-entrant)
call_load_methods();
}
Categories are loaded first, which shows that categories are attached at runtime. Next comes prepare_load_methods, which discovers the +load methods by way of the Mach-O header. Finally call_load_methods() is called to invoke all the +load methods:
void call_load_methods(void)
{
static bool loading = NO;
bool more_categories;
loadMethodLock.assertLocked();
// Re-entrant calls do nothing; the outermost call will finish the job.
if (loading) return;
loading = YES;
void *pool = objc_autoreleasePoolPush();
do {
// 1. Repeatedly call class +loads until there aren't any more
while (loadable_classes_used > 0) {
call_class_loads();
}
// 2. Call category +loads ONCE
more_categories = call_category_loads();
// 3. Run more +loads if there are classes OR more untried categories
} while (loadable_classes_used > 0 || more_categories);
objc_autoreleasePoolPop(pool);
loading = NO;
}
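The source shows that not only the +load methods of classes but also those of categories are called. The principle is simply to loop over the loadable classes, find each +load method and call (*load_method)(cls, @selector(load)); plain and unexciting.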
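The control flow of call_load_methods() (a re-entrancy guard, class loads drained repeatedly, category loads once per round) can be sketched with two queues of plain function pointers. This is a simplified model with hypothetical names; in particular, the real runtime only runs a category's +load once its class is loaded, which the sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

#define MINI_MAX_LOADS 16

typedef void (*mini_load_fn)(void);

/* A fixed-size queue of pending +load style callbacks (hypothetical). */
struct mini_queue {
    mini_load_fn fns[MINI_MAX_LOADS];
    size_t used;
};

static struct mini_queue mini_class_queue;
static struct mini_queue mini_category_queue;

static void mini_enqueue(struct mini_queue *q, mini_load_fn fn) {
    if (q->used < MINI_MAX_LOADS) {
        q->fns[q->used++] = fn;
    }
}

void mini_add_class_load(mini_load_fn fn)    { mini_enqueue(&mini_class_queue, fn); }
void mini_add_category_load(mini_load_fn fn) { mini_enqueue(&mini_category_queue, fn); }

/* Detach the current batch first, so callbacks may enqueue more work. */
static void mini_drain(struct mini_queue *q) {
    struct mini_queue batch = *q;
    q->used = 0;
    for (size_t i = 0; i < batch.used; ++i) {
        batch.fns[i]();
    }
}

void mini_call_load_methods(void) {
    static bool loading = false;
    if (loading) {
        return;  /* re-entrant call: the outermost call finishes the job */
    }
    loading = true;
    do {
        while (mini_class_queue.used > 0) {
            mini_drain(&mini_class_queue);   /* 1. class loads until none remain */
        }
        mini_drain(&mini_category_queue);    /* 2. category loads once per round */
    } while (mini_class_queue.used > 0 || mini_category_queue.used > 0);
    loading = false;
}

/* Sample callbacks that record the order in which they ran. */
static char mini_order[MINI_MAX_LOADS];
static size_t mini_order_len;

static void mini_note(char tag) {
    if (mini_order_len < MINI_MAX_LOADS) {
        mini_order[mini_order_len++] = tag;
    }
}

void mini_sample_class_load(void)    { mini_note('C'); }
void mini_sample_category_load(void) { mini_note('c'); }
```

Even if the category callback is queued before the class callback, the class callback runs first, because the class queue is always drained before the category queue is touched.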
Compare this with the call stack captured earlier at the load breakpoint: the load breakpoint is indeed reached right after load_images, so everything matches.
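This raises another question. load_images is assigned to the sNotifyObjCInit function pointer from within _objc_init, and the function that sNotifyObjCInit points to is called in step 8, run initializers, inside initializeMainExecutable(), exactly as the load call stack shows. But when is _objc_init itself called? Logically it has to happen before notifySingle invokes the pointer, so the only way to find out is to run it. In the objc-787.1 source used for earlier debugging (the latest is objc-818.2), set a breakpoint in _objc_init, and when it is hit enter bt to view the stack:
_objc_init is also reached from initializeMainExecutable(), but through the line we set aside earlier, bool hasInitializers = this->doInitialization(context);. Looking into doInitialization(), it mainly consists of two calls, doImageInit(context) and doModInitFunctions(context). doImageInit(context) contains this code: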
if ( ! dyld::gProcessInfo->libSystemInitialized ) {
// libSystem initializer must run first
dyld::throwf("-init function in image (%s) that does not link with libSystem.dylib\n", this->getPath());
}
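In other words, calling this function raises an error if libSystem has not been initialized yet. The same check appears in the source of doModInitFunctions(context), so both functions require that libSystem's initializer has already run. Many articles online say that doModInitFunctions runs all the C++ initializers, that is, the C/C++ functions that must run before main. This cannot be judged directly from the source, so set a breakpoint in func2 and, once it is hit, print the stack with the lldb bt command: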
func2 is called right after doModInitFunctions, which confirms it.
Back to the main line of investigation, the call to _objc_init. doModInitFunctions contains the following:
// now safe to use malloc() and other calls in libSystem.dylib
dyld::gProcessInfo->libSystemInitialized = true;
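As the comment suggests, this is where libSystem gets initialized. The _objc_init call stack shows that the caller is libSystem_initializer in libSystem.B.dylib. Get the libSystem source; a global search finds libSystem_initializer in init.c. Continuing to compare the stack with the source, this function calls libdispatch_init(), which cannot be found by searching here because it lives in libdispatch.dylib.
Likewise, get the libdispatch source and search globally for _os_object_init, which is found in object.m: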
void
_os_object_init(void)
{
_objc_init();
Block_callbacks_RR callbacks = {
sizeof(Block_callbacks_RR),
(void (*)(const void *))&objc_retain,
(void (*)(const void *))&objc_release,
(void (*)(const void *))&_os_objc_destructInstance
};
_Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}
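The very first statement is the call to _objc_init(), so the call chain is now essentially complete: doModInitFunctions -> libSystem_initializer -> libdispatch_init -> _os_object_init -> _objc_init -> _dyld_objc_notify_register.
9. Find the entry point and return it: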
方法中的查找入口的方法一个是getEntryFromLC_MAIN
,如果这个方法结果返回为失败就调用getEntryFromLC_UNIXTHREAD
方法,最后会将result
返回,这样就回到了调用dyld::_main
方法的地方,而调用dyld::_main
方法就是在dyldbootstrap::start
方法中,可以知道dyldbootstrap::start
方法的调用是在_dyld_start
方法,_dyld_start
方法是汇编代码,那就回到对这个汇编代码的分析:
main()
方法的部分后面参数都比较清晰,而且也可以知道main
是主程序Mach-O
镜像文件中写死了的,如果我们改变main.m
中的main
方法名称,会导致调用不到main
方法报错。
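4. Summary of the app loading flow
To summarize the above:
The earlier rough flow was: app launch -> dyld loading -> +load methods called -> constructor-type C/C++ functions called -> main called.
It is now updated to:
[Figure: updated app loading flow]
The diagram still lacks detail for the step from dyld loading to the +load calls, which should be supplemented as follows:
+load method calls: _objc_init -> load_images -> call_load_methods -> +load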