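0x1 Getting Started
ART has been the default runtime on Android since 5.0, which shows how important it has become. Plenty of articles cover hooking under Dalvik, but I could not find any online about hooking under ART, and even the well-known Xposed and Cydia Substrate do not support ART hooking at the time of writing. I am sure they already have a technical approach and are probably held up by device compatibility work.
Since no material was available online, I decided to spend some time researching it myself. The effort paid off: I found a practical, workable method, which is what this article describes.
To be clear, the method described here is certainly not the best one. If reading it inspires you to find a better way to hook ART, it will have served its purpose as a conversation starter. Enough preamble, let's begin.
Runtime environment: emulator running Android 4.4.2 in ART mode
Development environment: Mac OS X 10.10.3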
0x2 Loading and Executing Class Methods in ART
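Executing a class method is much more complicated in ART than in Dalvik. Leaving the JIT aside, Dalvik can be understood as an interpreting virtual machine, whereas ART has both a native-code execution mode and an interpreted mode. The generated oat files also come in two types, portable and quick. The main difference between them is how methods are loaded: quick makes heavy use of lazy loading, so applications start faster, but the loading flow is more complex. quick is the default, so all the analysis in this article is based on the quick type.
Because ART has both native-code and interpreted execution, methods cannot simply jump to one another directly. Instead, a set of predefined bridge functions switches state and context between the two modes. Lao Luo's blog has a diagram illustrating this, which I refer to here.
When a method is invoked and the caller is running in native-code mode, the function that ArtMethod::GetEntryPointFromCompiledCode() points to is executed; otherwise the function that ArtMethod::GetEntryPointFromInterpreter() points to is executed. Every method therefore has two entry points, stored in ArtMethod::entry_point_from_compiled_code_ and ArtMethod::entry_point_from_interpreter_ respectively. This is important to understand, because the rest of the article works on exactly these two entry points.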
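To make the "two entry points per method" idea concrete, here is a minimal sketch. It is not ART code: FakeMethod, CallerMode and Invoke are hypothetical names invented for this illustration, and the entry points are ordinary C++ function pointers rather than ART's bridges.

```cpp
#include <cassert>

// Hypothetical stand-in for ART's ArtMethod; only the two entry-point
// slots discussed above are modeled.
struct FakeMethod {
  int (*entry_point_from_compiled_code)(int);
  int (*entry_point_from_interpreter)(int);
};

enum class CallerMode { kCompiledCode, kInterpreter };

// The caller's execution mode, not the callee, decides which slot is used.
inline int Invoke(const FakeMethod& m, CallerMode mode, int arg) {
  auto entry = (mode == CallerMode::kCompiledCode)
                   ? m.entry_point_from_compiled_code
                   : m.entry_point_from_interpreter;
  return entry(arg);
}

// Two illustrative targets so that the choice is observable.
inline int FromCompiled(int x) { return x + 1; }
inline int FromInterpreter(int x) { return x + 2; }
```

Replacing either pointer redirects only the callers that use that slot, which is why a hook has to decide which of the two entry points it needs to cover.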
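Before explaining the principle, two flows need to be understood first. Covering them in full would take a great deal of space, so I describe only the key points relevant to hooking; I still strongly recommend reading the ART articles on Lao Luo's blog carefully.
The first flow, method loading and linking, takes place when an oat file is loaded into memory and its class methods are linked. The linking code is LinkCode in art/runtime/class_linker.cc, shown below: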
static void LinkCode(SirtRef<mirror::ArtMethod>& method, const OatFile::OatClass* oat_class,
                     uint32_t method_index)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  // The method must not have been linked yet.
  DCHECK(method->GetEntryPointFromCompiledCode() == NULL);
  const OatFile::OatMethod oat_method = oat_class->GetOatMethod(method_index);
  oat_method.LinkMethod(method.get());
  Runtime* runtime = Runtime::Current();
  bool enter_interpreter = NeedsInterpreter(method.get(), method->GetEntryPointFromCompiledCode());
  // Entry point used when the caller is the interpreter.
  if (enter_interpreter) {
    method->SetEntryPointFromInterpreter(interpreter::artInterpreterToInterpreterBridge);
  } else {
    method->SetEntryPointFromInterpreter(artInterpreterToCompiledCodeBridge);
  }
  if (method->IsAbstract()) {
    method->SetEntryPointFromCompiledCode(GetCompiledCodeToInterpreterBridge());
    return;
  }
  // Entry point used when the caller is compiled code.
  if (method->IsStatic() && !method->IsConstructor()) {
    method->SetEntryPointFromCompiledCode(GetResolutionTrampoline(runtime->GetClassLinker()));
  } else if (enter_interpreter) {
    method->SetEntryPointFromCompiledCode(GetCompiledCodeToInterpreterBridge());
  }
  if (method->IsNative()) {
    method->UnregisterNative(Thread::Current());
  }
  runtime->GetInstrumentation()->UpdateMethodsCode(method.get(), method->GetEntryPointFromCompiledCode());
}
From the code above we can see that an ArtMethod has the following kinds of entry:
1. Interpreter2Interpreter corresponds to artInterpreterToInterpreterBridge (art/runtime/interpreter/interpreter.cc);
2. Interpreter2CompiledCode corresponds to artInterpreterToCompiledCodeBridge (art/runtime/entrypoints/interpreter/interpreter_entrypoints.cc);
3. CompiledCode2Interpreter corresponds to art_quick_to_interpreter_bridge (art/runtime/arch/arm/quick_entrypoints_arm.S);
4. CompiledCode2ResolutionTrampoline corresponds to art_quick_resolution_trampoline (art/runtime/arch/arm/quick_entrypoints_arm.S);
5. CompiledCode2CompiledCode points directly at the instructions in the oat file; see OatMethod::LinkMethod for details.
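The branch order in LinkCode determines which of these entries a given method ends up with. The following is a rough model of that decision only. The enum and struct names are hypothetical labels made up for this sketch, and needs_interpreter simply stands for whatever NeedsInterpreter() would report.

```cpp
#include <cassert>

// Hypothetical labels for the entry kinds listed above; not ART identifiers.
enum class InterpEntry { kInterpreterToInterpreter, kInterpreterToCompiledCode };
enum class CodeEntry { kOatCode, kToInterpreterBridge, kResolutionTrampoline };

struct MethodTraits {
  bool is_abstract;
  bool is_static;
  bool is_constructor;
  bool needs_interpreter;  // what NeedsInterpreter() would report
};

struct Entries {
  InterpEntry from_interpreter;
  CodeEntry from_compiled_code;
};

// Mirrors the branch order of the LinkCode excerpt.
inline Entries ChooseEntries(const MethodTraits& t) {
  Entries e;
  e.from_interpreter = t.needs_interpreter
                           ? InterpEntry::kInterpreterToInterpreter
                           : InterpEntry::kInterpreterToCompiledCode;
  e.from_compiled_code = CodeEntry::kOatCode;  // as set by OatMethod::LinkMethod
  if (t.is_abstract) {
    e.from_compiled_code = CodeEntry::kToInterpreterBridge;
    return e;
  }
  if (t.is_static && !t.is_constructor) {
    e.from_compiled_code = CodeEntry::kResolutionTrampoline;
  } else if (t.needs_interpreter) {
    e.from_compiled_code = CodeEntry::kToInterpreterBridge;
  }
  return e;
}
```

One consequence worth noticing: in this model a static, non-constructor method always starts out with the resolution trampoline as its compiled-code entry, regardless of whether it has oat code.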
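There are two main calling conventions among them:
typedef void (EntryPointFromInterpreter)(Thread* self, MethodHelper& mh, const DexFile::CodeItem* code_item, ShadowFrame* shadow_frame, JValue* result). This covers entries 1 and 2 above, the two functions that LinkCode passes to SetEntryPointFromInterpreter;
The remaining entries 3, 4 and 5 are CompiledCode entries. The code does not declare their convention directly, but it can be worked out by analyzing how ArtMethod::Invoke makes the call. Invoke calls art_quick_invoke_stub (art/runtime/arch/arm/quick_entrypoints_arm.S), shown below: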
ENTRY art_quick_invoke_stub
    push   {r0, r4, r5, r9, r11, lr}       @ spill regs
    .save  {r0, r4, r5, r9, r11, lr}
    .pad #24
    .cfi_adjust_cfa_offset 24
    .cfi_rel_offset r0, 0
    .cfi_rel_offset r4, 4
    .cfi_rel_offset r5, 8
    .cfi_rel_offset r9, 12
    .cfi_rel_offset r11, 16
    .cfi_rel_offset lr, 20
    mov    r11, sp                         @ save the stack pointer
    .cfi_def_cfa_register r11
    mov    r9, r3                          @ move managed thread pointer into r9
    mov    r4, #SUSPEND_CHECK_INTERVAL     @ reset r4 to suspend check interval
    add    r5, r2, #16                     @ create space for method pointer in frame
    and    r5, #0xFFFFFFF0                 @ align frame size to 16 bytes
    sub    sp, r5                          @ reserve stack space for argument array
    add    r0, sp, #4                      @ pass stack pointer + method ptr as dest for memcpy
    bl     memcpy                          @ memcpy (dest, src, bytes)
    ldr    r0, [r11]                       @ restore method*
    ldr    r1, [sp, #4]                    @ copy arg value for r1
    ldr    r2, [sp, #8]                    @ copy arg value for r2
    ldr    r3, [sp, #12]                   @ copy arg value for r3
    mov    ip, #0                          @ set ip to 0
    str    ip, [sp]                        @ store NULL for method* at bottom of frame
    ldr    ip, [r0, #METHOD_CODE_OFFSET]   @ get pointer to the code
    blx    ip                              @ call the method
    mov    sp, r11                         @ restore the stack pointer
    ldr    ip, [sp, #24]                   @ load the result pointer
    strd   r0, [ip]                        @ store r0/r1 into result pointer
    pop    {r0, r4, r5, r9, r11, lr}       @ restore spill regs
    .cfi_adjust_cfa_offset -24
    bx     lr
END art_quick_invoke_stub
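The instruction "ldr ip, [r0, #METHOD_CODE_OFFSET]" loads ArtMethod::entry_point_from_compiled_code_ into ip, which is then called directly with blx. From this short piece of assembly we can derive the following stack layout: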
-(low)
| caller(Method *) | <- sp
| arg1 | <- r1
| arg2 | <- r2
| arg3 | <- r3
| ... |
| argN |
| callee (Method *) | <- r0
+(high)
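This is not the calling convention we usually see. The main difference is that when there are more than four arguments, the extra ones are not stored starting at sp but starting at sp + 20. That is why entry_point_from_compiled_code_ is typed as void* in the code: the convention cannot be expressed as a C/C++ function type.
Understanding this calling convention is critical to implementing our approach.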
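The arithmetic in art_quick_invoke_stub can be written out as a small sketch. The helper names below are hypothetical, and arguments are counted in 32-bit words and numbered from 1 as in the stack diagram (arg1, arg2, ...), which is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Frame size reserved by the stub, mirroring "add r5, r2, #16" followed by
// "and r5, #0xFFFFFFF0": argument bytes plus 16, rounded down to a multiple of 16.
constexpr std::uint32_t FrameSize(std::uint32_t arg_bytes) {
  return (arg_bytes + 16u) & 0xFFFFFFF0u;
}

// Byte offset from sp of the n-th 32-bit argument word (n starts at 1).
// Slot sp + 0 holds a Method* and the memcpy destination is sp + 4.
constexpr std::uint32_t ArgOffset(std::uint32_t n) { return 4u * n; }

// arg1..arg3 are additionally loaded into r1..r3; later words live only on the stack.
constexpr bool AlsoInRegister(std::uint32_t n) { return n >= 1 && n <= 3; }
```

With this numbering, arg5 sits at sp + 20, whereas under the ordinary ARM procedure call standard the first stack-passed argument would sit at sp + 0.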
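The sections above covered how class methods are loaded and linked. At run time, however, a call does not go straight to the ArtMethod entry point (for either interpreted or native execution). To speed things up, ART creates a DexCache structure (art/runtime/mirror/dex_cache.h) for each dex in an oat file. It holds a set of arrays that mirror the structure of the dex; here we only look at its methods field. DexCache is initialized by Init, implemented as follows: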
void DexCache::Init(const DexFile* dex_file,
                    String* location,
                    ObjectArray<String>* strings,
                    ObjectArray<Class>* resolved_types,
                    ObjectArray<ArtMethod>* resolved_methods,
                    ObjectArray<ArtField>* resolved_fields,
                    ObjectArray<StaticStorageBase>* initialized_static_storage) {
  //...
  //...
  Runtime* runtime = Runtime::Current();
  if (runtime->HasResolutionMethod()) {
    // Initialize the resolve methods array to contain trampolines for resolution.
    ArtMethod* trampoline = runtime->GetResolutionMethod();
    size_t length = resolved_methods->GetLength();
    for (size_t i = 0; i < length; i++) {
      resolved_methods->SetWithoutChecks(i, trampoline);
    }
  }
}
A resolved_methods array is created with one slot per method in the dex, and every slot is filled with the result of Runtime::GetResolutionMethod(). That method object is produced by Runtime::CreateResolutionMethod, shown below:
mirror::ArtMethod* Runtime::CreateResolutionMethod() {
  mirror::Class* method_class = mirror::ArtMethod::GetJavaLangReflectArtMethod();
  Thread* self = Thread::Current();
  SirtRef<mirror::ArtMethod>
      method(self, down_cast<mirror::ArtMethod*>(method_class->AllocObject(self)));
  method->SetDeclaringClass(method_class);
  method->SetDexMethodIndex(DexFile::kDexNoIndex);
  Runtime* r = Runtime::Current();
  ClassLinker* cl = r->GetClassLinker();
  method->SetEntryPointFromCompiledCode(r->IsCompiler() ? NULL : GetResolutionTrampoline(cl));
  return method.get();
}
From the statement method->SetDexMethodIndex(DexFile::kDexNoIndex) we can see that every ResolutionMethod has a method index of DexFile::kDexNoIndex. The entry point of the ResolutionMethod is the fourth case from the entry analysis above: GetResolutionTrampoline ultimately returns art_quick_resolution_trampoline (art/runtime/arch/arm/quick_entrypoints_arm.S). Its implementation is:
    .extern artQuickResolutionTrampoline
ENTRY art_quick_resolution_trampoline
    SETUP_REF_AND_ARGS_CALLEE_SAVE_FRAME
    mov    r2, r9                          @ pass Thread::Current
    mov    r3, sp                          @ pass SP
    blx    artQuickResolutionTrampoline    @ (Method* called, receiver, Thread*, SP)
    cbz    r0, 1f                          @ is code pointer null? goto exception
    mov    r12, r0
    ldr    r0, [sp, #0]                    @ load resolved method in r0
    ldr    r1, [sp, #8]                    @ restore non-callee save r1
    ldrd   r2, [sp, #12]                   @ restore non-callee saves r2-r3
    ldr    lr, [sp, #44]                   @ restore lr
    add    sp, #48                         @ rewind sp
    .cfi_adjust_cfa_offset -48
    bx     r12                             @ tail-call into actual code
1:
    RESTORE_REF_AND_ARGS_CALLEE_SAVE_FRAME
    DELIVER_PENDING_EXCEPTION
END art_quick_resolution_trampoline
After adjusting the registers, it jumps straight to artQuickResolutionTrampoline (art/runtime/entrypoints/quick/quick_trampoline_entrypoints.cc). Let's analyze that function next (bear with me; I have removed the code that is not relevant):
// Lazily resolve a method for quick. Called by stub code.
extern "C" const void* artQuickResolutionTrampoline(mirror::ArtMethod* called,
                                                    mirror::Object* receiver,
                                                    Thread* thread, mirror::ArtMethod** sp)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  FinishCalleeSaveFrameSetup(thread, sp, Runtime::kRefsAndArgs);
  // Start new JNI local reference state
  JNIEnvExt* env = thread->GetJniEnv();
  ScopedObjectAccessUnchecked soa(env);
  ScopedJniEnvLocalRefState env_state(env);
  const char* old_cause = thread->StartAssertNoThreadSuspension("Quick method resolution set up");
  // Compute details about the called method (avoid GCs)
  ClassLinker* linker = Runtime::Current()->GetClassLinker();
  mirror::ArtMethod* caller = QuickArgumentVisitor::GetCallingMethod(sp);
  InvokeType invoke_type;
  const DexFile* dex_file;
  uint32_t dex_method_idx;
  if (called->IsRuntimeMethod()) {
    //...
    //...
  } else {
    invoke_type = kStatic;
    dex_file = &MethodHelper(called).GetDexFile();
    dex_method_idx = called->GetDexMethodIndex();
  }
  //...
  // Resolve method filling in dex cache.
  if (called->IsRuntimeMethod()) {
    called = linker->ResolveMethod(dex_method_idx, caller, invoke_type);
  }
  const void* code = NULL;
  if (LIKELY(!thread->IsExceptionPending())) {
    //...
    linker->EnsureInitialized(called_class, true, true);
    //...
  }
  // ...
  return code;
}
inline bool ArtMethod::IsRuntimeMethod() const {
  return GetDexMethodIndex() == DexFile::kDexNoIndex;
}
called->IsRuntimeMethod() checks whether the current method is the ResolutionMethod. If it is, ClassLinker::ResolveMethod is used to obtain the real method:
mirror::ArtMethod* ClassLinker::ResolveMethod(const DexFile& dex_file,
                                              uint32_t method_idx,
                                              mirror::DexCache* dex_cache,
                                              mirror::ClassLoader* class_loader,
                                              const mirror::ArtMethod* referrer,
                                              InvokeType type) {
  DCHECK(dex_cache != NULL);
  // Fast path: the method has already been resolved and cached.
  mirror::ArtMethod* resolved = dex_cache->GetResolvedMethod(method_idx);
  if (resolved != NULL) {
    return resolved;
  }
  // Resolve the declaring class first.
  const DexFile::MethodId& method_id = dex_file.GetMethodId(method_idx);
  mirror::Class* klass = ResolveType(dex_file, method_id.class_idx_, dex_cache, class_loader);
  if (klass == NULL) {
    DCHECK(Thread::Current()->IsExceptionPending());
    return NULL;
  }
  // First attempt: look the method up by dex cache and index.
  switch (type) {
    case kDirect:
    case kStatic:
      resolved = klass->FindDirectMethod(dex_cache, method_idx);
      break;
    case kInterface:
      resolved = klass->FindInterfaceMethod(dex_cache, method_idx);
      DCHECK(resolved == NULL || resolved->GetDeclaringClass()->IsInterface());
      break;
    case kSuper:
    case kVirtual:
      resolved = klass->FindVirtualMethod(dex_cache, method_idx);
      break;
    default:
      LOG(FATAL) << "Unreachable - invocation type: " << type;
  }
  // Second attempt: look the method up by name and signature.
  if (resolved == NULL) {
    const char* name = dex_file.StringDataByIdx(method_id.name_idx_);
    std::string signature(dex_file.CreateMethodSignature(method_id.proto_idx_, NULL));
    switch (type) {
      case kDirect:
      case kStatic:
        resolved = klass->FindDirectMethod(name, signature);
        break;
      case kInterface:
        resolved = klass->FindInterfaceMethod(name, signature);
        DCHECK(resolved == NULL || resolved->GetDeclaringClass()->IsInterface());
        break;
      case kSuper:
      case kVirtual:
        resolved = klass->FindVirtualMethod(name, signature);
        break;
    }
  }
  if (resolved != NULL) {
    // Cache the result so that later lookups take the fast path.
    dex_cache->SetResolvedMethod(method_idx, resolved);
    return resolved;
  } else {
    // (failure handling omitted from this excerpt)
  }
}
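A "chain reaction" takes place here: ClassLinker::ResolveType follows a flow very similar to ResolveMethod, and interested readers can trace it themselves. Once the resolved klass is found, an intensive search locates resolved, which DexCache::SetResolvedMethod then stores over the earlier "stand-in". The next time ResolveMethod is asked for this method, it can return it directly without resolving it again.
Let's go back and replay the process. The first time a class method is called, the steps are:
1. The entrypoint of the ResolutionMethod is called, entering art_quick_resolution_trampoline;
2. art_quick_resolution_trampoline jumps to artQuickResolutionTrampoline;
3. artQuickResolutionTrampoline calls ClassLinker::ResolveMethod to resolve the class method;
4. ClassLinker::ResolveMethod calls ClassLinker::ResolveType to resolve the class, then looks for the real method in the resolved class;
5. DexCache::SetResolvedMethod is called to overwrite the "stand-in" method with the real one;
6. The entrypoint code of the real method is called.
You may ask why the process is so roundabout. It is all for lazy loading, which speeds up startup. The process closely resembles PLT/GOT symbol relocation in the ELF linker; the same ideas recur across technologies, and once you understand one, the others follow.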
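The lazy-resolution pattern described above can be reduced to a few lines. This is only a toy model under stated assumptions: LazyCache, Placeholder, RealMethod, Resolve and Call are hypothetical names, every slot resolves to the same target, and plain function pointers stand in for ArtMethod objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each cache slot holds a function pointer. All slots start
// out pointing at one shared placeholder, as resolved_methods does after Init.
using Code = int (*)(int);

struct LazyCache {
  std::vector<Code> slots;
  int resolve_count = 0;  // how many times the slow path ran
};

inline int Placeholder(int) { return -1; }      // stands in for the ResolutionMethod
inline int RealMethod(int x) { return x * 2; }  // stands in for the resolved method

inline LazyCache MakeCache(std::size_t n) {
  LazyCache c;
  c.slots.assign(n, &Placeholder);
  return c;
}

// Slow path: find the real target and overwrite the placeholder in the slot.
inline Code Resolve(LazyCache& c, std::size_t idx) {
  ++c.resolve_count;
  c.slots[idx] = &RealMethod;
  return c.slots[idx];
}

// Call through the cache; only the first call on a slot takes the slow path.
inline int Call(LazyCache& c, std::size_t idx, int arg) {
  Code target = c.slots[idx];
  if (target == &Placeholder) {
    target = Resolve(c, idx);
  }
  return target(arg);
}
```

Slots that are never called keep the placeholder, so their resolution cost is never paid. That is the startup-time benefit, and it is the same trade-off a lazily bound PLT/GOT entry makes.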
0x3 Hook ArtMethod
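Based on the analysis of the ArtMethod loading and execution flows above, I came up with two ways to hook an ArtMethod:
1. Modify the methods array in the DexCache, replacing the entrypoint there with our own and routing calls through a dispatcher;
2. Modify the entrypoint of the already-loaded ArtMethod directly, again routing calls through a dispatcher.
Both approaches are feasible, but I wanted the whole project to build under the NDK (rather than inside the source tree), so I chose option 2: the resolved ArtMethod can be obtained directly through the JNI interface, which removes many file dependencies.
Going back to the calling conventions: every ArtMethod has two of them, so in principle we should prepare two dispatcher functions. Here we do not consider forced interpreted execution, so handling the dispatcher for entry_point_from_compiled_code is enough.
First we locate the target method and save its entrypoint, then overwrite it with our dispatcher function art_quick_dispatcher, as shown below: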
extern int __attribute__((visibility("hidden"))) art_java_method_hook(JNIEnv* env, HookInfo* info) {
  const char* classDesc = info->classDesc;
  const char* methodName = info->methodName;
  const char* methodSig = info->methodSig;
  const bool isStaticMethod = info->isStaticMethod;

  jclass claxx = env->FindClass(classDesc);
  if (claxx == NULL) {
    LOGE("[-] %s class not found", classDesc);
    return -1;
  }

  jmethodID methid = isStaticMethod ?
      env->GetStaticMethodID(claxx, methodName, methodSig) :
      env->GetMethodID(claxx, methodName, methodSig);
  if (methid == NULL) {
    LOGE("[-] %s->%s method not found", classDesc, methodName);
    return -1;
  }

  // Under ART the jmethodID is treated as a pointer to the ArtMethod.
  ArtMethod* artmeth = reinterpret_cast<ArtMethod*>(methid);
  if (art_quick_dispatcher != artmeth->GetEntryPointFromCompiledCode()) {
    uint64_t (*entrypoint)(ArtMethod* method, Object* thiz, u4* arg1, u4* arg2);
    entrypoint = (uint64_t (*)(ArtMethod*, Object*, u4*, u4*))artmeth->GetEntryPointFromCompiledCode();
    // Save the original values, then redirect the method to the dispatcher.
    info->entrypoint = (const void*)entrypoint;
    info->nativecode = artmeth->GetNativeMethod();
    artmeth->SetEntryPointFromCompiledCode((const void*)art_quick_dispatcher);
    artmeth->SetNativeMethod((const void*)info);
    LOGI("[+] %s->%s was hooked\n", classDesc, methodName);
  } else {
    LOGW("[*] %s->%s method had been hooked", classDesc, methodName);
  }
  return 0;
}
The key information (the HookInfo pointer) is stashed in the method itself via ArtMethod::SetNativeMethod.
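Stripped of JNI and ART types, the bookkeeping done by the hook function looks roughly like this. FakeMethod, FakeHookInfo, kDispatcher and InstallHook are hypothetical names for this sketch, and the two struct fields only model the slots the hook touches.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the two slots the hook touches are modeled.
struct FakeMethod {
  const void* entry_point;    // plays the role of entry_point_from_compiled_code_
  const void* native_method;  // the slot accessed via Get/SetNativeMethod
};

struct FakeHookInfo {
  const void* saved_entry;
  const void* saved_native;
};

// A dummy object whose address plays the role of art_quick_dispatcher.
static const char kDispatcherTag = 0;
static const void* const kDispatcher = &kDispatcherTag;

// Returns true if the hook was installed, false if it was already in place.
inline bool InstallHook(FakeMethod* m, FakeHookInfo* info) {
  if (m->entry_point == kDispatcher) {
    return false;  // already hooked; do not overwrite the saved values
  }
  info->saved_entry = m->entry_point;    // kept so the original can be called later
  info->saved_native = m->native_method;
  m->entry_point = kDispatcher;          // redirect compiled-code callers
  m->native_method = info;               // stash the bookkeeping in the method
  return true;
}
```

The already-hooked check matters: without it a second call would save the dispatcher as the "original" entry point, and every later call would loop back into the dispatcher.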
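Given ART's unusual calling convention, art_quick_dispatcher has to be written in assembly. It adjusts the registers as needed and then calls another function, artQuickToDispatcher, where the arguments can be accessed conveniently from C/C++.
The implementation of art_quick_dispatcher is as follows: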
    .extern artQuickToDispatcher
ENTRY art_quick_dispatcher
    push   {r4, r5, lr}                    @ sp - 12
    mov    r0, r0                          @ r0 already holds the Method*
    str    r1, [sp, #(12 + 4)]             @ write r1-r3 back to their slots above the old sp
    str    r2, [sp, #(12 + 8)]
    str    r3, [sp, #(12 + 12)]
    mov    r1, r9                          @ pass Thread* in r1
    add    r2, sp, #(12 + 4)               @ pass the args array in r2
    add    r3, sp, #12                     @ pass the old SP in r3
    blx    artQuickToDispatcher            @ (Method* method, Thread*, u4 **, u4 **)
    pop    {r4, r5, pc}                    @ return on success, r0 and r1 hold the result
END art_quick_dispatcher
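I point r2 at the argument array, which makes all the arguments easy to access. I also pass the old sp in r3, in preparation for calling the original entrypoint later. Now let's look at the implementation of artQuickToDispatcher: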
extern "C" uint64_t artQuickToDispatcher(ArtMethod* method, Thread* self, u4** args, u4** old_sp) {
  HookInfo* info = (HookInfo*)method->GetNativeMethod();
  LOGI("[+] entry ArtHandler %s->%s", info->classDesc, info->methodName);

  // For a non-static method, args[0] points to "this".
  if (!info->isStaticMethod) {
    Object* thiz = reinterpret_cast<Object*>(args[0]);
    if (thiz != NULL) {
      char* bytes = get_chars_from_utf16(thiz->GetClass()->GetName());
      LOGI("[+] thiz class is %s", bytes);
      free(bytes);  // released with free(), like the other get_chars_from_utf16 results below
    }
  }

  const void* entrypoint = info->entrypoint;
  method->SetNativeMethod(info->nativecode);  // restore nativecode for JNI method
  uint64_t res = art_quick_call_entrypoint(method, self, args, old_sp, entrypoint);

  // This demo logging assumes the hooked method returns an object reference.
  JValue* result = (JValue*)&res;
  Object* obj = result->l;
  if (obj != NULL) {
    char* raw_class_name = get_chars_from_utf16(obj->GetClass()->GetName());
    if (strcmp(raw_class_name, "java.lang.String") == 0) {
      char* raw_string_value = get_chars_from_utf16((String*)obj);
      LOGI("result-class %s, result-value \"%s\"", raw_class_name, raw_string_value);
      free(raw_string_value);
    } else {
      LOGI("result-class %s", raw_class_name);
    }
    free(raw_class_name);
  }

  // The entrypoint may have been replaced by the trampoline (this happens only once),
  // so re-install the dispatcher and remember the new original.
  // if (method->IsStatic() && !method->IsConstructor()) {
  entrypoint = method->GetEntryPointFromCompiledCode();
  if (entrypoint != (const void*)art_quick_dispatcher) {
    LOGW("[*] entrypoint was replaced. %s->%s", info->classDesc, info->methodName);
    method->SetEntryPointFromCompiledCode((const void*)art_quick_dispatcher);
    info->entrypoint = entrypoint;
    info->nativecode = method->GetNativeMethod();
  }
  method->SetNativeMethod((const void*)info);
  // }
  return res;
}
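The dispatcher receives its arguments as an array of 32-bit words. As a rough illustration of how such an array could be walked, the sketch below maps each parameter of a JNI-style shorty string to its word index. The helper names are hypothetical, and the layout rules are assumptions for this sketch rather than something the code above states: the receiver occupies word 0 for non-static methods, long and double take two consecutive words with the low word first, and everything else takes one word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For each parameter in a shorty (return type first, e.g. "VIJL"), returns the
// index of its first 32-bit word in the args array.
// Assumed layout: 'J' (long) and 'D' (double) take two words, everything else
// one, and a non-static method has the receiver in word 0.
inline std::vector<std::size_t> ArgWordIndexes(const char* shorty, bool is_static) {
  assert(shorty != nullptr && shorty[0] != '\0');
  std::vector<std::size_t> indexes;
  std::size_t word = is_static ? 0 : 1;  // skip "this" when present
  for (const char* p = shorty + 1; *p != '\0'; ++p) {  // shorty[0] is the return type
    indexes.push_back(word);
    word += (*p == 'J' || *p == 'D') ? 2 : 1;
  }
  return indexes;
}

// Reads a 64-bit value stored as two consecutive words, low word first
// (an assumed layout for this sketch).
inline std::uint64_t ReadWide(const std::uint32_t* args, std::size_t index) {
  return static_cast<std::uint64_t>(args[index]) |
         (static_cast<std::uint64_t>(args[index + 1]) << 32);
}
```

For an instance method with shorty "VIJL", for example, the int is at word 1, the long starts at word 2 and the object reference is at word 4.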
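I will not go into the details of argument parsing here. Next comes the thorniest problem: how to call back into the original entrypoint.
The key to calling the original entrypoint is restoring the earlier stack layout. art_quick_call_entrypoint does this job; its implementation is as follows: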
ENTRY art_quick_call_entrypoint
    push   {r4, r5, lr}                    @ sp - 12
    sub    sp, #(40 + 20)                  @ sp - 40 - 20
    str    r0, [sp, #(40 + 0)]             @ var_40_0 = method_pointer
    str    r1, [sp, #(40 + 4)]             @ var_40_4 = thread_pointer
    str    r2, [sp, #(40 + 8)]             @ var_40_8 = args_array
    str    r3, [sp, #(40 + 12)]            @ var_40_12 = old_sp
    mov    r0, sp                          @ dest = new frame
    mov    r1, r3                          @ src = old_sp
    ldr    r2, =40
    blx    memcpy                          @ memcpy(dest, src, size_of_byte)
    ldr    r0, [sp, #(40 + 0)]             @ restore method to r0
    ldr    r1, [sp, #(40 + 4)]
    mov    r9, r1                          @ restore thread to r9
    ldr    r5, [sp, #(40 + 8)]             @ r5 = args_array
    ldr    r1, [r5]                        @ restore arg1
    ldr    r2, [r5, #4]                    @ restore arg2
    ldr    r3, [r5, #8]                    @ restore arg3
    ldr    r5, [sp, #(40 + 20 + 12)]       @ r5 = entrypoint (5th C argument, passed on the stack)
    blx    r5
    add    sp, #(40 + 20)
    pop    {r4, r5, pc}                    @ return on success, r0 and r1 hold the result
END art_quick_call_entrypoint
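I took a shortcut here: I reserve space for 10 arguments and restore it from the old_sp passed in earlier, using memcpy to copy 40 bytes. After that, r0, r1, r2, r3 and r9 are restored. Once the entrypoint returns, the result is in r0 and r1 and is handed back to artQuickToDispatcher.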
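The fixed 40-byte copy puts a ceiling on how many argument words reach the original method. The sketch below models just that copy; RebuildFrame and ArgsFit are hypothetical names, and the assumption that word 0 of the copied area is the Method* slot follows the stack layout shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kCopiedBytes = 40;  // fixed size used by the stub
constexpr std::size_t kCopiedWords = kCopiedBytes / sizeof(std::uint32_t);

// Rebuilds the frame the callee will see by copying a fixed number of bytes
// from the saved stack pointer, like the memcpy in art_quick_call_entrypoint.
inline void RebuildFrame(const std::uint32_t* old_sp, std::uint32_t* new_frame) {
  std::memcpy(new_frame, old_sp, kCopiedBytes);
}

// Word 0 is assumed to be the Method* slot, so at most kCopiedWords - 1
// argument words are carried over.
constexpr bool ArgsFit(std::size_t arg_words) {
  return arg_words + 1 <= kCopiedWords;
}
```

Under these assumptions a method whose arguments need more than nine words would see a truncated frame, so a more general version would size the copy from the method's signature instead of using a constant.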
That completes the analysis of the ART hook.
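0x4 Differences Between the 4.4 and 5.x Implementations
My whole approach was tested on 4.4, mainly because I only have the 4.4 source and do not have enough disk space for the 5.x source. The overall idea applies to 5.x as well. That said, the 5.x implementation is considerably more complex than 4.4, and I do not know whether it can be built under the NDK the way I did here.
A stock 4.4 emulator boots with Dalvik, so you have to switch to ART in Settings. This prompts a reboot, which usually does not work; manually shutting the emulator down and starting it again does, although it takes a while before it is ready.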
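0x5 Wrapping Up
Although this article only presents a technical approach to ART hooking, the underlying principles are also instructive for code hardening on ART, dynamic code techniques, and so on.
As usual, the code for the whole project is at https://github.com/boyliang/AllHookInOne. If you run into problems, questions and feedback are welcome.
Also, please patch your NDK with https://github.com/boyliang/ndk-patch.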