Part 1: Environment Setup
A few weeks ago I needed to optimize some OpenGL code for iOS. There were a lot of matrix and vector calculations going on that could be improved greatly by taking advantage of the NEON coprocessor found in all of the newer iPhone and iPad models. It is a vector-based processor that also supports single-precision floating-point calculations. Taking advantage of this processor in OpenGL applications can make a significant difference, as you often do vector-based floating-point calculations. Nicely, there already exists a library that provides everything needed for vector and matrix operations and leverages the NEON unit by implementing the main functionality directly in assembly code (plain C code gives you no direct access to that coprocessor). My only problem: I always like to understand what I am doing, and I would sometimes rather re-implement a library myself than blindly reuse something whose inner workings I don’t understand. Not sure if this is a good quality or a bad one…
This led me to the goal for this series of posts: learn assembly for the ARM chip, which is found in all of Apple's iOS-powered devices (and also in most other phones that play in the same league as the iPhone). I will try to write down my experience and what I have learned within this series to help others get started as well. For this, I will rely heavily on external resources like documentation, blog posts and tutorials. This “diary” will function as the glue that holds everything together and makes me stick with my goal of learning ARM assembly.
In this part of the series we will focus on setting up the environment and making sure that our toolchain is working. Namely, we will install the GNU ARM toolchain and then build, run, debug and disassemble a simple hello-world app.
Let's get started!
Installing the GNU ARM toolchain
Because iOS applications compiled for the iPhone Simulator are translated to normal x86 code (the simulator does not simulate an ARM chip), we cannot just write assembly code in our iOS project and run it in the simulator. And we certainly don’t want to connect an iOS device for our early fiddling with the ARM chip. Thus we will install the GNU ARM assembler toolchain, which is free and comes with a simulator that allows us to run the developed code without a real device.
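If you want to see for yourself which instruction set a given build targets, a small sketch like the following can help. It relies on macros that GCC and Clang predefine for the target architecture; the function names target_arch and print_target_arch are made up for this illustration.

```c
#include <stdio.h>

/* Returns a label for the instruction set this file was compiled for.
 * __arm__, __aarch64__, __i386__ and __x86_64__ are predefined by GCC
 * and Clang; the function name is hypothetical. */
const char *target_arch(void)
{
#if defined(__arm__) || defined(__aarch64__)
    return "ARM";
#elif defined(__i386__) || defined(__x86_64__)
    return "x86";
#else
    return "unknown";
#endif
}

/* Convenience wrapper that prints the label. */
void print_target_arch(void)
{
    printf("compiled for: %s\n", target_arch());
}
```

Calling print_target_arch() from a simulator build should report x86, while the same source compiled with an ARM cross-compiler should report ARM.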
Go to the GNU ARM webpage and download the binary for your architecture. As I am focusing on Apple mobile devices, I assume you will download the toolchain for Mac OS X.
After unpacking and installing the download, you should have the following tools installed and be able to call them from your command-line shell:
As you might notice, these are all the standard binutils you already know. They only support (and compile for) the ARM architecture, but if you are familiar with gcc, gdb, etc., you should have no big problems with these tools.
The only small drawback is that these tools build and run ELF-based executables and not the Mach-O executables known from Mac OS X and iOS. So you will have to use objdump here, but otool, for example, on your iOS executables. But it is not really justified to make such a complaint…
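To make the ELF versus Mach-O difference a bit more tangible: both formats can be told apart by the first four bytes of the file. The following is only a rough sketch; the enum and the function name detect_format are invented here, and fat (multi-architecture) Mach-O files are deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_MACHO } exe_format;

/* Classifies an executable by the magic number in its first four bytes. */
exe_format detect_format(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (len < 4)
        return FMT_UNKNOWN;

    /* ELF files start with 0x7f 'E' 'L' 'F'. The string is split so the
     * hex escape does not swallow the 'E'. */
    if (memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    /* Mach-O uses 0xfeedface (32-bit) or 0xfeedfacf (64-bit); the
     * byte-swapped values cover files written with the other byte order. */
    memcpy(&magic, buf, 4);
    if (magic == 0xfeedfaceu || magic == 0xcefaedfeu ||
        magic == 0xfeedfacfu || magic == 0xcffaedfeu)
        return FMT_MACHO;

    return FMT_UNKNOWN;
}
```

Feeding it the first bytes of the “hw” executable built below should yield FMT_ELF, while an iOS executable for a single architecture should yield FMT_MACHO.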
Building a simple Hello-World App for the ARM chip
Let's build a very simple hello-world app. We will compile, link, run and debug that app with the tools we just installed.
Create a helloworld.c file with the following content:
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        char *str = "hello world";
        printf("%s\n", str);
        asm volatile("mov r0,r0");
        return 0;
    }
You can see that the program only prints “hello world” and has a simple stub for adding some assembly code to play around with the ARM specifics. The “mov r0,r0” is basically a no-op and we won’t worry about it for now.
Perform the following set of calls to build, link and run the executable (named “hw”):
    arm-elf-gcc -mcpu=arm7 -O2 -g -c helloworld.c -o helloworld.o
    arm-elf-gcc -mcpu=arm7 -o hw helloworld.o -lc
    arm-elf-run hw
There should be no problems with these steps, and running the “hw” executable should print “hello world” to the command line. I also want to show you how to start the executable in the debugger:
    macbook:arm-asm daniel$ arm-elf-gdb
    GNU gdb 6.0
    Copyright 2003 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "--host=powerpc-apple-darwin6.8 --target=arm-elf".
    (gdb) file hw
    Reading symbols from hw...done.
    (gdb) target sim
    Connected to the simulator.
    (gdb) load
    Loading section .init, size 0x1c vma 0x8000
    Loading section .text, size 0x2d74 vma 0x801c
    Loading section .fini, size 0x18 vma 0xad90
    Loading section .rodata, size 0x18 vma 0xada8
    Loading section .data, size 0x8a8 vma 0xaec0
    Loading section .eh_frame, size 0x4 vma 0xb768
    Loading section .ctors, size 0x8 vma 0xb76c
    Loading section .dtors, size 0x8 vma 0xb774
    Loading section .jcr, size 0x4 vma 0xb77c
    Start address 0x811c
    Transfer rate: 111616 bits in <1 sec.
    (gdb) run
    Starting program: /Users/daniel/tmp/arm-asm/hw
    hello world

    Program exited normally.
    [Switching to process 0]
As you can see, you first call “file hw” to load the executable, then “target sim”, then “load”, and finally “run” to run the executable. All further commands, like setting breakpoints and stepping through the code, are standard gdb, and you can look them up in any regular tutorial on this toolchain (you can already do a lot with break, stepi, nexti, continue, list and print).
Do you remember the “mov r0,r0” assembly instruction in our code? We can use “arm-elf-objdump -d hw” to disassemble the executable and see it listed as a “nop”:
    macbook:arm-asm daniel$ arm-elf-objdump -d hw | grep -A 12 "<main>:"
    00008224 <main>:
        8224: e1a0c00d  mov ip, sp
        8228: e92dd800  stmdb sp!, {fp, ip, lr, pc}
        822c: e24cb004  sub fp, ip, #4 ; 0x4
        8230: e59f000c  ldr r0, [pc, #12] ; 8244 <main+0x20>
        8234: eb00023e  bl 8b34 <puts>
        8238: e1a00000  nop (mov r0,r0)
        823c: e3a00000  mov r0, #0 ; 0x0
        8240: e91ba800  ldmdb fp, {fp, sp, pc}
        8244: 0000ada8  andeq sl, r0, r8, lsr #27

    00008248 <atexit>:
        8248: e1a0c00d  mov ip, sp
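The hex words in the second column of the disassembly are the encoded instructions. As a rough illustration of how the word e1a00000 turns into “mov r0,r0”, here is a minimal sketch that splits a 32-bit ARM data-processing instruction into its bit fields. The struct and the function name decode_dp are made up for this example, and the sketch assumes the word really is a data-processing instruction (bits 27-26 are zero); it does not check that.

```c
#include <stdint.h>

/* Bit fields of a 32-bit ARM data-processing instruction.
 * The type and function names are hypothetical. */
typedef struct {
    unsigned cond;      /* bits 31-28: condition code (0xE = always)   */
    unsigned imm;       /* bit 25: 1 if operand 2 is an immediate      */
    unsigned opcode;    /* bits 24-21: operation (0x2 = SUB, 0xD = MOV) */
    unsigned set_flags; /* bit 20: the "s" suffix                      */
    unsigned rn;        /* bits 19-16: first operand register          */
    unsigned rd;        /* bits 15-12: destination register            */
    unsigned operand2;  /* bits 11-0: register (with shift) or immediate */
} dp_fields;

dp_fields decode_dp(uint32_t word)
{
    dp_fields f;
    f.cond      = (word >> 28) & 0xFu;
    f.imm       = (word >> 25) & 0x1u;
    f.opcode    = (word >> 21) & 0xFu;
    f.set_flags = (word >> 20) & 0x1u;
    f.rn        = (word >> 16) & 0xFu;
    f.rd        = (word >> 12) & 0xFu;
    f.operand2  = word & 0xFFFu;
    return f;
}
```

For 0xe1a00000 this gives condition 0xE (always), opcode 0xD (MOV), destination register 0 and an all-zero operand 2, i.e. register r0 without a shift. The word 0xe3a00000 two lines further down differs only in the immediate bit, which matches “mov r0, #0”.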
Congratulations! You have successfully installed the tool-chain and are now ready to fiddle around with ARM assembly.
Further Reading
As homework for the next part of the series, I will assume you have a look at the following really great resources:
The first tutorial was originally written for the Game Boy Advance, but that should not confuse you. The GBA also uses an ARM7 chip, and thus we can transfer almost everything we learn to our iOS devices.
You might actually want to play around a little with what you learn in the tutorial. Do you remember the “asm volatile” statement in your main function? You could use it to put in some simple assembler code and just play around. Details on how to pass C variables into this inline code and get results back can be found in this very well-written blog post. I don’t assume you go through it in every detail (we will come back to it later), but it will help you if you want to pass values between C and assembly code without having to worry too much about calling conventions. Actually, this will be the topic of Part 2 or 3 of the series.
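As a small taste of passing values in and out of inline assembly, here is a sketch using GCC's extended asm syntax. The function name add_with_inline_asm is made up, and the plain C fallback is only there so that the file also builds on non-ARM hosts; on those hosts no assembly is executed at all.

```c
/* Adds two ints. On 32-bit ARM the addition is done by an inline
 * "add" instruction; elsewhere a plain C fallback is used.
 * The function name is hypothetical. */
int add_with_inline_asm(int a, int b)
{
    int result;
#if defined(__arm__)
    __asm__ volatile("add %0, %1, %2"
                     : "=r"(result)      /* %0: output, any register    */
                     : "r"(a), "r"(b));  /* %1, %2: inputs in registers */
#else
    result = a + b;  /* fallback so the sketch compiles on non-ARM hosts */
#endif
    return result;
}
```

The compiler picks the registers for %0, %1 and %2 itself, which is why you do not have to think about calling conventions here.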
=======================================================================
Part 2: First Steps
As this series is mainly meant to write down my own learning experience and recap it, I assume you have done your homework from Part 1 just as I have: mainly, reading the Whirlwind Tour of ARM Assembly (Sections 23.1 to 23.3 should be enough for now). From here, we will start to write our first assembly functions and get familiar with the toolchain. In detail, we will learn the following in this part:
Recap
Let’s start with a quick summary of what you should have learned from the Whirlwind Tour of ARM Assembly:
The first assembler function
Let us write a first simple assembler function based on the knowledge we have gained so far. It will live in its own assembler file and will be called from the main function of our C code.
Create a file asmlib.s with the following assembly-code:
    @ ARM Assembler Test Library

    @ int asm_sum(int a, int b)
        .align 2            @ Align to word boundary
        .arm                @ This is ARM code
        .global asm_sum     @ This makes it a real symbol
    asm_sum:                @ Start of function definition
        add r2, r0, r1      @ Add up a (r0) and b (r1) and store result in r2
        mov r0, r2          @ Store sum (r2) in r0 which stores return-value
        mov pc, lr          @ Set program counter to lr (was set by caller)
We have defined a very simple assembler function that adds the two arguments a and b (which are handed to the function in registers r0 and r1) and returns the sum to the caller in r0. We will get into the details of the calling conventions in the next part of the series. For now, just take it for granted that the arguments are handed over in this way and the result is returned in r0. You should know by now that “asm_sum:” is a label for this instruction block; when jumped to, it will first execute the “add”, then the “mov”, and last reset the program counter (pc) to the next instruction of the caller's code (whose address the caller stored in lr).
Some ARM assembler directives are used at the beginning of the file to align the function label to the next word boundary (.align x means align to a 2^x-byte boundary). In general, the ARM processor should be able to handle unaligned access as well, but the Apple documents specifically state that functions have to be aligned. To know the details on why aligned access is important, read this post. “.arm” defines this as ARM code and “.global” exposes the label as a global symbol. This is important so we can call this function from our C code.
Some more information is given in the comments starting with “@”. One additional note: we could actually remove the “mov r0, r2” instruction if we had used “add r0, r0, r1” in the first place, but for learning assembly it is not bad to have some more explicit code.
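To illustrate what aligning to a 2^x-byte boundary means in numbers, here is a small sketch in C. The helper names align_up and is_aligned are invented for this example; the assembler does the equivalent adjustment for us when it sees the directive.

```c
#include <stdint.h>

/* Rounds addr up to the next multiple of 2^x (x = 2 gives 4 bytes,
 * i.e. one ARM word). The function name is hypothetical. */
uintptr_t align_up(uintptr_t addr, unsigned x)
{
    uintptr_t boundary = (uintptr_t)1 << x;
    return (addr + boundary - 1) & ~(boundary - 1);
}

/* Returns 1 if addr already lies on a 2^x-byte boundary, else 0. */
int is_aligned(uintptr_t addr, unsigned x)
{
    uintptr_t boundary = (uintptr_t)1 << x;
    return (addr & (boundary - 1)) == 0;
}
```

With x = 2, an address like 0x8225 would be moved up to 0x8228, while 0x8224 is already word-aligned and stays where it is.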
Next, we will call our defined function that returns the sum of its two arguments from our main-method in C-code. We use the following main.c for this:
    #include <stdio.h>

    extern int asm_sum(int a, int b);

    int main(int argc, char *argv[])
    {
        printf("== sum ==\n");
        int a = 71;
        int b = 29;
        printf("%d + %d = %d\n", a, b, asm_sum(a, b));
        return 0;
    }
The only thing you will notice is that we have to declare asm_sum as an external symbol, as we don’t define it here in our C code.
You can now try to compile and link both files to an executable with the following calls (the last line is already for running our executable):
    arm-elf-gcc -mcpu=arm7 -O2 -g -c asmlib.s -o asmlib.o
    arm-elf-gcc -mcpu=arm7 -O2 -g -c main.c -o main.o
    arm-elf-gcc -mcpu=arm7 -o armtest *.o -lc
    arm-elf-run armtest
Running the program should show you the expected result, i.e. the output of the printf statements on stdout. Good job!
Debugging
Let's step through our assembly code with the debugger. First, create a file named “.gdbinit” in the folder where the other two files are located and put in the following lines:
    file armtest
    target sim
    load
This file is loaded when gdb starts up; it sets our executable, sets the target to the ARM simulator and loads the code into memory. We can now do some simple debugging:
    macbook:ARMAssembly_Part2 daniel$ arm-elf-gdb
    GNU gdb 6.0
    Copyright 2003 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "--host=powerpc-apple-darwin6.8 --target=arm-elf".
    Connected to the simulator.
    Loading section .init, size 0x1c vma 0x8000
    Loading section .text, size 0x8a3c vma 0x801c
    Loading section .fini, size 0x18 vma 0x10a58
    Loading section .rodata, size 0x248 vma 0x10a70
    Loading section .data, size 0x8bc vma 0x10db8
    Loading section .eh_frame, size 0x4 vma 0x11674
    Loading section .ctors, size 0x8 vma 0x11678
    Loading section .dtors, size 0x8 vma 0x11680
    Loading section .jcr, size 0x4 vma 0x11688
    Start address 0x811c
    Transfer rate: 306272 bits in <1 sec.
    (gdb) break asmlib.s:9
    Breakpoint 1 at 0x8228: file asmlib.s, line 9.
    (gdb) run
    Starting program: /Users/daniel/Dev/Test/ARMAssembly_Part2/armtest
    == sum ==

    Breakpoint 1, asm_sum () at asmlib.s:9
    9 mov r0, r2 @ Store sum (r2) in r0 which stores return-value
    Current language: auto; currently asm
    (gdb) print $r0
    $1 = 71
    (gdb) print $r1
    $2 = 29
    (gdb) print $r2
    $3 = 100
    (gdb) stepi
    10 mov pc, lr @ Set program counter to lr (was set by caller)
    (gdb) continue
    Continuing.
    71 + 29 = 100

    Program exited normally.
    [Switching to process 0]
    Current language: auto; currently c
    (gdb)
As you can see, we first set a breakpoint at line 9 of the file asmlib.s and then “run” the executable. The simulator stops at the breakpoint. “print $regname” allows us to print the content of a register. As expected, the first and second function parameters are stored in r0 and r1 (compare with the values set for a and b in main.c), and r2 already holds their sum because the “add” in line 8 has been executed. After that, we use “stepi” to step to the next instruction and then call “continue” so that normal execution proceeds until the program ends. These few gdb commands should already allow you to do some simple debugging. For more details on gdb, Google is your friend.
Some more instructions in a nutshell
Let's write one more assembly function to get familiar with some more instructions. How about a multiply routine that, for two parameters a and b, returns the value a*b:
    @ int asm_mul(int a, int b)
        .align 2
        .arm
        .global asm_mul
    asm_mul:
        stmfd sp!, {r4-r11}     @ in case we needed to work with more than registers r0-r3,
                                @ have to save the first on the stack (only r0-r3 and r12 are scratch registers)
                                @ Here, actually don't need them...
        mov r3, #0              @ Initialize register holding result of multiplication
        movs r2, r0             @ Move "a" into r2 and set status-flags (mov"s")
        beq asm_mul_return      @ Immediately return if a==0
        movs r2, r1             @ Move "b" into r2 and set status-flags (mov"s")
        beq asm_mul_return      @ Immediately return if b==0
    asm_mul_loop:
        add r3, r3, r0          @ r3 = r3 + r0
        subs r1, r1, #1         @ r1 = r1 - 1 (decrement)
        bne asm_mul_loop        @ If the zero-flag is not set (r1 > 0), loop once more
    asm_mul_return:
        ldmfd sp!, {r4-r11}     @ Restore the registers
        mov r0, r3              @ Store result in r0 (return register)
        mov pc, lr
Please note that we could have used assembly instructions to do the multiplication for us, but this way we can recap on some instructions we have learned more easily.
The algorithm implemented is basically: result = 0; if (a == 0 or b == 0) return result; else while (b > 0) { result = result + a; b = b - 1 } and should be quite straightforward. Here are some points to take away though:
To test your routine, you will have to add a method-declaration for asm_mul to main.c and call it within your main-method. You can find the full sources and a Makefile of our examples on github.
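One way to sanity-check the assembly routine is to compare its results against a plain C version of the same repeated-addition algorithm. The following is just a sketch; the name ref_mul is made up, and like the pseudo-code above it only produces a product for non-negative b.

```c
/* Reference version of the repeated-addition multiply in plain C.
 * The function name is hypothetical. For negative b the loop body
 * never runs and 0 is returned, as in the pseudo-code. */
int ref_mul(int a, int b)
{
    int result = 0;

    if (a == 0 || b == 0)
        return result;      /* early exit, like the two "beq" branches */

    while (b > 0) {
        result = result + a;    /* corresponds to "add r3, r3, r0"  */
        b = b - 1;              /* corresponds to "subs r1, r1, #1" */
    }
    return result;
}
```

In main.c you could then print both asm_mul(a, b) and ref_mul(a, b) for a few small inputs and check that they agree.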
Further Reading
As we will be covering function-call conventions in iOS within the next part, I recommend the following read as preparation:
=======================================================================
To be continued