Note:
gcc -S xxx.c
produces xxx.s, the assembly file corresponding to xxx.c.
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Copyright (C)2003 Sandeep S.
This document is free; you can redistribute and/or modify this under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Kindly forward feedback and criticism to Sandeep.S. I will be indebted to anybody who points out errors and inaccuracies in this document; I shall rectify them as soon as I am informed.
I express my sincere appreciation to GNU people for providing such a great feature. Thanks to Mr.Pramode C E for all the help he gave. Thanks to friends at the Govt Engineering College, Trichur for their moral support and cooperation, especially to Nisha Kurur and Sakeeb S. Thanks to my dear teachers at Govt Engineering College, Trichur for their cooperation.
Additionally, thanks to Phillip, Brennan Underwood and [email protected]; Many things here are shamelessly stolen from their works.
We are here to learn about GCC inline assembly. What does this "inline" stand for?
We can instruct the compiler to insert the code of a function into the code of its callers, at the point where the call would actually be made. Such functions are inline functions. Sounds similar to a macro? Indeed there are similarities.
What is the benefit of inline functions?
This method of inlining reduces the function-call overhead. And if any of the actual argument values are constant, their known values may permit simplifications at compile time so that not all of the inline function’s code needs to be included. The effect on code size is less predictable; it depends on the particular case. To declare an inline function, we have to use the keyword inline in its declaration.
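As a minimal sketch, here is what an inline function looks like in C. The names square and forty_nine are hypothetical and chosen only for illustration. With optimization enabled, GCC can substitute the body at each call site and fold a call that has a constant argument:

```c
/* Hypothetical example function: "static inline" asks GCC to substitute
 * the body at each call site instead of emitting a call. */
static inline int square(int x)
{
    return x * x;
}

/* Hypothetical caller: with a constant argument, GCC can fold
 * square(7) to the constant 49 at compile time when optimizing. */
static int forty_nine(void)
{
    return square(7);
}
```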
Now we are in a position to guess what inline assembly is. It’s just some assembly routines written as inline functions. They are handy, fast and very useful in system programming. Our main focus is to study the basic format and usage of (GCC) inline assembly functions. To declare inline assembly functions, we use the keyword asm.
Inline assembly is important primarily because of its ability to operate and make its output visible on C variables. Because of this capability, "asm" works as an interface between the assembly instructions and the "C" program that contains it.
GCC, the GNU C Compiler for Linux, uses AT&T/UNIX assembly syntax. Here we’ll be using AT&T syntax for assembly coding. Don’t worry if you are not familiar with AT&T syntax, I will teach you. This is quite different from Intel syntax. I shall give the major differences.
The direction of the operands in AT&T syntax is opposite to that of Intel. In Intel syntax the first operand is the destination, and the second operand is the source whereas in AT&T syntax the first operand is the source and the second operand is the destination. ie,
"Op-code dst src" in Intel syntax changes to
"Op-code src dst" in AT&T syntax.
Register names are prefixed by % ie, if eax is to be used, write %eax.
AT&T immediate operands are preceded by ’$’. For static "C" variables also prefix a ’$’. In Intel syntax, for hexadecimal constants an ’h’ is suffixed, instead of that, here we prefix ’0x’ to the constant. So, for hexadecimals, we first see a ’$’, then ’0x’ and finally the constants.
In AT&T syntax the size of memory operands is determined from the last character of the op-code name. Op-code suffixes of ’b’, ’w’, and ’l’ specify byte(8-bit), word(16-bit), and long(32-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the op-codes) with ’byte ptr’, ’word ptr’, and ’dword ptr’.
Thus, Intel "mov al, byte ptr foo" is "movb foo, %al" in AT&T syntax.
In Intel syntax the base register is enclosed in ’[’ and ’]’, whereas in AT&T they change to ’(’ and ’)’. Additionally, in Intel syntax an indirect memory reference is like
section:[base + index*scale + disp], which changes to
section:disp(base, index, scale) in AT&T.
One point to bear in mind is that, when a constant is used for disp/scale, ’$’ shouldn’t be prefixed.
Now we have seen some of the major differences between Intel syntax and AT&T syntax. I’ve covered only a few of them. For complete information, refer to the GNU Assembler documentation. Now we’ll look at some examples for better understanding.
+------------------------------+------------------------------------+
| Intel Code | AT&T Code |
+------------------------------+------------------------------------+
| mov eax,1 | movl $1,%eax |
| mov ebx,0ffh | movl $0xff,%ebx |
| int 80h | int $0x80 |
| mov ebx, eax | movl %eax, %ebx |
| mov eax,[ecx] | movl (%ecx),%eax |
| mov eax,[ebx+3] | movl 3(%ebx),%eax |
| mov eax,[ebx+20h] | movl 0x20(%ebx),%eax |
| add eax,[ebx+ecx*2h] | addl (%ebx,%ecx,0x2),%eax |
| lea eax,[ebx+ecx] | leal (%ebx,%ecx),%eax |
| sub eax,[ebx+ecx*4h-20h] | subl -0x20(%ebx,%ecx,0x4),%eax |
+------------------------------+------------------------------------+
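The disp(base, index, scale) form can be tried from C with the extended asm syntax that is covered later in this document. The following is a sketch that assumes GCC on 32- or 64-bit x86. load_elem is a hypothetical helper that reads arr[i] using (base,index,4) addressing, where 4 is the size of an int:

```c
/* Hypothetical helper: read arr[i] with AT&T addressing disp(base,index,scale),
 * here with no displacement: (base = arr, index = i, scale = 4). */
static inline int load_elem(const int *arr, long i)
{
    int v;
    __asm__ ("movl (%1,%2,4), %0"   /* v = 32-bit load from arr + i*4 */
             : "=r" (v)             /* %0: output in any general register */
             : "r" (arr), "r" (i)   /* %1: base register, %2: index register */
             : "memory");           /* the asm reads memory through arr */
    return v;
}
```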
The format of basic inline assembly is very straightforward. Its basic form is
asm("assembly code");
Example.
asm("movl %ecx, %eax"); /* moves the contents of ecx to eax */
__asm__("movb %bh, (%eax)"); /* moves the byte from bh to the memory pointed to by eax */
You might have noticed that here I’ve used asm and __asm__. Both are valid. We can use __asm__ if the keyword asm conflicts with something in our program (for example, when compiling with -ansi or one of the strict -std options, where asm is not a keyword). If we have more than one instruction, we write one per line in double quotes, and also suffix a ’\n’ and ’\t’ to the instruction. This is because gcc sends each instruction as a string to as (GAS), and by using the newline/tab we send correctly formatted lines to the assembler.
Example.
__asm__ ("movl %eax, %ebx\n\t"
         "movl $56, %esi\n\t"
         "movl %ecx, label(%edx,%ebx,4)\n\t"
         "movb %ah, (%ebx)");
If in our code we touch (i.e., change the contents of) some registers and return from asm without fixing those changes, something bad is going to happen. This is because GCC has no idea about the changes in the register contents, and this leads us to trouble, especially when the compiler makes some optimizations. It will assume that some register contains the value of some variable that we might have changed without informing GCC, and it continues as if nothing happened. What we can do is either use instructions that have no side effects, fix things before we quit, or wait for something to crash. This is where we want some extended functionality. Extended asm provides us with that functionality.
In basic inline assembly, we had only instructions. In extended assembly, we can also specify the operands. It allows us to specify the input registers, output registers and a list of clobbered registers. It is not mandatory to specify the registers to use; we can leave that headache to GCC, and that probably fits into GCC’s optimization scheme better. Anyway, the basic format is:
asm ( assembler template
: output operands /* optional */
: input operands /* optional */
: list of clobbered registers /* optional */
);
The assembler template consists of assembly instructions. Each operand is described by an operand-constraint string followed by the C expression in parentheses. A colon separates the assembler template from the first output operand and another separates the last output operand from the first input, if any. Commas separate the operands within each group. Older GCC versions limited the total number of operands to ten or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater; current GCC documentation gives a limit of 30 operands (inputs plus outputs).
If there are no output operands but there are input operands, you must place two consecutive colons surrounding the place where the output operands would go.
Example:
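asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi"
     );
Now, what does this code do? The above inline stores fill_value count times to the location pointed to by the register edi. It also tells gcc that the contents of registers ecx and edi are no longer valid. (Older GCC versions accepted this clobber list; current GCC rejects a clobber that names a register already used by an operand, as discussed under the clobber list below, so with a modern compiler ecx and edi have to be described as read-write operands instead.) Let us see one more example to make things clearer.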
int a=10, b;
asm ("movl %1, %%eax\n\t"
     "movl %%eax, %0"
     :"=r"(b)        /* output */
     :"r"(a)         /* input */
     :"%eax"         /* clobbered register */
     );
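Here we made the value of ’b’ equal to that of ’a’ using assembly instructions. Some points of interest are:
"b" is the output operand, referred to by %0, and "a" is the input operand, referred to by %1.
"r" is a constraint on the operands; it tells GCC to use any register for storing the operand. The output operand’s constraint also carries the modifier "=", which marks it as a write-only output.
There are two %’s prefixed to the register name (%%eax). This lets GCC distinguish registers from operands, which have a single % as prefix.
The clobbered register %eax after the third colon tells GCC that the value of %eax is modified inside "asm", so GCC won’t use this register to store any other value.
When the execution of "asm" is complete, "b" will reflect the updated value, as it is specified as an output operand. In other words, the change made to "b" inside "asm" is supposed to be reflected outside the "asm".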
Now let us look at each field in detail.
The assembler template contains the set of assembly instructions that gets inserted inside the C program. The format is: either each instruction is enclosed within double quotes, or the entire group of instructions is within double quotes. Each instruction should also end with a delimiter. The valid delimiters are newline (\n) and semicolon (;). ’\n’ may be followed by a tab (\t). We know the reason for the newline/tab, right? Operands corresponding to the C expressions are represented by %0, %1 ... etc.
C expressions serve as operands for the assembly instructions inside "asm". Each operand is written as an operand constraint in double quotes, followed by the C expression that stands for the operand, in parentheses. For output operands, the quotes also contain a constraint modifier. That is,
"constraint" (C expression) is the general form, with an additional modifier for output operands. Constraints are primarily used to decide the addressing modes for operands. They are also used in specifying the registers to be used.
If we use more than one operand, they are separated by comma.
In the assembler template, each operand is referenced by numbers. Numbering is done as follows. If there are a total of n operands (both input and output inclusive), then the first output operand is numbered 0, continuing in increasing order, and the last input operand is numbered n-1. The maximum number of operands is as we saw in the previous section.
Output operand expressions must be lvalues. The input operands are not restricted like this. They may be expressions. The extended asm feature is most often used for machine instructions the compiler itself does not know as existing ;-). If the output expression cannot be directly addressed (for example, it is a bit-field), our constraint must allow a register. In that case, GCC will use the register as the output of the asm, and then store that register contents into the output.
As stated above, ordinary output operands must be write-only; GCC will assume that the values in these operands before the instruction are dead and need not be generated. Extended asm also supports input-output or read-write operands.
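GCC spells a read-write operand with the "+" constraint modifier. Here is a minimal sketch that assumes GCC on x86. add_in_place is a hypothetical name, and x is both read and written by a single addl:

```c
/* Hypothetical helper: "+r" marks x as read-write, so GCC loads its current
 * value into the chosen register before the asm and uses the result after. */
static inline int add_in_place(int x, int y)
{
    __asm__ ("addl %1, %0"   /* x = x + y */
             : "+r" (x)      /* %0: read-write operand */
             : "r" (y)       /* %1: input operand */
             : "cc");        /* addl changes the condition codes */
    return x;
}
```

A plain "=r" output would not work here. GCC would then be free to treat the old value of x as dead and hand the asm an uninitialized register.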
So now we concentrate on some examples. We want to multiply a number by 5. For that we use the instruction lea.
asm ("leal (%1,%1,4), %0"
: "=r" (five_times_x)
: "r" (x)
);
Here our input is in ’x’. We didn’t specify the register to be used. GCC will choose some register for input and one for output, and do what we desired. If we want the input and output to reside in the same register, we can instruct GCC to do so. Here we use those read-write operands, which we get by specifying the proper constraint:
asm ("leal (%0,%0,4), %0"
: "=r" (five_times_x)
: "0" (x)
);
Now the input and output operands are in the same register. But we don’t know which register. Now if we want to specify that also, there is a way.
asm ("leal (%%ecx,%%ecx,4), %%ecx"
: "=c" (x)
: "c" (x)
);
In all three examples above, we didn’t put any register in the clobber list. Why? In the first two examples, GCC decides the registers and it knows what changes happen. In the last one, we don’t have to put ecx on the clobber list; gcc knows it goes into x. Therefore, since it can know the value of ecx, it isn’t considered clobbered.
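The three variants can be compared by wrapping each one in a function. This is a sketch that assumes GCC on x86, and the function names are hypothetical. Each function computes x*5 as x + x*4:

```c
/* Variant 1 (hypothetical name): GCC picks separate input and output registers. */
static inline int five_any_regs(int x)
{
    int r;
    __asm__ ("leal (%1,%1,4), %0" : "=r" (r) : "r" (x));
    return r;
}

/* Variant 2 (hypothetical name): "0" puts the input in output 0's register. */
static inline int five_same_reg(int x)
{
    int r;
    __asm__ ("leal (%0,%0,4), %0" : "=r" (r) : "0" (x));
    return r;
}

/* Variant 3 (hypothetical name): "c" forces both input and output into ecx. */
static inline int five_in_ecx(int x)
{
    __asm__ ("leal (%%ecx,%%ecx,4), %%ecx" : "=c" (x) : "c" (x));
    return x;
}
```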
Some instructions clobber some hardware registers. We have to list those registers in the clobber list, i.e., the field after the third ’:’ in the asm function. This is to inform gcc that we will use and modify them ourselves, so gcc will not assume that the values it loads into these registers will be valid. We shouldn’t list the input and output registers in this list, because gcc knows that "asm" uses them (they are specified explicitly as constraints). If the instructions use any other registers, implicitly or explicitly (and the registers are not present in either the input or the output constraint list), then those registers have to be specified in the clobbered list.
If our instruction can alter the condition code register, we have to add "cc" to the list of clobbered registers.
If our instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm.
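The fill example shown earlier lists ecx and edi as clobbers, even though input operands already occupy those registers, and current GCC rejects that. Below is a sketch of the same fill for a current GCC on x86. It declares count and dest as read-write operands instead, and it uses the "memory" clobber and volatile as described above. fill_dwords is a hypothetical name:

```c
#include <stddef.h>

/* Hypothetical helper: store fill_value into count 32-bit words at dest.
 * rep stosl decrements ecx and advances edi, so both are read-write
 * operands ("+c", "+D") rather than clobbers. */
static inline void fill_dwords(void *dest, unsigned int fill_value, size_t count)
{
    __asm__ __volatile__ ("cld\n\t"          /* clear the direction flag: fill upward */
                          "rep stosl"        /* while (count--) *dest++ = eax */
                          : "+c" (count), "+D" (dest)
                          : "a" (fill_value)
                          : "memory");       /* writes memory not named as an operand */
}
```

volatile matters here because the updated count and dest are never read afterwards. Without volatile, GCC would be allowed to discard the statement as having no useful outputs.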
We can read and write the clobbered registers as many times as we like. Consider the example of multiple instructions in a template; it assumes the subroutine _foo accepts arguments in registers eax and ecx.
asm ("movl %0,%%eax\n\t"
     "movl %1,%%ecx\n\t"
     "call _foo"
     : /* no outputs */
     : "g" (from), "g" (to)
     : "eax", "ecx"
     );
If you are familiar with kernel sources or some beautiful code like that, you must have seen many functions declared as volatile or __volatile__, which follows an asm or __asm__. I mentioned earlier the keywords asm and __asm__. So what is this volatile?
If our assembly statement must execute where we put it (i.e., must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()’s. So to keep it from being moved, deleted and so on, we declare it as
asm volatile ( ... : ... : ... : ...);
__volatile__ means exactly the same as volatile; like __asm__, it is an alternate spelling that is often seen in kernel and library headers.
If our assembly is just for doing some calculations and doesn’t have any side effects, it’s better not to use the keyword volatile. Avoiding it helps gcc in optimizing the code and making it more beautiful.
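For instance, byte-swapping a value has no side effects, so the asm can stay non-volatile and GCC remains free to merge or drop repeated calls. This is a sketch that assumes GCC on x86 (i486 or later, where bswap exists). byte_swap32 is a hypothetical name:

```c
/* Hypothetical helper: reverse the byte order of x. Non-volatile because the
 * result depends only on the input and the asm has no other effects. */
static inline unsigned int byte_swap32(unsigned int x)
{
    __asm__ ("bswap %0" : "+r" (x));   /* swap the 4 bytes of x in place */
    return x;
}
```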
In the section Some Useful Recipes, I have provided many examples for inline asm functions. There we can see the clobber list in detail.
By this time, you might have understood that constraints have got a lot to do with inline assembly. But we’ve said little about constraints. Constraints can say whether an operand may be in a register, and which kinds of register; whether the operand can be a memory reference, and which kinds of address; whether the operand may be an immediate constant, and which possible values (ie range of values) it may have.... etc.
There are a number of constraints of which only a few are used frequently. We’ll have a look at those constraints.
Register operand constraint (r): When operands are specified using this constraint, they get stored in General Purpose Registers (GPRs). Take the following example:
asm ("movl %%eax, %0\n" :"=r"(myval));
Here the variable myval is kept in a register, the value in register eax is copied onto that register, and the value of myval is updated into the memory from this register. When the "r" constraint is specified, gcc may keep the variable in any of the available GPRs. To specify the register, you must directly specify the register names by using specific register constraints. They are:
+------------+--------------------+
| Constraint | Register(s)        |
+------------+--------------------+
| a          | %eax, %ax, %al     |
| b          | %ebx, %bx, %bl     |
| c          | %ecx, %cx, %cl     |
| d          | %edx, %dx, %dl     |
| S          | %esi, %si          |
| D          | %edi, %di          |
+------------+--------------------+
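Specific register constraints matter when an instruction works on fixed registers. mull, for example, multiplies eax by its operand and leaves the 64-bit product in edx:eax. This is a sketch that assumes GCC on x86. mul32x32 is a hypothetical name:

```c
/* Hypothetical helper: full 32x32 -> 64-bit unsigned multiply via mull.
 * "a" and "d" pin the operands to eax and edx, which mull uses implicitly. */
static inline void mul32x32(unsigned int a, unsigned int b,
                            unsigned int *hi, unsigned int *lo)
{
    unsigned int h, l;
    __asm__ ("mull %3"
             : "=a" (l), "=d" (h)   /* %0: low half in eax, %1: high half in edx */
             : "0" (a), "rm" (b)    /* %2: a starts in eax, %3: b in a register or memory */
             : "cc");
    *hi = h;
    *lo = l;
}
```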
Memory operand constraint (m): When the operands are in memory, any operations performed on them will occur directly in the memory location, as opposed to register constraints, which first store the value in a register to be modified and then write it back to the memory location. But register constraints are usually used only when they are absolutely necessary for an instruction or they significantly speed up the process. Memory constraints can be used most efficiently in cases where a C variable needs to be updated inside "asm" and you really don’t want to use a register to hold its value. For example, the value of idtr is stored in the memory location loc:
asm("sidt %0\n" : "=m"(loc));
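An "=m" output like this lets the asm write a C variable in place, without a register holding the variable's value. This is a sketch that assumes GCC on x86. store_int is a hypothetical name:

```c
/* Hypothetical helper: store value directly into the int that p points to.
 * %0 expands to a memory reference such as (%rdi); no register holds *p. */
static inline void store_int(int *p, int value)
{
    __asm__ ("movl %1, %0"
             : "=m" (*p)       /* %0: write-only memory operand */
             : "ir" (value));  /* %1: immediate constant or register */
}
```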
Matching (digit) constraints: In some cases, a single variable may serve as both the input and the output operand. Such cases may be specified in "asm" by using matching constraints.
asm ("incl %0" :"=a"(var):"0"(var));
We saw similar examples in the operands subsection as well. In this example for matching constraints, the register %eax is used as both the input and the output variable. The input var is read into %eax, and after the increment the updated %eax is stored in var again. "0" here specifies the same constraint as the 0th output variable. That is, it specifies that the output instance of var should be stored in %eax only. This constraint is useful when input is read from a variable, or the variable is modified, and the modification is written back to the same variable, so that separate instances of the input and output operands are not necessary.
The most important effect of using matching constraints is that they lead to the efficient use of available registers.
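The incl example can be wrapped in a function to check that the matching constraint behaves as described. This is a sketch that assumes GCC on x86. increment is a hypothetical name:

```c
/* Hypothetical helper: "=a" puts the output in eax and "0" loads the input
 * into that same register, so incl updates var in place. */
static inline int increment(int var)
{
    __asm__ ("incl %0" : "=a" (var) : "0" (var) : "cc");
    return var;
}
```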
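Some other constraints used are:
"m" : A memory operand is allowed, with any kind of address that the machine supports in general.
"o" : A memory operand is allowed, but only if the address is offsettable, i.e., adding a small offset to the address gives a valid address.
"V" : A memory operand that is not offsettable: anything that would fit the "m" constraint but not the "o" constraint.
"i" : An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time.
"n" : An immediate integer operand with a known numeric value is allowed.
"g" : Any register, memory or immediate integer operand is allowed, except for registers that are not general registers.
The following constraints are x86 specific.
"q" : A register accessible as a byte register: a, b, c or d in 32-bit mode (any integer register in 64-bit mode).
"I" : Constant in range 0 to 31 (for 32-bit shifts).
"J" : Constant in range 0 to 63 (for 64-bit shifts).
"M" : 0, 1, 2, or 3 (shifts for the lea instruction).
"N" : Constant in range 0 to 255 (for the out instruction).
"f" : Floating point register.
"t" : First (top of stack) floating point register.
"u" : Second floating point register.
"A" : The a and d registers; on 32-bit x86 this holds a 64-bit value, with d holding the most significant bits and a the least significant bits.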
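While using constraints, for more precise control over the effects of constraints, GCC provides us with constraint modifiers. The most commonly used constraint modifiers are:
"=" : Means that this operand is write-only for this instruction; the previous value is discarded and replaced by output data.
"&" : Means that this operand is an earlyclobber operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part of any memory address.
"+" : Means that this operand is both read and written by the instruction.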
The list and explanation of constraints is by no means complete. Examples can give a better understanding of the use and usage of inline asm. In the next section we’ll see some examples, there we’ll find more about clobber-lists and constraints.
Now that we have covered the basic theory of GCC inline assembly, we shall concentrate on some simple examples. It is always handy to write inline asm functions as macros. We can see many asm functions in the kernel code (/usr/src/linux/include/asm/*.h).
First we start with a simple example. We’ll write a program to add two numbers.
#include <stdio.h>

int main(void)
{
        int foo = 10, bar = 15;
        __asm__ __volatile__("addl  %%ebx,%%eax"
                             :"=a"(foo)
                             :"a"(foo), "b"(bar)
                             );
        printf("foo+bar=%d\n", foo);
        return 0;
}
Here we ask GCC to store foo in %eax, bar in %ebx, and we also want the result in %eax. The ’=’ sign shows that it is an output register. Now we can add an integer to a variable in some other way.
__asm__ __volatile__(
                "   lock       ;\n"
                "   addl %1,%0 ;\n"
                : "=m"  (my_var)
                : "ir"  (my_int), "m" (my_var)
                : /* no clobber-list */
                );
This is an atomic addition. We can remove the instruction ’lock’ to remove the atomicity. In the output field, "=m" says that my_var is an output and it is in memory. Similarly, "ir" says that my_int may be either an immediate integer constant or a value in a register (recall the constraints we saw above). No registers are in the clobber list.
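GCC also accepts a single "+m" (read-write memory) operand in place of the separate "=m" output and "m" input. Below is a sketch of the atomic addition in that style. It assumes GCC on x86 and adds "cc" because addl changes the flags. atomic_add is a hypothetical name:

```c
/* Hypothetical helper: atomically perform *p += v.
 * "+m" says the memory operand is both read and written. */
static inline void atomic_add(int *p, int v)
{
    __asm__ __volatile__ ("lock; addl %1, %0"
                          : "+m" (*p)            /* %0: read-write memory */
                          : "ir" (v)             /* %1: immediate or register */
                          : "cc", "memory");
}
```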
Now we’ll perform some action on some registers/variables and compare the value.
__asm__ __volatile__( "decl %0; sete %1"
: "=m" (my_var), "=q" (cond)
: "m" (my_var)
: "memory"
);
Here, the value of my_var is decremented by one and if the resulting value is 0 then the variable cond is set. We can add atomicity by adding an instruction "lock;\n\t" as the first instruction in the assembler template.
In a similar way we can use "incl %0" instead of "decl %0", so as to increment my_var.
Points to note here are that (i) my_var is a variable residing in memory. (ii) cond is in any of the registers eax, ebx, ecx and edx. The constraint "=q" guarantees it. (iii) And we can see that memory is there in the clobber list. ie, the code is changing the contents of memory.
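Wrapped in a function, the decrement-and-test recipe can return the flag directly. This is a sketch that assumes GCC on x86. It uses a read-write "+m" operand in the role of my_var, and an unsigned char in the role of cond because sete writes a single byte. dec_and_test is a hypothetical name:

```c
/* Hypothetical helper: decrement *p and return 1 if the result is zero. */
static inline int dec_and_test(int *p)
{
    unsigned char is_zero;
    __asm__ __volatile__ ("decl %0\n\t"        /* *p = *p - 1, sets ZF */
                          "sete %1"            /* is_zero = ZF */
                          : "+m" (*p), "=q" (is_zero)
                          :
                          : "memory", "cc");
    return is_zero;
}
```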
How do we set or clear a bit? That is the next recipe.
__asm__ __volatile__( "btsl %1,%0"
: "+m" (ADDR)   /* read-write: btsl reads the old value */
: "Ir" (pos)
: "cc"
);
Here, the bit at position ’pos’ of the variable at ADDR (a memory variable) is set to 1. We can use ’btrl’ instead of ’btsl’ to clear the bit. The constraint "Ir" of pos says that pos is either in a register or a constant whose value ranges from 0 to 31 (an x86-dependent constraint), i.e., we can set or clear any bit from the 0th to the 31st of the variable at ADDR. As the condition codes will be changed, we are adding "cc" to the clobber list.
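Here is a sketch of both operations as functions, assuming GCC on x86. A "+m" operand is used because bts and btr read the old value of the word before writing it back. set_bit32 and clear_bit32 are hypothetical names:

```c
/* Hypothetical helper: set bit pos (0..31) of *word. */
static inline void set_bit32(unsigned int *word, int pos)
{
    __asm__ ("btsl %1, %0" : "+m" (*word) : "Ir" (pos) : "cc");
}

/* Hypothetical helper: clear bit pos (0..31) of *word. */
static inline void clear_bit32(unsigned int *word, int pos)
{
    __asm__ ("btrl %1, %0" : "+m" (*word) : "Ir" (pos) : "cc");
}
```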
Now let’s look at a more complicated but useful function: string copy.
static inline char * strcpy(char * dest,const char *src)
{
long d0, d1, d2;   /* dummy outputs; long matches the pointer width on 32- and 64-bit x86 */
__asm__ __volatile__(  "1:\tlodsb\n\t"
                       "stosb\n\t"
                       "testb %%al,%%al\n\t"
                       "jne 1b"
                     : "=&S" (d0), "=&D" (d1), "=&a" (d2)
                     : "0" (src),"1" (dest)
                     : "memory");
return dest;
}
The source address is stored in esi, the destination in edi, and then the copy starts; when we reach the terminating 0 byte, copying is complete. Constraints "&S", "&D", "&a" say that the registers esi, edi and eax are early clobber registers, i.e., their contents will change before the completion of the function. Here it’s also clear why memory is in the clobber list.
We can see a similar function which moves a block of double words. Notice that the function is declared as a macro.
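#define mov_blk(src, dest, numwords)                            \
do {                                                            \
        const void *mb_src = (src);                             \
        void *mb_dest = (dest);                                 \
        size_t mb_n = (numwords);                               \
        __asm__ __volatile__ (                                  \
                "cld\n\t"                                       \
                "rep\n\t"                                       \
                "movsl"                                         \
                : "+S" (mb_src), "+D" (mb_dest), "+c" (mb_n)    \
                :                                               \
                : "memory");                                    \
} while (0)
The block movement changes the contents of the registers ecx, esi and edi as a side effect. Older GCC versions let such a macro pass its arguments as plain inputs ("S" (src), "D" (dest), "c" (numwords)) and list "%ecx", "%esi", "%edi" in the clobber list. Current GCC rejects a clobber that names a register already used by an operand, so here the macro copies its arguments into temporaries (mb_src, mb_dest and mb_n are illustrative names) and declares them as read-write operands. The "memory" clobber tells GCC that the asm writes memory it cannot see in the operands.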
In Linux, system calls are implemented using GCC inline assembly. Let us look at how a system call is implemented. All the system calls are written as macros (linux/unistd.h). For example, a system call with three arguments is defined as a macro as shown below.
#define _syscall3(type,name,type1,arg1,type2,arg2,type3,arg3) \
type name(type1 arg1,type2 arg2,type3 arg3) \
{ \
long __res; \
__asm__ volatile (  "int $0x80" \
                  : "=a" (__res) \
                  : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
                    "d" ((long)(arg3))); \
__syscall_return(type,__res); \
}
Whenever a system call with three arguments is made, the macro shown above is used to make the call. The syscall number is placed in eax, then each parameter in ebx, ecx, edx. And finally "int $0x80" is the instruction which makes the system call work. The return value can be collected from eax.
Every system call is implemented in a similar way. Exit is a single-parameter syscall; let’s see what its code looks like. It is as shown below.
{
        asm("movl $1,%eax\n\t"          /* SYS_exit is 1 */
            "xorl %ebx,%ebx\n\t"        /* Argument is in ebx, it is 0 */
            "int  $0x80"                /* Enter kernel mode */
            );
}
The system call number of exit is "1" and here its parameter is 0. So we arrange eax to contain 1 and ebx to contain 0, and by int $0x80, exit(0) is executed. This is how exit works.
This document has gone through the basics of GCC Inline Assembly. Once you have understood the basic concept it is not difficult to take steps by your own. We saw some examples which are helpful in understanding the frequently used features of GCC Inline Assembly.
GCC inline assembly is a vast subject and this article is by no means complete. More details about the syntax we discussed are available in the official documentation for the GNU Assembler. Similarly, for a complete list of the constraints, refer to the official documentation of GCC.
And of course, the Linux kernel uses GCC inline assembly on a large scale. So we can find many examples of various kinds in the kernel sources. They can help us a lot.
If you have found any glaring typos, or outdated info in this document, please let us know.
Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times.
Hopefully if you already know Intel syntax, the examples will be helpful to you.
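Register names are prefixed with %:
AT&T: %eax
Intel: eax
The source is on the left and the destination on the right. To load ebx with the value in eax:
AT&T: movl %eax, %ebx
Intel: mov ebx, eax
Immediate values are prefixed with $. To load eax with the address of the static C variable booga:
AT&T: movl $_booga, %eax
Intel: mov eax, _booga
Now let's load ebx with 0xd00d:
AT&T: movl $0xd00d, %ebx
Intel: mov ebx, d00dh
The operand size is given by a b, w or l suffix on the instruction:
AT&T: movw %ax, %bx
Intel: mov bx, ax
The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...
The general form of a memory reference is:
AT&T: immed32(basepointer,indexpointer,indexscale)
Intel: [basepointer + indexpointer*indexscale + immed32]
You could think of the formula to calculate the address as:
immed32 + basepointer + indexpointer * indexscale
You don't have to use all those fields, but you do have to have at least 1 of immed32, basepointer and you MUST add the size suffix to the operator!
Addressing a static C variable:
AT&T: _booga
Intel: [_booga]
Note: the underscore ("_") is how you get at static (global) C variables from assembler. This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down. (The leading underscore is the DJGPP convention; on ELF targets such as Linux, C symbol names have no underscore prefix.)
Addressing what a register points to:
AT&T: (%eax)
Intel: [eax]
Addressing a variable offset by a value in a register:
AT&T: _variable(%eax)
Intel: [eax + _variable]
Addressing an element of an array of integers (scaling by 4):
AT&T: _array(,%eax,4)
Intel: [eax*4 + _array]
Offsets with an immediate value:
C code: *(p+1) where p is a char *
AT&T: 1(%eax) where eax has the value of p
Intel: [eax + 1]
Simple arithmetic on the immediate value:
AT&T: _struct_pointer+8
I assume you can do that with Intel format as well.
Base, index and scale together:
AT&T: _array(%ebx,%eax,8)
Intel: [ebx + eax*8 + _array]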
asm ("statements");
Pretty simple, no? So
asm ("nop");
will do nothing of course, and
asm ("cli");
will stop interrupts, with
asm ("sti");
of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.
When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back.
asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");
(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)
But if you do touch the registers, and don't fix things at the end of your asm statement, like so:
asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");
then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register, and might plan on using later. For that, you need extended inline assembly.
Here is the basic format:
asm ( "statements" : output_registers : input_registers : clobbered_registers);
Let's just jump straight to a nifty example, which I'll then explain:
asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );
The above stores the value in fill_value count times to the pointer dest.
Let's look at this bit by bit.
asm ("cld\n\t"
We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.
"rep\n\t"
Notice that GAS requires the rep prefix to occupy a line of its own (current versions of GAS also accept rep stosl on one line). Notice also that stos has the l suffix to make it move longwords.
"stosl"
: /* no output registers */
Well, there aren't any in this function.
: "c" (count), "a" (fill_value), "D" (dest)
Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax thru the loop, and save a movl once per loop.
: "%ecx", "%edi" );
And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist. (Current GCC versions reject this particular clobber list, because ecx and edi are already used by input operands; there, declare count and dest as read-write operands, "+c" and "+D", instead.)
Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.
Here's the list of register loading codes that you'll be likely to use:
a       eax
b       ebx
c       ecx
d       edx
S       esi
D       edi
I       constant value (0 to 31)
q,r     dynamically allocated register (see below)
g       eax, ebx, ecx, edx or variable in memory
A       eax and edx combined into a 64-bit integer (use long longs)
Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.
The codes have to be in quotes, and the expressions to load in have to be in parentheses.
When you do the clobber list, you specify the registers as above with the %. If you write to a variable (other than through an output operand), you must include "memory" as one of The Clobbered. This is in case you wrote to a variable that GCC thought it had in a register. In effect, GCC can then no longer assume that any memory value it has cached in a register is still valid. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)
Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values? It's possible to let GCC pick the register(s). You do this:
asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );
The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say, eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.
And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r", that it would be possible to use esi or edi in that instruction. If not, use "q".
Now, you might wonder how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.
You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're messing with it.
Now for output registers.
asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );
Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above.
This also works, by the way:
asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );
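asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );
Two things here: ebx doesn't have to go on the clobber list, because GCC knows it goes into x; and in extended asm, registers are written with %% instead of just %, so GCC can tell them apart from the %0, %1 ... operand references.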
To keep GCC from deleting, moving or combining an asm statement, write volatile after asm:
__asm__ __volatile__ (...whatever...);
However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.
#define disable() __asm__ __volatile__ ("cli");
#define enable() __asm__ __volatile__ ("sti");
Of course, libc has these defined too.
#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );
#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );
#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );
These multiply arg1 by 3, 5, or 9 and put them in arg2. You should be ok to do:
times5(x,x);
as well.
#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "movsl" \
  : : "S" (src), "D" (dest), "c" (numwords) \
  : "%ecx", "%esi", "%edi" )
Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.
#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "stosl" \
  : : "a" (value), "D" (dest), "c" (numwords) \
  : "%ecx", "%edi" )
Same as above but for memset(), which doesn't get inlined no matter what (for now.)
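#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" \
        : "=A" (llptr) ); })
Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr. (The two .byte values are the rdtsc opcode, for assemblers that don't know the mnemonic. "=A" means the edx:eax pair only on 32-bit x86, and current GCC rejects an "eax", "edx" clobber list alongside an "=A" output, so the clobbers are left off here.)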
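On 64-bit x86, "=A" no longer means the edx:eax pair. A version that works on both 32- and 64-bit targets therefore reads the two halves separately. This is a sketch that assumes GCC and a CPU with a time-stamp counter. read_tsc is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: rdtsc leaves the low 32 bits in eax and the high
 * 32 bits in edx; volatile keeps GCC from merging two successive reads. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
```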
If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a decent number of general registers.) It might seem weird to you at first, but it's really more logical than Intel format, and has no ambiguities.
If I still haven't answered a question of yours, look in the Info pages for more information, particularly on the input/output registers. You can do some funky stuff like use "A" to allocate two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that aren't really used as much as "q" and "r".
Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck, mail me to say "boogabooga."
It's the least you can do.