注:GCC4.6 已经支持 -split-stack 选项
Why Fiber/Coroutine? 可参考我的相关文章:异步通讯中使用纤程(Fiber/UserSpaceThread)
Ian Lance Taylor
The goal of split stacks is to permit a discontiguous stack which is grown automatically as needed. This means that you can run multiple threads, each starting with a small stack, and have the stack grow and shrink as required by the program. It is then no longer necessary to think about stack requirements when writing a multi-threaded program. The memory usage of a typical multi-threaded program can decrease significantly, as each thread does not require a worst-case stack size. It becomes possible to run millions of threads (either full NPTL threads or co-routines) in a 32-bit address space.
This is currently implemented for 32-bit and 64-bit x86 targets running GNU/Linux in gcc 4.6.0 and later. For full functionality you must be using the gold linker, which you can get by building binutils 2.21 or later with --enable-gold.
The stack will have a guaranteed zone which is always available. The size of the guard area will be target specific. It will include enough stack space to actually allocate more stack space. Each function will have to verify that it has enough space in the current stack to execute.
The basic verification will be a comparison between the stack pointer and the current bottom of the stack plus the guaranteed zone size. This will have to be the first operation in the function, and will also be target specific. It must be fast, as it will be executed by each called function.
There are two cases to consider. For functions which require a stack frame less than the size of the guaranteed guard area, we can do a simple comparison between the stack pointer and the stack limit. For functions which require a larger stack frame, we must do a comparison including the size of the stack frame.
Reserve a register to hold the bottom of the stack plus the guaranteed size. This will have to be a callee-saved register. Each function with a small stack frame then starts with a two instruction sequence:
cmp %sp,%reg bg 1f expand stack 1:
Use a TLS variable. In the general case, in a shared library, this will require calling the__tls_get_addrfunction. That means that that function will have to work without requiring any additional stack space. This is infeasible unless the whole system is compiled with split stacks. It would requireLD_BIND_NOWto be set, so that the__tls_get_addrfunction is resolved at program startup time. Even that is probably insufficient unless we can ensure that the space for the variable is fully allocated. In general I don't think we can ensure this, becausedlopencan cause a thread to require more space for TLS variables, and that space will be allocated on the first call to__tls_get_addr. So I don't think this approach can work in general.
Have the stack always end at a N-bit boundary. E.g., if we always allocate stack segments as a multiple of 4K, then align each one so that the stack always ends at a 12-bit boundary. Then the amount of space remaining on the stack isSP&0xfff. We then have a four instruction sequence:
mov %sp,%junkreg and $0xfff,%junkreg cmp $framesize plus slop,%junkreg bg 1f expand stack 1:
Introduce a new function call which handles the comparison of the stack pointer and the stack expansion. This function call would be part of the threading library and can use special knowledge--e.g, the ability to quickly locate information about the stack of the current frame. This function call would have to use a special calling sequence, with a single argument which is the size of the stack frame. Then each function would start with a call to this function. On some targets this function call could perhaps be combined with a function call required by PIC code--e.g.,__i686.get_pc_thunk.bx.
Expanding the stack requires allocating additional memory. This additional memory will have to be allocated using only the stack space slop. All of the functions used to allocate additional stack space must be compiled to not use a split stack. A new function attribute,no_split_stackwill be introduced to mean that the stack should not be split. It would also work to ensure that the stack is large enough that they do not need to split the stack during the allocation call.
After expanding the stack, the function will copy any stack based parameters from the old stack to the new stack. Fortunately, all C++ objects which require a copy or move constructor are implicitly passed by reference, so copying the parameters on the stack is OK. For varargs functions, this is impossible in general, so we will compile varargs functions differently: they will use an argument pointer which is not necessarily based on the frame pointer. For functions which return objects on the stack, the objects will be returned on the old stack. This should normally happen automatically, as the initial hidden parameter will naturally point to the old stack.
When expanding the stack, the return address of the function will be munged to point to a function which will release the allocated stack block and reset the stack pointer to the caller. The address of the old stack block, and the old stack pointer, will have been saved somewhere in the new stack block.
We want to be able to use split stack programs on systems with prebuilt libraries compiled without split stacks. This means that we need to ensure that there is sufficient stack space before calling any such function.
Each object file compiled in split stack mode will be annotated to indicate that the functions use split stacks. This should probably be annotated with a note but there is no general support for creating arbitrary notes in GNU as. Therefore, each object file compiled in split stack mode will have an empty section with a special name:.note.GNU-split-stack. If an object file compiled in split stack mode includes some functions with theno_split_stackattribute, then the object file will also have a.note.GNU-no-split-stacksection. This will tell the linker that some functions may not have the expected split stack prologue.
When the linker links an executable or shared library, it will look for calls from split-stack code to non-split-stack code. This will include calls to non-split-stack shared libraries (thus, a program linked against a split-stack shared library may fail if at runtime the dynamic linker finds a non-split-stack shared library; it might be desirable to use a new segment type to detect this situation).
For calls from split-stack code to non-split-stack code, the linker will change the initial instructions in the split-stack (caller) function. This means that the linker will have to have special knowledge of the instructions that the compiler emits. The effect of the changes will be to increase the required framesize by a number large enough to reasonably work for a non-split-stack. This will be a target dependent number; the default will be something like 64K. Note that this large stack will be released when the split-stack function returns. Note that I'm disregarding the case of split-stack code in a shared library calling non-split-stack code in the main executable; that seems like an unlikely problem.
Function pointers are a tricky case. In general we don't know whether a function pointer points to split-stack code. Therefore, all calls through a function pointer will be modified to call (or jump to) a special function__fnptr_morestack. This will use a target specific function calling sequence, and will be implemented as though it were itself a function call instruction. That is, all the parameters will be set up, and then the code will jump to__fnptr_morestack. The__fnptr_morestackfunction takes two parameters: the function pointer to call, and the number of bytes of arguments pushed on the stack. (This is not yet implemented.)
The debugger will have to learn about split stack mode. There are two major issues: unwinding the stack, and finding the arguments to a varargs function.
Fortunately DWARF is complex enough to represent the unusual return sequence used by a split stack function. Therefore, the only major issue is that gdb expects all stack address to monotonically decrease. Some marker will be needed to tell gdb to disable this sanity check.
For a varargs function, the varargs will be accessed via an argument pointer rather than on the stack. gdb may need to be adjusted to understand this.
Add a new target functionboolsupports_split_stack()along withTARGET_SUPPORTS_SPLIT_STACK.
Add-fsplit-stackoption to gcc. This will only be permitted iftargetm.supports_split_stack()returns true.
Addno_split_stackfunction attribute. This will be ignored if-fno-split-stack.
If-fsplit-stackis used, emit a split-stack note into the object file.
Write the__generic_morestackroutine. It must allocate the required amount of memory without itself splitting the stack. This is likely to be OS dependent.
allocaand dynamic arrays must change to check for available stack space. If there is not enough available, they must call a heap allocation routine. The space must be released on function exit.
Define the calling convention for the__morestackroutine.
Implement this sequence in the prologue generation code when-fsplit-stackis used and the no_split_stack attribute is not used.
Define the calling convention for the__fnptr_morestackroutine. Change the call define_expands to use the new calling convention for calls to a REG.
Write the__morestackroutine. This will have to gets its argument in a target dependent fashion. It will have to save registers and switch to an alternate stack. Then it can call__generic_morestack. It must change the return value of the calling function to point to__releasestack. The function entry sequence must permit this to be done reliably.
Write the__releasestackroutine. It must retrieve the original stack pointer and original return address from the allocated stack, release the allocated stack, and adjust the stack pointer to point to the old stack.
Write the__morestack_nosplitroutine. This is called when split-stack code calls no-split-stack code. It is similar to__morestack, but uses a target dependent large stack size.
Write the__fnptr_morestackroutine. In some cases it may be straightforward to examine the code at the function pointer to see if it checks for a split stack. If it does, the code can jump straight to the called function. Otherwise, it must allocate stack as with__morestack_nosplit, and then copy the argument bytes over to the new stack.
Ifld-ris used to link an object with the split-stack note with an object without the split-stack note, issue an error.
If code in a split-stack object calls code in a non-split-stack object, change the initial starting sequence of the code to always call__morestack_nosplit.
In a link which includes split-stack objects, if a function with a non-split-stack object is referenced in a way other than calling it (e.g., taking its address), then instead take the address of a linker-generated stub function. The stub function will call__morestack_nosplit, copy parameters to the new stack if necessary, and then jump to the real function. Copying parameters will have to make an assumption about the number of parameters, so if too many parameters are passed this will fail. I'm not sure how best to handle this unlikely case. We should only allocate a new stack if called from a split-stack function.
None: SplitStacks (2011-02-07 01:47:39由IanLanceTaylor编辑)