Overview
Nanojit is a small, cross-platform C++ library that emits machine code. Both the Tamarin JIT and the SpiderMonkey JIT (a.k.a. TraceMonkey) use Nanojit as their back end.
You can get Nanojit by cloning the tamarin-redux Mercurial repository at http://hg.mozilla.org/tamarin-redux. It's in the nanojit directory.
The input for Nanojit is a stream of Nanojit LIR instructions. The term LIR is compiler jargon for a language used internally in a compiler that is usually cross-platform but very close to machine language. It is an acronym for "low-level intermediate representation". A compiler's LIR is typically one of several partly-compiled representations of a program that a compiler produces on the way from raw source code to machine code.
An application using Nanojit creates a nanojit::LirBuffer object to hold LIR instructions. It creates a nanojit::LirBufWriter object to write instructions to the buffer. Then it wraps the LirBufWriter in zero or more other LirWriter objects, all of which implement the same interface as LirBufWriter. This chain of LirWriter objects forms a pipeline for the instructions to pass through. Each LirWriter can perform an optimization or other task on the program as it passes through the system and into the LirBuffer.
Once the instructions are in the LirBuffer, the application calls nanojit::compile() to produce machine code, which is stored in a nanojit::Fragment. Internally to Nanojit, another set of filters operates on the LIR as it passes from the LirBuffer toward the assembler. The result of compilation is a function that the application can call from C via a pointer to the first instruction.
Example
The following code works with SpiderMonkey's hacked version of Nanojit. Figuring out how to compile it is left as an exercise for the reader; the following works when run in the object directory of an --enable-debug SpiderMonkey shell:
g++ -DDEBUG -g3 -Wno-invalid-offsetof -fno-rtti -include js-confdefs.h -I dist/include/ -I.. -I ../nanojit -o jittest ../jittest.cpp libjs_static.a
(Remove the -DDEBUG if you have not compiled SpiderMonkey with --enable-debug, and use whatever you called the sample source file in place of jittest.cpp.)
#include <stdio.h>
#include <stdint.h>
#include <string.h>  // for memset
#include "jsapi.h"
#include "jstracer.h"
#include "nanojit.h"

using namespace nanojit;

const uint32_t CACHE_SIZE_LOG2 = 20;

static avmplus::GC gc = avmplus::GC();
static avmplus::AvmCore core = avmplus::AvmCore();

int main()
{
    LogControl lc;
#ifdef DEBUG
    lc.lcbits = LC_ReadLIR | LC_Assembly;
#else
    lc.lcbits = 0;
#endif

    // Set up the basic Nanojit objects.
    Allocator *alloc = new VMAllocator();
    CodeAlloc *codeAlloc = new CodeAlloc();
    Assembler *assm = new (&gc) Assembler(*codeAlloc, *alloc, &core, &lc);
    Fragmento *fragmento =
        new (&gc) Fragmento(&core, &lc, CACHE_SIZE_LOG2, codeAlloc);
    LirBuffer *buf = new (*alloc) LirBuffer(*alloc);

#ifdef DEBUG
    fragmento->labels = new (*alloc) LabelMap(*alloc, &lc);
    buf->names = new (*alloc) LirNameMap(*alloc, fragmento->labels);
#endif

    // Create a Fragment to hold some native code.
    Fragment *f = fragmento->getAnchor((void *)0xdeadbeef);
    f->lirbuf = buf;
    f->root = f;

    // Create a LIR writer
    LirBufWriter out(buf);

    // Write a few LIR instructions to the buffer: add the first parameter
    // to the constant 2.
    out.ins0(LIR_start);
    LIns *two = out.insImm(2);
    LIns *firstParam = out.insParam(0, 0);
    LIns *result = out.ins2(LIR_add, firstParam, two);
    out.ins1(LIR_ret, result);

    // Emit a LIR_loop instruction. It won't be reached, but there's
    // an assertion in Nanojit that trips if a fragment doesn't end with
    // a guard (a bug in Nanojit).
    LIns *rec_ins = out.insSkip(sizeof(GuardRecord) + sizeof(SideExit));
    GuardRecord *guard = (GuardRecord *) rec_ins->payload();
    memset(guard, 0, sizeof(*guard));
    SideExit *exit = (SideExit *)(guard + 1);
    guard->exit = exit;
    guard->exit->target = f;
    f->lastIns = out.insGuard(LIR_loop, out.insImm(1), rec_ins);

    // Compile the fragment.
    compile(assm, f, *alloc verbose_only(, fragmento->labels));
    if (assm->error() != None) {
        fprintf(stderr, "error compiling fragment\n");
        return 1;
    }
    printf("Compilation successful.\n");

    // Call the compiled function.
    typedef JS_FASTCALL int32_t (*AddTwoFn)(int32_t);
    AddTwoFn fn = reinterpret_cast<AddTwoFn>(f->code());
    printf("2 + 5 = %d\n", fn(5));
    return 0;
}
Code Explanation
The interesting part is the block that writes the LIR instructions:
// Write a few LIR instructions to the buffer: add the first parameter
// to the constant 2.
out.ins0(LIR_start);
LIns *two = out.insImm(2);
LIns *firstParam = out.insParam(0, 0);
LIns *result = out.ins2(LIR_add, firstParam, two);
out.ins1(LIR_ret, result);
The code above feeds raw LIR into the LirBuffer through the LirBufWriter object out. Operationally, it builds a function that takes an integer argument, adds two to it, and returns the result. The function is compiled and called here:
// Compile the fragment.
compile(assm, f, *alloc verbose_only(, fragmento->labels));
if (assm->error() != None) {
fprintf(stderr, "error compiling fragment\n");
return 1;
}
printf("Compilation successful.\n");
// Call the compiled function.
typedef JS_FASTCALL int32_t (*AddTwoFn)(int32_t);
AddTwoFn fn = reinterpret_cast<AddTwoFn>(f->code());
printf("2 + 5 = %d\n", fn(5));
return 0;
The upper half of this snippet is where the raw LIR is converted into machine code, namely the call to compile().
Then a pointer to the compiled function is obtained. Its type, declared by typedef JS_FASTCALL int32_t (*AddTwoFn)(int32_t);, takes an int as input and returns the sum of that parameter and two.
Finally, printf is hardcoded to call the function with the argument 5, so once the program is linked with the Nanojit library, the last line of its output is:
2 + 5 = 7
The next goal is to generate output for the following LIR program:
start
two = int 2
twoPlusTwo = add two, two
ret twoPlusTwo
This adds two and two in the most hardcoded way possible. Converting textual LIR like this into a program like the one shown above is the task of the parser.
Guards
Guards are special LIR instructions, similar to conditional branches, with the difference that when a guard is taken, execution does not jump to a particular address in the compiled code; it leaves the JIT code entirely and stops the trace.
Need
Guards are required in a cross-platform dynamic language like JavaScript, because JIT code is generated under certain assumptions.
For example, for an instruction INR x, a guard would check that x doesn't overflow the range of a 32-bit integer. The JIT code would contain a guard checking this condition (an xt guard) and would return to the interpreter if the condition turns out to be true. The interpreter is then equipped to handle the overflow.
Hence, guards are needed to prevent erroneous behaviour that could otherwise result from the assumptions made when the JIT code is generated.
TODO: Explain guards, guard records, VMSideExit, Fragmento, VerboseWriter::formatGuard...