32-bit JVM memory model on AIX
To understand the memory model of the 32-bit JVM for AIX, you first need to understand the address space of 32-bit processes on AIX. The various address space models on AIX are discussed at length in the Redbook Developing and porting C and C++ applications on AIX (form number SG245674) and in the Large Program Support chapter of the AIX product documentation General Programming Concepts: Writing and Debugging Programs. Although we describe the address space models that are pertinent to the Java memory model here, we highly recommend that you read both of these documents, especially if you use JNI in your Java applications.
The default address space of a 32-bit process
AIX is an operating system that uses virtual memory to address more memory than is physically available in the system. The translation of effective (virtual, logical) addresses to physical (real) addresses for instruction and data storage access is done using both software-constructed tables and PowerPC hardware. On the PowerPC CPU planar, there are 16 segment registers used to form effective addresses and provide memory protection at the segment level, so the process memory model on AIX is segmented. When a process is active, the registers contain the addresses of the 16 segments addressable by that process, which form its address space.
32-bit applications on AIX have an address space of 4 GB (2**32=4G), with virtual addresses ranging from 0x00000000 through 0xFFFFFFFF. AIX divides this address space into 16 independent segments; each is a 256 MB chunk and is addressed by a separate segment register. The high-order nibble of any virtual address identifies its segment. For example, every address of the form 0x2nnnnnnn lies in segment 0x2.
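The nibble rule can be checked with a few lines of Python. This is only an illustrative sketch: the helper names segment_of and segment_bounds are made up for this example and are not AIX interfaces.

```python
SEGMENT_SHIFT = 28                  # 256 MB = 2**28 bytes per segment
SEGMENT_SIZE = 1 << SEGMENT_SHIFT


def segment_of(address):
    """Return the segment number (0x0-0xF) that holds a 32-bit virtual address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    # The high-order nibble is what remains after dropping the low 28 bits.
    return address >> SEGMENT_SHIFT


def segment_bounds(segment):
    """Return the first and last virtual address covered by a segment."""
    if not 0x0 <= segment <= 0xF:
        raise ValueError("a 32-bit process has segments 0x0 through 0xF")
    start = segment << SEGMENT_SHIFT
    return start, start + SEGMENT_SIZE - 1


if __name__ == "__main__":
    for seg in range(16):
        first, last = segment_bounds(seg)
        print(f"segment 0x{seg:X}: 0x{first:08X} - 0x{last:08X}")
```

Sixteen segments of 2**28 bytes each account for the full 2**32-byte address space.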
Not all of this memory is available to the application. Some segments are reserved by the operating system for the AIX kernel, shared libraries, and executable code. Other segments are reserved for heap memory, which is accessed by calling malloc(), calloc(), realloc(), free(), and similar functions. The remaining segments are available for use by shmat() and mmap() for shared memory and file mappings. Figure 1 shows how each segment is used in the default 32-bit process memory model. The term text refers to program code, as commonly used in the AIX documentation.
To briefly summarize the usage of each segment:
Shared memory segments are attached with the shmat() or mmap() routines. Unless the user specifies an address in the shmat() or mmap() call, these segments are allocated in order, starting from segment 0x3 and moving toward higher-numbered segments. In general, hardcoding the attach address is bad programming practice and hinders the portability of the application. The maximum possible shared memory in this model is 11 segments, which is 2.75 GB.
The 0x2 segment, the program data segment, contains most of the per-process information, including the heap managed by malloc(), calloc(), realloc(), free(), and similar functions.
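The segment usage of the default model can be written down as a small lookup table to confirm the 2.75 GB figure. This is a sketch under assumptions: the roles of segments 0x0 and 0x1 (kernel and program text) are filled in from the standard AIX documentation rather than from the text above, and the name DEFAULT_MODEL is invented for the example.

```python
SEGMENT_MB = 256

# Default 32-bit process model, one entry per segment.
# Assumption: segments 0x0 and 0x1 follow the usual AIX layout.
DEFAULT_MODEL = {
    0x0: "kernel",
    0x1: "program text",
    0x2: "program data (user data, heap, stack)",
    0xD: "shared library text",
    0xE: "shared memory (shmat/mmap)",
    0xF: "shared library data",
}
for seg in range(0x3, 0xD):
    DEFAULT_MODEL[seg] = "shared memory (shmat/mmap)"

# Segments that shmat()/mmap() may use in the default model.
shared_segments = sorted(
    seg for seg, use in DEFAULT_MODEL.items() if use.startswith("shared memory")
)
shared_gb = len(shared_segments) * SEGMENT_MB / 1024

if __name__ == "__main__":
    for seg in sorted(DEFAULT_MODEL):
        print(f"0x{seg:X}: {DEFAULT_MODEL[seg]}")
    print(f"{len(shared_segments)} shared memory segments = {shared_gb} GB")
```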
The large address-space model for a 32-bit process
Because the system places user data, heap, and stack within the program data segment (segment 0x2), the system limits the maximum amount of stack, heap, and static data to slightly less than 256 MB. This size is adequate for most applications. However, applications with one or more of the following characteristics do not work well with the default memory model:
To address the demands of these large programs, AIX supports another address space model called the large address space. This model allows processes to allocate fewer segments for shared memory, and more for program data. The number of segments to be moved from shared memory to program data is specified in a value called "maxdata." There are three ways to specify the maxdata value:
At link time, use the ld command with the -bmaxdata:0xN0000000 option, where N is a number between 1 and 8 indicating the number of segments to be reserved for program data. This puts the maxdata value into the header of the XCOFF object file. Running dump -ov on the object file shows the maxdata value of the executable.
For an executable file that has already been built, /usr/ccs/bin/ldedit is provided to modify the XCOFF header. For example, to modify a.out to allow 6 segments for user heap:
$ /usr/ccs/bin/ldedit -bmaxdata:0x60000000 a.out
At run time, set the LDR_CNTRL environment variable when starting the program:
$ LDR_CNTRL=MAXDATA=0x60000000 a.out
Since an environment variable is used in this approach, depending on how you set it, all processes started in this shell and their child processes may be affected.
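One way to keep the LDR_CNTRL setting from leaking into unrelated processes is to set it only in the environment handed to the one program that needs it. The sketch below assumes the program is launched from a Python script; the helper name env_with_maxdata is hypothetical, and a.out stands for whatever executable you are starting.

```python
import os


def env_with_maxdata(maxdata, base_env=None):
    """Return a copy of the environment with LDR_CNTRL=MAXDATA=<value> added.

    The caller's own environment is left untouched, so only a child that is
    started with the returned mapping sees the setting.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LDR_CNTRL"] = f"MAXDATA=0x{maxdata:08X}"
    return env


if __name__ == "__main__":
    child_env = env_with_maxdata(0x60000000)
    print(child_env["LDR_CNTRL"])
    # On AIX the program would then be started with, for example:
    #   import subprocess
    #   subprocess.run(["./a.out"], env=child_env)
```

The shell form shown above (VAR=value command) has the same effect for a single command; exporting the variable instead affects everything started later in that shell.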
Figure 2 shows the segment usage of a 32-bit process with a MAXDATA value of 0x30000000. In this large address space model, user data and heap move to the three extra segments allocated for program data, that is, segments 3, 4, and 5. The per-process kernel data and user stack remain in segment 2. AIX reserves the three 256 MB segments for user data and heap, starting with segment 3. The initialized and uninitialized data go into the lower addresses of segment 3, and the process heap continues right after the static data ends, growing toward higher-numbered segments. The number of segments reserved for the process's shared memory is now reduced to 8 segments, instead of 11 as in the default 32-bit address space model. Again, unless the user specifies an address in the shmat() or mmap() call, these segments are allocated in order, starting from segment 0x6 and moving toward higher-numbered segments.
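The trade between program data and shared memory in the large address-space model is simple arithmetic. The following sketch assumes a maxdata value of the form 0xN0000000 with N between 1 and 8, as described above; the function name large_model_layout is invented for the example.

```python
def large_model_layout(maxdata):
    """Split segments between program data and shared memory (large model).

    Returns (data_segments, shared_segments) as lists of segment numbers.
    Assumes maxdata has the form 0xN0000000 with 1 <= N <= 8.
    """
    n = maxdata >> 28
    if maxdata & 0x0FFFFFFF or not 1 <= n <= 8:
        raise ValueError("expected maxdata of the form 0xN0000000, N in 1..8")
    # Program data is reserved upward from segment 0x3.
    data_segments = list(range(0x3, 0x3 + n))
    # Shared memory gets what is left of 0x3-0xC, plus segment 0xE.
    shared_segments = list(range(0x3 + n, 0xD)) + [0xE]
    return data_segments, shared_segments


if __name__ == "__main__":
    data, shared = large_model_layout(0x30000000)
    print("program data:", [hex(s) for s in data])
    print("shared memory:", [hex(s) for s in shared])
```

For a maxdata of 0x30000000 this gives segments 3, 4, and 5 to program data and leaves 8 segments, starting at 0x6, for shared memory, matching the description of Figure 2.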
The very large address-space model for a 32-bit process
The large address-space model we just discussed lets users allocate more segments to program data by sacrificing the number of segments for shared memory. The users have to decide how many segments should be allocated for program data before running the program. Once the program starts executing, there is no way for programmers to make adjustments.
AIX 5.1 introduces a very large address-space model that supports a more flexible mechanism for 32-bit programs to make maximum use of the available segments, as either program data or shared memory. The very large address-space model allows segments to be dynamically allocated between program data and shared memory. The maxdata value only serves as a cap for the program heap, but does not make the system reserve the segments for program heap only.
In AIX 5.2, the very large address-space model was further enhanced to allow 32-bit applications to grow their data heap up to 13 segments (3.25 GB).
The following is a summary of the features of the very large address-space model:
Until a segment is needed for the program data area, it remains available to the shmat or mmap subroutines. The maxdata value only serves as an upper bound on the number of segments that the program heap can grow into. When a process tries to expand its data area into a new segment, the operation succeeds as long as the segment is not being used by shmat or mmap. A program can call the shmdt or munmap subroutine to stop using a segment so that the segment can be used for the data area. After a segment has been used for the data area, however, it can no longer be used for any other purpose, even if the size of the data area is reduced.
Ordinarily, the shmat or mmap subroutines return an address in the lowest available segment. When the very large address-space model is used, these subroutines return an address in the highest available segment. A request for a specific address succeeds as long as the address is not in a segment that has already been used for the data area. This behavior is followed for all processes that specify the DSA property.
The DSA property can be set on an existing executable with the /usr/ccs/bin/ldedit command (only available in AIX 5.1 and later). For example,
$ /usr/ccs/bin/ldedit -bmaxdata:0x80000000/dsa a.out
It can also be set through the LDR_CNTRL environment variable:
$ LDR_CNTRL=MAXDATA=0xC0000000@DSA a.out
Other than including the 0xD and 0xF segments in the dynamically allocated region, all the other allocation mechanisms are the same as with a maxdata value less than 0xB0000000. The heap starts after the initialized and uninitialized data and grows toward the higher-numbered segments. The shared memory segments start from segment 0xF and grow toward lower-numbered segments without skipping segment 0xD.
If a program is using this model (maxdata=0 and the DSA option), all 13 segments (3.25 GB) are reserved for shared memory, which starts from segment 0xF and grows toward segment 0x3. Segment 0x2 is used for the program stack, the program heap, and all shared objects used by this process.
This model is useful for applications such as some database managers that manage their own memory requests on shared memory segments. However, it is the application's responsibility to ensure that segment 0x2 does not get corrupted due to stack overrun.
The following figures help summarize the very large address space. In Figure 3, the process is running with maxdata=0xA0000000 and the DSA option specified; the heap of the process can be as big as 10 segments (2.5 GB). Notice that the system does not reserve 10 segments (0x3 - 0xC) for the program heap. Segment 0x3 is always taken up by program data because of the presence of initialized and uninitialized data. The rest of those 10 segments (0x4 - 0xC) are open for dynamic allocation between program heap and shared memory, so the program heap can potentially take 10 segments (0x3 - 0xC). Because maxdata is set to 0xA0000000, which puts an upper bound for the program heap at segment 0xC, the system reserves segment 0xE exclusively for shared memory, which can also potentially take 10 segments (0x4 - 0xC, 0xE) and grow to 2.5 GB.
This model of very large address space applies to all program execution with maxdata less than 0xB0000000. As mentioned, AIX 5.1 only supports this model when maxdata is less than or equal to 0x80000000.
In Figure 4, the process is running with maxdata=0xB0000000 and the DSA option specified; the heap of the process can be as big as 11 segments (2.75 GB). Notice that the shared library text and data no longer occupy segments 0xD and 0xF, and are moved to segment 0x2. Again, the lower address end of segment 0x3 is first taken up by the initialized and uninitialized data, so segment 0x3 always belongs to program data. The other 10 segments (0x4 - 0xD) are not reserved for either the program heap or shared memory, but are open for competition between the two. However, segments 0xE and 0xF are exclusively reserved for shared memory. In this example, where the maxdata value is 0xB0000000/dsa, the program heap can potentially take 11 segments (0x3 - 0xD), and the shared memory can potentially take up to 12 segments (0x4 - 0xF) and grow to 3.0 GB. This model allows users to specify a maxdata value as big as 0xD0000000, in which case the program heap can potentially take 13 segments (3.25 GB).
AIX 5.1 does not support this model of very large address space, either.
In Figure 5, the process is running with maxdata=0 and the DSA option specified. In this case, not only the shared library and text, but also the program data and heap are moved to segment 2. All 13 segments (0x3 - 0xF) are exclusively reserved for the process to attach shared memory, and become a contiguous 3.25 GB shared memory space.
This model of very large address space is not supported on AIX 5.1, either.
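The three variants described for Figures 3 through 5 can be condensed into one function that reports which segments the heap and shared memory could each potentially occupy. This is a sketch that follows the figure descriptions only: the name very_large_layout is invented, maxdata is assumed to have the form 0xN0000000, and the AIX 5.1 restrictions mentioned above are not modeled.

```python
def very_large_layout(maxdata):
    """Potential heap and shared-memory segments with the DSA option set.

    Returns (heap_segments, shared_segments). The two lists overlap because
    the segments are handed out dynamically on a first-come basis.
    """
    n = maxdata >> 28
    if maxdata & 0x0FFFFFFF or not 0 <= n <= 0xD:
        raise ValueError("expected maxdata of the form 0xN0000000, N in 0..0xD")
    if n == 0:
        # Heap stays in segment 0x2; 0x3-0xF all go to shared memory.
        return [], list(range(0x3, 0x10))
    heap = list(range(0x3, 0x3 + n))
    if n < 0xB:
        # Segment 0x3 always holds program data; 0xE is shared memory only.
        shared = list(range(0x4, 0xD)) + [0xE]
    else:
        # Shared library text and data have moved out of 0xD and 0xF.
        shared = list(range(0x4, 0x10))
    return heap, shared


if __name__ == "__main__":
    for value in (0xA0000000, 0xB0000000, 0xD0000000, 0x00000000):
        heap, shared = very_large_layout(value)
        print(f"maxdata=0x{value:08X}: heap up to {len(heap) * 0.25} GB, "
              f"shared memory up to {len(shared) * 0.25} GB")
```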
The -Xms and -Xmx options (on JDK 1.1.8, -ms and -mx)
Now that we know the various memory models on AIX, we can talk about how they relate to Java on AIX. As a Java application developer, you know how to use the -Xms and -Xmx command line options (on JDK 1.1.8, -ms and -mx) to request the starting and maximum sizes for the Java heap. However, the -Xms and -Xmx settings can only work within the range allowed by the JVM's address space model. To get the Java heap size you want, you need to specify it using the -Xms and -Xmx options, and set the maxdata value if needed.
To understand how large you should make your Java heap, you need to know the difference between the Java heap and program heap we've been referring to in the discussion of AIX 32-bit process address space model. To clearly distinguish program heap from Java heap, we'll use native heap for the program heap.
When you launch the JVM using the java command or other JVM launcher commands (called the java command in the rest of this article), the command runs as an AIX process. This process is most likely a program written in some native language, such as C or C++. Just like other AIX processes, it uses a program stack and static data, and possibly shared memory to do inter-process communication (IPC) or map files. In addition, it has lots of other things that are dynamically allocated out of the native heap for the JVM to use for internal processing, including buffers used by the JIT, data structures used by the garbage collector, and any dynamic allocations made by non-Java middleware invoked by the JVM. If your Java application uses JNI to invoke native code, the dynamic memory allocations done in the native code also come out of the native heap. Thread stacks for all but the main thread are dynamically allocated from the native heap as new threads are created. If the native heap runs out during program execution, the JVM may issue error messages telling you that no more threads can be created.
The Java heap, on the other hand, is an entity entirely managed by the JVM. As the JVM interprets your Java program, any pure Java objects created in your program are allocated by the JVM in the Java heap. You can think of the Java heap as a big buffer pool managed by the JVM. If the Java heap is exhausted during execution, the JVM throws java.lang.OutOfMemoryError. Because the JVM runs inside an AIX process, the Java heap it manages has to come out of either its native heap or shared memory segments. In fact, the approach has changed across the various AIX Java releases, evolving as new scalability features became available on AIX.
How AIX Java releases implement Java heap
Table 1 below shows the properties related to Java heap implementation for various AIX Java releases. The memory model of Java 1.4.1 is discussed separately because it supports the very large address-space memory model, which is much more flexible than the large address space model.
Table 1
JDK release | 1.1.8 | 1.2.2 | 1.3.1 | 1.4.0 |
Default maxdata value | 0x00000000 | 0x50000000 | 0x80000000 | 0x80000000 |
Java heap implementation | mmap( ) | mmap( ) | -Xmx < 1 GB: malloc( ); otherwise: mmap( ) (note 1) | -Xmx < 1 GB: malloc( ); otherwise: mmap( ) (note 1) |
-Xms default | 1 MB | 1 MB | 1 MB | 4 MB |
-Xmx default | 32 MB | 64 MB | 64 MB | 64 MB |
-Xmx maximum (note 2) | 2 GB - 1 | 1280 MB | 1 GB | 1 GB |
Max. Java heap possible | 8 segments (2 GB - 1) | 8 segments (2 GB - 1) | 10 segments (2.5 GB) | 10 segments (2.5 GB) |
Max. native heap possible (note 3) | < 2.5 GB - Java heap | < 2.5 GB - Java heap | maxdata = 0: << 256 MB; maxdata > 0: < 2.5 GB - Java heap | maxdata = 0: << 256 MB; maxdata > 0: < 2.5 GB - Java heap |
Note 1: Starting with JDK 1.3.1, the Java heap is allocated with malloc( ) when the requested Java heap is smaller than 1 GB; otherwise it is allocated with mmap( ). However, you can force Java to mmap its heap, regardless of heap size, by exporting the environment variable IBM_JAVA_MMAP_JAVA_HEAP=true.
Note 2: Without patching the maxdata value in the Java command, the maximum value you can specify with -Xms and -Xmx is determined by the combination of the maxdata value and how the Java heap is implemented. The rationale for the values in this row and the next is:
JDK 1.1.8: The Java heap is allocated with mmap( ). Because maxdata is 0x00000000, all the segments between 0x3 and 0xC are available for mmap( ). However, due to a bug in the code that translates the command-line parameters to signed 32-bit integers, you have to stay at or below 0x7FFFFFFF bytes. The maximum value you can specify on the command line is 2147483647 (or 2 GB - 1). The value in the next row is the same, without the need to modify the maxdata value. For more details, please see the article "How to increase memory in AIX for Java applications."
JDK 1.2.2: The Java heap is allocated with mmap( ). The maxdata value is 0x50000000, and the JVM requires the Java heap to be a contiguous address space, so by default there are only five consecutive segments (0x8 - 0xC) for the JVM to allocate with mmap( ). If you launch the out-of-the-box JVM, you can specify a Java heap size of only up to 1280 MB. To get the maximum possible value shown in the next row, you need to change the maxdata value to 0x20000000 or lower.
JDK 1.3.1 and 1.4.0: The default maxdata value of 0x80000000 leaves little room for mmap( ). Ironically, the JVM switches to using mmap( ) to allocate the Java heap whenever the heap size requested on the command line is larger than 1 GB. Any Java invocation requesting more than 1 GB of Java heap is not going to work without changing the maxdata value, which is why the value shown in this row is 1 GB. The maximum possible value in the next row is 2.5 GB, which is larger than in previous JDK releases. To get a 2.5 GB Java heap, you need to do two things: specify -Xmx2560m, and set maxdata to 0x00000000 to free up all 10 consecutive segments (0x3 - 0xC) for mmap( ).
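The limits in note 2 follow from counting the contiguous segments left for mmap( ) once maxdata has reserved its share. The sketch below reproduces that arithmetic under the assumptions stated in the note (10 usable segments, 0x3 - 0xC, with maxdata reserving from 0x3 upward); the function name and the signed_32bit_parser flag are invented for the example.

```python
SEGMENT_BYTES = 256 * 1024 * 1024
MMAP_SEGMENTS = 10            # segments 0x3 through 0xC
INT32_MAX = 2**31 - 1         # limit of the signed 32-bit option parser


def max_mmap_heap_bytes(maxdata, signed_32bit_parser=False):
    """Largest contiguous Java heap that mmap( ) could provide.

    maxdata reserves N segments from 0x3 upward for program data, so the
    remaining (10 - N) segments below 0xD form the contiguous mmap region.
    """
    reserved = maxdata >> 28
    free_segments = max(MMAP_SEGMENTS - reserved, 0)
    limit = free_segments * SEGMENT_BYTES
    if signed_32bit_parser:
        # The command-line value itself cannot exceed 0x7FFFFFFF.
        limit = min(limit, INT32_MAX)
    return limit


if __name__ == "__main__":
    mb = 1024 * 1024
    print("1.2.2 default:", max_mmap_heap_bytes(0x50000000) // mb, "MB")
    print("1.1.8 default:", max_mmap_heap_bytes(0x00000000, True), "bytes")
    print("maxdata=0:", max_mmap_heap_bytes(0x00000000) // mb, "MB")
```

With the 1.3.1 and 1.4.0 default of 0x80000000 the same arithmetic leaves two contiguous segments (512 MB), which is why a request above 1 GB cannot be satisfied by mmap( ) without lowering maxdata.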
Note 3: While you're trying to get the biggest Java heap possible, you need to take into consideration how big a native heap the JVM and your application need. As explained previously, much of the memory the JVM needs to operate comes from the native heap, including JIT buffers, thread stacks, and other JNI dynamic memory allocations. The rationale for the values in this row is:
With mmap( ), the 10 segments (0x3 - 0xC), or 2.5 GB, are divided between the Java heap and the native heap, which is why the maximum native heap possible is "< 2.5 GB - Java heap". The "<" sign accounts for the space taken up by initialized and uninitialized data, which also go into the program data segment.
How JDK 1.4.1 implements Java heap on AIX
As mentioned, JDK 1.4.1 supports the very large address-space model, which is much more flexible. When the very large address-space model is enabled by setting the maxdata value to 0xN0000000/dsa, the available segments are dynamically distributed between program data and shared memory, with 0xN0000000 serving only as the upper bound for program data. Below is a quick summary of how big the Java heap and native heap are for a Java process running with the three types of very large address-space models:
With JDK 1.4.1, users do not have to be bothered with the setting of the LDR_CNTRL=MAXDATA environment variable any more. The 1.4.1 JVM now sets an appropriate maxdata value based on the maximum Java heap size requested by users using the -Xmx option specified in java commands. If LDR_CNTRL=MAXDATA is set before you start the JVM, the JVM uses the specified value; otherwise, the JVM uses the following algorithm to set LDR_CNTRL=MAXDATA:
If the heap size is greater than 1 GB, LDR_CNTRL=MAXDATA is set to an appropriate value. Note that this is an inverse relationship: as the heap size increases, fewer segments are reserved through the LDR_CNTRL=MAXDATA value. For example, for a 1 GB heap LDR_CNTRL=MAXDATA is set to 0x60000000, while for a 1.5 GB heap the LDR_CNTRL=MAXDATA value is 0x40000000.
If the heap size is smaller than 1 GB, LDR_CNTRL=MAXDATA is set to 0x80000000.
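The two data points given (1 GB gives 0x60000000, 1.5 GB gives 0x40000000) are consistent with a rule of "10 segments minus the segments needed for the Java heap". The sketch below implements that rule purely as an illustration. It is a guess that fits the two examples, not the JVM's documented algorithm; the function name is hypothetical, and the treatment of a heap of exactly 1 GB and of heaps above 2.5 GB is an assumption.

```python
SEGMENT_BYTES = 256 * 1024 * 1024
GB = 1024 * 1024 * 1024


def guess_default_maxdata(xmx_bytes):
    """Guess the maxdata value a 1.4.1 JVM would pick for a given -Xmx.

    Assumption: maxdata = (10 - segments needed for the Java heap) segments,
    which matches the 1 GB and 1.5 GB examples but is otherwise unverified.
    """
    if xmx_bytes < GB:
        return 0x80000000
    # Round the heap size up to whole 256 MB segments (integer ceiling).
    heap_segments = -(-xmx_bytes // SEGMENT_BYTES)
    # Assumption: never go below zero reserved segments.
    return max(10 - heap_segments, 0) << 28


if __name__ == "__main__":
    for size in (512 * 1024 * 1024, GB, GB + GB // 2, 2 * GB):
        print(f"-Xmx {size // (1024 * 1024)} MB -> "
              f"maxdata 0x{guess_default_maxdata(size):08X}")
```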
64-bit JVM memory model on AIX
Fortunately, things are so much simpler in the 64-bit world. There will be no need to juggle with the limited number of segments any more. If even the very large address-space model used by the 32-bit JVM cannot accommodate your Java application's appetite for Java heap or native heap, you might want to consider switching to the 64-bit version of Java releases. In this section, we'll provide a brief summary of the 64-bit user process address space on AIX to show how simple the memory model is.
The address space of a 64-bit process on AIX
The 64-bit user process model shares the same concept of segments with the 32-bit user process model. The segment size is still 256 MB, but the number of available segments in the address space is now 2**32, instead of 2**4. Therefore, the 64-bit user process can address up to 1 EB (exabyte), which can be calculated as follows:
2**32 segments x 256 MB/segment = 2**32 x 2**28 bytes = 2**60 bytes = 1 EB
If you're wondering what the unit EB means, Table 2 below shows the definitions of prefixes commonly used in the IT industry.
Table 2
Prefix | Symbol(s) | Power of 10 | Power of 2 | Number of bytes |
kilo- | k or K | 10**3 | 2**10 | 1,024 |
mega- | M | 10**6 | 2**20 | 1,048,576 |
giga- | G | 10**9 | 2**30 | 1,073,741,824 |
tera- | T | 10**12 | 2**40 | 1,099,511,627,776 |
peta- | P | 10**15 | 2**50 | 1,125,899,906,842,624 |
exa- | E | 10**18 | 2**60 | 1,152,921,504,606,846,976 |
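Both the 1 EB calculation and the byte counts in Table 2 are easy to verify with exact integer arithmetic. The following is a small check written for this purpose; the variable names are arbitrary.

```python
# Binary exponents for the prefixes listed in Table 2.
PREFIX_EXPONENTS = {
    "kilo": 10,
    "mega": 20,
    "giga": 30,
    "tera": 40,
    "peta": 50,
    "exa": 60,
}

SEGMENT_BYTES = 256 * 2**20       # 256 MB per segment
SEGMENT_COUNT = 2**32             # segments in the 64-bit user address space

address_space_bytes = SEGMENT_COUNT * SEGMENT_BYTES

if __name__ == "__main__":
    for prefix, exponent in PREFIX_EXPONENTS.items():
        print(f"{prefix:5} 2**{exponent} = {2**exponent:,} bytes")
    exabytes = address_space_bytes // 2**PREFIX_EXPONENTS["exa"]
    print(f"address space = {address_space_bytes:,} bytes = {exabytes} EB")
```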
To address this tremendously huge space, the pointer type is defined as 64-bit in the 64-bit user process model. Figure 6 below shows how segments are used in this huge address space.
The first 16 segments (0 - 4GB) are exempt from general use in order to keep the compatibility with the 32-bit user process model.
Segments 16 to 7*16**7 (4 GB - 448 PB) are used for user text, data, and heap. The user text is mapped into the first segment in this area, and user data is mapped into another segment in this area. In both cases, if one segment is not sufficient to contain the text or data, another segment is contiguously attached to the process address space.
The next 16**7 segments (448 PB - 512 PB) are for a 64-bit process to call the shmat() or mmap() routines to attach shared memory to the process.
The next 16**7 segments (512 PB - 576 PB) are for objects that are loaded into the address space privately, such as by using the dlopen( ) or load( ) system calls, or by using shared object files that do not allow others to read and execute. Sometimes this is done by third-party middleware without your awareness. Again, please see the Redbook Developing and porting C and C++ applications on AIX, SG245674, for assistance.
The next 16**7 segments (576 PB - 640 PB) are used to load 64-bit shared library text and data that are to be shared by all 64-bit user processes on the system. Shared library data is also created in another segment in this area for this process's private use. In both cases, if there is a segment that has enough free space to contain the shared text or shared library data, that segment is used. Otherwise, another segment is attached to the process address space.
Segments 10*16**7 to 15*16**7 (640 PB - 960 PB) are reserved by the system and are off limits to user processes.
The last 16**7 segments (960 PB - 1 EB) are used by a 64-bit user process for the user stack. The stack grows from the last address, 0x0FFF_FFFF_FFFF_FFFF, toward the first address in this area, so the user process stack can use more than one segment.
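The region boundaries quoted above all follow from the fact that 16**7 segments of 256 MB each span 64 PB. The sketch below recomputes them; the region labels paraphrase the descriptions above, and the names REGIONS and region_bounds are invented for the example.

```python
SEGMENT_BYTES = 2**28          # 256 MB
UNIT = 16**7                   # segments per 64 PB block
PB = 2**50

# (description, first segment, one past the last segment)
REGIONS = [
    ("reserved for 32-bit compatibility", 0, 16),
    ("user text, data, and heap", 16, 7 * UNIT),
    ("shared memory (shmat/mmap)", 7 * UNIT, 8 * UNIT),
    ("privately loaded objects", 8 * UNIT, 9 * UNIT),
    ("shared library text and data", 9 * UNIT, 10 * UNIT),
    ("reserved by the system", 10 * UNIT, 15 * UNIT),
    ("user stack", 15 * UNIT, 16 * UNIT),
]


def region_bounds(first_segment, end_segment):
    """Return the first and last byte address of a run of segments."""
    return first_segment * SEGMENT_BYTES, end_segment * SEGMENT_BYTES - 1


if __name__ == "__main__":
    for name, first, end in REGIONS:
        low, high = region_bounds(first, end)
        print(f"0x{low:016X} - 0x{high:016X}  {name}")
```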
The -Xms and -Xmx options for 64-bit Java
As you can imagine, with the 64-bit releases of Java you don't have to worry much about running out of memory. The Java heap size can be very big. However, to run 64-bit Java, all native code used by your Java program has to be enabled for 64-bit and compiled and linked as 64-bit programs. This includes third-party middleware as well as code invoked through JNI, whether directly by your program or indirectly by other middleware. One additional thing to worry about is the possibly long pause time while the garbage collector goes through the huge heap.
Conclusions
The detailed discussion of AIX 32-bit and 64-bit process address space in this article should equip you with enough problem determination skill to deal with most issues caused by inappropriate Java heap size settings.
Please stay tuned for part 3 in this series, which will include a discussion of the Java garbage collector and thread implementation. The information will help you get the performance and scalability you need on AIX.
About the author
Lee Cheng works as a senior consultant for RS/6000 and AIX software vendors. She provides support to them in the areas of application benchmarks, performance tuning, application porting, and internationalization. Before joining the RS/6000 ISV Technical Support group, she was a developer for compilers and the AIX system management component. She holds an M.S. degree in Computer Science from the University of Kentucky. Her publications include AIX Performance Tuning: CPU usage and AIX Performance Tuning: Focus on Memory. You can contact Lee at [email protected].