Doc ID: Note:260152.1
Subject: Summary About the Large SGA & Address Space on RH Linux
Type: BULLETIN
Status: PUBLISHED
Content Type: TEXT/X-HTML
Creation Date: 23-DEC-2003
Last Revision Date: 07-NOV-2005
PURPOSE
-------
This article clarifies the maximum SGA size under the different
kernels and the methods available to increase it.
SCOPE & APPLICATION
-------------------
This article is intended for Linux system administrators and the IA-32 architecture.
On the Itanium processor there are no such limits, because
they are bypassed by a 64-bit executable.
INTRODUCTION
------------
~ Very Large Memory (VLM) on systems with up to 64GB of RAM
is the ability to use up to 64GB of pagecache on a 32-bit system.
The Advanced Server 2.1 and 3.0 kernel allows Oracle to allocate and use more than
4GB of memory for the database buffer cache on a 32-bit Intel platform. This
feature is also called VLM (Very Large Memory) in Oracle documents. With
Oracle9iR2 on Red Hat Linux Advanced Server, the SGA can be increased
to a theoretical limit of about 62GB depending on the available RAM memory
on the system. Current hardware limitations and practical considerations further
limit the actual size of SGA that can be configured, but it is still several times
larger than the size of the SGA without VLM. Running in VLM mode requires
some setup changes to Linux and imposes some limitations on what features and
init.ora parameters can be used.
~ PAE (Page Address Extensions)
(see http://en.wikipedia.org/wiki/Physical_Address_Extension)
In order to get above 4GB virtual memory on IA-32 architecture a technique
known as PAE (Page Address Extensions) is used.
It is a method that translates 32-bit (2**32 = 4GB) linear addresses
to 36-bit (2**36 = 64GB) physical addresses. In the linux kernel, the support
is provided through a compile-time option that produces two
separate kernels: the SMP kernel, which supports only up to 4GB of
virtual memory, and the enterprise kernel, which can address up to 64GB
(also called Very Large Memory 'VLM' capable).
The enterprise kernel is able to use up to 64GB of pagecache
without any modifications. This means applications like Oracle
can make use of the large memory and scale up
to a large number of users without loss of performance or reliability.
(shmfs/tmpfs,ramfs is implemented using PAE)
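The address-range arithmetic above (2**32 = 4GB linear, 2**36 = 64GB physical) can be checked with shell arithmetic; this is a quick sanity sketch only, not Oracle or kernel tooling:

```shell
# 32-bit linear address space: 2^32 bytes = 4GB
echo $(( 1 << 32 ))                          # 4294967296
# 36-bit physical address space with PAE: 2^36 bytes = 64GB
echo $(( 1 << 36 ))                          # 68719476736
# the same value expressed in GB
echo $(( (1 << 36) / 1024 / 1024 / 1024 ))   # 64
```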
- Red Hat Linux Advanced Server 2.1
~ Large SGA size capability through changing of mapped_base (Ref. [1], [2])
The default address from which the Oracle executable is loaded is 0x50000000, and the
default mapped base is 0x40000000 (decimal 1073741824). The space between
0x40000000 and 0x50000000 is reserved for loading the Oracle libraries. If the
mapped_base is lowered to 0x10000000 (decimal 268435456), then the space
between 0x10000000 and 0x15000000 is used for loading the Oracle libraries,
and the Oracle executable is loaded starting from 0x15000000, allowing a
bigger SGA and a larger window size. The lowered values of mapped base
given above are examples; set them to values appropriate for your system.
To use this method it is also necessary to relocate the SGA attach address.
Default mapped base:
   +------------------+
   |                  |
   |                  |
   +------------------+ 0x50000000  <- oracle executable loaded here
   | oracle libraries |
   +------------------+ 0x40000000  Original Base (decimal 1073741824)
   |                  |
   +------------------+

Lowered mapped base:
   +------------------+
   |                  |
   +------------------+ 0x15000000  <- oracle executable loaded here
   | oracle libraries |
   +------------------+ 0x10000000  Lowered Base (decimal 268435456)
   |                  |
   +------------------+
Note:
Lowering mapped_base is a method available only
on RedHat Advanced Server (RHAS) 2.1;
on Red Hat Enterprise Linux 3.0 this is done automatically.
~ Shared memory file-system(shmfs) support (Ref. [1], [2])
Memory-based file system optimized for shared memory operations
and for larger SGA size.
- Red Hat Enterprise Linux 3.0 - 4.0
~ hugemem kernel (Ref. [3], [4], [5])
Red Hat Enterprise Linux 3.0/4.0 includes a new kernel known as the
hugemem kernel. This kernel supports a 4GB per process user space
(versus 3GB for the other kernels), and a 4GB direct kernel space.
Using this kernel allows Red Hat Enterprise Linux to run on systems
with up to 64GB of main memory. The hugemem kernel is required in
order to use all the memory in system configurations containing more
than 16GB of memory. The hugemem kernel can also benefit
configurations running with less memory (if running an application
that could benefit from the larger per process user space, for example.)
The hugemem kernel feature is also called the 4GB-4GB split kernel.
In the classic scheme, the 32-bit 4GB virtual address space is split into 3GB
for user processes and 1GB for the kernel.
The new scheme (4GB/4GB) permits 4GB of virtual address space
for the kernel and almost 4GB for each user process.
3GB:1GB split
0GB           1GB           2GB           3GB           4GB
|-------------|-------------|-------------|-------------|
|<---------- per-user process ---------->|<-- kernel -->|

4GB:4GB split
0GB           1GB           2GB           3GB           4GB
|-------------|-------------|-------------|-------------|
|<------- per-user process (*) ----------------->|<-t ->|   (t = kernel 'trampoline')
|-------------|-------------|-------------|-------------|
|(*)<--------------------- kernel -------------------->|
~ VLM Option
Using RHEL3/4 you have two options to use VLM:
* Use shmfs/tmpfs much as you would in RHAS2.1:
mount a shmfs with a certain size to /dev/shm, and
set the correct permissions.
Keep in mind that in RHEL3, memory allocated from shmfs is pageable.
It is better to use tmpfs, since there is no need to specify a size.
Mount shmfs:
# mount -t shm shmfs -o size=20g /dev/shm
Edit /etc/fstab:
shmfs  /dev/shm  shm  size=20g  0 0
---- OR ----
Mount tmpfs:
# mount -t tmpfs tmpfs /dev/shm
Edit /etc/fstab:
none  /dev/shm  tmpfs  defaults  0 0
* Use ramfs (Ref. [4], [5]):
ramfs is similar to shmfs, except that pages are not pageable/swappable.
This approach provides the commonly desired effect.
Ramfs is created by
# mount -t ramfs ramfs /dev/shm (unmount /dev/shm first).
The only difference here is that the ramfs pages are not backed by big pages.
* Once shmfs/tmpfs or ramfs is available,
the Oracle server must be told whether to use it:
set the parameter 'use_indirect_data_buffers=true'.
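In init.ora terms, enabling VLM might look like the following fragment (the block size and buffer count are illustrative assumptions, not recommendations):

```
use_indirect_data_buffers=true   # buffer cache backed by the in-memory filesystem
db_block_size=8192               # example block size
db_block_buffers=1310720         # 1310720 * 8KB = 10GB buffer cache (example)
```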
SAMPLE CASES STUDY AND REFERENCE
--------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ RedHat Advanced Server (RHAS) 2.1
~
~ Default Configuration w/o Enterprise Kernel with <4Gb of RAM
~
~ >>>SGA MAX Size 1.7 GB<<<
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Configuration : 3+1 (3GB user process + 1GB kernel)
On normal kernel we have the following default memory layout for every process:
1.0 GB Text/code
1.7 GB available process memory for mmap, malloc /
address space (e.g. usable by shared Pool)
0.3 GB Stack
1.0 GB Kernel
==> 4 - 1 - 1 - 0.3 = 1.7GB left for SGA
A picture of Memory Layout:
---------------------------
4GB +------------------+ 0xFFFFFFFF
| |
| Kernel stuff |
| | 0xE0000000 SSKGMTOP
| | (from sskgm.h may17 label)
| |
3GB +------------------+ 0xC0000000 __PAGE_OFFSET
| | Stack grows | (include/asm-i386/page.h)
| v down... |
2.98GB |------------------| 0xBF000000
| |
| |
| Oracle SGA |
| max 1776MB |
| max 1.75GB |
| |
| |
| |
| |
| |
| |
| |
| |
1.25GB +------------------| 0x50000000 GENKSMS_SGA_ADDR
| Shared libraries |
| lib*.so |
| |
| |
| |
| |
1GB +------------------+ 0x40000000 TASK_UNMAPPED_BASE
| application code |
| (Oracle .text) | ^
| | |
| | 0x20000000 SSKGMBOTTOM |
| | (from sskgm.h may17 label) |
| | |
| | |
128MB |------------------| 0x08000000 |
| | |
| | |
0 +------------------+ 0x00000000 |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ RedHat Advanced Server (RHAS) 2.1
~
~ Configuration : 3+1 & lower mapped Base Address
~ ->> 3Gb user process + 1gb Kernel + lower mapped Base Address
~
~ >>>SGA MAX Size 2.7 GB<<<
~
~ details on [1]Note 211424.1 & [2]Note 200266.1
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If we lower the mapped Base Address from 0x40000000
(decimal 1073741824) to 0x10000000 (decimal 268435456),
we reduce the space for text/code and gain another 0.75GB (768MB) for the SGA:
0.3 GB Text/code
2.7 GB available process memory for mmap, malloc /
address space (e.g. usable by shared Pool)
0.3 GB Stack
1.0 GB Kernel
How to lower mapped_base, in short; for details see [1]Note 211424.1:
- Modify shmmax:
% echo 3000000000 > /proc/sys/kernel/shmmax
- Relocating the SGA (low SGA attach address):
% cd $ORACLE_HOME/rdbms/lib
% cp ksms.s ksms.s_orig
% genksms -s 0x15000000 > ksms.s
% make -f ins_rdbms.mk ksms.o
% make -f ins_rdbms.mk ioracle
- lower the mapped base for a single bash terminal session
1. Open a terminal session (Oracle session).
2. Open a second terminal session and su to root (root session).
3. Find out the process id for the Oracle session.
For example: do "echo $$" in the Oracle session.
4. Now lower the mapped base for the Oracle session to 0x10000000.
From the root session: echo 268435456 > /proc/<pid>/mapped_base,
where <pid> is the process id determined in step 3.
5. From the Oracle terminal session, startup the Oracle instance.
The SGA now begins at a lower address, so more of the address
space can be used by Oracle.
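The hex constants used in the steps above convert to the decimal form that /proc expects; the /proc write itself requires root and a live process id, so it is shown only as a comment (a sketch, not a verified procedure):

```shell
# mapped_base values used in this note, hex -> decimal
printf '%d\n' 0x40000000   # 1073741824 (default mapped base)
printf '%d\n' 0x10000000   # 268435456  (lowered mapped base)
# as root, for the shell with pid $PID that will start Oracle:
#   echo 268435456 > /proc/$PID/mapped_base
```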
Now you can increase the init.ora values of db_cache_size or db_block_buffers
to increase the size of the database buffer cache.
In this case the max SGA size will be 2.7 GB
Linux memory layout after lowering lower mapped Base Address:
-------------------------------------------------------------
4GB +------------------+ 0xFFFFFFFF
| |
| Kernel stuff |
| |
| |
| |
3GB +------------------+ 0xC0000000 __PAGE_OFFSET
| | Stack grows | (include/asm-i386/page.h)
| v down... |
2.98GB |------------------| 0xBF000000 SSKGMTOP
| |
| |
| Oracle SGA |
| max 2720MB |
| max 2.65GB |
| |
| |
| |
| |
| |
| |
| |
| |
| |
336MB +------------------+ 0x15000000 SSKGMBOTTOM
| Shared libraries | (GENKSMS_SGA_ADDR)
| lib*.so |
256MB |------------------| 0x10000000 TASK_UNMAPPED_BASE
| application code |
| (Oracle .text)   |
128MB +------------------+ 0x08000000
| |
| |
0 |------------------| 0x00000000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ RedHat Enterprise Linux (RHEL) 3.0 - 4.0
~
~ Default Configuration : 4gb/4gb split (hugemem kernel)
~
~ >>>SGA MAX Size 2.7 GB<<<
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We split the kernel/proc memory in two equal pieces and
we have the following layout of virtual address space:
1.0 GB Text/code
2.7 GB available process memory for mmap, malloc /
address space (e.g. usable by shared Pool)
0.3 GB Stack
4.0 GB Kernel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ RedHat Enterprise Linux (RHEL) 3.0 - 4.0
~
~ Configuration : 4gb/4gb split (hugemem kernel) + low SGA attach address
~
~ >>>SGA MAX Size 3.42 GB<<<
~
~ details on [1]Note 211424.1
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using the default configuration : 4gb/4gb split
we can lower the SGA attach address which will result in 3.4Gb process memory
3.42 Gb available process memory for mmap, malloc /
address space (e.g. usable by shared Pool)
0.25 Gb Kernel 'trampoline'
0.33 Gb sga base
4.00 Gb Kernel
==> 4GB - 0.33GB - 0.25GB = 3.42GB
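The address-space accounting above can be reproduced directly; this is just the arithmetic, not a measurement:

```shell
# 4GB user space minus SGA base window (0.33GB) and kernel trampoline (0.25GB)
awk 'BEGIN { printf "%.2f\n", 4 - 0.33 - 0.25 }'   # 3.42
```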
How to lower the SGA attach address in short, for details see [1]Note 211424.1:
- Modify shmmax:
% echo 3000000000 > /proc/sys/kernel/shmmax
- Relocating the SGA:
% cd $ORACLE_HOME/rdbms/lib
% cp ksms.s ksms.s_orig
% genksms -s 0x15000000 > ksms.s
% make -f ins_rdbms.mk ksms.o
% make -f ins_rdbms.mk ioracle
- Note:
Lowering 'mapped_base' for a single bash terminal session is a method
available only on RedHat Advanced Server (RHAS) 2.1.
It is not needed on Red Hat Enterprise Linux 3.0/4.0, where it is done automatically.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ RedHat Advanced Server (RHAS) 2.1 (shmfs/tmpfs)
~ RedHat Enterprise Linux (RHEL) 3.0 (shmfs/tmpfs, ramfs)
~
~ Configuration : VLM mode + in-memory filesystem (shmfs/tmpfs, ramfs)
~
~ >>>SGA MAX Size 62GB<<<
~
~ details on [1]Note 211424.1 / [4]Note 262004.1
~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since shmfs/tmpfs, ramfs is a memory file system, its size can
be as high as the maximum allowable VM size which is 64GB.
SGA MAX Size 62GB theoretical (depending on block size)
Only the buffer cache part of the SGA can take advantage of the additional memory
For RHEL3/4 to use the VLM option to create a very large buffercache,
you have two options (details on [4]Note 262004.1):
* Use shmfs/tmpfs much as you would in RHAS2.1:
mount a shmfs with a certain size to /dev/shm, and
set the correct permissions.
Keep in mind that in RHEL3, memory allocated from shmfs is pageable.
It is better to use tmpfs, since there is no need to specify a size.
Mount shmfs:
# mount -t shm shmfs -o size=20g /dev/shm
Edit /etc/fstab:
shmfs  /dev/shm  shm  size=20g  0 0
---- OR ----
Mount tmpfs:
# mount -t tmpfs tmpfs /dev/shm
Edit /etc/fstab:
none  /dev/shm  tmpfs  defaults  0 0
* Use ramfs (Ref. [4], [5]):
ramfs is similar to shmfs, except that pages are not pageable/swappable.
This approach provides the commonly desired effect.
Ramfs is created by
# mount -t ramfs ramfs /dev/shm (unmount /dev/shm first).
The only difference here is that the ramfs pages are not backed by big pages.
* Once shmfs/tmpfs or ramfs is available,
the Oracle server must be told whether to use it:
set the parameter 'use_indirect_data_buffers=true'.
If any of DB_CACHE_SIZE or DB_xK_CACHE_SIZE are set,
convert them to DB_BLOCK_BUFFERS.
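The conversion is a simple division of the cache size in bytes by the block size; for example, for a hypothetical 20GB cache with an 8KB block size:

```shell
# db_block_buffers = cache size in bytes / db_block_size
echo $(( 20 * 1024 * 1024 * 1024 / 8192 ))   # 2621440
```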
. How to use the memory file system shmfs in short,
for details see [1]Note 211424.1:
- Mount the shmfs file system as root using command:
% mount -t shm shmfs -o nr_blocks=8388608 /dev/shm
- Set the shmmax parameter to half of the RAM size, at most 4294967296:
$ echo 4294967296 >/proc/sys/kernel/shmmax
- Set the init.ora parameter use_indirect_data_buffers=true
- Startup oracle.
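For reference, nr_blocks is expressed in 4KB blocks (an assumption based on the i386 page size), so the value used above sizes the filesystem at 32GB:

```shell
# 8388608 blocks * 4KB per block = 32GB shmfs
echo $(( 8388608 * 4096 ))              # 34359738368 bytes
echo $(( 8388608 * 4 / 1024 / 1024 ))   # 32 (GB)
```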
. How to use the memory file system ramfs in short,
for details see [4]Note 262004.1:
- Mount the ramfs file system as root using the commands:
% umount /dev/shm
% mount -t ramfs ramfs /dev/shm
% chown oracle:dba /dev/shm
- Increase the "max locked memory" ulimit (ulimit -l)
Add the following to /etc/security/limits.conf:
oracle soft memlock 3145728
oracle hard memlock 3145728
(in case of ssh see details on [4]Note 262004.1)
- Set the init.ora parameter use_indirect_data_buffers=true
- Startup oracle.
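The memlock limit in limits.conf is expressed in KB; the value 3145728 used above corresponds to 3GB (size it to your intended SGA):

```shell
# memlock is in KB: 3GB = 3 * 1024 * 1024 KB
echo $(( 3 * 1024 * 1024 ))   # 3145728
```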
CONCLUSION
----------
A process on a 32-bit platform cannot address more than 4GB directly.
This means we gain only about 1GB more by using RHEL 3.0 with the 4gb/4gb split kernel.
If we need a larger buffer cache we have to use VLM,
which means shmfs and indirect buffers.
With a <4GB SGA you should use the VLM configuration
rather than hugemem (4gb/4gb split kernel), due to some overhead issues.
Why use a 4/4 kernel at all? Besides the addressing problem, we
have a problem in kernel memory if we have a lot of memory: we need to build
a PTE for every available memory page, and this restricts the memory we can
address. Typically the 1GB of kernel memory is completely used once we reach
24GB of RAM. To overcome this problem we need a large kernel memory space,
and that is the effect of a 4/4 kernel.
What are the advantages for Oracle? We can use VLM (shmfs/ramfs) only for the
block buffer cache. By default, 512MB is used to manage VLM. This means on a
standard kernel we have only about 1.2GB left for shared_pool, large_pool and
java_pool_size, which could be too small. If we use a 4/4 kernel, we have
another GB for these pools.
Another point is how we manage VLM. If we use shmfs we have to map
additional pages into the PTE, and this also increases the page table. On a
standard kernel, it can happen that we cannot allocate all available
memory for the block buffer cache because of page-table size problems.
SUMMARY TABLE
-------------
Kernel Naming Version
- RHAS2.1 for ia32
2.4.9-e.XX             Uniprocessor kernel
2.4.9-e.XX-smp         kernel capable of handling up to 4GB of physical memory
2.4.9-e.XX-enterprise  kernel capable of handling up to about 16GB of physical memory
- RHEL3 for ia32
2.4.21-XX.EL           Uniprocessor kernel
2.4.21-XX.ELsmp        kernel capable of handling up to 16GB of physical memory
2.4.21-XX.ELhugemem    kernel capable of handling beyond 16GB, up to 64GB
(XX = number of the errata kernel)
The other difference with the hugemem kernel is that the kernel
and userspace address spaces are split 4GB/4GB, meaning that with the
hugemem kernel a userspace program has access to nearly its full 4GB of address space.
With the smp kernel, the default SGA size is the same as in RHAS2.1.
However, using the hugemem kernel allows you to create an SGA of up to 3.7GB
without having to use the VLM option.
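The kernel naming scheme above can be turned into a small classifier; a sketch that assumes the release-string patterns listed in this note:

```shell
# classify a RHAS 2.1 / RHEL 3 ia32 kernel release string by its suffix
kernel_variant() {
  case "$1" in
    *hugemem*)    echo "hugemem (4GB/4GB split, up to 64GB RAM)" ;;
    *enterprise*) echo "enterprise (PAE, up to ~16GB RAM)" ;;
    *smp*)        echo "smp (SMP kernel)" ;;
    *)            echo "uniprocessor" ;;
  esac
}

kernel_variant "2.4.21-37.ELhugemem"   # hugemem (4GB/4GB split, up to 64GB RAM)
kernel_variant "$(uname -r)"           # classify the running kernel
```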
|---------------------------------------------------------------------------------|
| RedHat | Kernel Version | RAM | Method | Max SGA (*) |Notes|
|---------------------------|-----|---------------------------|-------------|-----|
| AS 2.1 | all | <4Gb| none - default | 1.7 GB | |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 2.1 | all | <4Gb| lower mapped Base Address | 2.7 GB |[1,2]|
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 2.1 | Enterprise | >4Gb| lower mapped Base Address | 2.7 GB |[1,2]|
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 2.1 | all | <4Gb| in-memory filesystem | 2.7 GB | [1] |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 2.1 | Enterprise | 16Gb| in-memory filesystem | 14 GB (**) | [1] |
|========|==================|=====|===========================|=============|=====|
| AS 3/4 | Uniprocessor/SMP | <4Gb| none - default | 1.7 GB | |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | Uniprocessor/SMP | <4Gb| low SGA attach address    | 2.7 GB      | [1] |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | Uniprocessor/SMP | <4Gb| in-memory filesystem | 2.7 GB |[4,9]|
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | smp              | 16Gb| in-memory filesystem      | 14 GB (**)  |[4,9]|
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | hugemem          | <4Gb| none(***)                 | 2.7 GB      |     |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | hugemem          | <4Gb|low SGA attach address(***)| 3.42 GB     |     |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | hugemem          | >4Gb|low SGA attach address(***)| 3.42 GB     |     |
|--------|------------------|-----|---------------------------|-------------|-----|
| AS 3/4 | hugemem          | 64Gb| in-memory filesystem      | 62 GB (*)   |[4,9]|
|---------------------------------------------------------------------------------|
(*) theoretical, depending on physical RAM and on block size
(**) depending on block size
(***) 4gb/4gb split; with a <4GB SGA you should use the VLM configuration,
not hugemem, due to some overhead issues
Note: on RH AS 2.1 it is possible to use at most 16GB of RAM
RELATED DOCUMENTS
-----------------
[1] Note 211424.1
How to Enable a Large SGA(over 1.7GB) on RedHat Advanced Server 2.1 - An Overview
[2] Note 200266.1
Increasing usable address space for Oracle on 32-bit Linux
[3] Note 261889.1
Bigpages vs. Hugetlb on RedHat Linux
[4] Note 262004.1
Configuring RHEL 3 and Oracle 9iR2 with Hugetlb and Remap_file_pages
[5] Note 259772.1
Red Hat Release 3.0; Advantages for Oracle
[6] Note 264236.1
Big Performance Degradation Using 'hugemem' Kernel vs SMP Kernel
[7] Note 275318.1
The Bigpages Feature on Linux
[8] Note 270382.1
HowTo Verify the Current SGA Attach Address on Linux
[9] Note 317055.1
How to Configure RHEL 3.0 for Very Large Memory with ramfs and hugepages
[10] Note 317141.1
How to Configure RHEL 4.0 for Very Large Memory with ramfs and hugepages
The following two white papers are available from OTN:
- Oracle9iR2 on Linux: Performance, Reliability and Manageability
Enhancements on Red Hat Linux Advanced Server 2.1
---> http://otn.oracle.com/tech/linux/pdf/9iR2-on-Linux-Tech-WP-Final.PDF
- Linux Virtual Memory in Red Hat Linux Advanced Server 2.1
(linuxVM-WP-022)
Copyright © 2005, Oracle. All rights reserved. Legal Notices and Terms of Use.