Data Guard 10G Best Practices and Tuning
Brent Bigonger, Senior Database Administrator - ServerCare Inc.
Natik Ameen, Senior Database Administrator - ServerCare Inc.
Glossary Of Terms
FAL The Fetch Archive Log process on the primary database which provides a client-server mechanism with the standby database to resolve gaps during communication failures.
LNS Log Writer Network Server process used by the Log Writer process to transmit the redo data to the standby database.
LSP The Logical Standby Process applies the SQL redo to the logical standby.
MRP The Managed Recovery Process on the standby applies the redo data to the physical standby database.
RFS The Remote File Server process on the standby database receives redo data from the primary.
SDU The Session Data Unit is the buffer which is used by Oracle Net to encapsulate the data before transmitting across the network.
ARCH and LGWR Redo transport services use archiver processes (ARCn) or the log writer process (LGWR) to collect transaction redo data and transmit it to standby destinations.
SYNC and ASYNC Specify whether network I/O is done synchronously or asynchronously when using the log writer process (LGWR).
AFFIRM All disk I/O to archived redo log files and standby redo log files is performed synchronously and completes successfully before the log writer process continues.
NOAFFIRM All disk I/O to archived redo log files and standby redo log files is performed asynchronously; the log writer process on the primary database does not wait until the disk I/O completes before continuing.
Architecture
A minimum Data Guard configuration will be composed of a primary database and at least one standby database. These databases are connected via Oracle Net and can be geographically distributed across thousands of miles. Regardless of the configuration, a standby database will be a transactionally consistent copy of the primary database.
With the complexity of an Oracle Data Guard configuration consisting of many different components that rely on one another, performance and optimal setup can play a vital role in the success of a Data Guard implementation. Throughout this best practices and tuning guide, some sections have been summarized for brevity while others have been presented exactly as they appear in various Oracle documents referenced at the end of this document.
Overview of the Modes
Since Data Guard will be implemented for a variety of different reasons, Oracle provides several different modes of data protection that Data Guard can be run in. These different modes are intended to address different business requirements of uptime, performance and data loss. While a bank may require no amount of data loss at all, an online retailer may be more concerned with performance. There are pros and cons to each mode and the business requirements should dictate the appropriate mode of protection.
1. Maximum Performance Mode
Maximum performance mode is the default mode that Data Guard is configured in. The emphasis of this mode is on performance, and Oracle takes measures to provide a high level of data protection without affecting the performance of the primary database. Unlike the other protection modes, redo from the primary database is written asynchronously to the standby database. This ensures that the primary database is able to commit a transaction as soon as it is written to its local online redo log. The redo is also written to the standby database, but the primary will not wait for this action to be completed. In this way, Data Guard is able to perform similarly to a single database environment.
2. Maximum Availability Mode
In this mode, Data Guard's focus is on the uptime of the primary database. For a transaction to be committed, the redo data must be written to the local online redo log and also to a remote standby redo log. Writing the redo data to both the local online log and the remote standby log is done synchronously, and Oracle will wait until both have completed. If, however, a problem prevents the redo from being written to the remote standby log, the primary database will write only to the local online log and operate in Maximum Performance mode temporarily until the problem is fixed. During this time, the primary database will continue to operate normally for users. Once the problem is fixed and the "gaps" in redo from primary to secondary have been resolved, the primary database will automatically return to operating in Maximum Availability mode. This mode offers a very high level of data protection while also ensuring the maximum uptime of the primary database.
3. Maximum Protection Mode
Quite simply, this mode guarantees that there will be no data loss if the primary database fails. In this mode, like Maximum Availability Mode, a transaction must be written to both a local online redo log and a remote standby redo log. If a problem occurs that prevents the writing to a remote standby redo log, the primary database will be shut down. In this way, Oracle is able to guarantee that no data loss can occur. The downside of this however is that there is a trade-off with availability.
Comparison
Performance
· Default
· Least impact on primary database
· Most choices for configuration options
Availability
· Compromise between Performance and Protection
· Requires data to be written to local and standby redo log for a transaction to commit
· Will not shut down if unable to write to standby redo log
Protection
· Requires data to be written to local and standby redo log for a transaction to commit
· Will shut down if unable to write to standby redo log of at least one standby database
1. Physical Standby (Redo Apply) vs. Logical Standby (SQL Apply)
When designing a Data Guard solution, one common point of confusion is the difference between a physical standby and a logical standby and the methods each uses to apply redo. A physical standby is synonymous with using Redo Apply. On the standby system, the database is maintained by applying redo data from its archived redo log files or directly from standby redo log files. A logical standby is maintained using SQL Apply. SQL Apply takes data from the archived redo log files or standby redo log files, transforms it into SQL statements, and then executes those SQL statements on the logical standby database. Both methods have their benefits and drawbacks.
Feature | Physical Standby | Logical Standby
Disaster recovery and high availability | Yes | Yes
Data protection | Yes | Yes
Performance | Most Efficient - Redo Apply bypasses all SQL level code layers | Redo must be converted to SQL before it is applied
Reduction in primary database workload | Limited read only reporting (no applying of redo) | Unrestricted read only reporting
Efficient use of standby hardware resources | Limited read reporting | Hosting additional database schemas with unrestricted read/write.
Data type restrictions | No restrictions | Does not include LONG, LOB LONG RAW, object type and collections
Rolling upgrades | Not available | Yes
2. ARCn Redo Transport vs. LGWR Redo Transport
Regardless of the type of standby database used, there are two choices for transmitting data to the standby destination. By default, redo transport services use ARCn processes to archive the online redo log files on the primary database. After a log switch on the primary database, an ARCn process transmits redo from the local archived redo log files to the remote standby destination. This ARCn type of processing only supports the maximum performance mode of data protection. In order to use the maximum protection or maximum availability modes of data protection, the LGWR method of redo transport must be used.
3. Overview
Setting | Maximum Protection | Maximum Availability | Maximum Performance
Redo archival process | LGWR | LGWR | LGWR or ARCH
Network transmission mode | SYNC | SYNC | SYNC or ASYNC when using LGWR process. SYNC if using ARCH process
Disk write option | AFFIRM | AFFIRM | AFFIRM or NOAFFIRM
Standby redo log required? | Yes | Yes | No, but it is recommended
Monitoring and Tuning Areas
1. Total Traffic
'netstat -i' can be used to see the total volume of network traffic. With the 'watch' command, you can see how much data is passed over a specified interval, in this case every two seconds.
watch netstat -i
Every 2.0s: netstat -i Mon Mar 12 21:34:44 2007
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 10695277 0 0 0 8978945 0 0 0 BMRU
lo 16436 0 24955 0 0 0 24955 0 0 0 LRU
Every 2.0s: netstat -i Mon Mar 12 21:34:46 2007
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 10695799 0 0 0 8979103 0 0 0 BMRU
lo 16436 0 24955 0 0 0 24955 0 0 0 LRU
2. Disk I/O Throughput
Although this is certainly not a definitive test, you can use 'dd' to get an approximation of overall disk speed:
time dd if=/dev/zero of=/home/bbigonger/bb_test.zero bs=1024k count=1000
1000+0 records in
1000+0 records out
real 0m19.949s
user 0m0.005s
sys 0m5.595s
ls -lh /home/bbigonger
total 1000M
-rw-r--r-- 1 root root 1000M Mar 12 16:33 bb_test.zero
Use iostat to look at disk activity:
iostat -xtd hda 3 3
Linux 2.6.9-42.0.3.EL (shag) 03/12/2007
Time: 04:30:47 PM
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda 0.02 1.60 0.37 0.19 36.99 14.27 18.50 7.13 91.52 0.00 7.00 2.15 0.12
Time: 04:30:54 PM
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda 0.00 10560.96 0.42 51.63 3.39 84923.34 1.70 42461.67 1631.61 99.42 1784.11 18.74 97.54
Time: 04:30:57 PM
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda 0.00 14476.25 0.00 77.26 2.68 116484.28 1.34 58242.14 1507.77 199.43 1565.11 12.82 99.03
Alternatively, this information can be obtained using Enterprise Manager, AWR reports or performance views such as V$SYSTEM_EVENT, V$ASM_DISK and V$OSSTAT.
Metrics for Redo Log Transport
The first step in any performance tuning exercise is to gather a good baseline. Before implementing Data Guard transport services, we should know how much redo will be written during normal and peak loads. Oracle 10g's AWR reports are a good way to gather a baseline at both normal and peak workloads. It is recommended to reduce the AWR automatic snapshot interval to about 10-20 minutes to capture performance peaks during stress testing or peak loads. Normally, however, a 60 minute interval should be sufficient.
1. AWR key items
The following will give you a good indication of production database throughput:
· Redo volume – the amount of redo bytes generated during this report
· Transactions – Transactions Per Second (TPS) for the report.
· Redo writes – Number of redo writes made during this report
When the ‘redo volume’ is divided by the ‘redo writes’, it will give the average redo write size in bytes. This can be an important metric if you are analyzing LGWR SYNC or ASYNC performance.
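As a minimal sketch of this division, the Python snippet below computes the average redo write size from the two AWR figures. The function name and the sample numbers are illustrative assumptions, not values taken from a real AWR report.

```python
def average_redo_write_size(redo_bytes: int, redo_writes: int) -> float:
    """Return the average redo write size in bytes (redo volume / redo writes)."""
    if redo_writes <= 0:
        raise ValueError("redo_writes must be positive")
    return redo_bytes / redo_writes


# Hypothetical figures for one AWR report interval (assumed values).
redo_bytes = 1_800_000_000   # redo volume generated during the report
redo_writes = 450_000        # number of redo writes during the report

print(f"{average_redo_write_size(redo_bytes, redo_writes):.0f} bytes per redo write")
```

Running the same calculation on the baseline and on the reports gathered with Data Guard enabled makes the two periods directly comparable.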
2. V$SYSMETRIC_HISTORY key indicators
You can also measure production database response time by examining the V$SYSMETRIC_HISTORY view:
· Response Time Per Txn - Response time for transactions
· SQL Service Response Time - Response time per user call in centiseconds
· Database Time Per Sec - DB time in centiseconds / elapsed time in secs
After obtaining a good baseline, enable Data Guard and gather AWR reports during both normal and peak operations. These AWR reports can be compared against the baseline to determine production database performance with Data Guard enabled.
3. All Redo Transports
Improvements common to all redo transports:
· It is important to tune standby redo logs for efficient I/O. This minimizes the time the RFS process spends writing on the standby and thus keeps it from slowing down the sending processes (LNS and ARCH) on the production database.
· Ensure that standby redo logs are properly placed on the fastest disks.
· Do not multiplex standby redo logs. Remove additional standby redo log members to prevent additional writes.
Network Metrics
One simple way to determine throughput in terms of redo volume is to collect AWR reports during normal and peak workload and determine the number of bytes per second of redo data the production database is producing. For example, if the application is producing 3MB/sec of redo data during peak periods, the network link between the primary and standby databases should be able to carry a minimum of 3MB/sec. Network bandwidth is usually described in terms of Megabits/second (Mbps).
Using 3MB/sec, the network throughput computes to 25.2Mbps (1048576 bytes in 1MB, 8 bits in 1 byte and 1 million bits in a megabit, equals 25.2Mbps). This volume is well beyond the capacity of a T1 link which is approximately 1.544 Mbps. This would fall into the range of a T3/DS-3 link, which can be 44.7Mbps.
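The conversion above can be sketched in a few lines of Python. The function name and the link table are illustrative assumptions; the constants and link capacities are the ones quoted in the text.

```python
BYTES_PER_MB = 1_048_576       # bytes in 1 MB, as used in the text
BITS_PER_BYTE = 8
BITS_PER_MEGABIT = 1_000_000


def redo_rate_to_mbps(mb_per_sec: float) -> float:
    """Convert a redo generation rate in MB/sec to megabits per second."""
    return mb_per_sec * BYTES_PER_MB * BITS_PER_BYTE / BITS_PER_MEGABIT


required = redo_rate_to_mbps(3)
print(f"Required bandwidth: {required:.1f} Mbps")

# Link capacities in Mbps, as quoted in the text.
links = {"T1": 1.544, "T3/DS-3": 44.7}
for name, capacity in links.items():
    verdict = "sufficient" if capacity >= required else "insufficient"
    print(f"{name} ({capacity} Mbps): {verdict}")
```

This only compares raw link capacity with the redo rate; it does not account for protocol overhead or latency.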
Other characteristics of the network should also be taken into consideration. These may include the overhead caused by network acknowledgements, network latency, and other factors. Their impact will be unique to your network and will reduce the actual network throughput that you will be able to achieve.
Another consideration in determining bandwidth requirements should be the Data Guard protection mode and transport service utilized. For example, LGWR SYNC synchronous transport will impact production database performance if there is insufficient bandwidth available to handle redo data generation rates as the production database must wait for a response from the secondary site before a commit is complete. The Data Guard LGWR ASYNC transport, in contrast, uses online redo logs to buffer peaks in redo generation that may temporarily exceed the maximum network throughput that can be achieved without impacting production database performance.
Optimize Network
Overall, the goal for all Data Guard configurations is to ship redo data to the remote disaster recovery site fast enough to meet recovery time and recovery point objectives. If there is insufficient bandwidth available to handle the required volume however, no amount of tuning can achieve this goal. In order to figure out how much bandwidth is needed, the volume of data that is generated by the production database will need to be determined. Ideally, this can be found by measuring an existing production database or a database which has been set up in a test environment.
1. SDU (Session Data Unit)
Before sending data across the network, Oracle Net buffers data into the Session Data Unit (SDU). When large amounts of data are being transmitted or when the message size is consistent, increasing the size of the SDU buffer can improve performance and network utilization. You can configure SDU size within an Oracle Net connect descriptor or globally within the sqlnet.ora.
For Data Guard broker configurations configure the DEFAULT_SDU_SIZE parameter in the sqlnet.ora file:
DEFAULT_SDU_SIZE=32767
For non Data Guard broker configurations that use a connect descriptor, you can override the current settings in the primary database sqlnet.ora file. If you are setting the SDU in a connect descriptor you must use a static SID. Using a dynamic service registration will use the default SDU size defined by DEFAULT_SDU_SIZE. This example uses a connect descriptor with the SDU parameter in the description.
sales.us.acme.com=
  (DESCRIPTION=
    (SDU=32767)
    (ADDRESS=(PROTOCOL=tcp)
      (HOST=sales-server)
      (PORT=1521))
    (CONNECT_DATA=
      (SID=sales.us.acme.com))
  )
On the standby database, set SDU in the SID_LIST of the listener.ora file:
SID_LIST_listener_name=
  (SID_LIST=
    (SID_DESC=
      (SDU=32767)
      (GLOBAL_DBNAME=sales.us.acme.com)
      (SID_NAME=sales)
      (ORACLE_HOME=/usr/oracle)))
2. TCP socket buffer size
TCP socket buffer settings will control how much network bandwidth can be used. This setting does not depend on the bandwidth available in the network circuit; it is simply how much bandwidth can be used. In order to improve utilization of available bandwidth, socket buffer sizes need to be increased from their default values. When network latency is high, larger socket buffer sizes are needed to fully utilize network bandwidth.
Oracle recommends that the optimal socket buffer size be set to three times the size of the Bandwidth Delay Product (BDP). In order to compute the BDP, the bandwidth of the link and the network Round Trip Time (RTT) are required. RTT is the time required for a network communication to travel from the production database to the standby and back and is measured in milliseconds (ms). Oracle provides the following example, which assumes a gigabit network link with a RTT of 25 ms:
BDP = 1,000 Mbps * 25 msec (.025 sec)
    = 1,000,000,000 bits/sec * .025 sec
    = 25,000,000 bits / 8 = 3,125,000 bytes
Given this example, the optimal send and receive socket buffer sizes are calculated as follows:
socket buffer size = 3 * bandwidth * delay
= 3,125,000 * 3
= 9,375,000 bytes
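The same arithmetic can be expressed as a small Python sketch, which makes it easy to substitute your own link speed and round trip time. The function names are illustrative assumptions; integer arithmetic is used so the result matches the worked example exactly.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth Delay Product in bytes for a link speed (bits/sec) and RTT (ms)."""
    # bits/sec * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    return bandwidth_bits_per_sec * rtt_ms // (1000 * 8)


def socket_buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int, multiplier: int = 3) -> int:
    """Socket buffer size as a multiple of the BDP (3x, per the recommendation above)."""
    return multiplier * bdp_bytes(bandwidth_bits_per_sec, rtt_ms)


# Gigabit link with a 25 ms round trip time, as in the example above.
print(bdp_bytes(1_000_000_000, 25))            # BDP in bytes
print(socket_buffer_bytes(1_000_000_000, 25))  # value for SEND_BUF_SIZE / RECV_BUF_SIZE
```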
The size of the socket buffers can be set at the operating system level or at the Oracle Net level. Since socket buffer size requirements can become quite large, depending on network conditions, it is probably more appropriate and recommended to set them at the Oracle Net level so that normal TCP sessions do not use additional memory. Some operating systems have parameters that set the maximum size for all send and receive socket buffers. This means you must ensure that these values have been adjusted in order to allow Oracle Net to use a larger socket buffer size.
For Data Guard broker or Enterprise Manager configurations, configure the sqlnet.ora file to reflect the desired send and receive buffer sizes. For example:
standby =
  (DESCRIPTION=
    (SEND_BUF_SIZE=9375000)
    (RECV_BUF_SIZE=9375000)
    (ADDRESS=(PROTOCOL=tcp)
      (HOST=hr1-server)(PORT=1521))
    (CONNECT_DATA=
      (SERVICE_NAME=standby)))
The socket buffer sizes must be configured at all sites within a Data Guard configuration. On a standby database this can be accomplished within either the sqlnet.ora or listener.ora file.
In the listener.ora file, you can either specify the buffer space parameters for a particular protocol address or for a description.
LISTENER=
  (DESCRIPTION=
    (ADDRESS=(PROTOCOL=tcp)
      (HOST=sales-server)(PORT=1521)
      (SEND_BUF_SIZE=9375000)
      (RECV_BUF_SIZE=9375000)))
3. Network queue sizes
A system's network queue sizes can also be adjusted to optimize performance. You can regulate the size of the queue between the kernel network subsystems and the driver for the network interface card. Any queue should be sized so that losses do not occur due to local buffer overflows. Therefore, careful tuning is required to ensure that the sizes of the queues are optimal for your network connection, particularly for high bandwidth networks.
These settings are especially important for TCP because losses on local queues cause TCP to fall into congestion control, which limits the TCP sending rates.
For Linux there are two queues to consider, the interface transmit queue and the network receive queue. The transmit queue size is configured with the network interface option txqueuelen. The network receive queue size is configured with the kernel parameter netdev_max_backlog. For example:
echo 20000 > /proc/sys/net/core/netdev_max_backlog
echo 1 > /proc/sys/net/ipv4/route/flush
ifconfig eth0 txqueuelen 10000
The default value of 100 for txqueuelen is usually inadequate for long-distance, high-throughput network links. For example, a gigabit network with a latency of 100ms would benefit from a txqueuelen of at least 10000.
4. Overall Network
· Ensure bandwidth is sufficient for the volume of redo data to be shipped to the standby location.
· Set the Oracle Net RECV_BUF_SIZE and SEND_BUF_SIZE parameters equal to 3 times the Bandwidth Delay Product (BDP). This will produce the largest increase in network throughput.
· Use an Oracle Net Session Data Unit (SDU) size of 32767.
· Increase the default send and receive queue sizes (txqueuelen and netdev_max_backlog) associated with networking devices as explained above. As a proactive measure to prepare for future role transitions, it is helpful to change both parameters on the production and all standby databases.
· Ensure that the Oracle Net TCP_NODELAY parameter is set to YES, which is the default value.
From Oracle's Testing:
The following table shows network improvements seen from Oracle's testing based on the adjustments related to TCP socket buffer sizes and network device queue sizes.
Testing Stage | Test duration (seconds) | Amount of data transferred | Network throughput achieved (Megabits/sec) | % change
Prior to tuning | 60 | 77.2 MB | 10.8 Mbps | N/A
After increasing network socket buffer size to 3*BDP from default of 16K | 60 | 5.11 GB | 731.0 Mbps | About 6,670% improvement over baseline prior to tuning
After above adjustment and increase of device queue lengths to 1,000 from default of 100 | 60 | 6.55 GB | 937.0 Mbps | 28% improvement
Optimize ARCn Redo Transport
When ARCH redo transport services are configured, the local archive completes first, and the redo is then shipped to the standby from the local archived redo log rather than by the log writer.
ARCn redo transport occurs in the following sequence:
1. Read 10 megabytes from the local archive log and issue a network send to the RFS process on the standby
2. The RFS process receives the redo sent by the ARCH process and performs I/O into either the standby redo log or archive redo logs, depending upon how the user has configured Data Guard
3. Once the I/O has completed the RFS sends an acknowledgement back to ARCH
4. ARCH reads the next 10 megabytes and then repeats the above process
As with LGWR SYNC/ASYNC, it is important to have efficient disk I/O on the standby and primary flash recovery area as well as properly sized network socket buffers. The same tuning items performed for ASYNC should be performed for ARCH. In addition to those tuning items the remote destination should be configured with the MAX_CONNECTIONS attribute set to 5. This will enable parallel ARCH transfer and improve overall throughput for individual archive log files.
1. Max processes
In order to quickly resolve archive log gaps that can occur during extended network or standby database outages it is possible to add additional ARCn processes. Specifying a larger number of processes will also provide enough ARCn processes to support remote archiving parallelism, which can be enabled using the MAX_CONNECTIONS attribute.
The LOG_ARCHIVE_MAX_PROCESSES initialization parameter can be set as high as 30. In Oracle 10g R2 the default is 2. It is almost always a good idea to increase this value beyond the default when using Data Guard. This is because ARCH processes may be consumed by local archival, by resolving archive log gaps at the standby, or by normal remote archival if Redo Transport Services have been configured to utilize ARCn transport.
Increase the LOG_ARCHIVE_MAX_PROCESSES value to a minimum level needed to accommodate what you configure for MAX_CONNECTIONS. If you have the bandwidth to support additional redo flow, consider setting LOG_ARCHIVE_MAX_PROCESSES to a value as high as your network can accommodate. This makes it possible to send multiple archive logs in parallel to handle peaks in workload or to more quickly resolve log archive gaps caused by network or standby failures.
However, setting LOG_ARCHIVE_MAX_PROCESSES to a high value may increase contention with other applications that use the same network resources. Therefore, you should consider the impact on other applications when determining the optimal value for LOG_ARCHIVE_MAX_PROCESSES. Determining the optimal value can only be achieved through testing large archive log gap scenarios in your own environment.
2. Max connections
The MAX_CONNECTIONS parameter specifies the maximum number of network connections that will be used to perform remote archival to the destination. If the MAX_CONNECTIONS attribute has been configured for the remote archive destination, then up to five ARCH processes, depending upon the configuration, can participate in sending a single archive log remotely.
Examining the ARCH wait on SENDREQ wait event to assess ARCH performance can be misleading, because this wait event also captures network activity from the ARCH ping mechanism that checks standby availability. The most reliable method to assess ARCH performance is to set log_archive_trace to level 128 so that additional messages are printed to the alert log. Using the timestamps and the archive log file size, you can assess overall throughput.
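A minimal sketch of that throughput estimate follows. The timestamp format mirrors the alert log style shown elsewhere in this guide ("Mon Mar 12 08:59:52 2007"); the function name, the sample timestamps and the file size are assumptions for illustration only, and the actual trace messages that mark the start and end of a transfer are not reproduced here.

```python
from datetime import datetime

# Assumed alert log timestamp layout, e.g. "Mon Mar 12 08:59:52 2007".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def archive_throughput_mb_per_sec(start: str, end: str, size_bytes: int) -> float:
    """Estimate transfer throughput from two alert log timestamps and a file size."""
    started = datetime.strptime(start, ALERT_TS_FORMAT)
    finished = datetime.strptime(end, ALERT_TS_FORMAT)
    elapsed = (finished - started).total_seconds()
    if elapsed <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    return size_bytes / 1_048_576 / elapsed


# Hypothetical transfer: a 512 MB archive log shipped in 64 seconds.
rate = archive_throughput_mb_per_sec(
    "Mon Mar 12 09:00:00 2007", "Mon Mar 12 09:01:04 2007", 512 * 1_048_576
)
print(f"{rate:.1f} MB/sec")
```

Alert log timestamps have one-second resolution, so the estimate is only meaningful for transfers that take many seconds.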
It is a good idea to set MAX_CONNECTIONS attribute to 2 or higher (on the LOG_ARCHIVE_DEST_n initialization parameter) for all destinations. Doing so enables remote parallel archiving which can significantly reduce the overall time needed to transfer an archive log.
Optimize LGWR Redo Transport
1. NET_TIMEOUT
The NET_TIMEOUT parameter specifies the number of seconds that the LGWR on the production database waits for a response to a LGWR request. In Oracle Database 10g Release 2 the default NET_TIMEOUT value is 180 seconds. This value can be reduced to decrease the impact of network outages on production database performance. Oracle recommends 10 seconds as a minimum to avoid disconnecting from the standby database unnecessarily. This parameter can have a significant impact on the primary database, and setting it too low can cause problems. For example, if the parameter were set to 5 seconds and, due to network problems or a slow network, responses to requests take 6 seconds, there will be frequent timeouts that will impact the data protection levels. For max availability mode, this means the primary will frequently drop down into max performance mode until connectivity is re-established.
2. SYNC Redo Transport
When using LGWR SYNC for transport, commits are not returned to the foreground until the redo for that transaction has been written locally on the primary and remotely on the standby.
SYNC redo transport occurs in the following sequence:
1. LGWR writes redo to the online redo log at the production database.
2. The Data Guard LNS process on the production database performs a network send to the Data Guard RFS process on the standby database.
3. The RFS process receives the redo being sent by LNS and completes the I/O into the standby redo log.
4. RFS process sends acknowledgment back to LNS that the redo has been received and written to disk.
5. LNS posts LGWR that all the redo has been successfully received and written to disk by the standby.
Since LGWR SYNC is greatly affected by I/O on the standby, by knowing the LNS send size you can begin to test network and I/O for time spent in various stages. To determine the network send size for LNS run the query listed below on the production database. To get the average LNS write size divide TOTAL_WRITE by WRITE_COUNT.
SELECT OPEN_COUNT, CLOSE_COUNT, WRITE_COUNT, MINIMUM_WRITE, MAXIMUM_WRITE, TOTAL_WRITE, LOGS_SKIPPED, TERMINATIONS FROM X$KCRRASTATS WHERE TOTAL_WRITE > 0;
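As a small sketch of the division described above, the Python snippet below computes the average LNS write size from the TOTAL_WRITE and WRITE_COUNT columns. The function name and the sample values are assumptions; the result is in whatever unit the view reports for TOTAL_WRITE.

```python
def average_lns_write_size(total_write: int, write_count: int) -> float:
    """Average LNS network send size: TOTAL_WRITE divided by WRITE_COUNT."""
    if write_count <= 0:
        raise ValueError("write_count must be positive")
    return total_write / write_count


# Hypothetical row returned by the query (assumed values).
row = {"TOTAL_WRITE": 96_000_000, "WRITE_COUNT": 12_000}

print(average_lns_write_size(row["TOTAL_WRITE"], row["WRITE_COUNT"]))
```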
Although LGWR SYNC will wait for confirmation that redo has been successfully received from the standby as described above, the parameter COMMIT NOWAIT can be used so that commits are returned to the application without waiting for redo to be written to disk. If applications or transactions are able to utilize COMMIT NOWAIT, significant improvement in response time and database throughput can be seen. This means that even though control is given back to the application, a commit is not guaranteed as it is with a COMMIT WAIT, which is the default COMMIT.
From Oracle's Testing:
For users who wish to greatly reduce the impact of LGWR SYNC on production database performance, evaluate whether your application or parts of your application can utilize COMMIT NOWAIT or non-durable commits. With COMMIT NOWAIT IMMEDIATE, the foreground process posts LGWR to do the log I/O but does not wait for the redo to be written. With COMMIT NOWAIT BATCH, the foreground simply returns. In both cases the foreground does not have to wait for the completion of the local LGWR write or for the redo to be received by the standby prior to resuming work. With LGWR SYNC enabled, comparing COMMIT NOWAIT BATCH with the default COMMIT IMMEDIATE WAIT shows the following in a sample OLTP workload on Real Application Clusters:
· Increased production redo rate by 10-35%
· Increased user calls rate by 10-35%
· Increased production txn rate by 10-33%
· Reduced user call response time by 92%
· Reduced txn response time by 90%
This option may be ideal for customers who can tolerate non-durable commits in the case of instance or database failures. Examples of these types of applications may include shopping cart applications (though not the purchase transaction itself), sampling or tracking systems for trends, and possibly Data Warehouse or Data Mart applications.
3. ASYNC Redo Transport
LGWR ASYNC is an asynchronous transport service where user commits do not have to wait for redo to be sent remotely.
ASYNC redo transport occurs in the following sequence:
1. LGWR writes redo to the online redo log at the production database
2. The Data Guard LNS process on the production database reads the online redo log and performs a network send to the Data Guard RFS process on the standby database
3. The RFS process receives the redo being sent by LNS
4. The RFS process sends acknowledgement back to LNS that the redo has been received
5. The RFS Process writes the redo to a standby redo log
While the production database overhead of LGWR ASYNC, and the impact of tuning it, are less than with LGWR SYNC, it is still important to have efficient network transfer to the standby to avoid any delay in shipping redo and to minimize potential data loss.
4. Additional tuning parameter suggestions
· On the standby system increase /proc/sys/fs/aio-max-size to 1048576 from the default of 131072.
· Remove second logfile members for the standby log file groups
· Add additional disks to the ASM diskgroup on the standby
· Set RECV_BUF_SIZE and SEND_BUF_SIZE to 3 times the bandwidth delay product BDP
· Increase device queue sizes associated with the network interfaces from the default of 100 to 10,000
· This can be accomplished on Linux by 'ifconfig eth0 txqueuelen 10000'
Optimize Checkpoint
Checkpoint Phase on Primary
On the primary database, the checkpoint rate needs to be monitored and the log group size adjusted to ensure that checkpoints do not place an excessive burden on the system. A checkpoint occurs whenever there is a log switch, when LOG_CHECKPOINT_TIMEOUT expires, or when LOG_CHECKPOINT_INTERVAL has been reached.
Log Group
To reduce the frequency of log switches, it is generally recommended to resize the redo log files to 1GB on both primary and secondary. Ideally, a checkpoint should occur about every 15 minutes. This will reduce the repeated updating of the file headers which occurs during the switch. Using the query below, we can determine the frequency of checkpoints by comparing the output over a period of time, to ensure that checkpoints do not occur too often.
COL NAME FOR A35;
SELECT NAME, VALUE, TO_CHAR(SYSDATE, 'HH:MI:SS') TIME
FROM V$SYSSTAT
WHERE NAME = 'DBWR checkpoints';
NAME VALUE TIME
----------------------------------- ---------- --------
DBWR checkpoints 264 08:15:43
SQL> /
NAME VALUE TIME
----------------------------------- ---------- --------
DBWR checkpoints 267 08:34:06
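To turn two such samples into a checkpoint interval, a sketch like the following can be used. The function name is an assumption, and the code assumes both samples were taken on the same day and in the same half of the day, since the HH format in the query is a 12-hour clock.

```python
from datetime import datetime


def seconds_per_checkpoint(first, second):
    """Average seconds between checkpoints, given two (count, 'HH:MM:SS') samples."""
    (count_a, time_a), (count_b, time_b) = first, second
    elapsed = (
        datetime.strptime(time_b, "%H:%M:%S") - datetime.strptime(time_a, "%H:%M:%S")
    ).total_seconds()
    checkpoints = count_b - count_a
    if elapsed <= 0 or checkpoints <= 0:
        raise ValueError("second sample must be later and show more checkpoints")
    return elapsed / checkpoints


# The two samples from the output above.
interval = seconds_per_checkpoint((264, "08:15:43"), (267, "08:34:06"))
print(f"{interval / 60:.1f} minutes between checkpoints")
```

For the sample output, three checkpoints occurred in 18 minutes 23 seconds, or roughly one every six minutes, which is more frequent than the 15 minute target.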
It is also recommended that the LOG_BUFFER parameter be set to a minimum of 8MB. This will ensure that the database allocates maximum memory for writing Flashback Database Logs.
Optimize Redo Read, Apply and Recovery
1. Block Checking and Checksum
While DB_BLOCK_CHECKSUM is set to true by default, DB_BLOCK_CHECKING is not on by default. Although DB_BLOCK_CHECKSUM will catch most block corruptions, Oracle recommends turning on DB_BLOCK_CHECKING on the primary database and also on the secondary, if the secondary meets performance expectations. It can be set to LOW, MEDIUM or FULL and will have a performance impact on the database. Oracle estimates the impact between one and 10 percent, so be cautious.
2. Log Read I/O rate
On the standby database, an important step in the redo apply phase is the reading of these redo logs into the database buffer cache. Because of this, the read rate is an important metric in the redo apply phase. If the read rate is low, this can adversely affect the total recovery time. The maximum redo read rate can be measured using the “dd” command.
>time dd if=/u01/oradata/docprd/redo01.log of=/dev/null bs=4096k
32+1 records in
32+1 records out
real 0m0.170s
user 0m0.000s
sys 0m0.170s
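Estimated Read Rate (32 full 4 MB records, about 128 MB) = (32 * 4 MB) / 0.170s = 752.9 MB/sec
A rate this high likely means the file was served from the operating system cache rather than from disk, so the test should be repeated against an uncached file to measure true disk throughput.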
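The read rate is the amount of data read divided by the elapsed ("real") time. The following is a small Python sketch of that arithmetic, using the figures from the dd output above; the function name is hypothetical, and the partial final record ("+1") is ignored.

```python
def dd_read_rate_mb_per_sec(full_records, block_size_mb, elapsed_seconds):
    """Read rate in MB/sec from dd's record count, block size and real time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (full_records * block_size_mb) / elapsed_seconds

# 32 full records of 4 MB (bs=4096k) read in 0.170 s of real time.
rate = dd_read_rate_mb_per_sec(32, 4, 0.170)
print(f"{rate:.1f} MB/sec")  # 752.9 MB/sec
```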
The read rate for the redo log can also be obtained from the trace dump using the command below.
SQL> ALTER SYSTEM DUMP LOGFILE '/u01/oradata/docprd/redo01.log' validate;
System altered.
>vi docprd_ora_3560.trc
Mon Mar 12 08:59:52 2007
………………
………………
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 4527Kb in 0.58s => 6.90 Mb/sec
Longest record: 19Kb, moves: 0/7586 (0%)
Change moves: 4340/18026 (24%), moved: 2Mb
Longest LWN: 92Kb, moves: 1/1365 (0%), moved: 0Mb
Last redo scn: 0x0000.01272351 (19342161)
3. Redo Apply Phase
The time required to apply the redo logs on the standby database determines whether the standby will lag behind when the primary fails over to the secondary. Oracle recommends that the number of standby redo logs equal the sum of all production online log groups for each thread plus the number of threads (instances).
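As a worked example of the standby redo log sizing rule, the following Python sketch adds one extra group per thread to the total number of online log groups. The function name and the two-thread configuration are illustrative assumptions, not taken from the paper.

```python
def standby_redo_log_groups(online_groups_per_thread):
    """Sum of online log groups over all threads, plus one group per thread."""
    return sum(online_groups_per_thread) + len(online_groups_per_thread)

# Hypothetical two-instance primary with three online log groups per thread.
print(standby_redo_log_groups([3, 3]))  # 8
```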
It is also recommended that real-time apply be used. Using real-time apply ensures that redo data is applied to the standby database as soon as it is received.
In order to enable real-time apply for a physical standby database use the following:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
In order to enable real-time apply for a logical standby database use the following:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;
In addition to optimizing the number of standby redo logs and using real-time apply, the rate of redo apply for the standby redo logs should be known and tuned, if necessary. The rate at which the redo logs are applied can be obtained from the steps below which are outlined in the Oracle document 'MAA_WP_10gRecoveryBestPractices.pdf'. The SQL should be run when the actual recovery is taking place to obtain the current recovery rate.
1. First determine Log Block Size (LEBSZ) since it’s different for each operating system. This query only needs to be executed once.
SELECT LEBSZ FROM X$KCCLE WHERE ROWNUM=1;
Summary of the maximum sequence per thread that has been applied:
SELECT THREAD#, MAX(SEQUENCE#)
FROM V$LOG_HISTORY GROUP BY THREAD#;
2. Next get the recovery blocks applied for at least 2 snapshots.
This query is for Media Recovery Cases (e.g. recover [standby] database) and it describes how many redo blocks have been applied for a specific log sequence#.
SELECT TYPE, ITEM, SOFAR,
TO_CHAR(SYSDATE, 'DD-MON-YYYY HH:MI:SS') TIME
FROM V$RECOVERY_PROGRESS
WHERE ITEM='Redo Blocks' AND TOTAL=0;
This query is for Managed Recovery Cases (e.g. recover managed standby database…) and it describes the number of redo blocks (block#) for a specific log sequence#.
SELECT PROCESS, SEQUENCE#, THREAD#, BLOCK#, BLOCKS,
TO_CHAR(SYSDATE, 'DD-MON-YYYY HH:MI:SS') TIME
FROM V$MANAGED_STANDBY WHERE PROCESS='MRP0';
3. To determine the recovery rate (MB/sec) for a specific archive sequence number, use one of these formulas, with the time difference expressed in seconds:
Media Recovery Case:
((SOFAR_END - SOFAR_BEG) * LOG_BLOCK_SIZE) /
((TIME_END - TIME_BEG) * 1024 * 1024)
Managed Recovery Case:
((BLOCK#_END - BLOCK#_BEG) * LOG_BLOCK_SIZE) /
((TIME_END - TIME_BEG) * 1024 * 1024)
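The recovery rate calculation can be sketched in Python as follows: the bytes applied between two snapshots are divided by the elapsed seconds and by 1024 * 1024 to give MB/sec. The function name and the sample figures (512-byte log blocks, two snapshots 20 seconds apart) are hypothetical and only illustrate the arithmetic.

```python
def recovery_rate_mb_per_sec(blocks_beg, blocks_end, log_block_size, elapsed_seconds):
    """Redo apply rate in MB/sec between two snapshots.

    blocks_beg/blocks_end are SOFAR (media recovery) or BLOCK# (managed
    recovery) values; log_block_size is LEBSZ in bytes.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bytes_applied = (blocks_end - blocks_beg) * log_block_size
    return bytes_applied / (elapsed_seconds * 1024 * 1024)

# Hypothetical snapshots: 409,600 blocks of 512 bytes applied in 20 seconds.
print(recovery_rate_mb_per_sec(204800, 614400, 512, 20))  # 10.0
```

Both snapshots must belong to the same log sequence#; if the sequence changes between them, the block counters restart and the difference is meaningless.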
Comparing the redo generation rate on the primary with the redo apply rate on the standby tells us what to expect and whether further tuning is required.
Oracle recommends the following tuning metrics:
Redo Generation Rate vs. Redo Apply Rate | Recommendation
2 * Max Primary DB Redo Generation Rate < Redo Apply Rate | Excellent - No Tuning Required
Max Primary DB Redo Generation Rate < Redo Apply Rate < 2 * Max Primary DB Redo Generation Rate | Good - Tuning is Optional
Avg. Primary DB Redo Generation Rate < Redo Apply Rate < Max Primary DB Redo Generation Rate | OK - Need Tuning
Avg. Primary DB Redo Generation Rate > Redo Apply Rate | Bad - Need Tuning
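The thresholds in this table can be expressed as a small Python sketch. The function name is hypothetical, and the treatment of exact ties (an apply rate exactly equal to a threshold falls into the lower tier) is an assumption, since the table does not specify it.

```python
def classify_apply_rate(apply_rate, avg_gen_rate, max_gen_rate):
    """Map a standby redo apply rate to the recommendation tiers above.

    All rates must use the same unit (e.g. MB/sec).
    """
    if apply_rate > 2 * max_gen_rate:
        return "Excellent - No Tuning Required"
    if apply_rate > max_gen_rate:
        return "Good - Tuning is Optional"
    if apply_rate > avg_gen_rate:
        return "OK - Need Tuning"
    return "Bad - Need Tuning"

# Hypothetical primary: average 2 MB/sec, peak 5 MB/sec redo generation.
for rate in (12, 7, 3, 1):
    print(rate, classify_apply_rate(rate, avg_gen_rate=2, max_gen_rate=5))
```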
4. Recovery Phase
The following are some of the things which can enhance the last stage of the actual recovery on the standby database.
Parallel Recovery
Applying the redo logs on the standby requires reading the blocks into the database buffer cache; the recovery slave processes then apply the redo to the database. Prior to 10.1.0.5, parallel recovery had to be requested explicitly by issuing the “RECOVER MANAGED STANDBY DATABASE PARALLEL” command. From 10.1.0.5 onward, the PARALLEL option is used by default, with a degree equal to the number of CPUs on the system. The number of parallel slaves can be increased further by specifying the degree of parallelism in the command, for example ‘RECOVER MANAGED STANDBY DATABASE PARALLEL 5;’
Parallel Execution Message Size
This parameter controls the size of the buffer used to pass messages between the slaves and the query coordinator. If a message is larger than the default 2k value, it is passed in chunks, resulting in some performance loss. For most systems, raising the PARALLEL_EXECUTION_MESSAGE_SIZE parameter from the default to 8k (8192) can improve the recovery time considerably.
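To illustrate why a larger message buffer helps, the following Python sketch counts how many chunks a message needs for a given buffer size. This is only a ceiling-division illustration; the function name and the 7,000-byte message are hypothetical, and the sketch does not model how Oracle actually frames its messages.

```python
import math

def chunks_needed(message_bytes, buffer_bytes):
    """Number of buffer-sized chunks needed to pass one message."""
    return math.ceil(message_bytes / buffer_bytes)

message_bytes = 7000  # hypothetical message size
print(chunks_needed(message_bytes, 2048))  # 4 chunks with the 2k default
print(chunks_needed(message_bytes, 8192))  # 1 chunk with an 8k buffer
```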
Cache Sizes
Setting DB_CACHE_SIZE to a larger value than on the primary also significantly improves the recovery time, as a larger number of blocks can be held in the buffer cache. Setting DB_KEEP_CACHE_SIZE and DB_RECYCLE_CACHE_SIZE to 0 can also help, since media recovery does not require these caches or a large SHARED_POOL_SIZE. The memory can be used for an increased DB_CACHE_SIZE.
However, before the roles are switched, the SGA component sizes must be reset to the values used on the primary database, so any changes to these values should be documented.
Wait Events
Wait events on the primary and the standby database can be used to verify the health of the system and identify areas that can use improvement. Some tuning changes will also help speed up the standby database for switchover or failover operations. The wait events below are specific to each step of the process.
1. Wait Events on the Primary
On the primary database there are two categories of wait events which are either related to the ARC process or the LGWR process. The descriptions of these events are below.
1. Arch process wait events
ARCH wait on ATTACH monitors the amount of time spent by all archiver processes to spawn an RFS connection.
ARCH wait on SENDREQ monitors the amount of time spent by all archiver processes to write the received redo to disk as well as open and close the remote archived redo logs.
ARCH wait on DETACH monitors the amount of time spent by all archiver processes to delete an RFS connection.
2. LGWR SYNC Wait Events
LGWR wait on ATTACH monitors the amount of time spent by all log writer processes to spawn an RFS connection.
LGWR wait on SENDREQ monitors the amount of time spent by all log writer processes to write the received redo to disk as well as open and close the remote archived redo logs.
LGWR wait on DETACH monitors the amount of time spent by all log writer processes to delete an RFS connection.
3. LGWR ASYNC Wait Events
LNS wait on ATTACH monitors the amount of time spent by all network servers to spawn an RFS connection.
LNS wait on SENDREQ monitors the amount of time spent by all network servers to write the received redo to disk as well as open and close the remote archived redo logs.
LNS wait on DETACH monitors the amount of time spent by all network servers to delete an RFS connection.
LGWR wait on full LNS buffer monitors the amount of time spent by the log writer (LGWR) process waiting for the network server (LNS) to free up ASYNC buffer space. If buffer space has not been freed in a reasonable amount of time, availability of the primary database is not compromised by allowing the archiver process (ARCn) to transmit the redo log data. This wait event is not relevant for destinations configured with the LGWR SYNC=PARALLEL attributes.
2. Wait Events on the Secondary
RFS Write is the elapsed time for the write to standby redo log or archive log to occur as well as non I/O work such as redo block checksum validation.
RFS Random I/O is the elapsed time for the write to a standby redo log to occur.
RFS Sequential I/O is the elapsed time for the write to an archive log to occur.
10Gr2 Improvements
· Multiple archive processes can transmit a redo log in parallel to the standby database, reducing the time for the redo transmission to the secondary. The MAX_CONNECTIONS attribute of LOG_ARCHIVE_DEST_n controls the number of these processes. This can be very beneficial during batch loads.
· Parallel recovery for redo apply is set equal to the number of CPUs in 10.1.0.5 and 10.2.0.1. Prior to this, “PARALLEL” needed to be specified for parallel recovery.
· Fast-Start Failover automatically fails over to a previously chosen physical standby database, without any intervention. The old primary database is automatically reconfigured as a new standby database once it reconnects to the Data Guard configuration.
· Asynchronous redo transmission using LGWR ASYNC uses a new process, LNSn, to transmit the redo data directly from the online redo log to the standby database. Previously the LGWR process was responsible for transferring this redo data to the standby destination; however, this was done at the cost of some performance. With this new separate LNSn process, the LGWR is able to continue writing redo data to the online redo logs without having to process the redo transmission.
· A physical standby database can be flashed back temporarily, for reporting or testing purposes, to a particular SCN or a time in the past. After completion of reporting or testing, the database can again be placed in recovery mode, automatically retrieving the redo logs from the primary and applying them.
· A logical standby database automatically deletes the archived logs after they have been applied by the SQL apply process.
· RMAN automatically creates temporary datafiles after recovery.
Best Practices
1. General
· Ensure the standby hardware configuration is the same as the primary database. The secondary however needs to be tuned for more write intensive operations.
· Before deploying Data Guard in production, perform a load test and obtain benchmarks on the largest volume of redo, paying particular attention to the network and the I/O performance of the storage.
· Verify that the parameter settings for the database and the OS are configured to their recommended values.
· Perform switchover testing and fully document a failover procedure. This will alleviate confusion during a stressful time.
· Use FORCE LOGGING mode to ensure that all database data changes are logged and the standby remains consistent with production.
· Use real-time apply so that redo data is applied to the standby database as soon as it is received.
· Use the Data Guard Broker to create, manage and monitor the Data Guard configuration.
· Enable Flashback Database on both primary and secondary databases. Doing this will ensure that the old primary database can be easily reinstated as a new standby database following a failover.
· Consider not using the “AFFIRM” redo transport attribute. The AFFIRM attribute ensures that the I/O on the standby completes synchronously before control is returned to the primary; however, it can cause performance issues on the primary.
· Disk I/O can be improved by configuring Oracle to use asynchronous I/O. For older OSs, this may need to be implemented by installing operating system-specific patches. The throughput of the disks can be measured using the dd command.
· The number of standby redo logs should be equal to the sum of all online log groups for each thread (instance) plus the number of threads.
· Do not multiplex the redo logs on the standby database.
· Setting the parameter PARALLEL_EXECUTION_MESSAGE_SIZE to 8192 dramatically increases the performance of the parallel recovery.
· Consider turning on DB_BLOCK_CHECKING on the primary and possibly the secondary, although there may be a minor performance impact.
· Ensure that standby redo logs are properly placed within the fast disk group or on the fastest disks.
· The standby database site should be geographically separated from the primary database site.
· Do not use Maximum Protection Mode unless you have at least two standby databases.
· If applications or transactions are able to utilize COMMIT NOWAIT, significant improvement in response time and database throughput can be seen.
2. Network
· Ensure that there is appropriate bandwidth between the primary and the secondary. OS commands like ifconfig and netstat can be used to obtain statistics on network performance. High values for “errors” and “dropped” packets indicate a problem in the network.
· Increase the default send and receive queue sizes associated with the networking devices. On Linux, txqueuelen (the transmit queue length) should be increased on the sending side and netdev_max_backlog on the receiving side. Incorrect sizing of the device queues can cause data loss due to buffer overflow, which then triggers retransmission of data. This repeated retransmission can cause network saturation, resource consumption and response delays.
· The Session Data Unit (SDU) is the buffer used by Oracle Net to encapsulate data before sending it across the network. Optimizing its size can improve performance and reduce network utilization. Adjusting the value to 32767 can show a performance improvement during large data transmissions. The value must be set on both the server and the client side of the connection for the modified value to be used.
Conclusion
The performance of Oracle 10g Data Guard has been enhanced greatly, especially in Oracle's newest releases. The changes described span the different stages, from the redo transmission process to recovery on the standby. As a result, failovers can complete in as little as 15 seconds and switchovers in as little as 1 minute. However, efficient use of Data Guard requires attention to all areas of the system, including the OS, the network and the database parameters, as well as monitoring of wait events.
Speaker’s Background
Brent Bigonger has been involved with Oracle for many years and has been a production Oracle DBA for the past three years. He graduated from San Diego State University in 2001 with a B.S. in computer science. His most recent experience includes working with 10g's high availability features like RAC and Data Guard. He has been involved in both the creation and management of HA systems. His background also includes many years as a system administrator. He has worked in environments from start-ups to Fortune 500 companies.
Natik Ameen has been managing Oracle databases for over 7 years and has worked in the Gaming, Aviation, Finance and Web hosting industries. He specifically has considerable experience in managing 10g RAC and standby databases. He holds the Oracle Certified Professional Certification in 8/8i/9i/10g and has taught the Oracle Certification programs at Washington State University, VA. Currently he works for ServerCare Inc. as a Senior Oracle DBA, which offers a wide range of remote database and system administration services.
References
· Oracle Database 10g Best Practices: Data Guard Redo Apply and Media Recovery. September 2005
· Data Guard Redo Transport & Network Best Practices Oracle Database 10g Release 2. February 2007
· Oracle Data Guard in Oracle Database 10g Release 2: Business Continuity for the Enterprise. November 2006
· Oracle Data Guard Concepts and Administration. Oracle.com. March 2006 (accessed March 2007)
Bibliography
· To, Lawrence, High Availability Systems Team, Vinay Srihari, and Recovery Team. Oracle.Com. Sept. 2005. Mar. 2007.
· Smith, Michael T. Oracle.Com. Feb. 2007. Mar. 2007.
· Smith, Michael. Oracle.Com. May 2004. Mar. 2007.
· Ray, Ashish. Oracle.Com. Nov. 2006. Mar. 2007.
From the ITPUB blog, link: http://blog.itpub.net/280958/viewspace-723441/. Please credit the source when reposting.