The interconnect in an Oracle RAC environment is the backbone of your cluster. A high-performing, reliable interconnect is a crucial ingredient in making Cache Fusion perform well. Remember that the assumption in most cases is that a read from another node’s memory via the interconnect is much faster than a read from disk, except perhaps for Solid State Disks (SSDs). The interconnect is used to transfer data and messages among the instances. An Oracle RAC cluster requires a high-bandwidth, low-latency interconnect wherever possible. If the performance of the interconnect is subpar, your Oracle RAC cluster performance will most likely be subpar as well. When an Oracle RAC environment performs poorly, the interconnect configuration, both hardware and software, should be one of the first areas you investigate.
You want the fastest possible network to be used for the interconnect. To maximize speed and efficiency on the interconnect, make sure that the User Datagram Protocol (UDP) buffers are set to the correct values. On Linux, you can check them with the following command:
sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
net.core.rmem_default = 262144
net.core.wmem_default = 262144
Alternatively, you can read the same values directly from the corresponding files in the /proc/sys/net/core directory (for example, /proc/sys/net/core/rmem_max). These values can be increased with the following sysctl commands, run as root:
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
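Settings made with sysctl -w take effect immediately but are lost at the next reboot. The following is a minimal sketch for making them persistent. It assumes a Linux host whose boot process loads /etc/sysctl.conf and a root shell. The values are simply the ones shown above.

```shell
#!/bin/sh
# Persist the UDP buffer settings shown above (run once, as root).
# Assumption: this distribution loads /etc/sysctl.conf at boot.
# Running the script again appends duplicate lines; the last one applied wins.
cat >> /etc/sysctl.conf <<'EOF'
# UDP socket buffer sizes for the Oracle RAC interconnect
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
EOF

# Apply everything in /etc/sysctl.conf now, without waiting for a reboot
sysctl -p
```

This is a configuration step rather than a program. Re-running the sysctl query above afterwards confirms that the values took effect.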
The numbers in this example are the recommended values for Oracle RAC on Linux and are more than sufficient for the majority of configurations. Nevertheless, some background on the UDP buffers is useful. The values of rmem_max and wmem_max apply on a per-socket basis. Suppose you set rmem_max to 4MB and have 400 processes running, each with a socket open for communication over the interconnect. Each of these 400 processes could then potentially use 4MB, so total memory usage could reach 1.6GB just for this UDP buffer space. However, this is only potential usage. If rmem_default is set to 1MB and rmem_max is set to 4MB, you know for sure that at least 400MB will be allocated (1MB per socket). Anything beyond that is allocated only as needed, up to the maximum value. The total UDP buffer memory therefore depends on four things: rmem_default, rmem_max, the number of open sockets, and the variable amount of buffer space each process actually uses (somewhere between rmem_default and rmem_max). In other words, the total is roughly the number of open sockets multiplied by the buffer space each socket actually uses. That per-socket amount is not known in advance. It can depend on network latency, on how well the network is performing, and on the overall network load. To count the Oracle-related open UDP sockets, you can run the following command. Run it as root so that the -p option can report the owning program of every socket:
netstat -anp --udp | grep ora | wc -l
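The per-socket arithmetic above can be sketched as a small shell function. It gives a rough bound only. It relies on the text's own simplification that every socket holds at least rmem_default bytes and can grow to rmem_max bytes. The send side (wmem_*) is ignored, and the function name udp_buffer_estimate is hypothetical.

```shell
#!/bin/sh
# udp_buffer_estimate SOCKETS RMEM_DEFAULT RMEM_MAX
# Rough lower/upper bound on UDP receive-buffer memory, following the
# per-socket reasoning in the text: every socket is assumed to hold at least
# RMEM_DEFAULT bytes and may grow to RMEM_MAX bytes.
# Prints "<lower_MiB> <upper_MiB>" (integer division, rounded down).
udp_buffer_estimate() {
    sockets=$1
    rmem_default=$2
    rmem_max=$3
    mib=$((1024 * 1024))
    lower=$((sockets * rmem_default / mib))
    upper=$((sockets * rmem_max / mib))
    echo "$lower $upper"
}

# Example from the text: 400 sockets, rmem_default = 1MB, rmem_max = 4MB
udp_buffer_estimate 400 1048576 4194304   # prints: 400 1600
```

On a live node, the socket count could come from the netstat pipeline above, and the two limits from sysctl -n net.core.rmem_default and sysctl -n net.core.rmem_max. With the recommended rmem_default of 262144 bytes, 400 sockets would give a floor of 100MiB instead of 400MiB.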
NOTE
Our assumption here is that UDP is being used for the interconnect. Although that is true in the vast majority of cases, there are some exceptions. For example, on Windows, TCP is used for Cache Fusion traffic. When InfiniBand is in use (more details on InfiniBand are provided later in the section “Interconnect Hardware”), the Reliable Datagram Sockets (RDS) protocol may be used to enhance the speed of Cache Fusion traffic. However, other proprietary interconnect protocols are strongly discouraged. Starting with Oracle Database 11g, your primary choices are therefore UDP, TCP (Windows), or RDS (with InfiniBand).
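Annotation: If the operating system running RAC is Windows, TCP is used for the interconnect.
InfiniBand is high-speed networking hardware used to increase network transfer speed. If it is used on the RAC private network, the RDS protocol can be used for the interconnect.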
Another option for increasing the performance of your interconnect is the use of jumbo frames. With Ethernet, the transfer unit used between all participants is a frame carrying a variable payload of 46–1500 bytes. The upper bound of 1500 bytes is the standard Ethernet MTU (Maximum Transmission Unit), so a larger packet must be split across several frames. Jumbo frames allow an Ethernet frame to exceed the 1500-byte MTU, up to a maximum of 9000 bytes on most platforms (platforms vary). In Oracle RAC, DB_BLOCK_SIZE multiplied by DB_FILE_MULTIBLOCK_READ_COUNT determines the maximum size of a global cache message. PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, so with the default (or a lower) MTU they are split into more fragments. Increasing the frame size by enabling jumbo frames can improve interconnect performance by reducing fragmentation when shipping large amounts of data across the wire. A note of caution is in order, however: not all hardware supports jumbo frames. Because server and network hardware requirements differ, jumbo frames must be thoroughly tested before they are implemented in a production environment.
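The sketch below makes the fragmentation argument concrete. It estimates how many Ethernet frames a single UDP message needs at the standard 1500-byte MTU versus a 9000-byte jumbo MTU. It assumes plain IPv4 with a 20-byte header and no options, and an 8-byte UDP header. It also follows the IP rule that every fragment except the last carries a multiple of 8 data bytes. The function name frames_needed is hypothetical.

```shell
#!/bin/sh
# frames_needed SIZE MTU
# Estimate how many Ethernet frames (IPv4 fragments) one UDP message of SIZE
# bytes occupies at the given MTU.
# Assumptions: 20-byte IPv4 header without options, 8-byte UDP header,
# and every fragment except the last carries a multiple of 8 data bytes.
frames_needed() {
    size=$1
    mtu=$2
    ip_payload=$(( size + 8 ))                # UDP header + message data
    per_frame=$(( (mtu - 20) / 8 * 8 ))       # usable IP data per fragment
    echo $(( (ip_payload + per_frame - 1) / per_frame ))   # ceiling division
}

# Compare the standard MTU with a 9000-byte jumbo MTU for a few message sizes
for msg in 8192 16384 32768; do
    echo "$msg bytes: $(frames_needed "$msg" 1500) frame(s) at MTU 1500, $(frames_needed "$msg" 9000) frame(s) at MTU 9000"
done
```

For an 8K block message (DB_BLOCK_SIZE = 8192), the estimate is 6 frames at MTU 1500 and a single frame at MTU 9000. A common Linux check when testing jumbo frames end to end is ping -M do -s 8972 followed by the remote node's private IP address. This forbids fragmentation and sends 8972 bytes of ICMP payload, which is 9000 minus 20 bytes of IP header and 8 bytes of ICMP header. If any device on the path does not accept 9000-byte frames, the ping fails.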
In addition to these tuning options, you can deploy faster hardware such as InfiniBand or 10 Gigabit Ethernet (10 GigE). InfiniBand is available and supported with two options. The Reliable Datagram Sockets (RDS) protocol is the preferred option, because it offers up to a 30-fold bandwidth advantage and a 30-fold latency reduction compared with Gigabit Ethernet. IP over InfiniBand (IPoIB) is the other option. It does not perform as well as RDS because it uses standard UDP or TCP, but it still provides much better bandwidth and much lower latency than Gigabit Ethernet.
Another option to increase the throughput of your interconnect is the implementation of 10 GigE technology, which represents the next level of Ethernet. Although it is becoming increasingly common, note that 10 GigE does require specific certification on a platform-by-platform basis, and as of the writing of this book, it was not yet certified on all platforms. Check with Oracle Support to resolve any certification questions that you may have on your platform.
This article is a complimentary excerpt from Oracle Database 11g Release 2 High Availability, published by McGraw-Hill. Visit mcgrawhill.com to purchase the book.
All information in this article is copyrighted by McGraw-Hill and is reprinted here by express permission of the publisher.
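Notes:
1. Why is this article relevant to parallel execution performance?
This article explains how to tune interconnect performance in RAC. Parallel operations that span nodes are affected by interconnect performance, as described below: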
For this cross-node or inter-node parallel execution to perform, the interconnection in the Oracle RAC environment must be sized appropriately because inter-node parallel execution may result in a lot of interconnect traffic.
If the interconnection has a considerably lower bandwidth in comparison to the I/O bandwidth from the server to the storage subsystem, it may be better to restrict the parallel execution to a single node or to a limited number of nodes.
Reference: Parallel Execution Using Oracle RAC, http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm
2. McGraw-Hill is a publisher.