相较于rhel6/centos6 下的HA配置,rhel7下使用pcs 取代了原先的ccs和luci ,本文就结合pcs+pacemaker+corosyn+oracle做一个oracle的HA配置,并迁移数据共享平台数据库。使用corosync作为集群消息事务层(Massage Layer),pacemaker作为集群资源管理器(Cluster Resource Management),pcs作为CRM的管理接口工具
编辑host文件
pcs01 : 10.228.110.126 pcs02 :10.228.110.128 VIP : 192.168.110.133 |
配置节点间互信,可使用如下脚本及命令配置
./sshUserSetup.sh -user root -hosts "hacluster01 hacluster02" -advanced –noPromptPassphrase |
./sshUserSetup.sh -user root -hosts "hacluster01 hacluster02" -advanced –noPromptPassphrase |
systemctl start firewalld systemctl stop firewalld systemctl disable firewalld 查看防火墙状态 systemctl status firewalld.service 关闭防火墙 systemctl stop firewalld.service 查看开机是否启动防火墙 systemctl is-enabled firewalld.service 开机不启动 systemctl disable firewalld.service
firewall-cmd --permanent --add-service=high-availability # firewall-cmd --permanent --add-service=high-availability # firewall-cmd --add-service=high-availability #firewall-cmd --reload |
vi /etc/selinux/config selinux=disabled setenforce 0 |
sysemctl stop NetworkManager sysemctl disable NetworkManager sysemctl status NetworkManager |
mount /home/patrol/rhel-server-7.4-x86_64-dvd.iso /mnt 或者 直接在IBMC管理口上加载ISO镜像 然后 mount /dev/sr0 /mnt |
[root@pcs02 yum.repos.d]# cat ha.repo [rhel-yum] name=ha2 ---这个名字要一样, baseurl=file:///mnt/addons/HighAvailabilit-----ISO里双机软件包的位置 enabled=1 gpgcheck=0 [ha] name=ha2 baseurl=file:///mnt enabled=1 gpgcheck=0 |
1 使用yum源可以直接安装corosync pacemaker以及pcs软件: yum install pcs pacemaker fence-agents-all yum install lvm2-cluster gfs2-utils ---我是直接把gfs2-utils这个rpm拷贝到系统里 rpm -ivh gfs2-utils.rpm这样安装的这个包, lvm2-cluster这个包没有找到。。。没安装 。
2 在安装linux系统时,选定其实在安装操作系统的时候在软件源那儿,选中“HIGH AVAILABILITY ADD-ON”这个选项,安装系统的时候就安装上双机软件了,这些就不用操作了。Hacluster用户也自动有了。 yum install pcs pacemaker fence-agents-all |
两台机器分别更改hacluster用户密码
passwd hacluster
两台机器密码改成一样的
开启pcsd服务,两台都要开启
systemctl start pcsd systemctl enable pcsd
systemctl start pcsd.service systemctl enable pcsd.service |
pcs cluster auth node1 node2 ... nodeN --语法 [root@z1 ~]# pcs host auth z1.example.com z2.example.com [root@pcs01 patrol]# pcs cluster auth pcs01 pcs02 Username: hacluster Password: pcs01: Authorized pcs02: Authorized [root@pcs01 patrol]#
[root@pcs02 Packages]# pcs cluster auth pcs01 pcs02 pcs01: Already authorized pcs02: Already authorized [root@pcs02 Packages]# |
Create a cluster named mycluster with both nodes as cluster members. This command creates and starts the cluster in one step. You only need to run this from one node in the cluster because pcs configuration commands take affect for the entire cluster. On one node in cluster, run the following command. [root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com # pcs cluster setup mycluster --start pcs01 pcs02 |
https://nodename:2224,或者 nodename换成ip
添加节点、cluster
添加两个节点,然后把两个节点做成一个 cluster
或者pcs cluster setup -name mycluster pcs01 pcs02
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s |
Resource id 资源名称 随意
IP 浮动IP地址
Nic 浮动ip 绑定到哪块网卡上,默认,或者指定
Volgrpname VG名字
添加Filesystem资源配置
Device 文件系统对应的设备 如:/dev/ftpvg/lv_backup
Directory mount点 如:/reportdata.backup
Fstype 文件系统类型 如: ext4
也可以同时配置资源之间的启停依赖顺序关系。
其中oracle资源主要包括tns和rdbms instance两种。
按照如下步骤进行监听资源添加以及测试,图形添加:
type 部分选择oralsnr
resource group 选择分配的组。
resource_id 为资源的名称
sid为tnsnames.ora里面的别名
Home为数据库ORACLE_HOME
User为数据库软件安装用户
tns_admin为$ORACLE_HOME/network/admin
配置完成操作的案例:
数据库资源
按照如下步骤进行数据库资源添加以及测试,图形添加:
Resource_id为资源名称.
SID为数据库实例名称.
Home值为数据库ORACLE_HOME值
User为数据库软件安装用户
以下为实际操作部署的资源信息:
节点的IMM管理口 IP 、用户名、密码、
A Red Hat High Availability cluster requires that you configure fencing for the cluster. The reasons for this requirement are described in Fencing in a Red Hat High Availability Cluster. For this introduction, however, which is intended to show only how to use the basic Pacemaker commands, disable fencing by setting the stonith-enabled cluster option to false. WARNING The use of stonith-enabled=false is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely powered off. # pcs property set stonith-enabled=false |
Port 节点名
pcs status pcs status –full pcs status resources –full pcs status resources arch pcs status resources dsdb pcs resource show pcs resource show vip |
pcs constraint location ora prefers pcs01=150 pcs constraint location ora prefers pcs02=50 pcs constraint location arch prefers pcs01=150 pcs constraint location arch prefers pcs02=50 pcs constraint location data prefers pcs01=150 pcs constraint location data prefers pcs02=50 pcs constraint location VIP prefers pcs01=150 pcs constraint location VIP prefers pcs02=50 pcs constraint location LVM prefers pcs01=150 pcs constraint location LVM prefers pcs02=50 pcs constraint location lsnrctl prefers pcs01=150 pcs constraint location lsnrctl prefers pcs02=50 pcs constraint location database prefers pcs01=150 pcs constraint location database prefers pcs02=50 |
禁用与启用资源
pcs resource enable resource_id pcs resource disable resource_id --关闭资源,并不再启动资源 |
资源组切换
pcs resource clear dsdb pcs resource move dsdb pcs resource move dsdb pcs01 pcs resource move dsdb pcs02 |
其他命令
1、pcs resource move 资源名/资源组名 --把资源/资源组 移动到另一个节点 2、执行 pcs resource clear pcs resource cleanup WebSite资源名/资源组名 --清楚此资源在此节点上的 location限制,清楚限制之后,此资源/组,才可以在一次切换到这个节点,否则这个节点不能接收资源/组 了。
pcs status 查看cluster 整体状态 pcs status nodes ftp2 显示ftp2节点状态 pcs status groups ftpgroup 显示资源组状态 pcs config 显示所有配置信息 pcs resource show --full 显示所有资源信息
pcs resource show VIP 查看资源信息 fence_ipmilan -a 10.228.230.98 -l USERID -p PASSW0RD -o status 测试 fence 设备
pcs cluster stop [--all] [node] [...] 停止cluster pcs cluster start 启动cluster
使 z1.example.com 進入 standby 模式中 # pcs cluster standby z1.example.com
z1 從 standby 模式中退出 # pcs cluster unstandby z1.example.com |
模块信息如下:
pcs cluster status
如何保障pacemaker corssync以及pcsd均属于enable状态?
systemctl enable corosync pacemaker pcsd
模拟监听故障,手工停止监听程序,则cluster软件(资源参数intervals)过一段时间,会自动启动监听程序。
[oracle@pcs01 trace]$ lsnrctl stop listener_dsdb <--模拟监听故障 LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 09-NOV-2018 17:53:44 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521))) The command completed successfully [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 4393 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 4468 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 4513 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 4516 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 4777 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns ß 监听程序已经启动 root 183 2 0 11月06 ? 00:00:00 [netns] oracle 5110 1 0 17:53 ? 00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit oracle 5233 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ ps -ef|grep tns root 183 2 0 11月06 ? 00:00:00 [netns] oracle 5110 1 1 17:53 ? 00:00:00 /sjgxpt/oracle/product/11.2/db_1/bin/tnslsnr listener_dsdb -inherit oracle 5376 46314 0 17:53 pts/0 00:00:00 grep --color=auto tns [oracle@pcs01 trace]$ |
ABORT方式模拟数据库宕库故障
[oracle@pcs01 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Mon Nov 12 11:43:05 2018 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> shutdown abort ORACLE instance shut down. SQL> exit |
数据库alert日志如下:
Mon Nov 12 11:44:05 2018 Shutting down instance (abort) License high water mark = 33 USER (ospid: 8659): terminating the instance Instance terminated by USER, pid = 8659 Mon Nov 12 11:44:07 2018 Instance shutdown complete Mon Nov 12 11:46:03 2018 Adjusting the default value of parameter parallel_max_servers from 1280 to 970 due to the value of parameter processes (1000) Starting ORACLE instance (normal) ************************ Large Pages Information ******************* Per process system memlock (soft) limit = 13 GB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB) Large Pages unused system wide = 0 (0 KB) Large Pages configured system wide = 0 (0 KB) Large Page size = 2048 KB
RECOMMENDATION: Total System Global Area size is 20 GB. For optimal performance, prior to the next instance restart: 1. Increase the number of unused large pages by at least 10241 (page size 2048 KB, total size 20 GB) system wide to get 100% of the System Global Area allocated with large pages 2. Large pages are automatically locked into physical memory. Increase the per process memlock (soft) limit to at least 20 GB to lock 100% System Global Area's large pages into physical memory ******************************************************************** LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Initial number of CPU is 32 Number of processor cores in the system is 16 Number of processor sockets in the system is 4 Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. IMODE=BR ILAT =168 LICENSE_MAX_USERS = 0 SYS auditing is disabled NUMA system with 8 nodes detected Oracle NUMA support not enabled The parameter _enable_NUMA_support should be set to TRUE to enable Oracle NUMA support Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options. ORACLE_HOME = /sjgxpt/oracle/product/11.2/db_1 System name: Linux Node name: pcs01 Release: 3.10.0-693.el7.x86_64 Version: #1 SMP Thu Jul 6 19:56:57 EDT 2017 Machine: x86_64 Using parameter settings in server-side spfile /sjgxpt/oracle/product/11.2/db_1/dbs/spfiledsdb.ora System parameters with non-default values: processes = 1000 sessions = 1536 sga_target = 20G control_files = "/sjgxpt/oracle/data/dsdb/control01.ctl" control_files = "/sjgxpt/oracle/data/dsdb/control02.ctl" db_block_size = 8192 compatible = "11.2.0.4.0" log_archive_dest_1 = "LOCATION=/sjgxpt/oracle/arch" log_archive_format = "%t_%s_%r.dbf" undo_tablespace = "UNDOTBS1" remote_login_passwordfile= "EXCLUSIVE" db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=dsdbXDB)" local_listener = "(DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST =10.228.110.133)(PORT = 1521)) (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521)) ) )" java_jit_enabled = TRUE audit_file_dest = "/sjgxpt/oracle/product/admin/dsdb/adump" audit_trail = "DB" db_name = "dsdb" open_cursors = 300 pga_aggregate_target = 12875M diagnostic_dest = "/sjgxpt/oracle/product" Mon Nov 12 11:46:03 2018 PMON started with pid=2, OS id=16298 Mon Nov 12 11:46:04 2018 PSP0 started with pid=3, OS id=16300 Mon Nov 12 11:46:05 2018 VKTM started with pid=4, OS id=16309 at elevated priority VKTM running at (1)millisec precision with DBRM quantum (100)ms Mon Nov 12 11:46:05 2018 GEN0 started with pid=5, OS id=16313 Mon Nov 12 11:46:05 2018 DIAG started with pid=6, OS id=16315 Mon Nov 12 11:46:05 2018 DBRM started with pid=7, OS id=16317 Mon Nov 12 11:46:05 2018 DIA0 started with pid=8, OS id=16319 Mon Nov 12 11:46:05 2018 MMAN started with pid=9, OS id=16321 Mon Nov 12 11:46:05 2018 DBW0 started with pid=10, OS id=16323 Mon Nov 12 11:46:05 2018 DBW1 started with pid=11, OS id=16325 Mon Nov 12 11:46:05 2018 DBW2 started with pid=12, OS id=16327 Mon Nov 12 11:46:05 2018 DBW3 started with pid=13, OS id=16329 Mon Nov 12 11:46:05 2018 LGWR started with pid=14, OS id=16331 Mon Nov 12 11:46:05 2018 CKPT started with pid=15, OS id=16333 Mon Nov 12 11:46:05 2018 SMON started with pid=16, OS id=16335 Mon Nov 12 11:46:05 2018 RECO started with pid=17, OS id=16337 Mon Nov 12 11:46:05 2018 MMON started with pid=18, OS id=16339 Mon Nov 12 11:46:05 2018 MMNL started with pid=19, OS id=16341 starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'... starting up 1 shared server(s) ... ORACLE_BASE not set in environment. It is recommended that ORACLE_BASE be set in the environment Reusing ORACLE_BASE from an earlier startup = /sjgxpt/oracle/product Mon Nov 12 11:46:05 2018 ALTER DATABASE MOUNT Successful mount of redo thread 1, with mount id 605261437 Database mounted in Exclusive Mode Lost write protection disabled Completed: ALTER DATABASE MOUNT Mon Nov 12 11:46:09 2018 alter database open Beginning crash recovery of 1 threads parallel recovery started with 31 processes Started redo scan Completed redo scan read 47 KB redo, 30 data blocks need recovery Started redo application at Thread 1: logseq 250, block 12222 Recovery of Online Redo Log: Thread 1 Group 1 Seq 250 Reading mem 0 Mem# 0: /sjgxpt/oracle/data/dsdb/redo01.log Completed redo application of 0.02MB Completed crash recovery at Thread 1: logseq 250, block 12317, scn 4277695 30 data blocks read, 30 data blocks written, 47 redo k-bytes read LGWR: STARTING ARCH PROCESSES Mon Nov 12 11:46:10 2018 ARC0 started with pid=54, OS id=16641 ARC0: Archival started LGWR: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thread 1 advanced to log sequence 251 (thread open) Mon Nov 12 11:46:10 2018 ARC1 started with pid=55, OS id=16667 Mon Nov 12 11:46:10 2018 ARC2 started with pid=56, OS id=16683 Mon Nov 12 11:46:10 2018 ARC3 started with pid=57, OS id=16697 ARC1: Archival started ARC2: Archival started ARC1: Becoming the 'no FAL' ARCH ARC1: Becoming the 'no SRL' ARCH ARC2: Becoming the heartbeat ARCH Thread 1 opened at log sequence 251 Current log# 2 seq# 251 mem# 0: /sjgxpt/oracle/data/dsdb/redo02.log Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Archived Log entry 244 added for thread 1 sequence 250 ID 0x23e8dee0 dest 1: [16498] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:501291664 end:501291704 diff:40 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is AL32UTF8 No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Mon Nov 12 11:46:10 2018 QMNC started with pid=58, OS id=16794 Completed: alter database open Starting background process CJQ0 Mon Nov 12 11:46:11 2018 CJQ0 started with pid=59, OS id=16892 |
主机重启方式进行模拟:
[root@pcs01 ~]# reboot Connection to 10.228.110.126 closed by remote host. Connection to 10.228.110.126 closed. |
如果需要对oracle数据库进行维护,包括停止数据库或者监听等需要采用维护模式,避免资源重启以及切换:
进入维护模式:pcs resource unmanage resource1
解除维护模式 pcs resource manage resource1
资源状态察看pcs状态
[root@pcs01 ~]# pcs status Cluster name: cluster1 Stack: corosync Current DC: pcs02 (version 1.1.16-12.el7-94ff4df) - partition with quorum Last updated: Thu May 30 10:07:00 2019 Last change: Wed Apr 3 15:40:08 2019 by hacluster via crmd on pcs01
2 nodes configured 9 resources configured Online: [ pcs01 pcs02 ]
Full list of resources:
ibmca (stonith:fence_ipmilan): Stopped ibmcb (stonith:fence_ipmilan): Started pcs01 Resource Group: dsdb VIP (ocf::heartbeat:IPaddr): Started pcs01 LVM (ocf::heartbeat:LVM): Started pcs01 orahome (ocf::heartbeat:Filesystem): Started pcs01 arch (ocf::heartbeat:Filesystem): Started pcs01 data (ocf::heartbeat:Filesystem): Started pcs01 database (ocf::heartbeat:oracle): Started pcs01 lsnrctl (ocf::heartbeat:oralsnr): Started pcs01
Failed Actions: * ibmca_start_0 on pcs02 'unknown error' (1): call=239, status=Timed Out, exitreason='none', last-rc-change='Wed Apr 3 14:42:27 2019', queued=0ms, exec=20177ms * ibmcb_monitor_60000 on pcs01 'unknown error' (1): call=182, status=Timed Out, exitreason='none', last-rc-change='Wed May 29 19:30:59 2019', queued=0ms, exec=20136ms
Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled [root@pcs01 ~]# |
将节点置为维护模式:
[oracle@pcs01 20190415psu_11.2.0.4]$ lsnrctl stop listener_dsdb
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 30-MAY-2019 10:10:11
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.228.110.133)(PORT=1521))) The command completed successfully [oracle@pcs01 20190415psu_11.2.0.4]$ [oracle@pcs01 29141056]$ opatch apply Oracle 涓存椂琛ヤ竵绋嬪簭瀹夎绋嬪簭鐗堟湰 11.2.0.3.21 鐗堟潈鎵€鏈(c) 2019, Oracle Corporation銆備繚鐣欐墍鏈夋潈鍒┿€
Oracle 涓荤洰褰 锛sjgxpt/oracle/product/11.2/db_1 涓讳骇鍝佹竻鍗曪細/sjgxpt/oracle/oraInventory 鏉ヨ嚜 锛sjgxpt/oracle/product/11.2/db_1/oraInst.loc OPatch 鐗堟湰 锛1.2.0.3.21 OUI 鐗堟湰 锛1.2.0.4.0 鏃ュ織鏂囦欢浣嶇疆锛sjgxpt/oracle/product/11.2/db_1/cfgtoollogs/opatch/opatch2019-05-30_10-13-39涓婂崍_1.log
Verifying environment and performing prerequisite checks... OPatch continues with these patches: 28204707 28729262 29141056
鏄惁缁х画? [y|n] y User Responded with: Y All checks passed. 鎻愪緵鐢靛瓙閭欢鍦板潃浠ョ敤浜庢帴鏀舵湁鍏冲畨鍏ㄩ棶棰樼殑閫氱煡, 瀹夎 Oracle Configuration Manager 骞跺惎鍔ㄥ畠銆傚鏋滄偍浣跨敤 My Oracle Support 鐢靛瓙閭欢鍦板潃/鐢ㄦ埛鍚 鎿嶄綔灏嗘洿绠€鍗曘€ 鏈夊叧璇︾粏淇℃伅, 璇疯闂http://www.oracle.com/support/policies.html銆 鐢靛瓙閭欢鍦板潃/鐢ㄦ埛鍚
灏氭湭鎻愪緵鐢靛瓙閭欢鍦板潃浠ユ帴鏀舵湁鍏冲畨鍏ㄩ棶棰樼殑閫氱煡銆 鏄惁涓嶅笇鏈涙敹鍒版湁鍏冲畨鍏ㄩ棶棰(鏄[Y], 鍚[N]) [N] 鐨勯€氱煡: y
璇峰叧闂湰鍦扮郴缁熶笂鍦ㄦ ORACLE_HOME 涔嬪杩愯鐨Oracle 瀹炰緥銆 (Oracle 涓荤洰褰= '/sjgxpt/oracle/product/11.2/db_1')
鏈湴绯荤粺鏄惁宸插噯澶囨墦琛ヤ竵? [y|n] y User Responded with: Y Backing up files... Applying sub-patch '28204707' to OH '/sjgxpt/oracle/product/11.2/db_1' Applying changes to emctl script on the home: /sjgxpt/oracle/product/11.2/db_1 ...
姝e湪涓虹粍浠oracle.rdbms, 11.2.0.4.0 鎵撹ˉ涓..
姝e湪涓虹粍浠oracle.oracore.rsf, 11.2.0.4.0 鎵撹ˉ涓..
姝e湪涓虹粍浠oracle.rdbms.rsf, 11.2.0.4.0 鎵撹ˉ涓..
姝e湪涓虹粍浠oracle.ldap.rsf, 11.2.0.4.0 鎵撹ˉ涓..
Jvm Opatch apply Startup upgrade |
取消维护模式.
pcs resource clear dsdb
pcs resource move dsdb pcs01
pcs resource move dsdb pcs02