In-memory Computing with SAP HANA reading notes - Chapter 7: Business continuity and resiliency for SAP HANA

These are my reading notes on Chapter 7, Business continuity and resiliency for SAP HANA, of In-memory Computing with SAP HANA on Lenovo X6 Systems.


Overview of business continuity options


Developing a business continuity plan highly depends on the type of business a company is doing, and it differs (among other factors) by country, regulatory requirements, and employee size.

* Recovery Time Objective (RTO) defines the maximum tolerated time to get a system online again.
* Recovery Point Objective (RPO) defines the maximum tolerated time span to which data must be restored. It also defines the amount of time for which data is tolerated to be lost. An RPO of zero means that the system must be designed to not lose data in any of the considered events.
* Recovery Consistency Objective (RCO) defines the level of consistency of business processes and data that is spread out over multitier environments.
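
To make RTO and RPO concrete, here is a minimal Python sketch that checks one outage against the two objectives. The names `Objectives`, `Outage`, and `violations` are hypothetical, and the numbers are illustrative only; none of them come from the book.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum tolerated time to get the system online again
    rpo: timedelta  # maximum tolerated span of lost data (zero = no loss)


@dataclass(frozen=True)
class Outage:
    downtime: timedelta   # time from the failure until the system is usable
    data_loss: timedelta  # span of the most recent data that could not be restored


def violations(objectives: Objectives, outage: Outage) -> list[str]:
    """Return the names of the objectives that the outage failed to meet."""
    missed = []
    if outage.downtime > objectives.rto:
        missed.append("RTO")
    if outage.data_loss > objectives.rpo:
        missed.append("RPO")
    return missed


# Illustrative values: one hour RTO, and an RPO of zero (no data loss tolerated).
target = Objectives(rto=timedelta(hours=1), rpo=timedelta(0))
outage = Outage(downtime=timedelta(minutes=20), data_loss=timedelta(seconds=5))
print(violations(target, outage))
```

With an RPO of zero, even five seconds of lost data counts as a violation, which is why such a target usually calls for synchronous replication.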


HA covers a hardware failure (for example, one node becomes unavailable because of a faulty processor, memory DIMM, storage, or network failure).
HA is implemented by introducing standby nodes. During normal operation, these nodes do not actively participate in processing data, but they do receive data that is replicated from the worker nodes. If a worker node fails, the standby node takes over and continues data processing.

DR covers the event when multiple nodes in a scale-out configuration fail, or a whole data center goes down because of a fire, flood, or other disaster, and a secondary site must take over the SAP HANA system.

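Replication for DR can happen at two layers:

1. Infrastructure layer: low-level data replication, for example storage replication based on General Parallel File System (GPFS)
2. Application layer: both sites execute the same instructions. This can be implemented with SAP HANA System Replication (SSR). SSR does not support automatic failover.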

GPFS based storage replication


SAP HANA System Replication



Every SAP HANA process that is running on the primary system’s worker nodes must have a corresponding process on a secondary worker node to which it replicates its activity. The only difference between the primary and secondary system is the fact that one cannot connect to the secondary HANA installation and run queries on that database. They can also be called active and passive systems.

Upon start of the secondary HANA system, each process establishes a connection to its primary counterpart and requests the data that is in main memory, which is called a snapshot. After the snapshot is transferred, the primary system continuously sends the log information to the secondary system that is running in recovery mode. At the time of this writing, SSR does not support replaying the logs immediately as they are received; therefore, the secondary site system acknowledges and persists the logs only. To avoid having to replay hours or days of transaction logs upon a failure, SSR asynchronously transmits a new incremental data snapshot periodically.
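
The snapshot-plus-log-shipping flow described above can be pictured with a toy Python model. This is only a sketch under simplifying assumptions: the `Primary` and `Secondary` classes are hypothetical, the data is a plain dictionary, and the periodic snapshot ships a full copy instead of a true incremental delta. It is not how SAP HANA implements SSR internally.

```python
class Secondary:
    """Toy secondary process: holds a snapshot plus logs that were not replayed."""

    def __init__(self) -> None:
        self.snapshot: dict[str, int] = {}
        self.pending_logs: list[tuple[str, int]] = []

    def receive_snapshot(self, data: dict[str, int]) -> None:
        # A new snapshot already contains every earlier change,
        # so the logs persisted so far no longer need to be replayed.
        self.snapshot = dict(data)
        self.pending_logs.clear()

    def receive_log(self, key: str, value: int) -> None:
        # Logs are acknowledged and persisted only, not replayed on arrival.
        self.pending_logs.append((key, value))

    def take_over(self) -> dict[str, int]:
        # On takeover, replay the logs persisted since the last snapshot.
        state = dict(self.snapshot)
        for key, value in self.pending_logs:
            state[key] = value
        return state


class Primary:
    """Toy primary process: ships every write as a log entry."""

    def __init__(self, secondary: Secondary, snapshot_every: int) -> None:
        self.data: dict[str, int] = {}
        self.secondary = secondary
        self.snapshot_every = snapshot_every
        self.writes = 0
        secondary.receive_snapshot(self.data)  # initial snapshot

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.secondary.receive_log(key, value)
        self.writes += 1
        if self.writes % self.snapshot_every == 0:
            # Periodic snapshot keeps the replay backlog short.
            self.secondary.receive_snapshot(self.data)


secondary = Secondary()
primary = Primary(secondary, snapshot_every=3)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]:
    primary.write(key, value)
print(len(secondary.pending_logs), secondary.take_over() == primary.data)
```

The point of the periodic snapshot is visible in `pending_logs`: only the writes since the last snapshot remain to be replayed at takeover.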

With SSR replication, the standby node can host non-production applications.

Special considerations for DR and long-distance HA setups


HA and DR for single-node SAP HANA

First, an explanation of single node:

High availability (HA) scenarios for SAP Business Suite with SAP HANA are supported, but are restricted to the simplest case of two servers, one being the worker node and one acting as a standby node. In this case, the database is not partitioned, but the entire database is on a single node. This configuration is sometimes also referred to as a single-node HA configuration. Because of these restrictions with regard to scalability, SAP decided to allow configurations with a higher memory per core ratio, specifically for this use case.

Single node means there is only one worker node, that is, the non-scale-out case. Physically there can still be two or three nodes.

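1. All HA solutions can fail over automatically, whereas all DR solutions must be switched over manually.
2. In all HA solutions the standby node cannot accept workloads; in all DR solutions it can.
3. All HA solutions use one GPFS setup, whereas DR solutions use two.
4. HA replication is synchronous; DR replication can be synchronous or asynchronous.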

High availability (by using GPFS)

A single data center with three physical nodes: a worker (active) node, a standby node, and a quorum node.

The worker node takes the entire workload. The standby node exists only to take over and cannot process workloads. The quorum node is there to prevent split brain.
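
Why a third node prevents split brain can be shown with a generic majority rule. The Python sketch below illustrates majority quorum in general; it is not GPFS's actual quorum algorithm, and `has_quorum` and the node names are hypothetical.

```python
def has_quorum(reachable_nodes: set[str], all_nodes: set[str]) -> bool:
    """A partition may stay active only if it sees a strict majority of nodes."""
    return len(reachable_nodes & all_nodes) * 2 > len(all_nodes)


CLUSTER = {"worker", "standby", "quorum"}

# Network split: the worker is isolated, standby and quorum still see each other.
isolated_side = has_quorum({"worker"}, CLUSTER)
majority_side = has_quorum({"standby", "quorum"}, CLUSTER)
print(isolated_side, majority_side)

# With only two nodes, neither side of a split holds a strict majority.
print(has_quorum({"worker"}, {"worker", "standby"}))
```

With three nodes, at most one side of any split can hold two of them, so two sides can never both believe they are the active one.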



Stretched high availability (by using GPFS)

Compared with single-node HA, the distance is longer and everything else is the same. This is called stretched HA.

The quorum node should be placed at a third site. If that is not possible, place it at the primary site.


Disaster recovery (by using GPFS)

The quorum node should be placed at a third site. If that is not possible, place it at the primary site.

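Note that the figure for this setup is very similar to the previous two. The only difference is that the HANA DB sits on the worker node only, whereas in the previous two figures the HANA DB spans both the worker node and the standby node.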


The benefit, however, is that the standby node can accept workloads such as development and testing.


Disaster recovery (by using SAP HANA System Replication)




HA plus DR (by using GPFS)



HA (by using GPFS) plus DR (by using SSR)




HA and DR for scale-out SAP HANA


Scale-out SAP HANA installations can implement two levels of redundancy to keep their database instance from going offline. The first step is to add a server node to the scale-out cluster that acts as a hot-standby node. The second step is to set up another scale-out cluster in a distinct data center that takes over operation if there is a disaster at the primary site.


HA by using GPFS storage replication

This uses GPFS file system replication (for HA there are two copies of the data in total). Because this is scale-out, the GPFS FPO edition is used.

DR by using GPFS storage replication




The quorum node prevents split brain caused by a network outage between the primary site and the secondary site.




HA by using GPFS replication plus DR by using SAP HANA Replication

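A single-node failure is handled by the standby node at the primary site taking over (HA). A multi-node failure is handled by a DR switchover to the secondary site.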


HA and DR for SAP HANA on Flex System

Flex System is just an integrated appliance. The other concepts are the same, so they are omitted here.

Backup and restore

Basic operating system backup and recovery


Basic database backup and recovery

Saving the savepoints and the database logs technically is impossible in a consistent way, and thus does not constitute a consistent backup from which it can be recovered. Therefore, a simple file-based backup of the persistency layer of SAP HANA is insufficient.

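A backup can be started from SAP HANA Studio or through the SAP HANA SQL interface. At the time the book was written, HANA supported only full data backups, not incremental backups (delta backups were added in later SAP HANA releases).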
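
As an illustration of the SQL route, the following Python sketch only builds the statement text for a full file-based data backup. It assumes the `BACKUP DATA USING FILE ('<prefix>')` form of the SAP HANA SQL statement; the helper names `build_full_backup_sql` and `dated_prefix` are hypothetical, and actually running the statement needs a database connection and backup privileges, which are not shown.

```python
from datetime import date


def build_full_backup_sql(prefix: str) -> str:
    """Build the text of a full data backup statement for a file name prefix."""
    if not prefix or "'" in prefix:
        # Keep the sketch safe: refuse values that would break the SQL literal.
        raise ValueError("prefix must be non-empty and contain no single quotes")
    return f"BACKUP DATA USING FILE ('{prefix}')"


def dated_prefix(day: date, label: str = "FULL") -> str:
    """Hypothetical naming convention: ISO date followed by a label."""
    return f"{day:%Y-%m-%d}_{label}"


statement = build_full_backup_sql(dated_prefix(date(2015, 1, 31)))
print(statement)
```

A dated prefix like this keeps successive full backups from overwriting each other in the staging area.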

The backup files are saved to a defined staging area that might be on the internal disks, an external disk on an NFS share, or a directly attached SAN subsystem. In addition to the data backup files, the SAP HANA configuration files and backup catalog files must be saved to be recovered. For point-in-time recovery, the log area also must be backed up.


File-based backup tool integration

Database backups by using GPFS snapshots

GPFS supports a snapshot feature with which you can take a consistent and stable view of the file system that can then be used to create a backup (which is similar to enterprise storage snapshot features). While the snapshot is active, GPFS stores any changes to files in a temporary delta area. After the snapshot is released, the delta is merged with the original data and any further changes are applied on this data.

Taking only a GPFS snapshot does not ensure that you have a consistent backup that you can use to perform a restore. SAP HANA must be instructed to flush out any pending changes to disk to ensure a consistent state of the files in the file system.
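
The ordering requirement (database first, file system snapshot second, then report the outcome back to the database) can be sketched in Python as follows. The three callables are hypothetical placeholders for whatever SQL statements and GPFS commands an installation actually uses; the excerpt above does not name them, so nothing here should be read as the real interface.

```python
from typing import Callable


def consistent_snapshot_backup(
    prepare_database: Callable[[], None],
    take_fs_snapshot: Callable[[], None],
    confirm_database: Callable[[bool], None],
) -> bool:
    """Run the steps in order and always tell the database the outcome."""
    prepare_database()           # database flushes pending changes to disk
    try:
        take_fs_snapshot()       # file system snapshot is taken only afterwards
    except Exception:
        confirm_database(False)  # snapshot failed: let the database discard it
        return False
    confirm_database(True)       # snapshot succeeded: let the database record it
    return True


# Usage with stand-in steps that only record the order in which they ran.
steps: list[str] = []
ok = consistent_snapshot_backup(
    lambda: steps.append("prepare"),
    lambda: steps.append("fs-snapshot"),
    lambda success: steps.append(f"confirm:{success}"),
)
print(ok, steps)
```

Passing the steps in as callables keeps the ordering logic testable without a database or a GPFS cluster.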


Backup tool integration with Backint for SAP HANA

HANA provides an API, Backint, for integration with third-party backup tools. It can be thought of as roughly similar to RMAN in Oracle DB.


Tools certified so far include Symantec NetBackup (NBU), EMC NetWorker, IBM, and Commvault, among others.

Tivoli Storage Manager for ERP 6.4

Symantec NetBackup 7.5 for SAP HANA

Backup and restore as a DR strategy

The use of backup and restore as a DR solution is a basic way of providing DR. Depending on the RPO, it might be a viable way to achieve DR. The basic concept is to back up the data on the primary site regularly (at least daily) to a defined staging area, which might be an external disk on an NFS share or a directly attached SAN subsystem (this subsystem does not need to be dedicated to SAP HANA). After the backup is done, it must be transferred to the secondary site, for example, by a simple file transfer (can be automated) or by using the replication function of the storage system that is used to hold the backup files.
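
Whether backup and restore is viable for a given RPO can be estimated with simple arithmetic. The Python sketch below uses a simplified model that is an assumption of these notes, not a formula from the book: each backup captures the state at the moment it starts, and it is usable at the secondary site only after it has been written and transferred. The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(
    interval: timedelta,
    backup_duration: timedelta,
    transfer_duration: timedelta,
) -> timedelta:
    """Oldest state the secondary site may have to fall back to.

    Worst case: the disaster strikes just before the next backup finishes
    arriving, so the newest usable copy is one full interval older than that.
    """
    return interval + backup_duration + transfer_duration


def meets_rpo(rpo: timedelta, loss: timedelta) -> bool:
    return loss <= rpo


# Illustrative figures: daily backup, 2 hours to write, 1 hour to transfer.
loss = worst_case_data_loss(
    timedelta(days=1), timedelta(hours=2), timedelta(hours=1)
)
print(loss, meets_rpo(timedelta(hours=24), loss))
```

Under this model, a daily backup does not meet a 24-hour RPO once backup and transfer time are counted, so either the interval or the transfer path has to be tightened.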


My notes on this book end with this chapter. Thanks for your time, enjoy reading!
