This is the second article in the “Backup and Restore (or Recovery) in SQL Server” stairway series (see the full TOC below). This article deals with the different phases of data management life cycle and it encompasses the following topics:
这是“ SQL Server的备份和还原(或恢复)”阶梯系列中的第二篇文章(请参见下面的完整目录)。 本文介绍了数据管理生命周期的不同阶段,其中包括以下主题:
Data is said to be corrupted when it attains an undesirable state different from the original data. Data corruption also refers to data errors that can occur during any of the stages of data writing, data processing or even reading, at the storage, or during transmission.
当数据达到与原始数据不同的不良状态时,就称该数据已损坏。 数据损坏还指在存储或传输过程中,在数据写入,数据处理甚至读取的任何阶段都可能发生的数据错误。
We all know that at the very core, data is nothing but a series of 1s and 0s. When data is written, what actually happens is that the 1s and 0s are written to the destination. This sequence of 1s and 0s is interpreted during the data reading operation, giving it some meaning. If these bits don’t get written in the intended sequence, the data is said to have been corrupted.
我们都知道,从根本上讲,数据只不过是一系列的1和0。 写入数据时,实际发生的是将1和0写入目标。 在数据读取操作期间会解释1和0的顺序,使其具有一定的含义。 如果未按预期顺序写入这些位,则表示数据已损坏。
Small data means smaller sequences, and relatively lower chances of corruption. When the amount of data increases, the chances of corruption also increases. The chances are particularly high when data generation is high, and the subsequent storage operations hit unprecedented rates.
小数据意味着较小的序列,并且损坏的机会相对较低。 当数据量增加时,损坏的可能性也会增加。 当数据生成量很高时,机会特别高,随后的存储操作达到空前的速度。
Some of the most common reasons for data corruption include hardware or software issues with regard to I/O subsystems.
数据损坏的一些最常见原因包括与I / O子系统有关的硬件或软件问题。
In general, SQL Server isn’t aware of data corruption until it comes across errors when accessing a resource that’s corrupt. Therefore, the issue is that if the data wasn’t backed up before the corruption occurred, there would be no way to retrieve the data, which would mean permanent data loss.
通常,SQL Server在访问损坏的资源时遇到错误之前,不会意识到数据损坏。 因此,问题在于,如果未在损坏发生之前备份数据,则将无法检索数据,这将意味着永久性数据丢失。
Let’s consider a scenario where a mission-critical application database encountered data corruption. Let us remember that SQL Server doesn’t notice corrupt data until it tries to access it. Suppose a certain chunk of data was not read for a few weeks, because there was no necessity to, and data backup retention period was over resulting in the backup being purged. And then one day, after the backup containing the data was purged, someone queried the data, and SQL Server reported that it was corrupt. The value of the data chunk and the impact of the loss to the business depend on the nature of that data. Had the issue been detected within the span when the data could’ve been restored, the process may have even been considered insignificant. Now, however, this may be a loss to the business or a loss of job for the database administrator. Performing required data consistency checks on the database at regular intervals is a critical responsibility of a database administrator.
让我们考虑一个关键任务应用程序数据库遇到数据损坏的情况。 让我们记住,SQL Server在尝试访问数据之前不会注意到损坏的数据。 假设几周内没有读取某些数据,因为没有必要,而且数据备份保留期已过,导致清除了备份。 然后一天,清除包含数据的备份后,有人查询了数据,SQL Server报告该数据已损坏。 数据块的价值以及损失对业务的影响取决于数据的性质。 如果在可以恢复数据的时间范围内检测到问题,则该过程甚至可能被认为无关紧要。 但是现在,这可能是业务损失或数据库管理员的工作损失。 定期对数据库执行所需的数据一致性检查是数据库管理员的一项重要职责。
SQL Server has several built-in commands to mitigate the effects of corruption on the data during data operations.
SQL Server具有多个内置命令,以减轻数据操作期间数据损坏的影响。
One of the ways to look for data corruption is by enabling the CHECKSUM option for page verification.
查找数据损坏的方法之一是通过启用CHECKSUM选项进行页面验证。
ALTER DATABASE ProdSQLShackDemo SET PAGE_VERIFY CHECKSUM
This automatically checks the integrity of the data even as it is read from the table. All page verification operations are performed in the buffer pool.
即使从表中读取数据,这也会自动检查数据的完整性。 所有页面验证操作均在缓冲池中执行。
If the data is stored over an extended period of time, then CHECKSUM is not a good-enough option to validate the data. In those cases, we need to predominantly depend on the database consistency checker commands.
如果数据存储时间较长,则CHECKSUM不是验证数据的理想选择。 在这些情况下,我们主要需要依赖数据库一致性检查器命令。
Early detection is the key for successful data management. The sooner we notice issues, the easier it would be to manage these issues and the easier would it be to control the situation.
早期检测是成功进行数据管理的关键。 我们越早发现问题,就越容易管理这些问题,就越容易控制局势。
Always plan for a good backup strategy. A good backup strategy is always helpful, and helps recover from any disaster. Not only does it guarantee smooth functioning of the business, but also protects the data against user-caused errors, along with other types of failures.
始终计划好的备份策略。 一个好的备份策略总是有帮助的,并有助于从任何灾难中恢复。 它不仅可以保证业务的平稳运行,而且可以保护数据免受用户引起的错误以及其他类型的故障的影响。
It’s best to never place all of your eggs in the same basket. That is, it’s always better to have multiple options of recovery—as many as possible. However, no matter how many options we have, they’ll be helpful only when the issue is detected at an early stage. If not, the corruption may become too expensive to handle.
最好不要将所有鸡蛋放在同一个篮子里。 就是说,有尽可能多的恢复选项总是更好。 但是,无论我们有多少选择,它们只有在早期发现问题时才会有用。 如果不是这样,腐败可能变得太昂贵而无法处理。
SQL Server has a number of built-in tools to check the structure of the databases and their internals using DBCC commands that provide useful information to troubleshoot corruption issues. These commands can be divided into four categories:
SQL Server具有许多内置工具,它们使用DBCC命令检查数据库的结构及其内部,这些命令提供了有用的信息来解决损坏问题。 这些命令可以分为四类:
The most comprehensive check for data corruption can be done using the DBCC CHECKDB command.
可以使用DBCC CHECKDB命令对数据损坏进行最全面的检查。
We have three options for the DBCC CHECKDB command:
对于DBCC CHECKDB命令,我们有三个选项:
It’s interesting to know that the DBCC commands actually create an internal snapshot of the database, and then performs checks against that snapshot. This prevents concurrency issues. There are a few instances where the snapshot isn’t created, as well, for example, in a read-only database or a database in single-user mode or using hints.
有趣的是,DBCC命令实际上创建了数据库的内部快照,然后针对该快照执行检查。 这样可以防止并发问题。 在某些情况下,也不会创建快照,例如,在只读数据库或单用户模式或使用提示的数据库中。
On a VLDB database, CHECKDB command may take a while to complete. If the database design uses multiple file groups, we can reduce the load by initiating the CHECKDB process on individual file groups. It can also be scheduled to run at a larger interval, such as alternate days in place of every day.
在VLDB数据库上,CHECKDB命令可能需要一段时间才能完成。 如果数据库设计使用多个文件组,则可以通过在单个文件组上启动CHECKDB进程来减少负载。 还可以安排它以较大的间隔运行,例如以隔天代替每天。
Let’s now dive into some examples to understand the DBCC concepts and syntax:
现在,让我们深入研究一些示例以了解DBCC概念和语法:
-- Chnge the databases
USE ProdSQLShackDemo;
GO
-- run DBCC command
DBCC CHECKDB;
GO
-- The CHECKDB runs the following three console commands
DBCC CHECKALLOC;
DBCC CHECKTABLE ('SQLShackAuthor') WITH PHYSICAL_ONLY;
DBCC CHECKCATALOG WITH NO_INFOMSGS
DBCC CHECKCATALOG (ProdSQLShackDemo);
-- Checktable can be further simplified for specific index
DECLARE @indid int;
SET @indid = (SELECT index_id
FROM sys.indexes
WHERE object_id = OBJECT_ID('SQLShackAuthor') AND name like '%PK__SQLShack__3214EC27014935CB%')
DBCC CHECKTABLE ('SQLShackAuthor',@indid);
-- If corruptions are identified the repair options are available
-- database must be in single user mode
DBCC CHECKDB ('ProdSQLShackDemo', REPAIR_ALLOW_DATA_LOSS); -- REPAIR_FAST or REPAIR_REBUILD
-- CHECKFILEGROUP for specific filegroup
USE ProdSQLShackDemo;
GO
DBCC CHECKFILEGROUP (0, NOINDEX)
WITH PHYSICAL_ONLY,
ESTIMATEONLY;
GO
With the CHECKFILEGROUP command, we need to specify the name of the file group or its filegroupID. In this example, 0 is used, which represents primary filegroups. The NOINDEX option ignores the non-clustered index and runs only for clustered indexes and heap. The physical_only option validates the structure of the data pages, page headers and structure of the files on the disk. The estimate_only option figures out how much space is required in tempdb to create the snapshot.
使用CHECKFILEGROUP命令,我们需要指定文件组的名称或其文件组ID。 在此示例中,使用0表示主要文件组。 NOINDEX选项将忽略非聚集索引,并且仅对聚集索引和堆运行。 physical_only选项将验证数据页的结构,页头以及磁盘上文件的结构。 estimate_only选项指出在tempdb中创建快照需要多少空间。
Database protection is the process of safeguarding the data from corruption. We are in the era of little-to-no tolerance for outages and downtime. Hence, data protection is the key for many successful organizations.
数据库保护是保护数据免受损坏的过程。 我们正处于对故障和停机几乎没有容忍的时代。 因此,数据保护是许多成功组织的关键。
Data protection can be classified further into Availability and Management
数据保护可以进一步分为可用性和管理
Data availability is a process to ensure to smoothly run the business without any interruptions or data loss.
数据可用性是确保业务平稳运行而不会造成任何中断或数据丢失的过程。
Data management is a process used to understand the nature of the data, govern the data in such a way that the data is available at any cost irrespective of any forms corruption and it’s agreed with the framework of business policy and government bodies. Here are a few points to keep in mind:
数据管理是用于理解数据性质,以某种方式管理数据的过程,以使数据能够以任何成本获得,而不管其是否存在任何形式的损坏,并且已与商业政策和政府机构框架达成一致。 请记住以下几点:
SQL Server has several built-in mechanisms called page protection options to automatically identify data corruption.
SQL Server具有几种内置的机制,称为页面保护选项,可以自动识别数据损坏。
It’s important for any database administrator to understand the data lifecycle and the nature of the particular business, in order to have the ability to recover artifacts of business value from any sort of data disruptions.
任何数据库管理员都必须了解数据生命周期和特定业务的性质,以便能够从任何类型的数据中断中恢复业务价值工件,这一点很重要。
There are a number of ways in which corruption can creep into the system, and what makes it worse is its random nature, on when and where it occurs. In other words, corruption can never be eliminated from the systems. The administrators should perform proactive monitoring and guard the data against undesirable but inevitable issues of corruption. If corruption gets more frequent, then you should start working on a migration plan to move the database to a new hardware.
腐败可以通过多种方式蔓延到系统中,更糟糕的是它的随机性,发生时间和地点。 换句话说,腐败永远无法从系统中消除。 管理员应执行主动监视,并保护数据免受不良但不可避免的损坏问题的影响。 如果损坏变得更加频繁,那么您应该开始制定迁移计划,以将数据库移至新硬件。
Educate yourself—As the preceding points have noted, many causes of data corruption and loss can be avoided by taking appropriate measures to protect your data and avoid the common causes of data corruption.
自我教育—正如前面提到的那样,可以通过采取适当措施保护您的数据并避免造成数据损坏的常见原因来避免造成数据损坏和丢失的许多原因。
Implement a comprehensive Backup and Restore/Recovery solution and test it regularly to ensure everything is squeaky clean.
实施全面的备份和还原/恢复解决方案并定期对其进行测试,以确保所有内容都干净整洁。
Repair options should be considered as the last resort. It is advisable to make sure that you’re working on a cloned copy of the database; that way you can investigate what data, if any, was actually lost or modified.
维修选择应被视为万不得已。 建议确保您正在处理数据库的克隆副本。 这样,您可以调查实际上丢失或修改了哪些数据(如果有)。
Understand the scope of the business before tailoring the high availability feature set for the enterprise. No matter what high availability features we enable, ultimately, in most of the cases, recovering from a corruption relies on having a good backup strategy and routine monitoring.
在为企业定制高可用性功能集之前,请先了解业务范围。 无论我们启用哪种高可用性功能,最终,在大多数情况下,从损坏中恢复都取决于拥有良好的备份策略和常规监视。
Database Backup and Restore process in SQL Server – series intro |
An overview of the process of SQL Server backup-and-restore |
Understanding the SQL Server Data Management Life Cycle |
Understanding SQL Server database recovery models |
Understanding SQL Server Backup Types |
Backup and Restore (or Recovery) strategies for SQL Server database |
Discussing Backup and Restore Automation using SQLCMD and SQL Server agent |
Understanding Database snapshots vs Database backups in SQL Server |
SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with PowerShell or Batch techniques |
Smart database backup in SQL Server 2017 |
How to perform a Page Level Restore in SQL Server |
Backup Linux SQL databases Using PowerShell and Windows Task Scheduler |
SQL Server database backup and restore operations using the Cloud |
Tail-Log Backup and Restore in SQL Server |
SQL Server Database Backup and Restore reports |
Database Filegroup(s) and Piecemeal restores in SQL Server |
In-Memory Optimized database backup and restore in SQL Server |
Understanding Backup and Restore operations in SQL Server Docker Containers |
Backup and Restore operations with SQL Server 2017 on Docker containers using Azure Data Studio |
Interview questions on SQL Server database backups, restores and recovery – Part I |
Interview questions on SQL Server database backups, restores and recovery – Part II |
Interview questions on SQL Server database backups, restores and recovery – Part III |
Interview questions on SQL Server database backups, restores and recovery – Part IV |
SQL Server中的数据库备份和还原过程–系列简介 |
SQL Server备份和还原过程概述 |
了解SQL Server数据管理生命周期 |
了解SQL Server数据库恢复模型 |
了解SQL Server备份类型 |
SQL Server数据库的备份和还原(或恢复)策略 |
讨论使用SQLCMD和SQL Server代理进行备份和还原自动化 |
了解SQL Server中的数据库快照与数据库备份 |
SqlPackage.exe –使用bacpac和PowerShell或Batch技术自动执行SQL Server数据库还原 |
SQL Server 2017中的智能数据库备份 |
如何在SQL Server中执行页面级还原 |
使用PowerShell和Windows Task Scheduler备份Linux SQL数据库 |
使用CloudSQL Server数据库备份和还原操作 |
SQL Server中的尾日志备份和还原 |
SQL Server数据库备份和还原报告 |
SQL Server中的数据库文件组和零碎还原 |
在SQL Server中进行内存优化的数据库备份和还原 |
了解SQL Server Docker容器中的备份和还原操作 |
使用Azure Data Studio在Docker容器上使用SQL Server 2017进行备份和还原操作 |
有关SQL Server数据库备份,还原和恢复的面试问题–第一部分 |
有关SQL Server数据库备份,还原和恢复的面试问题–第二部分 |
有关SQL Server数据库备份,还原和恢复的面试问题–第三部分 |
有关SQL Server数据库备份,还原和恢复的面试问题–第IV部分 |
翻译自: https://www.sqlshack.com/understanding-the-data-management-life-cycle/