了解SQL Server数据库静态数据及其如何适合数据库生命周期管理

什么是静态数据 (What is static data)

Static data (aka Code, Lookup, List or Reference data), in the context of a relational database management system, is generally data that represents fixed data, that generally doesn’t change, or if it does, changes infrequently, such as abbreviations for US states for example e.g. ME, MA, NC. This data is typically referenced, extensively, by transactional type data. For example, an customer table would have references to static table for City name, State or province, Country, Payment terms e.g. NET 30 etc.

在关系数据库管理系统的上下文中,静态数据(即代码,查找,列表或参考数据)通常是表示固定数据的数据,该数据通常不更改,或者如果不更改则不经常更改(例如缩写)例如美国的州,例如ME,MA,NC。 此数据通常由事务类型数据广泛引用。 例如,客户表将引用静态表,以获取城市名称,州或省,国家/地区,付款条件(例如NET 30等)。

Other examples of static data would be lists of things like Guitar manufacturers, internal abbreviations for company departments, names of all of the countries in the EU.

静态数据的其他示例包括诸如吉他制造商,公司部门的内部缩写,欧盟所有国家的名称之类的列表。

Static data usually isn’t large and in some cases, can be as little as two or three rows like surnames Mr. Mrs., Ms. etc.

静态数据通常不大,在某些情况下,可能只有两行或三行,例如姓先生,太太等。

仔细看看 (A closer look)

Static data is a key part of any normalized data structure. Rather than using data repetitively in a table, like “United States”, it is more efficient to create a table called countries, and assign an ID to each country, and reference the ID in transactional data, where it might be implemented thousands or even millions of times. Thus, the Country table becomes static data, referenced by other, transactional type tables whenever needed.

静态数据是任何规范化数据结构的关键部分。 与其像“美国”之类的表中重复使用数据,不如创建一个名为“国家”的表,并为每个国家/地区分配一个ID,然后在交易数据中引用该ID(可能在其中实现数千甚至什至是)的效率更高。数百万次。 因此,Country表成为静态数据,并在需要时由其他事务类型表引用。

Sure, new countries are created from time to time e.g. Southern Sudan, but in general the list is fairly static over time, particularly with lists like the names of the states in America, which hasn’t changed since 1959 *

当然,不时会建立新的国家,例如苏丹南部,但总的来说,清单随时间推移是相当静态的,尤其是像美国各州名称这样的清单,自1959年以来就没有改变过*

Here is an example of a Static data table:

这是静态数据表的示例:

 
CREATE TABLE [dbo].[Countries] (
        [CountryID]     [int] IDENTITY(1, 1) NOT NULL,
        [Name]          [nvarchar](1) NULL,
        CONSTRAINT [PK__Countrie__10D160BF81AE8538]
        PRIMARY KEY
        CLUSTERED
        ([CountryID])
    ON [PRIMARY]
) ON [PRIMARY]
GO
 

Static data can usually be found from drilling down from the user interface via picklists. Anything demonstrated to the user as a Literal e.g. Alabama, Arkansas etc., will have a corresponding Value that is actually stored with the record. In our example, the picklist will show the Literal e.g. Ireland but store the Value e.g. 107 in the users table

通常可以从用户界面通过选择列表向下钻取来找到静态数据。 向用户展示的任何文字,例如阿拉巴马州,阿肯色州等,都将具有与记录实际存储的对应 。 在我们的示例中,选择列表将显示文字(例如爱尔兰),但将值(例如107)存储在用户表中

了解SQL Server数据库静态数据及其如何适合数据库生命周期管理_第1张图片

* Images courtesy of ApexSQL Doc

*图片由ApexSQL Doc提供

为什么版本控制静态数据? (Why version control static data?)

As we know that not having static data in our database, isn’t a possibility, for the purpose of this article we assume that the alternatives would be generating synthetic test data for the static tables and/or in using data from production. Versioning static data will be compared to these options and advantages listed.

我们知道在数据库中没有静态数据是不可能的,出于本文的目的,我们假设替代方法将为静态表生成综合测试数据和/或使用生产数据。 将对静态数据进行版本控制与列出的这些选项和优点进行比较。

To manage tight coupling with business logic, static data referenced in business logic

为了管理与业务逻辑的紧密耦合,需要在业务逻辑中引用静态数据

Static data is often considered akin to the database structure itself, because, rightly or wrongly, static data is often coupled with database logic, particularly with poorly designed databases and clients.

静态数据通常被认为类似于数据库结构本身,因为静态数据正确或错误地经常与数据库逻辑结合,尤其是与设计不良的数据库和客户端有关。

If we want to calculate the amount of money we’ll pay in Sales tax, we would normally have a table by states, grouped by rates.

如果我们想计算我们将要支付的营业税金额,通常我们会有一张按州分类的表格,按费率分组。

But we might have a scenario where some logic references states by their IDs directly E.g. 44 for “North Carolina”. Worse, we may have someone using a “magic number” (although it isn’t actually a number) in the code of “NC”.

但是我们可能会遇到这样的情况:某些逻辑直接通过ID(例如,“北卡罗来纳州”)的ID引用状态。 更糟糕的是,我们可能有人在“ NC”代码中使用“幻数”(尽管实际上不是数字)。

In either of these two scenarios, forgetting for a moment the potential problems that could happen with changes to tax rates, etc. or the general brittleness of the system, a test plan that implements synthetic data could break the first scenario (if only 40 states were generated) and would most certainly break it if the random value of “NC” wasn’t by luck, created by the test data plan.

在这两种情况中的一种中,暂时忘记了税率等变化或系统总体脆弱性可能发生的潜在问题,实施综合数据的测试计划可能会破坏第一种情况(如果只有40个州生成),并且如果不是由测试数据计划创建的“ NC”随机值不是靠运气,那肯定会破坏它。

You could easily set up logic to ensure exactly 50 states were generated, or import the list of states into your test generation plan. But other scenarios can be more complicated and more difficult to simulate with test data. A simple solution might be to just use the actual static data itself.

您可以轻松地设置逻辑以确保准确生成了50个状态,或将状态列表导入到测试生成计划中。 但是其他情况可能更复杂,更难用测试数据进行模拟。 一个简单的解决方案可能是仅使用实际的静态数据本身。

Although these are kind of extreme examples, using poor design and coding practices, many other examples exist (see below).

尽管这些都是极端的示例,但使用不良的设计和编码实践,仍然存在许多其他示例(请参见下文)。

Here is one I found on StackOverflow

这是我在StackOverflow上找到的一个

“my users can change their setting, and their configuration is stored on the server. However, there’s a system-wide default value for each setting that is used in case the user didn’t override it. The table containing those default settings grows as more options are added to the program. This means that when a new feature/option is checked in, the system-wide default setting is usually created in the database as well.”

我的用户可以更改其设置,其配置存储在服务器上。 但是,如果用户没有覆盖默认设置,则会为每个设置使用系统范围的默认值。 随着更多选项被添加到程序中,包含这些默认设置的表将增加。 这意味着当签入新功能/选项时,通常也会在数据库中创建系统范围的默认设置。”

In general, we don’t want any chance of our test data breaking our application or database in QA e.g. false negatives. We want to be generating only real bugs.

通常,我们不希望测试数据在质量检查中破坏应用程序或数据库的任何可能性,例如假阴性。 我们只想生成真正的错误。

To avoid breaking hard coded unit tests

为了避免破坏硬编码的单元测试

Even on a system that is well designed, with no hard coded “magic” numbers in the business logic, we can still run into problems when the database is reviewed and tested as part of a continuous integration workflow using SQL Server unit tests. Writing good Unit tests for SQL Server is hard, and resisting the temptation to test vs hard coded values in the test phase of a deployment is even harder. So even the most well intentioned DBA, can wind up with broken unit tests, if static data isn’t managed properly during continuous integration process, by testing against production static data vs automatically generated, synthetic test data.

即使在设计合理的系统上,业务逻辑中也没有硬编码的“魔术”数字,当使用SQL Server单元测试将数据库作为连续集成工作流的一部分进行检查和测试时,我们仍然会遇到问题。 为SQL Server编写好的单元测试非常困难,而在部署的测试阶段抵制测试与硬编码值的诱惑就变得更加困难。 因此,如果在连续集成过程中无法正确管理静态数据,甚至可以通过针对生产静态数据和自动生成的综合测试数据进行测试,那么即使是最有主见的DBA也会因单元测试失败而告终。

Because we can; Static data is generally smaller data

因为我们可以; 静态数据通常是较小的数据

Static data going to be smaller in volume than transactional data due to the fact that it is based on some finite list. The number of transactions your company processes in a day might be in the millions, but the number of states in America has remained at 50 since 1959 *

由于静态数据基于某个有限列表,因此其静态数据量将小于事务数据量。 您的公司每天处理的交易数可能为数百万,但自1959年以来,美国的州数一直保持在50个*

The fact that static data sets are usually small, make it possible to version them in source control repositories, that aren’t designed to store large amounts of data, like a typical database is.

静态数据集通常很小,这一事实使得可以在源代码控制存储库中对它们进行版本控制,而源代码控制存储库并非像典型的数据库那样设计用于存储大量数据。

So, in this case, one strong reason for versioning static data is that we can

因此,在这种情况下,对静态数据进行版本控制的一个重要原因是我们可以

To track and audit changes

跟踪和审核更改

It is called static data, not because it doesn’t change, but because it doesn’t change often. There is a large spectrum from data that won’t ever change e.g. the list of Roman emperors to data that changes infrequently, but still does change, like the names of countries e.g. South Sudan.

之所以称为静态数据,不是因为它不会更改,而是因为它不会经常更改。 从不曾改变的数据(例如罗马皇帝的名单)到不经常改变但仍在改变的数据(例如南苏丹等国家的名字)存在着很大的范围。

Based on the fact that static data is so critical to a database processing correctly, tracking changes is also important. We want to know when our static data changes, who changed it and when, just like we want to know when a stored procedure was updated. Version controlling allows us to do that.

基于静态数据对于正确处理数据库至关重要的事实,跟踪更改也很重要。 我们想知道静态数据何时更改,谁更改了何时以及何时更改,就像我们想知道存储过程何时更新一样。 版本控制使我们能够做到这一点。

Production data might be pulled down to QA to load up static tables, but key information about changes wouldn’t be tracked. By version controlling static data, you gain the benefits of “real” production data but also layer on change tracking and auditing. All changes are tracked by date, user etc. and can easily be rolled back, if needed

生产数据可能会下拉到质量检查中以加载静态表,但不会跟踪有关更改的关键信息。 通过版本控制静态数据,您可以获得“真实”生产数据的好处,而且还可以进行变更跟踪和审计。 所有更改均按日期,用户等进行跟踪,并且可以根据需要轻松回滚

To see realistic data

查看真实数据

When building a QA database from scratch, 100% synthetic test data can be used, the operative word being “can”. Depending on the sophistication of the tool you use, you might have names like “Axd Bemioyz” or “Sam Shepherd”. The former may be Ok for transactional data, but even if your client software and database are designed well, without hard coded references, you may still want your test environments to reflect data that is as close to “real world” as possible. By synthetically generating static data, you may confuse or distract your testers with unfamiliar and strange data strings that are ubiquitous in your app, showing up in all picklists, reports etc.

从头开始建立质量检查数据库时,可以使用100%综合测试数据,操作词为“可以”。 根据所使用工具的复杂程度,您可能会使用“ Axd Bemioyz”或“ Sam Shepherd”之类的名称。 对于事务性数据,前者可能还可以,但是即使您的客户端软件和数据库设计良好,没有硬编码的引用,您仍然可能希望测试环境反映尽可能接近“真实世界”的数据。 通过综合生成静态数据,您可能会将测试人员与应用程序中普遍存在的陌生且陌生的数据字符串混淆或分散,从而出现在所有选择列表,报告等中。

SQL Server test data generators like ApexSQL Generate can narrow the gap between unrealistic data like “Aud Bemioyx” and realistic values like “Sam Shepherd”. It can even apply generators with realistic data like country names or user’s information to mimic real production data:

像ApexSQL Generate这样SQL Server测试数据生成器可以缩小“ Aud Bemioyx”之类的不真实数据与“ Sam Shepherd”之类的真实值之间的差距。 它甚至可以将具有真实数据(例如国家/地区名称或用户信息)的生成器应用于模拟实际生产数据:

了解SQL Server数据库静态数据及其如何适合数据库生命周期管理_第2张图片

After using the predefined generator, the overview of the static data prepared to be inserted is as follows:

使用预定义的生成器后,准备插入的静态数据概述如下:

了解SQL Server数据库静态数据及其如何适合数据库生命周期管理_第3张图片

Even though these tools are fast and easy to use, in most cases, it may be marginally less time to simply work with the real data itself, at least in the case of static data. The optional test plan uses version controlled static data in combination with automatically generated test data

即使这些工具快速且易于使用,但在大多数情况下,至少在静态数据的情况下,仅处理真实数据本身的时间可能会稍短一些。 可选的测试计划将版本控制的静态数据与自动生成的测试数据结合使用

如何对静态数据进行版本控制 (How to version control static data)

Now that we’ve reviewed static data and why it can be such an important part of a continuous integration and delivery process, and therefore why it should be versioned in source control, we can look at ways to implement it. We’ll look at ways to Get static data into a repository, managing changes and getting it out when you want

现在,我们已经回顾了静态数据,以及为什么静态数据可以成为持续集成和交付过程中如此重要的部分,以及为什么应该在源代码控制中对静态数据进行版本控制,所以我们将研究实现静态数据的方法。 我们将研究将静态数据放入存储库,管理更改并在需要时将其发布的方法

To do this I’ll link to articles showing, literally, the ins and outs of Static data using a variety of tools from ApexSQL

为此,我将链接到使用ApexSQL的各种工具从字面上显示静态数据的内容的文章。

Get in – The first challenge is getting static data into a source control repository

进入 –第一个挑战是将静态数据放入源代码控制存储库

  • How to link and initially commit SQL Server database static data 如何链接和最初提交SQL Server数据库静态数据
  • How to commit and/or update SQL Server database static data to a source control repository 如何将SQL Server数据库静态数据提交和/或更新到源代码控制存储库
  • How to commit SQL Server table static data to a source control repository 如何将SQL Server表静态数据提交到源代码管理存储库

Get it changed – Once it is in, you’ll want to know how to manage changes by checking out scripts, changing them and checking them back into the repository

进行更改 –进入后,您将想知道如何通过签出脚本,更改它们并将其签回到存储库中来管理更改

  • How to work with version controlled SQL Server database static data如何使用版本控制SQL Server数据库静态数据

Get it out – Finally, once static data has been successfully committed, and all changes has been updated, you’ll want to know how to get it out when you need to, to synchronize against a QA or PROD database or integrate into a Continuous integration or delivery workflow process

取得成功 –最后,一旦成功提交了静态数据并且更新了所有更改,您将想知道如何在需要时取得它,与QA或PROD数据库同步或集成到Continuous中。整合或交付工作流程

  • How to pull version controlled SQL Server database static data from a repository如何从存储库中提取版本控制SQL Server数据库静态数据
  • How to deploy static data from SQL source control to database如何将静态数据从SQL源代码管理部署到数据库
  • How to apply static data under source control to a SQL Server database如何在源代码管理下将静态数据应用于SQL Server数据库

* The last state added to the U.S was Hawaii. Hawaii was declared a state on August 21, 1959.

*最后添加到美国的州是夏威夷。 1959年8月21日,夏威夷宣布成为夏威夷州。

翻译自: https://www.sqlshack.com/understanding-sql-server-database-static-data-fits-database-lifecycle-management/

你可能感兴趣的:(数据库,python,java,大数据,数据分析)