Cloud Data Strategies: Preventing Data Black Holes in the Cloud

This article was originally published on MongoDB. Thank you for supporting the partners who make SitePoint possible.

Black holes are regions of spacetime with a gravitational pull so strong that nothing can escape. They are not as purely destructive as you might have been led to believe: their gravitational effects help drive the formation and evolution of galaxies. In fact, our own Milky Way galaxy orbits a supermassive black hole with 4.1 million times the mass of the Sun. Some theorize that none of us would be here were it not for a black hole.

On the flip side, black holes can also be found hurtling through the cosmos — often at millions of miles per hour — tearing apart everything in their path. It’s said that anything that makes it into their event horizons, the “point of no return”, will never be seen or heard from again, making black holes some of the most interesting and terrifying objects in space.

Why are we going on about black holes, gravitational effects, and points of no return? Because something analogous is happening right now in computing.

The concept of “data gravity”, coined in 2010 by Dave McCrory, treats data as if it were a planet or other celestial object with mass. As data accumulates in an environment, applications and services that rely on that data will naturally be pulled into the same environment. The larger the “mass” of data, the stronger the “gravitational pull” and the faster this happens. Applications and services each have their own gravity, but data gravity is by far the strongest, especially as:

The further away data is, the more drastic the impact on application performance and user experience. Keeping applications and services physically close to the data reduces latency, maximizes throughput, and makes it easier for teams to build performant applications.
Moving data around has a cost. In most cases, it makes sense to centralize data to reduce that cost, which is why data tends to amass in one location or environment. Distributed systems do allow organizations to partition data in different ways for specific purposes (for example, fencing sets of data by geographic borders to comply with regulations), but within those partitions, minimal data movement is still desirable.
And finally, efforts to digitize business and organizational activities, processes, and models (dubbed by many as “digital transformation” initiatives) succeed or fail based on how effectively data is utilized. If software is the engine by which digital transformation happens, then data is its fuel.
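
The first two points above can be made concrete with some back-of-the-envelope arithmetic. The following is a minimal sketch, not a benchmark: the function names are hypothetical, and the round-trip times, query count, dataset size, and link speed are illustrative assumptions rather than figures from the original article.

```python
def total_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Network time spent on sequential round trips to the database."""
    return round_trips * rtt_ms


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Time to copy a dataset over a link, ignoring protocol overhead.

    size_gb is in gigabytes, bandwidth_gbps in gigabits per second,
    hence the factor of 8.
    """
    return size_gb * 8 / bandwidth_gbps


# Assumed scenario: a request that issues 20 sequential queries.
same_region = total_latency_ms(20, 1.0)    # app next to the data (~1 ms RTT assumed)
cross_region = total_latency_ms(20, 70.0)  # app far from the data (~70 ms RTT assumed)
print(f"same region:  {same_region:.0f} ms of network wait per request")
print(f"cross region: {cross_region:.0f} ms of network wait per request")

# Assumed scenario: moving 50 TB (50,000 GB) over a dedicated 1 Gbps link.
days = transfer_seconds(50_000, 1.0) / 86_400
print(f"moving 50 TB at 1 Gbps: about {days:.1f} days")
```

Under these assumptions, distance alone adds more than a second of waiting to every request, and relocating the dataset takes days. That is the pull that draws applications toward wherever the data already sits.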

In the real world, the larger the mass of an object, the harder it is to move. Data gravity likewise means that once your mass of data gets large enough, it becomes harder (and in some cases, nearly impossible) to move. What makes this more relevant now than ever is the shift to cloud computing. As companies move to the cloud, they need to make a decision that will have massive implications down the line: where and how are they going to store their data? And how do they keep data gravity in the cloud from turning into a data black hole?

There are several options for organizations moving from building their own IT to consuming it as a service in the cloud.

Proprietary Tabular (Relational) Databases

The companies behind proprietary tabular databases often penalize their customers for running these technologies on any cloud platform other than their own. This should not surprise any of us. These are the same vendors that for decades have been relying on selling heavy proprietary software with multi-year contracts and annual maintenance fees. Vendor lock-in is nothing new to them.

Organizations choosing to use proprietary tabular databases in the cloud also carry over all the baggage of those technologies and realize few cloud benefits. These databases scale vertically and often cannot take advantage of cloud-native architectures for scale-out and elasticity without massive compromises. If horizontal scale-out of data across multiple instances is available, it isn’t native to the database and requires complex configurations, app-side changes, and additional software.

Lifting and shifting these databases to the cloud does not change the fact that they’re not designed to take advantage of cloud architectures.

Open Source Tabular Databases

Things are a little better with open source tabular databases insofar as there is no vendor enforcing punitive pricing to keep you on their cloud. However, similar to proprietary tabular databases, most of these technologies are designed to scale vertically; scaling out to fully realize cloud elasticity is often managed with fragile configurations or additional software.

Many companies running these databases in the cloud rely on a managed service to reduce their operational overhead. However, feature parity across cloud platforms is nonexistent, making migrations complicated and expensive. For example, databases running on Amazon Aurora leverage Aurora-specific features not found on other clouds.

Proprietary Cloud Databases

With proprietary cloud databases, it’s very easy to get into a situation where data goes in and nothing ever comes out. These database services run only in their parent cloud and often provide very limited database functionality, requiring customers to integrate additional cloud services for anything beyond very simple use cases.

For example, many of the proprietary cloud NoSQL services offer little more than key-value functionality; users often need to pipe data into a cloud data warehouse for more complex queries and analytics. They also tend to be operationally immature, requiring additional integrations and services to address data protection and provide adequate performance visibility. And it doesn’t stop there. New features are often introduced in the form of new services, and before users know it, instead of relying on a single cloud database, they’re dependent on an ever-growing network of cloud services. This makes it all the more difficult to ever get data out.

The major cloud providers know that if they’re able to get your data in one of their proprietary database services, they’ve got you right where they want you. And while some may argue that organizations should actually embrace this new, ultimate form of vendor lock-in to get the most out of the cloud, that doesn’t leave customers with many options if their requirements, or if data regulations, change. What if the cloud provider you’re not using releases a game-changing service you need to edge out your competition? What if they open up a data center in a new geographic region you’ve prioritized and yours doesn’t have it on their roadmap? What if your main customer dictates that you should sever ties with your cloud provider? It’s happened before.

These are all scenarios where you could benefit from using a database that runs the same, everywhere.

The Database That Runs the Same… Everywhere

As you move into the cloud, preventing data gravity from turning against you and limiting your flexibility is simple: use a database that runs the same in any environment.

One option to consider is MongoDB. As a database, it combines the flexibility of the document data model with sophisticated querying and indexing required by a wide range of use cases, from simple key-value to real-time aggregations powering analytics.

MongoDB is a distributed database designed for the cloud at its core. Redundancy for resilience, horizontal scaling, and geographic distribution are native to the database and easy to use.

And finally, MongoDB delivers a consistent experience regardless of where it is deployed:

Organizations not quite ready to migrate to the cloud can deploy MongoDB on premises behind their own firewalls and manage their databases using advanced operational tooling.
For those that are ready to migrate to the cloud, MongoDB Atlas delivers the database as a fully managed service across more than 50 regions on AWS, Azure, and Google Cloud Platform. Built-in automation of proven practices helps reduce the number of time-consuming database administration tasks that teams are responsible for, and keeps organizations from migrating their operational overhead into the cloud as well. Of course, if you want to self-manage MongoDB in the cloud, you can do so.
And finally, for teams that are well-versed in cloud services, MongoDB Atlas delivers a consistent experience across AWS, Azure, and Google, allowing the development of multi-cloud strategies on a single, unified data platform.
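
One practical consequence of a database that behaves the same in every environment is that the deployment target can be reduced to configuration. The sketch below is illustrative only. The environment variable names (`MONGODB_URI`, `DEPLOY_TARGET`), the helper `resolve_mongo_uri`, and every hostname and database name are hypothetical placeholders. Only the `mongodb://` and `mongodb+srv://` connection string schemes are standard MongoDB conventions.

```python
import os

# Hypothetical connection strings: hosts and database name are placeholders.
DEFAULT_URIS = {
    "on_prem": "mongodb://db1.internal.example:27017/appdb",
    "atlas": "mongodb+srv://cluster0.example.mongodb.net/appdb",
}


def resolve_mongo_uri(environ=None) -> str:
    """Pick a connection string from configuration.

    An explicit MONGODB_URI wins; otherwise DEPLOY_TARGET selects one of
    the defaults above. Nothing else in the application needs to know
    where the database actually runs.
    """
    env = os.environ if environ is None else environ
    explicit = env.get("MONGODB_URI")
    if explicit:
        return explicit
    target = env.get("DEPLOY_TARGET", "on_prem")
    if target not in DEFAULT_URIS:
        raise ValueError(f"unknown DEPLOY_TARGET: {target!r}")
    return DEFAULT_URIS[target]


# The same call works whether the target is a data center or a managed cluster.
print(resolve_mongo_uri({"DEPLOY_TARGET": "atlas"}))
print(resolve_mongo_uri({}))
```

The resulting string would then be handed to whichever MongoDB driver the application uses; queries and data access code stay unchanged when the target moves between on-premises and cloud deployments.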

Data gravity will no doubt have a tremendous impact on how your IT resources coalesce and evolve in the cloud. But that doesn’t mean you have to get trapped. Choose a database that delivers a consistent experience across different environments and avoid going past the point of no return.

Translated from: https://www.sitepoint.com/cloud-data-strategies-preventing-data-black-holes-in-the-cloud/