How AI Can Help Improve Enterprise Data Quality

Hardly anyone who relies on data can say their data is perfect. There is always a gap between the dataset you have and the dataset you wish you had. That gap is what data quality is all about.

Data quality problems exist wherever data is used: in tech and non-tech businesses, in the public sector, in engineering, and in science. Each of these domains has its own data specifics and its own set of data quality criteria.

Enterprise data quality is concerned with ERP data, that is, data describing the flow of business processes in organizations. This includes financial transactions, sales transactions, contracts, and inventories, as well as lists of customers, vendors, and so on.

Large organizations and most medium-sized businesses use highly integrated Enterprise Resource Planning (ERP) systems to run their business processes. ERP data is a central component of such applications; it drives and controls the automatic flow of business processes in them. Every step of this flow adds up to the company's financials. That is why any business wants to make sure its ERP data is good enough to keep its business processes running consistently and correctly.

Companies understand this so well that they spend up to 50% of their data analysts' time finding and correcting data issues.

All modern tools and processes for maintaining enterprise data quality are effectively rule-based: in essence, they work by evaluating data against a set of predefined rules or conditions.
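To make the rule-based principle concrete, here is a minimal sketch in Python. The invoice fields, the rule names, and the `violations` helper are all hypothetical and chosen only for illustration; commercial data quality tools are far more elaborate, but the core idea of checking each record against a fixed list of conditions is the same.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


# Hypothetical rules for a hypothetical invoice table.
RULES = [
    Rule("amount_is_positive",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] > 0),
    Rule("currency_is_known",
         lambda r: r.get("currency") in {"USD", "EUR", "GBP"}),
    Rule("vendor_is_present",
         lambda r: bool(r.get("vendor_id"))),
]


def violations(record: Record, rules: list[Rule] = RULES) -> list[str]:
    """Return the names of all rules that the record fails."""
    return [rule.name for rule in rules if not rule.check(record)]


invoices = [
    {"vendor_id": "V001", "amount": 120.0, "currency": "EUR"},
    {"vendor_id": "", "amount": -5.0, "currency": "EUR"},
    {"vendor_id": "V002", "amount": 80.0, "currency": "XXX"},
]

# Map each record index to the list of rules it violates.
report = {i: violations(inv) for i, inv in enumerate(invoices)}
print(report)
```

Every check in `RULES` had to be written by someone who already knew what could go wrong, which is the property the rest of this article examines.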

This approach has dominated business data landscapes since the mainframe era, and its central principle has not changed since. There is a good reason for that: it is robust and predictable.

The world, however, has changed dramatically since then: corporate databases have grown thousands of times in both volume and complexity. Today, the old rule-based principle has started to show its disadvantages:

  1. As data becomes more diverse, the number of combinations and interactions in data grows exponentially, which means the number of rules required to maintain the same level of data quality grows exponentially too. For businesses, this means the cost and effort they have to spend on data quality also grow fast. This explains why companies have to pay so much to maintain good data quality today.

  2. Any rule-based system has an intrinsic limitation: it can only deal with problems known to the people maintaining the system. But because people learn from mistakes, this also means that every issue they know about has already shown itself as a data incident, and most likely caused losses. This intrinsic dependency renders all rule-based processes reactive. It explains why, in practice, data quality assurance systems are so closely tied to incident management.

  3. All rule-based systems are rigid. This adds the burden of updating the rule sets to keep up with an ever-evolving business, which includes updating documentation, changing and testing new rules, cleaning up old and no longer relevant ones, and so on. For large, older businesses with a long history of changes, this becomes very tricky.

In the past ten years, the pace of change has only increased: more and more businesses are migrating to modern cloud infrastructure and getting access to more powerful databases. The data an average company uses has exploded in size and complexity.

As a result, the data quality function in any large organization is under enormous pressure, which will only get worse with time.

Enterprise data quality is a big business dominated by behemoths such as Informatica, IBM, SAP, Oracle and others. To help businesses, they offer all sorts of apps to simplify and accelerate rule management. But they do not question the founding principle and therefore do not address the fundamental disadvantages of the rule-based model, which has been in use for more than 60 years.

Photo by Julius Silver from Pexels

Unlike others, we do question this model! Over the past three years, we have done extensive research into new ways of handling data quality in typical business data. And we found an answer in AI, as you might have already guessed from the title.

We found that a non-rule-based approach to enterprise data quality is possible and that it has many new benefits, which look so fantastic that they will make any data quality professional sceptical:

  1. There is no need to maintain rules, and therefore there is no scaling problem as your business processes become more complex and your data gets more diverse.

  2. An AI algorithm can discover not-yet-known issues: issues that are already in the data but have not yet shown themselves as incidents.

  3. An AI algorithm can be self-learning, which means you don't need to program it to understand your data or your business process. You don't need up-to-date documentation describing your as-is state to start using it. All you need to do is feed your actual data into it.

  4. The algorithm is also self-adjusting, which means it will automatically keep up with changes in business processes.

  5. Because of the two properties above, it can work in a deploy-and-forget mode.

  6. It can not only find problems but also suggest a fix for every record it flags as wrong.

  7. It can potentially replace most rules in any existing data quality assurance system.

  8. And finally, it can form a closed-loop, fully automated data quality assurance system where data issues are corrected before you know it. All you need to do is watch reports showing how many data quality incidents the algorithm has prevented.
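This article does not describe the algorithm itself, so the following is only a rough sketch of the general idea behind points 2, 3 and 6: learn the usual patterns from the data, flag records that deviate from them, and suggest the most common value as a fix. It uses a plain frequency heuristic rather than a real machine learning model, and the field names, the `min_share` and `min_support` thresholds, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_patterns(records, context_field, target_field):
    """Count how often each target value appears for each context value."""
    patterns = defaultdict(Counter)
    for rec in records:
        patterns[rec[context_field]][rec[target_field]] += 1
    return patterns


def find_suspects(records, context_field, target_field,
                  min_share=0.2, min_support=5):
    """Flag records whose target value is rare for their context.

    Returns (index, observed value, suggested value) tuples. No rule
    about valid values is written by hand; the expected value is
    inferred from the data itself.
    """
    patterns = learn_patterns(records, context_field, target_field)
    suspects = []
    for index, rec in enumerate(records):
        counts = patterns[rec[context_field]]
        total = sum(counts.values())
        if total < min_support:
            continue  # too little data for this context to judge
        share = counts[rec[target_field]] / total
        if share < min_share:
            suggestion, _ = counts.most_common(1)[0]
            suspects.append((index, rec[target_field], suggestion))
    return suspects


# Hypothetical sales orders: country and the currency they were booked in.
orders = (
    [{"country": "DE", "currency": "EUR"}] * 9
    + [{"country": "DE", "currency": "USD"}]      # 1 of 10, likely an error
    + [{"country": "US", "currency": "USD"}] * 6
    + [{"country": "JP", "currency": "JPY"}] * 2  # too few rows to judge
)

suspects = find_suspects(orders, "country", "currency")
print(suspects)
```

Nobody told the code that German orders are normally booked in EUR; it inferred that from the data. The `min_support` guard also hints at a limitation: with only a handful of records per context, there is no pattern to learn from.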

Looks too good to be true, doesn't it? Of course, it has downsides too.

Like any other machine learning algorithm, it will not replace methods that work well without the need for AI, such as validating addresses, phone prefixes, and email addresses. It will not work well when your dataset is small or when every record in it is unique and does not follow any pattern.
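Format checks are a good example of where a plain rule remains the right tool. The sketch below uses a deliberately simple regular expression to test whether a string looks like an email address; the pattern and the `looks_like_email` helper are illustrative assumptions, and production-grade validation is considerably stricter.

```python
import re

# Deliberately simple: something@something.something, with no spaces
# and no additional "@" characters.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    """Return True when the whole string matches the simple pattern."""
    return EMAIL_PATTERN.fullmatch(value) is not None


samples = ["jane.doe@example.com", "jane.doe@example", "not an email"]
checks = {value: looks_like_email(value) for value in samples}
print(checks)
```

A check like this is cheap, deterministic, and easy to explain, so there is little to gain from replacing it with a learned model.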

Horses were the main means of transport for thousands of years until Henry Ford made cars affordable. Now horses are more of a tradition that creates warm feelings in us.

But the critical, unfixable problem of this approach is precisely what makes it so fantastic: it is not rule-based. Because business applications in general have been using business rules for years, the business-rule mindset is deeply rooted in business culture everywhere. Introducing AI algorithms that question this core principle will not be easy.

But complicated doesn't mean impossible! With such an impressive list of benefits and a gradual, step-by-step implementation plan, AI methods such as this will eventually shift business culture from scepticism to cautious enthusiasm, just as happened with Big Data platforms and cloud infrastructure over the past ten years.

You can find us on LinkedIn, Twitter, Facebook or at dataright.ai.

Translated from: https://towardsdatascience.com/how-can-ai-help-to-make-enterprise-data-quality-smarter-9a16fcd4df64
