The Data Middle Platform Is the Next Generation of Big Data
We have all heard the term ‘Citizen Data Scientist’, popularized by Gartner since 2016. If you haven’t, then you may work in one of the ever-shrinking number of industries that have yet to be upended by digital disruption.
A Citizen data scientist is a person who can use or generate models that draw on advanced diagnostic analytics or on predictive and prescriptive capabilities. They often use automation tools such as Alteryx or Power BI, and can typically generate slightly more complex insights than the average data analyst.
Citizen data scientists are an organisation’s way of digitizing: it upskills its existing workforce to tap into the data that usually lies unused around the organisation. Ernst & Young (EY), for example, does this by giving employees free access to a myriad of technology courses on Udemy.
But is this enough? Can Citizen data scientists unlock the potential treasure trove of insights within an organisation while juggling their day-to-day operations and practices and trying to become more data-centered?
Enter the Enterprise Data Scientist.
What is an Enterprise Data Scientist?
The main difference between the Citizen data scientist and the Enterprise data scientist is the focus and scope of work. Typically coming from an organizational or industrial background, the Enterprise data scientist has pivoted completely into the role of a data scientist. Their former industry- or client-facing role has since ended, allowing them to focus primarily on data science and data engineering research and tasks.
The Enterprise Data Scientist is a full-fledged data scientist, but coming from an organisational or industrial background with a different lens.
Let’s face it: the work of a data scientist is not easy. A large amount of time goes into researching and understanding the nature of the data, then testing and developing initial models, then refining and re-refining them further. That is not all: the methodology must be carefully implemented, processes documented and findings recorded.
This is done over and over again.
After all, the backbone of science is built upon meticulous documentation and the ability to replicate results. We aren’t interested in one-hit wonders.
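The document-and-replicate discipline described above can be illustrated with a small sketch. It is a toy, not a description of any EY or IBM workflow: the "model" simply estimates the mean of synthetic data, and the names `ExperimentRecord`, `run_experiment` and `replicate` are hypothetical. The point is that each run records its seed and parameters, so anyone can re-run it and check that the result comes out exactly the same.

```python
import json
import random
import statistics
from dataclasses import asdict, dataclass


@dataclass
class ExperimentRecord:
    """Hypothetical record of one modelling run: what was tried and what came out."""
    seed: int
    params: dict
    score: float


def run_experiment(seed: int, n_samples: int) -> ExperimentRecord:
    """Toy 'model': estimate the mean of a synthetic sample whose true mean is 10.

    A private, seeded RNG makes the run deterministic, so anyone holding the
    record can reproduce the score exactly.
    """
    rng = random.Random(seed)
    data = [10.0 + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    estimate = statistics.fmean(data)
    score = abs(estimate - 10.0)  # absolute error against the known true mean
    return ExperimentRecord(seed=seed, params={"n_samples": n_samples}, score=score)


def replicate(record: ExperimentRecord) -> bool:
    """Re-run from the documented seed and parameters; True if the score matches."""
    rerun = run_experiment(record.seed, **record.params)
    return rerun.score == record.score


# Each refinement step is run, then documented as one JSON line.
log = []
for n_samples in (100, 1_000, 10_000):
    record = run_experiment(seed=42, n_samples=n_samples)
    log.append(record)
    print(json.dumps(asdict(record)))

print("all runs replicate:", all(replicate(r) for r in log))
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps each run isolated from any other code that happens to draw random numbers, which is what makes the replication check meaningful.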
This isn’t a luxury that Citizen data scientists have. Their bottom line is still the delivery of a product or service, with technology merely an enabler. There will be shortcuts and half-baked efforts, but the result is still definitely better than doing things the traditional way, without any technological enablement.
Enterprise Data Scientists, however, fill this gap by positioning themselves internally, for example in an organization’s R&D or Innovation department. There, they can work with Citizen data scientists to develop their initial ideas or prototypes into full-fledged applications or even software.
I write this as an Enterprise Data Scientist myself. Having previously held a client-facing role, I have since pivoted internally to focus on researching and developing data software and platforms for internal and commercial use.
Growing Alliances
The growing data needs of organizations and businesses have not gone unnoticed. Some organizations have started taking strategic steps to further accelerate the digitization of work. Just recently, EY and IBM announced a global multi-year alliance that will undoubtedly benefit the clients of both consulting practices.
Aside from the commercial value-add, collaborations of this nature present rare opportunities for research and development, which would arguably drive a greater share of the value in the long run.
EY is a multinational professional services network with a business presence that spans all industries across the globe, while IBM Watson Group specialises in commercial and enterprise artificial intelligence and natural language processing.
Watson has its own suite of services that readily enables day-to-day data science activities. Notable among them is IBM Cloud Pak® for Data, a fully integrated data platform that allows storage of assets, management of notebooks and provisioning of AI services. Most importantly, it provides an environment where Enterprise data scientists can collaborate directly with IBM’s data experts on building models, applications and products.
Role and Functions
Point of Connection
For the past couple of years, my role as an Enterprise data scientist at EY has had me working closely with IBM. With two organizations running traditionally different business models, you can imagine that there would be a significant gap in ways of working.
However, I am here to share that the experience has been fairly seamless, thanks to the way each team’s roles were set up for the joint delivery of technology products. The illustration below shows our typical data science team structure during the development of such products.
Illustration of the various roles in a collaborative technology development setting
The industry-native Enterprise data scientists (the in-house data experts) should serve as the first point of contact for their external data science counterparts at the tech-native firm. They can be further assisted by data analysts or Citizen data scientists, who perform simple data processing and analysis. Unlike the fully dedicated Enterprise data scientists, Citizen data scientists usually serve in a rotational role and are brought in to contribute insights gained from their particular domain.
The Enterprise data scientist is responsible for designing the organization’s overall data strategy, given their domain knowledge and fluency in data architecture. This strategy is communicated to the external tech-native data scientists, architects and engineers, who handle the purely technical aspects of development.
Marrying Industrial and Technical Expertise
Because the Enterprise data scientists are industry/domain focused, they often have the most complete understanding of how data in their domain is expressed and the nuances of using them in modelling.
Being a practitioner with 10 years of experience in, say, healthcare does not make you an expert in healthcare data. Often the opposite is true: experienced domain practitioners can harbor biases based on their experience and be unable to view the data objectively.
At the other end of the spectrum, non-domain-specific data scientists are equipped with statistical expertise and know-how in handling data. However, they need a great deal of time to adapt that expertise to a new and complex domain in which they have no experience.
By having internally-housed Enterprise data scientists, organizations save precious time and resources in bridging this wide gap.
Levels of Explainability
In the end, the successful collaboration between industry-native organisations and tech-native organisations hinges on the level of communication achieved.
Any organization that has experienced rounds of digital restructuring understands that technological literacy is a real issue.
Unless you work in technology, it is highly unlikely that you will understand data processes. Likewise, tech-natives have a hard time understanding domains that take decades of experience to master.
Returning to the function of bridging the gap, Enterprise data scientists serve as communicators, or as an additional layer of explainability, for the benefit of both industry-natives and tech-natives.
From an industry perspective, they can capture and explain data in a way that makes sense to the business, which eases adoption. From a technology perspective, they understand architectural and engineering limitations, and can keep painfully unreasonable requests to a minimum.
To make the case for Enterprise data scientists, consider this example:
In the world of accounting, we have external auditors and internal accountants. Organisations would never hire a person that ‘knows something about accounting’ for the internal role; they will hire an actual accountant.
Likewise, in a future where technology and data take center stage, organisations need to have in-house data science capability.
Organisations need to have their own Enterprise Data Scientists.
Thanks for Reading!
Share if you found this article useful! And let me know if you would like to see more stories from any of my experiments above.
Source: https://towardsdatascience.com/next-gen-enterprise-data-scientists-d9ef729d80b9