Alibaba AI platform
Artificial intelligence (AI) is widely used in today's business, for example in data analytics, natural language processing, and process automation. Building AI components into digital business models creates value by improving back-office efficiency and enhancing the customer experience. AI rests on decades of research into hard computer science problems and is now rapidly transforming business model innovation. Companies that ignore AI will be vulnerable to competitors that are equipped with it. While companies like Google, Amazon, and Tesla have already innovated their business models with AI, small and mid-cap companies have limited budgets for building such capabilities. One high-effort task in creating AI services is pre-processing data and training machine learning models. To keep pace with the market, setting up internal capabilities for pre-processing is often not enough. Google, for example, uses a very pragmatic solution: the task of labeling and validating data for its machine learning models is outsourced to Google's own users. Have you ever thought about the purpose of Google's CAPTCHA? Sure, it is used to prevent bots from intruding into applications, but besides this, millions of users each day act as part of Google's analytics pre-processing team, validating machine learning algorithms for free. If you are not one of the Googles out there, you might be interested in how you can meet the rising demand for AI.
Data Labeling for Machine Learning
Machine learning uses algorithms that learn to solve a specific task by relying on patterns in sample data, whether that data comes from training or from practice. Of the several approaches to machine learning, supervised learning relies heavily on labeled data to create models. The following examples highlight use cases that require labeling huge amounts of data:
- Autonomous driving, which requires identifying pedestrians, vehicles, and traffic lights
- Service desk requests, which require urgency classification before humans are involved
- Quality inspection of manufactured products to identify rejects
- Personal assistant systems that must understand conversational context
Data scientists spend about 80% of their effort on pre-processing and labeling data for training scenarios. Only 20% of the effort goes into building machine learning models. This is why crowdsourcing platforms that take over the repetitive work of labeling data arose. Labeling data in-house requires hiring employees up front, but it has the advantage of a transparent labeling process, because you know the people who perform the labeling. Crowdsourcing platforms, by contrast, allow companies to distribute thousands of tasks and to improve the return on investment by incurring operational expenditure only as demand requires.
Crowdsourcing Pattern
The crowdsourcing pattern targets the solution of human tasks by engaging an internet crowd, which on the one hand is a scalable workforce and on the other hand is more flexible regarding required qualifications. In exchange for their services, contributors receive a small reward per task or have the chance to win one-time recognition. According to Gassmann et al. (Link), the crowdsourcing pattern is often used to foster innovative technology and business ideas. Procter & Gamble's product development is one example: the company collaborates with external crowds to explore innovative solutions for product packaging, design, and marketing. This example shows that external crowds can be deeply integrated into the internal product development process. In AI service development, the crowd can take over data labeling so that internal data scientists focus on developing value-adding machine learning models rather than on repetitive data pre-processing.
Crowdsourcing Provider
To address the need to outsource simple tasks, crowdsourcing companies offer services that distribute work to a virtual, distributed workforce. The use cases and features they support range from data processing, creative design tasks, and translation requests to any use case for which you can train the crowd yourself. Well-known crowdsourcing providers include Amazon Mechanical Turk, MicroWorkers, ClickWorker, MicroTask, and Scale. The following list shows critical factors to consider when selecting a crowdsourcing provider:
Maturity: The maturity of a crowdsourcing provider indicates whether its solution has the availability and robustness needed for reliable service operation.
Use Case: Platforms have specialized in different use cases according to their main services and cater to the peculiarities of those use cases.
Technology: The crowdsourcing platform should offer a machine-readable interface so that it can be used in an automated way. Moreover, the technical foundation should scale with an increasing number of requests.
Quality: Through recurring quality validation of workers and a pre-defined set of requirements, a platform ensures the quality of the results the crowd provides.
Security: Confidentiality and secure storage of data are critical when developing AI services and features, because data is their core.
Cost: Outsourcing tasks to external crowdsourcing platforms must be profitable, or at least match the price you are willing to pay for the level of functionality these platforms provide.
The criteria above should help to identify the right crowdsourcing platform for a given set of tasks. It is hard to say in general which of the platforms mentioned is best; this needs to be decided case by case.
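One way to make such a case-by-case decision explicit is a weighted scoring of the six criteria. The following Python sketch is only an illustration: the function names, the 1-to-5 rating scale, the weights, and the placeholder providers `provider_a` and `provider_b` are assumptions and do not describe any real platform.

```python
from __future__ import annotations

# The six selection criteria from the list above.
CRITERIA = ("maturity", "use_case", "technology", "quality", "security", "cost")


def weighted_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average rating across all criteria."""
    missing = [c for c in CRITERIA if c not in ratings or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


def rank_providers(
    candidates: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Score every candidate and sort from best to worst."""
    scored = [(name, weighted_score(r, weights)) for name, r in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: this team cares most about quality and security.
    weights = {"maturity": 1, "use_case": 2, "technology": 1,
               "quality": 3, "security": 3, "cost": 2}
    # Hypothetical ratings on a 1 (poor) to 5 (excellent) scale.
    candidates = {
        "provider_a": {"maturity": 5, "use_case": 3, "technology": 4,
                       "quality": 3, "security": 4, "cost": 4},
        "provider_b": {"maturity": 3, "use_case": 5, "technology": 4,
                       "quality": 4, "security": 4, "cost": 3},
    }
    for name, score in rank_providers(candidates, weights):
        print(f"{name}: {score:.2f}")
```

The weights encode what matters for a specific project, so the same ratings can lead to a different ranking for a different use case.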
AI-Labeling Crowdsourcing Platform Pattern
The AI-Labeling Crowdsourcing Platform pattern addresses the emerging business challenge of meeting the demand and effort involved in developing AI services. Especially for companies without a workforce dedicated to data labeling and validation, such platforms are key to staying competitive. From the perspective of a company developing AI services, the main characteristics of the pattern are the following:
Customer Relationship: AI services improve customer relationships by offering highly personalized services and new services.
Key Activities: AI services entail considerable effort for labeling the data used to train AI models and for recurring validation of those models.
Key Partner: Because data labeling and model validation are outsourced, the crowdsourcing platform provider becomes an important partner in ensuring AI service operation.
Key Resources: With the main effort outsourced, the key resource is the automated integration of crowdsourcing services into the AI service.
Cost Structure: Crowdsourcing services are paid for on demand as micro-fees.
The characteristics above are visualized on the business model canvas below, following Osterwalder and Pigneur (Link).
Scale and the Labeling of AI Data
Amazon Mechanical Turk is best known as the crowdsourcing platform provider that first entered the market for automating human intelligence tasks. It is part of the Amazon Web Services offering and is commonly used for text classification, transcription, surveys, and data labeling. Nevertheless, this article highlights the Scale platform, a simple and effective alternative to Amazon Mechanical Turk that focuses strongly on computer vision automation by providing managed labeling services.
The default use cases for Scale's platform range from retail, autonomous driving, robotics, and drones to augmented reality. The API allows companies to submit images, 3D point clouds, videos, texts, and whole documents for labeling and therefore offers great flexibility in the supported artifacts. After an artifact is sent via an API call together with the targeted data service (e.g. extraction, classification, segmentation, or transcription), the request is reviewed for plausibility, processed by the crowd, validated with statistical checks, and finally returned. To give an idea of the pricing for outsourcing human tasks to Scale, the following formula applies, as an example, to the classification of a single image (Link):
$0.08 + $0.08 * number of requested classifications
This pricing model also marks a key difference from Amazon Mechanical Turk. While the pricing for Scale's managed services is fixed, Amazon Mechanical Turk's pricing is request-based, which lets requesters bid for higher-priority processing. This may be an advantage or a disadvantage; at least for me, Scale provides the more transparent pricing approach.
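As a worked example of the pricing formula above, the following Python sketch computes the cost of classifying images. The function name, the `num_images` parameter, and the assumption that the per-image price scales linearly without volume discounts are illustrative and not taken from Scale's documentation; the $0.08 figures are the ones quoted in this article and may not match current prices.

```python
from decimal import Decimal

# Figures quoted in the article's formula; treat them as example values.
BASE_FEE = Decimal("0.08")                # fixed fee per image
FEE_PER_CLASSIFICATION = Decimal("0.08")  # fee per requested classification


def image_classification_cost(num_classifications: int, num_images: int = 1) -> Decimal:
    """Cost in USD: (base fee + fee per classification * count) per image.

    Assumes the per-image price simply multiplies across images.
    """
    if num_classifications < 0 or num_images < 0:
        raise ValueError("counts must be non-negative")
    per_image = BASE_FEE + FEE_PER_CLASSIFICATION * num_classifications
    return per_image * num_images


if __name__ == "__main__":
    print(image_classification_cost(3))        # one image, three classifications
    print(image_classification_cost(3, 1000))  # a batch of 1,000 such images
```

Under these assumptions, one image with three requested classifications costs $0.32, and 1,000 such images cost $320.00. `Decimal` is used instead of `float` so that the cent amounts stay exact.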
Example: The Toyota Research Institute's AI-Labeling for Autonomous Driving
Scale's case study on the Toyota Research Institute shows how managed labeling services can be integrated into a business model for autonomous driving (Link). The mission of the Toyota Research Institute is to research autonomous driving to its full extent, with the system taking over all responsibility for driving. Developing AI services for autonomous driving involves large volumes of data. On the one hand, the machine learning team could not label that amount of data at the required quality on its own; on the other hand, any trade-off must not compromise labeling quality. With Scale, the Toyota Research Institute found a labeling provider that took care of the data annotation pipeline in a fully managed way, without the need to significantly grow the data engineering team. The Toyota Research Institute gained great flexibility, including the labeling of both 2D and 3D data. As demand from the Toyota Research Institute grew quickly, Scale even added custom simulation features and increased labeling throughput tenfold.
Best Practices
The following best practices help you get the most out of outsourcing data labeling tasks to crowd workers:
Worker qualification: Define the qualifications required of the workers that the crowdsourcing platform assigns to your tasks. For example, if the AI service targets a local audience, the workers should come from that audience.
Pre-processing: Define similarity thresholds so that near-duplicate data is filtered out before it reaches the crowd; processing fewer items saves time and money.
Shadow-crowd: To mitigate risk, shadow the crowdsourcing platform with an alternative provider so that you do not depend on a single platform.
Own workers: Let your own employees be part of the crowd; knowing how the crowd process works and having internal quality assurance for the labeling are both beneficial.
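The pre-processing practice above can be sketched for text data as follows. This is a minimal illustration, not a method from the article: it assumes word-level Jaccard similarity as the similarity measure and a greedy filter that keeps an item only if it is not too similar to anything already kept. The function names and the default threshold of 0.8 are illustrative choices; images or point clouds would need a different similarity measure.

```python
from __future__ import annotations


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_near_duplicates(items: list[str], threshold: float = 0.8) -> list[str]:
    """Keep an item only if its similarity to every kept item is below threshold."""
    kept: list[str] = []
    for item in items:
        if all(jaccard(item, other) < threshold for other in kept):
            kept.append(item)
    return kept


if __name__ == "__main__":
    tickets = [
        "printer not working on floor 3",
        "Printer not working on floor 3",
        "cannot reset my password",
        "printer not working on floor 2",
    ]
    # Only the items returned here would be sent to the crowd for labeling.
    for ticket in filter_near_duplicates(tickets):
        print(ticket)
```

A lower threshold removes more items and reduces labeling cost further, at the risk of discarding examples that differ in ways that matter for the model. The greedy comparison is quadratic in the number of kept items, so very large data sets would need an approximate technique instead.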
Related Patterns
The AI-Labeling Crowdsourcing Platform pattern is best combined with the Leverage Customer Data pattern (Link). That pattern creates new value by collecting customer data or by producing business insights that can be sold to the customer or even to third parties. AI services that improve the customer relationship can further be used to create new functionality or services that users might be interested in. Spotify's recommendation algorithm is an example: a recommendation not only gives the end user a suggestion for new music tracks but is also an insight into customer behavior that third parties are interested in.
Conclusion and Final Thoughts
Artificial intelligence provides strategic advantages for back-office functionality and customer relationships. While companies like Google, Amazon, and Tesla already rely on AI, other companies lack such technology. To catch up on AI services, companies can leverage crowdsourcing platforms that efficiently take care of high-effort tasks such as data labeling, which allows them to focus on developing value-adding machine learning algorithms.
Scale offers fully managed data labeling services for building AI applications. With its API, Scale makes it easy to integrate the managed services into other applications and helps to accelerate such developments. Besides standard labeling of data such as images or videos, Scale offers the simulation of 3D point clouds, which makes it easier, for example, for original equipment manufacturers to advance autonomous driving.
Further Readings
Consider the following readings for more information on digital business models and how to adopt AI in your business.
Business Models of a Digital Era (Link): Do not miss this article about how digital transformation and digital natives are changing business. By adopting emerging technologies and responding to new customer behaviors, companies are developing a variety of new business model patterns that reflect the characteristics of a digital era.
Artificial Intelligence for Business: A Roadmap for Getting Started with AI (Link): This book helps readers understand how organizations can adopt AI technology by identifying business gaps and opportunities that can be addressed easily. Furthermore, it provides insights on how to find critical data sets, build prototypes to mitigate risk, and apply best practices for production-ready AI systems, which might include organizational adaptation.
Translated from: https://medium.com/swlh/ai-labeling-crowdsourcing-platforms-630adbc79c40