To download all of the videos and slides, follow the WeChat official account (bigdata_summit) and use the "Video Download" menu.
Billions of Messages a Day – Yelp’s Real-time Data Pipeline
by Justin Cunningham, Technical Lead, Software Engineering, Yelp
video, slide
Yelp moved quickly into building out a comprehensive service-oriented architecture, and before long had over 100 data-owning production services. Distributing data across an organization creates a number of issues, particularly around the cost of joining disparate data sources, dramatically increasing the complexity of bulk data applications. Straightforward solutions like bulk data APIs and sharing data snapshots have significant drawbacks. Yelp’s Data Pipeline makes it easier for these services to communicate with each other, provides a framework for real-time data processing, and facilitates high-performance bulk data applications – making large SOAs easier to work with. The Data Pipeline provides a series of guarantees that makes it easy to create universal data producers and consumers that can be mashed up into interesting real-time data flows. We’ll show how a few simple services at Yelp lay the foundation that powers everything from search to our experimentation framework.
Body Armor for Distributed System
by Michael Egorov, Co-founder and CTO, NuCypher
video, slide
We show a way to make Kafka end-to-end encrypted. This means data is only ever decrypted on the producer and consumer side; it is never decrypted broker-side. Importantly, all Kafka clients have their own encryption keys. There is no pre-shared encryption key. Our approach can be compared to TLS extended to more than two connected parties.
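To make the end-to-end idea concrete, here is a minimal sketch of encrypting payloads before they ever reach the broker. It is not the per-client-key, proxy re-encryption scheme the talk describes – it uses a single symmetric key purely for brevity, which the talk explicitly avoids – and the broker address and "payments" topic are illustrative assumptions.

```python
# Minimal sketch of end-to-end encryption over Kafka: the payload is encrypted
# before produce() and decrypted after poll(), so the broker only ever stores
# ciphertext. NOTE: one shared symmetric key is used here only for illustration;
# the talk's approach gives every client its own key with no pre-shared key.
from confluent_kafka import Producer, Consumer
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, keys would be managed per client, out of band
cipher = Fernet(key)

# Producer side: encrypt locally, then publish ciphertext.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("payments", value=cipher.encrypt(b'{"amount": 42}'))
producer.flush()

# Consumer side: read ciphertext from the broker, then decrypt locally.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payments-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    print(cipher.decrypt(msg.value()))   # plaintext exists only at the endpoints
consumer.close()
```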
DNS for Data: The Need for a Stream Registry
by Praveen Hirsave, Director Cloud Engineering, HomeAway
video, slide
As organizations increasingly adopt streaming platforms such as Kafka, the need for visibility and discovery has become paramount. With the advent of self-service streaming and analytics, increasing overall speed – not only time-to-signal, but also time-to-production – is becoming the difference between winners and losers. Beyond Kafka being at the core of successful streaming platforms, there is a need for a stream registry. Come to this session to find out how HomeAway is solving this with a “just right” approach to governance.
Efficient Schemas in Motion with Kafka and Schema Registry
by Pat Patterson, Community Champion, StreamSets Inc.
video, slide
Apache Avro allows data to be self-describing, but carries an overhead when used with message queues such as Apache Kafka. Confluent’s open source Schema Registry integrates with Kafka to allow Avro schemas to be passed ‘by reference’, minimizing overhead, and can be used with any application that uses Avro. Learn about Schema Registry, using it with Kafka, and leveraging it in your application.
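For readers unfamiliar with the “by reference” mechanism, the following is a minimal sketch using Confluent’s Python client: the Avro schema is registered with Schema Registry once, and each message carries only a compact schema ID instead of the full schema. The broker and registry addresses, the "users" topic, and the schema itself are illustrative assumptions, not details from the talk.

```python
# Sketch of producing Avro records "by reference": AvroSerializer registers the
# schema with Schema Registry and prepends only a small schema ID to each message,
# rather than shipping the full schema in every payload.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

user_schema = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name",  "type": "string"},
    {"name": "email", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(registry, user_schema),
})

producer.produce(topic="users", key="user-1",
                 value={"name": "Ada", "email": "ada@example.com"})
producer.flush()
```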
From Scaling Nightmare to Stream Dream : Real-time Stream Processing at Scale
by Amy Boyle, Software Engineer, New Relic
video, slide
On the events pipeline team at New Relic, Kafka is the thread that stitches our micro-service architecture together. We receive billions of monitoring events an hour, which customers rely on us to alert on in real-time. Facing more than tenfold growth in the system, learn how we avoided a costly scaling nightmare by switching to a streaming system based on Kafka. We follow a DevOps philosophy at New Relic, so I have a personal stake in how well our systems perform. If evaluation deadlines are missed, I lose sleep and customers lose trust. Without necessarily setting out to do so from the start, we’ve gone all in, using Kafka as the backbone of an event-driven pipeline, as a datastore, and for streaming updates to the system. Hear about what worked for us, what challenges we faced, and how we continue to scale our applications.
How Blizzard Used Kafka to Save Our Pipeline (and Azeroth)
by Jeff Field, Systems Engineer, Blizzard
video, slide
When Blizzard started sending gameplay data to Hadoop in 2013, we went through several iterations before settling on Flume agents in many data centers around the world reading from RabbitMQ and writing to central Flume agents in our Los Angeles datacenter. While this worked at first, by 2015 we were hitting problems scaling to the number of events required. This is how we used Kafka to save our pipeline.
Kafka Connect Best Practices – Advice from the Field
by Randall Hauch, Engineer, Confluent
video, slide
This talk will review the Kafka Connect Framework and discuss building data pipelines using the library of available Connectors. We’ll deploy several data integration pipelines and demonstrate:
best practices for configuring, managing, and tuning the connectors (a sample connector registration is sketched after this list)
tools to monitor data flow through the pipeline
using Kafka Streams applications to transform or enhance the data in flight.
下面的内容来自机器翻译:
本次演讲将回顾Kafka Connect Framework并讨论使用可用连接器库构建数据管道。我们将部署多个数据集成管道并进行演示:
配置,管理和调整连接器的最佳实践
工具来监视通过管道的数据流
使用Kafka Streams应用程序来转换或增强飞行中的数据。
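As a concrete starting point for the first item above, here is a minimal sketch of registering a connector through a Connect worker’s REST API and checking its status. It assumes a worker listening on localhost:8083 and uses the FileStreamSource connector that ships with Apache Kafka; the connector name, file path, and topic are hypothetical, not taken from the talk.

```python
# Sketch: register a source connector with a Kafka Connect worker via its REST API.
# Assumes a worker on localhost:8083; the file path and topic are examples only.
import requests

connector = {
    "name": "app-log-source",
    "config": {
        # FileStreamSource ships with Apache Kafka and tails a file into a topic.
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/var/log/app.log",
        "topic": "app-logs",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()

# The same REST API exposes status, useful for monitoring the pipeline.
status = requests.get("http://localhost:8083/connectors/app-log-source/status").json()
print(status["connector"]["state"])
```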
One Data Center is Not Enough: Scaling Apache Kafka Across Multiple Data Centers
by Gwen Shapira, Product Manager, Confluent
video, slide
You have made the transition from single machines and one-off solutions to distributed infrastructure in your data center powered by Apache Kafka. But what if one data center is not enough? In this session, we review resilient data pipelines with Apache Kafka that span multiple data centers. We provide an overview of best practices and common patterns including key areas such as architecture and data replication as well as disaster scenarios and failure handling.
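To illustrate the cross-datacenter replication pattern the session covers, here is a toy consume-then-produce loop that mirrors one topic from a source cluster to a destination cluster. Real deployments use MirrorMaker or Confluent Replicator rather than a hand-rolled loop, and the cluster addresses and "clickstream" topic are assumptions for the sketch.

```python
# Toy illustration of cross-datacenter replication: consume from the source
# cluster in DC1 and re-produce to the destination cluster in DC2.
from confluent_kafka import Consumer, Producer

source = Consumer({
    "bootstrap.servers": "kafka.dc1.example.com:9092",   # primary data center
    "group.id": "dc2-mirror",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
destination = Producer({"bootstrap.servers": "kafka.dc2.example.com:9092"})

source.subscribe(["clickstream"])
try:
    while True:
        msg = source.poll(1.0)
        if msg is None or msg.error():
            continue
        # Preserve key and value so consumers in DC2 see identical records.
        destination.produce(msg.topic(), key=msg.key(), value=msg.value())
        destination.poll(0)   # serve delivery callbacks
        source.commit(msg)    # for strict at-least-once, commit only after delivery is confirmed
except KeyboardInterrupt:
    pass
finally:
    destination.flush()
    source.close()
```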