[Spark Basics] -- Spark Summit (June 5-7, 2017)

Spark Summit (June 5-7, 2017, San Francisco): agenda posted

 

1. Official announcement: http://spark.apache.org/news/spark-summit-june-2017-agenda-posted.html

2. Agenda: https://spark-summit.org/2017/schedule/

3. Registration: https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196

Happily, two engineers from a Chinese company are among the speakers:

 

  • Ron Hu (Huawei Technologies)
  • Zhenhua Wang(Huawei Technologies)

 

 

4. The agenda is as follows

7:00 AM

Registration

 

TRAINING ROOM 1

TRAINING ROOM 2

TRAINING ROOM 3

TRAINING ROOM 4

TRAINING ROOM 5

TRAINING ROOM 6

TRAINING ROOM 7

9:00 AM

Training: Data Science With Apache Spark 2.x

(9:00 AM–12:00 PM)

Training: Exploring Wikipedia 2 With Apache Spark 2.x

(9:00 AM–12:00 PM)

Training: Apache Spark Intro for Machine Learning and Data Science

(9:00 AM–12:00 PM)

Training: Apache Spark Intro for Data Engineering

(9:00 AM–12:00 PM)

Training: Just Enough Scala for Spark

(9:00 AM–12:00 PM)

Training: Architecting a Data Platform

(9:00 AM–12:00 PM)

Training: Building Your First Big Data Application on AWS

(9:00 AM–12:00 PM)

12:00 PM

Lunch

 

TRAINING ROOM 1

TRAINING ROOM 2

TRAINING ROOM 3

TRAINING ROOM 4

TRAINING ROOM 5

TRAINING ROOM 6

TRAINING ROOM 7

1:00 PM

Training: Data Science With Apache Spark 2.x

(1:00 PM–5:00 PM)

Training: Exploring Wikipedia 2 With Apache Spark 2.x

(1:00 PM–5:00 PM)

Training: Apache Spark Intro for Machine Learning and Data Science

(1:00 PM–5:00 PM)

Training: Apache Spark Intro for Data Engineering

(1:00 PM–5:00 PM)

Training: Just Enough Scala for Spark

(1:00 PM–5:00 PM)

Training: Architecting a Data Platform

(1:00 PM–5:00 PM)

Training: Building Your First Big Data Application on AWS

(1:00 PM–5:00 PM)

6:00 PM

Meetup

Join us for an evening Bay Area Apache Spark Meetup at the 10th Spark Summit featuring tech-talks about using Apache Spark at scale from Pepperdata’s CTO Sean Suchter, RISELab’s Dan Crankshaw, and Databricks’ Spark committers…

 

DAY 2 • TUESDAY, JUNE 6 • DEVELOPER DAY

7:00 AM

Registration

9:05 AM

What to Expect in 2017 for Big Data and Apache Spark

  • Matei Zaharia (Databricks)
  • Tim Hunter (Databricks)

9:30 AM

Snorkel: Dark Data and Machine Learning

  • Christopher Ré (Stanford)

Building applications that can read and analyze a wide variety of data may change the way we do science and make business decisions. However, building such applications is challenging: real world data is expressed in…

9:45 AM

Unleashing Data Intelligence with Intel and Apache Spark

  • Michael Greene (Intel)

Organizations are developing deep learning applications to derive new insights, identify new opportunities and uncover new efficiencies. However, deep learning application development often means tapping into multiple frameworks, libraries, and clusters: a complex, time-consuming, and costly…

9:55 AM

Rise Lab Fireside Chat

  • Ben Lorica (O’Reilly Media)
  • Ion Stoica (UC Berkeley AMP/RISE Lab & Databricks)

Ben Lorica and Ion Stoica discuss the growth and new projects taking place at Rise Lab.

10:15 AM

Keynote by Riot Games

  • Wes Kerr (Riot Games)

10:30 AM

Break

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

11:00 AM

DEVELOPER

A Deep Dive into Spark SQL's Catalyst Optimizer

  • Yin Huai(Databricks)

(11:00 AM–11:30 AM)

MACHINE LEARNING

Challenging Web-Scale Graph Analytics with Apache Spark

  • Xiangrui Meng(Databricks)

(11:00 AM–11:30 AM)

SPARK ECOSYSTEM

Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage

  • Karthikeyan Nagalingam(NetApp)
  • Nilesh Bagad(NetApp)

(11:00 AM–11:30 AM)

SPARK EXPERIENCE AND USE CASES

Scaling Up: How Switching to Apache Spark Improved Performance, Realizability, and Reduced Cost on a Very Large Scale ML Application

  • Kexin Xie(Salesforce)
  • Yacov Salomon(Salesforce)

(11:00 AM–11:30 AM)

ENTERPRISE

Spark Compute as a Service at Paypal

  • Prabhu Kasinathan(PayPal)

(11:00 AM–11:30 AM)

STREAMING

SSR: Structured Streaming on R for Machine Learning

  • Felix Cheung(Microsoft)

(11:00 AM–11:30 AM)

RESEARCH

Scaling Genetic Data Analysis with Apache Spark

  • Jonathan Bloom(Broad Institute of MIT and Harvard)
  • Timothy Poterba(Broad Institute of MIT and Harvard)

(11:00 AM–11:30 AM)

SPONSORED SESSIONS

TBA

(11:00 AM–11:30 AM)

TECHNICAL DEEP DIVES

Data Science Deep Dive: Spark ML with High Dimensional Labels

  • Michael Zargham(Cadent)
  • Stefan Panayotov(Cadent)

(11:00 AM–11:30 AM)

11:40 AM

DEVELOPER

TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters

  • Andy Feng (Yahoo)
  • Lee Yang (Yahoo)

(11:40 AM–12:10 PM)

MACHINE LEARNING

Needle in the Haystack—User Behavior Anomaly Detection for Information Security

  • Ping Yan(Salesforce.com)
  • Wei Deng(Salesforce)

(11:40 AM–12:10 PM)

SPARK ECOSYSTEM

Apache Kylin: Speed Up Cubing with Apache Spark

  • Luke Han(Kyligence, Inc.)
  • Shaofeng Shi (Kyligence, Inc.)

(11:40 AM–12:10 PM)

SPARK EXPERIENCE AND USE CASES

Incremental Processing on Large Analytical Datasets

  • Prasanna Rajaperumal (Uber)
  • Vinoth Chandar(Uber)

(11:40 AM–12:10 PM)

ENTERPRISE

Using SparkML to Power a DSaaS (Data Science as a Service)

  • Kiran Muglurmath(Comcast)
  • Sridhar Alla(Comcast)

(11:40 AM–12:10 PM)

STREAMING

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling

  • Jim Dowling (KTH Royal Institute of Technology)

(11:40 AM–12:10 PM)

RESEARCH

Lazy Join Optimizations Without Upfront Statistics

  • Matteo Interlandi(UCLA)

(11:40 AM–12:10 PM)

SPONSORED SESSIONS

TBA

(11:40 AM–12:10 PM)

TECHNICAL DEEP DIVES

Data Science Deep Dive: Spark ML with High Dimensional Labels (continues)

  • Michael Zargham(Cadent)
  • Stefan Panayotov(Cadent)

(11:40 AM–12:10 PM)

12:20 PM

DEVELOPER

Hive Bucketing in Apache Spark

  • Tejas Patil(Facebook)

(12:20 PM–12:50 PM)

MACHINE LEARNING

Random Walks on Large Scale Graphs with Apache Spark

  • Min Shen(LinkedIn)

(12:20 PM–12:50 PM)

SPARK ECOSYSTEM

Building a Unified Data Pipeline with Apache Spark and XGBoost

  • Nan Zhu(Microsoft)

(12:20 PM–12:50 PM)

SPARK EXPERIENCE AND USE CASES

How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x

  • Richard Garris(Databricks)

(12:20 PM–12:50 PM)

ENTERPRISE

How Apache Spark and AI Powers UberEats

  • Chen Jin (Uber)
  • Xian Xing Zhang(Uber Technologies)

(12:20 PM–12:50 PM)

STREAMING

The Top Five Mistakes Made When Writing Streaming Applications

  • Mark Grover(Cloudera)
  • Ted Malaska(Blizzard, Inc.)

(12:20 PM–12:50 PM)

RESEARCH

Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash

  • Patrick Stuedi(IBM)

(12:20 PM–12:50 PM)

SPONSORED SESSIONS

TBA

(12:20 PM–12:50 PM)

TECHNICAL DEEP DIVES

Ray: A Cluster Computing Engine for Reinforcement Learning Applications

  • Philipp Moritz
  • Robert Nishihara

(12:20 PM–12:50 PM)

12:50 PM

Lunch

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

2:00 PM

DEVELOPER

Apache Spark MLlib's Past Trajectory and New Directions

  • Joseph Bradley(Databricks)

(2:00 PM–2:30 PM)

MACHINE LEARNING

Extending Spark Machine Learning: Adding Your Own Algorithms and Tools

  • Holden Karau (IBM)
  • Seth Hendrickson(IBM)

(2:00 PM–2:30 PM)

SPARK ECOSYSTEM

Building Data Product Based on Apache Spark at Airbnb

  • Jingwei Lu (Airbnb)
  • Liyin Tang (Airbnb)

(2:00 PM–2:30 PM)

SPARK EXPERIENCE AND USE CASES

Building a Versatile Analytics Pipeline on Top of Apache Spark

  • Mikhail Chernetsov(Grammarly)

(2:00 PM–2:30 PM)

ENTERPRISE

Herding Cats: Migrating Dozens of Oddball Analytics Systems to Apache Spark

  • John Cavanaugh(HP)

(2:00 PM–2:30 PM)

STREAMING

Real-Time Machine Learning Analytics Using Structured Streaming and Kinesis Firehose

  • Caryl Yuhas(Databricks)
  • Myles Baker(Databricks)

(2:00 PM–2:30 PM)

RESEARCH

Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on Spark and MPI Using Three Case Studies

  • Michael Mahoney(UC Berkeley)

(2:00 PM–2:30 PM)

SPONSORED SESSIONS

TBA

(2:00 PM–2:30 PM)

TECHNICAL DEEP DIVES

Cost-Based Optimizer in Apache Spark 2.2

  • Ron Hu (Huawei Technologies)
  • Sameer Agarwal(Databricks)

(2:00 PM–2:30 PM)

2:40 PM

DEVELOPER

Informational Referential Integrity Constraints Support in Apache Spark

  • Ioana Delaney(IBM)
  • Suresh Thalamati(IBM)

(2:40 PM–3:10 PM)

MACHINE LEARNING

Fuzzy Matching on Apache Spark

  • Jennifer Shin (8 Path Solutions)

(2:40 PM–3:10 PM)

SPARK ECOSYSTEM

Extending the R API for Spark with sparklyr and Microsoft R Server

  • Ali Zaidi (Microsoft)

(2:40 PM–3:10 PM)

SPARK EXPERIENCE AND USE CASES

Best Practices for Using Alluxio with Apache Spark

  • Cheng Chang(Alluxio)
  • Haoyuan Li (Alluxio)

(2:40 PM–3:10 PM)

ENTERPRISE

Scaling Data Science Capabilities with Apache Spark at Stitch Fix

  • Derek Bennett(Stitch Fix)

(2:40 PM–3:10 PM)

STREAMING

A Practical Approach to Building a Streaming Processing Pipeline for an Online Advertising Platform

  • Amit Ramesh(Yelp)
  • Yifan Wang (Yelp)

(2:40 PM–3:10 PM)

RESEARCH

Apache Spark on Supercomputers: A Tale of the Storage Hierarchy

  • Costin Iancu(Lawrence Berkeley National Laboratory)
  • Nicholas Chaimov(University of Oregon)

(2:40 PM–3:10 PM)

SPONSORED SESSIONS

TBA

2:40 PM (2:40 PM–2:55 PM)

SPONSORED SESSIONS

TBA

2:55 PM (2:55 PM–3:10 PM)

TECHNICAL DEEP DIVES

Cost-Based Optimizer in Apache Spark 2.2 (continues)

  • Wenchen Fan(Databricks)
  • Zhenhua Wang(Huawei Technologies)

(2:40 PM–3:10 PM)

3:20 PM

DEVELOPER

Tricks of the Trade to be an Apache Spark Rock Star

  • Ted Malaska(Blizzard, Inc.)

(3:20 PM–3:50 PM)

MACHINE LEARNING

Assigning Responsibility for Deteriorations in Video Quality

  • Henry Milner(Conviva)
  • Oleg Vasilyev(Conviva)

(3:20 PM–3:50 PM)

SPARK ECOSYSTEM

Apache Spark on Kubernetes

  • Anirudh Ramanathan(Google)
  • Tim Chen(Hyperpilot)

(3:20 PM–3:50 PM)

SPARK EXPERIENCE AND USE CASES

Experiences Migrating Hive Workload to SparkSQL

  • Jie Xiong(Facebook)
  • Zhan Zhang(Facebook)

(3:20 PM–3:50 PM)

ENTERPRISE

Transforming B2B Sales with Spark-Powered Sales Intelligence

  • Songtao Guo(LinkedIn)
  • Wei Di (LinkedIn)

(3:20 PM–3:50 PM)

STREAMING

An Online Spark Pipeline: Semi-Supervised Learning and Automatic Retraining with Spark Streaming

  • J White Bear (IBM)

(3:20 PM–3:50 PM)

RESEARCH

Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire!

  • Tiark Rompf(Purdue University)

(3:20 PM–3:50 PM)

SPONSORED SESSIONS

TBA

3:20 PM (3:20 PM–3:35 PM)

SPONSORED SESSIONS

TBA

3:35 PM (3:35 PM–3:50 PM)

TECHNICAL DEEP DIVES

TBA

(3:20 PM–3:50 PM)

3:50 PM

Break

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

4:20 PM

DEVELOPER

Improving Python and Spark Performance and Interoperability with Apache Arrow

  • Julien Le Dem(Dremio)
  • Li Jing (Two Sigma Investments, LP)

(4:20 PM–4:50 PM)

MACHINE LEARNING

Multi-Label Graph Analysis and Computations Using GraphX

  • Qiang Zhu(LinkedIn)
  • Qingbo Hu(LinkedIn)

(4:20 PM–4:50 PM)

SPARK ECOSYSTEM

More Algorithms and Tools for Genomic Analysis on Apache Spark

  • Ryan Williams(Mount Sinai School of Medicine)

(4:20 PM–4:50 PM)

SPARK EXPERIENCE AND USE CASES

Lessons Learned from Managing Thousands of Production Apache Spark Clusters Daily

  • Henry Davidge(Databricks)
  • Josh Rosen(Databricks)

(4:20 PM–4:50 PM)

ENTERPRISE

GoDaddy Customer Success Dashboard Using Apache Spark

  • Baburao Kamble(GoDaddy)

(4:20 PM–4:50 PM)

STREAMING

Dynamic DDL: Adding Structure to Streaming Data on the Fly

  • David Winters(GoPro)
  • Hao Zou (GoPro)

(4:20 PM–4:50 PM)

RESEARCH

Microservices and Teraflops: Effortlessly Scaling Data Science with PyWren

  • Eric Jonas (UC Berkeley)
  • Shivaram Venkataraman (UC Berkeley)

(4:20 PM–4:50 PM)

SPONSORED SESSIONS

TBA

4:20 PM (4:20 PM–4:35 PM)

SPONSORED SESSIONS

TBA

4:35 PM (4:35 PM–4:50 PM)

TECHNICAL DEEP DIVES

Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark

  • Michael Armbrust(Databricks)
  • Tathagata Das(Databricks)

(4:20 PM–4:50 PM)

5:00 PM

DEVELOPER

Building Robust ETL Pipelines with Apache Spark

  • Xiao Li (Databricks)

(5:00 PM–5:30 PM)

MACHINE LEARNING

Visualization of Enhanced Spark Induced Naive Bayes Classifier

  • Barry Becker (ESI Group)

(5:00 PM–5:30 PM)

SPARK ECOSYSTEM

Spark HBase Connector: Feature Rich and Efficient Access to HBase Through Spark SQL

  • Bikas Saha(Hortonworks)
  • Weiqing Yang(Hortonworks)

(5:00 PM–5:30 PM)

SPARK EXPERIENCE AND USE CASES

From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets

  • Avi Aminov(Akamai Technologies)

(5:00 PM–5:30 PM)

ENTERPRISE

Applying Machine Learning to Construction

  • Charis Kaskiris(Autodesk)
  • Shubham Goel(Autodesk)

(5:00 PM–5:30 PM)

STREAMING

Building Continuous Application with Structured Streaming and Real-Time Data Source

  • Arijit Tarafdar(Microsoft)
  • Nan Zhu(Microsoft)

(5:00 PM–5:30 PM)

RESEARCH

Speeding Up Spark with Data Compression on Xeon+FPGA

  • David Ojika(University of Florida)

(5:00 PM–5:30 PM)

SPONSORED SESSIONS

TBA

5:00 PM (5:00 PM–5:15 PM)

SPONSORED SESSIONS

TBA

5:15 PM (5:15 PM–5:30 PM)

TECHNICAL DEEP DIVES

Easy, Scalable, Fault-Tolerant Stream Processing with Structured Streaming in Apache Spark (continues)

  • Michael Armbrust(Databricks)
  • Tathagata Das(Databricks)

(5:00 PM–5:30 PM)

5:40 PM

DEVELOPER

Behavior-Driven Development (BDD) Testing with Apache Spark

  • Aaron Colcord (FIS Global)
  • Zachary Nanfelt(FIS)

(5:40 PM–6:10 PM)

MACHINE LEARNING

The Key to Machine Learning is Prepping the Right Data

  • Jean Georges Perrin (Zaloni)

(5:40 PM–6:10 PM)

SPARK ECOSYSTEM

Building a Large Scale Recommendation Engine with Spark and Redis-ML

  • Dvir Volk (Redis Labs)
  • Shay Nativ (Redis Labs)

(5:40 PM–6:10 PM)

SPARK EXPERIENCE AND USE CASES

Apache Spark and Citizen Science: Using eBird Data to Predict Bird Abundance at Scale

  • Tom Auer (Cornell University)

(5:40 PM–6:10 PM)

ENTERPRISE

Rental Cars and Industrialized Learning to Rank

  • Sean Downes(Expedia)

(5:40 PM–6:10 PM)

STREAMING

Scalable Monitoring Using Apache Spark and Friends

  • Utkarsh Bhatnagar(Tinder)

(5:40 PM–6:10 PM)

RESEARCH

Accelerating SparkML Workloads on the Intel Xeon+FPGA Platform

  • Zhankun Tang(Intel)
  • Zhongyue Nah(Intel)

(5:40 PM–6:10 PM)

SPONSORED SESSIONS

TBA

(5:40 PM–6:10 PM)

TECHNICAL DEEP DIVES

TBA

(5:40 PM–6:10 PM)

6:10 PM

Attendee Reception

Have fun mingling with other attendees over hors d’oeuvres and cocktails as you tour the Spark Summit Expo Hall.

 

DAY 3 • WEDNESDAY, JUNE 7 • ENTERPRISE DAY

8:00 AM

Registration

9:00 AM

Databricks Keynote

  • Ali Ghodsi (Databricks)
  • Greg Owen (Databricks)
  • Michael Armbrust (Databricks)

9:40 AM

Keynote-TBA

9:55 AM

Keynote by Hotels.com

  • Matt Fryer (Hotels.com)

10:10 AM

Cutting Edge Predictive Analytics

  • Eric Siegel (Predictive Analytics World)

Apache Spark empowers predictive analytics and machine learning by increasing the reach and potential. But, before jumping to new deployments, it’s critical we 1) get the analytics right and 2) not overlook less conspicuous business…

10:30 AM

Break

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

11:00 AM

DEVELOPER

Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop

  • Carl Steinbach(LinkedIn)
  • Simon King(Pepperdata)

(11:00 AM–11:30 AM)

MACHINE LEARNING

Embracing a Taxonomy of Types to Simplify Machine Learning

  • Leah McGuire(Salesforce.com)

(11:00 AM–11:30 AM)

SPARK ECOSYSTEM

HDFS on Kubernetes—Lessons Learned

  • Kimoon Kim(Pepperdata)

(11:00 AM–11:30 AM)

SPARK EXPERIENCE AND USE CASES

Spinach: Providing Ad-Hoc Query Support on Top of Spark SQL

  • Daoyuan Wang(Intel)
  • Yuanjian Li (Baidu)

(11:00 AM–11:30 AM)

ENTERPRISE

Archiving, E-Discovery, and Supervision with Spark and Hadoop

  • Jordan Volz(Cloudera)

(11:00 AM–11:30 AM)

DATA SCIENCE

Yelp Ad Targeting at Scale with Apache Spark

  • Inaz Alaei-Novin(Yelp)
  • Joe Malicki (Yelp)

(11:00 AM–11:30 AM)

RESEARCH

Debugging Big Data Analytics in Apache Spark with BigDebug

  • Matteo Interlandi(UCLA)
  • Muhammad Ali Gulzar (UCLA)

(11:00 AM–11:30 AM)

SPONSORED SESSIONS

TBA

(11:00 AM–11:30 AM)

TECHNICAL DEEP DIVES

Deep Dive Into Apache Spark Multi-User Performance

  • Mikhail Genkin(IBM)
  • Peter Lankford(STAC)

(11:00 AM–11:30 AM)

11:40 AM

DEVELOPER

Productive Use of the Apache Spark Prompt

  • Sam Penrose(Mozilla)

(11:40 AM–12:10 PM)

MACHINE LEARNING

Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Regulatory Landscapes Using Deep Learning Frameworks

  • Yi-Hsiang Hsu
  • Yongsheng Huang(Databricks)

(11:40 AM–12:10 PM)

SPARK ECOSYSTEM

Homologous Apache Spark Clusters Using Nomad

  • Alex Dadgar(Hashicorp)

(11:40 AM–12:10 PM)

SPARK EXPERIENCE AND USE CASES

Social Media, Spark, Machine Learning, and Data Visualization to Find Patterns and Insight

  • Erik Schlegel(Microsoft)

(11:40 AM–12:10 PM)

ENTERPRISE

Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark

  • Bernhard Schlegel(BMW)

(11:40 AM–12:10 PM)

DATA SCIENCE

Data Wrangling with PySpark for Data Scientists Who Know Pandas

  • Andrew Ray(Silicon Valley Data Science)

(11:40 AM–12:10 PM)

RESEARCH

Building Genomic Data Processing and Machine Learning Workflows Using Apache Spark

  • Anupama Joshi(Epinomics)
  • Matt Negulescu(Epinomics)

(11:40 AM–12:10 PM)

SPONSORED SESSIONS

TBA

(11:40 AM–12:10 PM)

TECHNICAL DEEP DIVES

Deep Dive Into Apache Spark Multi-User Performance (continues)

(11:40 AM–12:10 PM)

12:20 PM

DEVELOPER

Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust

  • David Taieb (IBM)

(12:20 PM–12:50 PM)

MACHINE LEARNING

Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned

  • Yanbo Liang(Hortonworks)

(12:20 PM–12:50 PM)

SPARK ECOSYSTEM

Interoperating a Zoo of Data Processing Platforms Using Rheem

  • Sebastian Kruse(PhD Student)
  • Yasser Idris (Qatar Computing Research Institute)

(12:20 PM–12:50 PM)

SPARK EXPERIENCE AND USE CASES

Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance

  • Bryan Cheng(BlockCypher)
  • Karen Hsu(BlockCypher)

(12:20 PM–12:50 PM)

ENTERPRISE

Big Data at Audi: Root Cause Analysis in an Automotive Paint Shop Using MLlib

  • Christian Raimann(Audi Business Innovation GmbH)
  • Christoph Kreibich(Audi)

(12:20 PM–12:50 PM)

DATA SCIENCE

Smart Scalable Feature Reduction With Random Forests

  • Erik Erlandson (Red Hat)

(12:20 PM–12:50 PM)

RESEARCH

Neuro-Symbolic AI for Sentiment Analysis

  • Michael Malak(Oracle)

(12:20 PM–12:50 PM)

SPONSORED SESSIONS

Women in Big Data Lunch

(12:20 PM–12:50 PM)

TECHNICAL DEEP DIVES

From Pipelines to Refineries: Building Complex Data Applications with Apache Spark

  • Tim Hunter(Databricks)

(12:20 PM–12:50 PM)

12:50 PM

Lunch

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

2:00 PM

DEVELOPER

Improving Apache Spark with S3

  • Ryan Blue (Netflix)

(2:00 PM–2:30 PM)

MACHINE LEARNING

Building Competing Models Using Apache Spark DataFrames

  • Abdulla Al-Qawasmeh (Credit Karma)

(2:00 PM–2:30 PM)

SPARK ECOSYSTEM

Cassandra and SparkSQL: You Don't Need Functional Programming for Fun

  • Russell Spitzer(DataStax)

(2:00 PM–2:30 PM)

SPARK EXPERIENCE AND USE CASES

Tuning Apache Spark for Large-Scale Workloads

  • Gaoxiang Liu(Facebook)
  • Sital Kedia(Facebook)

(2:00 PM–2:30 PM)

ENTERPRISE

From Data to Actions and Insights at Conviva

  • Rui Zhang(Conviva)
  • Yan Li (Conviva)

(2:00 PM–2:30 PM)

DATA SCIENCE

Fully-Reproducible ML Deployment with Spark, Pachyderm, and MLeap

  • Daniel Whitenack(Pachyderm)
  • Hollin Wilkins(Combust, Inc.)

(2:00 PM–2:30 PM)

DATA SCIENCE

Natural Language Processing with CNTK and Apache Spark

  • Ali Zaidi (Microsoft)

(2:00 PM–2:30 PM)

SPONSORED SESSIONS

TBA

(2:00 PM–2:30 PM)

TECHNICAL DEEP DIVES

Sparklyr: Recap, Updates, and Use Cases

  • Javier Luraschi(RStudio)

(2:00 PM–2:30 PM)

2:40 PM

DEVELOPER

Demystifying DataFrame and Dataset

  • Dr. Kazuaki Ishizaki(IBM)

(2:40 PM–3:10 PM)

MACHINE LEARNING

Real-Time Image Recognition with Apache Spark

  • Nikita Shamgunov(MemSQL)

(2:40 PM–3:10 PM)

SPARK ECOSYSTEM

Applying SparkSQL to Big Spatio-Temporal Data Using GeoMesa

  • Anthony Fox (CCRi)

(2:40 PM–3:10 PM)

SPARK EXPERIENCE AND USE CASES

Performance Optimization of Recommendation Training Pipeline at Netflix

  • Hua Jiang (Netflix)

(2:40 PM–3:10 PM)

ENTERPRISE

Changing the Way Viacom Looks at Video Performance

  • Mark Cohen(Viacom)
  • Michael Rosencrantz(Viacom, Inc.)

(2:40 PM–3:10 PM)

DATA SCIENCE

Large-Scaled Insurance Analytics Using Tweedie Models in Apache Spark

  • Yanwei Zhang(Uber)

(2:40 PM–3:10 PM)

DATA SCIENCE

ADMM-Based Scalable Machine Learning on Apache Spark

  • Mohak Shah(Robert Bosch LLC)
  • Sauptik Dhar(Robert Bosch LLC)

(2:40 PM–3:10 PM)

SPONSORED SESSIONS

TBA

(2:40 PM–3:10 PM)

TECHNICAL DEEP DIVES

Sparklyr: Recap, Updates, and Use Cases (continues)

(2:40 PM–3:10 PM)

3:20 PM

DEVELOPER

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT

  • Denis Magda(GridGain)

(3:20 PM–3:50 PM)

MACHINE LEARNING

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark

  • Masato Asahara(NEC)
  • Ryohei Fujimaki(NEC)

(3:20 PM–3:50 PM)

SPARK ECOSYSTEM

Just-in-Time Analytics and the Need for Autonomous Database Administration

  • Kristian Alexander(Algebraix Data Corp.)
  • Wes Holler(Algebraix)

(3:20 PM–3:50 PM)

SPARK EXPERIENCE AND USE CASES

Machine Learning as a Service: Apache Spark MLlib Enrichment and Web-Based Codeless Modeling

  • Zhengyi Le (Suning R&D)

(3:20 PM–3:50 PM)

ENTERPRISE

Leveraging Apache Spark to Disrupt Airline Pricing Distribution

  • Anton Diego(EveryMundo)
  • Daniel Santana(EveryMundo)

(3:20 PM–3:50 PM)

DATA SCIENCE

Write Graph Algorithms Like a Boss

  • Andrew Ray(Silicon Valley Data Science)

(3:20 PM–3:50 PM)

DATA SCIENCE

A Predictive Analytics Workflow on DICOM Images using Apache Spark

  • Anahita Bhiwandiwalla(Intel)
  • Karthik Vadla (Intel)

(3:20 PM–3:50 PM)

SPONSORED SESSIONS

TBA

(3:20 PM–3:50 PM)

TECHNICAL DEEP DIVES

TBA

(3:20 PM–3:50 PM)

3:50 PM

Break

 

ROOM 1

ROOM 2

ROOM 3

ROOM 4

ROOM 5

ROOM 6

ROOM 7

ROOM 8

ROOM 9

4:20 PM

DEVELOPER

A Developer’s View into Spark's Memory Model

  • Wenchen Fan(Databricks)

(4:20 PM–4:50 PM)

MACHINE LEARNING

Deep Learning in Security—Are We Ready?

  • Dr. Jisheng Wang(Niara)

(4:20 PM–4:50 PM)

SPARK ECOSYSTEM

Getting Ready to Use Redis with Apache Spark

  • Tague Griffith(Redis Labs)

(4:20 PM–4:50 PM)

SPARK EXPERIENCE AND USE CASES

Why You Should Care about Data Layout in the Filesystem

  • Cheng Lian(Databricks)
  • Vida Ha(Databricks)

(4:20 PM–4:50 PM)

ENTERPRISE

Leveraging Spark in Ecommerce Platform to Democratize Data

  • Shafaq Abdullah(Honest Company)

(4:20 PM–4:50 PM)

DATA SCIENCE

Using AI for Providing Insights and Recommendations on Activity Data

  • Alexis Roos(Salesforce)
  • Sammy Nammari(Salesforce)

(4:20 PM–4:50 PM)

DATA SCIENCE

Apache SparkR Under the Hood: How to Debug your SparkR Applications

  • Hossein Falaki(Databricks)

(4:20 PM–4:50 PM)

SPONSORED SESSIONS

TBA

(4:20 PM–4:50 PM)

TECHNICAL DEEP DIVES

Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more

  • Cihan Biyikoglu(Redis Labs)

(4:20 PM–4:50 PM)

5:00 PM

DEVELOPER

Continuous Application with FAIR Scheduler

  • Robert Xue(Groupon)

(5:00 PM–5:30 PM)

MACHINE LEARNING

Deep Learning to Big Data Analytics on Apache Spark Using BigDL

  • Xianyan Jia (Intel)
  • Yuhao Yang (Intel)

(5:00 PM–5:30 PM)

SPARK ECOSYSTEM

From R Script to Production Using rsparkling

  • Navdeep Gill(H2O.ai)

(5:00 PM–5:30 PM)

SPARK EXPERIENCE AND USE CASES

RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Environment

  • Adrian Petrescu(Rubikloud)

(5:00 PM–5:30 PM)

ENTERPRISE

Stream All Things—Patterns of Modern Data Integration

  • Gwen Shapira(Confluent)

(5:00 PM–5:30 PM)

DATA SCIENCE

NLP with MLlib: Global Empire-Building for Fun and Profit

  • Michelle Casbon(Qordoba)

(5:00 PM–5:30 PM)

DATA SCIENCE

Building Smart IoT Applications Using Spark

  • Rafael Schultze-Kraft (WATTx)

(5:00 PM–5:30 PM)

SPONSORED SESSIONS

TBA

(5:00 PM–5:30 PM)

TECHNICAL DEEP DIVES

Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more (continues)

(5:00 PM–5:30 PM)

5:40 PM

DEVELOPER

SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitoring

  • Yiannis Gkoufas(IBM)

(5:40 PM–6:10 PM)

MACHINE LEARNING

Deep Learning with Apache Spark and GPUs

  • Pierce Spitler(Bitfusion)
  • Tim Gasper(Bitfusion)

(5:40 PM–6:10 PM)

SPARK ECOSYSTEM

Distributed End-to-End Drug Similarity Analytics and Visualization Workflow

  • Anahita Bhiwandiwalla(Intel)
  • Dina Suehiro (Intel Corporation)

(5:40 PM–6:10 PM)

SPARK EXPERIENCE AND USE CASES

The Smart Data Warehouse: Goal-Based Data Production

  • Sim Simeonov(Swoop)

(5:40 PM–6:10 PM)

ENTERPRISE

TBA

(5:40 PM–6:10 PM)

DATA SCIENCE

Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While Looking for Signs of Extra-Terrestrial Life

  • Gil Vernik (IBM Corporation)
  • Graham Mackintosh (IBM)

(5:40 PM–6:10 PM)

DATA SCIENCE

Semantic Search: Fast Results from Large, Non-Native Language Corpora

  • Rob Lantz (Novetta)

(5:40 PM–6:10 PM)

SPONSORED SESSIONS

TBA

(5:40 PM–6:10 PM)

TECHNICAL DEEP DIVES

TBA

(5:40 PM–6:10 PM)

8:00 PM

JOIN Party

Come close out the 10th edition of Spark Summit at the JOIN attendee party. This rockin’ celebration includes drinks, games, DJs, dancing and a few fun surprises. In the coming weeks, we will announce even…

 

 

 
