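The five-day Spark Summit North America 2020 ran from June 22 to June 26, 2020 (US time). Because of the COVID-19 pandemic, it was held online for the first time. Although the event spanned five days, the first two were training days and only the last three were the conference proper. There were more than 210 sessions. As in previous years, the main theme was Spark + AI; on the AI side, sessions also took a close look at popular software frameworks such as Delta Lake, MLflow, TensorFlow, scikit-learn, Keras, PyTorch, DeepLearning4J, BigDL, and Deep Learning Pipelines. The full agenda is at: https://databricks.com/sparkaisummit/north-america-2020/agenda
The conference brought several notable announcements, including Databricks' acquisition of Redash and the launch of Delta Engine. The keynote slides have not been released yet, so if you are interested, watch the keynote videos. 过往记忆大数据 also published several write-ups of the keynotes a few days ago: "Matei Zaharia's Ten-Year Retrospective on Apache Spark", "What Is Redash, the Company Acquired by the Company Behind Spark?", and "A Full Breakdown of Databricks' Delta Engine".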
Over the next few days, this account will also cover some of the more interesting sessions, so stay tuned.
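The sessions covered the following areas:
• Roadmaps for Apache Spark™, Delta Lake, MLflow, and Koalas
• Best practices for managing the machine learning lifecycle
• Tips for building reliable data pipelines at scale
• The latest developments in popular deep learning and machine learning frameworks
• Real-world AI use cases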
Slides are available for download for the sessions listed below. To get them, follow the WeChat public account 过往记忆大数据 or Java技术范 and reply spark-9832.
• Data Science Across Data Sources with Apache Arrow
• Portable Scalable Data Visualization Techniques for Apache Spark and Python Notebook-based Analytics
• Native Support of Prometheus Monitoring in Apache Spark 3.0
• Performant Streaming in Production: Preventing Common Pitfalls when Productionizing Streaming Jobs
• Scaling Security Threat Detection with Apache Spark and Databricks
• User Defined Aggregation in Apache Spark: A Love Story
• Powering Interactive BI Analytics with Presto and Delta Lake
• Using AI to Support Proliferating Merchant Changes
• Tuning ML Models: Scaling, Workflows, and Architecture
• Battling Model Decay with Deep Learning and Gamification
• An Approach to Data Quality for Netflix Personalization Systems
• High-Performance Analytics with Probabilistic Data Structures: the Power of HyperLogLog
• Preventing Abuse Using Unsupervised Learning
• Geospatial Analytics at Scale: Analyzing Human Movement Patterns During a Pandemic
• Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
• Filtering vs Enriching Data in Apache Spark
• Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
• Deep Dive into GPU Support in Apache Spark 3.x
• Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
• Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
• Automated and Explainable Deep Learning for Clinical Language Understanding at Roche
• Building Understanding Out of Incomplete and Biased Datasets using Machine Learning and Databricks
• Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA and Governance
• Managing ADLS gen2 using Apache Spark
• Using Apache Spark and Differential Privacy for Protecting the Privacy of the 2020 Census Respondents
• The 2020 Census and Innovation in Surveys
• Scaling Data and ML with Apache Spark and Feast
• The Apache Spark File Format Ecosystem
• Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL Pipeline
• A Production Quality Sketching Library for the Analysis of Big Data
• Children Safety Retrieval (CENSER) System for Retrieval of Kidnapped Children from Brothels in India
• Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner Enabled Apache Spark Clusters
• Scalable AutoML for Time Series Forecasting using Ray
• Using Machine Learning to Evolve Sports Entertainment
• Using Bayesian Generative Models with Apache Spark to Solve Entity Resolution Problems (DeDup, Merging, Uniqueness) at Scale
• Fine Tuning and Enhancing Performance of Apache Spark Jobs
• All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databricks) - A Real World Case Study
• Running Apache Spark on Kubernetes: Best Practices and Pitfalls
• Lessons Learned from Modernizing USCIS Data Analytics Platform
• On Improving Broadcast Joins in Apache Spark SQL
• Using Databricks as an Analysis Platform
• Is This Thing On? A Well State Model for the People
• Advanced Natural Language Processing with Apache Spark NLP
• Building a Streaming Microservice Architecture: with Apache Spark Structured Streaming and Friends
• Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
• Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
• Resource-Efficient Deep Learning Model Selection on Apache Spark
• Bring Satellite and Drone Imagery into your Data Science Workflows
• Scoring at Scale: Generating Follow Recommendations for Over 690 Million LinkedIn Members
• From HDFS to S3: Migrate Pinterest Apache Spark Clusters
• SparkCruise: Automatic Computation Reuse in Apache Spark
• Chromatic Sparse Learning
• Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
• Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
• The Revolution Will be Streamed
• Democratizing PySpark for Mobile Game Publishing
• Ray: Enterprise-Grade, Distributed Python
• Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
• Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
• Scaling Up AI Research to Production with PyTorch and MLFlow
• Best Practices for Building Robust Data Platform with Apache Spark and Delta
• Building a Pipeline for State-of-the-Art Natural Language Processing Using Hugging Face Tools
• Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
• Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
• Flash for Apache Spark Shuffle with Cosco
• Building a Real-Time Feature Store at iFood