Databricks notebook notes

We plan to build our big data platform on Spark, which I'm very happy about.

  • Spark's software stack is rich and comprehensive, covering offline data cleaning, stream processing, and iterative machine learning

  • (nothing else comes to mind for now)

Databricks is the new product from the Spark experts at Berkeley AMPLab.
It positions itself as "Databricks is a managed platform for running Apache Spark":

  • It’s a point and click platform for those that prefer a user interface like data scientists or data analysts.
  • However, this UI is accompanied by a sophisticated API for those that want to automate aspects of their data workloads with automated jobs.
  • To meet the needs of enterprises, Databricks also includes features such as role-based access control and other intelligent optimizations that not only improve usability for users but also reduce costs and complexity for administrators.
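    In other words, it provides data cleaning, machine learning, and user management features, which fits our needs well.
    Databricks also provides both a web UI and a REST API.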
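
To make the REST API side a bit more concrete, here is a minimal sketch of what calling it from Python might look like. It assumes a workspace that exposes the `GET /api/2.0/clusters/list` endpoint and accepts a personal access token as a Bearer token; the host URL and token below are placeholders, and the helper names `build_list_clusters_request` and `list_clusters` are made up for this note. Check the API reference for your own workspace before relying on it.

```python
import json
import urllib.request


def build_list_clusters_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists clusters in a workspace.

    Assumption: the workspace serves GET /api/2.0/clusters/list and
    authenticates with a personal access token sent as a Bearer token.
    """
    url = host.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_clusters(host: str, token: str) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_list_clusters_request(host, token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder host and token: replace with real values before calling
    # list_clusters(); here we only print the request that would be sent.
    req = build_list_clusters_request(
        "https://example.cloud.databricks.com", "dapi-PLACEHOLDER"
    )
    print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the part that needs network access and credentials in one small function, so the rest can be checked offline. The same pattern should carry over to other endpoints, such as the ones for automated jobs mentioned above.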
