A Beginner's Guide to Azure Databricks

This article is a complete guide to Azure Databricks for beginners. It walks through the basics of Databricks in Azure, how to create it in the Azure portal, and the various components and internals related to it.

Systems today work with massive amounts of data, petabytes or even more, and the volume is still growing at an exponential rate. Big data is all around us and comes from different sources such as social media sites, sales, customer data and transactional data. And I firmly believe this data holds value only if we can process it both interactively and quickly.

Apache Spark is an open-source, fast cluster computing system and a highly popular framework for big data analysis. It processes data in parallel, which helps boost performance. It is written in Scala, a high-level language, and also provides APIs for Python, SQL, Java and R.

Now the question is:

What is Azure Databricks and how is it related to Spark?

Simply put, Azure Databricks is a managed, Apache Spark-based analytics platform on Azure. With fully managed Spark clusters, it is used to process large data workloads, and it also helps with data engineering, data exploration, visualization and machine learning.

While working with Databricks, I found this analytics platform to be extremely developer-friendly and flexible, with easy-to-use APIs for languages like Python and R. To explain this a little more: say you have created a DataFrame in Python. With Azure Databricks, you can register it as a temporary view and then query that view by name from Scala, R or SQL. This lets you code in multiple languages in the same notebook. And that is just one of its cool features.

Why Azure Databricks?

Databricks is gaining importance and relevance in the big data world for a few reasons. Apart from multi-language support, the service integrates easily with many Azure services such as Blob Storage, Data Lake Store and SQL Database, and with BI tools such as Power BI and Tableau. It is also a great collaborative platform that lets data professionals share clusters and workspaces, which leads to higher productivity.

Outline

Before we start digging into Databricks in Azure, I would like to take a minute to describe how this article series is going to be structured. I intend to cover the following aspects of Databricks in Azure in this series. Please note that this outline may change here and there once I actually start writing the articles.

  1. How to access Azure Blob Storage from Azure Databricks
  2. Processing and exploring data in Azure Databricks
  3. Connecting Azure SQL Databases with Azure Databricks
  4. Load data into Azure SQ
