Spark Introduction

What is Spark?
Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, much more quickly than with disk-based systems like Hadoop MapReduce.
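The difference between the two access patterns can be sketched in plain Python. This is only an illustration of the idea, not Spark's API: the log file, the `InMemoryDataset` class, and the helper names are all hypothetical, and everything runs on one machine rather than a cluster.

```python
import os
import tempfile


def write_sample_log(path, n_lines=1000):
    """Write a small hypothetical log file; every 10th line is an ERROR."""
    with open(path, "w") as f:
        for i in range(n_lines):
            level = "ERROR" if i % 10 == 0 else "INFO"
            f.write(f"{level} event {i}\n")


def count_from_disk(path, keyword):
    """Disk-based style: re-read the whole file for every query."""
    with open(path) as f:
        return sum(1 for line in f if keyword in line)


class InMemoryDataset:
    """In-memory style: read the file once, then answer queries from RAM."""

    def __init__(self, path):
        with open(path) as f:
            self.lines = f.readlines()

    def count(self, keyword):
        return sum(1 for line in self.lines if keyword in line)


path = os.path.join(tempfile.mkdtemp(), "events.log")
write_sample_log(path)

dataset = InMemoryDataset(path)  # the only disk read for this object
queries = ["ERROR", "INFO", "event 42"]

# Same answers either way; the disk version pays the read cost per query.
disk_results = [count_from_disk(path, q) for q in queries]
memory_results = [dataset.count(q) for q in queries]
```

Both approaches return the same counts. The in-memory version touches the disk once, however many queries follow, which is the pattern the paragraph above describes.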

To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.

What can it do?
Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can run up to 100x faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our example jobs.
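To see why iterative algorithms benefit from data that stays in memory, consider a minimal gradient descent sketch in plain Python. It is not Spark code; the function name `fit_slope` and the toy dataset are assumptions made for illustration. The point is that every iteration scans the same dataset again, so a system that reloads the data from disk on each pass pays that cost hundreds of times.

```python
def fit_slope(points, iterations=200, learning_rate=0.01):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(points)
    for _ in range(iterations):
        # Each iteration makes one full pass over the same in-memory dataset.
        gradient = sum(2 * (w * x - y) * x for x, y in points) / n
        w -= learning_rate * gradient
    return w


# Toy dataset generated from y = 3x, loaded once and reused by every iteration.
points = [(float(x), 3.0 * x) for x in range(1, 6)]
slope = fit_slope(points)
```

With 200 iterations the loop reads `points` 200 times. That repeated scan is the workload where keeping the data in memory pays off.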

Spark is also the engine behind Shark, a fully Apache Hive-compatible data warehousing system that can run 100x faster than Hive.

While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.

Who uses it?
Spark was developed in the UC Berkeley AMPLab. It’s used by several groups of researchers at Berkeley to run large-scale applications such as spam filtering and traffic prediction. It’s also used to accelerate data analytics at Conviva, Quantifind, and other companies; in total, 14 companies have contributed to Spark. Spark is open source under a BSD license, so download it to check it out.
