Spark Official Hands-On Exercise 1: Introduction to the Scala Shell

This chapter teaches you the basics of the Scala shell and introduces functional programming with collections.
If you are already familiar with Scala, or you are taking the course with the Python shell, feel free to skip ahead to the next section.
This exercise is based on the First Steps to Scala tutorial: http://www.artima.com/scalazine/articles/steps.html. However, reading the whole tutorial and trying all of its examples in the console would take quite a while, so below we give a shorter introduction to the Scala shell.

Start the Scala console by typing the following command:

1. Launch the Scala console

[root@hadoop ~]# scala
Welcome to Scala version 2.10.3 (Java HotSpot(TM) Client VM, Java 1.6.0_24).
Type in expressions to have them evaluated.
Type :help for more information.
scala> 

2. Define a variable myNumbers as a list of integers.

scala> val myNumbers = List(1, 2, 5, 4, 7, 3)
myNumbers: List[Int] = List(1, 2, 5, 4, 7, 3)

3. Declare a function cube that computes the cube of a number.

scala> def cube(a: Int): Int = a * a * a
cube: (a: Int)Int

4. Apply the cube function to each element of myNumbers using map.

scala> myNumbers.map(x => cube(x))
res0: List[Int] = List(1, 8, 125, 64, 343, 27)

Scala also provides two convenient shorthands for the same call:

scala> myNumbers.map(cube(_))
res1: List[Int] = List(1, 8, 125, 64, 343, 27)

scala> myNumbers.map(cube)
res2: List[Int] = List(1, 8, 125, 64, 343, 27)
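
The last form works because Scala eta-expands the method cube into a function value where one is expected. As a small aside (our own illustration, not part of the exercise), the conversion can also be written explicitly:

// Explicit eta-expansion: turn the method cube into a function value.
val cubeFn: Int => Int = cube _
myNumbers.map(cubeFn)   // List(1, 8, 125, 64, 343, 27)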

For reference, the Scala API documentation for the Map trait reads as follows:

trait Map[A, +B] extends Iterable[(A, B)] with GenMap[A, B] with MapLike[A, B, Map[A, B]]

A map from keys of type A to values of type B.

Implementation note: This trait provides most of the operations of a Map independently of its representation. It is typically inherited by concrete implementations of maps.

To implement a concrete map, you need to provide implementations of the following methods:

def get(key: A): Option[B]
def iterator: Iterator[(A, B)]
def + [B1 >: B](kv: (A, B1)): This
def -(key: A): This

If you wish that methods like take, drop, and filter also return the same kind of map you should also override:

def empty: This

It is also a good idea to override methods foreach and size for efficiency.

Note: If you do not have specific implementations for + and - in mind, you might consider inheriting from DefaultMap instead.

Note: If your additions and mutations return the same kind of map as the map you are defining, you should inherit from MapLike as well.

A: the type of the keys in this map.
B: the type of the values associated with keys.
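
To make this contract concrete, here is a minimal sketch of a Map implementation (our own illustration; the name WrappedMap is hypothetical) that supplies exactly the four required methods by delegating to an underlying immutable Map:

// Minimal concrete Map: only the four abstract methods are implemented,
// each one delegating to an ordinary immutable Map.
class WrappedMap[A, +B](underlying: Map[A, B]) extends Map[A, B] {
  def get(key: A): Option[B]              = underlying.get(key)
  def iterator: Iterator[(A, B)]          = underlying.iterator
  def +[B1 >: B](kv: (A, B1)): Map[A, B1] = new WrappedMap(underlying + kv)
  def -(key: A): Map[A, B]                = new WrappedMap(underlying - key)
}

// Usage: behaves like any other Map.
val wrapped = new WrappedMap(Map("a" -> 1, "b" -> 2))
wrapped.get("a")   // Some(1)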


5. Then, also try writing the map call with an inline function, using the curly-brace {} notation.

scala> myNumbers.map{x => x * x * x}
res3: List[Int] = List(1, 8, 125, 64, 343, 27)

6. Define a factorial function that computes n! = 1 * 2 * ... * n for a given input n. You can use either a loop or recursion; our solution uses recursion (see steps 5-7 of First Steps to Scala). Then compute the sum of the factorials of myNumbers.

scala> def factorial(n:Int):Int = if (n==0) 1 else n * factorial(n-1)
factorial: (n: Int)Int
scala> myNumbers.map(factorial).sum
res4: Int = 5193
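
As an aside (our own cross-check, not part of the original exercise), an equivalent loop-based factorial gives the same result, and the total can be verified by hand:

// Loop-based factorial, equivalent to the recursive version above.
// Note that Int overflows quickly; BigInt would be safer for larger n.
def factorialIter(n: Int): Int = {
  var acc = 1
  for (i <- 1 to n) acc *= i
  acc
}

// 1! + 2! + 5! + 4! + 7! + 3! = 1 + 2 + 120 + 24 + 5040 + 6 = 5193
myNumbers.map(factorialIter).sum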

For the sum method used above, see the Scala API documentation for List, excerpted below:

A class for immutable linked lists representing ordered collections of elements of type A.

This class comes with two implementing case classes scala.Nil and scala.:: that implement the abstract members isEmpty, head and tail.

This class is optimal for last-in-first-out (LIFO), stack-like access patterns. If you need another access pattern, for example, random access or FIFO, consider using a collection more suited to this than List.

Source
List.scala
Example:
  1. // Make a list via the companion object factory
    val days = List("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
    
    // Make a list element-by-element
    val when = "AM" :: "PM" :: List()
    
    // Pattern match
    days match {
      case firstDay :: otherDays =>
        println("The first day of the week is: " + firstDay)
      case List() =>
        println("There don't seem to be any week days.")
    }
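
The sum method itself is defined on TraversableOnce and works for any collection whose element type has an implicit Numeric instance. A quick illustration (our own example):

// sum requires numeric elements (an implicit Numeric in scope).
List(1, 2, 5, 4, 7, 3).sum    // 22
myNumbers.map(cube).sum       // 1 + 8 + 125 + 64 + 343 + 27 = 568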

7. This is a more challenging task and may take 10 minutes or more to complete. We will do a word count over a text file. More specifically, create a map whose keys are the words and whose values are the number of times each word occurs.

7.1 You can load a text file as an array of lines as shown below:

scala> import scala.io.Source
import scala.io.Source
scala> val lines = Source.fromFile("/root/spark-0.8.0/README.md").getLines.toArray

lines: Array[String] = Array(# Apache Spark, "", Lightning-Fast Cluster Computing - <http://spark.incubator.apache.org/>, "", "", ## Online Documentation, "", You can find the latest Spark documentation, including a programming, guide, on the project webpage at <http://spark.incubator.apache.org/documentation.html>., This README file only contains basic setup instructions., "", "", ## Building, "", Spark requires Scala 2.9.3 (Scala 2.10 is not yet supported). The project is, built using Simple Build Tool (SBT), which is packaged with it. To build, Spark and its example programs, run:, "", "    sbt/sbt assembly", "", Once you've built Spark, the easiest way to start using it is the shell:, "", "    ./spark-shell", "", Or, for the Python API, the Python shell (`./pyspark`)., "", Spark als...

7.2 Create a mutable map for the counts, using 0 as the default value for words that have not been seen yet:

scala> val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
counts: scala.collection.mutable.Map[String,Int] = Map()
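
The withDefaultValue call is what makes the next step work: looking up a word that is not in the map yet returns 0 instead of throwing a NoSuchElementException. A quick sketch (hypothetical key):

// Absent keys yield the default, so counts(word) += 1 is safe
// even for the first occurrence of a word.
counts("not-seen-yet")   // 0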

7.3 Split each line into words, and increment the count of every word encountered:

scala> lines.flatMap(line => line.split(" ")).foreach(word => counts(word) += 1) 
scala> counts

res6: scala.collection.mutable.Map[String,Int] = Map(request, -> 1, Documentation -> 1, requires -> 1, Each -> 1, their -> 1, code, -> 1, instructions. -> 1, MRv1, -> 1, basic -> 1, (Scala -> 1, SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 -> 1, must -> 1, Incubator -> 1, Regression -> 1, Hadoop, -> 1, Online -> 1, thread, -> 1, 1.0.1 -> 1, projects -> 1, v2 -> 1, org.apache.spark.examples.SparkLR -> 1, Cloudera -> 4, POM -> 1, To -> 2, is -> 10, contribution -> 1, Building -> 1, yet -> 2, adding -> 1, required -> 1, usage -> 1, Versions -> 1, does -> 1, application, -> 1, described -> 1, If -> 1, sponsored -> 1, 2 -> 1, About -> 1, uses -> 1, through -> 1, can -> 3, email, -> 1, This -> 2, MapReduce -> 2, gladly -> 1, Please -> 1, one -> 2, # -> 5, including -> 1, against -> 1, While -> 1, ASF ...
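
As an optional aside (not part of the original exercise), you can inspect, say, the ten most frequent words by sorting the entries:

// Sort the word/count pairs by descending count and keep the top ten.
counts.toSeq.sortBy(pair => -pair._2).take(10)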

Alternatively, solve it in a functional style:

scala> import scala.io.Source
import scala.io.Source
scala>  val lines = Source.fromFile("/root/spark-0.8.0/README.md").getLines.toArray
lines: Array[String] = Array(# Apache Spark, "", Lightning-Fast Cluster Computing - <http://spark.incubator.apache.org/>, "", "", ## Online Documentation, "", You can find the latest Spark documentation, including a programming, guide, on the project webpage at <http://spark.incubator.apache.org/documentation.html>., This README file only contains basic setup instructions., "", "", ## Building, "", Spark requires Scala 2.9.3 (Scala 2.10 is not yet supported). The project is, built using Simple Build Tool (SBT), which is packaged with it. To build, Spark and its example programs, run:, "", "    sbt/sbt assembly", "", Once you've built Spark, the easiest way to start using it is the shell:, "", "    ./spark-shell", "", Or, for the Python API, the Python shell (`./pyspark`)., "", Spark als...
scala> val emptyCounts = Map[String,Int]().withDefaultValue(0)
emptyCounts: scala.collection.immutable.Map[String,Int] = Map()
scala> val words = lines.flatMap(line => line.split(" "))
words: Array[String] = Array(#, Apache, Spark, "", Lightning-Fast, Cluster, Computing, -, <http://spark.incubator.apache.org/>, "", "", ##, Online, Documentation, "", You, can, find, the, latest, Spark, documentation,, including, a, programming, guide,, on, the, project, webpage, at, <http://spark.incubator.apache.org/documentation.html>., This, README, file, only, contains, basic, setup, instructions., "", "", ##, Building, "", Spark, requires, Scala, 2.9.3, (Scala, 2.10, is, not, yet, supported)., The, project, is, built, using, Simple, Build, Tool, (SBT),, which, is, packaged, with, it., To, build, Spark, and, its, example, programs,, run:, "", "", "", "", "", sbt/sbt, assembly, "", Once, you've, built, Spark,, the, easiest, way, to, start, using, it, is, the, shell:, "", "", "", "",...
scala> val counts = words.foldLeft(emptyCounts)({(currentCounts: Map[String,Int], word: String) => currentCounts.updated(word, currentCounts(word) + 1)})
counts: scala.collection.immutable.Map[String,Int] = Map(Please -> 1, CPUs. -> 1, Contributing -> 1, Regression -> 1, application -> 1, please -> 1, "" -> 113, convenience, -> 1, for -> 2, find -> 1, Apache -> 9, further -> 1, Each -> 1, adding -> 1, `SPARK_YARN=true`: -> 1, Hadoop, -> 1, any -> 2, review -> 1, Once -> 1, (SBT), -> 1, For -> 5, this -> 3, protocols -> 1, in -> 4, local[2] -> 1, "local[N]" -> 1, parameter -> 1, have -> 3, your -> 6, MapReduce -> 2, </dependency> -> 1, manner -> 1, <params>`. -> 1, are -> 2, is -> 10, source -> 2, HDFS -> 1, agree -> 1, built -> 3, 2.10 -> 1, POM -> 1, effort -> 1, % -> 2, thread, -> 1, developing -> 1, All -> 1, using -> 4, endorsed -> 1, MRv2, -> 1, 0.23.x, -> 1, shell -> 1, mesos:// -> 1, specify -> 1, connect -> 1, easiest -> 1, This ...
scala> counts
res7: scala.collection.immutable.Map[String,Int] = Map(Please -> 1, CPUs. -> 1, Contributing -> 1, Regression -> 1, application -> 1, please -> 1, "" -> 113, convenience, -> 1, for -> 2, find -> 1, Apache -> 9, further -> 1, Each -> 1, adding -> 1, `SPARK_YARN=true`: -> 1, Hadoop, -> 1, any -> 2, review -> 1, Once -> 1, (SBT), -> 1, For -> 5, this -> 3, protocols -> 1, in -> 4, local[2] -> 1, "local[N]" -> 1, parameter -> 1, have -> 3, your -> 6, MapReduce -> 2, </dependency> -> 1, manner -> 1, <params>`. -> 1, are -> 2, is -> 10, source -> 2, HDFS -> 1, agree -> 1, built -> 3, 2.10 -> 1, POM -> 1, effort -> 1, % -> 2, thread, -> 1, developing -> 1, All -> 1, using -> 4, endorsed -> 1, MRv2, -> 1, 0.23.x, -> 1, shell -> 1, mesos:// -> 1, specify -> 1, connect -> 1, easiest -> 1, This ->...
scala> 
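
For completeness, one more functional idiom (our own variation, assuming the same words array from above): group identical words together, then replace each group with its size.

// Group identical words, then count the occurrences in each group.
val countsViaGroupBy: Map[String, Int] =
  words.groupBy(identity).map { case (word, occurrences) => (word, occurrences.length) }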

