Scala常用函数

Scala常用函数

1、基本属性和常用操作

val list = List(23, 54, 68, 91, 15)

(1)获取集合长度

println(list.length)

(2)获取集合大小

println(list.size)

(3)循环遍历

for( elem <- list ) print(elem + "\t")
list.foreach( elem => print(elem + "\t") )

(4)迭代器

val iter = list.iterator
while(iter.hasNext) print(iter.next() + "\t")
for( elem <- list.iterator ) print(elem + "\t")

(5)生成字符串

println(list.mkString("--"))

(6)是否包含

println(list.contains(23))

2、衍生集合

val list1 = List(23, 54, 68, 91, 15)
val list2 = List(35, 48, 69, 54, 23, 91, 102, 68, 23, 56)

(1)获取集合的头

println(list1.head)

(2)获取集合的尾(不是头的就是尾)

println(list1.tail)

(3)集合最后一个数据

println(list1.last)

(4)集合初始数据(不包含最后一个)

println(list1.init)

(5)反转

println(list1.reverse)

(6)取前(后)n个元素

println(list2.take(5))
//println(list2.reverse.take(3).reverse)
println(list2.takeRight(3))

(7)去掉前(后)n个元素

println(list2.drop(2))
println(list2.dropRight(5))

(8)并集

println(list1.union(list2))

(9)交集

println(list1.intersect(list2))

(10)差集

println(list1.diff(list2))

(11)拉链

println(list1.zip(list2))

(12)滑窗

list2.sliding(3).foreach( println )
list2.sliding(5, 3).foreach( println )

3、集合计算的初级函数

​ (1)求和

println(list.sum)

​ (2)求乘积

println(list.product)

​ (3)最大值

println(list.max)

​ (4)最小值

println(list.min)

​ (5)排序

// (5.1)按照元素大小排序
    println(list.sortBy(x => x))

    // (5.2)按照元素的绝对值大小排序
    println(list.sortBy(x => x.abs))

    // (5.3)按元素大小升序排序
	println(list.sortWith((x, y) => x < y))

	// (5.4)按元素大小降序排序
    println(list.sortWith((x, y) => x > y))

3、集合计算高级函数

 //(1)过滤
        println(list.filter(x => x % 2 == 0))

        //(2)转化/映射
        println(list.map(x => x + 1))

        //(3)扁平化
        println(nestedList.flatten)

        //(4)扁平化+映射 注:flatMap相当于先进行map操作,在进行flatten操作
        println(wordList.flatMap(x => x.split(" ")))

        //(5)分组
        println(list.groupBy(x => x % 2))

reduce方法

val list = List(1,2,3,4)

        // 将数据两两结合,实现运算规则
        val i: Int = list.reduce( (x,y) => x-y )
        println("i = " + i)

        // 从源码的角度,reduce底层调用的其实就是reduceLeft
        //val i1 = list.reduceLeft((x,y) => x-y)

        // ((4-3)-2-1) = -2
        val i2 = list.reduceRight((x,y) => x-y)
        println(i2)

Fold方法

val list = List(1,2,3,4)

        // fold方法使用了函数柯里化,存在两个参数列表
        // 第一个参数列表为 : 零值(初始值)
        // 第二个参数列表为: 简化规则

        // fold底层其实为foldLeft
        val i = list.foldLeft(1)((x,y)=>x-y)

        val i1 = list.foldRight(10)((x,y)=>x-y)

        println(i)
        println(i1)
        // 两个Map的数据合并
        val map1 = mutable.Map("a"->1, "b"->2, "c"->3)
        val map2 = mutable.Map("a"->4, "b"->5, "d"->6)

        val map3: mutable.Map[String, Int] = map2.foldLeft(map1) {
            (map, kv) => {
                val k = kv._1
                val v = kv._2

                map(k) = map.getOrElse(k, 0) + v

                map
            }
        }

        println(map3)

4、普通WordCount案例

def main(args: Array[String]): Unit = {

        // 单词计数:将集合中出现的相同的单词,进行计数,取计数排名前三的结果
        val stringList = List("Hello Scala Hbase kafka", "Hello Scala Hbase", "Hello Scala", "Hello")

        // 1) 将每一个字符串转换成一个一个单词
        val wordList: List[String] = stringList.flatMap(str=>str.split(" "))
        //println(wordList)

        // 2) 将相同的单词放置在一起
        val wordToWordsMap: Map[String, List[String]] = wordList.groupBy(word=>word)
        //println(wordToWordsMap)

        // 3) 对相同的单词进行计数
        // (word, list) => (word, count)
        val wordToCountMap: Map[String, Int] = wordToWordsMap.map(tuple=>(tuple._1, tuple._2.size))

        // 4) 对计数完成后的结果进行排序(降序)
        val sortList: List[(String, Int)] = wordToCountMap.toList.sortWith {
            (left, right) => {
                left._2 > right._2
            }
        }

        // 5) 对排序后的结果取前3名
        val resultList: List[(String, Int)] = sortList.take(3)

        println(resultList)
    }

5、复杂WordCount案例

def main(args: Array[String]): Unit = {

        // 第一种方式(不通用)
        val tupleList = List(("Hello Scala Spark World ", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))

        val stringList: List[String] = tupleList.map(t=>(t._1 + " ") * t._2)

        //val words: List[String] = stringList.flatMap(s=>s.split(" "))
        val words: List[String] = stringList.flatMap(_.split(" "))

        //在map中,如果传进来什么就返回什么,不要用_省略
        val groupMap: Map[String, List[String]] = words.groupBy(word=>word)
        //val groupMap: Map[String, List[String]] = words.groupBy(_)

        // (word, list) => (word, count)
        val wordToCount: Map[String, Int] = groupMap.map(t=>(t._1, t._2.size))

        val wordCountList: List[(String, Int)] = wordToCount.toList.sortWith {
            (left, right) => {
                left._2 > right._2
            }
        }.take(3)

        //tupleList.map(t=>(t._1 + " ") * t._2).flatMap(_.split(" ")).groupBy(word=>word).map(t=>(t._1, t._2.size))
        println(wordCountList)
    }
==================================
//方式二
def main(args: Array[String]): Unit = {

        val tuples = List(("Hello Scala Spark World", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))

        // (Hello,4),(Scala,4),(Spark,4),(World,4)
        // (Hello,3),(Scala,3),(Spark,3)
        // (Hello,2),(Scala,2)
        // (Hello,1)
        val wordToCountList: List[(String, Int)] = tuples.flatMap {
            t => {
                val strings: Array[String] = t._1.split(" ")
                strings.map(word => (word, t._2))
            }
        }

        // Hello, List((Hello,4), (Hello,3), (Hello,2), (Hello,1))
        // Scala, List((Scala,4), (Scala,3), (Scala,2)
        // Spark, List((Spark,4), (Spark,3)
        // Word, List((Word,4))
        val wordToTupleMap: Map[String, List[(String, Int)]] = wordToCountList.groupBy(t=>t._1)

        val stringToInts: Map[String, List[Int]] = wordToTupleMap.mapValues {
            datas => datas.map(t => t._2)
        }
        stringToInts

        /*
        val wordToCountMap: Map[String, List[Int]] = wordToTupleMap.map {
            t => {
                (t._1, t._2.map(t1 => t1._2))
            }
        }

        val wordToTotalCountMap: Map[String, Int] = wordToCountMap.map(t=>(t._1, t._2.sum))
        println(wordToTotalCountMap)
        */
    }

你可能感兴趣的:(Scala,scala,大数据)