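// Operating on an immutable Map held in a var: each operator builds a new Map and reassigns it to map
var map = Map[String,Int]()
val list = List("k1","k2","k3")
map += ("k1" -> 1,"k2" -> 2)          // add several key-value pairs
map ++= Map("k11" -> 1,"k22" -> 2)    // add all entries of another Map
map -= ("k1","k2")                    // remove several keys
map --= List("k11","k2")              // remove every key in a collection; map is now Map(k22 -> 2)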
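// Separate example: build a Map from a List inside foreach
var map = Map[String,Int]()
val l = List("1","2","3")
l.foreach(x=>{
map +=((x,x.toInt))   // the inner parentheses form a (key, value) tuple
})
println(map)          // Map(1 -> 1, 2 -> 2, 3 -> 3)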
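In Scala, functions are values just like numbers and strings, so a function can be passed to a method. This lets us encapsulate an algorithm in a method and pass in the concrete action as a function, which is a very useful feature.
We have already seen List's map method: it accepts a function and uses it to transform the List.
… is itself also a kind of function.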
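Here is a minimal sketch of passing a function to a method. The names `HigherOrderDemo`, `sumBy` and `square` are hypothetical and do not come from the original. The summing loop is written once, and the caller supplies the per-element action, either as a named function value or as an anonymous function:

```scala
object HigherOrderDemo {
  // The summing algorithm is written once; the per-element action is a function parameter
  def sumBy(xs: List[Int], f: Int => Int): Int =
    xs.foldLeft(0)((acc, x) => acc + f(x))

  // A function value can be passed around just like a number or a string
  val square: Int => Int = x => x * x

  def main(args: Array[String]): Unit = {
    val nums = List(1, 2, 3)
    println(sumBy(nums, square))      // 14 (1 + 4 + 9)
    println(sumBy(nums, x => x * 10)) // 60 (10 + 20 + 30)
    println(nums.map(square))         // List(1, 4, 9): List.map also takes a function
  }
}
```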
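Currying is used extensively in the source code of Scala and Spark. To make that source code easier to read later, let's look at what currying is.
Currying is the process of turning a method that takes several parameters into one that takes several parameter lists, each with a single parameter.
Currying, form 1: a method with multiple parameter lists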
scala> def m2(x:Int)(y:Int)=x*y
m2: (x: Int)(y: Int)Int
scala> m2(2)(3)
res3: Int = 6
scala> val f2 = m2(2)
<console>:13: error: missing argument list for method m2
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `m2 _` or `m2(_)(_)` instead of `m2`.
val f2 = m2(2)
^
scala> val f2 = m2(2)_
f2: Int => Int = <function1>
scala> f2(3)
res4: Int = 6
Currying, form 2: a method that returns a function
scala> def m3(x:Int)=(y:Int)=>x*y
m3: (x: Int)Int => Int
scala> val f3 = m3(2)
f3: Int => Int = <function1>
scala> f3(3)
res5: Int = 6
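The sketch below sets both forms next to an ordinary two-argument function. That function is turned into curried form with the standard library's `Function2.curried`. The object and value names are illustrative only:

```scala
object CurryingDemo {
  // Form 1: a method with two parameter lists
  def m2(x: Int)(y: Int): Int = x * y

  // Form 2: a method whose result is a function
  def m3(x: Int): Int => Int = (y: Int) => x * y

  // An ordinary two-argument function value...
  val mul: (Int, Int) => Int = (x, y) => x * y
  // ...converted by Function2.curried into a chain of one-argument functions
  val mulCurried: Int => Int => Int = mul.curried

  def main(args: Array[String]): Unit = {
    // An expected function type lets m2(2) be converted without the trailing _
    val f2: Int => Int = m2(2)
    val f3 = m3(2)
    val f4 = mulCurried(2)
    println(List(f2(3), f3(3), f4(3))) // List(6, 6, 6)
  }
}
```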
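Requirement: design a curried method with two parameter lists.
- The first parameter list takes a function that has no parameters and returns an Int.
- The second parameter list takes a function that accepts an Int and returns Unit.
- The method body calls the second function on the result of the first.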
scala> def m4(f1:()=>Int)(f2:Int=>Unit)={
| f2(f1())}
m4: (f1: () => Int)(f2: Int => Unit)Unit
scala> def f1()=3
f1: ()Int
scala> def f2(x:Int)=println(x*1000)
f2: (x: Int)Unit
scala> m4(f1)(f2)
3000
scala> def m5(compute:()=>Map[String,Int])(save:Map[String,Int]=>Unit)={
| save(compute())}
m5: (compute: () => Map[String,Int])(save: Map[String,Int] => Unit)Unit
scala> def c()={
| Map("spark" -> 1,"hadoop" -> 7, "sqoop" -> 1, "flume" -> 1)}
c: ()scala.collection.immutable.Map[String,Int]
scala> def s(m:Map[String,Int])={
| println(m)}
s: (m: Map[String,Int])Unit
scala> m5(c)(s)
Map(spark -> 1, hadoop -> 7, sqoop -> 1, flume -> 1)
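`save` sits in its own parameter list, so the last argument can also be written as a brace block. This call style shows up often in Scala and Spark code. The sketch below copies `m5` from above; the object name is hypothetical:

```scala
object ComputeSaveDemo {
  // Same shape as m5 above: first compute a Map, then hand it to a save step
  def m5(compute: () => Map[String, Int])(save: Map[String, Int] => Unit): Unit =
    save(compute())

  def main(args: Array[String]): Unit = {
    // Both arguments are anonymous functions; the second is passed as a brace block
    m5(() => Map("spark" -> 1, "hadoop" -> 7)) { m =>
      println(s"total = ${m.values.sum}") // total = 8
    }
  }
}
```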
Defining a closure
val y=10
val add=(x:Int)=>{
x+y
}
println(add(5)) // prints 15
add is a closure: its body uses y, a variable defined outside the function.
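A closure captures the variable itself, not a copy of its value. The sketch below shows two things. First, a later change to a captured `var` is visible inside the function. Second, a method can return closures that each remember their own argument. The names are hypothetical:

```scala
object ClosureDemo {
  // Each call returns a new closure that remembers its own n
  def adder(n: Int): Int => Int = (x: Int) => x + n

  def main(args: Array[String]): Unit = {
    var factor = 10
    // multiply refers to factor, which lives outside the function body
    val multiply = (x: Int) => x * factor
    println(multiply(5)) // 50
    factor = 20
    println(multiply(5)) // 100: the closure sees the updated variable, not a snapshot
    val add10 = adder(10)
    println(add10(5))    // 15, like add(5) with y = 10
  }
}
```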
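import scala.language.implicitConversions // allows implicit conversion methods without a feature warning

// An ordinary person
class PTpeople
// Superman
class SuperMan(s:String){
// Superman can fly
def fly(): Unit ={
println("i can fly in the sky~~~")
}
}
object obj{
// Implicit conversion method: implicit def methodName(param: SourceClass) = new TargetClass
implicit def pt2super(pt:PTpeople) = new SuperMan("sadf")
}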
object o {
def main(args: Array[String]): Unit = {
val putong: PTpeople = new PTpeople
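import obj.pt2super   // bring the implicit conversion into scope
putong.fly()          // PTpeople has no fly(), so the compiler rewrites this as pt2super(putong).fly()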
}
}
class superman{
def fly={
println("i am flying")
}
}
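Since Scala 2.10, an `implicit class` can achieve the same effect. It declares the wrapper type and the conversion in one place, as an alternative to the `implicit def` approach above. The sketch below uses a hypothetical `Flying` wrapper. Its `fly()` returns a String instead of printing, so the result is easy to check:

```scala
object ImplicitClassDemo {
  // An ordinary person, as in the example above
  class PTpeople

  // implicit class: the wrapper type and the conversion are declared together;
  // it must be defined inside an object, class or trait and take one constructor parameter
  implicit class Flying(p: PTpeople) {
    def fly(): String = "i can fly in the sky~~~"
  }

  def main(args: Array[String]): Unit = {
    val putong = new PTpeople
    // PTpeople has no fly(); the compiler rewrites this as new Flying(putong).fly()
    println(putong.fly())
  }
}
```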
Usage steps
Define
Create the Actor
Reference code
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

case class SubmitTaskMessage(msg:String)
case class SuccessSubmitTaskMessage(msg:String)
// Note: Actor must be imported from Akka (akka.actor.Actor)
object SenderActor extends Actor {
override def preStart(): Unit = println("Executing SenderActor's preStart() method")
override def receive: Receive = {
case "start" =>
val receiveActor = this.context.actorSelection("/user/receiverActor")
receiveActor ! SubmitTaskMessage("Please complete task #001!")
case SuccessSubmitTaskMessage(msg) =>
println(s"Received a message from ${sender.path}: $msg")
}
}
object ReceiverActor extends Actor {
override def preStart(): Unit = println("Executing ReceiverActor's preStart() method")
override def receive: Receive = {
case SubmitTaskMessage(msg) =>
println(s"Received a message from ${sender.path}: $msg")
sender ! SuccessSubmitTaskMessage("Submission complete")
case _ => println("Unmatched message type")
}
}
}
object Entrance {
def main(args: Array[String]): Unit = {
val actorSystem = ActorSystem("SimpleAkkaDemo", ConfigFactory.load())
val senderActor: ActorRef = actorSystem.actorOf(Props(SenderActor), "senderActor")
val receiverActor: ActorRef = actorSystem.actorOf(Props(ReceiverActor), "receiverActor")
senderActor ! "start"
}
}
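Akka provides a scheduler object for running scheduled tasks. Calling ActorSystem.scheduler.schedule starts a periodic task. The method comes in two forms.
Form 1: periodically send a message
def schedule(
initialDelay: FiniteDuration, // how long to wait before the first run
interval: FiniteDuration, // how often to repeat
receiver: ActorRef, // which Actor to send the message to
message: Any) // the message to send
(implicit executor: ExecutionContext) // implicit parameter: must be imported manually
Form 2: run custom logic
def schedule(
initialDelay: FiniteDuration, // how long to wait before the first run
interval: FiniteDuration // how often to repeat
)(f: ⇒ Unit) // the function to run each time; put the logic here
(implicit executor: ExecutionContext) // implicit parameter: must be imported manually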
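Running Akka's scheduler requires an Akka dependency. To see what `initialDelay`, `interval` and a by-name `f: ⇒ Unit` mean on their own, here is an analogous sketch built on the JDK's `ScheduledExecutorService.scheduleAtFixedRate`. This is not Akka's API. The `runTimes` helper and its stop-after-N-runs behavior are illustrative assumptions. Also note that in Akka 2.6 and later, `schedule` is deprecated in favor of `scheduleWithFixedDelay` and `scheduleAtFixedRate`.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object FixedRateDemo {
  // Mirrors schedule(initialDelay, interval)(f): wait initialDelayMs, then run `task`
  // every intervalMs milliseconds; stop after it has run `times` times.
  def runTimes(initialDelayMs: Long, intervalMs: Long, times: Int)(task: => Unit): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val remaining = new CountDownLatch(times)
    val runnable = new Runnable {
      // single-threaded executor: runs never overlap, so this check-then-run is safe
      def run(): Unit = if (remaining.getCount > 0) { task; remaining.countDown() }
    }
    scheduler.scheduleAtFixedRate(runnable, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS)
    remaining.await()    // block until the task has run `times` times
    scheduler.shutdown() // cancels the periodic task
  }

  def main(args: Array[String]): Unit = {
    val ticks = new AtomicInteger(0)
    runTimes(initialDelayMs = 100, intervalMs = 50, times = 3) {
      println(s"tick ${ticks.incrementAndGet()}")
    }
    println(s"ran ${ticks.get} times") // ran 3 times
  }
}
```

As in Akka's second form, the block after the parameter list is passed by name and re-evaluated on every tick.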
Reference code
Entrance.scala
val workerActorSystem = ActorSystem("actorSystem", ConfigFactory.load())
val workerActor: ActorRef = workerActorSystem.actorOf(Props(WorkerActor), "WorkerActor")
// Send a message to WorkerActor
workerActor ! "setup"
WorkerActor.scala
object WorkerActor extends Actor{
override def receive: Receive = {
case "setup" =>
println("WorkerActor: starting Worker")
}
}
Reference code
Entrance.scala
val masterActorSystem = ActorSystem("MasterActorSystem", ConfigFactory.load())
val masterActor: ActorRef = masterActorSystem.actorOf(Props(MasterActor), "MasterActor")
MasterActor.scala
object MasterActor extends Actor{
override def receive: Receive = {
case "connect" =>
println("2. Worker connected to Master")
sender ! "success"
}
}
WorkerActor.scala
object WorkerActor extends Actor{
override def receive: Receive = {
case "setup" =>
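println("1. Starting Worker...")
// Remote actor path: akka.tcp://<actor system>@<host>:<port>/user/<actor name>; host 127.0.0.1 is assumed
val masterActor = context.actorSelection("akka.tcp://MasterActorSystem@127.0.0.1:9999/user/MasterActor")
// send "connect" to the Master
masterActor ! "connect"
case "success" =>
println("3. Connected to Master successfully...")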
}
}
Simulating communication between Spark's Master and Worker
Steps
Project | Description |
---|---|
spark-demo-common | Shared messages and entity classes |
spark-demo-master | Akka Master node |
spark-demo-worker | Akka Worker node |
Steps
Reference code
Master.scala
val sparkMasterActorSystem = ActorSystem("sparkMaster", ConfigFactory.load())
val masterActor = sparkMasterActorSystem.actorOf(Props(MasterActor), "masterActor")
MasterActor.scala
object MasterActor extends Actor{
override def receive: Receive = {
case x => println(x)
}
}
Worker.scala
val sparkWorkerActorSystem = ActorSystem("sparkWorker", ConfigFactory.load())
sparkWorkerActorSystem.actorOf(Props(WorkerActor), "workerActor")
WorkerActor.scala
object WorkerActor extends Actor{
override def receive: Receive = {
case x => println(x)
}
}
Steps
Reference code
MasterActor.scala
import java.util.Date
import akka.actor.Actor

object MasterActor extends Actor{
// registered workers, keyed by workerId
private val regWorkerMap = collection.mutable.Map[String, WorkerInfo]()
override def receive: Receive = {
case WorkerRegisterMessage(workerId, cpu, mem) => {
println(s"1. Registered new Worker - ${workerId}/${cpu} cores/${mem/1024.0}G")
regWorkerMap += workerId -> WorkerInfo(workerId, cpu, mem, new Date().getTime)
sender ! RegisterSuccessMessage
}
}
}
WorkerInfo.scala
/**
* Worker node information
* @param workerId worker ID
* @param cpu number of CPU cores
* @param mem amount of memory (MB)
* @param lastHeartBeatTime time of the last heartbeat update (epoch milliseconds)
*/
case class WorkerInfo(workerId:String, cpu:Int, mem:Int, lastHeartBeatTime:Long)
MessagePackage.scala
/**
* Registration message
* @param workerId
* @param cpu number of CPU cores
* @param mem memory size (MB)
*/
case class WorkerRegisterMessage(workerId:String, cpu:Int, mem:Int)
/**
* Registration success message
*/
case object RegisterSuccessMessage
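WorkerActor.scala
import java.util.UUID
import scala.util.Random
import akka.actor.{Actor, ActorSelection}

object WorkerActor extends Actor{
private var masterActor:ActorSelection = _
// kept as fields (not locals of preStart) so the heartbeat code in receive can use them
private var workerId:String = _
private var cpu:Int = _
private var mem:Int = _
private val CPU_LIST = List(1, 2, 4, 6, 8)
private val MEM_LIST = List(512, 1024, 2048, 4096)
override def preStart(): Unit = {
// Master address: actor system "sparkMaster" on port 7000; host 127.0.0.1 is assumed
masterActor = context.system.actorSelection("akka.tcp://sparkMaster@127.0.0.1:7000/user/masterActor")
val random = new Random()
workerId = UUID.randomUUID().toString.hashCode.toString
cpu = CPU_LIST(random.nextInt(CPU_LIST.length))
mem = MEM_LIST(random.nextInt(MEM_LIST.length))
masterActor ! WorkerRegisterMessage(workerId, cpu, mem)
}
...
}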
Steps
Reference code
ConfigUtil.scala
import com.typesafe.config.{Config, ConfigFactory}

object ConfigUtil {
private val config: Config = ConfigFactory.load()
// heartbeat interval in seconds
val `worker.heartbeat.interval` = config.getInt("worker.heartbeat.interval")
}
MessagePackage.scala
package com.itheima.spark.common
...
/**
* Worker heartbeat message
* @param workerId
* @param cpu number of CPU cores
* @param mem memory size (MB)
*/
case class WorkerHeartBeatMessage(workerId:String, cpu:Int, mem:Int)
WorkerActor.scala
object WorkerActor extends Actor{
...
override def receive: Receive = {
case RegisterSuccessMessage => {
println("2. Registered with Master successfully")
import scala.concurrent.duration._
import context.dispatcher // the implicit ExecutionContext required by schedule
context.system.scheduler.schedule(0.seconds,
ConfigUtil.`worker.heartbeat.interval`.seconds){
// send a heartbeat message
masterActor ! WorkerHeartBeatMessage(workerId, cpu, mem)
}
}
}
}
MasterActor.scala
object MasterActor extends Actor{
...
override def receive: Receive = {
...
case WorkerHeartBeatMessage(workerId, cpu, mem) => {
println("3. Received heartbeat, updating last heartbeat time")
regWorkerMap += workerId -> WorkerInfo(workerId, cpu, mem, new Date().getTime)
}
}
}
Steps
Reference code
ConfigUtil.scala
import com.typesafe.config.{Config, ConfigFactory}

object ConfigUtil {
private val config: Config = ConfigFactory.load()
// interval between heartbeat checks (seconds)
val `master.heartbeat.check.interval` = config.getInt("master.heartbeat.check.interval")
// heartbeat timeout (seconds)
val `master.heartbeat.check.timeout` = config.getInt("master.heartbeat.check.timeout")
}
MasterActor.scala
object MasterActor extends Actor{
...
override def preStart(): Unit = {
import scala.concurrent.duration._
import context.dispatcher
context.system.scheduler.schedule(0.seconds,
ConfigUtil.`master.heartbeat.check.interval`.seconds) {
val now = new Date().getTime
// find the workers whose last heartbeat is older than the timeout
val timeoutWorkerList = regWorkerMap.filter {
case (_, info) => now - info.lastHeartBeatTime > ConfigUtil.`master.heartbeat.check.timeout` * 1000
}
if (timeoutWorkerList.nonEmpty) {
regWorkerMap --= timeoutWorkerList.keys
println("Removing timed-out workers:")
timeoutWorkerList.values.foreach(println)
}
if (regWorkerMap.nonEmpty) {
// list the available workers, most memory first
val sortedWorkerList = regWorkerMap.values.toList.sortBy(_.mem).reverse
println("Available workers:")
sortedWorkerList.zipWithIndex.foreach { case (workerInfo, i) =>
println(s"<${i + 1}> ${workerInfo.workerId}/${workerInfo.mem}/${workerInfo.cpu}")
}
}
}
}
...
}
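The timeout filtering and memory ranking in `preStart` do not depend on Akka. They can therefore be pulled out into plain functions and checked without starting an actor system. The sketch below works under that assumption. The object and function names, and the 15-second timeout in the example, are hypothetical:

```scala
object HeartbeatCheckDemo {
  // Same fields as the tutorial's WorkerInfo
  case class WorkerInfo(workerId: String, cpu: Int, mem: Int, lastHeartBeatTime: Long)

  // Split workers into (timed out, alive) given the current time and a timeout in seconds
  def partitionByTimeout(workers: Map[String, WorkerInfo], nowMs: Long,
                         timeoutSec: Int): (Map[String, WorkerInfo], Map[String, WorkerInfo]) =
    workers.partition { case (_, w) => nowMs - w.lastHeartBeatTime > timeoutSec * 1000L }

  // Alive workers ordered by memory, largest first
  def rankByMem(alive: Map[String, WorkerInfo]): List[WorkerInfo] =
    alive.values.toList.sortBy(w => -w.mem)

  def main(args: Array[String]): Unit = {
    val now = 100000L
    val workers = Map(
      "a" -> WorkerInfo("a", 2, 1024, now - 1000),  // last heartbeat 1 s ago
      "b" -> WorkerInfo("b", 4, 4096, now - 20000), // last heartbeat 20 s ago
      "c" -> WorkerInfo("c", 8, 2048, now - 500))   // last heartbeat 0.5 s ago
    val (timedOut, alive) = partitionByTimeout(workers, now, timeoutSec = 15)
    println(timedOut.keySet) // Set(b)
    rankByMem(alive).zipWithIndex.foreach { case (w, i) =>
      println(s"<${i + 1}> ${w.workerId}/${w.mem}/${w.cpu}") // <1> c/2048/8, then <2> a/1024/2
    }
  }
}
```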
Steps