Notes on a small Scala multithreading problem encountered while working with Spark

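While modifying the ThriftServer source to add some business logic, I ran into the following problem. I wanted to submit tasks asynchronously on multiple threads. At first I used Scala's Actor and passed it the SQLContext and the sql, but the sparkSessionId kept changing: the sessionId at submission and the one produced after the Action was triggered never matched. It turned out to be a multithreaded async issue: when the sqlContext is passed over and the task runs on the other thread, a new session gets started. The only way I found to make it work was the following: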

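import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// df and rb are variables already defined in the enclosing method.
ExecutorService executorService = Executors.newFixedThreadPool(2);

// First task: write the data to <hdfspath>/file3 with LZO compression.
executorService.submit(new Callable<Void>() {
    @Override
    public Void call() {
        df.rdd().saveAsTextFile(rb.getString("hdfspath") + "/file3", com.hadoop.compression.lzo.LzopCodec.class);
        return null;
    }
});

// Second task: write the same data to <hdfspath>/file4.
executorService.submit(new Callable<Void>() {
    @Override
    public Void call() {
        df.rdd().saveAsTextFile(rb.getString("hdfspath") + "/file4", com.hadoop.compression.lzo.LzopCodec.class);
        return null;
    }
});

// Stop accepting new tasks; tasks already submitted still run to completion.
executorService.shutdown();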

That is, simply use the context variable directly inside the current method.
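The pattern above can be sketched without Spark as follows. `FakeSession` is a hypothetical stand-in for the session-like object, not a Spark class, and this sketch does not reproduce the Spark behaviour itself. It only illustrates that tasks created as lambdas in the same method capture the very same object, and that keeping the returned `Future`s lets the caller notice failures that a fire-and-forget `submit` would hide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedContextDemo {
    // Hypothetical stand-in for a session-like object (NOT a Spark API).
    static final class FakeSession {
        final String sessionId = UUID.randomUUID().toString();
    }

    // Runs taskCount tasks that all capture the same session and
    // returns the session id observed by each task.
    static List<String> runTasks(FakeSession session, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // The lambda closes over the local variable `session`.
                Callable<String> task = () -> session.sessionId;
                futures.add(pool.submit(task));
            }
            List<String> seen = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() blocks until the task is done and rethrows any
                // task failure wrapped in an ExecutionException.
                seen.add(f.get());
            }
            return seen;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeSession session = new FakeSession();
        for (String id : runTasks(session, 2)) {
            System.out.println(id.equals(session.sessionId));
        }
    }
}
```

In the original snippet the same idea applies to `df`: both anonymous `Callable`s refer to the `df` of the enclosing method instead of receiving a context through a message.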

For reference, here is another common way to write it:

import java.util.concurrent.{ExecutorService, Executors}

object Test {
  def main(args: Array[String]): Unit = {
    // Create the thread pool
    val threadPool: ExecutorService = Executors.newFixedThreadPool(5)
    try {
      // Submit 5 tasks
      for (i <- 1 to 5) {
        // threadPool.submit(new ThreadDemo("thread" + i))
        threadPool.execute(new ThreadDemo("thread" + i))
      }
    } finally {
      threadPool.shutdown()
    }
  }

  // Task class: sleeps 100 ms after each print
  class ThreadDemo(threadName: String) extends Runnable {
    override def run(): Unit = {
      for (i <- 1 to 10) {
        println(threadName + "|" + i)
        Thread.sleep(100)
      }
    }
  }
}
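One detail about the example above: `shutdown()` only stops the pool from accepting new tasks and returns immediately; it does not wait for the submitted tasks to finish. If the caller needs to block until all the work is done, `awaitTermination` can be called afterwards. A minimal sketch in Java, matching the first snippet (the class name, the counter, and the 10-second timeout are assumptions made for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitDemo {
    // Runs `tasks` Runnables on a fixed pool and returns how many of them
    // had completed once the pool terminated (or the timeout elapsed).
    static int runAndWait(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        AtomicInteger finished = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(50);            // simulate some work
                        finished.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } finally {
            pool.shutdown();                         // no new tasks accepted
        }
        // Block until the queued tasks are done; cancel them on timeout.
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndWait(5));
    }
}
```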

Callable example

import java.util.concurrent.{Callable, FutureTask, Executors, ExecutorService}

object Test {
  def main(args: Array[String]): Unit = {
    val threadPool:ExecutorService=Executors.newFixedThreadPool(3)
    try {
      val future=new FutureTask[String](new Callable[String] {
        override def call(): String = {
          Thread.sleep(100)
          "im result"
        }
      })
      threadPool.execute(future)
      println(future.get())
    }finally {
      threadPool.shutdown()
    }
  }
}
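As an alternative to wrapping the `Callable` in a `FutureTask` by hand, `ExecutorService.submit` accepts a `Callable` and returns a `Future` directly, and `invokeAll` does the same for a whole batch, returning the futures in the same order as the tasks. A rough Java sketch (the class name, method name, and result strings are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {
    // Submits n Callables in one batch and collects their results in order.
    static List<String> collect(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int id = i;
                tasks.add(() -> {
                    Thread.sleep(100);               // simulate some work
                    return "result-" + id;
                });
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(3));
    }
}
```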

 
