Deploying and Using Spark Job Server 0.7.0

##Install Scala

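Download a suitable version from the official Scala website.
Extract it to /usr/local/scala (any directory works, as long as the paths below are adjusted to match).
On Linux, add its bin directory to the PATH environment variable: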

export PATH="$PATH:/usr/local/scala/bin"

Run scala to check that the installation succeeded.
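
The same check can be scripted, which helps when several machines have to be prepared. This is a minimal sketch for a POSIX shell; the helper name `require_tool` is hypothetical and not part of Scala or sbt.

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME resolves on the current PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Report on scala without aborting the shell if it is missing.
require_tool scala || echo "check that the Scala bin directory is on PATH"
```

The helper works for `sbt` and `java` as well.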

##Install sbt manually

Download sbt from the official website, as either the zip or the tgz archive.
Extract it to /usr/local/sbt.
Create a file named sbt in /usr/local/sbt:

cd /usr/local/sbt
vi sbt

Enter the following (on Java 1.8 the -XX:MaxPermSize=256M option can be dropped, because the permanent generation no longer exists there):

SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"    
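
The launcher can also be created without an editor. The sketch below is one way to do it, under a few assumptions: `SBT_DIR` is a hypothetical variable that defaults to a scratch directory so the snippet is safe to try, the jar is located relative to the script instead of through a hard-coded path, and the Java 8 variant of the options (without MaxPermSize) is used.

```shell
#!/bin/sh
# Assumption: SBT_DIR defaults to a scratch directory; set
# SBT_DIR=/usr/local/sbt to match the layout used in this post.
SBT_DIR="${SBT_DIR:-$(mktemp -d)}"

# The quoted 'EOF' keeps $SBT_OPTS, $0 and "$@" literal in the generated file.
cat > "$SBT_DIR/sbt" <<'EOF'
#!/bin/sh
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled"
exec java $SBT_OPTS -jar "$(dirname "$0")/bin/sbt-launch.jar" "$@"
EOF

# Without the executable bit, typing sbt fails with "Permission denied".
chmod +x "$SBT_DIR/sbt"
echo "launcher written to $SBT_DIR/sbt"
```

Whichever way the file is created, it has to be executable before the sbt command will run.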

Configure the repositories

vi ~/.sbt/repositories

Enter the following:

[repositories]
  local
  aliyun-nexus: http://maven.aliyun.com/nexus/content/groups/public/
  # or oschina: http://maven.oschina.net/content/groups/public/
  jcenter: http://jcenter.bintray.com/
  typesafe-ivy-releases: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
  maven-central: http://repo1.maven.org/maven2/

Set the environment variables

export SBT_HOME=/usr/local/sbt
export PATH=$PATH:$SBT_HOME

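Run sbt to check that the installation succeeded.
On its first run sbt downloads its dependencies automatically; the setup is complete once the sbt console prompt appears: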

sbt:sbt> 

##Deploy spark-jobserver
###Configuration

Get the source from the project's GitHub repository.
Use local as the configuration environment.
Copy local.conf.template to local.conf and local.sh.template to local.sh:

cd /home/hadoop/application/spark-jobserver/config
cp local.conf.template local.conf
cp local.sh.template local.sh

vi local.sh

#!/usr/bin/env bash

# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
# Hosts to deploy to over ssh; IP addresses also work
DEPLOY_HOSTS="dashuju213
              dashuju214"

# User and group used for the ssh installation
APP_USER=hadoop
APP_GROUP=hadoop
JMX_PORT=9999
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
# Installation directory on the deploy hosts
INSTALL_DIR=/home/hadoop/application/jobserver
# Log directory
LOG_DIR=/home/hadoop/application/jobserver/logs
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=1G
# Spark version
SPARK_VERSION=2.3.2
MAX_DIRECT_MEMORY=512M
# Spark installation directory
SPARK_HOME=/home/hadoop/application/spark
SPARK_CONF_DIR=$SPARK_HOME/conf
# Scala version
SCALA_VERSION=2.11.12
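
A typo in local.sh tends to surface only halfway through a deployment, so it can be worth checking the file first. The following is a sketch only: `check_settings` is a hypothetical helper, and the list of variable names is simply taken from the file above; which of them the deploy scripts strictly require is an assumption.

```shell
#!/bin/sh
# check_settings FILE: source FILE in a subshell and report every listed
# variable that is unset or empty. Exit status is 0 only if none is missing.
check_settings() {
  (
    . "$1" || exit 2
    missing=0
    for name in DEPLOY_HOSTS APP_USER APP_GROUP INSTALL_DIR LOG_DIR \
                SPARK_HOME SPARK_VERSION SCALA_VERSION; do
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "missing: $name" >&2
        missing=1
      fi
    done
    exit "$missing"
  )
}

# Example, run from the directory that holds local.sh:
#   check_settings ./local.sh && echo "settings look complete"
```

The subshell keeps the sourced variables out of the calling shell, so the check can be repeated after each edit.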

Configure the database
vi local.conf (only the settings that need changing are listed)

# also add the following line at the root level.
flyway.locations="db/mysql/migration"

spark {
  # local[...], yarn, mesos://... or spark://...
  master = "spark://dashuju213:6066,dashuju214:6066"

  # client or cluster deployment
  submit.deployMode = "cluster"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 2
  
  jobserver {
    ...
    sqldao {
      # Slick database driver, full classpath
      slick-driver = slick.driver.MySQLDriver

      # JDBC driver, full classpath
      jdbc-driver = com.mysql.jdbc.Driver

      jdbc {
        url = "jdbc:mysql://db_host/spark_jobserver"
        user = "jobserver"
        password = "secret"
      }

      dbcp {
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }
  }
}
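
The jdbc block expects the database and account to exist already. As a sketch, the helper below prints SQL that would create them; `jobserver_db_sql` is a hypothetical name, the `'%'` host pattern and the `IF NOT EXISTS` form of `CREATE USER` (MySQL 5.7 or later) are assumptions, and nothing is executed against a server.

```shell
#!/bin/sh
# jobserver_db_sql DB USER PASSWORD: print SQL that creates the schema and
# the account referenced by the jdbc block. Nothing is executed here.
jobserver_db_sql() {
  cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$1\`;
CREATE USER IF NOT EXISTS '$2'@'%' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON \`$1\`.* TO '$2'@'%';
EOF
}

# Values taken from the local.conf example above.
jobserver_db_sql spark_jobserver jobserver secret
```

To apply the statements, the output can be piped into the mysql client, for example `jobserver_db_sql spark_jobserver jobserver secret | mysql -h db_host -u root -p`.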

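Set up passwordless SSH login to the deploy hosts.
Set the SSH port: 22 is used by default and can be changed as needed (the example below uses 2222).
vi server_deploy.sh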

for host in $DEPLOY_HOSTS; do
  # We assume that the deploy user is APP_USER and has permissions
  ssh -p 2222 -o StrictHostKeyChecking=no $ssh_key_to_use  ${APP_USER}@$host mkdir -p $INSTALL_DIR
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use  $FILES ${APP_USER}@$host:$INSTALL_DIR/
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use  "$CONFIG_DIR/$ENV.conf" ${APP_USER}@$host:$INSTALL_DIR/
  scp -P 2222 -o StrictHostKeyChecking=no $ssh_key_to_use  "$configFile" ${APP_USER}@$host:$INSTALL_DIR/settings.sh
done
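
Hard-coding the port in four places is easy to get wrong, because ssh takes it as `-p` and scp as `-P`. One possible alternative is to keep the port in a single variable. In the sketch below, `SSH_PORT` and `port_opts` are hypothetical names; the stock server_deploy.sh does not read them.

```shell
#!/bin/sh
# SSH_PORT is a hypothetical variable, not one the stock script reads.
SSH_PORT="${SSH_PORT:-22}"

# ssh takes the port as -p, scp as -P; print the matching option string.
port_opts() {
  case "$1" in
    ssh) echo "-p $SSH_PORT -o StrictHostKeyChecking=no" ;;
    scp) echo "-P $SSH_PORT -o StrictHostKeyChecking=no" ;;
    *)   echo "usage: port_opts ssh|scp" >&2; return 1 ;;
  esac
}

# Show the command line that would be built for ssh.
echo "ssh $(port_opts ssh) user@host"
```

Inside the loop, `$(port_opts ssh)` and `$(port_opts scp)` would then replace the literal `-p 2222 ...` and `-P 2222 ...` fragments.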

###Deployment

Go to the bin directory and run the deploy command:

./server_deploy.sh local

When it finishes, go to the INSTALL_DIR directory on each host and use server_start.sh and server_stop.sh to start and stop the server.
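
server_start.sh returns before the server is ready to answer requests, so a small polling loop can confirm that it came up. This is a sketch: `wait_for` is a hypothetical helper, and the commented example assumes the default REST port 8090 and that curl is installed.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts.
wait_for() {
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (needs a started server):
#   wait_for 30 2 curl -sf http://localhost:8090/contexts && echo "job server is up"
```

If the loop gives up, the files under LOG_DIR are the first place to look.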

###Problems encountered
####Startup problem

Because I had set master and deployMode in spark-defaults.conf, I had to edit server_start.sh so that spark-submit is invoked with the options I needed:

cmd='$SPARK_HOME/bin/spark-submit --master local[1] --deploy-mode

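####Database initialization fails

Edit spark-jobserver\spark-jobserver-master\job-server\src\main\resources\db\mysql\migration\V0_7_2\V0_7_2__convert_binaries_table_to_use_milliseconds.sql so that it contains the statement shown below.
Then either rerun the deploy command or patch the file inside the jar directly.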

ALTER TABLE `BINARIES` MODIFY COLUMN `UPLOAD_TIME` TIMESTAMP;

Validate failed. Migration Checksum mismatch for migration 0.7.2
This error is a leftover of the failed initialization. Drop all tables in the database and initialize again.
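
If the schema holds nothing but job server tables, dropping and recreating the whole database has the same effect as deleting the tables one by one. The sketch below only prints the SQL; `reset_db_sql` is a hypothetical helper, and the shortcut is an assumption that does not apply when the schema is shared with other applications.

```shell
#!/bin/sh
# reset_db_sql DB: print SQL that drops and recreates DB.
# Nothing is executed here.
reset_db_sql() {
  cat <<EOF
DROP DATABASE IF EXISTS \`$1\`;
CREATE DATABASE \`$1\`;
EOF
}

# Database name taken from the jdbc url in local.conf.
reset_db_sql spark_jobserver
```

As before, the output can be piped into the mysql client once it has been reviewed.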

####java.lang.ClassNotFoundException: akka.event.slf4j.Slf4jLogger

Edit project/Dependencies.scala and change

    "com.typesafe.akka" %% "akka-slf4j" % akka % "provided",
    ...
    "io.spray" %% "spray-routing" % spray,

to

    "com.typesafe.akka" %% "akka-slf4j" % akka,
    ...
    "io.spray" %% "spray-routing-shapeless23" % "1.3.4",

Add to project/Versions.scala:

  lazy val mysql = "5.1.42"

###Usage
####Running spark-sql

Edit local.conf:

spark {
  jobserver {
    # Automatically load a set of jars at startup time.  Key is the appName, value is the path/URL.
    job-binary-paths {    # NOTE: you may need an absolute path below
      sql = job-server-extras/target/scala-2.10/job-server-extras_2.10-0.6.2-SNAPSHOT-tests.jar
    }
  }
  
  contexts {
    sql-context {
      num-cpu-cores = 1           # Number of cores to allocate.  Required.
      memory-per-node = 512m         # Executor memory per node, -Xmx style eg 512m, 1G, etc.
      context-factory = spark.jobserver.context.HiveContextFactory
    }
  }  
}
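
With the jar preloaded under the app name sql and the context defined, jobs are submitted through the REST API. The sketch below only builds the submission URL. It assumes the default REST port 8090, and the class name `spark.jobserver.HiveTestJob` is an assumption about what the extras test jar contains; substitute whichever job class your jar actually provides. `JOBSERVER` and `job_url` are hypothetical names.

```shell
#!/bin/sh
# Assumption: the job server listens on its default REST port, 8090.
JOBSERVER="${JOBSERVER:-http://localhost:8090}"

# job_url APP CLASS CONTEXT: build the synchronous job-submission URL.
job_url() {
  echo "$JOBSERVER/jobs?appName=$1&classPath=$2&context=$3&sync=true"
}

# Print the URL for the sql app in the sql-context defined above.
job_url sql spark.jobserver.HiveTestJob sql-context

# To submit for real (needs a running server; the table name is made up):
#   curl -d 'sql = "select * from some_table limit 10"' \
#     "$(job_url sql spark.jobserver.HiveTestJob sql-context)"
```

The `sync=true` parameter makes the call wait for the job result instead of returning a job id straight away.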
