Big Data Cluster Installation 03: Spark Configuration

Spark Configuration

Must read

Of the thousand rules of configuration, the network comes first; sloppy configuration means debugging until you are sick of it. Keep internal and external IPs straight: configure the local machine with its internal IP, and use the external IP when connecting from outside.
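Only addresses actually assigned to a local interface can be used in the local configuration, so it helps to check them first. A quick sketch (the echo service used for the external IP is just one example):

# Addresses assigned to local interfaces -- these are the "internal" IPs the machine can bind to
ip addr show | grep 'inet '

# The address the outside world sees (requires outbound access; ifconfig.me is one public echo service)
curl -s ifconfig.me; echo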

1. Install the rz upload tool

Install command:

yum install -y lrzsz

2. Upload the Spark tarball

Upload command:

## Upload the tarball
rz

## Extract the tarball
tar -zxvf [tarball-name]
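For example, with the Spark 2.4.5 release used in the rest of this guide (the tarball name is assumed to match the official download), extraction into /root looks like this:

# Extract the Spark release under /root so it matches the paths used below
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /root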

3. Configure Spark

(1) Edit the .bashrc file
Add the following to the .bashrc file on every node (it can also be added to the profile file):

# jdk
export JAVA_HOME=/root/jdk1.8.0_241
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# hadoop
export HADOOP_HOME=/root/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_HOME_WARN_SUPPRESS=1

# spark
export SPARK_HOME=/root/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Replace JAVA_HOME, HADOOP_HOME, and SPARK_HOME above with your own installation paths.

Run source .bashrc to apply the changes.
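A quick sanity check that the variables are in effect (output will vary with your installed versions):

source ~/.bashrc

# Each command should print a version rather than "command not found"
java -version
hadoop version
spark-submit --version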

4. Edit the files under the Spark conf directory

  • Configure the slaves file

    Copy slaves.template to slaves:

    cp slaves.template slaves

    The slaves file lists the Worker nodes. Edit slaves and replace the default localhost with the following:

hadoop-02
hadoop-03
  • Configure the spark-env.sh file

    Copy spark-env.sh.template to spark-env.sh:

    cp spark-env.sh.template spark-env.sh

    Edit spark-env.sh and add the following:

    export SPARK_DIST_CLASSPATH=$(/root/hadoop-2.7.1/bin/hadoop classpath)
    export SPARK_MASTER_IP=192.168.0.4
    export SPARK_MASTER_PORT=7077
    export SPARK_MASTER_WEBUI_PORT=8080
    export SPARK_WORKER_MEMORY=500M
    export SPARK_WORKER_PORT=7078
    export SPARK_WORKER_WEBUI_PORT=8081
    export JAVA_HOME=/root/jdk1.8.0_241
    export HADOOP_HOME=/root/hadoop-2.7.1
    export HADOOP_CONF_DIR=/root/hadoop-2.7.1/etc/hadoop
    
    

    SPARK_MASTER_IP specifies the IP address of the Spark Master node (per the note at the top, use the internal IP here); SPARK_DIST_CLASSPATH lets Spark pick up the Hadoop jars, and HADOOP_CONF_DIR points Spark at the Hadoop configuration directory.
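Because the slaves file refers to Workers by hostname and spark-env.sh pins the Master IP and ports, it is worth checking name resolution and reachability from every node before distributing the files. A minimal sketch (nc may need to be installed separately):

# On the Master: every hostname listed in slaves must resolve, e.g. via /etc/hosts
getent hosts hadoop-02 hadoop-03

# On each Worker: the Master address and port from spark-env.sh must be reachable
nc -zv 192.168.0.4 7077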

Once everything is configured, copy the Spark conf directory on the Master host to every other node. On the Master host, run the following commands to push the configured files to the other nodes:

scp -r /root/spark-2.4.5-bin-hadoop2.7/conf root@hadoop-02:/root/spark-2.4.5-bin-hadoop2.7/

scp -r /root/spark-2.4.5-bin-hadoop2.7/conf root@hadoop-03:/root/spark-2.4.5-bin-hadoop2.7/
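With the files distributed, the cluster can be started from the Master and checked with jps; a minimal sketch using the standard scripts under $SPARK_HOME/sbin:

# On the Master: start the Master plus every Worker listed in conf/slaves
$SPARK_HOME/sbin/start-all.sh

# On each node: jps should now show a Master or Worker process
jps

The Master web UI should then be reachable on the port configured above (8080).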

Troubleshooting:

If a Worker reports the following error, add SPARK_LOCAL_IP=<external IP> to its spark-env.sh (a concrete sketch follows the stack trace):

20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7078. Attempting port 7079.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7079. Attempting port 7080.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7080. Attempting port 7081.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7081. Attempting port 7082.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7082. Attempting port 7083.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7083. Attempting port 7084.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7084. Attempting port 7085.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7085. Attempting port 7086.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7086. Attempting port 7087.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7087. Attempting port 7088.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7088. Attempting port 7089.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7089. Attempting port 7090.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7090. Attempting port 7091.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7091. Attempting port 7092.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7092. Attempting port 7093.
20/08/07 00:27:41 WARN util.Utils: Service 'sparkWorker' could not bind on port 7093. Attempting port 7094.
20/08/07 00:27:41 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.net.BindException: Cannot assign requested address: Service 'sparkWorker' failed after 16 retries (starting from 7078)! Consider explicitly setting the appropriate port for the service 'sparkWorker' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:985)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:344)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
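The fix described above amounts to one extra line in the affected Worker's spark-env.sh (the address is a placeholder to be replaced as noted above), followed by a restart of that Worker:

# Bind address for this Worker's services; replace the placeholder with the IP noted above
export SPARK_LOCAL_IP=<external-ip>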


