Spark fails to start: slave nodes cannot connect to the master

1. Problem Description

Starting Spark fails with the same error whether the cluster is launched with start-all.sh or a worker is started directly on a slave node with start-slave.sh.

  1. The error message is as follows:
19/04/20 04:35:49 INFO Utils: Successfully started service 'sparkWorker' on port 45265.
19/04/20 04:35:49 INFO Worker: Starting Spark worker 192.168.182.130:45265 with 1 cores, 1024.0 MB RAM
19/04/20 04:35:49 INFO Worker: Running Spark version 2.4.1
19/04/20 04:35:49 INFO Worker: Spark home: /opt/spark
19/04/20 04:35:50 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
19/04/20 04:35:50 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://slave1:8081
19/04/20 04:35:50 INFO Worker: Connecting to master master:7077...
19/04/20 04:35:50 WARN Worker: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
	at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:253)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to master/192.168.182.129:7077
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
	at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
	... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: master/192.168.182.129:7077
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	... 1 more
Caused by: java.net.ConnectException: Connection refused
	... 11 more
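
Before changing any configuration, it helps to pin down what "Connection refused" actually means here: nothing is accepting connections at the address and port the worker dials. A minimal check from a slave node (a sketch; nc is assumed to be installed, telnet works just as well):

# From the slave: probe the master's RPC port
nc -vz master 7077
# "Connection refused" => no listener at master:7077 as this slave resolves it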

2. Solution

  1. None of the fixes suggested online worked, but one line in the master's startup log stood out:
19/04/20 04:48:55 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.182.129 instead (on interface ens33)
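
The warning says the hostname master resolves to a loopback address, which suggests the master's RPC listener may be bound where other machines cannot reach it. A quick way to check on the master node (a sketch; ss ships with iproute2, netstat -tlnp works too):

# On the master: which local address is port 7077 bound to?
ss -tlnp | grep 7077
# 127.0.1.1:7077 here (instead of 192.168.182.129:7077) would explain why
# every remote worker gets "Connection refused"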
  2. The suspected cause: when the master starts, its address is effectively resolved to 127.0.1.1. Specifically, spark-env.sh sets SPARK_MASTER_HOST=master, and my /etc/hosts is shown below. I did map master to the machine's real IP in /etc/hosts, but the machine's hostname also happens to be master, and that name is matched first by the 127.0.1.1 entry, producing the failure above (see the check after the listing):
Node@master /o/spark-2.4.1-bin-hadoop2.6> cat /etc/hosts
127.0.0.1	localhost
127.0.1.1	master

#add ip-name
192.168.182.129 master
192.168.182.130 slave1
192.168.182.131 slave2
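
Since /etc/hosts is searched top to bottom and the first matching line wins, the 127.0.1.1 entry shadows the 192.168.182.129 one. A quick confirmation (a sketch, assuming a glibc system where getent consults /etc/hosts):

# which entry actually answers for the name "master"? (first match wins)
getent hosts master
# -> 127.0.1.1       master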
  3. Fix 1: set SPARK_MASTER_HOST in spark-env.sh directly to the IP address; after a restart, the cluster comes up normally (a sketch follows).
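
A sketch of Fix 1 (paths follow the Spark home /opt/spark from the log above; the IP is the master's real interface address):

# /opt/spark/conf/spark-env.sh
SPARK_MASTER_HOST=192.168.182.129   # pin the master to the real IP, not the hostname

# then restart the cluster from the master node
/opt/spark/sbin/stop-all.sh
/opt/spark/sbin/start-all.sh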

  4. Fix 2: comment out the 127.0.1.1 line in /etc/hosts; the cluster also starts normally (example below).
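
A sketch of Fix 2: with the conflicting loopback entry commented out, the first match for master is now the real IP:

127.0.0.1	localhost
#127.0.1.1	master

#add ip-name
192.168.182.129 master
192.168.182.130 slave1
192.168.182.131 slave2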
