Some problems running pyspark on Hadoop YARN

hduser@master:~$ pyspark --master local[4]
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/16 09:13:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
SparkSession available as 'spark'.
>>> sc.master
u'local[4]'
>>> textFile=sc.textFile("file:/usr/local/spark/README.md")
>>> textFile.count()
103                                                                             
>>> textFile =sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
>>> textFile.count()
1594
>>> exit()
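
For reference, the same sanity check can be packaged as a standalone script and submitted with spark-submit. This is only a minimal sketch; the file paths are simply reused from the session above.

# check_textfile.py -- minimal sketch, paths reused from the session above
from pyspark import SparkConf, SparkContext

if __name__ == "__main__":
    # The master is supplied on the spark-submit command line, not hard-coded here
    conf = SparkConf().setAppName("textfile-count")
    sc = SparkContext(conf=conf)

    # Local file (only meaningful while the executors run on this machine)
    print(sc.textFile("file:/usr/local/spark/README.md").count())

    # The same LICENSE.txt on HDFS as in the session above
    print(sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt").count())

    sc.stop()

It can then be run with, for example: spark-submit --master local[4] check_textfile.py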
hduser@master:~$ HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
18/08/16 09:16:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/16 09:16:10 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.12 (default, Dec  4 2017 14:50:18)
SparkSession available as 'spark'.
>>> 18/08/16 09:16:46 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
18/08/16 09:16:47 ERROR TransportClient: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/08/16 09:16:47 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/08/16 09:16:47 ERROR Utils: Uncaught exception in thread Yarn application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:567)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:95)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:155)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:508)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1755)
	at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1931)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1360)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1930)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:112)
Caused by: java.io.IOException: Failed to send RPC 7397841312553810412 to /192.168.0.102:39808: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

>>> sc.master
u'yarn'
>>> textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/spark/python/pyspark/context.py", line 546, in textFile
    minPartitions = minPartitions or min(self.defaultParallelism, 2)
  File "/usr/local/spark/python/pyspark/context.py", line 400, in defaultParallelism
    return self._jsc.sc().defaultParallelism()
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o30.defaultParallelism.
: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)

The currently active SparkContext was created at:

(No active SparkContext.)
         
	at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:99)
	at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:2332)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

>>> textFile.count()
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'textFile' is not defined
>>> 

Running these pyspark queries locally works fine, but running pyspark on Hadoop YARN produces the errors above. If any expert sees this, please point me in the right direction. Many thanks!
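
One way to dig further (which I have not fully explored) is to pull the ApplicationMaster container logs after the failure, since "exited with state FINISHED" right after startup suggests the AM itself died on the YARN side. The application ID below is a placeholder; the real one appears in the ResourceManager web UI or in the pyspark startup output.

yarn logs -applicationId application_XXXXXXXXXXXXX_NNNN   # placeholder ID, copy the real one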

My yarn-site.xml settings are attached below:

<?xml version="1.0"?>
<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8050</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

</configuration>
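
One guess (unverified) is that YARN is killing the ApplicationMaster container for exceeding its memory limits, which would match the application exiting with state FINISHED almost immediately. The virtual-memory check is already disabled above; a commonly suggested companion setting is to relax the physical-memory check as well, e.g.:

    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>

YARN would need restarting for this to take effect, and the NodeManager logs should confirm whether memory checks were actually the cause here.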

Reposted from: https://www.cnblogs.com/genghenggao/p/9485421.html
