Integrating PySpark with Jupyter Notebook

Only two environment variables are needed:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
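
If the notebook runs on a remote server, you can pass extra options through the same variable. A minimal sketch, assuming the standard Jupyter Notebook flags --no-browser, --ip, and --port:

export PYSPARK_DRIVER_PYTHON=jupyter
# listen on all interfaces so a browser on another machine can connect
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --ip=0.0.0.0 --port=8888'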

Then launch pyspark directly:

$SPARK_HOME/bin/pyspark

The console output will show the URL and port number:

[I 14:59:08.242 NotebookApp] 0 active kernels 
[I 14:59:08.242 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 14:59:08.243 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 15:01:35.974 NotebookApp] Saving file at ...

Then just open that URL in a browser on your own machine.
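
Because the notebook was launched through pyspark, the driver has already created the entry points for you: a SparkContext (sc) and, on Spark 2.0+, a SparkSession (spark). A quick sanity check for the first cell:

# sc and spark are injected by the pyspark launcher; no manual setup needed
print(sc.version)      # version of the pre-created SparkContext
print(spark.version)   # SparkSession is available on Spark 2.0 and later
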
If you need an external jar, you can set an environment variable; for example, here is an Oracle JDBC driver:

export SPARK_CLASSPATH=$ORACLE_HOME/ojdbc8.jar
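
With the driver on the classpath, you can read from Oracle inside the notebook through Spark's generic JDBC data source. A minimal sketch, where the connection URL, table name, and credentials are all hypothetical placeholders to replace with your own:

# read an Oracle table via the JDBC data source
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # hypothetical host/service
      .option("dbtable", "SCOTT.EMP")                         # hypothetical table
      .option("user", "scott")                                # hypothetical credentials
      .option("password", "tiger")
      .option("driver", "oracle.jdbc.OracleDriver")
      .load())
df.show(5)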

If you see the following warning, switch to spark.executor.extraClassPath or spark.driver.extraClassPath instead:

WARN SparkConf: SPARK_CLASSPATH was detected (set to '/home/ojdbc8.jar'). This is deprecated in Spark 1.0+.
Please instead use:
./spark-submit with --driver-class-path to augment the driver classpath
spark.executor.extraClassPath to augment the executor classpath
WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/home/ojdbc8.jar' as a work-around.
WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/home/ojdbc8.jar' as a work-around.
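
The warning is harmless, since Spark applies the work-around itself, but the forward-compatible fix is to set the two properties directly, e.g. in $SPARK_HOME/conf/spark-defaults.conf. A sketch assuming the jar sits at /home/ojdbc8.jar as in the warning above:

# spark-defaults.conf: put the JDBC driver on both driver and executor classpaths
spark.driver.extraClassPath   /home/ojdbc8.jar
spark.executor.extraClassPath /home/ojdbc8.jar

Alternatively, pass it per launch with --driver-class-path, as the warning suggests.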
