pyspark读取Mysql数据


pyspark读取Mysql数据:

样例code 1:

from pyspark.sqlimportSQLContext

sqlContext = SQLContext(sc)
dataframe_mysql = sqlContext.read.format("jdbc").options(url="jdbc:mysql://127.0.0.1:3306/spark_db", driver="com.mysql.jdbc.Driver", dbtable="spark_table", user="root", password="root").load()
dataframe_mysql.show()


样例code 2:

from pyspark import SparkContext,SQLContext
from pyspark.sql import SQLContext

sc = SparkContext("spark://train01:7077","LDASample")  
sqlContext=SQLContext(sc)
jdbcDf=sqlContext.read.format("jdbc").options(url="jdbc:mysql://10.10.10.10:3306/adl",driver="com.mysql.jdbc.Driver",dbtable="(SELECT code,title,description FROM project) tmp",user="mouren",password="mouren").load()
print(jdbcDf.select('description').show(2))


前提:配置文件/etc/spark/conf/spark-env.sh

+export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar

这样的配置有时报错:

WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to ':/opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar' as a work-around.

解决方案:

去掉上面的配置,编辑spark-defaults.conf

+spark.executor.extraClassPath /opt/mysql-connector-java/mysql-connector-java-5.1.40-bin.jar

你可能感兴趣的:(hadoop/hive)