Fixing the pyspark MySQL write error "An error occurred while calling o45.jdbc.: scala.MatchError: null"

When I tried to connect to MySQL from pySpark and write a simple Spark DataFrame into a MySQL table, the write failed with:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.: scala.MatchError: null

Below are the error, the failing code, and the fix.

(1) Error message:

Fri Jul 13 16:22:56 CST 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Traceback (most recent call last):
  File "/Users/a6/Downloads/speiyou_di/hive/log_task/111.py", line 47, in 
    df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})
  File "/Library/Python/2.7/site-packages/pyspark/sql/readwriter.py", line 765, in jdbc
    self._jwrite.mode(mode).jdbc(url, table, jprop)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Library/Python/2.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Library/Python/2.7/site-packages/py4j-0.10.6-py2.7.egg/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o45.jdbc.
: scala.MatchError: null
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:63)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
	at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
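
Incidentally, the SSL warning at the top of the log is unrelated to the MatchError. As the warning text itself suggests, it can be silenced for a local, non-TLS MySQL instance by appending useSSL=false to the JDBC URL, e.g.:

# same connection URL as used below, with SSL explicitly disabled to
# suppress the identity-verification warning
url = "jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456&useSSL=false"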

(2) Failing code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
# point pyspark at the local Spark installation
import os
os.environ["SPARK_HOME"] = "/Users/a6/Applications/spark-2.1.0-bin-hadoop2.6"

from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext(appName="pyspark mysql demo")
sqlContext = SQLContext(sc)

# create the connection and read the existing data

# local test
dataframe_mysql=sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# print the data
print "\nstep 1: dataframe_mysql.collect()\n", dataframe_mysql.collect()
dataframe_mysql.registerTempTable("temp_table")
dataframe_mysql.show()
print dataframe_mysql.count()

print "step 2、 准备待写入的数据"

from pyspark.sql.types import *

# user-defined schema for the DataFrame
schema = StructType([StructField("name", StringType()), StructField("age", IntegerType())])

# build the DataFrame from a list of dicts using the schema above
d = [{'name': 'Alice1', 'age': 1}, {'name': 'tome1', 'age': 20}]
df1 = sqlContext.createDataFrame(d, schema)

# display the contents of the dataframe.
df1.show()

# display the schema of the dataframe.
df1.printSchema()

print "step3、写入数据"

# local test
# failing code A: the .mode("append") here gets discarded (see section (3))
df1.write.mode("append").jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

# working code B: pass the mode directly to jdbc()
#df1.write.jdbc(mode="overwrite", url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456", table="test_person", properties={"driver": 'com.mysql.jdbc.Driver'})

print "step4、写入成功,读取验证数据"
df1.show()

# local test
dataframe_mysql=sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark_db").option("dbtable", "test_person").option("user", "root").option("password", "yyz!123456").load()

# print the data
print "dataframe_mysql.collect()\n",dataframe_mysql.collect()

print "step 5、 所有执行成功"

(3) Solution

Replace [failing code A] with [working code B] and the script runs successfully. The two calls look almost identical, but the difference is decisive: PySpark's DataFrameWriter.jdbc() takes its own mode parameter, which defaults to None, and as the traceback shows (readwriter.py line 765: self._jwrite.mode(mode).jdbc(url, table, jprop)) it re-applies that parameter unconditionally. So the .mode("append") set earlier in code A is clobbered by a null SaveMode, and JdbcRelationProvider.createRelation throws scala.MatchError: null when it pattern-matches on that mode. Code B sidesteps the problem by passing the mode directly to jdbc().
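
Code B uses overwrite mode; if you want the append semantics that code A intended, the same fix applies. A minimal sketch against the same local table:

# pass mode as an argument to jdbc() instead of via .mode(), so that
# jdbc()'s default mode=None cannot clobber it
df1.write.jdbc(url="jdbc:mysql://localhost:3306/spark_db?user=root&password=yyz!123456",
               table="test_person", mode="append",
               properties={"driver": 'com.mysql.jdbc.Driver'})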

(4) Reproducing the error

First, create the spark_db database locally, together with the test_person table and a seed row:

create database spark_db;

CREATE TABLE `test_person` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `age` int(3) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

insert into test_person(name,age) values('yyz',18);
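
Note also that Spark can only load com.mysql.jdbc.Driver if the MySQL JDBC driver (mysql-connector-java) is on its classpath. One way to supply it from a plain Python script is to set PYSPARK_SUBMIT_ARGS before creating the SparkContext; this is a sketch, and the jar path is hypothetical, so adjust it to your install:

import os
# hypothetical location of the MySQL connector jar -- adjust for your machine
jar = "/path/to/mysql-connector-java-5.1.46.jar"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--driver-class-path %s --jars %s pyspark-shell" % (jar, jar)
# ...then create the SparkContext as in the script above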

Reference: https://stackoverflow.com/questions/49391933/pyspark-jdbc-write-error-an-error-occurred-while-calling-o43-jdbc-scala-matc
