PySpark writing to MySQL: how to load the JDBC connector

from pyspark.sql import SparkSession

# Create a Spark session with the MySQL JDBC connector JAR on the classpath
spark = SparkSession.builder \
    .appName('stack_overflow') \
    .config('spark.jars', '/path/to/mysql/jdbc/connector') \
    .getOrCreate()
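
Alternatively, the connector JAR can be supplied at submit time instead of in code (the path and script name below are placeholders):

spark-submit --jars /path/to/mysql/jdbc/connector your_script.py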

# Create a sample DataFrame to write
df = spark.createDataFrame([
    (1, 'Hello'),
    (2, 'World!')
], ['Index', 'Value'])

df.write.jdbc('jdbc:mysql://localhost:3306/my_db?useSSL=false', 'my_table',
              mode='append',
              properties={'user': 'db_user', 'password': 'db_pass'})
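
The same write can also be expressed through the generic format('jdbc') API, with the connection settings passed as options: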

df.write.format('jdbc').options(
    url='jdbc:mysql://localhost/db4recommandation',
    driver='com.mysql.jdbc.Driver',
    dbtable='user_activity',
    user='123',
    password='456',
    useSSL=False
).mode('append').save()

Two commands (Spark Streaming reading from Flume in push mode):

spark-submit --jars spark-streaming-flume-assembly_2.11-2.4.4.jar test_push.py
./bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties.example_push --name a1 -Dflume.root.logger=INFO,console
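
For reference, a minimal sketch of what test_push.py might contain (hypothetical; the actual script is not shown in these notes). It assumes the Flume agent's avro sink pushes events to localhost:9999, which has to match flume-conf.properties.example_push:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName='test_push')
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Host and port are assumptions; they must match the Flume avro sink's target
stream = FlumeUtils.createStream(ssc, 'localhost', 9999)
stream.map(lambda event: event[1]).pprint()  # each event is (headers, body)

ssc.start()
ssc.awaitTermination()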

Exploding column elements into rows:

df:

+-----+------+
|first|second|
+-----+------+
|d,e,f| D,E,F|
+-----+------+

from pyspark.sql import functions as F

df.alias('L') \
  .select('L.*', F.posexplode(F.split('first', ',')).alias('p1', 'v1')) \
  .alias('R') \
  .select('R.*', F.posexplode(F.split('second', ',')).alias('p2', 'v2')) \
  .show()
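
This yields the 3 x 3 cross product of the two exploded lists (9 rows here). If only position-aligned pairs are wanted (d with D, e with E, f with F), append .where('p1 = p2') before the .show().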

A few small tips:

https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns
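
The core technique from that answer, roughly sketched against the df above (the target column names f0..f2 are made up here): split once, then pull the pieces out by index with getItem:

from pyspark.sql import functions as F

split_col = F.split(df['first'], ',')
df2 = df.withColumn('f0', split_col.getItem(0)) \
        .withColumn('f1', split_col.getItem(1)) \
        .withColumn('f2', split_col.getItem(2))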

(Sometimes withColumn alone doesn't work, e.g. when you need another DataFrame inside a UDF; see https://stackoverflow.com/questions/50123238/pyspark-use-dataframe-inside-udf)
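
The usual workaround from that question, sketched under assumptions (lookup_df and its key/value column names are hypothetical, and the lookup table must be small enough to collect on the driver): turn the second DataFrame into a plain dict, broadcast it, and read the broadcast inside the UDF:

from pyspark.sql import functions as F

# Hypothetical small lookup DataFrame with columns 'key' and 'value'
lookup = {row['key']: row['value'] for row in lookup_df.collect()}
b_lookup = spark.sparkContext.broadcast(lookup)

@F.udf('string')
def map_value(k):
    # A DataFrame cannot be referenced inside a UDF; a broadcast dict can
    return b_lookup.value.get(k)

df = df.withColumn('mapped', map_value('key'))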

Reading:


df = spark.read.format('jdbc').options(
    url='jdbc:mysql://127.0.0.1',
    dbtable='dbname.tablename',
    user='root',
    password='123456'
).load()
df.show()
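
For larger tables, the same read can be parallelized with Spark's standard JDBC partitioning options (the partition column and its bounds below are hypothetical and must refer to a real numeric column):

df = spark.read.format('jdbc').options(
    url='jdbc:mysql://127.0.0.1',
    dbtable='dbname.tablename',
    user='root',
    password='123456',
    partitionColumn='id',  # hypothetical numeric column to split on
    lowerBound='1',        # rough min/max of that column
    upperBound='1000000',
    numPartitions='8'      # number of parallel JDBC reads
).load()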
