Operating Spark SQL from the spark2.x shell client

1. Start the shell client

Go to the Spark installation directory and launch the shell:

bin/spark-shell --master spark://IP:7077 --executor-memory 1g
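If no standalone cluster is available, the shell can also be started in local mode for testing (an alternative invocation; adjust memory to your machine):

bin/spark-shell --master local[*] --driver-memory 1g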

2. Scala operations

(1) Map a file on HDFS to a table

Create the SparkSession object:

val spark = org.apache.spark.sql.SparkSession.builder().appName("SparkSessionZipsExample").config("spark.sql.crossJoin.enabled", "true").getOrCreate()

Define a case class for the table mapping:

case class User(id: String, name: String, age: String)

Map the file to a table, filtering out rows with too few fields (fields are separated by "\t"; the resulting view is named user). Note that the Dataset transformations below require the implicit encoders from spark.implicits:

import spark.implicits._
spark.read.textFile("hdfs://IP:9000/xxxx/user").map(_.split("\t", -1)).filter(_.length >= 3).map(r => User(r(0).trim, r(1).trim, r(2).trim)).createOrReplaceTempView("user")
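For reference, each line of the input file is expected to carry three tab-separated fields; a hypothetical sample:

1	xm	23
2	xh	25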

Query the file as if it were a database table:

spark.sql("select * from user").show()

(2) Map a database table into the SparkSession

Create the SparkSession object:

val spark = org.apache.spark.sql.SparkSession.builder().appName("SparkSessionZipsExample").config("spark.sql.crossJoin.enabled", "true").getOrCreate()

Configure the database connection:

val prop = new java.util.Properties()
prop.put("user", "postgres")
prop.put("password", "123456")

Load the database table:

spark.read.jdbc("jdbc:postgresql://IP:5432/risk", "user", prop).createOrReplaceTempView("user")
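To avoid pulling the whole table, a subquery can be passed in place of the table name so that filtering happens inside the database (a sketch; user is a reserved word in PostgreSQL, hence the quoting, and the view name here is arbitrary):

spark.read.jdbc("jdbc:postgresql://IP:5432/risk", "(select id, name from \"user\") as u", prop).createOrReplaceTempView("user_names")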

Query the mapped table with SQL as before:

spark.sql("select * from user").show()


(3) Map a List data set to a table


Create the SparkSession object:

val spark = org.apache.spark.sql.SparkSession.builder().appName("SparkSessionZipsExample").config("spark.sql.crossJoin.enabled", "true").getOrCreate()


Define a case class for the table mapping:

case class User(id: String, name: String, age: String)

Assemble the data set:

import scala.collection.mutable.ListBuffer

val userList: ListBuffer[User] = ListBuffer()
for (i <- 0 until 100) {
  userList += User(i.toString, "xm", "23")
}
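The same list can be built without a mutable buffer, which is more idiomatic Scala (an equivalent sketch):

val userList = (0 until 100).map(i => User(i.toString, "xm", "23")).toList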

Load the data set into Spark:

spark.createDataFrame(userList).createOrReplaceTempView("user")
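Equivalently, with the implicit encoders imported, toDF can be used (a sketch):

import spark.implicits._
userList.toDF().createOrReplaceTempView("user")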

Query the data set like a table:

spark.sql("select * from user").show()



