Official Spark build guide:
http://spark.apache.org/docs/latest/building-spark.html
1) Building with the shell script
The script make-distribution.sh is in the root of the Spark source tree. Usage:
./make-distribution.sh [--name] [--tgz] [--with-tachyon] <maven build options>
Build a deployment package with support for Hadoop 2.2.0, YARN, and Hive:
sudo ./make-distribution.sh --tgz --skip-java-test -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-thriftserver
Use Hive 0.13.1 or 0.13.0. Copy hive-site.xml from Hive's conf directory into the conf directory of Spark 1.2.0;
copy mysql-connector-java-5.1.30-bin.jar from Hive's lib directory into the lib directory of Spark 1.2.0.
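The two copy steps above can be scripted. What follows is a minimal sketch, assuming the Hive and Spark 1.2.0 installation directories are passed in as arguments; the helper name install_hive_deps is hypothetical and not part of Spark or Hive.

```shell
#!/bin/sh
# install_hive_deps is a hypothetical helper, not a Spark or Hive tool.
# It copies hive-site.xml and the MySQL JDBC driver from a Hive install
# into the conf and lib directories of a Spark install.
install_hive_deps() {
    hive_home=$1
    spark_home=$2
    jar=mysql-connector-java-5.1.30-bin.jar

    # Fail early if either source file is missing.
    [ -f "$hive_home/conf/hive-site.xml" ] || { echo "missing hive-site.xml" >&2; return 1; }
    [ -f "$hive_home/lib/$jar" ] || { echo "missing $jar" >&2; return 1; }

    # Create the target directories if needed, then copy both files.
    mkdir -p "$spark_home/conf" "$spark_home/lib" &&
    cp "$hive_home/conf/hive-site.xml" "$spark_home/conf/" &&
    cp "$hive_home/lib/$jar" "$spark_home/lib/"
}
```

A typical call would be install_hive_deps "$HIVE_HOME" "$SPARK_HOME", assuming both variables are set in your environment.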
Official Spark SQL guide:
http://spark.apache.org/docs/latest/sql-programming-guide.html
When running Spark SQL, the mysql-connector-java-5.1.30-bin.jar driver must be specified on the driver class path.
1) spark-sql
sudo spark-sql --executor-memory 4g --driver-memory 1g --total-executor-cores 2 --master spark://192.168.1.100:7077 --driver-class-path $SPARK_HOME/lib/mysql-connector-java-5.1.30-bin.jar
Once it has started, run Hive queries directly at the spark-sql prompt:
select * from tmp_test where starttime='2014-12-24' LIMIT 20;
desc tmp_test;
2) spark-shell
sudo spark-shell --executor-memory 3g --driver-memory 1g --total-executor-cores 2 --master spark://192.168.1.100:7077 --driver-class-path $SPARK_HOME/lib/mysql-connector-java-5.1.30-bin.jar
Once it has started, run Hive queries through a HiveContext:
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select * from tmp_test where starttime='2014-12-24' LIMIT 20").collect().foreach(println)
sqlContext.sql("desc tmp_test").collect().foreach(println)
3) spark-sql: submitting SQL from a script
Run spark-sql --help in a terminal to see the help text.
spark-sql accepts a -e option that takes a SQL statement directly. Append -e "SQL statement" to the spark-sql launch command, for example:
sudo spark-sql --executor-memory 3g --driver-memory 1g --total-executor-cores 2 --master spark://192.168.1.100:7077 --driver-class-path $SPARK_HOME/lib/mysql-connector-java-5.1.30-bin.jar -e "select * from tmp_test where starttime='2014-12-24' LIMIT 20;"
With -e, the query result can also be redirected to a file, for example:
sudo spark-sql --executor-memory 3g --driver-memory 1g --total-executor-cores 2 --master spark://192.168.1.100:7077 --driver-class-path $SPARK_HOME/lib/mysql-connector-java-5.1.30-bin.jar -e "select * from tmp_test where starttime='2014-12-24' LIMIT 20;" > "/data/test.txt"
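For repeated use, the spark-sql -e invocation with output redirection can be wrapped in a small function. This is only a sketch under a few assumptions: run_query_to_file, SPARK_SQL, and MASTER_URL are hypothetical names introduced here, SPARK_HOME is expected to be set, sudo is left out, and the memory and core settings are simply the ones used in this article.

```shell
#!/bin/sh
# run_query_to_file is a hypothetical wrapper around "spark-sql -e".
# Arguments: $1 = SQL statement, $2 = output file.
# SPARK_SQL and MASTER_URL are optional overrides (assumed names);
# the defaults are the command and master URL used in this article.
run_query_to_file() {
    query=$1
    out=$2
    "${SPARK_SQL:-spark-sql}" \
        --executor-memory 3g --driver-memory 1g --total-executor-cores 2 \
        --master "${MASTER_URL:-spark://192.168.1.100:7077}" \
        --driver-class-path "$SPARK_HOME/lib/mysql-connector-java-5.1.30-bin.jar" \
        -e "$query" > "$out"
}
```

For example: run_query_to_file "select * from tmp_test where starttime='2014-12-24' LIMIT 20;" /data/test.txt. Passing the statement as a single quoted argument keeps the shell from splitting it or interpreting the * in the query.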
Please credit the source when reposting:
http://blog.csdn.net/sunbow0/article/details/42487761