http://blog.csdn.net/wzy0623/article/details/50681554
For Hive 2.0.0 installation, see
http://blog.csdn.net/wzy0623/article/details/50685966
Note: Hive 2.0.0 requires the metastore schema to be initialized with the following command:
$HIVE_HOME/bin/schematool -initSchema -dbType mysql -userName root -passWord new_password
Otherwise, running hive fails with:
Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)
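As the error message suggests, the metastore database can also be auto-created by adding ?createDatabaseIfNotExist=true to the JDBC URL in hive-site.xml. A minimal sketch (host and database name here are examples, not from the original post):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>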
Install Spark
1. Download the Spark binary package from http://spark.apache.org/downloads.html
(Figure 1)
Note: if you want to query Hive data with Spark SQL, pay close attention to version compatibility between Spark and Hive; the matching Spark version can be found in the pom.xml file of the Hive source package.
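For example, the Spark version Hive 2.0.0 was built against can be read straight from the Hive source tree (the path below is only an example location for the unpacked source):
grep -m1 '<spark.version>' /home/grid/apache-hive-2.0.0-src/pom.xml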
2. Extract the package.
8. Configure YARN:
vi /home/grid/hadoop-2.7.2/etc/hadoop/yarn-site.xml
# modify the following property
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
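For the new memory limit to take effect, the updated yarn-site.xml also has to be copied to the NodeManager hosts and YARN restarted. A sketch, assuming the slave hostnames are slave1 and slave2:
scp /home/grid/hadoop-2.7.2/etc/hadoop/yarn-site.xml slave1:/home/grid/hadoop-2.7.2/etc/hadoop/
scp /home/grid/hadoop-2.7.2/etc/hadoop/yarn-site.xml slave2:/home/grid/hadoop-2.7.2/etc/hadoop/
/home/grid/hadoop-2.7.2/sbin/stop-yarn.sh
/home/grid/hadoop-2.7.2/sbin/start-yarn.sh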
(Figure 2)
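Before the checks below, the standalone master and workers need to be running. A typical way to start them, assuming Spark is unpacked at $SPARK_HOME and conf/slaves lists the worker hosts:
$SPARK_HOME/sbin/start-all.sh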
Run jps to check the processes on the master and slave nodes (Figure 3).
Open http://192.168.17.210:8080/ in a browser to view the Spark master web UI (Figure 4).
11. Test
(Figures 5, 6, 7)
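As a quick smoke test of the standalone cluster (one possible check, assuming the master runs at spark://master:7077; the original figures may show a different test):
$SPARK_HOME/bin/spark-shell --master spark://master:7077
scala> sc.parallelize(1 to 1000).count()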
Test Spark SQL:
Create a hive-site.xml file in the $SPARK_HOME/conf directory and add the hive.metastore.uris property to it, as follows:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
# start the Hive metastore service
hive --service metastore > /tmp/grid/hive_metastore.log 2>&1 &
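# optional: confirm the metastore is listening on port 9083 (the port configured above) before starting spark-sql
netstat -nlp 2>/dev/null | grep 9083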
# start the Spark SQL CLI
spark-sql --master spark://master:7077 --executor-memory 1g
# now HQL statements can be used to query the Hive data
show databases;
create database test;
use test;
create table t1 (name string);
load data local inpath '/home/grid/a.txt' into table t1;
select * from t1;
select count(*) from t1;
drop table t1;
The SQL execution is shown in Figure 8.
(Figure 8)
In a simple comparison test, Spark SQL was nearly 3x faster than Hive on 300 GB of data, and 7.5x faster on 3 TB of data.
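One simple way to run such a comparison (not necessarily how the numbers above were produced; the query and table are placeholders) is to time the same statement in both CLIs:
time hive -e "select count(*) from test.t1;"
time spark-sql --master spark://master:7077 -e "select count(*) from test.t1;"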
References:
http://spark.apache.org/docs/latest/running-on-yarn.html
http://blog.csdn.net/u014039577/article/details/50829910
http://www.cnblogs.com/shishanyuan/p/4723604.html
http://www.cnblogs.com/shishanyuan/p/4723713.html