First, we need to set up the runtime environment. All of the software used in this article runs on top of Java, so install the JDK on your machine and configure the corresponding environment variables (see the reference link). (The author has uploaded the relevant software to a netdisk: download link, extraction code: 8cmm.)
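As a quick sanity check (assuming a JDK 1.8 install with JAVA_HOME already set as described above), the following can be run in a cmd window:

java -version
echo %JAVA_HOME%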
Notes:
Download the Spark package (the netdisk version is spark-2.4.3-bin-hadoop2.7) and extract it to your disk. Then create an environment variable named SPARK_HOME whose value is the extraction directory, and finally add %SPARK_HOME%\bin to PATH.
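For example, assuming Spark was extracted to D:\spark-2.4.3-bin-hadoop2.7 (a hypothetical path; adjust to your own), the variable can be set persistently from cmd, while the %SPARK_HOME%\bin entry is most safely added to PATH through the system environment-variable dialog:

setx SPARK_HOME "D:\spark-2.4.3-bin-hadoop2.7"

A new cmd window has to be opened before setx changes take effect.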
Test: open a cmd window and run the spark-shell command.
Download the Hadoop package (the netdisk version is hadoop-2.8.4.tar) and extract it to your disk. Then download winutils-master.zip and extract it as well.
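Hadoop on Windows additionally needs the winutils binaries and a HADOOP_HOME variable. A rough sketch, assuming Hadoop was extracted to D:\hadoop-2.8.4 and that the extracted winutils-master contains a bin folder for a matching Hadoop 2.8.x release (both paths here are illustrative):

setx HADOOP_HOME "D:\hadoop-2.8.4"
copy winutils-master\hadoop-2.8.3\bin\* D:\hadoop-2.8.4\bin

Also add %HADOOP_HOME%\bin (and %HADOOP_HOME%\sbin) to PATH, in the same way as for Spark.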
Next, edit the following configuration files under %HADOOP_HOME%\etc\hadoop:

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

mapred-site.xml (copy from mapred-site.xml.template if it does not exist):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/xxx(your install path)/hadoop-2.8.4/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/xxx(your install path)/hadoop-2.8.4/data/datanode</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

hadoop-env.cmd (point JAVA_HOME at your actual JDK directory):

@rem set JAVA_HOME=%JAVA_HOME%
set JAVA_HOME=xxx(your JDK install path)\jdk1.8.0_111
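Before moving on to Hive, HDFS needs to be formatted once and the daemons started, since the Hive steps below create directories in HDFS. A minimal sketch, run from a cmd window (assuming %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin are on PATH):

hdfs namenode -format
start-dfs.cmd
start-yarn.cmd

After this, running jps should list NameNode, DataNode, ResourceManager and NodeManager.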
Download the Hive package (the netdisk version is apache-hive-2.1.1-bin.tar) and extract it to your disk. Configure an environment variable named HIVE_HOME whose value is the extraction directory, and finally add %HIVE_HOME%\bin to PATH.
Because Hive needs a database to back its metastore, MySQL has to be installed beforehand (the netdisk version is mysql-5.6.36-winx64); for the installation steps, see the blog post "MySQL-5.6.13免安装版配置方法".
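Since Hive reaches MySQL over JDBC (the com.mysql.jdbc.Driver class configured below), the MySQL Connector/J jar also has to be placed in Hive's lib directory; the jar version in this example is illustrative:

copy mysql-connector-java-5.1.40-bin.jar %HIVE_HOME%\lib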
Then edit hive-site.xml under %HIVE_HOME%\conf (it can be created by copying hive-default.xml.template) and set the following properties:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>xxx(your Hive install path)/scratch_dir</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>xxx(your Hive install path)/resources_dir/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>xxx(your Hive install path)/querylog_dir</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>xxx(your Hive install path)/operation_dir</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
In addition, the directories configured for hive.metastore.warehouse.dir and hive.user.install.directory have to be created in HDFS.
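Assuming the default values /user/hive/warehouse and /user are kept for those two properties (hypothetical here; use whatever your hive-site.xml actually contains) and that HDFS is running, the directories can be created with:

hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -mkdir -p /user
hadoop fs -chmod -R 777 /user/hive/warehouse

The permissive chmod is only meant for a local test setup.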
Hive's logging is configured through hive-log4j2.properties under %HIVE_HOME%\conf (created by copying hive-log4j2.properties.template); the configuration used here is:

status = INFO
name = HiveLog4j2
packages = org.apache.hadoop.hive.ql.log
# list of properties
property.hive.log.level = INFO
property.hive.root.logger = DRFA
property.hive.log.dir = hive_log
property.hive.log.file = hive.log
property.hive.perflogger.log.level = INFO
# list of all appenders
appenders = console, DRFA
# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
# daily rolling file appender
appender.DRFA.type = RollingRandomAccessFile
appender.DRFA.name = DRFA
appender.DRFA.fileName = ${hive.log.dir}/${hive.log.file}
# Use %pid in the filePattern to append <process-id>@<host-name> to the filename if you want separate log files for different CLI session
appender.DRFA.filePattern = ${hive.log.dir}/${hive.log.file}.%d{yyyy-MM-dd}
appender.DRFA.layout.type = PatternLayout
appender.DRFA.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.DRFA.policies.type = Policies
appender.DRFA.policies.time.type = TimeBasedTriggeringPolicy
appender.DRFA.policies.time.interval = 1
appender.DRFA.policies.time.modulate = true
appender.DRFA.strategy.type = DefaultRolloverStrategy
appender.DRFA.strategy.max = 30
# list of all loggers
loggers = NIOServerCnxn, ClientCnxnSocketNIO, DataNucleus, Datastore, JPOX, PerfLogger
logger.NIOServerCnxn.name = org.apache.zookeeper.server.NIOServerCnxn
logger.NIOServerCnxn.level = WARN
logger.ClientCnxnSocketNIO.name = org.apache.zookeeper.ClientCnxnSocketNIO
logger.ClientCnxnSocketNIO.level = WARN
logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = ERROR
logger.Datastore.name = Datastore
logger.Datastore.level = ERROR
logger.JPOX.name = JPOX
logger.JPOX.level = ERROR
logger.PerfLogger.name = org.apache.hadoop.hive.ql.log.PerfLogger
logger.PerfLogger.level = ${hive.perflogger.log.level}
# root logger
rootLogger.level = ${hive.log.level}
rootLogger.appenderRefs = root
rootLogger.appenderRef.root.ref = ${hive.root.logger}
With everything configured, open a cmd window and start the Hive metastore service:

hive --service metastore -hiveconf hive.root.logger=DEBUG
Next, open a new cmd window and run the following command to start HiveServer2:
hive --service hiveserver2
Finally, the Hive client can be opened and a simple statement run to check it:
hive --service cli
show databases;
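Since HiveServer2 was started above, the connection can also be verified with Beeline (assuming the default HiveServer2 port 10000 and the root user configured in hive-site.xml):

beeline -u jdbc:hive2://localhost:10000 -n root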
At this point, the Windows pseudo-distributed Spark + Hadoop + Hive test environment is fully set up. Feel free to get in touch if you run into any problems.