工作需要,现在开始做大数据开发了,通过下面的配置步骤,你可以在win10系统中,部署出一套hadoop+hbase,便于单机测试调试开发。
准备资料:
1.
hadoop-2.7.2:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
2.
hadoop-common-2.2.0-bin-master:
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
3.
hbase-1.2.3:
http://apache.fayea.com/hbase/stable/
4.
jdk1.8:
http://dl-t1.wmzhe.com/30/30118/jdk_1.8.0.0_64.exe
以上压缩包用winrar解压失败的话,请下载安装 Cygwin 然后用它下的命令提示符下输入:tar -zxvf hadoop-2.7.2.tar.gz
将下载好的3个压缩包分别解压缩到
D:\HBase\hadoop-2.7.2
D:\HBase\hadoop-common-2.2.0-bin-master
D:\HBase\hbase-1.2.3
复制 D:\HBase\hadoop-common-2.2.0-bin-master\bin 的7个文件(注意:只复制这7个) 复制到 D:\HBase\hadoop-2.7.2\bin
hadoop.dll、hadoop.exp、hadoop.lib、hadoop.pdb、libwinutils.lib、winutils.exe、winutils.pdb
配置Hadoop:
D:\HBase\hadoop-2.7.2\etc\hadoop
hadoop-env.cmd 内容如下:
@echo off
@rem Licensed to the Apache Software Foundation (ASF) under one or more
@rem contributor license agreements. See the NOTICE file distributed with
@rem this work for additional information regarding copyright ownership.
@rem The ASF licenses this file to You under the Apache License, Version 2.0
@rem (the "License"); you may not use this file except in compliance with
@rem the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem Set Hadoop-specific environment variables here.
@rem The only required environment variable is JAVA_HOME. All others are
@rem optional. When running a distributed configuration it is best to
@rem set JAVA_HOME in this file, so that it is correctly defined on
@rem remote nodes.
@rem The java implementation to use. Required.
set JAVA_HOME=%JAVA_HOME%
@rem The jsvc implementation to use. Jsvc is required to run secure datanodes.
@rem set JSVC_HOME=%JSVC_HOME%
@rem set HADOOP_CONF_DIR=
@rem Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
if exist %HADOOP_HOME%\contrib\capacity-scheduler (
if not defined HADOOP_CLASSPATH (
set HADOOP_CLASSPATH=%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
) else (
set HADOOP_CLASSPATH=%HADOOP_CLASSPATH%;%HADOOP_HOME%\contrib\capacity-scheduler\*.jar
)
)
@rem The maximum amount of heap to use, in MB. Default is 1000.
@rem set HADOOP_HEAPSIZE=
@rem set HADOOP_NAMENODE_INIT_HEAPSIZE=""
@rem Extra Java runtime options. Empty by default.
@rem set HADOOP_OPTS=%HADOOP_OPTS% -Djava.net.preferIPv4Stack=true
@rem Command specific options appended to HADOOP_OPTS when specified
if not defined HADOOP_SECURITY_LOGGER (
set HADOOP_SECURITY_LOGGER=INFO,RFAS
)
if not defined HDFS_AUDIT_LOGGER (
set HDFS_AUDIT_LOGGER=INFO,NullAppender
)
set HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_NAMENODE_OPTS%
set HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS %HADOOP_DATANODE_OPTS%
set HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_SECONDARYNAMENODE_OPTS%
@rem The following applies to multiple commands (fs, dfs, fsck, distcp etc)
set HADOOP_CLIENT_OPTS=-Xmx512m %HADOOP_CLIENT_OPTS%
@rem set HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData %HADOOP_JAVA_PLATFORM_OPTS%"
@rem On secure datanodes, user to run the datanode as after dropping privileges
set HADOOP_SECURE_DN_USER=%HADOOP_SECURE_DN_USER%
@rem Where log files are stored. %HADOOP_HOME%/logs by default.
@rem set HADOOP_LOG_DIR=%HADOOP_LOG_DIR%\%USERNAME%
@rem Where log files are stored in the secure data environment.
set HADOOP_SECURE_DN_LOG_DIR=%HADOOP_LOG_DIR%\%HADOOP_HDFS_USER%
@rem The directory where pid files are stored. /tmp by default.
@rem NOTE: this should be set to a directory that can only be written to by
@rem the user that will run the hadoop daemons. Otherwise there is the
@rem potential for a symlink attack.
set HADOOP_PID_DIR=%HADOOP_PID_DIR%
set HADOOP_SECURE_DN_PID_DIR=%HADOOP_PID_DIR%
@rem A string representing this instance of hadoop. %USERNAME% by default.
set HADOOP_IDENT_STRING=%USERNAME%
set JAVA_HOME=D:\Java\jdk1.8.0_31
set HADOOP_HOME=D:\HBase\hadoop-2.7.2
set HADOOP_PREFIX=D:\HBase\hadoop-2.7.2
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
core-site.xml 内容如下:
fs.default.name
hdfs://0.0.0.0:19000
hdfs-site.xml 内容如下:
fs.default.name
hdfs://0.0.0.0:19000
创建mapred-site.xml 内容如下:中文名字处换成阁下的WINDOWS当前登录账户名
mapreduce.job.user.name
冯明刚
mapreduce.framework.name
yarn
yarn.apps.stagingDir
/user/冯明刚/staging
mapreduce.jobtracker.address
local
yarn.server.resourcemanager.address
0.0.0.0:8020
yarn.server.resourcemanager.application.expiry.interval
60000
yarn.server.nodemanager.address
0.0.0.0:45454
yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class
org.apache.hadoop.mapred.ShuffleHandler
yarn.server.nodemanager.remote-app-log-dir
/app-logs
yarn.nodemanager.log-dirs
/dep/logs/userlogs
yarn.server.mapreduce-appmanager.attempt-listener.bindAddress
0.0.0.0
yarn.server.mapreduce-appmanager.client-service.bindAddress
0.0.0.0
yarn.log-aggregation-enable
true
yarn.log-aggregation.retain-seconds
-1
yarn.application.classpath
%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*
初始化Hadoop:
:> cd D:\HBase\hadoop-2.7.2\etc\hadoop\
:> hadoop-env.cmd
:>%HADOOP_PREFIX%\bin\hdfs namenode -format
启动Hadoop:
:>%HADOOP_PREFIX%\sbin\start-dfs.cmd
停止Hadoop:
:>%HADOOP_PREFIX%\sbin\stop-all.cmd
配置Hbase:
编辑hbase-site.xml内容如下
hbase.rootdir
file:///D:/HBase/hbase-1.2.3/root
hbase.tmp.dir
D:/HBase/hbase-1.2.3/tmp
hbase.zookeeper.quorum 127.0.0.1
hbase.zookeeper.property.dataDir D:/HBase/hbase-1.2.3/zoo
hbase.cluster.distributed false
编辑hbase-env.cmd内容如下:
@rem/**
@rem * Licensed to the Apache Software Foundation (ASF) under one
@rem * or more contributor license agreements. See the NOTICE file
@rem * distributed with this work for additional information
@rem * regarding copyright ownership. The ASF licenses this file
@rem * to you under the Apache License, Version 2.0 (the
@rem * "License"); you may not use this file except in compliance
@rem * with the License. You may obtain a copy of the License at
@rem *
@rem * http://www.apache.org/licenses/LICENSE-2.0
@rem *
@rem * Unless required by applicable law or agreed to in writing, software
@rem * distributed under the License is distributed on an "AS IS" BASIS,
@rem * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem * See the License for the specific language governing permissions and
@rem * limitations under the License.
@rem */
@rem Set environment variables here.
@rem The java implementation to use. Java 1.7+ required.
@rem set JAVA_HOME=c:\apps\java
@rem Extra Java CLASSPATH elements. Optional.
@rem set HBASE_CLASSPATH=
@rem The maximum amount of heap to use. Default is left to JVM default.
@rem set HBASE_HEAPSIZE=1000
@rem Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
@rem offheap, set the value to "8G".
@rem set HBASE_OFFHEAPSIZE=1000
@rem For example, to allocate 8G of offheap, to 8G:
@rem etHBASE_OFFHEAPSIZE=8G
@rem Extra Java runtime options.
@rem Below are what we set by default. May only work with SUN JVM.
@rem For more on why as well as other possible settings,
@rem see http://wiki.apache.org/hadoop/PerformanceTuning
@rem JDK6 on Windows has a known bug for IPv6, use preferIPv4Stack unless JDK7.
@rem @rem See TestIPv6NIOServerSocketChannel.
set HBASE_OPTS="-XX:+UseConcMarkSweepGC" "-Djava.net.preferIPv4Stack=true"
@rem Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
set HBASE_MASTER_OPTS=%HBASE_MASTER_OPTS% "-XX:PermSize=128m" "-XX:MaxPermSize=128m"
set HBASE_REGIONSERVER_OPTS=%HBASE_REGIONSERVER_OPTS% "-XX:PermSize=128m" "-XX:MaxPermSize=128m"
@rem Uncomment below to enable java garbage collection logging for the server-side processes
@rem this enables basic gc logging for the server processes to the .out file
@rem set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
@rem this enables gc logging using automatic GC log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+. Either use this set of options or the one above
@rem set SERVER_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" "-XX:+UseGCLogFileRotation" "-XX:NumberOfGCLogFiles=1" "-XX:GCLogFileSize=512M" %HBASE_GC_OPTS%
@rem Uncomment below to enable java garbage collection logging for the client processes in the .out file.
@rem set CLIENT_GC_OPTS="-verbose:gc" "-XX:+PrintGCDetails" "-XX:+PrintGCDateStamps" %HBASE_GC_OPTS%
@rem Uncomment below (along with above GC logging) to put GC information in its own logfile (will set HBASE_GC_OPTS)
@rem set HBASE_USE_GC_LOGFILE=true
@rem Uncomment and adjust to enable JMX exporting
@rem See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
@rem More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
@rem
@rem set HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false"
@rem set HBASE_MASTER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10101"
@rem set HBASE_REGIONSERVER_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10102"
@rem set HBASE_THRIFT_OPTS=%HBASE_JMX_BASE% "-Dcom.sun.management.jmxremote.port=10103"
@rem set HBASE_ZOOKEEPER_OPTS=%HBASE_JMX_BASE% -Dcom.sun.management.jmxremote.port=10104"
@rem File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
@rem set HBASE_REGIONSERVERS=%HBASE_HOME%\conf\regionservers
@rem Where log files are stored. $HBASE_HOME/logs by default.
@rem set HBASE_LOG_DIR=%HBASE_HOME%\logs
@rem A string representing this instance of hbase. $USER by default.
@rem set HBASE_IDENT_STRING=%USERNAME%
@rem Seconds to sleep between slave commands. Unset by default. This
@rem can be useful in large clusters, where, e.g., slave rsyncs can
@rem otherwise arrive faster than the master can service them.
@rem set HBASE_SLAVE_SLEEP=0.1
@rem Tell HBase whether it should manage it's own instance of Zookeeper or not.
@rem set HBASE_MANAGES_ZK=true
set JAVA_HOME=D:\Java\jdk1.8.0_31
set HADOOP_HOME=D:\HBase\hadoop-2.7.2
set HADOOP_PREFIX=D:\HBase\hadoop-2.7.2
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
启动Hbase
:> cd D:\HBase\hbase-1.2.3\conf
:> hbase-env.cmd
:> cd D:\HBase\hbase-1.2.3\bin
:> start-hbase.cmd
以上就配置完了,用 Hbase Shell试一下是否能操作数据库
:> cd D:\HBase\hbase-1.2.3\bin
:>hbase shell
创建表
创建一个名为 test 的表,这个表只有一个 列族 为 cf。可以列出所有的表来检查创建情况,然后插入些值。
>create 'test','cf'
插入记录
>put 'test','row1','cf:a','value1'
查询
>scan 'test'
完毕,有问题,请留言