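First steps with Hadoop: before anything else you need to install and configure it. Configuration really comes down to luck and mood. On a good day it all goes smoothly in a few minutes; on a bad day you spend hours frustrated, hunting for the reason and searching Google!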
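My environment this time: Mac OS X Lion 10.7.4 with Homebrew 0.9.2. I recommend updating Homebrew before using it (run brew update). The first time, I hadn't updated Homebrew, so it installed Hadoop 0.21.0. Following the steps found online, it always got stuck when running start-all.sh with no response at all (in fact, the NameNode could not start). With Homebrew 0.9.2, the Hadoop that gets installed is 1.0.3.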
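Since the Hadoop version Homebrew installs depends on how up to date Homebrew is (0.21.0 before updating, 1.0.3 after), it may be worth confirming the version before configuring anything. Below is a minimal Python sketch, assuming hadoop version prints a first line of the form "Hadoop 1.0.3" on stdout; the function names are hypothetical helpers, not part of Hadoop or Homebrew.

```python
import re
import subprocess


def parse_hadoop_version(output: str) -> tuple:
    """Extract (major, minor, patch, ...) from the 'Hadoop X.Y.Z' line of `hadoop version`."""
    match = re.search(r"^Hadoop (\d+(?:\.\d+)*)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Hadoop X.Y.Z' line found in output")
    return tuple(int(part) for part in match.group(1).split("."))


def is_expected_version(output: str, minimum: tuple = (1, 0, 3)) -> bool:
    """True when the reported version is at least `minimum` (1.0.3 in this post)."""
    return parse_hadoop_version(output) >= minimum


def installed_hadoop_version() -> tuple:
    """Run `hadoop version` (assumes the Homebrew-installed launcher is on PATH)."""
    result = subprocess.run(["hadoop", "version"], capture_output=True, text=True, check=True)
    return parse_hadoop_version(result.stdout)
```

With the stale 0.21.0 install described above, is_expected_version() would return False, which is a hint to run brew update and reinstall.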
The installation and configuration steps are as follows:
1. Download and install Hadoop 1.0.3
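Run brew install hadoop and Homebrew installs it for you automatically (that is, it downloads Hadoop and sets up environment variables such as JAVA_HOME). Note that on the Mac, JAVA_HOME should be set like this: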
export JAVA_HOME="$(/usr/libexec/java_home)"
By the way, Java must be version 1.6.
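Because Java 1.6 is required, a quick check of the active JVM before starting Hadoop can save some debugging. A minimal Python sketch, assuming the classic java -version banner format (java version "1.6.0_xx"), which the JVM writes to stderr; java_home() relies on the -v option of macOS's /usr/libexec/java_home. All function names here are illustrative, not an existing tool.

```python
import re
import subprocess


def parse_java_version(output: str) -> tuple:
    """Return (major, minor) from text like: java version "1.6.0_33"."""
    match = re.search(r'version "(\d+)\.(\d+)', output)
    if match is None:
        raise ValueError("no Java version string found")
    return int(match.group(1)), int(match.group(2))


def is_java_16(output: str) -> bool:
    """True when the reported JVM belongs to the 1.6 line."""
    return parse_java_version(output) == (1, 6)


def java_version_text() -> str:
    """Run `java -version`; the JVM writes this banner to stderr, not stdout."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


def java_home(version: str = "1.6") -> str:
    """Ask macOS's /usr/libexec/java_home for a JDK matching `version`."""
    result = subprocess.run(["/usr/libexec/java_home", "-v", version],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Typical use: is_java_16(java_version_text()), and if that fails, point JAVA_HOME at the path returned by java_home("1.6").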
2. Configure Hadoop
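2.1) Where did step 1 put Hadoop? You can check with brew list hadoop; it should be under:
/usr/local/Cellar/hadoop/1.0.3. Because I wanted to keep the logs separate from the configuration directory, I changed the log location in /usr/local/Cellar/hadoop/1.0.3/libexec/conf/hadoop-env.sh (changing HADOOP_LOG_DIR is all it takes).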
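The HADOOP_LOG_DIR change can also be scripted. The sketch below is hypothetical: set_hadoop_env() is not a Hadoop tool. It assumes hadoop-env.sh contains either a commented-out "# export HADOOP_LOG_DIR=..." line (as the Hadoop 1.x template typically does) or no such line at all; the example directory /Users/jianpx/hadoop/logs is taken from the start-all.sh output in step 2.4.

```python
import re


def set_hadoop_env(script: str, name: str, value: str) -> str:
    """Return hadoop-env.sh text with `export NAME=value` set.

    A line like '# export NAME=...' or 'export NAME=...' is replaced in place;
    if no such line exists, the export is appended at the end.
    """
    line = f"export {name}={value}"
    pattern = re.compile(rf"^[ \t]*#?[ \t]*export {re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(script):
        # A function replacement avoids any backslash escapes inside `value`.
        return pattern.sub(lambda _match: line, script, count=1)
    suffix = "" if script.endswith("\n") or not script else "\n"
    return f"{script}{suffix}{line}\n"
```

For instance, read libexec/conf/hadoop-env.sh, pass its text through set_hadoop_env(text, "HADOOP_LOG_DIR", "/Users/jianpx/hadoop/logs"), and write the result back.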
2.2) Configure core-site.xml, hdfs-site.xml and mapred-site.xml
* core-site.xml:
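<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/tmp/hadoop/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>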
* hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
* mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
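If you set these files up more than once, generating them avoids XML typos. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the usual Hadoop configuration/property/name/value layout shown above; write_site_files() and its conf_dir argument are hypothetical, and for this Homebrew install the target would presumably be /usr/local/Cellar/hadoop/1.0.3/libexec/conf.

```python
import os
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties) -> str:
    """Build a Hadoop *-site.xml document from (name, value, description-or-None) tuples."""
    root = ET.Element("configuration")
    for name, value, description in properties:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if description:
            ET.SubElement(prop, "description").text = description
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


# The pseudo-distributed settings from step 2.2 (${user.name} is expanded by Hadoop).
SITE_FILES = {
    "core-site.xml": [
        ("hadoop.tmp.dir", "/usr/local/tmp/hadoop/hadoop-${user.name}",
         "A base for other temporary directories."),
        ("fs.default.name", "hdfs://localhost:8020", None),
    ],
    "hdfs-site.xml": [("dfs.replication", "1", None)],
    "mapred-site.xml": [("mapred.job.tracker", "localhost:8021", None)],
}


def write_site_files(conf_dir: str) -> None:
    """Write the three *-site.xml files into conf_dir, overwriting existing ones."""
    for filename, props in SITE_FILES.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(hadoop_config_xml(props))
```

Keeping the values in one dictionary makes it easy to see at a glance that the three files agree (for example, that fs.default.name and mapred.job.tracker both point at localhost).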
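These are all basic settings, and plenty of examples can be found online. For explanations of the configuration values, see the following pages: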
http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
http://hadoop.apache.org/common/docs/r0.20.0/mapred-default.html
2.3) Format HDFS, Hadoop's file system, by running: hadoop namenode -format
The output looks similar to this:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = jianpxs-MacBook-Pro.local/192.168.1.106
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /Users/jianpx/hadoop/tmp/dfs/name ? (Y or N) Y
12/08/12 20:59:40 INFO util.GSet: VM type = 64-bit
12/08/12 20:59:40 INFO util.GSet: 2% max memory = 19.9175 MB
12/08/12 20:59:40 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/08/12 20:59:40 INFO util.GSet: recommended=2097152, actual=2097152
2012-08-12 20:59:40.860 java[8202:1903] Unable to load realm info from SCDynamicStore
12/08/12 20:59:41 INFO namenode.FSNamesystem: fsOwner=jianpx
12/08/12 20:59:41 INFO namenode.FSNamesystem: supergroup=supergroup
12/08/12 20:59:41 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/08/12 20:59:41 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/08/12 20:59:41 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/08/12 20:59:41 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/08/12 20:59:41 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/08/12 20:59:41 INFO common.Storage: Storage directory /Users/jianpx/hadoop/tmp/dfs/name has been successfully formatted.
12/08/12 20:59:41 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at jianpxs-MacBook-Pro.local/192.168.1.106
************************************************************/
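The format log names the directory it formatted. Since dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name, comparing the two is a quick way to confirm that the core-site.xml you edited is the one Hadoop actually read. In the log above the formatted path is /Users/jianpx/hadoop/tmp/dfs/name, which differs from the hadoop.tmp.dir value in step 2.2, so such a check may be worthwhile. A minimal Python sketch, assuming the success line format shown in the log; the helper names are hypothetical.

```python
import posixpath
import re

# Matches the success line printed by `hadoop namenode -format` in Hadoop 1.x.
_FORMATTED = re.compile(r"Storage directory (\S+) has been successfully formatted\.")


def formatted_storage_dirs(log_text: str) -> list:
    """Return every storage directory the format log reports as successfully formatted."""
    return _FORMATTED.findall(log_text)


def default_name_dir(hadoop_tmp_dir: str, user: str) -> str:
    """Expand ${user.name} and append dfs/name, mirroring the dfs.name.dir default."""
    base = hadoop_tmp_dir.replace("${user.name}", user)
    return posixpath.join(base, "dfs", "name")


def format_matches_config(log_text: str, hadoop_tmp_dir: str, user: str) -> bool:
    """True when the formatted directory equals the one core-site.xml implies."""
    return default_name_dir(hadoop_tmp_dir, user) in formatted_storage_dirs(log_text)
```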
2.4) Start Hadoop. Just run start-all.sh. The output looks similar to this:
localhost: starting datanode, logging to /Users/jianpx/hadoop/logs/hadoop-jianpx-datanode-jianpxs-MacBook-Pro.local.out
localhost: starting secondarynamenode, logging to /Users/jianpx/hadoop/logs/hadoop-jianpx-secondarynamenode-jianpxs-MacBook-Pro.local.out
starting jobtracker, logging to /Users/jianpx/hadoop/logs/hadoop-jianpx-jobtracker-jianpxs-MacBook-Pro.local.out
localhost: starting tasktracker, logging to /Users/jianpx/hadoop/logs/hadoop-jianpx-tasktracker-jianpxs-MacBook-Pro.local.out
2.5) Then run the jps command to check whether the NameNode has started. Normal output:
8480 SecondaryNameNode
8549 JobTracker
8287 NameNode
8647 TaskTracker
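A pseudo-distributed Hadoop 1.x node normally runs five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The listing above has no DataNode line even though start-all.sh reports starting one, so comparing jps output against the full set can catch a daemon that exited right after launch. A minimal Python sketch, assuming jps prints "<pid> <MainClass>" per line; the helper names are hypothetical.

```python
# Daemons a pseudo-distributed Hadoop 1.x node is expected to run after start-all.sh.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}


def running_daemons(jps_output: str) -> dict:
    """Parse `jps` lines of the form '<pid> <MainClass>' into {class_name: pid}."""
    daemons = {}
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and entries like '1234 -- process information unavailable'.
        if len(parts) == 2 and parts[0].isdigit():
            daemons[parts[1]] = int(parts[0])
    return daemons


def missing_daemons(jps_output: str) -> set:
    """Return the expected daemons that do not appear in the jps output."""
    return EXPECTED_DAEMONS - set(running_daemons(jps_output))
```

Fed the four lines above, missing_daemons() would report {"DataNode"}; the DataNode log from step 2.4 is then the place to look.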
2.6) Test the Hadoop file system commands by running: hadoop dfs -ls /
The first time, the output was:
ls: Cannot access .: No such file or directory.
This is reportedly bug HADOOP-7489; the suggested fix is to add the following line to hadoop-env.sh:
OK, that's it: Hadoop 1.0.3 is now installed with Homebrew on Mac OS X Lion 10.7.4. Next, try running a MapReduce job yourself! ^_^
Reference:
http://blogs.msdn.com/b/brandonwerner/archive/2011/11/13/how-to-set-up-hadoop-on-os-x-lion-10-7.aspx