This article describes a "personal hadoop" setup, analogous to "personal condor". The main idea is to keep a single source for configuration and log files,
with a symlink pointing at the binaries your development builds produce, so your data and configuration survive as the built binaries are updated.
1. Test in a local sandbox without having to change the software installed on your system
2. A single source for configuration and log files
Links:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://wiki.apache.org/hadoop/
http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm
Books:
Hadoop: The Definitive Guide
1. These are currently non-native development steps that pull dependencies from Maven; for details on the native packages, see: https://fedoraproject.org/wiki/Features/Hadoop
2. The single-node setup steps are listed below.
1. Configure passwordless ssh
yum install openssh openssh-clients openssh-server
# generate a public/private key, if you don't already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/*
# testing ssh:
ps -ef | grep sshd # verify sshd is running
ssh localhost # accept the host key when prompted
sudo passwd root # Make sure the root has a password
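As a quick non-interactive check that key-based login really works (my addition; -o BatchMode=yes makes ssh fail instead of prompting for a password):
ssh -o BatchMode=yes localhost true && echo "passwordless ssh OK"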
2. Install the other build dependencies
yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++
3. Install Java and the development environment
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*
Add the following to your .bashrc:
export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"
Note: the settings above are for OpenJDK 7 on Fedora 18. You can test whether your environment is configured correctly with:
mvn install -Dmaven.test.failure.ignore=true
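If that full test run is more than you want, a lighter sanity check (my addition) is to confirm which Maven and JDK are actually on the path:
mvn -version   # should report Maven running on the OpenJDK 7 installed above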
1. Download and build Hadoop
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests
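If the build succeeds, the distribution tree referenced in the next step should now exist under hadoop-dist/target (an assumption based on the -Pdist profile's layout; worth confirming on your checkout):
ls hadoop-dist/target/   # expect a hadoop-2.0.4-alpha directory among the artifacts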
2. Create your sandbox
In this setup we default to /home/tstclair
cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <path-to-your-hadoop-common-checkout>/hadoop-dist/target/hadoop-2.0.4-alpha home # symlink "home" to your build output
3. Override your environment variables
Append the following to your home directory's .bashrc:
# Hadoop env override:
export HADOOP_BASE_DIR=${HOME}/personal-hadoop
export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs
export HADOOP_PID_DIR=${HADOOP_BASE_DIR}
export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf
export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home
export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}
# Yarn env override:
export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}
export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn
# classpath override to search the hadoop location
export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share
#Finally update your PATH
export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}
4. Verify the steps above
source ~/.bashrc
which hadoop # verify it resolves to ${HOME}/personal-hadoop/home/bin
hadoop -help # verify classpath is correct.
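A couple of further optional checks (my additions; readlink -f assumes GNU coreutils, which Fedora has):
hadoop version                # should report 2.0.4-alpha
readlink -f $(which hadoop)   # should resolve through the personal-hadoop/home symlink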
5. Create your initial single-source configuration files
Copy in the default configuration files:
cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf
Update your hdfs-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost/</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///home/tstclair/personal-hadoop/name</value>
</property>
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>file:///home/tstclair/personal-hadoop/data</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50010</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:50075</value>
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50020</value>
</property>
Update your mapred-site.xml:
<property>
  <name>mapreduce.cluster.temp.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value></value>
  <description>No description</description>
  <final>true</final>
</property>
Finally, update your yarn-site.xml:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8031</value>
  <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
  <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
  <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value></value>
  <description>the local directories used by the nodemanager</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>localhost:8034</value>
  <description>the nodemanagers bind to this port</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>the amount of memory on the NodeManager in MB</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
Format the namenode:
hadoop namenode -format
#verify output is correct.
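One way to double-check that the format took effect (my suggestion): the dfs.name.dir configured above should now contain a current/ subdirectory with the namenode metadata.
ls ${HADOOP_BASE_DIR}/name/current   # expect a VERSION file and fsimage files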
Start HDFS:
start-dfs.sh
Open your browser to http://localhost:50070 and check that one live node has started.
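The same check can be made from the command line (dfsadmin -report is the stock HDFS admin command; on 2.0.x it still works through the hadoop wrapper, with a deprecation warning):
hadoop dfsadmin -report   # look for one live datanode in the report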
Next, start YARN:
start-yarn.sh
Verify that the daemons came up correctly by checking the log files.
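One way to do that (my sketch): jps from the JDK lists the running daemon JVMs, and the logs land under the HADOOP_LOG_DIR/YARN_LOG_DIR set earlier.
jps                          # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
tail ${YARN_LOG_DIR}/*.log   # scan the yarn daemon logs for errors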
Finally, check that Hadoop is working by running a MapReduce job:
cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
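Once the job completes, the randomwriter output should be visible in HDFS under the out directory used above:
hadoop fs -ls out   # each map task writes a part file under out/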
Original article: http://timothysc.github.io/blog/2013/04/22/personalhadoop/