Hadoop Study Notes: Installing Hadoop

sudo mv /home/common/下载/hadoop-2.7.2.tar.gz /usr/local
cd /usr/local
sudo tar -xzvf hadoop-2.7.2.tar.gz
sudo mv hadoop-2.7.2 hadoop    # rename it

Add the following to /etc/profile:

export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

1. Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121

2. Edit /usr/local/hadoop/etc/hadoop/core-site.xml



        
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>~/software/apache/hadoop-2.9.1/tmp</value>
        </property>
        <property>
                <name>hadoop.native.lib</name>
                <value>false</value>
        </property>
</configuration>


Add your server's public IP to /etc/hosts:

XXXX    master
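As a quick sanity check (assuming you substituted your real public IP for the masked XXXX above), the hostname should now resolve:

getent hosts master    # should print the IP you added
ping -c 1 master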

If a project needs to access HDFS, add a core-site.xml file to its resources directory:









  
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>


 

3. Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml



        
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/data</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.dir</name>
                <value>file:/home/lintong/software/apache/hadoop-2.9.1/tmp/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>


 

4. Edit /usr/local/hadoop/etc/hadoop/mapred-site.xml (created from mapred-site.xml.template)
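Hadoop 2.x ships this file only as mapred-site.xml.template, so create the real file first and then give it the contents below:

cd /usr/local/hadoop/etc/hadoop
sudo cp mapred-site.xml.template mapred-site.xml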



        
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>


 

5. Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml




        
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>


 

6. Make /etc/profile take effect

source /etc/profile

Contents of /etc/profile:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_121
export JRE_HOME=${JAVA_HOME}/jre 
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib 
export PATH=${JAVA_HOME}/bin:$PATH

export PATH=/usr/local/texlive/2015/bin/x86_64-linux:$PATH 
export MANPATH=/usr/local/texlive/2015/texmf-dist/doc/man:$MANPATH 
export INFOPATH=/usr/local/texlive/2015/texmf-dist/doc/info:$INFOPATH

export HADOOP_HOME=/usr/local/hadoop
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

export M2_HOME=/opt/apache-maven-3.3.9
export M2=$M2_HOME/bin
export PATH=$M2:$PATH

export GRADLE_HOME=/opt/gradle/gradle-3.4.1
export PATH=$GRADLE_HOME/bin:$PATH

Contents of ~/.bashrc:

export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
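
After sourcing both files, a quick way to confirm that the environment variables and binaries are visible:

source /etc/profile && source ~/.bashrc
echo $HADOOP_HOME    # should print /usr/local/hadoop
which hdfs           # should resolve to /usr/local/hadoop/bin/hdfs
hadoop version       # should report Hadoop 2.7.2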

For SSH and Hadoop user setup, see:

http://www.cnblogs.com/CheeseZH/p/5051135.html

http://www.powerxing.com/install-hadoop/

Passwordless SSH login

ssh-keygen -t rsa                                   # generate a key pair first if ~/.ssh/id_rsa.pub does not exist
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost                                       # should now log in without a password

 

If the DataNode fails to start, see:

http://www.aboutyun.com/thread-12803-1-1.html

Check the log files under the Hadoop logs directory, then edit the VERSION file under /usr/local/hadoop/tmp/dfs/data/current (typically so that its clusterID matches the NameNode's).
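
A minimal sketch of that check, assuming the name and data directories sit under /usr/local/hadoop/tmp (adjust to your own dfs.name.dir / dfs.data.dir values):

grep clusterID /usr/local/hadoop/tmp/dfs/name/current/VERSION
grep clusterID /usr/local/hadoop/tmp/dfs/data/current/VERSION
# if the two IDs differ, copy the NameNode's clusterID into the DataNode's VERSION file and restart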

 

Fix for "Error: JAVA_HOME is not set and could not be found" when starting Hadoop on Ubuntu

Set JAVA_HOME to an absolute path in etc/hadoop/hadoop-env.sh.

 

Permissions on the Hadoop directory

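A common way to avoid permission problems when running the daemons as a normal user is to take ownership of the install directory, for example:

sudo chown -R $(whoami):$(whoami) /usr/local/hadoop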

 

Format a new distributed filesystem:

hdfs namenode -format

Start Hadoop

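A typical sequence to bring up the daemons and confirm they are running (assuming $HADOOP_HOME/sbin is on the PATH, as configured in ~/.bashrc above):

start-dfs.sh     # starts NameNode, DataNode and SecondaryNameNode
start-yarn.sh    # starts ResourceManager and NodeManager
jps              # lists the running Java daemon processes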

Run a Hadoop example:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5

Output:

Number of Maps  = 2
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/03/26 11:49:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/03/26 11:49:47 INFO input.FileInputFormat: Total input paths to process : 2
17/03/26 11:49:47 INFO mapreduce.JobSubmitter: number of splits:2
17/03/26 11:49:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1490497943530_0002
17/03/26 11:49:48 INFO impl.YarnClientImpl: Submitted application application_1490497943530_0002
17/03/26 11:49:48 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1490497943530_0002/
17/03/26 11:49:48 INFO mapreduce.Job: Running job: job_1490497943530_0002
17/03/26 11:49:55 INFO mapreduce.Job: Job job_1490497943530_0002 running in uber mode : false
17/03/26 11:49:55 INFO mapreduce.Job:  map 0% reduce 0%
17/03/26 11:50:02 INFO mapreduce.Job:  map 100% reduce 0%
17/03/26 11:50:08 INFO mapreduce.Job:  map 100% reduce 100%
17/03/26 11:50:08 INFO mapreduce.Job: Job job_1490497943530_0002 completed successfully
17/03/26 11:50:08 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=353898
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=524
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=9536
		Total time spent by all reduces in occupied slots (ms)=3259
		Total time spent by all map tasks (ms)=9536
		Total time spent by all reduce tasks (ms)=3259
		Total vcore-milliseconds taken by all map tasks=9536
		Total vcore-milliseconds taken by all reduce tasks=3259
		Total megabyte-milliseconds taken by all map tasks=9764864
		Total megabyte-milliseconds taken by all reduce tasks=3337216
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=288
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=319
		CPU time spent (ms)=2570
		Physical memory (bytes) snapshot=719585280
		Virtual memory (bytes) snapshot=5746872320
		Total committed heap usage (bytes)=513802240
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=236
	File Output Format Counters 
		Bytes Written=97
Job Finished in 21.472 seconds
Estimated value of Pi is 3.60000000000000000000

 

You can open the web UI at http://localhost:50070 to view NameNode and DataNode information and to browse files in HDFS.


 

After YARN is started, examples are run the same way; only the resource management and task scheduling differ. Looking at the logs, you can see that without YARN the jobs run via "mapred.LocalJobRunner", whereas with YARN they run via "mapred.YARNRunner". One benefit of starting YARN is that you can monitor job status through the web UI at http://localhost:8088/cluster


Click History to inspect each job. If master:19888 cannot be reached, run the following (the script lives under $HADOOP_HOME/sbin):

mr-jobhistory-daemon.sh start historyserver


 

 

For an overview of Hadoop's architecture, see:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理1(一)

For the HDFS read path, see:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理2(二)

For the HDFS write path, see:

Hadoop HDFS概念学习系列之初步掌握HDFS的架构及原理3(三)

For the role of the SecondaryNameNode (SNN), see:

http://blog.csdn.net/xh16319/article/details/31375197
