First, download:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
Unpack it:
tar -zxvf hadoop-2.7.4.tar.gz
Move it to the working directory:
mv hadoop-2.7.4 /usr/local/hadoop
Install Java 1.8
Download:
http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz
Unpack it:
tar -zxvf jdk-8u144-linux-x64.tar.gz
Move it to the working directory (the tarball unpacks to jdk1.8.0_144):
mv jdk1.8.0_144 /usr/local/java1.8
Edit the profile:
vim /etc/profile
Add the Java and Hadoop environment variables:
JAVA_HOME=/usr/local/java1.8
JRE_HOME=/usr/local/java1.8/jre
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME JRE_HOME HADOOP_HOME PATH
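Apply the changes to the current shell and confirm the variables took effect (a quick sanity check):
source /etc/profile
echo $JAVA_HOME    # should print /usr/local/java1.8
echo $HADOOP_HOME  # should print /usr/local/hadoop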
Create symlinks:
ln -s /usr/local/hadoop/bin/hdfs /usr/bin/hdfs
ln -s /usr/local/hadoop/bin/hadoop /usr/bin/hadoop
Verify the installation:
# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
Verify Hadoop:
# hadoop version
Hadoop 2.7.4
Subversion https://[email protected]/repos/asf/hadoop.git -r cd915e1e8d9d0131462a0b7301586c175728a282
Compiled by kshvachk on 2017-08-01T00:29Z
Compiled with protoc 2.5.0
From source with checksum 50b0468318b4ce9bd24dc467b7ce1148
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.4.jar
If you see output like the above, the installation succeeded.
Configure the Hadoop configuration files:
hadoop/etc/hadoop/hadoop-env.sh
hadoop/etc/hadoop/yarn-env.sh
hadoop/etc/hadoop/core-site.xml
hadoop/etc/hadoop/hdfs-site.xml
hadoop/etc/hadoop/mapred-site.xml
hadoop/etc/hadoop/yarn-site.xml
1) Configure hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/java1.8
2) Configure yarn-env.sh
#export JAVA_HOME=/home/y/libexec/jdk1.7.0/
export JAVA_HOME=/usr/local/java1.8
3) Configure core-site.xml
Add the following:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>Local temporary directory for Hadoop on the namenode</description>
  </property>
</configuration>
Note: you must create the tmp directory first (see the sketch below)!
Add permissions on the HDFS /tmp directory (once HDFS is running):
# bin/hdfs dfs -chmod -R 777 /tmp
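One way to create the local directory named by hadoop.tmp.dir above:
mkdir -p /usr/local/hadoop/tmp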
4) Configure hdfs-site.xml
Add the following:
<configuration>
  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data0/hadoop/hdfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data0/hadoop/hdfs/data</value>
    <description>Physical storage location of data blocks on the datanode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replication factor; defaults to 3 and should not exceed the number of datanodes</description>
  </property>
</configuration>
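The format step later can create the name directory itself, but creating both paths up front avoids permission surprises; a minimal sketch:
mkdir -p /data0/hadoop/hdfs/name
mkdir -p /data0/hadoop/hdfs/data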
5) Configure mapred-site.xml
Add the following:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
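Note: in the Hadoop 2.7 binary distribution this file ships only as a template, so create it first if it does not exist yet:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml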
6) Configure yarn-site.xml
Add the following:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.30.33:8099</value>
  </property>
</configuration>
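Once YARN is running (started below), the ResourceManager web UI should answer at the address configured above; a quick check from the shell (adjust the IP to your own host):
curl http://192.168.30.33:8099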
Starting Hadoop
1) Format the namenode:
$ bin/hdfs namenode -format
2) Start the NameNode and DataNode daemons:
$ sbin/start-dfs.sh
3) Start the ResourceManager and NodeManager daemons:
$ sbin/start-yarn.sh
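If a daemon fails to come up, its log file under the install directory is the first place to look:
ls /usr/local/hadoop/logs/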
Verifying startup
1) Run jps; if processes like the following are present, Hadoop started correctly. (Master and Worker in this sample output are Spark standalone daemons, not Hadoop; the essential Hadoop processes are NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.)
# jps
6097 NodeManager
11044 Jps
7497 -- process information unavailable
8256 Worker
5999 ResourceManager
5122 SecondaryNameNode
8106 Master
4836 NameNode
4957 DataNode
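As a quick smoke test that HDFS itself is answering (standard hdfs commands, nothing specific to this setup):
bin/hdfs dfs -mkdir -p /tmp
bin/hdfs dfs -ls /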
To turn this into a fully distributed cluster:
Set up passwordless SSH login between all nodes (widely documented online; a minimal sketch follows below).
On the first machine, add the hostnames of all nodes to the slaves config file (etc/hadoop/slaves).
Add every machine's hostname to /etc/hosts on every node.
Copy the hadoop directory from the first machine to all nodes.
Done.
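A minimal sketch of the passwordless-SSH and slaves setup, using two hypothetical worker hostnames node1 and node2 (substitute your own):
# On the first machine: generate a key pair (accept the defaults)
ssh-keygen -t rsa
# Copy the public key to every node, including this one
ssh-copy-id root@node1
ssh-copy-id root@node2
# Then list one worker hostname per line in /usr/local/hadoop/etc/hadoop/slaves:
#   node1
#   node2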
Part 2: Installing Spark
Download:
https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
Unpack it:
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
Move it to the working directory:
mv spark-2.2.0-bin-hadoop2.7 /usr/local/spark
Edit the config file:
cd /usr/local/spark/
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
vim ./conf/spark-env.sh
Add the following line:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
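This line makes Spark pick up the jars of the Hadoop installation above via hadoop classpath; you can confirm the command substitution expands to a non-empty classpath:
/usr/local/hadoop/bin/hadoop classpath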
Test Spark:
./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"
Expected result:
Pi is roughly 3.1423757118785596
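As a further interactive check (a small sketch; spark-shell executes Scala read from stdin and then exits):
echo 'sc.parallelize(1 to 100).sum()' | ./bin/spark-shell
The output should include res0: Double = 5050.0.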