Big Data: Hadoop Installation, a Prelude to Spark

Single-Node Installation

Basic software needed for Hadoop development:

VMware


Set up an Ubuntu 12 virtual machine in VMware:



Enable the root account:

sudo -s

sudo passwd root

For details, see:

http://blog.csdn.net/flash8627/article/details/44729077

 

Install vsftpd:

root@ubuntu:/usr/lib/java# apt-get install vsftpd

 

Edit vsftpd.conf to allow logins with local system accounts:

root@ubuntu:/usr/lib/java# cp /etc/vsftpd.conf /etc/vsftpd.conf.bak

There are plenty of detailed guides online, so I won't repeat them here.


Java 1.7

Upload the JDK archive to the server, extract it, and set the environment variables as follows:

root@ubuntu:/usr/lib/java# tar -zxvf jdk-7u80-linux-x64.tar.gz

root@ubuntu:/usr/lib/java# mv jdk1.7.0_80 /usr/lib/java/jdk1.7


root@ubuntu:/usr/lib/java# vim /root/.bashrc

 

export JAVA_HOME=/usr/lib/java/jdk1.7

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH
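
Reload the shell config and confirm the JDK is visible (a quick sanity check, not part of the original post):

root@ubuntu:/usr/lib/java# source /root/.bashrc

root@ubuntu:/usr/lib/java# java -version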

 

 

Install SSH
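
The post doesn't show the install command; on Ubuntu the SSH daemon typically comes from the openssh-server package (an assumption about this setup):

root@ubuntu:/usr/lib/java# apt-get install openssh-server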



Set up passwordless SSH login:

root@ubuntu:/usr/lib/java# ssh-keygen -t rsa -P ""

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

d3:bb:1e:df:10:09:ed:62:78:43:66:9f:8f:6a:b0:e7 root@ubuntu

The key's randomart image is:

+--[ RSA 2048]----+
|                 |
|          .      |
|         = .     |
|        * + o    |
|       S * *     |
|       .+ + +    |
|        oo o .   |
|       . o= o    |
|        =E . .   |
+-----------------+

 

root@ubuntu:/usr/lib/java# ls /root/.ssh/

id_rsa  id_rsa.pub

root@ubuntu:/usr/lib/java# cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

root@ubuntu:/usr/lib/java# ls /root/.ssh/

authorized_keys  id_rsa  id_rsa.pub
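
You should now be able to SSH in without a password (a quick check; the first connection may still ask you to accept the host key):

root@ubuntu:/usr/lib/java# ssh localhost

root@ubuntu:~# exit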

 

Install rsync

root@ubuntu:/usr/lib/java# apt-get install rsync

 

Hadoop 2.6

Extract Hadoop (into /usr/local/hadoop, so it matches the paths used below):

tar -zxvf /home/ftp/hadoop-2.6.0.tar.gz -C /usr/local/hadoop/

 

Configure hadoop-env.sh (JAVA_HOME is hardcoded here because daemons launched over SSH don't inherit the shell environment):

cd /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/

vim hadoop-env.sh

 

# export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME=/usr/lib/java/jdk1.7

 

Configure the Hadoop environment variables in ~/.bashrc:

cat ~/.bashrc

export JAVA_HOME=/usr/lib/java/jdk1.7

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH

 

Verify the environment variables: hadoop version

 

Run WordCount

mkdir input

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# cp README.txt input

 

 

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output

 

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# cat output/*
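
Note that MapReduce refuses to overwrite an existing output directory, so delete it before rerunning the job (here it is a local directory, since HDFS is not configured yet):

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# rm -r output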


Configuring Hadoop in pseudo-distributed mode and running the WordCount example


This involves three configuration files, all under /usr/local/hadoop/hadoop-2.6.0/etc/hadoop: core-site.xml (the HDFS address and port), hdfs-site.xml (the replication factor), and mapred-site.xml (the JobTracker address and port).

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>

 

vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>


root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# cp mapred-site.xml.template mapred-site.xml

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

 

Next, format the NameNode:

hadoop namenode -format

(In Hadoop 2.x this command prints a deprecation warning; hdfs namenode -format is the newer form. If you format a second time, you must answer Y to complete the process.)

 

Start Hadoop with start-all.sh:

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# ../../sbin/start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [localhost]

localhost: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-ubuntu.out

localhost: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-ubuntu.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

ECDSA key fingerprint is 81:a2:0b:4d:95:43:c7:3f:84:f1:a4:d4:24:30:53:bf.

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-ubuntu.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-ubuntu.out

localhost: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-ubuntu.out

 

Check the running Hadoop processes with jps:

root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# jps

4300 NodeManager

4085 ResourceManager

4510 Jps

3951 SecondaryNameNode

3652 DataNode

3443 NameNode

 

Cluster monitoring web UI:

http://localhost:50070/dfshealth.jsp

or the newer UI: http://192.168.222.143:50070/dfshealth.html#tab-overview


Create a directory on HDFS:

hadoop fs -mkdir /input

 

Upload files:

hadoop fs -copyFromLocal /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/* /input
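
To confirm the upload (a quick check, not in the original):

hadoop fs -ls /input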


That completes the pseudo-distributed ("pseudo-cluster") setup.


If you need help, you can ask in the QQ group [Big Data Exchange 208881891].




Cluster Installation


1. Set the hostname in /etc/hostname and map hostnames to IPs in /etc/hosts

 

Set the hostname (the master node's /etc/hostname contains just "Master"): /etc/hostname

Configure the hostname-to-IP mapping in /etc/hosts:

 

192.168.222.143     Master

192.168.222.144     Slave1

192.168.222.145     Slave2

 

Configure passwordless SSH: ssh-keygen -t rsa -P ""

scp id_rsa.pub Slave1:/root/.ssh/Master.pub    (copy the master's public key to the remote node)

cat Master.pub >> authorized_keys    (run on each slave in /root/.ssh; the copied key arrives as Master.pub)
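
Alternatively (a shortcut sketch, assuming openssh-client's ssh-copy-id helper is available), push the key to every slave in one step:

for h in Slave1 Slave2; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h; done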

 

Update the Hadoop configuration:

Change the earlier localhost entries to Master.

The full configuration:

core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
  </property>
</configuration>

 

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
</configuration>

 

 

mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Master:9001</value>
  </property>
</configuration>

 

slaves (Master is listed here too, so it also runs a DataNode and NodeManager):

Master

Slave1

Slave2

 

 

Copy Java and Hadoop to the remote nodes:

root@Master:/usr/lib/java#

scp -r jdk1.7 Slave1:/usr/lib/java/

scp -r /usr/local/hadoop/hadoop-2.6.0 Slave1:/usr/local/hadoop/
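
The same copies must reach Slave2 as well; a small loop (a sketch, assuming the paths above) avoids repeating the commands:

for h in Slave1 Slave2; do scp -r /usr/lib/java/jdk1.7 $h:/usr/lib/java/; scp -r /usr/local/hadoop/hadoop-2.6.0 $h:/usr/local/hadoop/; done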

 

After copying, set the same environment variables on each slave (in ~/.bashrc):

 

export JAVA_HOME=/usr/lib/java/jdk1.7

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib

export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH

 

First clear out the hdfs/name, hdfs/data, and tmp directories on every node; stale metadata from the earlier single-node run would otherwise keep the DataNodes from joining the freshly formatted NameNode.
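
A minimal cleanup sketch (assuming the directory layout configured above; run it on each node):

rm -rf /usr/local/hadoop/hdfs/name/* /usr/local/hadoop/hdfs/data/* /usr/local/hadoop/hadoop-2.6.0/tmp/*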

 

Format the cluster: hadoop namenode -format

 

Start the cluster:

root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [Master]

The authenticity of host 'master (192.168.222.143)' can't be established.

ECDSA key fingerprint is 81:a2:0b:4d:95:43:c7:3f:84:f1:a4:d4:24:30:53:bf.

Are you sure you want to continue connecting (yes/no)? yes

Master: Warning: Permanently added 'master,192.168.222.143' (ECDSA) to the list of known hosts.

Master: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-Master.out

Master: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Master.out

Slave2: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Slave2.out

Slave1: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Slave1.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-Master.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-Master.out

Slave1: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Slave1.out

Master: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Master.out

Slave2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Slave2.out

 

root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# jps

2912 DataNode

3182 SecondaryNameNode

3557 NodeManager

3855 Jps

3342 ResourceManager

2699 NameNode
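
To confirm the slave-side daemons as well (an extra check, not in the original; assumes the JDK path above):

root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ssh Slave1 /usr/lib/java/jdk1.7/bin/jps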

 

root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# hadoop dfsadmin -report

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

Configured Capacity: 56254304256 (52.39 GB)

Present Capacity: 48346591232 (45.03 GB)

DFS Remaining: 48346517504 (45.03 GB)

DFS Used: 73728 (72 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

 

-------------------------------------------------

Live datanodes (3):

 

Name: 192.168.222.143:50010 (Master)

Hostname: Master

Decommission Status : Normal

Configured Capacity: 18751434752 (17.46 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 2651889664 (2.47 GB)

DFS Remaining: 16099520512 (14.99 GB)

DFS Used%: 0.00%

DFS Remaining%: 85.86%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sat Jun 11 10:51:41 CST 2016

 

 

Name: 192.168.222.144:50010 (Slave1)

Hostname: Slave1

Decommission Status : Normal

Configured Capacity: 18751434752 (17.46 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 2653249536 (2.47 GB)

DFS Remaining: 16098160640 (14.99 GB)

DFS Used%: 0.00%

DFS Remaining%: 85.85%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sat Jun 11 10:51:41 CST 2016

 

 

Name: 192.168.222.145:50010 (Slave2)

Hostname: Slave2

Decommission Status : Normal

Configured Capacity: 18751434752 (17.46 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 2602573824 (2.42 GB)

DFS Remaining: 16148836352 (15.04 GB)

DFS Used%: 0.00%

DFS Remaining%: 86.12%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Sat Jun 11 10:51:42 CST 2016
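
As a quick end-to-end check before shutting down (not in the original post; README.txt is assumed to sit in the Hadoop home directory):

root@Master:/usr/local/hadoop/hadoop-2.6.0# hadoop fs -put README.txt /

root@Master:/usr/local/hadoop/hadoop-2.6.0# hadoop fs -ls /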



 

root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./stop-all.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

Stopping namenodes on [Master]

Master: stopping namenode

Master: stopping datanode

Slave1: stopping datanode

Slave2: stopping datanode

Stopping secondary namenodes [0.0.0.0]

0.0.0.0: stopping secondarynamenode

stopping yarn daemons

stopping resourcemanager

Slave1: stopping nodemanager

Master: stopping nodemanager

Slave2: stopping nodemanager

Slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9

Slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9

no proxyserver to stop

 

Next post: building a Spark cluster on top of this setup.


Anything at all can be discussed in the group.

QQ group: Big Data Exchange 208881891




