Hadoop fully distributed installation notes (partial)


Cluster nodes: 4 machines in total, 1 master and 3 slaves


Node information:

On every node the Hadoop admin user is jared and its group is hadoop; the steps for creating them are omitted here.

| hostname | internal IP (static) | external IP (DHCP) |
| master   | 192.168.255.25       | 192.168.1.10       |
| node1    | 192.168.255.26       | 192.168.1.11       |
| node2    | 192.168.255.27       | 192.168.1.12       |
| node3    | 192.168.255.28       | 192.168.1.13       |


OS: CentOS release 6.5 (Final)

Hadoop version: Apache Hadoop 0.20.2

Java version: JDK 1.7 (JDK 1.6 is the recommended choice)


Environment variable settings:

export JAVA_HOME=/usr/java/jdk1.7.0_51

export HADOOP_INSTALL=/home/jared/hadoop

export PATH=$PATH:$HADOOP_INSTALL/bin
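
A quick sanity check after setting these variables can catch a wrong path early. The function below is a minimal sketch, not part of the original notes: the name check_hadoop_env is hypothetical, and it only assumes that a JDK ships bin/java and that the Hadoop install ships bin/hadoop.

```shell
# Report whether JAVA_HOME and HADOOP_INSTALL point at usable installs.
# Returns 0 only if both bin/java and bin/hadoop are executable.
# check_hadoop_env is a hypothetical helper name, not a Hadoop command.
check_hadoop_env() {
    status=0
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME does not contain bin/java: ${JAVA_HOME:-<unset>}" >&2
        status=1
    fi
    if [ ! -x "${HADOOP_INSTALL:-}/bin/hadoop" ]; then
        echo "HADOOP_INSTALL does not contain bin/hadoop: ${HADOOP_INSTALL:-<unset>}" >&2
        status=1
    fi
    return "$status"
}
```

Calling check_hadoop_env in a fresh login shell shows whether the exports were actually picked up.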


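Set up mutual SSH trust between the nodes:

Approach:

1. On each node, generate a key pair as user jared; this produces id_rsa and id_rsa.pub.

2. Collect the contents of every node's public key (id_rsa.pub) into a single file named authorized_keys.

3. Copy that authorized_keys file to /home/jared/.ssh on every node.

4. Test SSH logins between all pairs of nodes. The first login to each host asks you to confirm with "yes"; later logins do not prompt.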
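
Step 2 can be scripted once the public keys have been gathered in one place. The sketch below assumes each node's id_rsa.pub has already been copied into a single directory under a distinct name (for example master.pub, node1.pub); that directory layout and the function name merge_pubkeys are assumptions made for illustration, not something the original notes prescribe.

```shell
# Merge every *.pub file found in a directory into one authorized_keys file.
# Usage: merge_pubkeys <dir-with-pub-files> <output-file>
# merge_pubkeys is a hypothetical helper name.
merge_pubkeys() {
    key_dir=$1
    out_file=$2
    # sort -u drops duplicate keys, so re-running the merge is harmless.
    cat "$key_dir"/*.pub | sort -u > "$out_file"
    # sshd with StrictModes enabled ignores key files that are
    # group- or world-writable, so restrict the file to its owner.
    chmod 600 "$out_file"
}
```

The resulting file is what gets copied to /home/jared/.ssh on each node in step 3.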


In this setup, authorized_keys contains the following:

vim  authorized_keys

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA5u69aO7lSceNLuQFbZkRt/V4O6nxc4QNXQRNxiar+k15c3Fe+5pFMOBpQZFxgk6w4490Z/koM6HJ7Kg2s9jnSSkyhJk7YuzYvUkmQbZG0uEyxX1uor/lTlySXuwlokSzLwTaKnEk1Wkq/s7eR3zcItrX++fAnKas9IZcziZJ+fCWBH3c2BNql2/K0j3jT+oTUaNY4mPZwnYljPZr/eldQOQcM0dDtS5Q/UWHC8USXQrBtCzOTiRlIVyFC7KEMThkkfSfvPjG7bT5O2Rg9R5gzMgIsku6d0KQMQ1GKmTbV3OYStUx7ByhM8GmDN/FFZU94lW/pjcTLeqjE61FJHJ1HQ== jared@master

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqtK8BTl3wLy4Oc9mg6Xj+APrjATDWd5vFdNzP6VKi2ZWV9YuN+8Snsj6Vay6d9w7CrVzO8lShSIG2PId9YwiwBnvFzPigF2Gk9ncsSNbLzOX+9OR3jGe1NNIdfBJQfMuD/l42X4sMwJKDjK+Wpp5bQSQ63qO4vtBJ1MbM7D8FyUTIse9GgPP7otdKWEEDMQHPKXmHoKWhhg26ht3wfICqrLzLyhQhFjpYCo32d6rhLfe844ICaqEfrLnlN4wfHb19pRXhuQpMCwdsnRarGKBkQmsRW2+LtvjDvARBdefpuAEtATWfcY/48nwibOp/xPkdYKbaNSceEbDWists5tXFw== jared@node1

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6tSOqPq76+s8EU8qj5wtcHRan9MHHWJD9HmJkhtstcDyXnoBVU0sEJdJ5sAr/2B7pq8NMAloD54KcjxhRzbj0gKO3NDBwE4Yg69hoo+uD7rNRW6yqPoONVpKEr5ngMEwjh0xh6U4whWORHfhI8sqEJX+snTNxMed3Vv7OqJVno+MplyEpTrf+vlZa9nG9Woe1QONM8s5/lJMsZHY+lgT0e1u3jR+Kedc9RMch4hfOowc1BA4IQI/bhuYAgClYkTiFZzFlX/Crio4rq22XzpFFB5+QWiUKqMCrdo9ikPhlfw3MSnnEb+/GqP8LDGuuCuzrrLj7y1184QBydFOMZPLCQ== jared@node2

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAw7Y53YhJ5L9OANGmGE6bzCT82QIVXR+AJycIGk/O5NDMiuPYKOU+HUYfWyNiY/yPKYiQbFLb4o0rTIbUOvpTLGEz8Tz7pm5Dd8OJca4DlUN/PK8Cp1osXsZa2IeGyeL/yP+RplK/zDm1xrldDjSUhFyPTOGMcAzMQkB3N+hc6s6UleV+J78YJVBeaz5foGir/gR5MBr5bpZpiYH0KVxDw65rwsBHu7KlVy5Q4lKkMUmccnKLdyVO0gnwWWenpc71UHJ0yADOzdQSpZtDjgf0dyrfiVpWDzLbj49Ie34X1kzKKXtrOeLZfSYssjf7585Qra3L+TO52Sq7yHc7oVBVLQ== jared@node3


Configure the hosts file, then copy it to the same path on every node:

vim /etc/hosts

Add the following entries:

192.168.255.25 master

192.168.255.26 node1

192.168.255.27 node2

192.168.255.28 node3
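
If the entries are added by script rather than by hand, it helps to make the step repeatable so that running it twice does not duplicate lines. The following is a rough sketch under that assumption; add_host_entry is a hypothetical helper, and writing to /etc/hosts itself requires root.

```shell
# Append "ip name" to a hosts file unless that host name is already present.
# Usage: add_host_entry <hosts-file> <ip> <name>
# add_host_entry is a hypothetical helper name.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Make sure the file exists so awk has something to read.
    touch "$hosts_file"
    # Look for the name as a whole field (after the IP) on a non-comment line.
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example (as root): add_host_entry /etc/hosts 192.168.255.25 master
```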


Hadoop configuration files (edited in the conf directory of the Hadoop install)

vim hadoop-env.sh

Add the following line:

export JAVA_HOME=/usr/java/jdk1.7.0_51


Core configuration file

vim core-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
  <final>true</final>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/jared/hadoop/tmp</value>
  <description>A base for other temporary directories</description>
</property>
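
For a scripted install, the whole file can be generated instead of edited in vim. The sketch below writes a complete core-site.xml containing exactly the two properties above wrapped in the <configuration> root element; the function name write_core_site is hypothetical, and it overwrites any existing file in the target directory.

```shell
# Write a complete core-site.xml into the given conf directory.
# Usage: write_core_site <conf-dir>
# write_core_site is a hypothetical helper; it overwrites an existing file.
write_core_site() {
    conf_dir=$1
    # Quoted heredoc delimiter: the XML is written literally, no expansion.
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/jared/hadoop/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
</configuration>
EOF
}
```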


HDFS configuration file

vim hdfs-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>dfs.name.dir</name>
  <value>/home/jared/hadoop/name</value>
  <final>true</final>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/jared/hadoop/data</value>
  <final>true</final>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <final>true</final>
</property>
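
A replication factor larger than the number of DataNodes can never be satisfied, so it can be worth comparing dfs.replication against the slaves file before starting the cluster. This is only a sketch: check_replication is a hypothetical helper, and it assumes that every non-comment line in the slaves file names one DataNode host.

```shell
# Return 0 if the replication factor can be satisfied by the listed DataNodes.
# Usage: check_replication <replication> <slaves-file>
# check_replication is a hypothetical helper name.
check_replication() {
    replication=$1
    slaves_file=$2
    # Count lines that are neither blank nor full-line comments.
    datanodes=$(grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves_file" || true)
    if [ "$replication" -gt "$datanodes" ]; then
        echo "replication=$replication exceeds $datanodes DataNode(s)" >&2
        return 1
    fi
    echo "replication=$replication is satisfiable with $datanodes DataNode(s)"
}
```

With the value 2 used here and a slaves file listing node1, node2 and node3, the check passes.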


MapReduce configuration file

vim mapred-site.xml

Add the following property inside the <configuration> element:

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.255.25:9001</value>
</property>


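Specify the master node (in Hadoop 0.20.x the masters file lists the hosts that run the SecondaryNameNode)

vim masters

Add the following line: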

master


Specify the slave nodes

vim slaves

Add the following lines:

node1

node2

node3



Copy the hadoop directory to every node; the destination is /home/jared/ in each case.

Copy commands:

[jared@master ~]$ scp -r ./hadoop/ node1:~

[jared@master ~]$ scp -r ./hadoop/ node2:~

[jared@master ~]$ scp -r ./hadoop/ node3:~
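
With more than a few nodes, the same copy can be driven from the slaves file instead of typing one scp per host. The loop below is a sketch under the assumption that passwordless SSH is already working; push_to_slaves and the DRY_RUN switch are hypothetical names introduced here, and by default the function only prints the commands it would run.

```shell
# Copy a directory to the home directory of every host listed in a slaves file.
# With DRY_RUN=1 (the default here) the scp commands are only printed.
# Usage: push_to_slaves <local-dir> <slaves-file>
# push_to_slaves and DRY_RUN are hypothetical names.
push_to_slaves() {
    src_dir=$1
    slaves_file=$2
    # Strip comments and blank lines, then visit each remaining host.
    sed 's/#.*$//; /^[[:space:]]*$/d' "$slaves_file" | while read -r host; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -r $src_dir $host:~"
        else
            # Keep scp from swallowing the rest of the host list on stdin.
            scp -r "$src_dir" "$host:~" < /dev/null
        fi
    done
}

# Example: DRY_RUN=0 push_to_slaves ./hadoop/ ./hadoop/conf/slaves
```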



Before starting Hadoop for the first time, format the file system:

[jared@master conf]$ hadoop namenode -format

14/02/20 23:36:55 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = master/192.168.255.25

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 0.20.2

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

14/02/20 23:36:55 INFO namenode.FSNamesystem: fsOwner=jared,hadoop,adm

14/02/20 23:36:55 INFO namenode.FSNamesystem: supergroup=supergroup

14/02/20 23:36:55 INFO namenode.FSNamesystem: isPermissionEnabled=true

14/02/20 23:36:56 INFO common.Storage: Image file of size 95 saved in 0 seconds.

14/02/20 23:36:56 INFO common.Storage: Storage directory /home/jared/hadoop/name has been successfully formatted.

14/02/20 23:36:56 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/192.168.255.25

************************************************************/


Start Hadoop

[jared@master ~]$ start-all.sh

starting namenode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-namenode-master.out

node1: starting datanode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-datanode-node1.out

node2: starting datanode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-datanode-node2.out

master: starting secondarynamenode, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-secondarynamenode-master.out

starting jobtracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-jobtracker-master.out

node1: starting tasktracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-tasktracker-node1.out

node2: starting tasktracker, logging to /home/jared/hadoop/bin/../logs/hadoop-jared-tasktracker-node2.out


Use jps to check that the daemons have started

[jared@master ~]$ /usr/java/jdk1.7.0_51/bin/jps

22642 SecondaryNameNode

22503 NameNode

22810 Jps

22705 JobTracker


[jared@node1 conf]$ /usr/java/jdk1.7.0_51/bin/jps

22703 Jps

22610 TaskTracker

22542 DataNode


[root@node2 conf]# /usr/java/jdk1.7.0_51/bin/jps

22609 Jps

22503 TaskTracker

22445 DataNode
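
Reading jps output by eye gets tedious across several nodes. The helper below is a small sketch that checks a jps listing for a set of expected daemon names; check_daemons is a hypothetical name, and the only assumption about jps is the "pid, then class name" layout visible in the listings above.

```shell
# Read jps output on stdin and verify that each named daemon appears.
# Usage: jps | check_daemons NameNode SecondaryNameNode JobTracker
# check_daemons is a hypothetical helper name.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second field exactly, so that NameNode does not
        # accidentally match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "missing: $daemon" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On the master the expected names are NameNode, SecondaryNameNode and JobTracker; on a slave they are DataNode and TaskTracker.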


Test Hadoop with wordcount

[jared@master ~]$ /usr/java/jdk1.7.0_51/bin/jps

22642 SecondaryNameNode

22503 NameNode

23874 Jps

22705 JobTracker

[jared@master ~]$ pwd

/home/jared

[jared@master ~]$ mkdir input

[jared@master ~]$ cd input/

[jared@master input]$ ls

[jared@master input]$ echo "hello world">test1.txt

[jared@master input]$ echo "hello hadoop">test2.txt

[jared@master input]$ ls

test1.txt  test2.txt

[jared@master input]$ cat test1.txt

hello world

[jared@master input]$ cat test2.txt  

hello hadoop


Upload the files to HDFS

[jared@master input]$ hadoop dfs -put ../input in

[jared@master input]$ hadoop dfs -ls in

Found 2 items

-rw-r--r--   2 jared supergroup         12 2014-02-21 00:14 /user/jared/in/test1.txt

-rw-r--r--   2 jared supergroup         13 2014-02-21 00:14 /user/jared/in/test2.txt


Run the wordcount example, which counts how many times each word occurs in the input files

[jared@master input]$ hadoop jar /home/jared/hadoop/hadoop-0.20.2-examples.jar wordcount in out

14/02/21 00:17:01 INFO input.FileInputFormat: Total input paths to process : 2

14/02/21 00:17:02 INFO mapred.JobClient: Running job: job_201402202338_0001

14/02/21 00:17:03 INFO mapred.JobClient:  map 0% reduce 0%

14/02/21 00:17:12 INFO mapred.JobClient:  map 50% reduce 0%

14/02/21 00:17:13 INFO mapred.JobClient:  map 100% reduce 0%

14/02/21 00:17:24 INFO mapred.JobClient:  map 100% reduce 100%

14/02/21 00:17:26 INFO mapred.JobClient: Job complete: job_201402202338_0001

14/02/21 00:17:26 INFO mapred.JobClient: Counters: 17

14/02/21 00:17:26 INFO mapred.JobClient:   Job Counters

14/02/21 00:17:26 INFO mapred.JobClient:     Launched reduce tasks=1

14/02/21 00:17:26 INFO mapred.JobClient:     Launched map tasks=2

14/02/21 00:17:26 INFO mapred.JobClient:     Data-local map tasks=2

14/02/21 00:17:26 INFO mapred.JobClient:   FileSystemCounters

14/02/21 00:17:26 INFO mapred.JobClient:     FILE_BYTES_READ=55

14/02/21 00:17:26 INFO mapred.JobClient:     HDFS_BYTES_READ=25

14/02/21 00:17:26 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180

14/02/21 00:17:26 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25

14/02/21 00:17:26 INFO mapred.JobClient:   Map-Reduce Framework

14/02/21 00:17:26 INFO mapred.JobClient:     Reduce input groups=3

14/02/21 00:17:26 INFO mapred.JobClient:     Combine output records=4

14/02/21 00:17:26 INFO mapred.JobClient:     Map input records=2

14/02/21 00:17:26 INFO mapred.JobClient:     Reduce shuffle bytes=61

14/02/21 00:17:26 INFO mapred.JobClient:     Reduce output records=3

14/02/21 00:17:26 INFO mapred.JobClient:     Spilled Records=8

14/02/21 00:17:26 INFO mapred.JobClient:     Map output bytes=41

14/02/21 00:17:26 INFO mapred.JobClient:     Combine input records=4

14/02/21 00:17:26 INFO mapred.JobClient:     Map output records=4

14/02/21 00:17:26 INFO mapred.JobClient:     Reduce input records=4



List the files in HDFS

[jared@master input]$ hadoop dfs -ls

Found 2 items

drwxr-xr-x   - jared supergroup          0 2014-02-21 00:14 /user/jared/in

drwxr-xr-x   - jared supergroup          0 2014-02-21 00:17 /user/jared/out

[jared@master input]$ hadoop dfs -ls out

Found 2 items

drwxr-xr-x   - jared supergroup          0 2014-02-21 00:17 /user/jared/out/_logs

-rw-r--r--   2 jared supergroup         25 2014-02-21 00:17 /user/jared/out/part-r-00000


View the contents of a file in HDFS

[jared@master input]$ hadoop dfs -cat out/part-r-00000

hadoop  1

hello   2

world   1

[jared@master input]$
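
For input this small, the job result can be cross-checked locally without Hadoop. The function below is a minimal sketch that counts whitespace-separated words in the local copies of the input files; local_wordcount is a hypothetical name, and it assumes the job output is tab-separated "word, count" sorted by word, as in part-r-00000 above.

```shell
# Count whitespace-separated words across the given files and print
# "word<TAB>count" sorted by word.
# Usage: local_wordcount file...
# local_wordcount is a hypothetical helper name.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w "\t" count[w] }' "$@" |
        LC_ALL=C sort
}

# Example: local_wordcount test1.txt test2.txt
```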


Inspect the block files stored under dfs.data.dir on a DataNode

[jared@node1 data]$ pwd

/home/jared/hadoop/data

[jared@node1 data]$ ls -lR

.:

total 16

drwxr-xr-x. 2 jared hadoop 4096 Feb 20 23:50 current

drwxr-xr-x. 2 jared hadoop 4096 Feb 20 22:53 detach

-rw-r--r--. 1 jared hadoop    0 Feb 20 22:53 in_use.lock

-rw-r--r--. 1 jared hadoop  157 Feb 20 22:53 storage

drwxr-xr-x. 2 jared hadoop 4096 Feb 20 23:50 tmp


./current:

total 368

-rw-r--r--. 1 jared hadoop      4 Feb 20 22:54 blk_1488417308273842703

-rw-r--r--. 1 jared hadoop     11 Feb 20 22:54 blk_1488417308273842703_1001.meta

-rw-r--r--. 1 jared hadoop  16746 Feb 20 23:49 blk_1659744422027317455

-rw-r--r--. 1 jared hadoop    139 Feb 20 23:49 blk_1659744422027317455_1033.meta

-rw-r--r--. 1 jared hadoop   8690 Feb 20 23:50 blk_-3027154220961892181

-rw-r--r--. 1 jared hadoop     75 Feb 20 23:50 blk_-3027154220961892181_1034.meta

-rw-r--r--. 1 jared hadoop 142466 Feb 20 23:38 blk_3123495904277639429

-rw-r--r--. 1 jared hadoop   1123 Feb 20 23:38 blk_3123495904277639429_1013.meta

-rw-r--r--. 1 jared hadoop     12 Feb 20 23:49 blk_5040281988852807225

-rw-r--r--. 1 jared hadoop     11 Feb 20 23:49 blk_5040281988852807225_1028.meta

-rw-r--r--. 1 jared hadoop     25 Feb 20 23:50 blk_-538339897708158192

-rw-r--r--. 1 jared hadoop     11 Feb 20 23:50 blk_-538339897708158192_1034.meta

-rw-r--r--. 1 jared hadoop     13 Feb 20 23:49 blk_6041811899305324558

-rw-r--r--. 1 jared hadoop     11 Feb 20 23:49 blk_6041811899305324558_1027.meta

-rw-r--r--. 1 jared hadoop 142466 Feb 20 23:38 blk_-7701193131489368534

-rw-r--r--. 1 jared hadoop   1123 Feb 20 23:38 blk_-7701193131489368534_1010.meta

-rw-r--r--. 1 jared hadoop   1540 Feb 20 23:50 dncp_block_verification.log.curr

-rw-r--r--. 1 jared hadoop    155 Feb 20 22:53 VERSION


./detach:

total 0


./tmp:

total 0

[jared@node1 data]$



View basic HDFS statistics

[jared@node1 data]$ hadoop dfsadmin -report

Configured Capacity: 103366975488 (96.27 GB)

Present Capacity: 94912688128 (88.39 GB)

DFS Remaining: 94911893504 (88.39 GB)

DFS Used: 794624 (776 KB)

DFS Used%: 0%

Under replicated blocks: 2

Blocks with corrupt replicas: 0

Missing blocks: 0


-------------------------------------------------

Datanodes available: 2 (2 total, 0 dead)


Name: 192.168.255.27:50010

Decommission Status : Normal

Configured Capacity: 51683487744 (48.13 GB)

DFS Used: 397312 (388 KB)

Non DFS Used: 4226998272 (3.94 GB)

DFS Remaining: 47456092160(44.2 GB)

DFS Used%: 0%

DFS Remaining%: 91.82%

Last contact: Fri Feb 21 08:32:47 EST 2014



Name: 192.168.255.26:50010

Decommission Status : Normal

Configured Capacity: 51683487744 (48.13 GB)

DFS Used: 397312 (388 KB)

Non DFS Used: 4227289088 (3.94 GB)

DFS Remaining: 47455801344(44.2 GB)

DFS Used%: 0%

DFS Remaining%: 91.82%

Last contact: Fri Feb 21 08:32:46 EST 2014



[jared@node1 data]$
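
When only a couple of figures from the report matter, they can be pulled out with awk. The sketch below extracts the number of available DataNodes and the under-replicated block count; summarize_report is a hypothetical name, and the parsing assumes the line labels appear exactly as in the 0.20.2 output shown above.

```shell
# Read `hadoop dfsadmin -report` output on stdin and print two summary lines.
# Usage: hadoop dfsadmin -report | summarize_report
# summarize_report is a hypothetical helper name.
summarize_report() {
    awk -F': *' '
        # "Datanodes available: 2 (2 total, 0 dead)" -> keep the leading 2.
        /^Datanodes available:/     { split($2, parts, " "); live = parts[1] }
        /^Under replicated blocks:/ { under = $2 }
        END {
            print "live_datanodes=" live
            print "under_replicated_blocks=" under
        }'
}
```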



Enter and leave safe mode

[jared@node1 data]$ hadoop dfsadmin -safemode enter

Safe mode is ON

[jared@node1 data]$ hadoop dfsadmin -safemode leave

Safe mode is OFF

[jared@node1 data]$


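Steps for adding a new node:

 Install Hadoop on the new node.

 Copy the relevant configuration files from the NameNode to the new node.

 Add the new node to the slaves file (the masters file only needs a change if the node should also run a SecondaryNameNode).

 Set up passwordless SSH to and from the new node.

 Start the DataNode and TaskTracker on the new node individually (hadoop-daemon.sh start datanode, then hadoop-daemon.sh start tasktracker).

 Run start-balancer.sh to rebalance the data. Purpose: when a node fails or a new node is added, blocks can end up unevenly distributed; the balancer redistributes blocks across the DataNodes.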
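
The last two steps can be wrapped into one function to run on the new node. This is only a sketch: join_cluster and the DRY_RUN switch are hypothetical names, the three commands are the ones named in the steps above, and by default the function prints them rather than running them.

```shell
# Print (DRY_RUN=1, the default here) or run the commands that bring a
# new worker online. Run on the new node as the Hadoop user.
# join_cluster and DRY_RUN are hypothetical names.
join_cluster() {
    for cmd in \
        "hadoop-daemon.sh start datanode" \
        "hadoop-daemon.sh start tasktracker" \
        "start-balancer.sh"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first command that fails.
            $cmd || return 1
        fi
    done
}

# Example: DRY_RUN=0 join_cluster
```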



Command history from adding the new node:

User: root

  54  groupadd hadoop

  55  useradd -s /bin/bash -d /home/jared -m jared -g hadoop -G adm

  56  passwd jared

  57  rpm -ivh /home/jared/jdk-7u51-linux-x64.rpm

  58  vim /etc/profile

  59  source /etc/profile

  60  exit


User: jared

   1  ssh-keygen -t rsa

   2  cd .ssh/

   3  ls

   4  vim id_rsa.pub

   5  ls

   6  vim authorized_keys

   8  ssh node3

   9  ssh node1

  10  ssh node2

  11  ssh master

  12  ls

  13  cd

  14  ls

  15  cd /usr/java/

  16  ls

  17  cd

  18  ls

  19  vim /etc/profile

  20  su - root

  21  echo $JAVA_HOME

  22  source /etc/profile

  23  echo $JAVA_HOME

  24  cat /etc/hosts

  26  ls

  27  ll

  28  ls

  29  cd hadoop/

  30  ls

  31  cd

  32  vim /etc/profile

  33  echo $HADOOP_INSTALL

  34  hadoop-daemon.sh start datanode

  35  hadoop-daemon.sh start tasktracker

  36  /usr/java/jdk1.7.0_51/jps

  37  start-balancer.sh

  38  /usr/java/jdk1.7.0_51/jps

  39  source /etc/profile

  40  /usr/java/jdk1.7.0_51/bin/jps

  42  history

