Setting Up a Hadoop Pseudo-Distributed Environment (hadoop-0.20.2, hive-0.11.0, pig-0.5.0, zookeeper-3.4.3)

I. Installing the Virtual Machine and Preliminary Preparation

1. Install CentOS 6.0 under VMware, using NAT as the network mode.

2. Add the hadoop user to sudoers

su root

Enter root's password; once it succeeds you are running as root.

Then run:

chmod u+w /etc/sudoers

vi /etc/sudoers

After the line "root ALL=(ALL:ALL) ALL", add:

hadoop ALL=(ALL:ALL) ALL

This allows the hadoop user to run any command via sudo.

Save the file, then run:

chmod u-w /etc/sudoers

This restores the permissions of the sudoers file to 440, i.e. even root normally only reads it. When sudo runs it checks that this file's permissions are 440; if they are not, sudo refuses to work, so be sure to change the permissions back to 440 after editing.

3. Using WinSCP, upload jdk-6u24-linux-i586.bin, hadoop-0.20.2.tar.gz, hive-0.11.0.tar.gz, pig-0.5.0.tar.gz and zookeeper-3.4.3.tar.gz into the directories /usr/java, /usr/hadoop, /usr, /usr and /usr respectively, as sketched below.
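
A minimal sketch of preparing those target directories on the CentOS guest before uploading (the exact layout is only the convention used in this guide):

    sudo mkdir -p /usr/java /usr/hadoop
    ls /usr/java /usr/hadoop /usr    # verify the uploaded archives are in place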

II. SSH Setup

1. The Master (NameNode | JobTracker) acts as the client. To connect to the Slave servers (DataNode | TaskTracker) with passwordless public-key authentication, a key pair (one public key, one private key) is generated on the Master and the public key is then copied to every Slave. When the Master connects to a Slave over SSH, the Slave generates a random number, encrypts it with the Master's public key and sends it to the Master. The Master decrypts it with its private key and returns the decrypted value; once the Slave confirms it is correct, the Master is allowed to connect. This is a public-key authentication handshake and requires no password to be typed. The key step is copying the Master's public key to the Slaves.

2. Generate the key pair on the Master machine (logged in as hadoop1):

ssh-keygen -t rsa -P ''

This command generates a passwordless key pair; when asked for a save path, just press Enter to accept the default. The resulting keys, id_rsa and id_rsa.pub, are stored under "/home/hadoop/.ssh" by default.

Check that "/home/hadoop/" contains a ".ssh" folder and that the two freshly generated keys are inside it.

Next, still on the Master node, append id_rsa.pub to the authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3. Before verifying, two things need to be done. First, fix the permissions of "authorized_keys" (this matters: insecure permission settings will prevent RSA key authentication from working). Second, as root, adjust "/etc/ssh/sshd_config" so that passwordless login takes effect.

1) Fix the permissions of "authorized_keys":

chmod 600 ~/.ssh/authorized_keys

Note: without this step you will still be prompted for a password during verification; tracking down this cause can easily cost half a day.

2) Adjust the SSH configuration

Log in as root and edit the following entries in the SSH configuration file "/etc/ssh/sshd_config":

vim /etc/ssh/sshd_config

RSAAuthentication yes                      # enable RSA authentication
PubkeyAuthentication yes                   # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys    # path to the public key file (the one generated above)

Remember to restart the SSH service afterwards so the changes take effect:

service sshd restart

Log out of root and verify as the ordinary user hadoop1:

ssh localhost

III. Installing the JDK (JDK 1.6)

(1) cd /usr/java
    sudo chmod 777 jdk-6u24-linux-i586.bin

This gives the current user execute permission on jdk-6u24-linux-i586.bin.

(2) sudo ./jdk-6u24-linux-i586.bin

Running jdk-6u24-linux-i586.bin displays the JDK license agreement; page through it with the space bar. At the end the installer asks whether you agree; type "yes" and the JDK is unpacked into the current directory, with the progress shown on screen.

Once unpacking finishes, a new directory named "jdk1.6.0_24" appears under /usr/java, and the JDK installation on CentOS is complete.

(3) Log in as hadoop1, go to the home directory /home/hadoop1, run "vi .bashrc", and add the lines below. This configures the user's personal environment variables and does not affect the system-wide ones.

# set java environment

export JAVA_HOME=/usr/java/jdk1.6.0_24

export JRE_HOME=/usr/java/jdk1.6.0_24/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

After adding the lines in vi, save and quit, then run the following commands so the configuration takes effect:

    chmod +x ~/.bashrc    # add execute permission
    source ~/.bashrc

After the configuration is done, run java -version; if output like the following appears, the Java environment was installed successfully.

java version "1.6.0_24"

Java(TM) SE Runtime Environment (build 1.6.0_24-b04)

Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)

IV. Installing Hadoop (log in as root, or adjust read/write permissions accordingly)

1. sudo chmod 777 hadoop-0.20.2.tar.gz

sudo tar zxvf hadoop-0.20.2.tar.gz    # unpack the Hadoop archive

2. Change the owner and group of the Hadoop tree to the user who will run it:

sudo chown -R hadoop:hadoop /usr/hadoop

This lets Hadoop store the files it produces at runtime, such as the DataNode and NameNode data.

sudo chmod -R a+w /usr/local

Make the directory writable by the current user.

3. Configure hadoop-env.sh

sudo vi hadoop-env.sh

Add:

# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_24

Save and quit.

4. Configure core-site.xml

[hadoop1@master conf]$ vi core-site.xml

 

 

   

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/hadoop-0.20.2/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

5. Configure hdfs-site.xml

[hadoop1@master conf]$ vi hdfs-site.xml

 

 

   

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

6. Configure mapred-site.xml

[hadoop@vm10110041 conf]$ sudo vi mapred-site.xml

 

 

   

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://192.168.131.131:9001</value>
    </property>
</configuration>

7. Configure the masters and slaves files

[hadoop@master conf]$ vi masters

192.168.131.131

[hadoop@master conf]$ sudo vi slaves

192.168.131.131

Note: in pseudo-distributed mode the NameNode acting as master and the DataNode acting as slave are the same server, so both configuration files contain the same IP.

8. Edit the hosts file

[hadoop@master ~]$ sudo vi /etc/hosts

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost
192.168.131.131  master
192.168.131.131  slave

9. Update PATH

Edit your own environment variables:

vi /home/hadoop/.bashrc

Before the change the line reads:

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

Insert the Hadoop bin directory in front of $PATH (not after it). After the change:

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:/usr/hadoop/hadoop-0.20.2/bin:$PATH

 

10. Start Hadoop (the NameNode must be formatted before the very first start). A command sketch follows the screenshots below.

[screenshot]

Startup:

[screenshot]

Check the running processes (none of them may be missing):

[screenshot]
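
For reference, a minimal command sequence corresponding to the screenshots, assuming the installation path used throughout this guide:

    cd /usr/hadoop/hadoop-0.20.2
    bin/hadoop namenode -format    # only before the very first start; this wipes HDFS metadata
    bin/start-all.sh               # starts NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker
    jps                            # all five daemons should be listed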

V. Configuring and Using Hadoop from Eclipse on Windows

1. Unpack hadoop-0.20.2.tar.gz to a local drive on Windows, and copy hadoop-0.20.2-eclipse-plugin.jar from the contrib/eclipse-plugin directory into Eclipse's plugins directory.

2. In Eclipse, use Open Perspective to add the Map/Reduce perspective.

 

3. Switch to the Map/Reduce perspective; a Map/Reduce Locations view appears at the bottom right. Click it to add a location.

[screenshot]

Under Advanced parameters, change the first user in hadoop.job.ugi to hadoop (if the option is missing, restart Eclipse and check again), and set mapred.system.dir to /hadoop/mapred/system.

 

4. A DFS Locations entry now appears in Eclipse; you can browse the folders in DFS and create, modify and delete files.

[screenshot]

5. In Eclipse, go to Window -> Preferences -> Hadoop Map/Reduce and select the path where hadoop-0.20.2.tar.gz was unpacked on the local Windows drive.

6. Create a new project: choose Map/Reduce Project and name it WordCount.

7. Inside the project create a package wordCount, and in it create a Mapper class WordCountMapper, a Reducer class WordCountReducer and a driver class WordCount.

8. WordCountMapper.java

package wordCount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(WritableComparable key, Writable value,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // Tokenize the line and emit (word, 1) for every token.
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }

    // Added so that Eclipse is happy: this overload matches the Mapper
    // interface's map() definition (and is the one the framework actually
    // calls) better than the version from the Hadoop example above.
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }
}

9. WordCountReducer.java

package wordCount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {

        // Sum all the counts emitted for this word.
        int sum = 0;
        while (values.hasNext()) {
            IntWritable value = values.next();
            sum += value.get();
        }

        output.collect(key, new IntWritable(sum));
    }
}

10. WordCount.java

package wordCount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(WordCount.class);

        // specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // specify input and output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // make sure the "input" directory exists in the DFS area
        // and that the "output" directory does NOT exist
        FileInputFormat.addInputPath(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        // specify a mapper
        conf.setMapperClass(WordCountMapper.class);

        // specify a reducer, also used as the combiner
        conf.setReducerClass(WordCountReducer.class);
        conf.setCombinerClass(WordCountReducer.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

11. In the DFS, create a folder named input under usr/hadoop and upload a few .txt files into it (foo.txt and sss.txt here); a command-line sketch follows. Do not create the output folder ahead of time: as the driver's comments note, the output directory must not exist before the job runs.
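
A hedged command-line alternative to using the Eclipse DFS view (paths here are relative to the HDFS home directory of the user running the job; adjust them if your input lives elsewhere):

    bin/hadoop fs -mkdir input
    bin/hadoop fs -put foo.txt sss.txt input/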

 

12. Select Run As > Run on Hadoop.

[screenshot]

13. Refresh and open the output folder to see the word-count results.
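
To inspect the result from the command line instead of Eclipse (a sketch, assuming the job wrote to the relative output directory as in the driver above):

    bin/hadoop fs -ls output
    bin/hadoop fs -cat output/part-00000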

VI. Installing and Configuring Hive (Derby metastore)

1. Create a directory /usr/hive and upload hive-0.11.0.tar.gz into it with WinSCP.

2. cd /usr/hive

sudo chmod 777 hive-0.11.0.tar.gz

sudo tar zxvf hive-0.11.0.tar.gz

Then create a symbolic link:

ln -s hive-0.11.0 hive

Next, edit the environment variables:

vi /home/hadoop1/.bashrc

Add:

export HIVE_HOME=/usr/hive/hive-0.11.0
export PATH=…:$HIVE_HOME/bin:$PATH

3. Go into the hive/conf directory and create hive-env.sh from the template:

cp hive-env.sh.template hive-env.sh

Edit hive-env.sh to point at the Hive configuration directory and the Hadoop installation:

export HIVE_CONF_DIR=/usr/hive/hive-0.11.0/conf
HADOOP_HOME=/usr/hadoop/hadoop-0.20.2

4. Make two copies of conf/hive-default.xml.template, named hive-default.xml (to keep the default configuration) and hive-site.xml (for customized settings, which override the defaults), as sketched below.
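
A sketch of the copy commands, run from the Hive configuration directory:

    cd /usr/hive/hive-0.11.0/conf
    cp hive-default.xml.template hive-default.xml
    cp hive-default.xml.template hive-site.xml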

5. sudo chown -R hadoop:hadoop /usr/hive

 

6. With Hadoop confirmed to be running, start Hive.

[screenshot]

7. Create a table (a sketch of an equivalent session follows the screenshot).

[screenshot]
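
A minimal sketch of such a session in the Hive CLI (the table name demo and its columns are only illustrative, not taken from the screenshot):

    hive> CREATE TABLE demo (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    hive> SHOW TABLES;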

8. Exit Hive:

exit;

VII. Installing and Configuring Hive (MySQL metastore)

1. Switch to the root user and install MySQL:

# yum -y install mysql-server

2. Start the MySQL service:

[root@localhost ~]# chkconfig mysqld on            # make the MySQL service start with the system
[root@localhost ~]# chkconfig --list mysqld        # confirm the autostart setting
mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off       # it is fine if runlevels 2-5 are "on"
[root@localhost ~]# /etc/rc.d/init.d/mysqld start  # start the MySQL service

Set the root password (root):

[root@localhost ~]# mysql -u root                  # log in to the MySQL server as root
mysql> set password for root@localhost=password('your root password here');

Create the hive database: create database hive;

Create the hive user, allowed to connect only from localhost, and grant it privileges: grant all on *.* to hive@localhost identified by 'hive';

3. Edit the configuration file hive-site.xml in Hive's conf directory as follows:

 

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
</property>

4. Copy the MySQL JDBC driver jar into Hive's lib directory (I used mysql-connector-java-5.0.8-bin.jar, found after downloading and unpacking http://downloads.mysql.com/archives/mysql-connector-java-5.0/mysql-connector-java-5.0.8.tar.gz), as sketched below.
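
A sketch of the copy, assuming the archive was downloaded to /usr and unpacks into mysql-connector-java-5.0.8/:

    cd /usr
    tar zxvf mysql-connector-java-5.0.8.tar.gz
    cp mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /usr/hive/hive-0.11.0/lib/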

5. Start the Hive shell and run:

show tables;

If no error is reported, Hive with a standalone metastore has been installed successfully.

Now check the metadata.

Create a table in Hive:

CREATE TABLE my(id INT, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

show tables;

select name from my;

Then log in to MySQL with the hive account created above and look at the metadata.

mysql> use hive

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;

+-----------------+
| Tables_in_hive  |
+-----------------+
| BUCKETING_COLS  |
| COLUMNS         |
| DATABASE_PARAMS |
| DBS             |
| PARTITION_KEYS  |
| SDS             |
| SD_PARAMS       |
| SEQUENCE_TABLE  |
| SERDES          |
| SERDE_PARAMS    |
| SORT_COLS       |
| TABLE_PARAMS    |
| TBLS            |
+-----------------+
13 rows in set (0.00 sec)

 

mysql> select * from TBLS;

+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
|      1 |  1319445990 |     1 |                0 | hadoop |         0 |     1 | my       | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+--------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)

The metadata for the Hive table is visible in TBLS.

6. Application example (JDBC)

Turn off the firewall:

# chkconfig --level 35 iptables off

(note that "--level" is preceded by two ASCII hyphens; reboot afterwards)

When developing a Hive program over JDBC, the Hive remote service interface must be started first:

hive --service hiveserver

1) Test data (/usr)

Contents of userinfo.txt (fields on each line are separated by a tab); a sketch of creating the file follows:

1   xiapi
2   xiaoxue
3   qingqing
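
A minimal way to create the file with real tab separators (the path matches the one used in the code below; write permission on /usr is required):

    printf '1\txiapi\n2\txiaoxue\n3\tqingqing\n' > /usr/userinfo.txt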

2) In Eclipse, create a new Java project HiveJdbcClient with a package HiveJdbcClient and a class HiveJdbcClient. Right-click the project, open Properties, and under Libraries add all the jars in /usr/hive/hive-0.11.0/lib plus hadoop-0.20.2-core.jar from /usr/hadoop/hadoop-0.20.2.

3) Program code

package HiveJdbcClient;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.log4j.Logger;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive://192.168.131.131:10000/default";
    private static String user = "";
    private static String password = "";
    private static String sql = "";
    private static ResultSet res;
    private static final Logger log = Logger.getLogger(HiveJdbcClient.class);

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
            Connection conn = DriverManager.getConnection(url, user, password);
            Statement stmt = conn.createStatement();

            // name of the table to create
            String tableName = "testHiveDriverTable";

            sql = "drop table " + tableName;
            stmt.executeQuery(sql);

            sql = "create table " + tableName + " (key int, value string) row format delimited fields terminated by '\t'";
            stmt.executeQuery(sql);

            // run "show tables"
            sql = "show tables '" + tableName + "'";
            System.out.println("Running:" + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"show tables\":");
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // run "describe table"
            sql = "describe " + tableName;
            System.out.println("Running:" + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"describe table\":");
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // run "load data into table"
            String filepath = "/usr/userinfo.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running:" + sql);
            res = stmt.executeQuery(sql);

            // run "select * query"
            sql = "select * from " + tableName;
            System.out.println("Running:" + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"select * query\":");
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }

            // run a regular Hive query
            sql = "select count(1) from " + tableName;
            System.out.println("Running:" + sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of \"regular hive query\":");
            while (res.next()) {
                System.out.println(res.getString(1));
            }

            conn.close();
            conn = null;
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            log.error(driverName + " not found!", e);
            System.exit(1);
        } catch (SQLException e) {
            e.printStackTrace();
            log.error("Connection error!", e);
            System.exit(1);
        }
    }
}

Result (in Eclipse):

Running:show tables 'testHiveDriverTable'
Result of "show tables":
testhivedrivertable
Running:describe testHiveDriverTable
Result of "describe table":
key                     int
value                   string
Running:load data local inpath '/usr/userinfo.txt' into table testHiveDriverTable
Running:select * from testHiveDriverTable
Result of "select * query":
1   xiapi
2   xiaoxue
3   qingqing
Running:select count(1) from testHiveDriverTable
Result of "regular hive query":
3

Output shown in the CentOS terminal:

[screenshot]
[screenshot]

VIII. Installing and Configuring Pig

1. cd /usr

sudo chmod 777 pig-0.5.0.tar.gz

sudo tar zxvf pig-0.5.0.tar.gz    # unpack the archive

2. vim /home/hadoop/.bashrc

Add the following lines:

export PIG_HOME=/usr/pig-0.5.0
export PIG_HADOOP_VERSION=20
export PIG_CLASSPATH=/usr/hadoop/hadoop-0.20.2/conf
export PATH=…:$PIG_HOME/bin:$PATH

3. source /home/hadoop/.bashrc

4. Run pig; if you see output like the screenshot below, the configuration succeeded.

[screenshot]
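
If the MapReduce-mode prompt does not come up because the cluster connection is not ready, Pig can also be sanity-checked in local mode (a sketch; this bypasses HDFS and MapReduce entirely):

    pig -x local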

IX. Installing and Configuring ZooKeeper

1. cd /usr

sudo chmod 777 zookeeper-3.4.3.tar.gz

sudo tar zxvf zookeeper-3.4.3.tar.gz    # unpack the archive

chown -R hadoop:hadoop zookeeper-3.4.3

2. vim /home/hadoop/.bashrc

Add the following lines:

export ZOOKEEPER_HOME=/usr/zookeeper-3.4.3
export CLASSPATH=…:$ZOOKEEPER_HOME/lib
export PATH=…:$ZOOKEEPER_HOME/bin:$PATH

source /home/hadoop/.bashrc

3. cd /usr/zookeeper-3.4.3/conf

Rename zoo_sample.cfg to zoo.cfg and make its contents look like the following (a command sketch follows the listing):

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/zookeeper-3.4.3/data
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
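
A sketch of the rename and of creating the data directory referenced by dataDir (the mkdir is an extra precaution, not a step from the original text):

    cd /usr/zookeeper-3.4.3/conf
    cp zoo_sample.cfg zoo.cfg
    mkdir -p /usr/zookeeper-3.4.3/data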

4. cd /usr/zookeeper-3.4.3

bin/zkServer.sh start    # start ZooKeeper
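
To confirm the server is up, the standard scripts shipped with ZooKeeper 3.4.x can be used, for example:

    bin/zkServer.sh status
    bin/zkCli.sh -server 127.0.0.1:2181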

X. Common Problems

 

1.

[screenshot]

2.

Problem:

In Eclipse, the DFS connection succeeds, but the tree cannot be expanded to show its contents.

Analysis and solution:

This usually means the NameNode or the DataNode either did not start or started and then stopped on its own. Check the details with the jps command on the virtual machine. If the NameNode is not running, run stop-all.sh and then start-all.sh again and re-check with jps; if it still stops after a while, reformat the NameNode. If the DataNode is not running, go into /usr/hadoop/hadoop-0.20.2/tmp, delete the data directory, and restart Hadoop. A command sketch is given below.
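A hedged sketch of that recovery procedure (the tmp/dfs/data path assumes the default directory layout under hadoop.tmp.dir; reformatting the NameNode erases HDFS metadata):

    cd /usr/hadoop/hadoop-0.20.2
    bin/stop-all.sh
    rm -rf tmp/dfs/data           # only if the DataNode keeps failing
    bin/hadoop namenode -format   # only if the NameNode keeps dying
    bin/start-all.sh
    jps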

 

 

3. Various problems may come up during installation. The steps above have been verified: if you follow them exactly you should not hit errors other than problems 1 and 2. If something does go wrong, check whether you skipped a step or a setting.
