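Taobao's Mdrill is billed as very powerful, and so is the hardware it runs on. For someone who is just learning it, though, a virtual machine is the most economical option. This article only explains how to install and debug Mdrill on a clean virtual machine (CentOS 6.4). It does not explain how Mdrill works; for that, see the official documentation, INSTALL.docx.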
CentOS 6.4 final x86_64, Linux user name: mdrill
JDK 1.6
hadoop cdh3u3
zookeeper-3.4.5
zeromq-2.1.7
jzmq 2.1.0
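Mdrill installation package, version 0.20.8.3 (standard edition)
eclipse-jee-kepler-SR1-win32
Internet access on the machine
VMware 9.0
Note: the versions above must match exactly. Otherwise, installation or runtime errors may occur.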
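For the installation files, contact [email protected]. The files are:
Dependency packages required for the installation
Eclipse project with the source code for data import and JDBC calls
eclipse-jee-kepler-SR1-win32-with-hadoop-zookeeper-plugin.rar
jdk_1.6.0_31.tar.gz
Mdrill单机版安装说明.docx (Mdrill standalone installation guide)
A VMware 9.0 virtual machine with Mdrill already installed
淘云盘打包下载_20140218_14_49.zip (bundled download from the Taobao cloud drive)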
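1. Install CentOS 6.4 final
Give the virtual machine at least 1 GB of memory. With VMware 9.0, you only need to enter the user name mdrill at the very beginning, click Next, and wait for the installation to finish. After installation, the screen should look like this: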
System->Administration->firewall
As root, edit /etc/hosts:
vi /etc/hosts
Change it to the following:
127.0.0.1 localhost
191.168.3.149 mdrill
Replace 191.168.3.149 with this machine's IP address.
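If you want to script this edit instead of typing it in vi, a minimal sketch could look like the one below. The helper names `write_hosts` and `resolves_to` are hypothetical, and the IP is whatever you pass in. The function overwrites the file you give it, so try it on a scratch file first. Point it at /etc/hosts (as root) only once you are happy with the result.

```shell
#!/bin/sh
# write_hosts FILE IP HOSTNAME
# Hypothetical helper: overwrite FILE with the two entries described above,
# a loopback line for localhost and one line mapping HOSTNAME to the LAN IP.
write_hosts() {
    hosts_file=$1
    host_ip=$2
    host_name=$3
    {
        printf '127.0.0.1 localhost\n'
        printf '%s %s\n' "$host_ip" "$host_name"
    } > "$hosts_file"
}

# resolves_to FILE HOSTNAME
# Print the first address that FILE maps HOSTNAME to (comment lines skipped).
resolves_to() {
    awk -v h="$2" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }' "$1"
}
```

For example, after `write_hosts /etc/hosts 191.168.3.149 mdrill`, running `resolves_to /etc/hosts mdrill` should print the address you used.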
vi /etc/sysconfig/network
Change the network file to the following:
NETWORKING=yes
HOSTNAME=mdrill
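The same change to /etc/sysconfig/network can be sketched as a small function. `set_network_hostname` is a hypothetical name. Unlike a full overwrite, it keeps any other settings already in the file (a GATEWAY line, for example) and only replaces the NETWORKING and HOSTNAME entries.

```shell
#!/bin/sh
# set_network_hostname FILE HOSTNAME
# Hypothetical helper: write NETWORKING=yes and HOSTNAME=<name> at the top
# of FILE, dropping any previous NETWORKING/HOSTNAME lines and keeping the rest.
set_network_hostname() {
    net_file=$1
    net_name=$2
    touch "$net_file"
    tmp_file="$net_file.tmp.$$"
    # grep exits 1 when every line is filtered out; that is not an error here.
    grep -v -e '^NETWORKING=' -e '^HOSTNAME=' "$net_file" > "$tmp_file" || true
    {
        printf 'NETWORKING=yes\n'
        printf 'HOSTNAME=%s\n' "$net_name"
        cat "$tmp_file"
    } > "$net_file"
    rm -f "$tmp_file"
}
```

Run as root, `set_network_hostname /etc/sysconfig/network mdrill` leaves the file in the state shown above.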
reboot
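Copy the downloaded JDK package to /home/mdrill on the virtual machine and extract it:
[mdrill@mdrill ~]$ tar -xvf jdk_1.6.0_31.tar.gz
When the command finishes, the JDK directory appears.
Configure the JAVA_HOME environment variable: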
If you are not comfortable with vi, gedit works too.
[mdrill@mdrill ~]$ vi .bashrc
Add the following:
JAVA_HOME=/home/mdrill/jdk_1.6.0_31
export JAVA_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH
Apply the configuration:
[mdrill@mdrill ~]$ source .bashrc
Check the Java version:
[mdrill@mdrill ~]$ java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
If you see java version "1.6.0_31", the JDK is installed correctly.
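To make this check scriptable, a sketch like the one below compares the first line of the `java -version` output with the expected version. `java_version_is` is a hypothetical helper. Note that `java -version` writes to stderr, which is why the usage below redirects with `2>&1`.

```shell
#!/bin/sh
# java_version_is EXPECTED OUTPUT
# Hypothetical helper: succeed if the first line of OUTPUT (the text printed
# by `java -version`) reports exactly version EXPECTED.
java_version_is() {
    expected=$1
    first_line=$(printf '%s\n' "$2" | head -n 1)
    [ "$first_line" = "java version \"$expected\"" ]
}
```

Usage: `java_version_is 1.6.0_31 "$(java -version 2>&1)" && echo "JDK OK"`. The check can fail even though the JDK is unpacked. In that case, another `java` earlier on PATH may be shadowing it, because the .bashrc above appends $JAVA_HOME/bin at the end. `command -v java` shows which binary actually runs.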
First, copy hadoop-0.20.2-cdh3u3.tar.gz to /home/mdrill.
Extract the Hadoop package:
[mdrill@mdrill ~]$ tar -xvf hadoop-0.20.2-cdh3u3.tar.gz
[mdrill@mdrill ~]$ vi .bashrc
Add the following:
HADOOP_HOME=/home/mdrill/hadoop-0.20.2-cdh3u3
PATH=$PATH:$HADOOP_HOME/bin
export PATH
[mdrill@mdrill ~]$ source .bashrc
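Editing .bashrc by hand is easy to get wrong when a step is repeated, because running the same append twice leaves duplicate lines. Below is a minimal sketch of an idempotent append. The helper name `append_once` is hypothetical.

```shell
#!/bin/sh
# append_once FILE LINE
# Hypothetical helper: append LINE to FILE unless an identical line is
# already present, so re-running a setup step does not stack duplicates.
append_once() {
    touch "$1"
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For example: `append_once ~/.bashrc 'HADOOP_HOME=/home/mdrill/hadoop-0.20.2-cdh3u3'`, then `append_once ~/.bashrc 'PATH=$PATH:$HADOOP_HOME/bin'`. The single quotes keep `$PATH` and `$HADOOP_HOME` unexpanded in the file, so they are expanded each time .bashrc is read, just like the lines typed in vi.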
Then verify:
[mdrill@mdrill ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
…..
If you see "Usage: hadoop [--config confdir] COMMAND", the configuration is working.
Next, set up passwordless SSH. Run ssh-keygen and press Enter at every prompt. When it finishes, the output looks like this:
[mdrill@mdrill ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mdrill/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/mdrill/.ssh/id_rsa.
Your public key has been saved in /home/mdrill/.ssh/id_rsa.pub.
The key fingerprint is:
6e:0c:6f:94:09:7c:f3:44:4a:57:ce:ba:cb:aa:7d:e5 mdrill@mdrill
The key's randomart image is:
+--[ RSA 2048]----+
| . o.. |
| . . + o |
| o + . o |
| o * . |
| . S o |
| * .. |
| * .o |
| + ...E |
| ..ooo |
+-----------------+
[mdrill@mdrill ~]$
Go into the .ssh directory:
[mdrill@mdrill ~]$ cd .ssh
[mdrill@mdrill .ssh]$ ls
id_rsa id_rsa.pub
[mdrill@mdrill .ssh]$
Create the authorized_keys file:
[mdrill@mdrill .ssh]$ cat id_rsa.pub >> authorized_keys
[mdrill@mdrill .ssh]$ ls
authorized_keys id_rsa id_rsa.pub
Restrict the permissions of authorized_keys so that only the owner can read and write it:
[mdrill@mdrill .ssh]$ chmod 600 authorized_keys
[mdrill@mdrill .ssh]$ ls
authorized_keys id_rsa id_rsa.pub
Test SSH:
[mdrill@mdrill .ssh]$ ssh mdrill
The authenticity of host 'mdrill (::1)' can't be established.
RSA key fingerprint is 15:8f:e0:b5:37:43:60:0b:b1:fb:32:0a:a4:3b:6c:8d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mdrill' (RSA) to the list of known hosts.
At the prompt "Are you sure you want to continue connecting (yes/no)?", type yes and press Enter.
If ssh mdrill logs you in without asking for a password, the setup is complete. Type exit to return to the original shell.
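The key setup above can also be done without prompts. `ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa` creates the key pair with an empty passphrase. The hypothetical helper below then authorizes the public key. It appends the key only once and applies permissions that pass sshd's default StrictModes check: 700 on the .ssh directory and 600 on authorized_keys.

```shell
#!/bin/sh
# authorize_key PUBKEY SSH_DIR
# Hypothetical helper: add the key in PUBKEY to SSH_DIR/authorized_keys
# (once) and set owner-only permissions on the directory and the file.
authorize_key() {
    pub_key=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Skip the append if the exact key line is already authorized.
    grep -qxF -f "$pub_key" "$ssh_dir/authorized_keys" ||
        cat "$pub_key" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Usage: `authorize_key ~/.ssh/id_rsa.pub ~/.ssh`, then test with `ssh mdrill` as before.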
Edit three files under /home/mdrill/hadoop-0.20.2-cdh3u3/conf: core-site.xml, mapred-site.xml and hdfs-site.xml.
Change core-site.xml to:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/mdrill/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- file system properties -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://mdrill:9000</value>
  </property>
</configuration>
Change mapred-site.xml to:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://mdrill:9001</value>
  </property>
</configuration>
Change hdfs-site.xml to:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
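Note: do not leave spaces around any property name or value. For example, write <value>1</value>, not <value> 1 </value>.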
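If you would rather generate the three files than edit them, a sketch along these lines writes the same properties and values shown above. `write_hadoop_conf` and `write_site` are hypothetical helpers, and their output omits the comments and the description element of the files above. Because printf places each value directly between its tags, the generated files also satisfy the no-spaces rule.

```shell
#!/bin/sh
# write_site FILE NAME VALUE [NAME VALUE ...]
# Hypothetical helper: write a Hadoop configuration file containing one
# <property> per NAME/VALUE pair, with no whitespace inside the tags.
write_site() {
    site_file=$1
    shift
    {
        printf '<?xml version="1.0"?>\n'
        printf '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>\n'
        printf '<configuration>\n'
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
            shift 2
        done
        printf '</configuration>\n'
    } > "$site_file"
}

# write_hadoop_conf CONF_DIR HOST TMP_DIR
# Hypothetical helper: write the three single-node files used in this guide.
write_hadoop_conf() {
    conf_dir=$1
    host=$2
    tmp_dir=$3
    mkdir -p "$conf_dir"
    write_site "$conf_dir/core-site.xml" \
        hadoop.tmp.dir "$tmp_dir" \
        fs.default.name "hdfs://$host:9000"
    write_site "$conf_dir/mapred-site.xml" \
        mapred.job.tracker "http://$host:9001"
    write_site "$conf_dir/hdfs-site.xml" \
        dfs.replication 1
}
```

For example: `write_hadoop_conf /home/mdrill/hadoop-0.20.2-cdh3u3/conf mdrill /home/mdrill/tmp`. Back up the original files first. If libxml2's xmllint is installed, `xmllint --noout core-site.xml` is a quick way to confirm that a hand-edited file is still well-formed.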
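Before starting Hadoop for the first time, format HDFS once with hadoop namenode -format. The NameNode will not start on an unformatted name directory, and formatting again later erases the HDFS metadata. Then start Hadoop with start-all.sh:
[mdrill@mdrill conf]$ start-all.sh
starting namenode, logging to ……
After startup, list the Hadoop processes with jps:
[mdrill@mdrill conf]$ jps
14231 JobTracker
14155 SecondaryNameNode
13931 NameNode
14037 DataNode
14366 Jps
14349 TaskTracker
If all five Hadoop processes above (NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker) are listed, Hadoop has started successfully.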
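The jps output can also be checked automatically, as in the small sketch below. `missing_hadoop_daemons` is a hypothetical helper. It prints the name of each of the five daemons that is absent and prints nothing when all are running. It compares the class-name column exactly, so a running SecondaryNameNode is not mistaken for NameNode.

```shell
#!/bin/sh
# missing_hadoop_daemons JPS_OUTPUT
# Hypothetical helper: print each Hadoop 0.20 daemon that does not appear
# as a process name (second column) in the given jps output.
missing_hadoop_daemons() {
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        printf '%s\n' "$1" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}
```

Usage: `missing_hadoop_daemons "$(jps)"`. If a name is printed, the matching log under $HADOOP_HOME/logs is the first place to look.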
Open http://mdrill:50070 in a browser, and you should see the following page:
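Open Eclipse and click the icon in the top-right corner. The Open Perspective dialog appears, as shown below:
Select Map/Reduce, and the Hadoop location configuration view appears.
Create a new location as follows:
Replace the IP with your own, and do not mix up the two ports. Map/Reduce Master uses 9001 and DFS Master uses 9000, matching mapred-site.xml and core-site.xml above.
After the location is created, the HDFS directory tree appears, as shown below: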
Install libtool as root with yum -y install libtool:
[mdrill@mdrill zeromq-2.1.7]$ su
Password:
[root@mdrill zeromq-2.1.7]# yum -y install libtool
su asks for the root password. In this setup, it is the same as the mdrill user's password.
yum -y install gcc-c++
yum -y install uuid-devel
yum -y install libuuid-devel
Copy zeromq-2.1.7.tar.gz to /home/mdrill, then build and install it (still as root):
cd /home/mdrill
tar -xvf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./autogen.sh
./configure
make
make install
Copy jzmq-master.zip to /home/mdrill. Build it as mdrill, then install it as root:
su mdrill
cd /home/mdrill
unzip jzmq-master.zip
cd jzmq-master
./autogen.sh
./configure
make
su
make install
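Both builds use ./configure's default prefix, /usr/local, so the shared libraries end up in /usr/local/lib. That is why LD_LIBRARY_PATH is set to that directory next. Below is a minimal sketch for confirming the install. `have_shared_lib` is a hypothetical helper.

```shell
#!/bin/sh
# have_shared_lib LIBDIR NAME
# Hypothetical helper: succeed if LIBDIR contains libNAME.so or a
# versioned libNAME.so.* file.
have_shared_lib() {
    for lib_file in "$1"/lib"$2".so "$1"/lib"$2".so.*; do
        # An unmatched glob stays literal, so -e filters it out.
        [ -e "$lib_file" ] && return 0
    done
    return 1
}
```

Usage: `have_shared_lib /usr/local/lib zmq && have_shared_lib /usr/local/lib jzmq && echo "libraries installed"`.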
Back as mdrill, edit .bashrc:
[mdrill@mdrill perf]$ vi /home/mdrill/.bashrc
Add one line:
export LD_LIBRARY_PATH=/usr/local/lib
source /home/mdrill/.bashrc
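The line above replaces any LD_LIBRARY_PATH value that was already set. If something else on the machine sets it too, a sketch like the one below prepends /usr/local/lib instead, and skips it when it is already present. `path_with` is a hypothetical helper.

```shell
#!/bin/sh
# path_with DIR CURRENT
# Hypothetical helper: print CURRENT with DIR in front, unless DIR is
# already one of its colon-separated entries.
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *) printf '%s\n' "$1${2:+:$2}" ;;
    esac
}
```

In .bashrc, define the function and then use `export LD_LIBRARY_PATH="$(path_with /usr/local/lib "$LD_LIBRARY_PATH")"`. The plain one-liner `export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` also keeps existing entries, but it does not skip duplicates.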
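Test the installation with the jzmq latency benchmark:
cd /home/mdrill/jzmq-master/perf
sh local_lat.sh tcp://127.0.0.1:5000 1 100
In a second terminal, run:
cd /home/mdrill/jzmq-master/perf
sh remote_lat.sh tcp://127.0.0.1:5000 1 100
message size: 1 [B]
roundtrip count: 100
mean latency: 275.0 [us]
If you see output beginning with message size: 1 [B]…, ZeroMQ and jzmq are configured correctly.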
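Instead of opening a second console, you can run the two scripts from one terminal. The sketch below uses a hypothetical helper, `run_latency_test`. It starts local_lat.sh in the background first, as in the manual steps, and pauses briefly. It then runs remote_lat.sh in the foreground and waits for the background side to exit.

```shell
#!/bin/sh
# run_latency_test PERF_DIR ADDRESS SIZE COUNT
# Hypothetical helper: run the jzmq latency pair from a single terminal.
run_latency_test() {
    # Start the local side first, in the background.
    (cd "$1" && sh local_lat.sh "$2" "$3" "$4") &
    local_pid=$!
    sleep 1   # arbitrary pause before starting the remote side
    (cd "$1" && sh remote_lat.sh "$2" "$3" "$4")
    wait "$local_pid"
}
```

Usage: `run_latency_test /home/mdrill/jzmq-master/perf tcp://127.0.0.1:5000 1 100`. The one-second pause is a safety margin chosen for this sketch, not a value taken from jzmq.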
Copy zookeeper-3.4.5.tar.gz to /home/mdrill and extract it:
cd /home/mdrill
tar -xvf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
Configure the environment variables:
vi /home/mdrill/.bashrc
Add the following to .bashrc:
ZOOKEEPER_HOME=/home/mdrill/zookeeper-3.4.5
export ZOOKEEPER_HOME
PATH=$PATH:$ZOOKEEPER_HOME/bin
export PATH
Apply the configuration:
source /home/mdrill/.bashrc
Verify that zkServer.sh is on the PATH. The errors about the missing zoo.cfg are expected at this point, because that file is created in the next step:
[mdrill@mdrill ~]$ zkServer.sh
JMX enabled by default
Using config: /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg
grep: /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg: No such file or directory
mkdir: cannot create directory `': No such file or directory
Usage: /home/mdrill/zookeeper-3.4.5/bin/zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}
[mdrill@mdrill ~]$
cd /home/mdrill/zookeeper-3.4.5/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
Change line 12 of zoo.cfg to:
dataDir=/home/mdrill/zookeeperdata
Append at the end:
server.1=mdrill:2888:3888
The finished zoo.cfg looks like this:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/mdrill/zookeeperdata
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=mdrill:2888:3888
Create the ZooKeeper data directory and the myid file:
mkdir /home/mdrill/zookeeperdata
vi /home/mdrill/zookeeperdata/myid
Write 1 into myid. It must match the number in server.1 in zoo.cfg.
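The zoo.cfg and myid steps can be scripted as well. Below is a minimal sketch using a hypothetical helper, `configure_zk`. It starts from a fresh copy of zoo_sample.cfg every time, so re-running it does not append the server line twice. It writes the id to myid with printf instead of vi.

```shell
#!/bin/sh
# configure_zk CONF_DIR DATA_DIR SERVER_LINE MYID
# Hypothetical helper: build zoo.cfg from zoo_sample.cfg, point dataDir at
# DATA_DIR, append SERVER_LINE, and write MYID into DATA_DIR/myid.
configure_zk() {
    zk_conf=$1/zoo.cfg
    # Fresh copy each run, so the result is the same however often it runs.
    cp "$1/zoo_sample.cfg" "$zk_conf"
    # Replace the sample's dataDir (under /tmp) with DATA_DIR.
    sed "s|^dataDir=.*|dataDir=$2|" "$zk_conf" > "$zk_conf.tmp"
    mv "$zk_conf.tmp" "$zk_conf"
    printf '%s\n' "$3" >> "$zk_conf"
    # myid holds the N of the server.N entry for this machine.
    mkdir -p "$2"
    printf '%s\n' "$4" > "$2/myid"
}
```

Usage: `configure_zk /home/mdrill/zookeeper-3.4.5/conf /home/mdrill/zookeeperdata 'server.1=mdrill:2888:3888' 1`. Note that it overwrites any manual edits already made to zoo.cfg.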
Start ZooKeeper with zkServer.sh start:
[mdrill@mdrill ~]$ zkServer.sh start
JMX enabled by default
Using config: /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[mdrill@mdrill ~]$ jps
35079 Jps
35049 QuorumPeerMain
If jps shows a QuorumPeerMain process, ZooKeeper has started successfully.
Check the server status:
[mdrill@mdrill ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: standalone
[mdrill@mdrill ~]$
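For scripts, the mode can be extracted from this output. `zk_mode` below is a hypothetical helper. It prints the value after "Mode:", and prints nothing if that line is missing (for instance, when the server is not running).

```shell
#!/bin/sh
# zk_mode STATUS_OUTPUT
# Hypothetical helper: print the value of the "Mode:" line in the output
# of `zkServer.sh status`; prints nothing when there is no such line.
zk_mode() {
    printf '%s\n' "$1" | sed -n 's/^Mode: *//p'
}
```

Usage: `zk_mode "$(zkServer.sh status 2>&1)"` should print standalone for this single-node setup.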
Connect with the command-line client, zkCli.sh -server mdrill:2181:
[mdrill@mdrill conf]$ zkCli.sh -server mdrill:2181
Connecting to mdrill:2181
2014-03-13 03:03:22,880 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
……
2014-03-13 03:03:23,117 [myid:] - INFO [main-SendThread(mdrill:2181):ClientCnxn$SendThread@849] - Socket connection established to mdrill/0:0:0:0:0:0:0:1:2181, initiating session
[zk: mdrill:2181(CONNECTING) 0] 2014-03-13 03:03:23,366 [myid:] - INFO [main-SendThread(mdrill:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server mdrill/0:0:0:0:0:0:0:1:2181, sessionid = 0x144badfe5a60000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
To be continued.