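[Hypertable can be installed in several ways, as listed below:
1. Standalone: installed on a single machine, using the local filesystem
2. Hadoop: distributed installation on top of Hadoop (HDFS)
3. MapR: distributed installation on top of MapR
4. ThriftBroker: ThriftBroker installed on an application server]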
(http://hypertable.com/documentation/installation/mapr/)
MapR is a scalable filesystem written in C++ that is modeled after the Google File System and is 100% API compatible with Apache Hadoop. Hypertable ships with a MapR filesystem broker that communicates with the MapR servers directly and allows Hypertable to run efficiently on top of the MapR filesystem for maximum performance. This document describes how to get Hypertable up and running on top of the MapR filesystem.
Prerequisites
Step 1 - Install MapR
Step 2 - Install Capistrano
Step 3 - Edit Capistrano Capfile
Step 4 - Install Hypertable Binaries
Step 5 - FHS-ize Installation
Step 6 - Create and Distribute hypertable.cfg
Step 7 - Set "current" link
Step 8 - Synchronize Clocks
Step 9 - Start Hypertable
Step 10 - Verify Installation
Step 11 - Stop Hypertable
What Next?
Before you get started with the installation, some general system requirements need to be satisfied. These requirements are described in the following list.
The first step in getting Hypertable up and running on top of MapR is to install MapR and configure it to work with Hypertable. The Hypertable 64-bit packages contain a MapR broker that allows Hypertable to run on top of the MapR filesystem (MapR is currently unavailable for 32-bit platforms). Follow the MapR Installation Instructions (http://www.mapr.com/doc/display/MapR/Installation+Guide) and be sure to co-locate the MapR mfs processes with the Hypertable RangeServer machines.
To configure MapR to work properly with Hypertable, you will need to tell the warden process to launch the mfs processes with a smaller heap size than the default. To do this, add the following line to the warden configuration file (/opt/mapr/conf/warden.conf):
service.command.mfs.heapsize.percent=20
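The edit above can also be scripted so that it is safe to repeat on every node. The following is only a sketch: the function name set_mfs_heap_percent is made up for illustration, and it assumes the file uses plain key=value lines ending in a newline. It updates the property if a line for it already exists and appends it otherwise. Back up warden.conf before trying it.

```shell
#!/bin/sh
# Hypothetical helper (not part of MapR or Hypertable): set or update the
# mfs heap percentage in a warden.conf-style key=value file.
# Usage: set_mfs_heap_percent FILE PERCENT
set_mfs_heap_percent() {
    conf_file=$1
    percent=$2
    key='service.command.mfs.heapsize.percent'
    # Same key with the dots escaped, for use in regular expressions.
    key_re='service\.command\.mfs\.heapsize\.percent'
    if grep -q "^${key_re}=" "$conf_file"; then
        # Property already present: rewrite its value via a temp file.
        sed "s/^${key_re}=.*/${key}=${percent}/" "$conf_file" > "${conf_file}.tmp" &&
            mv "${conf_file}.tmp" "$conf_file"
    else
        # Property missing: append it.
        printf '%s=%s\n' "$key" "$percent" >> "$conf_file"
    fi
}

# Example, using the path and value given in the text above:
#   set_mfs_heap_percent /opt/mapr/conf/warden.conf 20
```

Running the helper twice leaves a single line for the property, which matters if the warden configuration is pushed to the nodes more than once.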
Then launch the MapR filesystem. Once it is up and running, create a volume called "hypertable" and mount it at /hypertable with the following command:
$ maprcli volume create -name hypertable -path /hypertable
The Hypertable distribution comes with a number of scripts to start and stop the various servers that make up a Hypertable cluster. You can use your own cluster management tool to launch these scripts and deploy new binaries. However, if you're not already using a cluster management tool, we recommend Capistrano. The distribution comes with a Capistrano config file (conf/Capfile.cluster) that makes deploying and launching Hypertable a breeze.
Capistrano is a simple tool for automating the remote execution of tasks. It uses ssh to do the remote execution. To ease deployment, you should have password-less ssh access (i.e. public key) to all of the machines in your cluster. Installing Capistrano is pretty simple. On most systems you just need to execute the following commands (Internet access required):
$ sudo gem update
$ sudo gem install capistrano
After this installation step you should have the cap program in your path:
$ cap --version
Capistrano v2.9.0
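The password-less ssh access recommended above can be set up in several ways. The following is a minimal sketch and not part of the Hypertable distribution: the host list mirrors the example cluster used in this guide, and the helper names run and distribute_key are made up for illustration. It relies only on the standard OpenSSH ssh-copy-id tool.

```shell
#!/bin/sh
# Assumed host names, matching the example cluster in this guide.
HOSTS="master hyperspace001 hyperspace002 hyperspace003 \
slave001 slave002 slave003 slave004 slave005 slave006 slave007 slave008"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install the given public key on every host in HOSTS.
# Usage: distribute_key PUBLIC_KEY_FILE
distribute_key() {
    pubkey=$1
    for host in $HOSTS; do
        run ssh-copy-id -i "$pubkey" "$host"
    done
}

# Typical use on the admin machine (you are prompted for each password once):
#   ssh-keygen -t rsa
#   distribute_key "$HOME/.ssh/id_rsa.pub"
```

Setting DRY_RUN=1 first prints the ssh-copy-id commands without running them, which is a cheap way to check the host list before touching the cluster.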
Once you have Capistrano installed, copy the conf/Capfile.cluster that comes with the Hypertable distribution to your working directory (e.g. home directory) on admin1, rename it to Capfile, and tailor it for your environment. The cap command reads the file Capfile in the current working directory by default. Some variables set at the top of the file need to be modified for your particular environment. The following shows the variables at the top of the Capfile that need modification:
set :source_machine, "admin1"
set :install_dir, "/opt/hypertable"
set :hypertable_version, "0.9.5.5"
set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"
set :default_dfs, "mapr"
set :default_config, "/root/hypertable.cfg"
Here's a brief description of each variable:
Table 2. Hypertable Capistrano Variables

| Variable | Description |
|---|---|
| source_machine | Machine from which you will build the binaries, distribute them to the other machines, and launch the service. |
| install_dir | Directory on source_machine where you have installed Hypertable. It is also the directory on the remote machines where the installation will get rsync'ed to. |
| hypertable_version | Version of Hypertable you are deploying. |
| default_pkg | Path to the binary package file (.dmg, .rpm, or .tar.bz2) on the source machine. |
| default_dfs | Distributed file system you are running Hypertable on top of. For MapR use "mapr". |
| default_config | Location of the default Hypertable configuration file that you plan to use. |
In addition to the above variables, you also need to define three roles: one for the machine that will run the master processes, one for the machines that will run the Hyperspace replicas, and one for the machines that will run the RangeServers. Edit the following lines:
role :source, "admin1"
role :master, "master"
role :hyperspace, "hyperspace001", "hyperspace002", "hyperspace003"
role :slave, "slave001", "slave002", "slave003", "slave004", "slave005", "slave006", "slave007", "slave008"
role :localhost, "admin1"
role :thriftbroker
role :spare
The following table describes each role.
Table 3. Hypertable Capistrano Roles

| Role | Description |
|---|---|
| source | The machine from which you will be distributing the binaries (admin1 in this example). |
| master | The machine that will run the Hypertable master process as well as a DFS broker. Ideally this machine is high quality and somewhat lightly loaded (e.g. not running a RangeServer). Typically you would have a high quality machine running the Hypertable master, a Hyperspace replica, and the HDFS NameNode. |
| hyperspace | The machines that will run Hyperspace replicas. There should be at least one machine defined for this role. The machines that take on this role should be somewhat lightly loaded (e.g. not running a RangeServer). |
| slave | The machines that will run RangeServers. Hypertable is designed to run on a filesystem like HDFS. In fact, the system works best from a performance standpoint when the RangeServers are run on the same machines as the HDFS DataNodes. This role will also launch a DFS broker and a ThriftBroker. |
| localhost | The name of the machine that you're administering the cluster from (admin1 in this example). |
| thriftbroker | Additional machines that will be running a ThriftBroker (e.g. web servers). NOTE: You do not have to add the slave machines to this role, since a ThriftBroker is automatically started on each slave machine to support MapReduce. |
| spare | Machines that will act as standbys. They will be kept current with the latest binaries. |
The Hypertable binaries can either be downloaded prepackaged, or you can compile them from source code. To install the prepackaged version, download the Hypertable package (.dmg, .rpm, or .tar.bz2) you want to install and put it somewhere accessible on the source machine (admin1 in this example). Modify the hypertable_version and default_pkg variables at the top of the Capfile to contain the version of Hypertable you are installing and the absolute path to the package file on the source machine, respectively. For example, if you're upgrading to version 0.9.5.5 and using the RPM package, set the variables as follows.
set :hypertable_version, "0.9.5.5"
set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"
To distribute and install the binary package on all necessary machines, issue the following command. This command will cause the package to get rsync'ed to all participating machines and installed with the appropriate package manager (rpm, dpkg, or tar) depending on the package type.
$ cap install_package
If you prefer compiling the binaries from source, you can use Capistrano to distribute the binaries with rsync. On admin1, be sure Hypertable is installed in the location specified by the install_dir variable at the top of the Capfile and that the hypertable_version variable at the top of the Capfile matches the version you are installing (/opt/hypertable and 0.9.5.5 in this example). Then distribute the binaries with the following command:
$ cap dist
See Filesystem Hierarchy Standard (http://hypertable.com/documentation/misc/filesystem_hierarchy_standard_fhs/) for an introduction to FHS. If you're running as a user other than root, first create the directories /etc/opt/hypertable and /var/opt/hypertable on all machines in the cluster and change ownership to the user account under which the binaries will be run. For example:
$ sudo cap shell
cap> mkdir /etc/opt/hypertable /var/opt/hypertable
cap> chown chris:staff /etc/opt/hypertable /var/opt/hypertable
Then FHS-ize the installation with the following command:
$ cap fhsize
The next step is to create a hypertable.cfg file that is specific to your deployment. A basic hypertable.cfg file can be found in the conf/ subdirectory of your Hypertable installation; it can be copied and modified as needed. The following table shows the minimum set of required and recommended properties that you need to modify.
Table 1. Recommended and Required Properties

| Property | Description |
|---|---|
| Hyperspace.Replica.Host | Hostname of Hyperspace replica. |
| Hypertable.RangeServer.Monitoring.DataDirectories | This property is optional, but recommended. It contains a list of directories that are the mount points of the HDFS data node storage volumes. By setting this property appropriately, the Hypertable monitoring system will be able to provide accurate disk usage information. |
You can leave all other properties at their default values. Hypertable is designed to adapt to the hardware on which it runs and to dynamically adapt to changes in workload, so no special configuration is needed beyond the basic properties listed in the above table. For example, the following shows the changes we made to the hypertable.cfg file for our test cluster.
Hyperspace.Replica.Host=hyperspace001
Hyperspace.Replica.Host=hyperspace002
Hyperspace.Replica.Host=hyperspace003
Hypertable.RangeServer.Monitoring.DataDirectories="/data/1,/data/2,/data/3,/data/4"
See hypertable-example.cfg (http://www.hypertable.org/pub/hypertable-example-cfg.txt).
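For clusters with many Hyperspace replicas, the deployment-specific lines can be generated instead of typed by hand. This is a sketch only: the function name emit_cfg_overrides and its argument layout are assumptions for illustration, while the two property names come from Table 1.

```shell
#!/bin/sh
# Hypothetical helper: print the deployment-specific hypertable.cfg lines.
# $1: space-separated list of Hyperspace replica host names
# $2: comma-separated list of data directories (optional)
emit_cfg_overrides() {
    for host in $1; do
        printf 'Hyperspace.Replica.Host=%s\n' "$host"
    done
    if [ -n "$2" ]; then
        printf 'Hypertable.RangeServer.Monitoring.DataDirectories="%s"\n' "$2"
    fi
}

# Example reproducing the test-cluster settings shown above:
#   emit_cfg_overrides "hyperspace001 hyperspace002 hyperspace003" \
#       "/data/1,/data/2,/data/3,/data/4"
```

The output is meant to be reviewed and merged into a copy of the basic hypertable.cfg by hand. If that file already defines Hyperspace.Replica.Host, replace the existing line rather than appending a second set.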
Once you've created the hypertable.cfg file for your cluster, put it on the source machine (admin1) and set the absolute pathname referenced in the default_config Capfile variable to point to this file (e.g. /root/hypertable.cfg). Then distribute the custom config file with the following command.
$ cap push_config
If you ever need to make changes to the config file, make the changes, re-run cap push_config, and then restart Hypertable (see Step 9 and Step 11 below).
To make the latest version of Hypertable referenceable from a well-known location, create a "current" link that points to the latest installation. This can be accomplished with the following command.
$ cap set_current
The system cannot operate correctly unless the clocks on all machines are synchronized. Use the Network Time Protocol (ntp) to ensure that the clocks get synchronized and remain in sync. Run the 'date' command on all machines to make sure they are in sync. The following Capistrano shell session shows the output of a cluster with properly synchronized clocks.
cap> date
[establishing connection(s) to master, hyperspace001, hyperspace002, hyperspace003, slave001, slave002, slave003, slave004, slave005, slave006, slave007, slave008]
** [out ::master] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace001] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace002] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace003] Sat Jan 3 18:05:33 PST 2009
** [out ::slave001] Sat Jan 3 18:05:33 PST 2009
** [out ::slave002] Sat Jan 3 18:05:33 PST 2009
** [out ::slave003] Sat Jan 3 18:05:33 PST 2009
** [out ::slave004] Sat Jan 3 18:05:33 PST 2009
** [out ::slave005] Sat Jan 3 18:05:33 PST 2009
** [out ::slave007] Sat Jan 3 18:05:33 PST 2009
** [out ::slave008] Sat Jan 3 18:05:33 PST 2009
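Comparing a dozen timestamps by eye gets tedious on larger clusters. The check can be reduced to a single number, as in the sketch below. The function name max_skew is made up for illustration, and it assumes each machine reports its clock as seconds since the epoch (date +%s), one value per line.

```shell
#!/bin/sh
# Hypothetical helper: read epoch seconds (one per line) on stdin and print
# the difference between the largest and smallest value.
max_skew() {
    awk 'NR == 1 { min = $1; max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR > 0) print max - min }'
}

# Example, assuming password-less ssh and a host list of your own:
#   for h in master hyperspace001 slave001; do ssh "$h" date +%s; done | max_skew
```

Because the hosts are queried one after another, the ssh round trips inflate the result by a few seconds on their own. The number is a rough sanity check, not a measurement of NTP accuracy.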
The following commands should be run from the directory containing the Capfile. To start all of the Hypertable servers:
$ cap start
If you want to launch the service using a different config file than the default (e.g. /home/chris/alternate.cfg):
$ cap -S config=/home/chris/alternate.cfg start
You'll need to specify the same config file when running Hypertable commands such as the command shell, for example:
$ /opt/hypertable/current/bin/ht shell --config=/home/chris/alternate.cfg
Create a table.
echo "USE '/'; CREATE TABLE foo ( c1, c2 ); GET LISTING;" \
| /opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
foo
sys (namespace)
Load some data.
echo "USE '/'; INSERT INTO foo VALUES('001','c1', 'very'), \
('000','c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \
|/opt/hypertable/current/bin/ht shell --batch
Dump the table.
echo "USE '/'; SELECT * FROM foo;" \
| /opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy
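The comparison against the expected rows can be automated when the verification is repeated after each deployment. The following is a sketch: the function names expected_dump and verify_dump are made up for illustration, and it assumes the batch shell prints only the four data rows. Whitespace between columns is normalized so that tab-separated and space-separated output compare equal.

```shell
#!/bin/sh
# The rows the SELECT above is expected to return.
expected_dump() {
    printf '%s\n' '000 c1 Hypertable' '000 c2 is' '001 c1 very' '001 c2 easy'
}

# Hypothetical helper: read the dump on stdin and succeed only if it
# matches expected_dump once column separators are normalized.
verify_dump() {
    # Reassigning $1 makes awk rebuild each line with single spaces.
    actual=$(awk '{ $1 = $1; print }')
    [ "$actual" = "$(expected_dump)" ]
}

# Example:
#   echo "USE '/'; SELECT * FROM foo;" \
#     | /opt/hypertable/current/bin/ht shell --batch \
#     | verify_dump && echo "installation verified"
```

If your version of the shell prints extra status lines in batch mode, filter them out before piping into verify_dump.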
To stop the service, shutting down all servers:
$ cap stop
If you want to wipe your database clean, removing all namespaces and tables:
$ cap cleandb
Congratulations! Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.