Hypertable can be installed in several ways, as shown below:
1. Standalone: installed on a single machine, using the local filesystem
2. Hadoop: distributed installation, on top of Hadoop (HDFS)
3. MapR: distributed installation, on top of MapR
4. ThriftBroker: a ThriftBroker installed on each application server
(http://hypertable.com/documentation/installation/quick_start_cluster_installation/)
Hadoop is an open source implementation of the Google File System and MapReduce parallel computation framework. The Hadoop filesystem (HDFS) is the filesystem that most people run Hypertable on top of, as it contains all of the architectural features required to efficiently support Hypertable. This document describes how to get Hypertable up and running on top of the Hadoop filesystem.
Prerequisites
Step 1 - Install HDFS
Step 2 - Install Capistrano
Step 3 - Edit Capistrano Capfile
Step 4 - Install Hypertable Binaries
Step 5 - FHS-ize Installation
Step 6 - Create and Distribute hypertable.cfg
Step 7 - Set "current" link
Step 8 - Synchronize Clocks
Step 9 - Start Hypertable
Step 10 - Verify Installation
Step 11 - Stop Hypertable
What Next?
Before you get started with the installation, there are some general system requirements that need to be satisfied. These requirements are described in the following list.
The first step in getting Hypertable up and running on top of Hadoop is to install HDFS. Hypertable currently builds against Cloudera's CDH3 distribution of Hadoop (see CDH3 Installation for installation instructions). Each RangeServer process should run on a machine that is also running an HDFS DataNode. It's best not to run the HDFS NameNode on the same machine as a RangeServer, since both of those processes tend to consume a lot of RAM.
To accommodate a Bigtable-style workload, HDFS needs to be specially configured. The dfs.datanode.max.xcievers property, which controls the number of files that a DataNode can service concurrently, should be increased to at least 4096, and dfs.namenode.handler.count, which controls the number of NameNode threads available to handle RPCs, should be increased to at least 20. This can be accomplished by adding the following lines to the conf/hdfs-site.xml file.
<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
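These settings take effect only after the HDFS daemons are restarted. The exact service names depend on how Hadoop was packaged; the lines below are a sketch assuming CDH3's init scripts (the hadoop-0.20-namenode and hadoop-0.20-datanode names are an assumption), so adjust them to your installation.
# Restart the NameNode (run on the NameNode machine)
$ sudo service hadoop-0.20-namenode restart
# Restart the DataNode (run on each DataNode machine)
$ sudo service hadoop-0.20-datanode restart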
Once the filesystem is installed, create a /hypertable directory that is readable and writable by the user account under which Hypertable will run. For example:
sudo -u hdfs hadoop fs -mkdir /hypertable
sudo -u hdfs hadoop fs -chmod 777 /hypertable
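To confirm the directory was created with the expected permissions, you can list the filesystem root (this assumes the hadoop command is on your PATH, as in the commands above):
$ sudo -u hdfs hadoop fs -ls /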
The Hypertable distribution comes with a number of scripts to start and stop the various servers that make up a Hypertable cluster. You can use your own cluster management tool to launch these scripts and deploy new binaries. However, if you're not already using a cluster management tool, we recommend Capistrano. The distribution comes with a Capistrano config file (conf/Capfile.cluster) that makes deploying and launching Hypertable a breeze.
Capistrano is a simple tool for automating the remote execution of tasks over ssh. To ease deployment, you should have password-less ssh access (i.e. public key) to all of the machines in your cluster (see the key-setup sketch below). Installing Capistrano is pretty simple. On most systems you just need to execute the following commands (Internet access required):
$ sudo gem update
$ sudo gem install capistrano
After this installation step you should now have the cap program in your path:
$ cap --version
Capistrano v2.9.0
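As noted above, Capistrano needs password-less ssh from the admin machine to every node. If you don't already have keys in place, a minimal sketch (assuming OpenSSH with ssh-copy-id available; the host names are the ones used in the example Capfile below) looks like this:
# On admin1: generate a key pair (press Enter to accept defaults)
$ ssh-keygen -t rsa
# Copy the public key to every machine in the cluster
$ for host in master hyperspace001 hyperspace002 hyperspace003 \
             slave001 slave002 slave003 slave004 slave005 slave006 \
             slave007 slave008; do ssh-copy-id $host; done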
Once you have Capistrano installed, copy the conf/Capfile.cluster that comes with the Hypertable distribution to your working directory (e.g. home directory) on admin1, rename it to Capfile, and tailor it for your environment. The cap command reads the file Capfile in the current working directory by default. There are some variables set at the top that you need to modify for your particular environment. The following shows the variables at the top of the Capfile that need modification:
set :source_machine, "admin1"
set :install_dir, "/opt/hypertable"
set :hypertable_version, "0.9.5.5"
set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"
set :default_dfs, "hadoop"
set :default_config, "/root/hypertable.cfg"
Here's a brief description of each variable:
Table 2. Hypertable Capistrano Variables

source_machine - Machine from which you will build the binaries, distribute them to the other machines, and launch the service.
install_dir - Directory on source_machine where you have installed Hypertable. It is also the directory on the remote machines that the installation will get rsync'ed to.
hypertable_version - Version of Hypertable you are deploying.
default_pkg - Path to the binary package file (.dmg, .rpm, or .tar.bz2) on the source machine.
default_dfs - Distributed file system you are running Hypertable on top of. Valid values are "local", "hadoop", "kfs", or "ceph".
default_config - Location of the default Hypertable configuration file that you plan to use.
In addition to the above variables, you also need to define three roles: one for the machine that will run the master processes, one for the machines that will run the Hyperspace replicas, and one for the machines that will run the RangeServers. Edit the following lines:
role :source, "admin1"
role :master, "master"
role :hyperspace, "hyperspace001", "hyperspace002", "hyperspace003"
role :slave, "slave001", "slave002", "slave003", "slave004", "slave005", "slave006", "slave007", "slave008"
role :localhost, "admin1"
role :thriftbroker
role :spare
The following table describes each role.
Table 3. Hypertable Capistrano Roles

source - The machine from which you will be distributing the binaries (admin1 in this example).
master - The machine that will run the Hypertable master process as well as a DFS broker. Ideally this machine is high quality and somewhat lightly loaded (e.g. not running a RangeServer). Typically you would have a high quality machine running the Hypertable master, a Hyperspace replica, and the HDFS NameNode.
hyperspace - The machines that will run Hyperspace replicas. There should be at least one machine defined for this role. The machines that take on this role should be somewhat lightly loaded (e.g. not running a RangeServer).
slave - The machines that will run RangeServers. Hypertable is designed to run on a filesystem like HDFS. In fact, the system works best from a performance standpoint when the RangeServers are run on the same machines as the HDFS DataNodes. This role will also launch a DFS broker and a ThriftBroker.
localhost - The name of the machine that you're administering the cluster from (admin1 in this example).
thriftbroker - Additional machines that will be running a ThriftBroker (e.g. web servers). NOTE: You do not have to add the slave machines to this role, since a ThriftBroker is automatically started on each slave machine to support MapReduce.
spare - Machines that will act as standbys. They will be kept current with the latest binaries.
The Hypertable binaries can either be downloaded prepackaged, or you can compile them from source code. To install the prepackaged version, download the Hypertable package (.dmg, .rpm, or .tar.bz2) that you want to install and put it somewhere accessible on the source machine (admin1 in this example). Modify the hypertable_version and default_pkg variables at the top of the Capfile to contain the version of Hypertable you are installing and the absolute path to the package file on the source machine, respectively. For example, if you're installing version 0.9.5.5 and using the RPM package, set the variables as follows.
set :hypertable_version, "0.9.5.5"
set :default_pkg, "/tmp/hypertable-0.9.5.5-linux-x86_64.rpm"
To distribute and install the binary package on all necessary machines, issue the following command. This command will cause the package to get rsync'ed to all participating machines and installed with the appropriate package manager (rpm, dpkg, or tar) depending on the package type.
$ cap install_package
If you prefer compiling the binaries from source, you can use Capistrano to distribute the binaries with rsync. On admin1, be sure Hypertable is installed in the location specified by the install_dir variable at the top of the Capfile and that the hypertable_version variable at the top of the Capfile matches the version you are installing (/opt/hypertable and 0.9.5.5 in this example). Then distribute the binaries with the following command.
$ cap dist
See the Filesystem Hierarchy Standard (http://hypertable.com/documentation/misc/filesystem_hierarchy_standard_fhs/) for an introduction to FHS. If you're running as a user other than root, first create the directories /etc/opt/hypertable and /var/opt/hypertable on all machines in the cluster and change ownership to the user account under which the binaries will be run. For example:
$ sudo cap shell
cap> mkdir /etc/opt/hypertable /var/opt/hypertable
cap> chown chris:staff /etc/opt/hypertable /var/opt/hypertable
Then FHS-ize the installation with the following command:
$ cap fhsize
The next step is to create a hypertable.cfg file that is specific to your deployment. A basic hypertable.cfg file can be found in the conf/ subdirectory of your Hypertable installation, which can be copied and modified as needed. The following table shows the minimum set of required and recommended properties that you need to modify.
Table 1. Recommended and Required Properties

HdfsBroker.fs.default.name - URL of the HDFS NameNode. Should match the fs.default.name property of the Hadoop configuration file hdfs-site.xml.
Hyperspace.Replica.Host - Hostname of a Hyperspace replica.
Hypertable.RangeServer.Monitoring.DataDirectories - This property is optional, but recommended. It contains a list of directories that are the mount points of the HDFS DataNode storage volumes. By setting this property appropriately, the Hypertable monitoring system will be able to provide accurate disk usage information.
You can leave all other properties at their default values. Hypertable is designed to adapt to the hardware on which it runs and to dynamically adapt to changes in workload, so no special configuration is needed beyond the basic properties listed in the above table. For example, the following shows the changes we made to the hypertable.cfg file for our test cluster.
HdfsBroker.fs.default.name=hdfs://master:9000
Hyperspace.Replica.Host=hyperspace001
Hyperspace.Replica.Host=hyperspace002
Hyperspace.Replica.Host=hyperspace003
Hypertable.RangeServer.Monitoring.DataDirectories="/data/1,/data/2,/data/3,/data/4"
See hypertable-example.cfg (http://www.hypertable.org/pub/hypertable-example-cfg.txt).
Once you've created the hypertable.cfg file for your cluster, put it on the source machine (admin1) and set the default_config Capfile variable to the absolute pathname of this file (e.g. /etc/opt/hypertable/hypertable.cfg). Then distribute the custom config file with the following command.
$ cap push_config
If you ever need to make changes to the config file, make the changes, re-run cap push_config, and then restart Hypertable (see Steps 9 and 11, below).
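For example, a config change made on admin1 would be rolled out like this, using the stop and start tasks covered in Steps 9 and 11:
$ cap push_config
$ cap stop
$ cap start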
To make the latest version of Hypertable referenceable from a well-known location, create a "current" link to point to the latest installation. This can be accomplished with the following command.
$ cap set_current
The system cannot operate correctly unless the clocks on all machines are synchronized. Use the Network Time Protocol (ntp) to ensure that the clocks get synchronized and remain in sync. Run the 'date' command on all machines to make sure they are in sync. The following Capistrano shell session shows the output of a cluster with properly synchronized clocks.
cap> date
[establishing connection(s) to master, hyperspace001, hyperspace002, hyperspace003, slave001, slave002, slave003, slave004, slave005, slave006, slave007, slave008]
 ** [out :: master] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace001] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace002] Sat Jan 3 18:05:33 PST 2009
 ** [out :: hyperspace003] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave001] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave002] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave003] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave004] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave005] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave007] Sat Jan 3 18:05:33 PST 2009
 ** [out :: slave008] Sat Jan 3 18:05:33 PST 2009
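If the clocks are not in sync, install and enable an NTP daemon on every machine before continuing. Package and service names differ by distribution; the lines below are a sketch assuming a Red Hat-style system (the yum/ntpd names are assumptions), reusing the cap shell shown earlier:
$ sudo cap shell
cap> yum -y install ntp
cap> service ntpd start
cap> chkconfig ntpd on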
The following commands should be run from the directory containing the Capfile. To start all of the Hypertable servers:
$ cap start
If you want to launch the service using a different config file than the default (e.g. /home/chris/alternate.cfg):
$ cap -S config=/home/chris/alternate.cfg start
You'll need to specify the same config file when running Hypertable commands such as the command shell, for example:
$ /opt/hypertable/current/bin/ht shell --config=/home/chris/alternate.cfg
Create a table.
echo "USE '/'; CREATE TABLE foo ( c1, c2 ); GETLISTING;" \
|/opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
foo
sys (namespace)
Load some data.
echo "USE '/'; INSERT INTO foo VALUES('001','c1', 'very'), \
('000','c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \
  | /opt/hypertable/current/bin/ht shell --batch
Dump the table.
echo "USE '/'; SELECT * FROM foo;" \
  | /opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy
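If you want to remove the test table when you're done, you can drop it with another batch-mode shell command (pass the same --config option if you started with an alternate config file):
echo "USE '/'; DROP TABLE foo;" \
  | /opt/hypertable/current/bin/ht shell --batch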
To stop the service, shutting down all servers:
$ cap stop
If you want to wipe your database clean, removing all namespaces and tables:
$ cap cleandb
Congratulations! Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.