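1. Standalone: installed on a single machine, using the local filesystem
2. Hadoop: distributed installation on top of Hadoop (HDFS)
3. MapR: distributed installation on top of MapR
4. ThriftBroker: installs the ThriftBroker on an application server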
MapR is a scalable filesystem written in C++ that is modeled after the Google File System and is 100% API compatible with Apache Hadoop. Hypertable ships with a MapR filesystem broker that communicates with the MapR servers directly, which lets Hypertable run efficiently on top of the MapR filesystem. This document describes how to get Hypertable up and running on top of the MapR filesystem.
Step 1 - Install MapR
Step 2 - Install Capistrano
Step 3 - Edit Capistrano Capfile
Step 4 - Install Hypertable Binaries
Step 5 - FHS-ize Installation
Step 6 - Create and Distribute hypertable.cfg
Step 7 - Set "current" link
Step 8 - Synchronize Clocks
Step 9 - Start Hypertable
Step 10 - Verify Installation
Step 11 - Stop Hypertable
What Next?
Before you get started with the installation, some general system requirements need to be satisfied. These requirements are described in the following list.
The first step in getting Hypertable up and running on top of MapR is to install MapR and configure it to work with Hypertable. The Hypertable 64-bit packages contain a MapR broker that allows Hypertable to run on top of the MapR filesystem (MapR is currently unavailable for 32-bit platforms). Follow the MapR Installation Instructions (http://www.mapr.com/doc/display/MapR/Installation+Guide) and be sure to co-locate the MapR mfs processes with the Hypertable RangeServer machines.
To configure MapR to work properly with Hypertable, you will need to tell the warden process to launch the mfs processes with a smaller heap size than the default. To do this, add the following line to the warden configuration file (/opt/mapr/conf/warden.conf):
Then launch the MapR filesystem. Once it is up and running, create a volume called "hypertable" and mount it at /hypertable with the following command:
$ maprcli volume create -name hypertable -path /hypertable
The Hypertable distribution comes with a number of scripts to start and stop the various servers that make up a Hypertable cluster. You can use your own cluster management tool to launch these scripts and deploy new binaries. However, if you're not already using a cluster management tool, we recommend Capistrano. The distribution comes with a Capistrano config file (conf/Capfile.cluster) that makes deploying and launching Hypertable a breeze.
Capistrano is a simple tool for automating the remote execution of tasks. It uses ssh to do the remote execution. To ease deployment, you should have password-less ssh access (i.e. public key) to all of the machines in your cluster. Installing Capistrano is pretty simple. On most systems you just need to execute the following commands (Internet access required):
$ sudo gem update
$ sudo gem install capistrano
After this installation step you should have the cap program in your path:
$ cap --version
Capistrano v2.9.0
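Before moving on, it can help to confirm that the tools this step depends on are actually reachable. The following is a minimal sketch, not part of the Hypertable or Capistrano distributions; the helper names have_tool and check_tools are made up for illustration.

```shell
# Sketch: confirm that required command-line tools are on the PATH.
# have_tool and check_tools are hypothetical helper names.

# Succeeds when the named command can be found by the shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one "missing: NAME" line per tool that cannot be found and
# returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call on admin1 might be `check_tools ssh rsync gem cap`. This only checks that the programs exist; it does not verify that password-less ssh works, which still has to be tested by logging in to each machine.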
Once you have Capistrano installed, copy the conf/Capfile.cluster that comes with the Hypertable distribution to your working directory (e.g. home directory) on admin1, rename it to Capfile, and tailor it for your environment. The cap command reads the file Capfile in the current working directory by default. Some variables set at the top need to be modified for your particular environment. The following shows the variables at the top of the Capfile that need modification:
set :source_machine, "admin1"
set :install_dir, "/opt/hypertable"
set :hypertable_version, ""
set :default_pkg, "/tmp/hypertable-"
set :default_dfs, "mapr"
set :default_config, "/root/hypertable.cfg"
Here's a brief description of each variable:
Table 2. Hypertable Capistrano Variables

| Variable | Description |
|---|---|
| source_machine | Machine from which you will build the binaries, distribute them to the other machines, and launch the service. |
| install_dir | Directory on source_machine where you have installed Hypertable. It is also the directory on the remote machines to which the installation will be rsync'ed. |
| hypertable_version | Version of Hypertable you are deploying. |
| default_pkg | Path to the binary package file (.dmg, .rpm, or .tar.bz2) on the source machine. |
| default_dfs | Distributed file system you are running Hypertable on top of. For MapR use "mapr". |
| default_config | Location of the default Hypertable configuration file that you plan to use. |
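Since several of these variables ship with placeholder values, a quick check for settings that are still empty can catch mistakes before deployment. The sketch below is illustrative only: empty_capfile_vars is a hypothetical helper, and it assumes every variable is set on its own line in the form `set :name, "value"` as in the listing above.

```shell
# Sketch: list Capfile variables whose value is still an empty string.
# empty_capfile_vars is a hypothetical helper, not a Capistrano command.
# Assumes one assignment per line in the form:  set :name, "value"
empty_capfile_vars() {
    # $1 = path to the Capfile; prints the name of each empty variable
    sed -n 's/^[[:space:]]*set[[:space:]]*:\([A-Za-z_][A-Za-z0-9_]*\),[[:space:]]*""[[:space:]]*$/\1/p' "$1"
}
```

Running `empty_capfile_vars Capfile` in the directory that holds the Capfile prints one variable name per line, or nothing if every variable has a value. It does not check that the values themselves are correct.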
In addition to the above variables, you also need to define three roles: one for the machine that will run the master processes, one for the machines that will run the Hyperspace replicas, and one for the machines that will run the RangeServers. Edit the following lines:
role :source, "admin1"
role :master, "master"
role :hyperspace, "hyperspace001", "hyperspace002", "hyperspace003"
role :slave, "slave001", "slave002", "slave003", "slave004", "slave005", "slave006", "slave007", "slave008"
role :localhost, "admin1"
role :thriftbroker
role :spare
The following table describes each role.
Table 3. Hypertable Capistrano Roles

| Role | Description |
|---|---|
| source | The machine from which you will be distributing the binaries (admin1 in this example). |
| master | The machine that will run the Hypertable master process as well as a DFS broker. Ideally this machine is high quality and somewhat lightly loaded (e.g. not running a RangeServer). Typically you would have a high quality machine running the Hypertable master, a Hyperspace replica, and the HDFS NameNode. |
| hyperspace | The machines that will run Hyperspace replicas. There should be at least one machine defined for this role. The machines that take on this role should be somewhat lightly loaded (e.g. not running a RangeServer). |
| slave | The machines that will run RangeServers. Hypertable is designed to run on a filesystem like HDFS. In fact, the system performs best when the RangeServers run on the same machines as the HDFS DataNodes. This role will also launch a DFS broker and a ThriftBroker. |
| localhost | The name of the machine that you're administering the cluster from (admin1 in this example). |
| thriftbroker | Additional machines that will be running a ThriftBroker (e.g. web servers). NOTE: You do not have to add the slave machines to this role, since a ThriftBroker is automatically started on each slave machine to support MapReduce. |
| spare | Machines that will act as standbys. They will be kept current with the latest binaries. |
The Hypertable binaries can either be downloaded prepackaged, or you can compile them from source code. To install the prepackaged version, download the Hypertable package (.dmg, .rpm, or .tar.bz2) you want to install and put it somewhere accessible on the source machine (admin1 in this example). Modify the hypertable_version and default_pkg variables at the top of the Capfile to contain the version of Hypertable you are installing and the absolute path to the package file on the source machine, respectively. For example, if you're installing version 0.9.5.5 and using the RPM package, set the variables as follows.
set :hypertable_version, ""
set :default_pkg, "/tmp/hypertable-"
To distribute and install the binary package on all necessary machines, issue the following command. It rsyncs the package to all participating machines and installs it with the appropriate package manager (rpm, dpkg, or tar), depending on the package type.
$ cap install_package
If you prefer compiling the binaries from source, you can use Capistrano to distribute the binaries with rsync. On admin1, be sure Hypertable is installed in the location specified by the install_dir variable at the top of the Capfile and that the hypertable_version variable at the top of the Capfile matches the version you are installing (/opt/hypertable and 0.9.5.5 in this example). Then distribute the binaries with the following command:
$ cap dist
See Filesystem Hierarchy Standard (http://hypertable.com/documentation/misc/filesystem_hierarchy_standard_fhs/) for an introduction to FHS. If you're running as a user other than root, first create the directories /etc/opt/hypertable and /var/opt/hypertable on all machines in the cluster and change their ownership to the user account under which the binaries will be run. For example:
$ sudo cap shell
cap> mkdir /etc/opt/hypertable /var/opt/hypertable
cap> chown chris:staff /etc/opt/hypertable /var/opt/hypertable
Then FHS-ize the installation with the following command:
$ cap fhsize
The next step is to create a hypertable.cfg file that is specific to your deployment. A basic hypertable.cfg file can be found in the conf/ subdirectory of your Hypertable installation; copy it and modify it as needed. The following table shows the minimum set of required and recommended properties that you need to modify.
Table 1. Recommended and Required Properties

| Property | Description |
|---|---|
| Hyperspace.Replica.Host | Hostname of Hyperspace replica. |
| Hypertable.RangeServer.Monitoring.DataDirectories | This property is optional, but recommended. It contains a list of directories that are the mount points of the HDFS data node storage volumes. By setting this property appropriately, the Hypertable monitoring system will be able to provide accurate disk usage information. |
You can leave all other properties at their default values. Hypertable is designed to adapt to the hardware on which it runs and to dynamically adapt to changes in workload, so no special configuration is needed beyond the basic properties listed in the above table. For example, the following shows the changes we made to the hypertable.cfg file for our test cluster.
See hypertable-example.cfg
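To confirm that an edited hypertable.cfg actually carries the properties from Table 1, a small lookup helper can be handy. This is a sketch under assumptions: cfg_value is a hypothetical name, and it assumes the file holds one Name=Value pair per line with '#' starting a comment line.

```shell
# Sketch: print the value of one property from a hypertable.cfg file.
# cfg_value is a hypothetical helper. It assumes one Name=Value pair per
# line and skips lines that start with '#'.
cfg_value() {
    # $1 = config file, $2 = property name
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }                 # skip comment lines
        index($0, "=") > 0 {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                # everything after the first "=" is the value
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }' "$1"
}
```

For instance, `cfg_value /root/hypertable.cfg Hyperspace.Replica.Host` prints the configured hostname, and prints nothing if the property is absent.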
Once you've created the hypertable.cfg file for your cluster, put it on the source machine (admin1) and set the absolute pathname referenced in the default_config Capfile variable to point to this file (e.g. /root/hypertable.cfg). Then distribute the custom config files with the following command.
$ cap push_config
If you ever need to make changes to the config file, make the changes, re-run cap push_config, and then restart Hypertable (see Steps 9 and 11, below).
To make the latest version of Hypertable referenceable from a well-known location, create a "current" link that points to the latest installation. This can be accomplished with the following command.
$ cap set_current
The system cannot operate correctly unless the clocks on all machines are synchronized. Use the Network Time Protocol (ntp) to ensure that the clocks get synchronized and remain in sync. Run the 'date' command on all machines to make sure they are in sync. The following Capistrano shell session shows the output of a cluster with properly synchronized clocks.
cap> date
[establishing connection(s) to master, hyperspace001, hyperspace002, hyperspace003, slave001, slave002, slave003, slave004, slave005, slave006, slave007, slave008]
** [out ::master] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace001] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace002] Sat Jan 3 18:05:33 PST 2009
** [out ::hyperspace003] Sat Jan 3 18:05:33 PST 2009
** [out ::slave001] Sat Jan 3 18:05:33 PST 2009
** [out ::slave002] Sat Jan 3 18:05:33 PST 2009
** [out ::slave003] Sat Jan 3 18:05:33 PST 2009
** [out ::slave004] Sat Jan 3 18:05:33 PST 2009
** [out ::slave005] Sat Jan 3 18:05:33 PST 2009
** [out ::slave007] Sat Jan 3 18:05:33 PST 2009
** [out ::slave008] Sat Jan 3 18:05:33 PST 2009
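Comparing timestamps by eye gets tedious on larger clusters. As a rough sketch, the spread between the fastest and slowest clock can be computed from epoch timestamps; max_skew is a hypothetical helper and the "host seconds" input format is an assumption made for this example.

```shell
# Sketch: compute the spread between the earliest and latest clock reading.
# max_skew is a hypothetical helper. It reads "host epoch_seconds" pairs
# on stdin and prints the difference between the largest and smallest value.
max_skew() {
    awk 'NR == 1 { min = $2 + 0; max = $2 + 0 }
         {
             t = $2 + 0                      # force numeric comparison
             if (t < min) min = t
             if (t > max) max = t
         }
         END { if (NR > 0) print max - min }'
}

# Example of feeding it (host names taken from the Capfile roles above):
#   for h in master slave001 slave002; do
#       printf '%s %s\n' "$h" "$(ssh "$h" date +%s)"
#   done | max_skew
```

Because the ssh calls in the example run one after another, connection latency inflates the reported spread, so a result of a second or two does not by itself indicate a problem. Treat this as a coarse sanity check alongside ntp, not as a measurement.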
The following commands should be run from the directory containing the Capfile. To start all of the Hypertable servers:
$ cap start
If you want to launch the service using a different config file than the default (e.g. /home/chris/alternate.cfg):
$ cap -S config=/home/chris/alternate.cfg start
You'll need to specify the same config file when running Hypertable commands such as the command shell, for example:
$ /opt/hypertable/current/bin/ht shell --config=/home/chris/alternate.cfg
Create a table.
echo "USE '/'; CREATE TABLE foo ( c1, c2 ); GET LISTING;" \
| /opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
sys (namespace)
Load some data.
echo "USE '/'; INSERT INTO foo VALUES('001','c1', 'very'), \
('000','c1', 'Hypertable'), ('001', 'c2', 'easy'), ('000', 'c2', 'is');" \
|/opt/hypertable/current/bin/ht shell --batch
Dump the table.
echo "USE '/'; SELECT * FROM foo;" \
| /opt/hypertable/current/bin/ht shell --batch
The output of this command should look like:
000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy
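The comparison against the expected rows can also be scripted. The sketch below is one way to do it, with verify_dump as a hypothetical helper name; it assumes the shell prints exactly the four rows shown above and nothing else, and it collapses whitespace so the exact column separator does not matter.

```shell
# Sketch: check that a table dump matches the four rows shown above.
# verify_dump is a hypothetical helper. It reads the dump on stdin and
# succeeds only if the rows match after whitespace is normalized.
verify_dump() {
    expected='000 c1 Hypertable
000 c2 is
001 c1 very
001 c2 easy'
    # Reassigning $1 makes awk rebuild each line with single spaces.
    actual=$(awk '{ $1 = $1; print }')
    [ "$actual" = "$expected" ]
}

# Example (same command as the dump step above):
#   echo "USE '/'; SELECT * FROM foo;" \
#     | /opt/hypertable/current/bin/ht shell --batch | verify_dump && echo OK
```

If the shell emits extra lines, such as a banner or timing information, the comparison fails and the output would need filtering first.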
To stop the service, shutting down all servers:
$ cap stop
If you want to wipe your database clean, removing all namespaces and tables:
$ cap cleandb
Congratulations! Now that you have successfully installed Hypertable, we recommend that you walk through the HQL Tutorial to get familiar with using the system.