Deploying and Operating Tachyon on a Single Machine

Tachyon website:

http://www.tachyon-project.org/

Description from the website:

Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.

Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without any code change. The project is open source (Apache License 2.0) and is deployed at multiple companies. It has more than 80 contributors from over 30 institutions, including Yahoo, Intel, Red Hat, and Tachyon Nexus. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the Fedora distribution.

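In other words, Tachyon is a memory-centric, fault-tolerant distributed file storage system that lets cluster frameworks such as Spark and MapReduce share files reliably at memory speed. It gets its performance from lineage information and aggressive use of memory: working-set files are cached in memory, so different jobs, queries, and frameworks can read them at memory speed, and frequently used datasets need to be loaded from disk far less often.

Because Tachyon is Hadoop compatible, existing Spark and MapReduce programs can run on it without any code changes.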

 

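This article walks through running Tachyon standalone on a single machine.

1. Download Tachyon. This article uses 0.7.1, the latest version at the time of writing.

http://www.tachyon-project.org/downloads/files/0.7.1/tachyon-0.7.1-bin.tar.gz

2. Extract the archive

tar -zxvf tachyon-0.7.1-bin.tar.gz

3. Move the extracted directory to the installation path and create a symlink to it

mv tachyon-0.7.1 /usr/local/

cd /usr/local/

ln -s tachyon-0.7.1 tachyon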
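Steps 2 and 3 can be folded into one small shell function. This is only a sketch: install_tachyon is a hypothetical helper name (not part of Tachyon), it extracts directly into the target prefix instead of extracting and then moving, and it assumes the tarball unpacks into a top-level tachyon-0.7.1 directory as shown above.

```shell
#!/bin/sh
# Sketch: unpack a Tachyon binary tarball under PREFIX and point a stable
# "tachyon" symlink at the versioned directory.
# install_tachyon is a hypothetical helper, not something Tachyon ships.
install_tachyon() {
  tarball=$1                        # e.g. tachyon-0.7.1-bin.tar.gz
  prefix=$2                         # e.g. /usr/local
  version_dir=${3:-tachyon-0.7.1}   # assumed top-level directory in the tarball

  [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
  mkdir -p "$prefix" || return 1

  # Extract straight into PREFIX (replaces the separate tar + mv steps).
  tar -zxf "$tarball" -C "$prefix" || return 1
  [ -d "$prefix/$version_dir" ] || { echo "missing $prefix/$version_dir" >&2; return 1; }

  # Relative symlink, same effect as: cd PREFIX && ln -s tachyon-0.7.1 tachyon
  # -f and -n let a later run repoint an existing link at a newer version.
  ln -sfn "$version_dir" "$prefix/tachyon"
}

# Example call (needs write access to /usr/local):
#   install_tachyon tachyon-0.7.1-bin.tar.gz /usr/local
```

Keeping the versioned directory and linking to it means an upgrade only has to repoint the tachyon symlink.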

4. Configure Tachyon

[root@gpmaster local]# cd tachyon/conf/

[root@gpmaster conf]# pwd

/usr/local/tachyon/conf

[root@gpmaster conf]# cp tachyon-env.sh.template tachyon-env.sh

Then add the following parameters to tachyon-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_60

export HADOOP_HOME=/home/hadoop/hadoop-2.6.0

export TACHYON_HOME=/usr/local/tachyon-0.7.1

export TACHYON_MASTER_ADDRESS=192.168.1.128

export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underFSStorage

export TACHYON_WORKER_MEMORY_SIZE=100m

export TACHYON_RAM_FOLDER=$TACHYON_HOME/ramdisk

The configuration file contains many other parameters; I left them all at their default values.
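Before formatting and starting Tachyon, it can help to confirm that the edited file really defines the variables listed above. The following is a minimal sketch, not a Tachyon tool: check_tachyon_env is a hypothetical helper that sources the file in a subshell and reports each value. The template shipped with Tachyon may use bash-only syntax, so running this under bash is the safer assumption.

```shell
#!/bin/sh
# Sketch: source a tachyon-env.sh in a subshell and confirm that the
# variables set in this article are defined and non-empty.
# check_tachyon_env is a hypothetical helper, not part of Tachyon.
check_tachyon_env() {
  env_file=$1   # use a path containing a slash, e.g. ./conf/tachyon-env.sh
  required="JAVA_HOME TACHYON_HOME TACHYON_MASTER_ADDRESS \
TACHYON_UNDERFS_ADDRESS TACHYON_WORKER_MEMORY_SIZE TACHYON_RAM_FOLDER"
  (
    # Subshell: nothing sourced here leaks into the caller's environment.
    # Clear inherited values so only the file's own settings are checked.
    unset $required
    . "$env_file" >/dev/null || exit 1
    status=0
    for name in $required; do
      eval "value=\${$name:-}"
      if [ -n "$value" ]; then
        echo "$name=$value"
      else
        echo "not set: $name" >&2
        status=1
      fi
    done
    exit $status
  )
}

# Example call:
#   check_tachyon_env /usr/local/tachyon/conf/tachyon-env.sh
```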

[root@gpmaster conf]# cp core-site.xml.template core-site.xml

Add the following property:

<property>

   <name>fs.tachyon.impl</name>

   <value>tachyon.hadoop.TFS</value>

</property>

5. Format Tachyon

[root@gpmaster tachyon]# pwd

/usr/local/tachyon

[root@gpmaster tachyon]# ./bin/tachyon format

Connecting to localhost as root...

Warning: Permanently added 'localhost'(RSA) to the list of known hosts.

Formatting Tachyon Worker @ gpmaster

Connection to localhost closed.

Formatting Tachyon Master @ 192.168.1.128

[root@gpmaster tachyon]#

Then check the log:

[root@gpmaster tachyon]# cat logs/user.log

2015-10-07 21:39:14,440 INFO  USER_LOGGER (Format.java:formatFolder) - Formatting JOURNAL_FOLDER:/usr/local/tachyon-0.7.1/journal/

2015-10-07 21:39:14,446 INFO  USER_LOGGER (Format.java:formatFolder) - Formatting UNDERFS_DATA_FOLDER:/usr/local/tachyon-0.7.1/underFSStorage/tmp/tachyon/data

2015-10-07 21:39:14,486 INFO  USER_LOGGER (Format.java:formatFolder) - Formatting UNDERFS_WORKERS_FOLDER:/usr/local/tachyon-0.7.1/underFSStorage/tmp/tachyon/workers

The log shows that the journal folder and the two under-filesystem folders were formatted without errors.

6. Start Tachyon

[root@gpmaster tachyon]# ./bin/tachyon-start.sh local

Killed 0 processes on gpmaster

Killed 0 processes on gpmaster

Connecting to localhost as root...

Killed 0 processes on gpmaster

Connection to localhost closed.

Formatting RamFS: /usr/local/tachyon-0.7.1/ramdisk(100mb)

Starting master @ 192.168.1.128

Starting worker @ gpmaster

Check the running processes:

[root@gpmaster tachyon-0.7.1]# jps

2810 TachyonMaster

2865 Jps

2833 TachyonWorker

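Because this is a single-machine setup, the Master and Worker processes run on the same node.

Note: the tachyon-start.sh local command starts both the Master and the Worker on the local machine. To run it you must be able to switch to root (that is, know the required password); otherwise startup fails, because formatting the RamFS requires root privileges.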
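The jps check above can be scripted so that a startup script fails fast when either process is missing. A minimal sketch, assuming jps prints one "PID ClassName" pair per line as in the listing above; tachyon_processes_up is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read a jps listing on stdin and succeed only if both the
# TachyonMaster and TachyonWorker processes appear in it.
# tachyon_processes_up is a hypothetical helper, not part of Tachyon.
tachyon_processes_up() {
  listing=$(cat)
  for proc in TachyonMaster TachyonWorker; do
    # Match "<pid> <name>" exactly, so partial names do not count.
    printf '%s\n' "$listing" | grep -q "^[0-9][0-9]* $proc\$" || {
      echo "not running: $proc" >&2
      return 1
    }
  done
}

# Example call:
#   jps | tachyon_processes_up && echo "Master and Worker are both up"
```

Reading the listing from stdin keeps the function independent of where jps lives, and makes it easy to try against saved output.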

 

7. Verify that Tachyon is running through the web UI

http://192.168.1.128:19999/

[Screenshot: Tachyon web UI]



8. Run the tests

[root@gpmaster tachyon-0.7.1]# ./bin/tachyon runTest Basic CACHE_THROUGH

/default_tests_files/BasicFile_CACHE_THROUGH has been removed

2015-10-07 21:50:43,933 INFO  (MasterClient.java:connect) - Tachyon client (version 0.7.1) is trying to connect with master @ gpmaster/192.168.1.128:19998

2015-10-07 21:50:43,979 INFO  (MasterClient.java:connect) - User registered with the master @ gpmaster/192.168.1.128:19998; got UserId 4

2015-10-07 21:50:44,015 INFO  (CommonUtils.java:printTimeTakenMs) - createFile with fileId 3 took 99 ms.

2015-10-07 21:50:44,063 INFO  (WorkerClient.java:connect) - Trying to get local worker host : gpmaster

2015-10-07 21:50:44,090 INFO  (WorkerClient.java:connect) - Connecting local worker @ gpmaster/192.168.1.128:29998

2015-10-07 21:50:44,227 INFO  (BlockOutStream.java:get) - Writing with local stream. tachyonFile: /default_tests_files/BasicFile_CACHE_THROUGH, blockIndex: 0, opType: CACHE_THROUGH

2015-10-07 21:50:44,329 INFO  (CommonUtils.java:createBlockPath) - Folder /usr/local/tachyon-0.7.1/ramdisk/tachyonworker/4 was created!

2015-10-07 21:50:44,350 INFO  (LocalBlockOutStream.java:<init>) - /usr/local/tachyon-0.7.1/ramdisk/tachyonworker/4/3221225472 was created! tachyonFile: /default_tests_files/BasicFile_CACHE_THROUGH, blockIndex: 0, blockId: 3221225472, blockCapacityByte: 536870912

2015-10-07 21:50:44,465 INFO  (CommonUtils.java:printTimeTakenMs) - writeFile to file /default_tests_files/BasicFile_CACHE_THROUGH took 449 ms.

2015-10-07 21:50:44,578 INFO  (CommonUtils.java:printTimeTakenMs) - readFile file /default_tests_files/BasicFile_CACHE_THROUGH took 105 ms.

Passed the test!
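The test output is verbose, so for repeated runs it can be condensed to the timing lines plus a pass/fail status. The sketch below assumes the output format shown above (timing lines ending in "took N ms." and a final "Passed the test!" line); check_run_test is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: condense runTest output read from stdin.
# Prints "<N> ms: <operation>" for every timing line and returns success
# only if the "Passed the test!" line is present.
# check_run_test is a hypothetical helper, not part of Tachyon.
check_run_test() {
  output=$(cat)
  # Timing lines look like "... - <operation> took <N> ms."
  printf '%s\n' "$output" |
    sed -n 's/.* - \(.*\) took \([0-9][0-9]*\) *ms\.$/\2 ms: \1/p'
  # The function's exit status is grep's: 0 only when the test passed.
  printf '%s\n' "$output" | grep -q '^Passed the test!$'
}

# Example call:
#   ./bin/tachyon runTest Basic CACHE_THROUGH 2>&1 | check_run_test
```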

 

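Next, run the more comprehensive sanity-check test suite:

[root@gpmaster tachyon-0.7.1]# ./bin/tachyon runTests

Check the result in the web UI:

[Screenshot 1: Tachyon web UI]

The page now shows information about many blocks held in memory.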

 

9. Stop Tachyon

[root@gpmaster tachyon-0.7.1]# ./bin/tachyon-stop.sh

Killed 1 processes on gpmaster

Killed 1 processes on gpmaster

Connecting to localhost as root...

Killed 0 processes on gpmaster

Connection to localhost closed.

10. Tachyon command-line operations

Tachyon provides a command-line tool for simple interactive operations. Usage:

[root@gpmaster tachyon-0.7.1]# bin/tachyon tfs

Usage: java TFsShell

       [cat <path>]

       [count <path>]

       [ls <path>]

       [lsr <path>]

       [mkdir <path>]

       [rm <path>]

       [rmr <path>]

       [tail <path>]

       [touch <path>]

       [mv <src> <dst>]

       [copyFromLocal <src> <remoteDst>]

       [copyToLocal <src> <localDst>]

       [fileinfo <path>]

       [location <path>]

       [report <path>]

       [request <tachyonaddress> <dependencyId>]

       [pin <path>]

       [unpin <path>]

       [free <file path|folder path>]

       [getUsedBytes]

       [getCapacityBytes]

       [du <path>]

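Many of these operations resemble their HDFS counterparts. A few examples:

(1) Create a directory:

[root@gpmaster tachyon-0.7.1]# bin/tachyon tfs mkdir /mytachyon

(2) Upload a file: copy the NOTICE file from the current directory into the Tachyon directory: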

[root@gpmaster tachyon-0.7.1]# bin/tachyon tfs copyFromLocal NOTICE /mytachyon/

(3) View the file's contents:

[root@gpmaster tachyon-0.7.1]# bin/tachyon tfs cat /mytachyon/NOTICE
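The three examples can be combined into a round-trip check that uploads a file, copies it back with copyToLocal (listed in the usage output above), and compares the two copies. This is a sketch under several assumptions: tfs_roundtrip is a hypothetical helper, the tfs subcommands are assumed to return a non-zero exit status on failure, and mkdir is assumed to be run against a directory that does not exist yet.

```shell
#!/bin/sh
# Sketch: upload a local file with the tfs shell, download it again, and
# verify that the bytes are identical.
# tfs_roundtrip is a hypothetical helper, not part of Tachyon.
tfs_roundtrip() {
  tachyon_bin=$1   # e.g. bin/tachyon
  local_file=$2    # e.g. NOTICE
  remote_dir=$3    # e.g. /mytachyon
  remote_file=$remote_dir/$(basename "$local_file")

  tmp_dir=$(mktemp -d) || return 1
  copy_back=$tmp_dir/copy

  # Assumption: each tfs subcommand exits non-zero when it fails,
  # so the && chain stops at the first failing step.
  "$tachyon_bin" tfs mkdir "$remote_dir" &&
    "$tachyon_bin" tfs copyFromLocal "$local_file" "$remote_dir/" &&
    "$tachyon_bin" tfs copyToLocal "$remote_file" "$copy_back" &&
    cmp -s "$local_file" "$copy_back"
  status=$?

  rm -rf "$tmp_dir"
  return $status
}

# Example call, run from the Tachyon installation directory:
#   tfs_roundtrip bin/tachyon NOTICE /mytachyon && echo "round trip OK"
```

Comparing files with cmp avoids depending on what else the tfs cat command might print to the terminal.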

 

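Now check it in the web UI:

[Screenshot 2: Tachyon web UI]

The directory we created is listed. Click it to see the file information:

[Screenshot 3: Tachyon web UI]

The files shown here are all in memory. Clicking a file displays its contents (no screenshot included here).

More detailed operation and management topics will be covered in a separate document later.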

 

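11. Tachyon configuration

Tachyon's configurable items fall into two categories:

The first is system environment variables, which share configuration between the different scripts.

The second is runtime parameters, passed with the -D option to the JVM that runs Tachyon. Runtime parameters are further divided into common configuration (Common Configuration), TachyonMaster configuration (Master Configuration), TachyonWorker configuration (Worker Configuration), and user configuration (User Configuration). To change or add any of these items, edit conf/tachyon-env.sh.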
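To make the link between the two categories concrete, the sketch below shows how environment variables could be turned into -D options inside a file like conf/tachyon-env.sh. The property names (tachyon.master.hostname, tachyon.worker.memory.size, tachyon.underfs.address) and the TACHYON_JAVA_OPTS variable are assumptions based on my recollection of the 0.7.x template, as is the /tmp/tachyon fallback; check them against the template and the configuration page for your version before relying on them.

```shell
#!/bin/sh
# Sketch: map environment variables to -D JVM options.
# The property names below are ASSUMPTIONS; verify them against the
# tachyon-env.sh.template shipped with your Tachyon version.

# Fallback values, used only when the variables are not already set.
TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-localhost}
TACHYON_WORKER_MEMORY_SIZE=${TACHYON_WORKER_MEMORY_SIZE:-1GB}
TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-/tmp/tachyon}

# Each runtime parameter is a Java system property passed with -D.
# Existing options are kept and the new ones are appended.
TACHYON_JAVA_OPTS="${TACHYON_JAVA_OPTS:-} \
-Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS \
-Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE \
-Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS"
export TACHYON_JAVA_OPTS

# Show the options that would be handed to the JVM.
echo "$TACHYON_JAVA_OPTS"
```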

 

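Tachyon environment variables

• JAVA_HOME: the path of the Java installation on the system

• TACHYON_RAM_FOLDER: the directory where the ramfs is mounted; defaults to /mnt/ramdisk

• TACHYON_MASTER_ADDRESS: the address on which TachyonMaster starts; defaults to localhost, so it does not need to be changed in single-machine mode

• TACHYON_UNDERFS_ADDRESS: the path of the underlying file system used by Tachyon, either a local file system path (in single-machine mode) such as "/tmp/tachyon", or HDFS, such as "hdfs://ip:port"

• TACHYON_WORKER_MEMORY_SIZE: the RamFS size used by each TachyonWorker; defaults to 1GB

• TACHYON_UNDERFS_HDFS_IMPL: the HDFS implementation class to use, for example com.mapr.fs.MapRFileSystem or org.apache.hadoop.hdfs.DistributedFileSystem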
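The defaults in the list above can be made visible with a small helper that prints the value each variable would take in the current shell. This is a sketch only: tachyon_effective_env is a hypothetical helper, the fallback values are the ones stated in the list, and variables with no stated default are shown as <unset>.

```shell
#!/bin/sh
# Sketch: print each Tachyon environment variable with the default from
# the list above applied when the variable is not set.
# tachyon_effective_env is a hypothetical helper, not part of Tachyon.
tachyon_effective_env() {
  echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
  echo "TACHYON_RAM_FOLDER=${TACHYON_RAM_FOLDER:-/mnt/ramdisk}"
  echo "TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-localhost}"
  echo "TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-<unset>}"
  echo "TACHYON_WORKER_MEMORY_SIZE=${TACHYON_WORKER_MEMORY_SIZE:-1GB}"
  echo "TACHYON_UNDERFS_HDFS_IMPL=${TACHYON_UNDERFS_HDFS_IMPL:-<unset>}"
}

# Example call, after sourcing the configuration in a subshell:
#   ( . /usr/local/tachyon/conf/tachyon-env.sh; tachyon_effective_env )
```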

 

There are many runtime parameters; see the official documentation for how to configure and tune them:

http://www.tachyon-project.org/documentation/Configuration-Settings.html

 

 

 
