Spark 1.5.1 supports Tachyon 0.7.1.
JDK 1.7 is required.
1. Spark
Download the Spark source:
http://spark.apache.org/downloads.html
Build Spark:
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m"
mvn -Dhadoop.version=2.3.0 -DskipTests clean package
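Note that the command above builds without Hive support. Since spark-sql is pointed at a Hive metastore later, a source build would normally also enable the Hive profiles (these profiles exist in Spark 1.5.1; the prebuilt spark-1.5.1-bin-hadoop2.3 distribution referenced below already includes them):
mvn -Dhadoop.version=2.3.0 -Phive -Phive-thriftserver -DskipTests clean package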
Edit conf/spark-env.sh.
Since Spark must reach HDFS (with LZO-compressed data) and Hive (with a MySQL metastore), put the hadoop-lzo jar and the MySQL JDBC connector on the classpath:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/hadoop/hadoop-2.3.0-cdh5.1.0/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/data/hadoop/hadoop-2.3.0-cdh5.1.0/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/hadoop/spark/spark-1.5.1-bin-hadoop2.3/lib/mysql-connector-java-5.1.20-bin.jar
export SPARK_PID_DIR=/data/hadoop/spark/spark-1.5.1-bin-hadoop2.3
Create the slaves file:
host136
host137
host138
Copy these configuration files into conf/:
core-site.xml
hdfs-site.xml
hive-site.xml
Edit hive-site.xml to point the metastore warehouse at Tachyon:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>tachyon://host136:19998/warehouse</value>
</property>
2. Tachyon
Download Tachyon: http://www.tachyon-project.org/
Build Tachyon:
mvn clean package -Djava.version=1.7 -Dhadoop.version=2.3.0 -DskipTests
Edit conf/tachyon-env.sh:
# RamFS mount directory for in-memory data
export TACHYON_RAM_FOLDER=/data/hadoop/spark/tachyon-0.7.1-2.3/data
if [ -z "$JAVA_HOME" ]; then
export JAVA_HOME="$(dirname $(which java))/.."
fi
export JAVA="${JAVA_HOME}/bin/java"
export TACHYON_MASTER_ADDRESS=host136
#hdfs
export TACHYON_UNDERFS_ADDRESS=hdfs://mycluster
#-Dtachyon.master.journal.folder=hdfs://mycluster/tachyon/journal/
#export TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-hdfs://localhost:9000}
# memory size per worker
export TACHYON_WORKER_MEMORY_SIZE=20GB
export TACHYON_UNDERFS_HDFS_IMPL=org.apache.hadoop.hdfs.DistributedFileSystem
export TACHYON_WORKER_MAX_WORKER_THREADS=2048
export TACHYON_MASTER_MAX_WORKER_THREADS=2048
export TACHYON_SSH_FOREGROUND="yes"
export TACHYON_WORKER_SLEEP="0.02"
Multiple storage tiers (levels) can be configured; a two-tier sketch follows the block below.
export TACHYON_JAVA_OPTS+="
-Dlog4j.configuration=file:$CONF_DIR/log4j.properties
-Dtachyon.debug=false
-Dtachyon.worker.tieredstore.level.max=1
-Dtachyon.worker.tieredstore.level0.alias=MEM
-Dtachyon.worker.tieredstore.level0.dirs.path=$TACHYON_RAM_FOLDER
-Dtachyon.worker.tieredstore.level0.dirs.quota=$TACHYON_WORKER_MEMORY_SIZE
-Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
-Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
-Dtachyon.worker.max.worker.threads=$TACHYON_WORKER_MAX_WORKER_THREADS
-Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
-Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
-Dtachyon.worker.data.folder=/tachyonworker/
-Dtachyon.master.max.worker.threads=$TACHYON_MASTER_MAX_WORKER_THREADS
-Dtachyon.master.worker.timeout.ms=60000
-Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
-Dtachyon.master.journal.folder=$TACHYON_HOME/journal/
-Dorg.apache.jasper.compiler.disablejsr199=true
-Djava.net.preferIPv4Stack=true
"
Create the slaves (workers) file:
host136
host137
host138
Copy these configuration files into conf/:
hdfs-site.xml
core-site.xml
Add the following to core-site.xml so Hadoop clients can resolve the tachyon:// and tachyon-ft:// schemes:
<property>
  <name>fs.tachyon.impl</name>
  <value>tachyon.hadoop.TFS</value>
</property>
<property>
  <name>fs.tachyon-ft.impl</name>
  <value>tachyon.hadoop.TFSFT</value>
</property>
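With those properties in place (and the Tachyon client jar on the Hadoop classpath, which this sketch assumes), the scheme can be exercised from the ordinary Hadoop shell:
hadoop fs -ls tachyon://host136:19998/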
3. Start the services:
Tachyon:
The user needs sudo privileges, or the Tachyon mount step must be run as root.
If you do not want to remount the RamFS every time Tachyon starts, first mount it on all workers once with bin/tachyon-mount.sh Mount workers or bin/tachyon-mount.sh SudoMount workers, then start Tachyon with bin/tachyon-start.sh all NoMount.
bin/tachyon format                        # format the journal and worker storage
bin/tachyon-mount.sh SudoMount workers    # mount RamFS on all workers (needs sudo)
bin/tachyon-start.sh all NoMount          # start master and workers without remounting
bin/tachyon-stop.sh                       # stop the whole cluster
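To check that the cluster came up healthy, Tachyon ships a small sanity-test suite and a file-system shell:
bin/tachyon runTests    # write/read sanity tests against the running cluster
bin/tachyon tfs ls /    # list the root of the Tachyon namespace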
When adding a new Tachyon node, run on that node:
bin/tachyon formatWorker                  # format only this worker's storage
bin/tachyon-mount.sh SudoMount local      # mount RamFS on this node
bin/tachyon-start.sh worker NoMount       # start the local worker
Spark:
bin/spark-sql --master spark://host136:7077
bin/spark-shell --master spark://host136:7077
4. Using Tachyon from Spark
Read HDFS data through Tachyon directly from Spark:
val textFile = sc.textFile("tachyon://host136:19998/hdfspath")
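Writing back through Tachyon works the same way. A spark-shell sketch (the word count and the /output path are just illustrative):
val counts = textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("tachyon://host136:19998/output")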
Create a Hive table on Tachyon from Spark:
CREATE EXTERNAL TABLE `test4`(
`aaa` string,
`ccc` string,
`bbb` string)
PARTITIONED BY (
`day_id` string,
`hour_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'tachyon://host136:19998/tmp/storm2'
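Once the table exists, partitions and queries behave as usual, with the data landing under the Tachyon LOCATION (the partition values here are hypothetical):
ALTER TABLE test4 ADD PARTITION (day_id='20151011', hour_id='00');
SELECT aaa, bbb FROM test4 WHERE day_id='20151011' LIMIT 10;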
Web UIs:
http://tachyonmasterIP:19999/home
http://sparkIP:8099/