CarbonData 1.1.0 Installation Guide

1. Concept

CarbonData is an indexed columnar data format for fast analytics on big-data platforms such as Hadoop and Spark. Put simply: it is a data format!
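To make that concrete, here is a one-line sketch of what using CarbonData from Spark SQL looks like once it is installed. This is illustrative only: carbon is a CarbonSession (creating one is covered in section 6), and the table name is made up.

// `carbon` is a CarbonSession (see section 6). The STORED BY clause is
// what makes Spark write this table in the carbondata columnar format.
carbon.sql("CREATE TABLE demo (id INT, name STRING) STORED BY 'carbondata'")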

2. Building CarbonData

Because CarbonData was open-sourced only recently, the official documentation is still sparse and loosely organized.

2.1 Prerequisites

OS: CentOS (or another Unix-like OS)
Apache Maven (version 3.3 or later recommended)
Oracle Java 7 or 8
Apache Thrift 0.9.3
All of the above are required.

2.2 Download

Get CarbonData with git, or download a released version from the official site.
URL: https://dist.apache.org/repos/dist/release/carbondata/1.1.0/

3. Build Commands

Go into the carbondata directory and run the build command.
The build skips the tests; by default CarbonData is built against Spark 1.6.2:

mvn -DskipTests clean package    (default)

CarbonData can also be built against other Spark versions (the following are currently supported):


mvn -DskipTests -Pspark-1.5 -Dspark.version=1.5.1 clean package
mvn -DskipTests -Pspark-1.5 -Dspark.version=1.5.2 clean package
mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 clean package 
mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.2 clean package 
mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.3 clean package    
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package

4. Compiling

lcc@lcc carbondata-parent-1.1.0$ pwd
/Users/lcc/soft/carbondata/carbondata-parent-1.1.0
lcc@lcc carbondata-parent-1.1.0$ mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package
...
[INFO] Reactor Summary:
[INFO]
[INFO] Apache CarbonData :: Parent ........................ SUCCESS [  6.080 s]
[INFO] Apache CarbonData :: Common ........................ SUCCESS [ 13.184 s]
[INFO] Apache CarbonData :: Core .......................... SUCCESS [ 29.356 s]
[INFO] Apache CarbonData :: Processing .................... SUCCESS [ 10.520 s]
[INFO] Apache CarbonData :: Hadoop ........................ SUCCESS [  7.743 s]
[INFO] Apache CarbonData :: Spark Common .................. SUCCESS [01:45 min]
[INFO] Apache CarbonData :: Spark2 ........................ SUCCESS [01:58 min]
[INFO] Apache CarbonData :: Spark Common Test ............. SUCCESS [ 46.705 s]
[INFO] Apache CarbonData :: Assembly ...................... SUCCESS [  6.791 s]
[INFO] Apache CarbonData :: Spark2 Examples ............... SUCCESS [ 19.808 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:05 min
[INFO] Finished at: 2018-09-26T16:08:08+08:00
[INFO] Final Memory: 92M/810M
[INFO] ------------------------------------------------------------------------

lcc@lcc carbondata-parent-1.1.0$ ll assembly/target/scala-2.11/
total 19512
drwxr-xr-x  3 lcc  staff       96  9 26 16:07 ./
drwxr-xr-x  5 lcc  staff      160  9 26 16:07 ../
-rw-r--r--  1 lcc  staff  9986219  9 26 16:07 carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar

The build succeeded right away.

5. Copy and Install

  1. Copy ./assembly/target/scala-2.1x/carbondata_xxx.jar to the SPARK_HOME/carbonlib folder. Note: if the carbonlib folder does not exist under SPARK_HOME, create it.
  2. Add the carbonlib folder to the Spark classpath: edit the SPARK_HOME/conf/spark-env.sh file and append SPARK_HOME/carbonlib/* to the existing value of SPARK_CLASSPATH.
  3. Copy the ./conf/carbon.properties.template file from the CarbonData repository to the $SPARK_HOME/conf/ folder and rename it to carbon.properties.
  4. Repeat steps 1 to 3 on every node of the cluster.
  5. On the Spark master node, configure the properties listed below in the $SPARK_HOME/conf/spark-defaults.conf file.

    spark.driver.extraJavaOptions
    -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
    Extra JVM options to pass to the driver, for example GC settings or other logging.

    spark.executor.extraJavaOptions
    -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
    Extra JVM options to pass to the executors, for example GC settings or other logging. Note: you can enter multiple values separated by spaces.

lcc@lcc carbondata-parent-1.1.0$ mkdir /Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib
lcc@lcc carbondata-parent-1.1.0$ cp assembly/target/scala-2.11/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar /Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib
lcc@lcc carbondata-parent-1.1.0$
lcc@lcc carbondata-parent-1.1.0$ cp conf/carbon.properties.template  /Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/conf/carbon.properties

lcc@lcc spark-2.0.1-bin-hadoop2.7$ vim conf/spark-env.sh
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
HADOOP_CONF_DIR=/Users/lcc/soft/hadoop/hadoop/etc/hadoop
SCALA_HOME=/Users/lcc/soft/scala/scala-2.12.6

SPARK_MASTER_HOST=lcc
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080

SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=512M
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1

#spark.executor.extraClassPath
#spark.driver.extraClassPath=/Users/lcc/IdeaProjects/spark-authorizer/spark-auth/target/*

SPARK_CLASSPATH=$SPARK_HOME/carbonlib/*

lcc@lcc spark-2.0.1-bin-hadoop2.7$ vim conf/spark-defaults.conf
spark.driver.extraJavaOptions=-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
spark.executor.extraJavaOptions=-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties

(Note: the -D option must not contain spaces around its inner "=". Also, spark-defaults.conf does not expand environment variables, so replace $SPARK_HOME with the actual Spark installation path.)
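As the spark-shell output in section 6 shows, SPARK_CLASSPATH is deprecated and Spark rewrites it into spark.driver.extraClassPath and spark.executor.extraClassPath as a workaround. If you want to avoid the warning, an equivalent setup (a sketch using this machine's paths) is to drop SPARK_CLASSPATH from spark-env.sh and add to spark-defaults.conf:

spark.driver.extraClassPath=/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/*
spark.executor.extraClassPath=/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/*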


lcc@lcc spark-2.0.1-bin-hadoop2.7$ vi conf/carbon.properties
carbon.storelocation=hdfs://lcc:9000/Opt/CarbonStore

Here carbon.storelocation is the HDFS path under which CarbonData keeps its table data.

6. Verifying the Installation

Run spark-shell directly:

lcc@lcc spark-2.0.1-bin-hadoop2.7$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/26 17:34:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/26 17:34:44 WARN SparkConf:
SPARK_CLASSPATH was detected (set to '/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/*').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

18/09/26 17:34:44 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/*' as a work-around.
18/09/26 17:34:44 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/Users/lcc/soft/spark/spark-2.0.1-bin-hadoop2.7/carbonlib/*' as a work-around.
18/09/26 17:34:44 WARN SparkConf:
SPARK_WORKER_INSTANCES was detected (set to '1').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --num-executors to specify the number of executors
 - Or set SPARK_EXECUTOR_INSTANCES
 - spark.executor.instances to configure the number of instances in the spark config.

18/09/26 17:34:44 WARN Utils: Your hostname, lcc resolves to a loopback address: 127.0.0.1; using 192.168.1.184 instead (on interface en0)
18/09/26 17:34:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/09/26 17:34:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/09/26 17:34:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.1.184:4041
Spark context available as 'sc' (master = local[*], app id = local-1537954485827).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://lcc:9000/Opt/CarbonStore")
18/09/26 17:36:04 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@708dfe10

scala>

The important thing is the output above, in particular the line showing that the CarbonSession was created (carbon: org.apache.spark.sql.CarbonSession@708dfe10). If you see it, the installation works. Note that I had started neither Hadoop nor Hive, and I did not test any further.
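For completeness, a minimal smoke test along the lines of the official quick start would continue in the same shell roughly as below. This was not run here: the CSV path is hypothetical, sample.csv is a file you create yourself with a header row id,name,city,age followed by data rows, and HDFS must actually be running since carbon.storelocation points at it.

scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id STRING, name STRING, city STRING, age INT) STORED BY 'carbondata'")

scala> carbon.sql("LOAD DATA INPATH 'hdfs://lcc:9000/tmp/sample.csv' INTO TABLE test_table")

scala> carbon.sql("SELECT * FROM test_table").show()

scala> carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()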

Reference:
https://blog.csdn.net/u013181284/article/details/73331170
That post says the build requires a network connection; in practice it does not.
