CDH Cluster Component Configuration

Contents

Technical Components

Flume

Hive

Hue

Spark

Yarn

Other


Three machines, each with 4 CPU cores and 32 GB of RAM

Technical Components

[Figure 1]

Flume

Agent name: agent1

Configuration file:

# Name the agent's sources, sinks, and channels
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1

# Configure the Kafka source
agent1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.r1.batchSize = 1000
agent1.sources.r1.batchDurationMillis = 1000
agent1.sources.r1.kafka.bootstrap.servers = xx.xx.xx.xx:9092,xx.xx.xx.xx:9092,xx.xx.xx.xx:9092
agent1.sources.r1.kafka.consumer.group.id = sync-bigdata
agent1.sources.r1.kafka.topics.regex = ^[a-zA-Z0-9\\-]+-sync-bigdata$
agent1.sources.r1.kafka.consumer.request.timeout.ms = 80000
agent1.sources.r1.kafka.consumer.fetch.max.wait.ms=7000
agent1.sources.r1.kafka.consumer.session.timeout.ms = 70000
agent1.sources.r1.kafka.consumer.heartbeat.interval.ms = 60000
agent1.sources.r1.kafka.consumer.enable.auto.commit = false
# Configure the interceptor
agent1.sources.r1.interceptors = i1
agent1.sources.r1.interceptors.i1.type = com.cn.bigdata.flume.FlumeInterceptor$Builder

# Configure the channel type and buffering
agent1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.c1.kafka.bootstrap.servers = xx.xx.xx.xx:9092,xx.xx.xx.xx:9092,xx.xx.xx.xx:9092
agent1.channels.c1.kafka.topic = kafka-channel-hdfs
agent1.channels.c1.kafka.consumer.group.id = kafka-channel
agent1.channels.c1.kafka.consumer.request.timeout.ms = 80000
agent1.channels.c1.kafka.consumer.fetch.max.wait.ms=7000
agent1.channels.c1.kafka.consumer.session.timeout.ms = 70000
agent1.channels.c1.kafka.consumer.heartbeat.interval.ms = 60000
agent1.channels.c1.kafka.consumer.enable.auto.commit = false

# Configure the HDFS sink
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = hdfs://cdh-xxx-xxx-hue/user/hive/warehouse/ods_tmp_t.db/o_flume_kafka_data_origin/dt=%{eventDate}
agent1.sinks.k1.hdfs.filePrefix = log_%Y%m%d_%H
agent1.sinks.k1.hdfs.fileType=DataStream
agent1.sinks.k1.hdfs.rollCount = 0
agent1.sinks.k1.hdfs.rollSize = 134217728
agent1.sinks.k1.hdfs.rollInterval = 600
agent1.sinks.k1.hdfs.batchSize = 100
agent1.sinks.k1.hdfs.threadsPoolSize = 10
agent1.sinks.k1.hdfs.idleTimeout = 0
agent1.sinks.k1.hdfs.minBlockReplicas = 1
agent1.sinks.k1.hdfs.useLocalTimeStamp = true
agent1.sinks.k1.hdfs.timeZone = Asia/Shanghai

# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
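Under CDH this agent would normally be managed by Cloudera Manager, but the configuration above can be smoke-tested standalone with the flume-ng launcher. The config directory and file name below are assumptions; adjust them to wherever the agent1 configuration is saved:

```shell
# Run agent1 in the foreground with console logging for testing
# (paths are placeholders for this deployment).
flume-ng agent \
  --name agent1 \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/agent1.conf \
  -Dflume.root.logger=INFO,console
```

The agent name passed to `--name` must match the `agent1.` prefix used throughout the properties file, otherwise no components are started.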

Hive

Hive Auxiliary JAR Directory

/etc/hive/auxlib

Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml


<property>
    <name>hive.spark.client.future.timeout</name>
    <value>1800</value>
    <description>Timeout for requests from Hive client to remote Spark driver.</description>
</property>
<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>30000</value>
    <description>Timeout for remote Spark driver in connecting back to Hive client.</description>
</property>
<property>
    <name>hive.spark.client.server.connect.timeout</name>
    <value>300000</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>
<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.enforce.bucketing</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
    <description>Enable non-strict dynamic partitioning</description>
</property>
<property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
</property>
<property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>false</value>
</property>
<property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive/.hive-staging</value>
</property>
<property>
    <name>HIVE_AUXLIB_JARS_PATH</name>
    <value>/etc/hive/auxlib</value>
    <description>Directory holding Hive auxiliary JARs</description>
</property>
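The transaction settings above (DbTxnManager, concurrency support, enforced bucketing, and the compactor) are the usual prerequisites for Hive ACID tables. As a sketch of what they enable, a transactional table could be created via beeline; the JDBC URL, database, and table names below are assumptions for illustration only:

```shell
# Hypothetical ACID table; ORC storage and bucketing are required
# for Hive transactional tables. URL and names are placeholders.
beeline -u "jdbc:hive2://localhost:10000" -e "
CREATE TABLE ods_tmp_t.demo_acid (
  id BIGINT,
  payload STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');"
```

Without hive.txn.manager set to DbTxnManager and hive.support.concurrency enabled, such a CREATE TABLE fails; without hive.compactor.initiator.on, delta files from writes are never compacted.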

Default:

[Figure 2]

Actual:

[Figure 3]

[Figure 4]

Metastore Character Set Settings

1. Log in to MySQL and check the current encoding of the hive database:

show create database hive;

2. If it is utf8, run the following SQL to change the hive database's default encoding to latin1:

alter database hive default character set latin1;

3. Run the following SQL to change the character set of the comment columns for tables, columns, partitions, and indexes:

use hive;
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
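To confirm the changes took effect, the database default and one of the altered comment columns can be checked from the mysql client (the user name is a placeholder; use your metastore credentials):

```shell
# Verify the database default encoding is now latin1
mysql -u hive -p -e "show create database hive;"

# Verify the COMMENT column of COLUMNS_V2 is now utf8
mysql -u hive -p -e "show full columns from COLUMNS_V2 like 'COMMENT';" hive
```

After this, Chinese comments written via COMMENT '...' in Hive DDL display correctly instead of as question marks.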

UDFs and Extensions

[Figure 5]

  •   Upload these 3 JARs to the corresponding HDFS directory
  •   Copy the json-serde JAR into /etc/hive/auxlib/ on the server
  •   create function json_array as 'com.cn.bigdata.hive.func.JsonArray' using jar "hdfs:///tmp/udf/lib/json-array-1.0-SNAPSHOT.jar";
  •   Copy the flume-interceptor JAR into /opt/cloudera/parcels/CDH/lib/flume-ng/lib
  •   Restart Hive and Flume
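On the command line, the deployment steps above might look like the following. Apart from json-array-1.0-SNAPSHOT.jar, which appears in the CREATE FUNCTION statement, the JAR file names are placeholders since the exact versions are not given:

```shell
# Upload the UDF JAR to the HDFS path referenced by CREATE FUNCTION
hdfs dfs -mkdir -p /tmp/udf/lib
hdfs dfs -put json-array-1.0-SNAPSHOT.jar /tmp/udf/lib/

# Put the SerDe on the Hive auxiliary classpath (file name is a placeholder)
cp json-serde-<version>.jar /etc/hive/auxlib/

# Put the interceptor on Flume's classpath (file name is a placeholder)
cp flume-interceptor-<version>.jar /opt/cloudera/parcels/CDH/lib/flume-ng/lib/

# Finally, restart the Hive and Flume services from Cloudera Manager
```

The aux-lib and Flume lib copies must be repeated on every node that runs the respective service, otherwise queries or agents on the other nodes fail with ClassNotFoundException.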

Hue

[Figure 6]

Sentry

[Figure 7]

[Figure 8]

[Figure 9]

[Figure 10]

Spark

[Figure 11]

Yarn

[Figure 12]

[Figure 13]

Other

All log-related directories are prefixed with /data (e.g. the original /var/log/flume-ng becomes /data/var/log/flume-ng).
