Variables written as ${} below must be replaced with values appropriate to your own environment.
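As an illustration only, the placeholders might map to values like the following; every path and hostname here is a hypothetical example, and in the .properties files the values still have to be substituted by hand:
# Hypothetical example values for the ${} placeholders (adjust to your cluster)
export ATLAS_HOME=/opt/module/atlas
export HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HOST_NAME=hadoop102
export zk_hostname1=hadoop102 zk_hostname2=hadoop103 zk_hostname3=hadoop104
export kafka_hostname1=hadoop102 kafka_hostname2=hadoop103 kafka_hostname3=hadoop104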
A. Software dependencies
Apache Hadoop
Apache Zookeeper
Apache Kafka
Apache HBase
Apache Solr
Apache Hive
B. Command-line dependencies
tar
zip
A. Upload the installation tarball to the server and extract it
tar -zxf apache-atlas-2.0.0-bin.tar.gz
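A possible layout after extraction, assuming the archive unpacks to a directory named apache-atlas-2.0.0 (the directory name may differ depending on how the package was built):
# Move the extracted directory to the location referenced by ${ATLAS_HOME}
mv apache-atlas-2.0.0 ${ATLAS_HOME}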
B. Integrate with HBase
a. Modify the configuration in atlas-application.properties
vim atlas-application.properties
# ZooKeeper quorum of the HBase cluster that backs Atlas graph storage
atlas.graph.storage.hostname=${zk_hostname1}:2181,${zk_hostname2}:2181,${zk_hostname3}:2181
b. Link the HBase configuration directory into ${ATLAS_HOME}
ln -s ${HBASE_HOME}/conf/ ${ATLAS_HOME}/conf/hbase/
c. Add HBASE_CONF_DIR to atlas-env.sh
vim atlas-env.sh
export HBASE_CONF_DIR=${ATLAS_HOME}/conf/hbase/conf
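A quick sanity check for this step (a sketch; it only confirms that the link resolves and that the property was written):
# The linked directory should show the live HBase client configuration (hbase-site.xml etc.)
ls -l ${ATLAS_HOME}/conf/hbase/conf
grep atlas.graph.storage.hostname ${ATLAS_HOME}/conf/atlas-application.properties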
C. Integrate with Solr
a. Modify the configuration in atlas-application.properties
vim atlas-application.properties
# Update the following setting
atlas.graph.index.search.solr.zookeeper-url=${zk_hostname1}:2181/solr
b. Create the Atlas Solr configuration instance (instancedir)
solrctl instancedir --create atlas ${ATLAS_HOME}/conf/solr
c. Create the collections
solrctl collection --create vertex_index -s 1 -c atlas -r 1
solrctl collection --create edge_index -s 1 -c atlas -r 1
solrctl collection --create fulltext_index -s 1 -c atlas -r 1
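An optional check that the three collections were created, assuming the same CDH solrctl client used above:
# Should list vertex_index, edge_index and fulltext_index among the collections
solrctl collection --list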
D. Integrate with Kafka
a. Modify the configuration in atlas-application.properties
vim atlas-application.properties
######### Notification Configs #########
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=${zk_hostname1}:2181,${zk_hostname2}:2181,${zk_hostname3}:2181
atlas.kafka.bootstrap.servers=${kafka_hostname1}:9092,${kafka_hostname2}:9092,${kafka_hostname3}:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.enable.auto.commit=true
b. Create the required topics on the running Kafka cluster
kafka-topics --zookeeper ${zk_hostname1}:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
kafka-topics --zookeeper ${zk_hostname1}:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
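An optional check on the topics just created, using the same ZooKeeper address as above:
# Both ATLAS_HOOK and ATLAS_ENTITIES should appear, each with 3 partitions and replication factor 3
kafka-topics --zookeeper ${zk_hostname1}:2181 --list
kafka-topics --zookeeper ${zk_hostname1}:2181 --describe --topic ATLAS_HOOK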
E. Atlas server configuration
a. Modify the configuration in atlas-application.properties
vim atlas-application.properties
######### Server Properties #########
atlas.rest.address=http://hadoop102:21001
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.zookeeper.quorum=${zk_hostname1}:2181,${zk_hostname2}:2181,${zk_hostname3}:2181
F. Integrate with Hive
a. Modify the configuration in atlas-application.properties
vim atlas-application.properties
######### Hive Hook Configs #########
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
b. Copy the atlas-application.properties file into the ${HIVE_HOME}/conf directory
cp ${ATLAS_HOME}/conf/atlas-application.properties ${HIVE_HOME}/conf
c. Add the atlas-application.properties file to atlas-plugin-classloader-2.0.0.jar (the file must sit at the root of the jar, so change into the conf directory before running zip)
cd ${ATLAS_HOME}/conf
zip -u ${ATLAS_HOME}/hook/hive/atlas-plugin-classloader-2.0.0.jar atlas-application.properties
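A quick way to confirm the properties file now sits at the root of the jar:
# The entry should appear as atlas-application.properties with no leading directories
unzip -l ${ATLAS_HOME}/hook/hive/atlas-plugin-classloader-2.0.0.jar | grep atlas-application.properties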
d. Configure Hive in CDH (Cloudera Manager)
Hive Auxiliary JARs Directory:
    ${ATLAS_HOME}/hook/hive
Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh:
    HIVE_AUX_JARS_PATH=${ATLAS_HOME}/hook/hive
HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml:
    hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook
    atlas.cluster.name=primary
    hive.reloadable.aux.jars.path=${ATLAS_HOME}/hook/hive
e. Restart the Hive-related services from the Cloudera Manager UI
G. Distribute the Atlas package to all nodes
Start Atlas:
cd ${ATLAS_HOME}
bin/atlas_start.py
Web UI: http://${HOST_NAME}:21001
Log in with user: admin, password: admin
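The REST API can also be used to confirm that the server is up; a minimal sketch using Atlas's admin version endpoint:
# Returns the Atlas version as JSON once startup has completed
curl -u admin:admin http://${HOST_NAME}:21001/api/atlas/admin/version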
Import Hive metadata into Atlas:
cd ${ATLAS_HOME}
bin/import-hive.sh
# Note: in some Atlas 2.0.0 layouts this script is shipped under hook-bin/ instead of bin/
Enter the credentials when prompted: user: admin, password: admin
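The import can also be spot-checked over the REST API instead of the UI; a sketch using the v2 basic search endpoint:
# Should return the imported hive_db entities
curl -u admin:admin "http://${HOST_NAME}:21001/api/atlas/v2/search/basic?typeName=hive_db"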
A. Check that the Atlas Web UI runs properly
Visit http://${HOST_NAME}:21001
B. Check that the Hive metadata was imported correctly
Log in to the Atlas Web UI and select hive_db from the search drop-down menu to confirm that the Hive metadata has been imported.
C. Run an ODS-layer workflow and confirm it behaves normally
Points to verify:
a. Sqoop imports data from MySQL into Hive correctly
b. Hive jobs run normally
c. The table's lineage is visible in the Atlas web UI (see the sketch after this list)
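A minimal way to produce a lineage that should become visible in Atlas, assuming a hypothetical source table ods_demo already exists in Hive; the table name and the ${hive_server2_host} placeholder are illustrative only:
# A CTAS statement fires the HiveHook and registers lineage ods_demo -> ods_demo_copy
beeline -u "jdbc:hive2://${hive_server2_host}:10000" -e "create table ods_demo_copy as select * from ods_demo;"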
If the validation tests in step 5 above cannot be passed and the underlying problems cannot be resolved within a reasonable time, roll back the Hive-related configuration and restart the Hive services so that production workloads are not affected.