Docker下的Apache-Atlas元数据治理组件安装

  • 一、Docker镜像制作
    • 组件版本
    • 一、安装JDK
    • 二、安装MAVEN
    • 安装zookeeper
    • 安装hadoop
    • 安装hbase,在hive之前
    • 安装hive
    • 安装Kafka
    • 安装solr
    • 三、安装atlas
    • 集成hbase
    • 完结撒花!!!
  • 二、Docker镜像启动
      • 1. 加载镜像
      • 2. 创建容器
      • 3. 快速启动
      • 4. 访问端口
  • 附录
    • 1. 常用命令集合

操作环境:

  • 操作系统:Windows10
  • Docker Desktop:4.10.1
  • Docker version: 20.10.17, build 100c701

一、Docker镜像制作

组件版本

组件名称 组件版本

Hadoop 3.2.1
Hive 3.1.2
Hbase 2.3.4
Zookeeper 3.5.9
Kafka 2.6.2
Solr 7.4.0
Atlas 2.1.0
jdk 1.8
python 2.7
Maven 3.6.3

步骤一
在三个节点中执行下面命令,生成密钥文件

ssh-keygen

执行命令后会要求确认密钥文件的存储位置(默认~/.ssh/),这个过程直接按“Enter”键即可。id_rsa是本机私钥文件,id_rsa.pub是本机公钥文件。

步骤二
分别在三个节点中执行下面命令:

ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03

这个过程会要求输入yes或者no,这里直接输入yes,然后输入主机密码

步骤三
在各节点用以下命令测试ssh免密登录

ssh hadoop01
ssh hadoop02
ssh hadoop03
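
以上三步也可以合并成下面的示意脚本,在每个节点各执行一次(仅供参考:假设主机名解析已配置,-N "" 表示空密码短语,输入目标主机密码的环节仍需交互):

# 免交互生成密钥(已存在则跳过),再把公钥分发到三个节点
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in hadoop01 hadoop02 hadoop03; do
    ssh-copy-id -o StrictHostKeyChecking=no "$host"   # 首次连接自动接受主机指纹
done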

安装node
1.下载解压
wget https://cdn.npm.taobao.org/dist/node/v12.16.2/node-v12.16.2-linux-x64.tar.xz
tar -xf node-v12.16.2-linux-x64.tar.xz
cd node-v12.16.2-linux-x64/bin
./node -v

2.添加环境变量(NODE_HOME指向node的解压目录)
export PATH=$PATH:$NODE_HOME/bin

一、安装JDK

# 1.下载解压jdk到指定目录(先创建好目录)
tar -zxvf {file-dir}/jdk-8u341-linux-x64.tar.gz -C /root/environments/ # {file-dir}为存放安装包的目录

# 2.配置环境变量
vim /etc/profile
export JAVA_HOME=/root/environments/jdk1.8.0_341
export PATH=$PATH:$JAVA_HOME/bin

# 3.刷新使环境变量生效
source /etc/profile

# 4.验证
java -version

二、安装MAVEN

maven下载地址:https://dlcdn.apache.org/maven/maven-3/

# 1.下载解压maven到指定目录(先创建好目录)
tar -zxvf {file-dir}/apache-maven-3.6.3-bin.tar.gz -C /root/environments/ # {file-dir}为存放安装包的目录

# 2.配置环境变量
export MVN_HOME=/root/environments/apache-maven-3.6.3
export PATH=$PATH:$MVN_HOME/bin

# 3.刷新使环境变量生效
source /etc/profile

# 4.验证
mvn -version 

# 5.配置maven仓库地址
vim $MVN_HOME/conf/settings.xml

	<mirror>
    	<id>alimaven</id>
    	<name>aliyun maven</name>
    	<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    	<mirrorOf>central</mirrorOf>
	</mirror>

    <mirror>
        <id>repo1</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo1.maven.org/maven2/</url>
    </mirror>

    <mirror>
        <id>repo2</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo2.maven.org/maven2/</url>
    </mirror>

maven读取配置文件时优先使用/root/.m2/(隐藏目录)下的settings.xml,因此创建/root/.m2目录,然后将配置文件复制过去

mkdir /root/.m2
cp $MVN_HOME/conf/settings.xml /root/.m2/

安装顺序:zookeeper、hadoop、hbase、hive、kafka、solr、atlas

安装zookeeper

所有组件的历史版本都可以在apache归档仓库中找到(例如hbase:https://archive.apache.org/dist/hbase/),国内镜像缺少很多版本,多为稳定版

 
tar -zxvf {file-dir}/apache-zookeeper-3.5.9-bin.tar.gz -C /root/environments/ # {file-dir}为存放安装包的目录

cd /root/environments/zookeeper-3.4.6/conf
将zoo_sample.cfg拷贝一份
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/root/environments/zookeeper-3.4.6/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888

创建环境变量

export ZK_HOME=/root/environments/zookeeper-3.4.6
export PATH=$PATH:$ZK_HOME/bin
source /etc/profile
创建data文件
mkdir /root/environments/zookeeper-3.4.6/data
cd /root/environments/zookeeper-3.4.6/data
touch myid && echo "1" > myid

然后将/root/environments/zookeeper-3.4.6整个文件夹拷贝到hadoop02、hadoop03并配置环境变量

scp -r /root/environments/zookeeper-3.4.6 hadoop02:/root/environments/
scp -r /root/environments/zookeeper-3.4.6 hadoop03:/root/environments/

并修改hadoop02、hadoop03机器上的/root/environments/zookeeper-3.4.6/data/myid文件(注意:三台的myid各不相同,01≠02≠03)

hadoop02   2
hadoop03   3
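
如果已配置免密登录,也可以在hadoop01上用下面的示意命令远程改写另外两台的myid:

ssh hadoop02 'echo 2 > /root/environments/zookeeper-3.4.6/data/myid'
ssh hadoop03 'echo 3 > /root/environments/zookeeper-3.4.6/data/myid'
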
3台机器上分别启动zk
zkServer.sh start

zkServer.sh status 查看状态

安装hadoop

1.解压

tar -zxvf {file-dir}/hadoop-3.1.1.tar.gz  -C /root/environments/ # {file-dir}为存放安装包的目录

2.加入环境变量

vi /etc/profile
#tip:在文件末尾追加
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# 使配置文件生效
source /etc/profile

#测试
hadoop version

3.需要编辑的文件都在/root/environments/hadoop-3.1.1/etc/hadoop目录下



core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>

    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

vi hadoop-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"

hdfs-site.xml
其中还设置hadoop01、hadoop02为NameNode(NN)

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop02:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop01:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop02:9870</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha-hadoop/journaldata</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
                sshfence
                shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

mapred-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>

    <property>
      <name>mapreduce.application.classpath</name>
      <value>
                /root/environments/hadoop-3.1.1/etc/hadoop,
                /root/environments/hadoop-3.1.1/share/hadoop/common/*,
                /root/environments/hadoop-3.1.1/share/hadoop/common/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/hdfs/*,
                /root/environments/hadoop-3.1.1/share/hadoop/hdfs/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/mapreduce/*,
                /root/environments/hadoop-3.1.1/share/hadoop/mapreduce/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/yarn/*,
                /root/environments/hadoop-3.1.1/share/hadoop/yarn/lib/*
      </value>
    </property>
</configuration>

yarn-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341

yarn-site.xml
其中还设置hadoop01,hadoop02为RM

<configuration>

    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>

</configuration>

workers

hadoop01
hadoop02
hadoop03

hadoop 3.x 以root用户启动时有用户校验,为避免因权限问题造成的启动失败,在如下文件中添加指定用户

vim /root/environments/hadoop-3.1.1/sbin/start-dfs.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-dfs.sh

添加
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs # 该变量已过时,新版本提示改用 HDFS_DATANODE_SECURE_USER
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root

vim /root/environments/hadoop-3.1.1/sbin/start-yarn.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-yarn.sh

添加
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn # 该变量已过时,新版本提示改用 HDFS_DATANODE_SECURE_USER
YARN_NODEMANAGER_USER=root

启动顺序:Zookeeper -> JournalNode -> 格式化NameNode -> 创建命名空间(格式化zkfc) -> NameNode -> DataNode -> ResourceManager -> NodeManager

3台机器上启动JournalNode

# 3台机器上启动JournalNode
cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh start journalnode  # 启动journalnode

在hadoop01上执行格式化namenode
同步hadoop02的配置(注意:此步只涉及hadoop01、hadoop02,01=02≠03)

#在hadoop01上执行
hadoop namenode -format
#将/data/hadoop/dfs/name目录下的内容拷贝到备用namenode主机
 
#如果备用namenode主机没有该目录就创建一个
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/name/

格式化zkfc,在两个namenode主机(hadoop01、hadoop02)上进行zkfc的格式化(01=02≠03)

./hdfs zkfc -formatZK

关闭JournalNode

#3台机器上关闭JournalNode
cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh stop journalnode

启动hadoop

#在hadoop01机器上执行:
start-all.sh

安装hbase,在hive之前

tar -xzvf hbase-2.0.2-bin.tar.gz -C /root/environments/
 

hbase-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CLASSPATH=/root/environments/hadoop-3.1.1/etc/hadoop
export HBASE_MANAGES_ZK=false # 使用自己安装的zookeeper。 一定要加这个,不使用自带的zookeeper,否则自己的zookeeper就无法启动了

hbase-site.xml

<configuration>
	<property>
	        <name>hbase.rootdir</name>
	        <value>hdfs://mycluster/hbase</value>
	</property>
	<property>
	        <name>hbase.master</name>
	        <value>8020</value>
	</property>
	<property>
	        <name>hbase.zookeeper.quorum</name>
	        <value>hadoop01,hadoop02,hadoop03</value>
	</property>
	<property>
	        <name>hbase.zookeeper.property.clientPort</name>
	        <value>2181</value>
	</property>
	<property>
	        <name>hbase.zookeeper.property.dataDir</name>
	        <value>/root/environments/zookeeper-3.4.6/conf</value>
	</property>
	<property>
	        <name>hbase.tmp.dir</name>
	        <value>/var/hbase/tmp</value>
	</property>
	<property>
	        <name>hbase.cluster.distributed</name>
	        <value>true</value>
	</property>
</configuration>

regionservers

hadoop01
hadoop02
hadoop03

Hbase启动高可用需要编辑文件backup-masters(里面添加备用的HMaster的主机)

vim backup-masters

hadoop03

配置环境变量

export HBASE_HOME=/root/environments/hbase-2.0.2
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile

拷贝到其他节点

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/hbase-2.0.2 hadoop02:/root/environments/
scp -r /root/environments/hbase-2.0.2 hadoop03:/root/environments/

在 HMaster 节点启动,想让谁做HMaster 就在谁上面启动,本例中适合在hadoop01或hadoop02上启动。因为hadoop03是备用HMaster

start-hbase.sh

yarn rmadmin -getAllServiceState
查看http://hadoop03:16010/master-status

安装hive

mysql安装
略

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin apache-hive-3.1.2

需要编辑的文件都在/root/environments/apache-hive-3.1.0/conf目录下

vi hive-env.sh
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export HIVE_CONF_DIR=/root/environments/apache-hive-3.1.0/conf

hive-site.xml

<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://mysql57:3307/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
	</property>

	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>

	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123456</value>
	</property>

	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/user/hive/warehouse</value>
	</property>

	<property>
		<name>hive.exec.scratchdir</name>
		<value>/user/hive/tmp</value>
	</property>

	<property>
		<name>hive.querylog.location</name>
		<value>/user/hive/log</value>
	</property>

	<property>
	  <name>hive.metastore.local</name>
	  <value>false</value>
	</property>
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://hadoop01:9083</value>
	</property>

	<property>
		<name>hive.server2.thrift.port</name>
		<value>10000</value>
	</property>
	<property>
		<name>hive.server2.thrift.bind.host</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hive.server2.webui.host</name>
		<value>0.0.0.0</value>
	</property>

	<property>
		<name>hive.server2.webui.port</name>
		<value>10002</value>
	</property>

	<property>
		<name>hive.server2.long.polling.timeout</name>
		<value>5000</value>
	</property>

	<property>
		<name>hive.server2.enable.doAs</name>
		<value>true</value>
	</property>

	<property>
		<name>datanucleus.autoCreateSchema</name>
		<value>false</value>
	</property>

	<property>
		<name>datanucleus.fixedDatastore</name>
		<value>true</value>
	</property>

	<property>
		<name>hive.execution.engine</name>
		<value>mr</value>
	</property>
</configuration>

将mysql的驱动jar包上传到hive的lib目录下
https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20

配置环境变量

export HIVE_HOME=/root/environments/apache-hive-3.1.0
export PATH=$PATH:$HIVE_HOME/bin

刷新

source /etc/profile

初始化hive的元数据库

schematool -dbType mysql -initSchema

启动hive的metastore(重要;不知道为什么依赖hbase,应该是我看错了)

hive --service metastore 
hive --service metastore & #后台启动

使用ps查看metastore服务是否起来

ps -ef | grep metastore # ps -ef表示查看全格式的全部进程。 -e 显示所有进程。-f 全格式。-h 不显示标题。-l 长格式。-w 宽输出

进入hive进行验证

hive
命令: create database filetest;
show databases;
切换filetest数据库:use filetest;

将hive安装目录进行分发(目的是所有机器都可以使用hive,不需要修改任何配置)

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/apache-hive-3.1.0  hadoop02:/root/environments/
scp -r /root/environments/apache-hive-3.1.0  hadoop03:/root/environments/

并刷新

source /etc/profile

安装Kafka

 tar -xzvf kafka_2.12-2.0.0.tgz -C /root/environments/
 #需要编辑的文件都在/root/environments/kafka_2.12-2.0.0/config目录下

修改server.properties中的

broker.id=1
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181

修改zookeeper.properties(未做修改)

dataDir=/home/hadoop/data/zookeeper/zkdata
clientPort=2181

修改consumer.properties(未做修改)

zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181

修改producer.properties(未做修改)

metadata.broker.list=hadoop01:9092,hadoop02:9092,hadoop03:9092

配置环境变量

export KAFKA_HOME=/root/environments/kafka_2.12-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin

刷新

source /etc/profile

将kafka目录分发到其余的机器,并修改kafka_2.12-2.0.0/config/server.properties文件中broker.id的值(注意:三台各不相同,01≠02≠03)

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop02:/root/environments/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop03:/root/environments/

并刷新

source /etc/profile
vim /root/environments/kafka_2.12-2.0.0/config/server.properties
# hadoop02 机器上改为 broker.id=2
# hadoop03 机器上改为 broker.id=3
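
如果已配置免密登录,也可以在hadoop01上用sed远程修改(示意命令,路径以实际安装目录为准):

ssh hadoop02 "sed -i 's/^broker.id=.*/broker.id=2/' /root/environments/kafka_2.12-2.0.0/config/server.properties"
ssh hadoop03 "sed -i 's/^broker.id=.*/broker.id=3/' /root/environments/kafka_2.12-2.0.0/config/server.properties"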

kafka 群起脚本

for i in hadoop102 hadoop103 hadoop104
do
echo "========== $i =========="
ssh $i '/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties'
done

各自三台机器启动kafka

#3台机器分别启动kafka
后台启动:
kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties

http://hadoop01:8048 (kafka eagle管理界面,本例未安装)

1)查看当前服务器中的所有 topic

kafka-topics.sh --zookeeper hadoop01:2181 --list

2)创建 topic(后面分发部署好集群后会同步消息)

kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK

Kafka基础笔记(创建一个单分区topic作为示例):

kafka-topics.sh --zookeeper hadoop01:2181 --create --replication-factor 3 --partitions 1 --topic first

选项说明:
--topic 定义 topic 名
--replication-factor 定义副本数
--partitions 定义分区数

3)删除 topic

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --delete --topic first

需要 server.properties 中设置 delete.topic.enable=true 否则只是标记删除。

4)发送消息

[root@hadoop102 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world

5)消费消息

[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first
--from-beginning:会把主题中以往所有的数据都读取出来。

6)查看某个 Topic 的详情

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first

7)修改分区数

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --alter --topic first --partitions 6

安装solr

解压

 tar -xzvf solr-7.5.0.tgz -C /root/environments/

需要编辑的文件都在/root/environments/solr-7.5.0/bin目录下

solr.in.sh

ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
export SOLR_HOME=/root/environments/solr-7.5.0
export PATH=$PATH:$SOLR_HOME/bin
source /etc/profile

注:配置环境变量会出大问题!!!!!

# 别人的
16:42:39.035 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /opt/xxx/solr-6.5.1/server/solr
16:42:39.099 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
16:42:39.100 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /opt/xxx/solr-6.5.1/server/solr/solr.xml
16:42:39.413 INFO  (main) [   ]
# 我的
2022-07-23 10:55:51.469 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /root/environments/solr-7.5.0
2022-07-23 10:55:51.638 INFO  (zkConnectionManagerCallback-2-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has connected
2022-07-23 11:29:34.848 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
2022-07-23 11:29:34.854 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /root/environments/solr-7.5.0/solr.xml
2022-07-23 11:29:34.859 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home property and the logs
2022-07-23 11:29:34.903 ERROR (main) [   ] o.a.s.c.SolrCore null:org.apache.solr.common.SolrException: solr.xml does not exist in /root/environments/solr-7.5.0 cannot start Solr
将/root/environments/solr-7.5.0文件分发到其余的机器并修改/root/environments/solr-7.5.0/bin/solr.in.sh文件中的SOLR_HOST的值

scp -r /root/environments/solr-7.5.0 hadoop02:/root/environments/
scp -r /root/environments/solr-7.5.0 hadoop03:/root/environments/

修改/root/environments/solr-7.5.0/bin/solr.in.sh文件中的SOLR_HOST的值(注意:三台各不相同,01≠02≠03)

vim /root/environments/solr-7.5.0/bin/solr.in.sh
# hadoop02 机器上改为 SOLR_HOST="hadoop02"
# hadoop03 机器上改为 SOLR_HOST="hadoop03"
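
同样地,如果已配置免密登录,可以在hadoop01上用sed远程修改(示意命令):

ssh hadoop02 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop02\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"
ssh hadoop03 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop03\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"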

3台机器分别启动solr

# 一定要进到bin目录或用绝对路径执行,不要设置SOLR_HOME环境变量!!否则solr.solr.home会变成你设置的环境变量目录“/root/environments/solr-7.5.0/”,而正确的应是/root/environments/solr-7.5.0/server/solr
cd /root/environments/solr-7.5.0/bin
./solr start -force
# 或者
/root/environments/solr-7.5.0/bin/solr start -force
# 查看状态
cd /root/environments/solr-7.5.0/bin
./solr status
# 或者
/root/environments/solr-7.5.0/bin/solr status
# 

下面就成功了

"cloud":{
  "ZooKeeper":"hadoop01:2181,hadoop02:2181,hadoop03:2181",
  "liveNodes":"3",
  "collections":"3"}}
或者访问 http://localhost:8983/solr/ ,有cloud菜单说明集群成功

三、安装atlas

atlas下载地址:https://atlas.apache.org/#/Downloads

# 解压atlas压缩包
tar -zxvf {file-dir}/apache-atlas-2.1.0-sources.tar.gz  -C /root/environments/ # {file-dir}为存放安装包的目录

编辑项目的顶级pom.xml文件,修改各个组件的版本。

# 进入atlas根目录,修改pom.xml文件
cd /root/environments/apache-atlas-sources-2.1.0/
vim pom.xml

主要修改如下安装组件对应的版本。由于本次安装均是对照这里定义的版本安装的,因此不做修改

这里是引用需要修改的代码部分(网上资料说需要修改该部分代码,我已修改并成功运行,目前只测试了hive的hook,没有遇到任何问题,不知道不修改会怎样)

反正我没改这里

vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java

577行
将:
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
改为:
String catalogName = null;
vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java

81行
将:
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
改为:
this.metastoreHandler = null;

进行编译

cd /root/environments/apache-atlas-sources-2.1.0/

打包:(使用外部hbase和solr的打包方式,这里不考虑使用atlas自带的)
mvn clean -DskipTests package -Pdist -X

注:编译过程中可能会遇到报错,基本都是因为网络的问题,重试即可解决,如若重试也没有解决jar包的下载问题,可手动下载缺失的jar,放到本地maven仓库后重新打包。

遇到问题一:nodejs下载失败
手动将下载好的 C:\Users\shuch\Downloads\node-12.16.0-linux-x64.tar.gz 拷贝到 hadoop01:/root/.m2/repository/com/github/eirslett/node/12.16.0/ 目录下
问题二:依赖于GitHub上面的包下载失败
设置代理或者修改hosts

# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
20.205.243.166 github.com

# GitHub Start
140.82.114.4 github.com
199.232.69.194 github.global.ssl.fastly.net
199.232.68.133 raw.githubusercontent.com
# GitHub End
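
如果上面提到的某个jar始终下载失败,也可以手动下载后安装到本地maven仓库再重新打包。下面是示意命令,其中文件路径和groupId/artifactId/version均为占位,需按实际报错信息替换:

mvn install:install-file \
    -Dfile=/root/downloads/some-missing-artifact-1.0.0.jar \
    -DgroupId=com.example \
    -DartifactId=some-missing-artifact \
    -Dversion=1.0.0 \
    -Dpackaging=jar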

编译完成后的atlas存放位置

cd /root/environments/apache-atlas-sources-2.1.0/distro/target

apache-atlas-2.1.0-bin.tar.gz 就是我们所需要的包

解压

 tar -xzvf apache-atlas-2.1.0-bin.tar.gz

需要编辑的文件在/root/environments/apache-atlas-2.1.0/conf

cd /root/environments/apache-atlas-2.1.0/conf

atlas-env.sh

#indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf

atlas-application.properties (这里给出全部内容,只集成了hive作为测试,如若有其他组件的需要,进行组件的安装与atlas hook的配置即可)

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
# atlas.notification.embedded=true 使用内嵌的kafka
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/[email protected]
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/[email protected]

#########  Server Properties  #########
atlas.rest.address=http://hadoop01:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=hadoop01:2181,hadoop02:2181,hadoop03:2181


#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=:
#atlas.server.ha.zookeeper.auth=:



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query..
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.=


#########  UI Configuration ########

atlas.ui.default.version=v1


######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

集成hbase

注册hook 编辑 hbase-site.xml

vi /root/environments/hbase-2.0.2/conf/hbase-site.xml

添加以下配置

<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>

同步其他节点

scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop02:/root/environments/hbase-2.0.2/conf/
scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop03:/root/environments/hbase-2.0.2/conf/

引入依赖

# 将文件atlas-application.properties压缩进atlas下的hook/hbase/hbase-bridge-shim-2.1.0.jar包里
zip -u /root/environments/apache-atlas-2.1.0/hook/hbase/hbase-bridge-shim-2.1.0.jar  /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties

# 然后将atlas的hook/hbase/* 拷贝至所有节点安装的hbase的lib目录下
cp  -r /root/environments/apache-atlas-2.1.0/hook/hbase/* /root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop02:/root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop03:/root/environments/hbase-2.0.2/lib/
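
可以用下面的示意命令确认配置文件确实被打进了shim jar(注意zip会按给定路径去掉开头的“/”后存放条目,若希望它位于jar根目录,可先cd到conf目录再执行zip -u):

unzip -l /root/environments/apache-atlas-2.1.0/hook/hbase/hbase-bridge-shim-2.1.0.jar | grep atlas-application.properties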

引入配置
atlas-application.properties文件添加配置

vi /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties
######### hbase Hook Configs #######
atlas.hook.hbase.synchronous=false 
atlas.hook.hbase.numRetries=3 
atlas.hook.hbase.queueSize=10000 

然后将atlas-application.properties文件拷贝到hbase/conf/

# 然后将atlas-application.properties拷贝至所有节点安装的hbase的conf目录下,一行一行地运行,不要全部复制,会出问题!!!
cd  /root/environments/apache-atlas-2.1.0/conf/ # !!不要忘了进到这个目录
cp  ./atlas-application.properties /root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop02:/root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop03:/root/environments/hbase-2.0.2/conf/
# 编辑atlas属性文件
vi atlas-application.properties

# 修改atlas存储数据主机
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181

# 建立软连接
ln -s /root/environments/hbase-2.0.2/conf/ /root/environments/apache-atlas-2.1.0/conf/hbase/
cp /root/environments/hbase-2.0.2/conf/* /root/environments/apache-atlas-2.1.0/conf/hbase/ # 看不懂这操作

# 添加HBase配置文件路径
vi /root/environments/apache-atlas-2.1.0/conf/atlas-env.sh

export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf

集成solr

cp  -r /root/environments/apache-atlas-2.1.0/conf/solr  /root/environments/solr-7.5.0/
cd /root/environments/solr-7.5.0/
mv solr/  atlas-solr
scp -r ./atlas-solr/  hadoop02:/root/environments/solr-7.5.0/
scp -r ./atlas-solr/  hadoop03:/root/environments/solr-7.5.0/


# 重启solr
./solr stop -force
./solr start -force

# 查看状态
./solr status
# 或者访问 http://localhost:8983/solr/ ,有cloud菜单说明集群成功

在solr中创建索引
./solr create -c vertex_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c edge_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c fulltext_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force

如果以上创建错误,可以使用命令“solr delete -c ${collection_name}”删除重新创建。
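
创建完成后,可以通过Solr的Collections API粗略确认三个索引都已存在(示意命令,任意节点执行即可):

curl "http://hadoop01:8983/solr/admin/collections?action=LIST&wt=json"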

kafka相关操作

在kafka中创建相关topic
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK

集成hive

# 将文件atlas-application.properties压缩进atlas下的hook/hive/hive-bridge-shim-2.1.0.jar包里
zip -u /root/environments/apache-atlas-2.1.0/hook/hive/hive-bridge-shim-2.1.0.jar  /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties

# 然后将atlas的hook/hive/* 拷贝至所有节点安装的hive的lib目录下
cp  -r /root/environments/apache-atlas-2.1.0/hook/hive/* /root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop02:/root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop03:/root/environments/apache-hive-3.1.0/lib/
# 然后将atlas-application.properties拷贝至所有节点安装的hive的conf目录下,一行一行地运行,不要全部复制,会出问题!!!
cd  /root/environments/apache-atlas-2.1.0/conf/ # !!不要忘了进到这个目录
cp  ./atlas-application.properties /root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop02:/root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop03:/root/environments/apache-hive-3.1.0/conf/

hive相关配置

#3台机器均需要配置
cd /root/environments/apache-hive-3.1.0/conf/

#hive-env.sh中添加
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HIVE_AUX_JARS_PATH=/root/environments/apache-hive-3.1.0/lib/

#hive-site.xml中添加:
<property>
      <name>hive.exec.post.hooks</name>
      <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>

启动atlas

cd /root/environments/apache-atlas-2.1.0/bin
./atlas_start.py

说明:第一次启动atlas需要经过漫长的等待,即使显示启动完成了也需要等待一段时间才能访问atlas web ui
可以在/root/environments/apache-atlas-2.1.0/logs目录下查看日志以及报错情况
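
也可以用下面的示意命令轮询Atlas的REST接口,返回版本信息即表示服务真正可用(假设使用默认账号admin/admin,若已修改请替换):

until curl -sf -u admin:admin http://hadoop01:21000/api/atlas/admin/version; do
    echo "atlas 尚未就绪,10 秒后重试..."
    sleep 10
done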

启动完成后导入hive元数据

cd /root/environments/apache-atlas-2.1.0/bin
./import-hive.sh
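
导入前后也可以粗略验证hive hook是否在向Kafka发消息(示意命令,假设kafka与hive的环境变量均已生效,表名hook_smoke_test为随意取的测试名):

# 终端一:监听 ATLAS_HOOK topic
kafka-console-consumer.sh --bootstrap-server hadoop01:9092 --topic ATLAS_HOOK --from-beginning
# 终端二:随便执行一条DDL,正常情况下终端一会打印出对应的JSON通知
hive -e "create table if not exists hook_smoke_test(id int);"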

导入hbase数据

/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh

----------------------恭喜------------------Error报错!!!!------------------------------------

org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:123)
Caused by: org.apache.commons.configuration.ConversionException: 
'atlas.graph.index.search.solr.wait-searcher' doesn't map to a List object: true, a java.lang.Boolean

解释: 这个问题主要是由于hbase使用的commons-configuration包是1.6的,而atlas使用的是1.10的,函数返回类型不一致起了冲突。

  • apache-atlas-2.1.0/hook/hbase/atlas-hbase-plugin-impl/commons-configuration-1.10.jar
  • hbase-2.0.2/lib/commons-configuration-1.6.jar

解决办法:

  • 方法一: 在import-hbase.sh脚本中调整一下CP的加载顺序,将atlas调整在最前,这样根据jvm类加载的最先机制,就可以优先使用atlas hook中的commons-configuration版本,同时还不会影响hbase自己的版本。

    # 将import-hbase.sh中的ATLASCPPATH调整在最前
    vi  /root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
    

    将import-hbase.sh文件的 CP 变量改为如下

    CP="${ATLASCPPATH}:${HIVE_CP}:${HADOOP_CP}"
    

    不好意思,失败了! 虽然这个异常解决了,但是又出现了新的异常,出现了NoClassDefFoundError: com/fasterxml/jackson/core/exc/InputCoercion 。后续还会出现很多依赖找不到。

  • 方法二: 使用atlas的1.10包替换hbase自带的1.6包,操作步骤如下:

    #删除hbase的commons-configuration-1.6,
    #拷贝atlas下的1.10到hbase的lib下
    cd /root/environments/hbase-2.0.2/lib 
    rm -f commons-configuration-1.6.jar 
    cp /root/environments/apache-atlas-2.1.0/hook/hbase/atlas-hbase-plugin-impl/commons-configuration-1.10.jar /root/environments/hbase-2.0.2/lib
    

    注:这里解决办法只需要处理hadoop01机器,因为另外两个节点机器不需要执行这个导入,也没有安装atlas。

完成后就可查看正常的血缘关系了
http://hadoop01:21000

完结撒花!!!

二、Docker镜像启动


1. 加载镜像

# 进入到镜像文件路径,运行:
docker load -i mysql-5.7.tar # 加载mysql镜像
docker load -i hadoop01-1.0.tar # 加载hadoop01节点的镜像
docker load -i hadoop02-1.0.tar # 加载hadoop02节点的镜像
docker load -i hadoop03-1.0.tar # 加载hadoop03节点的镜像

2. 创建容器

# 创建网络
docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 network_hadoop 
# 创建mysql容器
docker run -dit --name mysql5.7 -p 3306:3306 --hostname mysql57 --net network_hadoop --ip 192.168.0.2  -e MYSQL_ROOT_PASSWORD="123456" mysql:5.7 
# 创建节点容器
docker run -dit --name hadoop01 --privileged --hostname hadoop01 --net network_hadoop --ip 192.168.0.11 --add-host mysql57:192.168.0.1 --add-host hadoop02:192.168.0.12 --add-host hadoop03:192.168.0.13 -p 8042:8042 -p 8088:8088 -p 9870:9870 -p 9864:9864 -p 10002:10002 -p 16010:16010 -p 16000:16000 -p 8048:8048 -p 8983:8983 -p 21000:21000 -p 9868:9868 -p 10000:10000 -p 2181:2181 -p 9092:9092 hadoop01:1.0 /usr/sbin/init
docker run -dit --name hadoop02 --privileged --hostname hadoop02 --net network_hadoop --ip 192.168.0.12 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop03:192.168.0.13 hadoop02:1.0 /usr/sbin/init
docker run -dit --name hadoop03 --privileged --hostname hadoop03 --net network_hadoop --ip 192.168.0.13 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop02:192.168.0.12 hadoop03:1.0 /usr/sbin/init
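
创建完成后,可以用下面的示意命令确认网络和容器状态,并进入hadoop01容器:

docker network inspect network_hadoop   # 查看各容器分到的IP是否与--ip参数一致
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
docker exec -it hadoop01 bash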

3. 快速启动

注意:

  • hadoop01 & hadoop02 & hadoop03 指三台都要启动。
  • hadoop01 | hadoop02 | hadoop03 指启动任意一个或多个。
  • hadoop01 ⊕ hadoop02 ⊕ hadoop03 指只启动其中一个。

(1)启动 Zookeeper hadoop01 & hadoop02 & hadoop03

zkServer.sh start # 启动zkServer,多台会自动集群,因此至少在两台机器启动

(2)启动 Hadoop hadoop01 ⊕ hadoop02

start-all.sh #启动hadoop集群,只需在集群主节点启动即可

(3)启动 Hive hadoop01 | hadoop02 | hadoop03

  1. 初始化hive元数据(首次安装hive或mysql被重置时才需要!!)
    schematool -dbType mysql -initSchema # 元数据存在mysql中,一台机器运行一次就够了!!!!

  2. 启动hive元数据服务(后续启动只用开这个就行)
    hive --service metastore & #后台启动单台机器hive元数据服务,一定要加 “&”
    hiveserver2 & #启动hiveserver2,支持JDBC和WebUI
    

注:schematool -dbType mysql -initSchema 初始化hive元数据(首次启动才需要或者mysql被重置了) ! !

(4)启动 Hbase hadoop01 ⊕ hadoop02

 start-hbase.sh # 在哪个节点执行,哪个节点就成为HMaster。
 # 本例中hadoop03是备用HMaster,启动后将有两个HMaster节点。如果从hadoop03启动,就只有一个HMaster。

(5)启动 Kafka hadoop01 & hadoop02 & hadoop03

kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties  

(6)启动 Solr hadoop01 & hadoop02 & hadoop03

/root/environments/solr-7.5.0/bin/solr start -force

(7)启动 Atlas hadoop01

/root/environments/apache-atlas-2.1.0/bin/atlas_start.py

(8)批量导入元数据(可选) hadoop01

# 导入hive元数据
/root/environments/apache-atlas-2.1.0/bin/import-hive.sh
# 导入hbase元数据
/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
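
也可以把上面的步骤串成一个简单的启动脚本(仅为示意:假设在hadoop01上执行、免密登录已配置;ssh远程执行如果找不到命令,可改用各组件的绝对路径):

# 1. 三台先起 zookeeper
for host in hadoop01 hadoop02 hadoop03; do ssh "$host" 'zkServer.sh start'; done
# 2. hadoop / hbase / hive(均在 hadoop01 上)
start-all.sh
start-hbase.sh
hive --service metastore &
# 3. 三台分别起 kafka 和 solr
for host in hadoop01 hadoop02 hadoop03; do
    ssh "$host" 'kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties'
    ssh "$host" '/root/environments/solr-7.5.0/bin/solr start -force'
done
# 4. 最后启动 atlas
/root/environments/apache-atlas-2.1.0/bin/atlas_start.py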

4. 访问端口

WEB UI                                    访问端口        作用
hadoop: Node UI                           8042
hadoop: YARN UI                           8088           yarn的管理界面,查看hadoop集群信息
hadoop: HDFS NN UI                        9870
hadoop: DataNode UI                       9864
hiveserver2: webui                        10002
hbase                                     16010,16000    使用16010访问!
kafka eagle(未安装)                      8048
solr                                      8983
atlas                                     21000
SecondaryNameNode(高可用集群下未使用)    9868

Web Server                                连接端口        作用
hdfs                                      9000
hiveserver2: server                       10000          支持JDBC
zookeeper                                 2181
kafka                                     9092

注:

  1. NN UI:访问时可能会重定向到ResourceManager处于活动状态的主机,想访问要么映射活动主机的端口;要么手动杀死活动主机的RM,使RM自动切换到本例映射端口的主机上 (推荐),查询状态命令: yarn rmadmin -getAllServiceState。也可以使用命令hdfs haadmin -failover -forcefence -forceactive nn2 nn1切换,但是必须将dfs.ha.automatic-failover.enabled的配置改为false
  2. 端口超链接主机名为docker,请通过docker machine的ip进行访问,或者在windows hosts文件中添加docker machine IP 到 ‘docker’ 的映射。

附录

1. 常用命令集合

(1)通过进程名称找到它所占用的端口:

# 法一,立即推不好用
netstat -anp | grep hadoop	# 查hadoop相关进程的端口号
#[root@hadoop01 /] netstat -anp | grep hadoop
#[root@hadoop01 /]# 	              #毛也没有查到

# 法二,先查进程ID,再根据进程ID查端口。大智慧啊~
ps -ef | grep hadoop	# 查出进程ID 2419  
# [root@hadoop01 /]ps -ef | grep hadoop
# root  2419  2405 11 18:22 pts/1  00:02:14 /root/hadoop/bin/...
netstat -anp | grep 2419  # 端口16000,16010
# [root@hadoop01 /]# netstat -anp | grep 2419
# tcp        0      0 192.168.0.11:16000    0.0.0.0:*               LISTEN      2419/java
# tcp        0      0 0.0.0.0:16010           0.0.0.0:*               LISTEN      2419/java

(2)通过端口找到占用它的进程名称:

netstat -anp | grep 3690            ----->查到进程名为svnserver
