Environment:
- OS: Windows 10
- Docker Desktop: 4.10.1
- Docker version: 20.10.17, build 100c701

Component | Version |
---|---|
Hadoop | 3.2.1 |
Hive | 3.1.2 |
HBase | 2.3.4 |
Zookeeper | 3.5.9 |
Kafka | 2.6.2 |
Solr | 7.4.0 |
Atlas | 2.1.0 |
JDK | 1.8 |
Python | 2.7 |
Maven | 3.6.3 |
Step 1
Run the following command on all three nodes to generate the key files:
ssh-keygen
The command asks where to store the key (default ~/.ssh/); just press "Enter" at each prompt. id_rsa is the node's private key and id_rsa.pub is its public key.
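If you prefer to skip the interactive prompts entirely, a non-interactive variant works as well (a sketch; the interactive run above is equivalent):

# generate an RSA key pair with no passphrase, accepting the default location
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa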
Step 2
Run the following commands on each of the three nodes:
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
When prompted with yes/no, type yes, then enter the host's password.
Step 3
Test passwordless SSH login from each node:
ssh hadoop01
ssh hadoop02
ssh hadoop03
Install Node.js
1. Download and extract
wget https://cdn.npm.taobao.org/dist/node/v12.16.2/node-v12.16.2-linux-x64.tar.xz
tar -xf node-v12.16.2-linux-x64.tar.xz
cd node-v12.16.2-linux-x64/bin
./node -v
2. Add environment variables
# NODE_HOME should point to the extracted node-v12.16.2-linux-x64 directory
export PATH=$PATH:$NODE_HOME/bin
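For persistence, the same /etc/profile pattern used for the JDK below can be applied (a sketch; the NODE_HOME path assumes the tarball was extracted under /root/environments):

vim /etc/profile
export NODE_HOME=/root/environments/node-v12.16.2-linux-x64   # assumed extraction path
export PATH=$PATH:$NODE_HOME/bin
source /etc/profile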
# 1. Extract the JDK into the target directory (create the directory first)
tar -zxvf {file-dir}/jdk-8u341-linux-x64.tar.gz -C /root/environments/ # {file-dir} is the directory holding the downloaded package
# 2. Configure environment variables
vim /etc/profile
export JAVA_HOME=/root/environments/jdk1.8.0_341
export PATH=$PATH:$JAVA_HOME/bin
# 3. Reload so the variables take effect
source /etc/profile
# 4. Verify
java -version
Maven download: https://dlcdn.apache.org/maven/maven-3/
# 1. Extract Maven into the target directory (create the directory first)
tar -zxvf {file-dir}/apache-maven-3.6.3-bin.tar.gz -C /root/environments/ # {file-dir} is the directory holding the downloaded package
# 2. Configure environment variables
export MVN_HOME=/root/environments/apache-maven-3.6.3
export PATH=$PATH:$MVN_HOME/bin
# 3. Reload so the variables take effect
source /etc/profile
# 4. Verify
mvn -version
# 5. Configure Maven mirror repositories
vim $MVN_HOME/conf/settings.xml
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
  <id>repo1</id>
  <mirrorOf>central</mirrorOf>
  <name>Human Readable Name for this Mirror.</name>
  <url>https://repo1.maven.org/maven2/</url>
</mirror>
<mirror>
  <id>repo2</id>
  <mirrorOf>central</mirrorOf>
  <name>Human Readable Name for this Mirror.</name>
  <url>https://repo2.maven.org/maven2/</url>
</mirror>
Maven reads its configuration from /root/.m2/ (a hidden directory) first, so create /root/.m2 and copy the settings file into it:
mkdir /root/.m2
cp $MVN_HOME/conf/settings.xml /root/.m2/
Installation order: zookeeper, hadoop, hbase, hive, kafka, solr, atlas.
All component versions can be found in the Apache archive, e.g. https://archive.apache.org/dist/hbase/ ; the domestic mirrors are missing many versions and mostly carry only stable releases.
tar -zxvf {file-dir}/apache-zookeeper-3.5.9-bin.tar.gz -C /root/environments/ # {file-dir} is the directory holding the downloaded package
cd /root/environments/zookeeper-3.4.6/conf
Make a copy of zoo_sample.cfg:
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/root/environments/zookeeper-3.4.6/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
Configure environment variables
export ZK_HOME=/root/environments/zookeeper-3.4.6
export PATH=$PATH:$ZK_HOME/bin
source /etc/profile
Create the data directory and the myid file
mkdir /root/environments/zookeeper-3.4.6/data
cd /root/environments/zookeeper-3.4.6/data
touch myid && echo "1" > myid
Then copy the entire /root/environments/zookeeper-3.4.6 directory to hadoop02 and hadoop03, and configure the environment variables there as well:
scp -r /root/environments/zookeeper-3.4.6 hadoop02:/root/environments/
scp -r /root/environments/zookeeper-3.4.6 hadoop03:/root/environments/
Then change /root/environments/zookeeper-3.4.6/data/myid on hadoop02 and hadoop03 (# differs per node ---------- 01 ≠ 02 ≠ 03):
hadoop02 -> 2
hadoop03 -> 3
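A quick way to set these from hadoop01 (a sketch; it relies on the passwordless SSH configured earlier, and editing the files by hand on each node works just as well):

ssh hadoop02 'echo 2 > /root/environments/zookeeper-3.4.6/data/myid'
ssh hadoop03 'echo 3 > /root/environments/zookeeper-3.4.6/data/myid'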
Start ZooKeeper on each of the three machines:
zkServer.sh start
zkServer.sh status # check status
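To start (or check) all three from hadoop01 in one go, a loop over SSH also works (a sketch; /etc/profile is sourced so the ZK_HOME PATH entry is available in the non-interactive shell):

for h in hadoop01 hadoop02 hadoop03; do ssh $h 'source /etc/profile && zkServer.sh start'; done
for h in hadoop01 hadoop02 hadoop03; do ssh $h 'source /etc/profile && zkServer.sh status'; done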
1. Extract
tar -zxvf {file-dir}/hadoop-3.1.1.tar.gz -C /root/environments/ # {file-dir} is the directory holding the downloaded package
2. Add environment variables
vi /etc/profile
# tip: append at the end of the file
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# apply the changes
source /etc/profile
# verify
hadoop version
3. All the files to edit are under /root/environments/hadoop-3.1.1/etc/hadoop
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
vi hadoop-env.sh
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"
hdfs-site.xml
This also designates hadoop01 and hadoop02 as the NameNodes (nn1, nn2).
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/ha-hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
sshfence
shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
mapred-env.sh
export JAVA_HOME=/root/environments/jdk1.8.0_341
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
/root/environments/hadoop-3.1.1/etc/hadoop,
/root/environments/hadoop-3.1.1/share/hadoop/common/*,
/root/environments/hadoop-3.1.1/share/hadoop/common/lib/*,
/root/environments/hadoop-3.1.1/share/hadoop/hdfs/*,
/root/environments/hadoop-3.1.1/share/hadoop/hdfs/lib/*,
/root/environments/hadoop-3.1.1/share/hadoop/mapreduce/*,
/root/environments/hadoop-3.1.1/share/hadoop/mapreduce/lib/*,
/root/environments/hadoop-3.1.1/share/hadoop/yarn/*,
/root/environments/hadoop-3.1.1/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
yarn-env.sh
export JAVA_HOME=/root/environments/jdk1.8.0_341
yarn-site.xml
This also designates hadoop01 and hadoop02 as the ResourceManagers (rm1, rm2).
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>5</value>
  </property>
</configuration>
workers
hadoop01
hadoop02
hadoop03
Hadoop 3 enforces user checks in its start/stop scripts; to avoid startup failures caused by them, declare the users in the following files.
vim /root/environments/hadoop-3.1.1/sbin/start-dfs.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-dfs.sh
Add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs # deprecated; newer Hadoop versions suggest HDFS_DATANODE_SECURE_USER instead
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
vim /root/environments/hadoop-3.1.1/sbin/start-yarn.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-yarn.sh
Add:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn # deprecated in newer Hadoop versions
YARN_NODEMANAGER_USER=root
Startup order: ZooKeeper -> JournalNode -> format NameNode -> format ZKFC (creates the HA znode in ZooKeeper) -> NameNode -> DataNode -> ResourceManager -> NodeManager
Start the JournalNodes on all three machines:
cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh start journalnode # start the JournalNode
Format the NameNode on hadoop01, then sync its metadata to hadoop02 (# differs per node ---------- 01 = 02 ≠ 03):
# run on hadoop01
hadoop namenode -format
# copy the contents of /data/hadoop/dfs/name to the standby NameNode host
# create the directory on the standby host first if it does not exist
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/name/
Format the ZKFC on both NameNode hosts (# differs per node ---------- 01 = 02 ≠ 03):
hdfs zkfc -formatZK
Stop the JournalNodes
# stop the JournalNode on all 3 machines
cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh stop journalnode
Start Hadoop
# run on hadoop01:
start-all.sh
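A quick sanity check across the nodes after start-all.sh (a sketch; the exact jps output varies, but roughly expect NameNode/DFSZKFailoverController/ResourceManager on hadoop01 and hadoop02, and DataNode/NodeManager/JournalNode on all three):

for h in hadoop01 hadoop02 hadoop03; do echo "========== $h =========="; ssh $h 'source /etc/profile && jps'; done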
tar -xzvf hbase-2.0.2-bin.tar.gz -C /root/environments/
hbase-env.sh
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CLASSPATH=/root/environments/hadoop-3.1.1/etc/hadoop
export HBASE_MANAGES_ZK=false # use the ZooKeeper we installed ourselves. This must be set so HBase does not start its bundled ZooKeeper; otherwise our own ZooKeeper cannot be used properly
hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>8020</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/root/environments/zookeeper-3.4.6/conf</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/var/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
regionservers
hadoop01
hadoop02
hadoop03
For HBase high availability, create the backup-masters file and list the standby HMaster host(s) in it:
vim backup-masters
hadoop03
Configure environment variables
export HBASE_HOME=/root/environments/hbase-2.0.2
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
Copy to the other nodes
scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/hbase-2.0.2 hadoop02:/root/environments/
scp -r /root/environments/hbase-2.0.2 hadoop03:/root/environments/
Start HBase on the node you want as the active HMaster; in this example that means hadoop01 or hadoop02, since hadoop03 is the backup HMaster.
start-hbase.sh
yarn rmadmin -getAllServiceState
Check http://hadoop03:16010/master-status
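The HDFS NameNode HA state can be checked the same way as the YARN state above (a sketch; nn1/nn2 are the NameNode IDs defined in hdfs-site.xml):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2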
Install MySQL
(omitted)
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin apache-hive-3.1.2
All the files to edit are under /root/environments/apache-hive-3.1.0/conf
vi hive-env.sh
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export HIVE_CONF_DIR=/root/environments/apache-hive-3.1.0/conf
hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql57:3307/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop01:9083</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.execution.engine</name>
    <value>mr</value>
  </property>
</configuration>
Upload the MySQL JDBC driver jar into Hive's lib directory:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20
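For example (a sketch; it assumes downloading the jar directly from Maven Central and the Hive path used elsewhere in this guide):

wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.20/mysql-connector-java-8.0.20.jar
cp mysql-connector-java-8.0.20.jar /root/environments/apache-hive-3.1.0/lib/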
Configure environment variables
export HIVE_HOME=/root/environments/apache-hive-3.1.0
export PATH=$PATH:$HIVE_HOME/bin
Refresh
source /etc/profile
Initialize Hive's metastore database
schematool -dbType mysql -initSchema
Start the Hive metastore (important; not sure why I thought it depended on HBase, I probably misread)
hive --service metastore
hive --service metastore & # start in the background
Use ps to check whether the metastore service is running:
ps -ef | grep metastore # ps -ef shows all processes in full format: -e all processes, -f full format, -h no header, -l long format, -w wide output
Enter the Hive CLI to verify:
hive
create database filetest;
show databases;
use filetest; -- switch to the filetest database
Distribute the Hive directory (so every machine can use Hive without any further configuration changes):
scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/apache-hive-3.1.0 hadoop02:/root/environments/
scp -r /root/environments/apache-hive-3.1.0 hadoop03:/root/environments/
Then refresh
source /etc/profile
tar -xzvf kafka_2.12-2.0.0.tgz -C /root/environments/
# all the files to edit are under /root/environments/kafka_2.12-2.0.0/config
Edit server.properties:
broker.id=1
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
zookeeper.properties (left unchanged):
dataDir=/home/hadoop/data/zookeeper/zkdata
clientPort=2181
consumer.properties (left unchanged):
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
producer.properties (left unchanged):
metadata.broker.list=hadoop01:9092,hadoop02:9092,hadoop03:9092
Configure environment variables
export KAFKA_HOME=/root/environments/kafka_2.12-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin
Refresh
source /etc/profile
Distribute the Kafka directory to the other machines and change broker.id in kafka_2.12-2.0.0/config/server.properties on each of them (# differs per node ---------- 01 ≠ 02 ≠ 03):
scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop02:/root/environments/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop03:/root/environments/
Then refresh
source /etc/profile
vim /root/environments/kafka_2.12-2.0.0/config/server.properties
hadoop02 -> broker.id=2
hadoop03 -> broker.id=3
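One way to make these changes remotely from hadoop01 (a sketch using sed; editing the file by hand on each node works just as well):

ssh hadoop02 "sed -i 's/^broker.id=.*/broker.id=2/' /root/environments/kafka_2.12-2.0.0/config/server.properties"
ssh hadoop03 "sed -i 's/^broker.id=.*/broker.id=3/' /root/environments/kafka_2.12-2.0.0/config/server.properties"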
Kafka cluster start script:
for i in hadoop01 hadoop02 hadoop03
do
echo "========== $i =========="
ssh $i 'source /etc/profile && /root/environments/kafka_2.12-2.0.0/bin/kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties'
done
Start Kafka on each of the three machines
# start Kafka on all 3 machines
Background start:
kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties
http://hadoop01:8048
1) List all topics known to the cluster
kafka-topics.sh --zookeeper hadoop01:2181 --list
2) Create the topics (once the cluster is distributed and running, messages are replicated across brokers)
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
Kafka basics (reference notes, which use the hostnames hadoop102-104):
kafka-topics.sh --zookeeper hadoop01:2181 --create --replication-factor 3 --partitions 1 --topic first #
Option descriptions:
--topic: the topic name
--replication-factor: the number of replicas
--partitions: the number of partitions
3) Delete a topic
[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --delete --topic first
delete.topic.enable=true must be set in server.properties; otherwise the topic is only marked for deletion.
4) Produce messages
[root@hadoop102 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world
5) Consume messages
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first
--from-beginning: reads all of the topic's historical data from the start.
6) Describe a topic
[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first
7) Change the number of partitions
[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --alter --topic first --partitions 6
Extract
tar -xzvf solr-7.5.0.tgz -C /root/environments/
All the files to edit are under /root/environments/solr-7.5.0/bin
solr.in.sh
ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
export SOLR_HOME=/root/environments/solr-7.5.0
export PATH=$PATH:$SOLR_HOME/bin
source /etc/profile
Note: setting these Solr environment variables causes big problems!!!!! (see the startup note below)
# someone else's log
16:42:39.035 INFO (main) [ ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /opt/xxx/solr-6.5.1/server/solr
16:42:39.099 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
16:42:39.100 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from /opt/xxx/solr-6.5.1/server/solr/solr.xml
16:42:39.413 INFO (main) [ ]
# mine
2022-07-23 10:55:51.469 INFO (main) [ ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /root/environments/solr-7.5.0
2022-07-23 10:55:51.638 INFO (zkConnectionManagerCallback-2-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2022-07-23 11:29:34.848 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
2022-07-23 11:29:34.854 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from /root/environments/solr-7.5.0/solr.xml
2022-07-23 11:29:34.859 ERROR (main) [ ] o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home property and the logs
2022-07-23 11:29:34.903 ERROR (main) [ ] o.a.s.c.SolrCore null:org.apache.solr.common.SolrException: solr.xml does not exist in /root/environments/solr-7.5.0 cannot start Solr
Distribute /root/environments/solr-7.5.0 to the other machines and change the SOLR_HOST value in /root/environments/solr-7.5.0/bin/solr.in.sh on each:
scp -r /root/environments/solr-7.5.0 hadoop02:/root/environments/
scp -r /root/environments/solr-7.5.0 hadoop03:/root/environments/
Change SOLR_HOST in /root/environments/solr-7.5.0/bin/solr.in.sh (# differs per node ---------- 01 ≠ 02 ≠ 03):
vim /root/environments/solr-7.5.0/bin/solr.in.sh
hadoop02 -> SOLR_HOST="hadoop02"
hadoop03 -> SOLR_HOST="hadoop03"
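A sketch of doing this remotely with sed (it assumes the SOLR_HOST line set earlier is present; editing by hand works just as well):

ssh hadoop02 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop02\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"
ssh hadoop03 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop03\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"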
Start Solr on each of the three machines
# Always start from the bin directory and do NOT set the SOLR_HOME environment variable!! Otherwise solr.solr.home resolves to whatever you set in the variable ("/root/environments/solr-7.5.0/") instead of the correct /root/environments/solr-7.5.0/server/solr
cd /root/environments/solr-7.5.0/bin
./solr start -force
# or
/root/environments/solr-7.5.0/bin/solr start -force
# check status
cd /root/environments/solr-7.5.0/bin
./solr status
# or
/root/environments/solr-7.5.0/bin/solr status
#
Output like the following means success:
"cloud":{
  "ZooKeeper":"hadoop01:2181,hadoop02:2181,hadoop03:2181",
  "liveNodes":"3",
  "collections":"3"}}
Or visit http://localhost:8983/solr/ ; if the Cloud menu is present, the cluster is working.
Atlas download: https://atlas.apache.org/#/Downloads
# extract the Atlas source package
tar -zxvf {file-dir}/apache-atlas-2.1.0-sources.tar.gz -C /root/environments/ # {file-dir} is the directory holding the downloaded package
Edit the project's top-level pom.xml and adjust the component versions there if needed.
# enter the Atlas source root and edit pom.xml
cd /root/environments/apache-atlas-sources-2.1.0/
vim pom.xml
The main points are as follows:
The component versions declared in the pom match what we installed (this guide installed each component against the versions defined there), so no changes are needed.
Next is the source code that online guides say must be patched. I applied the patches and Atlas ran successfully; so far only the Hive hook has been tested and no problems showed up. I don't know what happens if you skip the patches.
(In any case, I did not change the pom versions.)
vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
Line 577:
Change:
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
to:
String catalogName = null;
vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java
Line 81:
Change:
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
to:
this.metastoreHandler = null;
Build
cd /root/environments/apache-atlas-sources-2.1.0/
Package (built against an external HBase and Solr; the instances bundled with Atlas are not used here):
mvn clean -DskipTests package -Pdist -X
Note: the build may fail along the way, almost always because of network issues; retrying usually fixes it. If a jar still cannot be downloaded after retries, download it manually, place it in the local Maven repository, and rebuild.
Problem 1: the Node.js download fails
Manually copy the downloaded file C:\Users\shuch\Downloads\node-12.16.0-linux-x64.tar.gz to hadoop01:/root/.m2/repository/com/github/eirslett/node/12.16.0/
Problem 2: dependencies hosted on GitHub fail to download
Set a proxy or edit the hosts file:
# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
20.205.243.166 github.com
# GitHub Start
140.82.114.4 github.com
199.232.69.194 github.global.ssl.fastly.net
199.232.68.133 raw.githubusercontent.com
# GitHub End
The built Atlas package is located at:
cd /root/environments/apache-atlas-sources-2.1.0/distro/target
apache-atlas-2.1.0-bin.tar.gz is the package we need.
Extract
tar -xzvf apache-atlas-2.1.0-bin.tar.gz
The files to edit are under /root/environments/apache-atlas-2.1.0/conf
cd /root/environments/apache-atlas-2.1.0/conf
atlas-env.sh
#indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false
# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false
# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false
# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf
atlas-application.properties (full contents below; only Hive is integrated here as a test. If other components are needed, install them and configure their Atlas hooks in the same way.)
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######### Graph Database Configs #########
# Graph Database
#Configures the graph database to use. Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=
# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true
# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1
# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1
# Graph Search Index
atlas.graph.index.search.backend=solr
#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true
# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150
######### Import Configs #########
#atlas.import.temp.directory=/temp/import
######### Notification Configs #########
# atlas.notification.embedded=true would use the embedded Kafka
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true
######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=
######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=
######### JAAS Configuration ########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
######### Server Properties #########
atlas.rest.address=http://hadoop01:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=hadoop01:2181,hadoop02:2181,hadoop03:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=:
#atlas.server.ha.zookeeper.auth=:
######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
######### Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=
######### Performance Configs #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000
######### CSRF Configs #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=
############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=
######### Compiled Query Cache Configuration #########
# The size of the compiled query cache. Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000
# Allows notifications when items are evicted from the compiled query
# cache because it has become full. A warning will be issued when
# the specified number of evictions have occurred. If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0
######### Full Text Search Configuration #########
#Set to false to disable full text search.
#atlas.search.fulltext.enable=true
######### Gremlin Search Configuration #########
#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false
########## Add http headers ###########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.=
######### UI Configuration ########
atlas.ui.default.version=v1
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
Register the HBase hook: edit hbase-site.xml
vi /root/environments/hbase-2.0.2/conf/hbase-site.xml
Add the following property:
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>
Sync it to the other nodes
scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop02:/root/environments/hbase-2.0.2/conf/
scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop03:/root/environments/hbase-2.0.2/conf/
Add the hook dependencies
# compress atlas-application.properties into Atlas's hook/hbase/hbase-bridge-shim-2.1.0.jar
zip -u /root/environments/apache-atlas-2.1.0/hook/hbase/hbase-bridge-shim-2.1.0.jar /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties
# then copy Atlas's hook/hbase/* into the HBase lib directory on every node
cp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* /root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop02:/root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop03:/root/environments/hbase-2.0.2/lib/
Add the hook configuration
Add the following to atlas-application.properties:
vi /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties
######### hbase Hook Configs #######
atlas.hook.hbase.synchronous=false
atlas.hook.hbase.numRetries=3
atlas.hook.hbase.queueSize=10000
Then copy atlas-application.properties into hbase/conf/
# copy atlas-application.properties into the HBase conf directory on every node; run these one line at a time, do not paste the whole block at once or things will go wrong!!!
cd /root/environments/apache-atlas-2.1.0/conf/ # !! don't forget to cd into this directory first
cp ./atlas-application.properties /root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop02:/root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop03:/root/environments/hbase-2.0.2/conf/
# edit the Atlas properties file
vi atlas-application.properties
# set the hosts where Atlas stores its data (the HBase/ZooKeeper quorum)
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181
# create a symlink
ln -s /root/environments/hbase-2.0.2/conf/ /root/environments/apache-atlas-2.1.0/conf/hbase/
cp /root/environments/hbase-2.0.2/conf/* /root/environments/apache-atlas-2.1.0/conf/hbase/ # (not sure why this copy is needed on top of the symlink)
# add the HBase config directory
vi /root/environments/apache-atlas-2.1.0/conf/atlas-env.sh
export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf
Integrate Solr
cp -r /root/environments/apache-atlas-2.1.0/conf/solr /root/environments/solr-7.5.0/
cd /root/environments/solr-7.5.0/
mv solr/ atlas-solr
scp -r ./atlas-solr/ hadoop02:/root/environments/solr-7.5.0/
scp -r ./atlas-solr/ hadoop03:/root/environments/solr-7.5.0/
# restart Solr
./solr stop -force
./solr start -force
# check status
./solr status
# or visit http://localhost:8983/solr/ ; the Cloud menu indicates the cluster is up
Create the Atlas indexes in Solr
./solr create -c vertex_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c edge_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c fulltext_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
If any of the above creations fail, delete the collection with "solr delete -c ${collection_name}" and create it again.
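To confirm a collection is healthy afterwards, the Solr CLI healthcheck can be used (a sketch; only vertex_index is shown, the other two work the same way):

/root/environments/solr-7.5.0/bin/solr healthcheck -c vertex_index -z hadoop01:2181,hadoop02:2181,hadoop03:2181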
Kafka-related steps
Create the required topics in Kafka:
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
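To confirm the topics exist (the same list command from the Kafka section):

kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --list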
Integrate Hive
# compress atlas-application.properties into Atlas's hook/hive/hive-bridge-shim-2.1.0.jar
zip -u /root/environments/apache-atlas-2.1.0/hook/hive/hive-bridge-shim-2.1.0.jar /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties
# then copy Atlas's hook/hive/* into the Hive lib directory on every node
cp -r /root/environments/apache-atlas-2.1.0/hook/hive/* /root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop02:/root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop03:/root/environments/apache-hive-3.1.0/lib/
# then copy atlas-application.properties into the Hive conf directory on every node; run these one line at a time, do not paste the whole block at once or things will go wrong!!!
cd /root/environments/apache-atlas-2.1.0/conf/ # !! don't forget to cd into this directory first
cp ./atlas-application.properties /root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop02:/root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop03:/root/environments/apache-hive-3.1.0/conf/
Hive-related configuration
# required on all 3 machines
cd /root/environments/apache-hive-3.1.0/conf/
# add to hive-env.sh
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HIVE_AUX_JARS_PATH=/root/environments/apache-hive-3.1.0/lib/
# add to hive-site.xml:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
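One way to push these edits to the other two nodes after making them on hadoop01 (a sketch; editing each node by hand works just as well):

scp /root/environments/apache-hive-3.1.0/conf/hive-env.sh hadoop02:/root/environments/apache-hive-3.1.0/conf/
scp /root/environments/apache-hive-3.1.0/conf/hive-env.sh hadoop03:/root/environments/apache-hive-3.1.0/conf/
scp /root/environments/apache-hive-3.1.0/conf/hive-site.xml hadoop02:/root/environments/apache-hive-3.1.0/conf/
scp /root/environments/apache-hive-3.1.0/conf/hive-site.xml hadoop03:/root/environments/apache-hive-3.1.0/conf/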
Start Atlas
cd /root/environments/apache-atlas-2.1.0/bin
./atlas_start.py
Note: the first Atlas startup takes a long time; even after the script reports it has started, you may need to wait a while longer before the Atlas web UI becomes reachable.
Progress and errors can be checked in the logs under /root/environments/apache-atlas-2.1.0/logs
After startup completes, import the Hive metadata
cd /root/environments/apache-atlas-2.1.0/bin
./import-hive.sh
Import the HBase metadata
/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
---------------------- Congratulations ------------------ an error!!!! ------------------------------------
org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:123)
Caused by: org.apache.commons.configuration.ConversionException:
'atlas.graph.index.search.solr.wait-searcher' doesn't map to a List object: true, a java.lang.Boolean
Explanation: HBase ships commons-configuration 1.6 while Atlas uses 1.10; a method's return type differs between the two versions and they conflict.
Solutions:
Option 1: adjust the classpath order in import-hbase.sh so that the Atlas jars come first. Because the JVM loads the first matching class on the classpath, the version bundled with the Atlas hook is picked up first, without affecting Hive's own version.
# move ATLASCPPATH to the front of CP in import-hbase.sh
vi /root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
Change the CP variable in import-hbase.sh to:
CP="${ATLASCPPATH}:${HIVE_CP}:${HADOOP_CP}"
Sorry, that failed! The original exception went away, but a new one appeared: NoClassDefFoundError: com/fasterxml/jackson/core/exc/InputCoercion
and more missing dependencies follow after that.
Option 2: replace HBase's bundled commons-configuration 1.6 with the 1.10 jar shipped with Atlas:
# delete HBase's commons-configuration-1.6
# copy the 1.10 jar from Atlas into HBase's lib directory
cd /root/environments/hbase-2.0.2/lib
rm -f commons-configuration-1.6.jar
cp /root/environments/apache-atlas-2.1.0/hook/hbase/atlas-hbase-plugin-impl/commons-configuration-1.10.jar /root/environments/hbase-2.0.2/lib
Note: this fix is only needed on hadoop01; the other two nodes never run the import and do not have Atlas installed.
Once that is done, lineage can be viewed normally at:
http://hadoop01:21000
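To have some lineage to look at, run any Hive statement that writes a new table; for example (a sketch; it uses the filetest database created earlier, and the table names are arbitrary):

hive -e "use filetest; create table t_src(id int, name string); create table t_dst as select * from t_src;"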
# cd into the directory containing the image files and run:
docker load -i mysql-5.7.tar # load the MySQL image
docker load -i hadoop01-1.0.tar # load the hadoop01 node image
docker load -i hadoop02-1.0.tar # load the hadoop02 node image
docker load -i hadoop03-1.0.tar # load the hadoop03 node image
# create the network
docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 network_hadoop
# create the MySQL container
docker run -dit --name mysql5.7 -p 3306:3306 --hostname mysql57 --net network_hadoop --ip 192.168.0.2 -e MYSQL_ROOT_PASSWORD="123456" mysql:5.7
# create the node containers
docker run -dit --name hadoop01 --privileged --hostname hadoop01 --net network_hadoop --ip 192.168.0.11 --add-host mysql57:192.168.0.1 --add-host hadoop02:192.168.0.12 --add-host hadoop03:192.168.0.13 -p 8042:8042 -p 8088:8088 -p 9870:9870 -p 9864:9864 -p 10002:10002 -p 16010:16010 -p 16000:16000 -p 8048:8048 -p 8983:8983 -p 21000:21000 -p 9868:9868 -p 10000:10000 -p 2181:2181 -p 9092:9092 hadoop01:1.0 /usr/sbin/init
docker run -dit --name hadoop02 --privileged --hostname hadoop02 --net network_hadoop --ip 192.168.0.12 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop03:192.168.0.13 hadoop02:1.0 /usr/sbin/init
docker run -dit --name hadoop03 --privileged --hostname hadoop03 --net network_hadoop --ip 192.168.0.13 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop02:192.168.0.12 hadoop03:1.0 /usr/sbin/init
Notation:
- "hadoop01 & hadoop02 & hadoop03": "&" means start it on all of them.
- "hadoop01 | hadoop02 | hadoop03": "|" means start it on any one or more of them.
- "hadoop01 ⊕ hadoop02 ⊕ hadoop03": "⊕" means start it on exactly one of them.
(1) Start ZooKeeper: hadoop01 & hadoop02 & hadoop03
zkServer.sh start # start zkServer; multiple instances form the ensemble automatically, so start it on at least two machines
(2) Start Hadoop: hadoop01 ⊕ hadoop02
start-all.sh # start the Hadoop cluster; only needs to be run on the active master node
(3) Start Hive: hadoop01 | hadoop02 | hadoop03
Only needed the first time Hive or MySQL is installed!!
schematool -dbType mysql -initSchema # metadata is stored in MySQL; running this on one machine is enough!!!!
For subsequent startups only the following is needed:
hive --service metastore & # start the Hive metastore service in the background on one machine; don't forget the "&"
hiveserver2 & # start hiveserver2 (JDBC and Web UI support)
Note: schematool -dbType mysql -initSchema initializes the Hive metastore (only needed on first startup or if MySQL has been reset)!!
(4) Start HBase: hadoop01 ⊕ hadoop02
start-hbase.sh # whichever node you start it on becomes the active HMaster.
# In this setup hadoop03 is the backup HMaster, so after starting there will be two HMaster processes. If you start from hadoop03 instead, there will be only one HMaster.
(5) Start Kafka: hadoop01 & hadoop02 & hadoop03
kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties
(6) Start Solr: hadoop01 & hadoop02 & hadoop03
/root/environments/solr-7.5.0/bin/solr start -force
(7) Start Atlas: hadoop01
/root/environments/apache-atlas-2.1.0/bin/atlas_start.py
(8) Bulk-import metadata (optional): hadoop01
# import Hive metadata
/root/environments/apache-atlas-2.1.0/bin/import-hive.sh
# import HBase metadata
/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
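For convenience, the sequence above can be wrapped in one script run from hadoop01 (a sketch under the paths and hostnames used throughout this guide; it assumes passwordless SSH and does not wait for each service to finish coming up):

#!/bin/bash
# (1) ZooKeeper on all three nodes
for h in hadoop01 hadoop02 hadoop03; do ssh $h 'source /etc/profile && zkServer.sh start'; done
# (2) Hadoop (HDFS + YARN), run from the master node
start-all.sh
# (3) Hive metastore and hiveserver2
hive --service metastore &
hiveserver2 &
# (4) HBase (this node becomes the active HMaster)
start-hbase.sh
# (5) Kafka on all three nodes
for h in hadoop01 hadoop02 hadoop03; do ssh $h 'source /etc/profile && kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties'; done
# (6) Solr on all three nodes
for h in hadoop01 hadoop02 hadoop03; do ssh $h '/root/environments/solr-7.5.0/bin/solr start -force'; done
# (7) Atlas on hadoop01
/root/environments/apache-atlas-2.1.0/bin/atlas_start.py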
WEB UI | Port | Purpose |
---|---|---|
hadoop: Node UI | 8042 | |
hadoop: YARN UI | 8088 | YARN management UI; view Hadoop cluster info |
hadoop: HDFS NN UI | 9870 | |
hadoop: DataNode UI | 9864 | |
hiveserver2: webui | 10002 | |
hbase | 16010, 16000 | use 16010 |
kafka eagle (not installed) | 8048 | |
solr | 8983 | |
atlas | 21000 | |
SecondaryNameNode (not used in this HA cluster) | 9868 | |
Web Server | Port | Purpose |
---|---|---|
hdfs | 9000 | |
hiveserver2: server | 10000 | JDBC support |
zookeeper | 2181 | |
kafka | 9092 | |
Notes:
Use yarn rmadmin -getAllServiceState to check the ResourceManager HA states. You can also switch NameNodes manually with hdfs haadmin -failover -forcefence -forceactive nn2 nn1, but dfs.ha.automatic-failover.enabled must be set to false first.
(1) Find the ports a process occupies, starting from the process name:
# approach 1: doesn't really work here
netstat -anp | grep hadoop # look for ports of hadoop-related processes
#[root@hadoop01 /] netstat -anp | grep hadoop
#[root@hadoop01 /]# # nothing found
# approach 2: find the process ID first, then look up its ports by PID
ps -ef | grep hadoop # returns process ID 2419
# [root@hadoop01 /]ps -ef | grep hadoop
# root 2419 2405 11 18:22 pts/1 00:02:14 /root/hadoop/bin/...
netstat -anp | grep 2419 # ports 16000, 16010
# [root@hadoop01 /]# netstat -anp | grep 2419
# tcp 0 0 192.168.0.11:16000 0.0.0.0:* LISTEN 2419/java
# tcp 0 0 0.0.0.0:16010 0.0.0.0:* LISTEN 2419/java
(2) Find the process occupying a given port:
netstat -anp | grep 3690 # -----> the process name found here is svnserver