Standalone deployment of Apache Atlas (Hadoop, Hive, Kafka, HBase, Solr, ZooKeeper)

Virtual machine (CentOS 7): 192.168.198.131

java 1.8

I. Hadoop installation

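1. Set the hostname to master

vim /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=master

On CentOS 7 the persistent hostname is read from /etc/hostname rather than from this file, so set it with hostnamectl as well:

hostnamectl set-hostname master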

vim /etc/hosts

192.168.198.131 master

Reboot for the change to take effect:

reboot

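2. Disable the firewall

systemctl stop firewalld
systemctl disable firewalld
firewall-cmd --state

The disable command keeps the firewall off after a reboot; firewall-cmd --state should then report that it is not running.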

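3. Set up passwordless SSH login. This is required, because start-all.sh uses SSH to launch the daemons; the setup is shown in step 10.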

4. Extract Hadoop 2.7.4

[root@master tools]# tar -zxvf hadoop-2.7.4.tar.gz

5. Check the JDK

[root@master hadoop-2.7.4]# java -version

java version "1.8.0_161"

Java(TM) SE Runtime Environment (build 1.8.0_161-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

6. Check the Hadoop version

[root@master bin]# ./hadoop version

Error: JAVA_HOME is not set and could not be found.

Hadoop cannot find the JDK. Set JAVA_HOME in the Hadoop environment configuration:

vim hadoop-env.sh

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop

Check the version again:

[root@master bin]# ./hadoop version

Hadoop 2.7.4

Subversion https://[email protected]/repos/asf/hadoop.git -r cd915e1e8d9d0131462a0b7301586c175728a282

Compiled by kshvachk on 2017-08-01T00:29Z

Compiled with protoc 2.5.0

From source with checksum 50b0468318b4ce9bd24dc467b7ce1148

This command was run using /usr/local/tools/hadoop-2.7.4/share/hadoop/common/hadoop-common-2.7.4.jar

7. Edit the configuration files

[root@master hadoop]# pwd

/usr/local/tools/hadoop-2.7.4/etc/hadoop

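vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>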

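vim hdfs-site.xml

Number of block replicas (1, because this is a single-node setup):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>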

[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml

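vim mapred-site.xml

Run MapReduce jobs on the YARN engine:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>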

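vim yarn-site.xml

Enable the MapReduce shuffle auxiliary service on the NodeManager and whitelist the environment variables that containers may inherit:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>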
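
Hadoop's *-site.xml files wrap every setting in a <property> element that holds a <name> and a <value>, all inside one <configuration> root. Because the layout is so regular, the files can also be generated from name=value pairs. The following is only a sketch: write_hadoop_conf is a hypothetical helper, not part of Hadoop, and it does not escape XML special characters in values.

```shell
# write_hadoop_conf FILE name=value [name=value ...]
# Hypothetical helper (not shipped with Hadoop): writes a Hadoop-style
# configuration file with one <property> per name=value argument.
# Values are written verbatim, so they must not contain <, > or &.
write_hadoop_conf() {
    conf_file=$1
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        for pair in "$@"; do
            name=${pair%%=*}     # text before the first '='
            value=${pair#*=}     # text after the first '='
            echo '    <property>'
            echo "        <name>${name}</name>"
            echo "        <value>${value}</value>"
            echo '    </property>'
        done
        echo '</configuration>'
    } > "$conf_file"
}

# Demo: write into a scratch directory instead of etc/hadoop.
demo_dir=$(mktemp -d)
write_hadoop_conf "$demo_dir/core-site.xml" \
    "fs.defaultFS=hdfs://master:9000" \
    "hadoop.tmp.dir=/data/hadoop_repo"
write_hadoop_conf "$demo_dir/hdfs-site.xml" "dfs.replication=1"
cat "$demo_dir/core-site.xml"
```

On a real host the first argument would be a path under /usr/local/tools/hadoop-2.7.4/etc/hadoop; back up the existing file first, since the helper overwrites it.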

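8. HDFS must be formatted before first use (similar to formatting a disk). Do not run this repeatedly. If formatting fails, delete the /data/hadoop_repo directory and format again.

Make sure the path /data/hadoop_repo exists.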

bin/hdfs namenode -format

20/05/05 19:44:45 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.

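9. Environment variables

vim /etc/profile

export HADOOP_HOME=/usr/local/tools/hadoop-2.7.4
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}

Reload the profile so the change applies to the current shell:

source /etc/profile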
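
This guide edits /etc/profile several times (Hadoop here, later HBase and Java), and repeating an edit by hand tends to leave duplicate PATH entries. A small sketch of an idempotent alternative follows; append_once is a hypothetical helper name, and the demo writes to a scratch file rather than to /etc/profile.

```shell
# append_once LINE FILE
# Appends LINE to FILE only if an identical line is not already present,
# so re-running the setup does not duplicate entries.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; use /etc/profile on a real host.
profile=$(mktemp)
append_once 'export HADOOP_HOME=/usr/local/tools/hadoop-2.7.4' "$profile"
append_once 'export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}' "$profile"
append_once 'export HADOOP_HOME=/usr/local/tools/hadoop-2.7.4' "$profile"   # no-op
cat "$profile"
```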

10. Set up passwordless SSH login with ssh-keygen -t rsa

Without it, start-all.sh keeps prompting for confirmation and a password for every daemon:

[root@master sbin]# ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [master]

The authenticity of host 'master (192.168.198.131)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yes

master: Warning: Permanently added 'master,192.168.198.131' (ECDSA) to the list of known hosts.

root@master's password:

master: starting namenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-namenode-master.out

The authenticity of host 'localhost (::1)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yes

localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

root@localhost's password:

root@localhost's password: localhost: Permission denied, please try again.

localhost: starting datanode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-datanode-master.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.

ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.

Are you sure you want to continue connecting (yes/no)? yuH[[3~[[D[[D^[[C

Please type 'yes' or 'no': yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.

[email protected]'s password:

0.0.0.0: starting secondarynamenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-secondarynamenode-master.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out

root@localhost's password:

localhost: starting nodemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out

Before passwordless SSH login is set up, ssh asks for a password:

[root@master sbin]# ssh 192.168.198.131

[email protected]'s password:

Last failed login: Tue May 5 19:51:21 PDT 2020 from localhost on ssh:notty

There was 1 failed login attempt since the last successful login.

Last login: Tue May 5 19:28:21 2020 from 192.168.198.1

Set up passwordless SSH login:

[root@master sbin]# ssh-keygen -t rsa

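Press Enter three times to accept the defaults, then run:

[root@master ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh now logs in without asking for a password. Type exit to leave the session.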
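
The same key setup can be scripted so that it needs no key presses and can be repeated safely. This is a sketch under assumptions: setup_local_ssh_key is a hypothetical function name, and the final chmod is added because sshd, with its default StrictModes setting, rejects an authorized_keys file that other users can write to.

```shell
# setup_local_ssh_key [SSH_DIR]
# Creates an RSA key pair without a passphrase (if none exists yet) and
# authorizes it for logins to the same account. SSH_DIR defaults to ~/.ssh.
setup_local_ssh_key() {
    ssh_dir=${1:-$HOME/.ssh}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        # -N '' sets an empty passphrase, -q suppresses the key art
        ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa"
    fi
    touch "$ssh_dir/authorized_keys"
    # Do not add the same public key twice on repeated runs
    grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" ||
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Calling setup_local_ssh_key with no argument works on ~/.ssh; afterwards ssh master should open a session without a password prompt.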

11. Start Hadoop

[root@master sbin]# ./start-all.sh
[root@master sbin]# jps
7202 Jps
6836 NodeManager
6709 ResourceManager
6536 SecondaryNameNode
6201 NameNode
6347 DataNode
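
A quick way to confirm that all five daemons came up is to compare the jps output against the expected names. The sketch below assumes a hypothetical helper, missing_daemons, that reads jps output on standard input; the sample text is the output shown above.

```shell
# missing_daemons: reads `jps` output on stdin and prints the name of each
# expected Hadoop daemon that does not appear in it (nothing if all are up).
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Column 2 of jps output is the main class name. Compare it exactly
        # so that "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Demo with the jps output from this walkthrough.
sample='7202 Jps
6836 NodeManager
6709 ResourceManager
6536 SecondaryNameNode
6201 NameNode
6347 DataNode'
printf '%s\n' "$sample" | missing_daemons
```

On a live host the call would be jps | missing_daemons; any name it prints points to a daemon whose log under /data/hadoop_repo/logs/hadoop is worth reading.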

12. Access the web UIs

localhost:8088 (YARN ResourceManager)
localhost:50070 (HDFS NameNode)

II. Hive installation

1. Extract the archive

2. Configure the environment variables

[root@master apache-hive-2.3.7]# hive --version

Hive 2.3.7

Git git://Alans-MacBook-Air.local/Users/gates/git/hive -r cb213d88304034393d68cc31a95be24f5aac62b6

Compiled by gates on Tue Apr 7 12:42:45 PDT 2020

From source with checksum 9da14e8ac4737126b00a1a47f662657e

3. Configure the metastore connection

[root@master conf]# cp hive-default.xml.template hive-site.xml

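[root@master conf]# vim hive-site.xml

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.198.131:3306/hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>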

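4. Copy the MySQL JDBC driver jar into hive/lib

5. Create the hive database in MySQL, then initialize the schema

Start MySQL and create the hive database, so that the metastore lives in MySQL instead of Hive's embedded Derby database.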

[root@master mysql]# service mysqld start

/etc/init.d/mysqld: line 239: my_print_defaults: command not found

/etc/init.d/mysqld: line 259: cd: /usr/local/mysql: No such file or directory

Starting MySQL ERROR! Couldn't find MySQL server (/usr/local/mysql/bin/mysqld_safe)

CREATE DATABASE hive;

[root@master bin]# schematool -dbType mysql -initSchema

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Metastore connection URL: jdbc:mysql://192.168.198.131:3306/hive

Metastore Connection Driver : com.mysql.jdbc.Driver

Metastore connection User: root

Starting metastore schema initialization to 2.3.0

Initialization script hive-schema-2.3.0.mysql.sql

Initialization script completed

schemaTool completed

[root@master bin]#

6. Run the hive command

[root@master apache-hive-2.3.7]# hive

which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive>

7. Inspect the database created in step 5; it now contains many tables

mysql -uroot -p

The two tables of most interest appear to be TBLS and COLUMNS_V2.

8. Test

[root@master apache-hive-2.3.7]# hive

which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show tables;

OK

Time taken: 4.426 seconds

hive> create database hive_1;

OK

Time taken: 0.198 seconds

hive> show databases;

OK

default

hive_1

Time taken: 0.03 seconds, Fetched: 2 row(s)

hive>

Check what Hive has stored in HDFS:

[root@master ~]# hadoop fs -lsr /

lsr: DEPRECATED: Please use 'ls -R' instead.

drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp

drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp/hive

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812

drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812/_tmp_space.db

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse

drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse/hive_1.db

III. Kafka installation (pseudo-distributed); ZooKeeper must be installed first

1. Install

# tar zxvf kafka_2.11-2.2.1.tgz
# mv kafka_2.11-2.2.1 kafka
# cd kafka

Start the Kafka service

# nohup bin/kafka-server-start.sh config/server.properties &

Create a topic

# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List the topics

# bin/kafka-topics.sh --list --zookeeper localhost:2181

2. Test

Send messages with kafka-console-producer.sh

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Consume messages with kafka-console-consumer.sh

# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

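3. Kafka cluster

Create several server properties files under config, one per broker, each with a different broker.id (on a single host the listener port and log directory must differ as well):

bin/kafka-server-start.sh config/server-1.properties &

ZooKeeper must be running before the brokers are started.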
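
The per-broker files can be derived from the stock server.properties. The sketch below assumes a hypothetical helper, make_broker_config, and uses ports 9092 to 9094 to match the bootstrap servers configured for Atlas later in this guide; the log.dirs location under /tmp is an assumption, so pick a persistent path for anything beyond a test. The demo works on a small stand-in file in a scratch directory.

```shell
# make_broker_config BASE_FILE BROKER_ID PORT OUT_FILE
# Copies a Kafka server.properties file and overrides the three settings
# that must differ between brokers on one host.
make_broker_config() {
    base=$1; broker_id=$2; port=$3; out=$4
    # Drop existing (or commented-out) definitions of the three keys ...
    grep -Ev '^#?(broker\.id|listeners|log\.dirs)=' "$base" > "$out"
    # ... then append the per-broker values.
    {
        echo "broker.id=${broker_id}"
        echo "listeners=PLAINTEXT://:${port}"
        echo "log.dirs=/tmp/kafka-logs-${broker_id}"   # assumed location
    } >> "$out"
}

# Demo: a stand-in for config/server.properties in a scratch directory.
work=$(mktemp -d)
printf 'broker.id=0\n#listeners=PLAINTEXT://:9092\nlog.dirs=/tmp/kafka-logs\nzookeeper.connect=localhost:2181\n' \
    > "$work/server.properties"
i=1
for port in 9092 9093 9094; do
    make_broker_config "$work/server.properties" "$i" "$port" "$work/server-$i.properties"
    i=$((i + 1))
done
cat "$work/server-2.properties"
```

Each generated file is then passed to bin/kafka-server-start.sh in the same way as config/server-1.properties above.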

IV. HBase installation

1. Extract the archive and configure the environment variables

[root@master hbase-1.4.13]# vim /etc/profile

HBASE_HOME=/usr/local/tools/hbase-1.4.13
export PATH=${HBASE_HOME}/bin:${PATH}

[root@master hbase-1.4.13]# source /etc/profile

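2. Edit the configuration files

Add to hbase-env.sh:

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export HBASE_MANAGES_ZK=false

Change hbase-site.xml to:

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/tools/hbase-1.4.13/data</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>The mode the cluster will be in. Possible values are
          false: standalone and pseudo-distributed setups with managed Zookeeper
          true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
        </description>
    </property>
</configuration>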

3. Start HBase

[root@master bin]# ./start-hbase.sh

The HBase web UI is served on port 16010.

4. Troubleshooting

1)

running master, logging to /usr/local/tools/hbase-1.4.13/bin/../logs/hbase-root-master-master.out

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

Solution:

As the warnings say, hbase-env.sh passes JVM options that no longer exist in JDK 8. The file contains:

# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"

Comment out both export lines.

2)

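HMaster and HRegionServer are the two HBase daemon processes, but jps showed that neither had started, so check the error in the configured logs directory. It reads:

Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.

Yet hbase-env.sh already contains export HBASE_MANAGES_ZK=false

That setting tells HBase not to manage its own ZooKeeper, so in principle the standalone ZooKeeper should be used, but the error persists. The cause is a conflict between the bundled ZooKeeper and the standalone one: when hbase.cluster.distributed is false, HBase runs in standalone mode and still starts its bundled ZooKeeper.

So set the value to true:

vim conf/hbase-site.xml

<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>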


3)

2020-05-07 02:57:17,302 INFO [main-SendThread(192.168.181.131:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.181.131/192.168.181.131:2181. Will not attempt to authenticate using SASL (unknown error)

Fix: in hbase-site.xml, set the ZooKeeper host to the hostname (master); it was previously configured as an IP address.

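V. Solr cluster (solr-7.5.0); prerequisite: the ZooKeeper pseudo-cluster is already configured

1. Extract the archive

2. Start Solr as a test

solr start

3. Configure ZK_HOST and SOLR_PORT. The ZooKeeper pseudo-cluster is already set up, so Solr only needs its address list and its own port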

[root@master bin]# vim solr.in.sh

ZK_HOST="192.168.198.131:2181,192.168.198.131:2182,192.168.198.131:2183"

4. Create the Solr collections

bash $SOLR_HOME/bin/solr create -c vertex_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2

bash $SOLR_HOME/bin/solr create -c edge_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2

bash $SOLR_HOME/bin/solr create -c fulltext_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2

5. Start the Solr cluster

Start the ZooKeeper pseudo-cluster first, then the Solr pseudo-cluster:

/usr/local/tools/solr-cloud/solr1/bin/solr start -force
/usr/local/tools/solr-cloud/solr2/bin/solr start -force
/usr/local/tools/solr-cloud/solr3/bin/solr start -force
/usr/local/tools/solr-cloud/solr4/bin/solr start -force

To stop the instances:

/usr/local/tools/solr-cloud/solr1/bin/solr stop
/usr/local/tools/solr-cloud/solr2/bin/solr stop
/usr/local/tools/solr-cloud/solr3/bin/solr stop
/usr/local/tools/solr-cloud/solr4/bin/solr stop

Create a test collection:

[root@master bin]# ./solr create_collection -c test_collection -shards 2 -replicationFactor 2 -force

-c  name of the collection

-shards  number of shards (short form: -s); the index data is distributed across them

-replicationFactor  number of replicas per shard

-force  needed because Solr refuses to run as root by default; other accounts can omit it

The Solr cluster is now complete.

Reference: https://blog.csdn.net/qq_37936542/article/details/83113083

VI. Apache Atlas standalone deployment

Build that uses the HBase and Solr embedded in Atlas:

/usr/local/project/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server

Build that does not use the embedded HBase and Solr:

/usr/local/project/apache-atlas-sources-2.0.0-alone

[root@master apache-atlas-sources-2.0.0-alone]# mvn clean -DskipTests package -Pdist

When the build finishes, use distro/target/apache-atlas-2.0.0-server

Integrate Solr with Apache Atlas

cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf

[root@master conf]# cp -r solr/ /usr/local/tools/solr-7.5.0/apache-atlas-conf

A standalone deployment mainly comes down to editing two configuration files.

atlas-env.sh

export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

atlas-application.properties

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=master:2181,master:2182,master:2183
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=master:2181,master:2182,master:2183
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: http://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=master:2181/kafka,master:2182/kafka,master:2183/kafka
atlas.kafka.bootstrap.servers=master:9092,master:9093,master:9094
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/[email protected]
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/[email protected]

#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=master:2181,master:2182,master:2183

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=:
#atlas.server.ha.zookeeper.auth=:



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query..
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.=

######### Hive Hook Configs #######
 
atlas.hook.hive.synchronous=false
 
atlas.hook.hive.numRetries=3
 
atlas.hook.hive.queueSize=10000
 
######### Sqoop Hook Configs #######
 
atlas.hook.sqoop.synchronous=false
 
atlas.hook.sqoop.numRetries=3
 
atlas.hook.sqoop.queueSize=10000

storage.cql.protocol-version=3
storage.cql.local-core-connections-per-host=10
storage.cql.local-max-connections-per-host=20
storage.cql.local-max-requests-per-connection=2000
storage.buffer-size=1024
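
In this setup three properties must name the same ZooKeeper ensemble: atlas.graph.storage.hostname, atlas.graph.index.search.solr.zookeeper-url and atlas.audit.hbase.zookeeper.quorum. A mismatch is easy to overlook in a file this long. The sketch below reads the values back and compares them; get_prop and check_atlas_zk are hypothetical helpers, and the equality check is an assumption that holds only when Solr uses no ZooKeeper chroot, as is the case here.

```shell
# get_prop KEY FILE
# Prints the value of the last uncommented "KEY=value" line in a
# properties file (simple form only: no spaces around '=', no continuations).
get_prop() {
    awk -v key="$1" '
        index($0, key "=") == 1 { value = substr($0, length(key) + 2) }
        END { print value }
    ' "$2"
}

# check_atlas_zk FILE
# Succeeds if the three ZooKeeper-related settings agree, fails otherwise.
check_atlas_zk() {
    storage=$(get_prop atlas.graph.storage.hostname "$1")
    solr=$(get_prop atlas.graph.index.search.solr.zookeeper-url "$1")
    audit=$(get_prop atlas.audit.hbase.zookeeper.quorum "$1")
    if [ "$storage" = "$solr" ] && [ "$storage" = "$audit" ]; then
        echo "consistent: $storage"
    else
        echo "mismatch: storage=$storage solr=$solr audit=$audit"
        return 1
    fi
}

# Demo on an excerpt of the file above, written to a scratch location.
props=$(mktemp)
cat > "$props" <<'EOF'
atlas.graph.storage.hostname=master:2181,master:2182,master:2183
#atlas.graph.index.search.solr.mode=http
atlas.graph.index.search.solr.zookeeper-url=master:2181,master:2182,master:2183
atlas.audit.hbase.zookeeper.quorum=master:2181,master:2182,master:2183
EOF
check_atlas_zk "$props"
```

Against the real file the call would be check_atlas_zk conf/atlas-application.properties, run from the Atlas installation directory.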


VII. Troubleshooting the standalone Atlas deployment

1)

Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir

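The symbolic link to the HBase client configuration may be wrong. Link the HBase conf directory into the Atlas conf directory (the ln command below was run on a different host, so adjust both paths to your own installation):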

cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0

[root@cdh632-worker03 atlas]# ln -s /etc/hbase/conf /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

[root@cdh632-worker03 atlas]# pwd

vim atlas-env.sh

export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf

2)

Startup error:

2020-05-23 10:30:46,794 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)

java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

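Fix the ZooKeeper address list in the configuration file:

master:2181,master:2182,master:2183

Reference:
https://blog.csdn.net/qq_34024275/article/details/105393745

The graph database is set up as follows.

The configuration file sets where the graph data and the index are stored: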

atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage

atlas.graph.storage.backend=hbase

atlas.graph.storage.port=2181

atlas.graph.storage.hbase.table=atlas-test

atlas.graph.storage.hostname=docker2,docker3,docker4

# Graph Search Index Backend

atlas.graph.index.search.backend=elasticsearch

atlas.graph.index.search.hostname=127.0.0.1

atlas.graph.index.search.index-name=atlas_test

3)

at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily

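Atlas is built against HBase 2.0, but the HBase running locally is 1.4.13.

Solution: run HBase 2.x (2.2.4 here) with the following hbase-site.xml:

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/tools/hbase-2.2.4/data</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>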

4)

2020-05-23 23:52:00,415 WARN - [main:] ~ JanusGraphException: Could not open global configuration (AtlasJanusGraphDatabase:167)

2020-05-23 23:52:00,432 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)

java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend

Add the following to the configuration file:

storage.cql.protocol-version=3

storage.cql.local-core-connections-per-host=10

storage.cql.local-max-connections-per-host=20

storage.cql.local-max-requests-per-connection=2000

storage.buffer-size=1024

5)

Caused by: org.apache.solr.common.SolrException: Cannot connect to cluster at master:2181,master:2182,master:2183/solr: cluster not found/not ready

at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:385)

at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:141)

at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:383)

at org.janusgraph.diskstorage.solr.Solr6Index.(Solr6Index.java:218)

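The Solr ZooKeeper URL had been configured with a /solr suffix:

master:2181,master:2182,master:2183/solr

Changing it to master:2181,master:2182,master:2183 fixes the error, because ZK_HOST in solr.in.sh was set without that suffix.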

Appendix:

1. Java environment variables

vim /etc/profile

Add the following lines:

export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

Save and exit, then apply the configuration:

source /etc/profile

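2. ZooKeeper pseudo-distributed setup

Each instance needs its own myid (1, 2, 3), its own dataDir and its own clientPort (2181, 2182, 2183). For the first instance:

[root@master zookeeper-01]# cd data/

[root@master data]# touch myid

[root@master data]# echo 1 > myid

Edit the configuration file: rename zoo_sample.cfg in the conf directory to zoo.cfg and add the server list (replace the IP address with your own):

server.1=192.168.198.131:2881:3881

server.2=192.168.198.131:2882:3882

server.3=192.168.198.131:2883:3883

Error when starting the ZooKeeper cluster:

org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException

Fix: dataDir in zoo.cfg must point to the data directory you created, because that is where the myid file lives:

dataDir=/usr/local/tools/zk-cloud/zookeeper01/data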
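
Since the three instances differ only in myid, dataDir and clientPort, their files can be generated in one pass. The following is a sketch with a hypothetical helper, write_zk_instance; the tickTime, initLimit and syncLimit values are the ones from zoo_sample.cfg, the client ports 2181 to 2183 follow the ZK_HOST list used for Solr, and the demo writes into a scratch directory.

```shell
# write_zk_instance BASE_DIR N HOST
# Creates BASE_DIR/zookeeper0N/data/myid and BASE_DIR/zookeeper0N/conf/zoo.cfg
# for instance N (1 to 3) of a three-node ensemble running on one host.
write_zk_instance() {
    base_dir=$1; n=$2; host=$3
    inst="$base_dir/zookeeper0$n"
    mkdir -p "$inst/data" "$inst/conf"
    echo "$n" > "$inst/data/myid"      # must equal the N of this server.N line
    cat > "$inst/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$inst/data
clientPort=$((2180 + n))
server.1=$host:2881:3881
server.2=$host:2882:3882
server.3=$host:2883:3883
EOF
}

# Demo in a scratch directory; use /usr/local/tools/zk-cloud on a real host.
zk_base=$(mktemp -d)
for i in 1 2 3; do
    write_zk_instance "$zk_base" "$i" 192.168.198.131
done
cat "$zk_base/zookeeper02/conf/zoo.cfg"
```

The helper only writes data/myid and conf/zoo.cfg, so the ZooKeeper distribution itself still has to be unpacked into each zookeeper0N directory.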

Create a start script so the instances do not have to be started one by one:

vim zk-start.sh

cd zookeeper01/bin

./zkServer.sh start

cd ../../

cd zookeeper02/bin

./zkServer.sh start

cd ../../

cd zookeeper03/bin

./zkServer.sh start

cd ../../

chmod -R 755 zk-start.sh
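
The script above repeats the same three lines per instance and only knows how to start. A loop-based variant that also handles stop and status might look like the sketch below; zk_all is a hypothetical function name, and it assumes the zookeeper01 to zookeeper03 directory layout used in this guide.

```shell
# zk_all BASE_DIR ACTION
# Runs "zkServer.sh ACTION" (start, stop, status, ...) for every
# zookeeper0N directory under BASE_DIR. Returns non-zero if any call fails.
zk_all() {
    base_dir=$1; action=$2; rc=0
    for script in "$base_dir"/zookeeper0*/bin/zkServer.sh; do
        [ -x "$script" ] || continue      # skip if the glob matched nothing
        echo "== $script $action"
        "$script" "$action" || rc=1
    done
    return $rc
}
```

For example, zk_all /usr/local/tools/zk-cloud start brings up all instances, and zk_all /usr/local/tools/zk-cloud status shows which one was elected leader.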

Once ZooKeeper has started, check each instance with zkServer.sh status
