Author: Liu Yu
CSDN blog: https://blog.csdn.net/liuyu973971883
Some of this material references other sources; if anything infringes, please contact me and it will be removed. If anything is incorrect, corrections are welcome. Thanks.
Prerequisite: a Java runtime environment is required. I am using JDK 1.8; the installation itself is not shown here.
This part sets up a ZooKeeper cluster.
# enter the software directory
cd /software
# extract
tar -xzvf zookeeper-3.4.14.tar.gz
# enter the extracted zookeeper directory
cd /software/zookeeper-3.4.14
# create the directory zookeeper will use to store snapshots
mkdir dataDir
# create the directory zookeeper will use to store transaction logs
mkdir dataDirLog
# enter the conf directory under the zookeeper folder
cd /software/zookeeper-3.4.14/conf
# copy the sample configuration file
cp zoo_sample.cfg zoo.cfg
# edit zoo.cfg
vi zoo.cfg
Add the following configuration items:
# paths point to the directories we just created
dataDir=/software/zookeeper-3.4.14/dataDir
dataLogDir=/software/zookeeper-3.4.14/dataDirLog
# zookeeper cluster: one server.N entry per zookeeper node
server.1=192.168.40.101:2888:3888
server.2=192.168.40.102:2888:3888
server.3=192.168.40.103:2888:3888
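For reference, the resulting zoo.cfg should look roughly like the following, assuming the defaults that ship in zoo_sample.cfg (tickTime, initLimit, syncLimit, clientPort) are left unchanged:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/software/zookeeper-3.4.14/dataDir
dataLogDir=/software/zookeeper-3.4.14/dataDirLog
server.1=192.168.40.101:2888:3888
server.2=192.168.40.102:2888:3888
server.3=192.168.40.103:2888:3888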
# enter the snapshot directory we created earlier
cd /software/zookeeper-3.4.14/dataDir
# create the myid file; its value must match this node's server.N number in zoo.cfg
echo "1" > myid
# enter the bin directory under the zookeeper folder
cd /software/zookeeper-3.4.14/bin
# start zookeeper
./zkServer.sh start
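After starting ZooKeeper on all three nodes, each node's role (one leader, two followers) can be checked with zkServer.sh status, for example:
# check the role of the local zookeeper instance
cd /software/zookeeper-3.4.14/bin
./zkServer.sh status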
This part sets up a single Kafka instance; I installed it on the 103 Linux host.
# enter the software directory
cd /software
# extract
tar -xzvf kafka_2.11-2.2.1.tgz
# rename the directory
mv kafka_2.11-2.2.1 kafka
# enter the config directory
cd /software/kafka/config
# edit the broker configuration
vi server.properties
For reference, the default server.properties (with explanations added as comments) is shown below; the actual changes for this node follow after it.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# A broker is a Kafka instance; in a Kafka cluster every Kafka node must have a broker.id.
# The id must be unique and must be an integer.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = security_protocol://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# The number of threads handling network requests
# default number of threads handling network requests: 3
num.network.threads=3
# The number of threads doing disk I/O
# default number of threads performing disk I/O: 8
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
# size of the send buffer used by the socket server, default 100 KB
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
# size of the receive buffer used by the socket server, default 100 KB
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
# the maximum size of a single request the socket server will accept, to guard against OOM (out of memory); default 100 MB
socket.request.max.bytes=104857600
############################# Log Basics (data section; Kafka refers to its data as logs) #############################
# A comma separated list of directories under which to store log files
# a comma-separated list of directories where Kafka stores the data it receives
log.dirs=/home/uplooking/data/kafka
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
# number of log partitions per topic, default 1. More partitions allow greater consumer
# parallelism, but also result in more files spread across the Kafka cluster.
# (a partition is a unit of distributed storage: one topic's data is split into several pieces, i.e. divided into blocks/partitions)
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
# number of threads per data directory used to recover data at startup and to flush data at shutdown.
# If the Kafka data directories are on a RAID array, it is recommended to increase this value.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 (such as 3) is recommended to ensure availability.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy (data flush policy) #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# Kafka's flush policy can only be based on message count and/or time interval; there is no size-based option.
# You can configure either one of them or both; both are shown below.
# The number of messages to accept before forcing a flush of data to disk
# number of messages accumulated before a flush of data to disk is forced
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
# maximum time a message can sit in the log before a flush writes it to a log data file on disk
#log.flush.interval.ms=1000
############################# Log Retention Policy (data retention policy) #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The settings below control the cleanup of log segments: a segment is deleted as soon as either policy (time-based or size-based) is met.
# The minimum age of a log file to be eligible for deletion
# time-based policy: how long log data is kept, default 7 days (168 hours)
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
# size-based policy: 1 GB
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# segmenting policy: when a segment file reaches this size, a new one is created (1 GB)
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies (300000 ms = 5 minutes)
# how often to check whether log data meets the deletion criteria
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Modify the following items for this broker:
broker.id=1
# listener for a Kafka cluster on an internal network; it tells clients which host name and port to connect to.
# For clients connecting from an external network, advertised.listeners must be used instead.
listeners=PLAINTEXT://192.168.40.103:9092
# addresses of the zookeeper cluster
zookeeper.connect=192.168.40.101:2181,192.168.40.102:2181,192.168.40.103:2181
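As noted above, if clients connect from outside the internal network, advertised.listeners would be set in addition; a minimal sketch, where your.public.hostname is a placeholder to replace with the real externally reachable address:
# hypothetical: address advertised to external producers and consumers
advertised.listeners=PLAINTEXT://your.public.hostname:9092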
# enter the kafka directory
cd /software/kafka
# start kafka in the background
nohup bin/kafka-server-start.sh config/server.properties &
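To confirm the broker is up, a throwaway topic can be created and listed; a minimal check against the listener configured above (the topic name test is just an example):
cd /software/kafka
# create a test topic with one partition and one replica
bin/kafka-topics.sh --bootstrap-server 192.168.40.103:9092 --create --topic test --partitions 1 --replication-factor 1
# list topics to verify it exists
bin/kafka-topics.sh --bootstrap-server 192.168.40.103:9092 --list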
This part sets up a single Elasticsearch instance; I installed it on the 103 Linux host.
# enter the software directory
cd /software
# extract
tar -zxvf elasticsearch-5.6.8.tar.gz
# edit the configuration file
vi /software/elasticsearch-5.6.8/config/elasticsearch.yml
Modify the following items:
cluster.name: my-application
node.name: node-1
path.data: /software/elasticsearch-5.6.8/data
path.logs: /software/elasticsearch-5.6.8/logs
network.host: 0.0.0.0
http.port: 9200
# enter the elasticsearch directory
cd /software/elasticsearch-5.6.8
# create the data directory
mkdir data
# create the logs directory
mkdir logs
Because Elasticsearch cannot be started as root, create a dedicated user and group to run it:
# create the user group
groupadd elsearch
# create the user and add it to the group
useradd -r -g elsearch elsearch
# set a password for the user
passwd elsearch
# give the new user ownership of the elasticsearch directory
chown -R elsearch:elsearch /software/elasticsearch-5.6.8
# switch to the user that will start elasticsearch
su elsearch
# enter the elasticsearch bin directory
cd /software/elasticsearch-5.6.8/bin
# start elasticsearch in the background
nohup ./elasticsearch &
# watch the nohup log for errors; problems such as too-low file descriptor or thread limits are common (see the fixes below)
tail -f nohup.out
# check whether elasticsearch is reachable
curl http://192.168.40.103:9200
# if output like the following appears, startup succeeded
{
  "name" : "node-1",
  "cluster_name" : "my-application",
  "cluster_uuid" : "2UlrJ43PQDKbrqvcTG9IyA",
  "version" : {
    "number" : "5.6.8",
    "build_hash" : "688ecce",
    "build_date" : "2018-02-16T16:46:30.010Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
If the error says the maximum number of open files is too low, switch to the root user and add the following to /etc/security/limits.conf:
* soft nofile 65536
* hard nofile 65536
If the error says the maximum number of threads is too low, switch to the root user and add the following to /etc/security/limits.conf:
* soft nproc 4096
* hard nproc 4096
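The new limits only apply to sessions opened after the change, so log in again as the elsearch user before checking them; a quick check:
# maximum number of open files for the current session
ulimit -n
# maximum number of user processes/threads for the current session
ulimit -u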
If the error says the limit on the number of VMAs (virtual memory areas) a process may own is too low, switch to the root user and add the following to /etc/sysctl.conf:
vm.max_map_count=262144
# reload the kernel parameters
sysctl -p
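To confirm the kernel parameter took effect, read it back:
# should print vm.max_map_count = 262144
sysctl vm.max_map_count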
If the error mentions SecComp: CentOS 6 does not support SecComp, while ES 5.x (e.g. 5.2.1) sets bootstrap.system_call_filter to true by default and performs the check; the check fails and Elasticsearch refuses to start. Add the following to elasticsearch.yml:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
Create a test database and table in MySQL; this is the source table the JDBC connector will monitor:
create database test1;
use test1;
create table user(id int PRIMARY KEY AUTO_INCREMENT,username varchar(50),password varchar(50));
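So that the source connector has something to import on its first poll, a couple of throwaway rows can be inserted (the values are just examples):
insert into user(username,password) values('tom','123456');
insert into user(username,password) values('jerry','123456');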
Create the source-connector configuration file config/mysql-test1.properties (referenced when starting Connect below):
# connector name
name=mysql_test1
# connector class to use
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
# maximum number of tasks
tasks.max=1
# mysql connection URL
connection.url=jdbc:mysql://192.168.40.102:3306/test1?user=root&password=root&useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=GMT&autoReconnect=true
# monitoring mode: one of incrementing, timestamp, timestamp+incrementing
mode=incrementing
# column to monitor
incrementing.column.name=id
# topic prefix
topic.prefix=mysql_test1_
# poll every 10 seconds
poll.interval.ms=10000
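For reference, the other modes listed above track a timestamp column instead of, or in addition to, the auto-increment id; a hedged sketch, assuming the table had a hypothetical update_time column:
# hypothetical alternative: capture both inserts and updates
mode=timestamp+incrementing
timestamp.column.name=update_time
incrementing.column.name=id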
Create the sink-connector configuration file config/es-mysql-test1.properties:
# connector name
name=es_mysql_test1
# connector class to use
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
# maximum number of tasks
tasks.max=1
# topic to consume, usually the topic prefix + table name
topics=mysql_test1_user
# when true, the key of each record written to ES is built from the Kafka topic name + partition id + offset
key.ignore=true
# elasticsearch address
connection.url=http://192.168.40.103:9200
# elasticsearch index type
type.name=test1_user
Here Kafka Connect is run in standalone (single-node) mode.
# enter the kafka bin directory
cd /software/kafka/bin
# start Connect in the background, passing the two connector configuration files
nohup ./connect-standalone.sh ../config/connect-standalone.properties ../config/es-mysql-test1.properties ../config/mysql-test1.properties &
# check the nohup log for errors, or query each connector's status through the Connect REST API
tail -f nohup.out
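Once both connectors are running, the pipeline can be verified end to end: insert another row into the user table in MySQL, wait for the next poll, and then search Elasticsearch. A sketch of the check, assuming the sink creates an index named after the topic (mysql_test1_user):
# documents written by the sink connector should appear here
curl http://192.168.40.103:9200/mysql_test1_user/_search?pretty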
# list the connector plugins installed on the Connect worker (replace ip with the worker address; 8083 is the default REST port)
curl -X GET http://ip:8083/connector-plugins
The Connect REST API also provides the following endpoints:
GET /connectors – returns the names of all running connectors.
POST /connectors – creates a new connector; the request body must be JSON containing a name field and a config field, where name is the connector's name and config is a JSON object with the connector's configuration.
GET /connectors/{name} – returns information about the specified connector.
GET /connectors/{name}/config – returns the configuration of the specified connector.
PUT /connectors/{name}/config – updates the configuration of the specified connector.
GET /connectors/{name}/status – returns the status of the specified connector, including whether it is running, paused, or failed; if an error occurred, the error details are included.
GET /connectors/{name}/tasks – returns the tasks currently running for the specified connector.
GET /connectors/{name}/tasks/{taskid}/status – returns the status of the specified connector task.
PUT /connectors/{name}/pause – pauses the connector and its tasks, stopping data processing until it is resumed.
PUT /connectors/{name}/resume – resumes a paused connector.
POST /connectors/{name}/restart – restarts a connector; most often used when the connector has failed.
POST /connectors/{name}/tasks/{taskId}/restart – restarts a task, typically because it has failed.
DELETE /connectors/{name} – deletes a connector, stopping all of its tasks and removing its configuration.
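For example, with the two connectors defined above, their state could be checked like this (8083 is the default Connect REST port, as used earlier):
# list the running connectors; should include mysql_test1 and es_mysql_test1
curl -X GET http://192.168.40.103:8083/connectors
# check the status of the JDBC source connector
curl -X GET http://192.168.40.103:8083/connectors/mysql_test1/status
# check the status of the Elasticsearch sink connector
curl -X GET http://192.168.40.103:8083/connectors/es_mysql_test1/status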