Containerizing Common Middleware on Kubernetes: Kafka

Kafka

Kafka is an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data. Its persistence layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," which makes it very valuable as enterprise-grade infrastructure for processing streaming data. In addition, Kafka can connect to external systems (for data import/export) via Kafka Connect, and it ships Kafka Streams, a Java stream-processing library.

Containerization Steps

Building the Kafka Image

The first step of containerization is to build a base Kafka image. You could of course use a ready-made Kafka image from Docker Hub, but then you first have to learn how that image is configured, and the Kafka version you want may not have a corresponding image. So here we start from scratch and build an image for whichever Kafka version we need.
There are two ways to build the Kafka image.

Method 1

Download the package for the Kafka version you want from the official Kafka site; for example, I downloaded kafka_2.11-1.1.1.tgz. Once the download completes, start writing the Dockerfile. Before writing it, check on the Kafka website which JDK version that Kafka release is meant to run on (under Documentation there is a Java version section), then use a matching JDK image as the base image.

FROM mcr.microsoft.com/java/jdk:8-zulu-alpine

LABEL MAINTAINER="[email protected]"

ENV KAFKA_VERSION="1.1.1" SCALA_VERSION="2.11"

ADD kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz /opt
# bump the broker's default JVM heap settings
RUN sed -i 's/-Xmx1G -Xms1G/-Xmx4G -Xms4G/g' /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}/bin/kafka-server-start.sh

VOLUME ["/kafka"]

ENV KAFKA_HOME /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}

ENV PATH=${PATH}:${KAFKA_HOME}/bin

# 9092 is the broker listener port, 5555 is the JMX port
EXPOSE 9092 5555
# No CMD or ENTRYPOINT is defined here; the start command is supplied at deployment time, but you can add your own if you prefer
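
With kafka_2.11-1.1.1.tgz sitting next to the Dockerfile (the ADD instruction requires it), the image can be built locally; the image name and tag below are just an example, chosen to match the manifests later in this post:

docker build -t kafka:2.11-1.1.1 .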

Method 2

Download and extract the distribution directly inside the Dockerfile:

FROM mcr.microsoft.com/java/jdk:8-zulu-alpine

LABEL MAINTAINER="[email protected]"

RUN apk add --update unzip wget curl

ENV KAFKA_VERSION="1.1.1" SCALA_VERSION="2.11"

RUN wget -q https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz  -O /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz \
    && tar xfz /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz -C /opt && rm /tmp/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz \
    && sed -i 's/-Xmx1G -Xms1G/-Xmx4G -Xms4G/g' /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}/bin/kafka-server-start.sh

VOLUME ["/kafka"]

ENV KAFKA_HOME /opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}

ENV PATH=${PATH}:${KAFKA_HOME}/bin

# 9092 is the broker listener port, 5555 is the JMX port
EXPOSE 9092 5555

At this point the Kafka base image is ready.
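
If your Kubernetes nodes cannot pull images from your local Docker daemon, tag and push the image to a registry they can reach; the registry and repository below are placeholders:

docker tag kafka:2.11-1.1.1 registry.example.com/middleware/kafka:2.11-1.1.1
docker push registry.example.com/middleware/kafka:2.11-1.1.1

With the image available to the cluster, the next step is deploying the Kafka cluster on Kubernetes.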

Deploying the Kafka Cluster

Before deploying, we need to understand how a Kafka cluster is assembled. Setting up a cluster by hand is fairly simple (see the official documentation); here we try to stay flexible, so that many Kafka parameters can be changed dynamically. The configuration file is mounted from a ConfigMap and written as a template, and an init container renders the final configuration at startup.

headless service

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-demo
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: broker
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  type: ClusterIP
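
Because clusterIP is None, this headless Service publishes a DNS record per broker pod rather than a single load-balanced IP, which lets clients and the brokers themselves address a specific replica. A quick way to see this from any pod that has nslookup available (the pod names assume the StatefulSet defined below):

nslookup kafka-demo.default.svc.cluster.local
# each broker also gets a stable per-pod record, e.g.
#   kafka-demo-0.kafka-demo.default.svc.cluster.local:9092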

configmap

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-config
  namespace: default
data:
  init.sh: |-
    #!/bin/bash
    set -x

    cp /etc/kafka-configmap/log4j.properties /etc/kafka/

    KAFKA_BROKER_ID=${HOSTNAME##*-}

    ZOOKEEPER=${ZOOKEEPER}

    # dynamically set the broker id of this Kafka server from the pod ordinal
    sed "s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/" /etc/kafka-configmap/server.properties > /etc/kafka/server.properties.tmp

    # dynamically set the ZooKeeper connection string for this Kafka server
    sed -i "s/#init#zookeeper.connect=#init#/zookeeper.connect=$ZOOKEEPER/" /etc/kafka/server.properties.tmp

    [ $? -eq 0 ] && mv /etc/kafka/server.properties.tmp /etc/kafka/server.properties
  log4j.properties: |-
    log4j.rootLogger=INFO, stdout

    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.kafkaAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.kafkaAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.kafkaAppender.File=${kafka.logs.dir}/server.log
    log4j.appender.kafkaAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.kafkaAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.stateChangeAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.stateChangeAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.stateChangeAppender.File=${kafka.logs.dir}/state-change.log
    log4j.appender.stateChangeAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.stateChangeAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.requestAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.requestAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.requestAppender.File=${kafka.logs.dir}/kafka-request.log
    log4j.appender.requestAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.requestAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
    log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.controllerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.controllerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.controllerAppender.File=${kafka.logs.dir}/controller.log
    log4j.appender.controllerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.controllerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH
    log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log
    log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

    # Turn on all our debugging info
    #log4j.logger.kafka.producer.async.DefaultEventHandler=DEBUG, kafkaAppender
    #log4j.logger.kafka.client.ClientUtils=DEBUG, kafkaAppender
    #log4j.logger.kafka.perf=DEBUG, kafkaAppender
    #log4j.logger.kafka.perf.ProducerPerformance$ProducerThread=DEBUG, kafkaAppender
    #log4j.logger.org.I0Itec.zkclient.ZkClient=DEBUG
    log4j.logger.kafka=INFO, kafkaAppender

    log4j.logger.kafka.network.RequestChannel$=WARN, requestAppender
    log4j.additivity.kafka.network.RequestChannel$=false

    #log4j.logger.kafka.network.Processor=INFO, requestAppender
    #log4j.logger.kafka.server.KafkaApis=INFO, requestAppender
    #log4j.additivity.kafka.server.KafkaApis=false
    log4j.logger.kafka.request.logger=WARN, requestAppender
    log4j.additivity.kafka.request.logger=false

    log4j.logger.kafka.controller=INFO, controllerAppender
    log4j.additivity.kafka.controller=false

    log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
    log4j.additivity.kafka.log.LogCleaner=false

    log4j.logger.state.change.logger=INFO, stateChangeAppender
    log4j.additivity.state.change.logger=false

    #Change this to debug to get the actual audit log for authorizer.
    log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender
    log4j.additivity.kafka.authorizer.logger=false
  server.properties: |-
    ############################# Socket Server Settings #############################

    # The id of the broker. This must be set to a unique integer for each broker.
    #init#broker.id=#init#

    #init#broker.rack=#init#

    listeners=PLAINTEXT://:9092

    # The number of threads handling network requests
    num.network.threads=3

    # The number of threads doing disk I/O
    num.io.threads=8

    # The send buffer (SO_SNDBUF) used by the socket server
    socket.send.buffer.bytes=102400

    # The receive buffer (SO_RCVBUF) used by the socket server
    socket.receive.buffer.bytes=102400

    # The maximum size of a request that the socket server will accept (protection against OOM)
    socket.request.max.bytes=104857600

    ############################# Log Basics #############################

    # A comma-separated list of directories under which to store log files
    log.dirs=/var/lib/kafka/data/topics

    # The default number of log partitions per topic. More partitions allow greater
    # parallelism for consumption, but this will also result in more files across
    # the brokers.
    num.partitions=1

    default.replication.factor=3

    min.insync.replicas=2

    auto.create.topics.enable=true

    # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
    # This value is recommended to be increased for installations with data dirs located in RAID array.
    num.recovery.threads.per.data.dir=1

    ############################# Log Flush Policy #############################

    # Messages are immediately written to the filesystem but by default we only fsync() to sync
    # the OS cache lazily. The following configurations control the flush of data to disk.
    # There are a few important trade-offs here:
    #    1. Durability: Unflushed data may be lost if you are not using replication.
    #    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
    #    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
    # The settings below allow one to configure the flush policy to flush data after a period of time or
    # every N messages (or both). This can be done globally and overridden on a per-topic basis.

    # The number of messages to accept before forcing a flush of data to disk
    log.flush.interval.messages=10000

    # The maximum amount of time a message can sit in a log before we force a flush
    log.flush.interval.ms=1000

    ############################# Log Retention Policy #############################

    # The following configurations control the disposal of log segments. The policy can
    # be set to delete segments after a period of time, or after a given size has accumulated.
    # A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
    # from the end of the log.

    # The minimum age of a log file to be eligible for deletion
    log.retention.hours=168

    # A size-based retention policy for logs. Segments are pruned from the log unless the remaining
    # segments drop below log.retention.bytes. Functions independently of log.retention.hours.
    log.retention.bytes=1073741824

    # The maximum size of a log segment file. When this size is reached a new log segment will be created.
    log.segment.bytes=1073741824

    # The interval at which log segments are checked to see if they can be deleted according
    # to the retention policies
    log.retention.check.interval.ms=300000

    ############################# Zookeeper #############################

    # Zookeeper connection string (see zookeeper docs for details).
    # This is a comma separated host:port pairs, each corresponding to a zk
    # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
    # You can also append an optional chroot string to the urls to specify the
    # root directory for all kafka znodes.
    #init#zookeeper.connect=#init#

    # To enable ZooKeeper ACLs under the Kafka root path, set this value to true.
    # zookeeper.set.acl=true

    # Timeout in ms for connecting to zookeeper
    #zookeeper.connection.timeout.ms=6000


    ############################# Group Coordinator Settings #############################

    # The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
    # The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
    # The default value for this is 3 seconds.
    # We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
    # However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
    #group.initial.rebalance.delay.ms=0
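
Once the StatefulSet below is running, you can check that the init script rendered the #init# placeholders correctly; the pod and container names here assume the manifests in this post, and the zookeeper.connect value will be whatever you set in the ZOOKEEPER environment variable:

kubectl exec kafka-demo-0 -c kafka -- grep -E '^(broker.id|zookeeper.connect)=' /etc/kafka/server.properties
# expected for the first pod, e.g.:
#   broker.id=0
#   zookeeper.connect=localhost:2182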

statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
  name: kafka-demo
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: kafka-demo
      app.kubernetes.io/version: 1.1.1
  serviceName: kafka-demo
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: kafka-demo
        app.kubernetes.io/version: 1.1.1
    spec:
      containers:
      - command:
        - kafka-server-start.sh
        - /etc/kafka/server.properties
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        - name: JMX_PORT
          value: "5555"
        - name: KAFKA_HEAP_OPTS
        # adjust the heap size to your needs; it must fit within the container memory limit below
          value: -Xmx1G -Xms1G
        # replace with the image you built earlier
        image: kafka:2.11-1.1.1
        imagePullPolicy: IfNotPresent
        name: kafka
        ports:
        - containerPort: 5555
          name: jmx
          protocol: TCP
        - containerPort: 9092
          name: broker
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 9092
          timeoutSeconds: 1
        resources:
          limits:
            # adjust to a suitable CPU limit
            cpu: 512m
            # adjust to a suitable memory limit; it must be larger than the Kafka heap set above
            memory: 512M
          requests:
            cpu: 512m
            memory: 512M
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kafka
          name: config
        - mountPath: /var/lib/kafka/data
          name: data
      initContainers:
      - command:
        - /bin/bash
        - /etc/kafka-configmap/init.sh
        env:
        # ZooKeeper address; replace with your own
        - name: ZOOKEEPER
          value: localhost:2182
        image: huangjia/kafka:2.11-1.1.1
        imagePullPolicy: IfNotPresent
        name: init-config
        resources:
          limits:
            cpu: 512m
            memory: 512M
          requests:
            cpu: 512m
            memory: 512M
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /etc/kafka-configmap
          name: configmap
        - mountPath: /etc/kafka
          name: config
      restartPolicy: Always
      volumes:
      - configMap:
          defaultMode: 420
          name: kafka-config
        name: configmap
      - emptyDir: {}
        name: config
      # replace the emptyDir with a PVC for real persistence
      - emptyDir: {}
        name: data
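
Apply the three manifests and wait for the brokers to become Ready; the file names below are placeholders for wherever you saved the YAML above, and the smoke test clears JMX_PORT because the running broker already binds port 5555:

kubectl apply -f kafka-headless-service.yaml -f kafka-configmap.yaml -f kafka-statefulset.yaml
kubectl get pods -l app.kubernetes.io/instance=kafka-demo
# once all three pods are Running and Ready, create and inspect a test topic
# (replace the --zookeeper value with your own ZooKeeper address):
kubectl exec kafka-demo-0 -c kafka -- sh -c 'JMX_PORT= kafka-topics.sh --create --zookeeper localhost:2182 --replication-factor 3 --partitions 3 --topic smoke-test'
kubectl exec kafka-demo-0 -c kafka -- sh -c 'JMX_PORT= kafka-topics.sh --describe --zookeeper localhost:2182 --topic smoke-test'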

After Kafka is deployed, you can create an additional Service for concrete applications to use, for example:
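
A minimal sketch of such a client-facing Service; the name kafka-client is arbitrary, and in-cluster applications would then bootstrap against kafka-client.default.svc.cluster.local:9092:

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kafka
    app.kubernetes.io/instance: kafka-demo
  name: kafka-client
  namespace: default
spec:
  type: ClusterIP
  ports:
  - name: broker
    port: 9092
    protocol: TCP
    targetPort: 9092
  selector:
    app.kubernetes.io/instance: kafka-demo
    app.kubernetes.io/version: 1.1.1
EOF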

Pitfalls

Once Kafka is containerized, applications need to reach it. For clients inside the cluster you can simply expose it through a Service; for clients outside the cluster you can expose it through an Ingress, but that is where problems usually appear. The reason: when an external producer connects to Kafka, it first asks the bootstrap address for metadata about which broker holds the leader of each topic partition, and then opens connections directly to those brokers using the host and port returned in the metadata. The hosts it gets back are the DNS records of the StatefulSet pods, which cannot be resolved outside the cluster, so the client cannot use Kafka normally. One way around this is to run the StatefulSet with hostNetwork, but then you have to manage host ports yourself to avoid conflicts; try it if you have that need. Note that with hostNetwork, each container port's hostPort and containerPort must be identical.
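
As a sketch of the direction this takes (not something the manifests above already do): with hostNetwork the broker should advertise an address that external clients can resolve, which could be rendered in init.sh the same way as broker.id. This assumes you add an #init#advertised.listeners=#init# placeholder to server.properties and inject the node name into the init container via the downward API (fieldRef: spec.nodeName):

# hypothetical extra line for init.sh; NODE_NAME comes from the downward API and the
# node names must be resolvable and reachable from the external clients
sed -i "s|#init#advertised.listeners=#init#|advertised.listeners=PLAINTEXT://${NODE_NAME}:9092|" /etc/kafka/server.properties.tmp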

Closing Notes

I have not gone through every tunable Kafka parameter here; see the official Kafka documentation and set what you need. Making a parameter dynamic is simple: put a placeholder for it in the ConfigMap's configuration file, render it in init.sh, and feed the value in through an environment variable on the init container. In follow-up posts I will cover containerizing RabbitMQ, ZooKeeper, Etcd, MySQL, MariaDB, and MongoDB.
