storm集群 安装笔记

本文主要是参照strom的管网中的document中来进行安装,管网地址

1,首先需要安装zookeeper集群.可参考管网或网络上安装(很简单).

2,在storm的work机器上和nimbus机器上安装相关的依赖.即需安装jdk1.6+和python2.6+版本.

3,下载storm的二进制文件,我这里下载了0.93版本的.

4,解压storm的tar包到指定的目录(STORM_DIR).

5,将STORM_DIR配置到环境变量中.并在此目录下新建一个目录叫logs,这样storm的log的日志就会输出到这个logs目录下面.log日志的配置是在logback下面的cluster.xml里进行配置.

6,修改conf/storm.yaml文件.修改内容如下:

storm.zookeeper.servers:
     - "nacey-master"
     - "nacey-node2"
nimbus.host: "nacey-master"
storm.local.dir: "/home/nacey/storm/data"
supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
ui.port: 8081 

注意这里每个键后面必须空格后才能加键的值.不然启动的时候会提示文件加载错误.

其实这个配置文件中的配置会覆盖到defaults.yaml这个文件.默认的配置如下:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


########### These all have default values as shown
########### Additional configuration goes into storm.yaml

java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"

### storm.* configs are general configurations
# the local dir is where jars are kept
storm.local.dir: "storm-local"
storm.zookeeper.servers:
    - "localhost"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/storm"
storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 1000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.zookeeper.auth.user: null
storm.zookeeper.auth.password: null
storm.cluster.mode: "distributed" # can be distributed or local
storm.local.mode.zmq: false
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
storm.principal.tolocal: "backtype.storm.security.auth.DefaultPrincipalToLocal"
storm.group.mapping.service: "backtype.storm.security.auth.ShellBasedGroupsMapping"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.nimbus.retry.times: 5
storm.nimbus.retry.interval.millis: 2000
storm.nimbus.retry.intervalceiling.millis: 60000
storm.auth.simple-white-list.users: []
storm.auth.simple-acl.users: []
storm.auth.simple-acl.users.commands: []
storm.auth.simple-acl.admins: []
storm.meta.serialization.delegate: "backtype.storm.serialization.ThriftSerializationDelegate"

### nimbus.* configs are for the master
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.thrift.threads: 64
nimbus.thrift.max_buffer_size: 1048576
nimbus.childopts: "-Xmx1024m"
nimbus.task.timeout.secs: 30
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
nimbus.inbox.jar.expiration.secs: 3600
nimbus.task.launch.secs: 120
nimbus.reassign: true
nimbus.file.copy.expiration.secs: 600
nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
nimbus.credential.renewers.freq.secs: 600

### ui.* configs are for the master
ui.host: 0.0.0.0
ui.port: 8080
ui.childopts: "-Xmx768m"
ui.actions.enabled: true
ui.filter: null
ui.filter.params: null
ui.users: null
ui.header.buffer.bytes: 4096
ui.http.creds.plugin: backtype.storm.security.auth.DefaultHttpCredentialsPlugin

logviewer.port: 8000
logviewer.childopts: "-Xmx128m"
logviewer.cleanup.age.mins: 10080
logviewer.appender.name: "A1"

logs.users: null

drpc.port: 3772
drpc.worker.threads: 64
drpc.max_buffer_size: 1048576
drpc.queue.size: 128
drpc.invocations.port: 3773
drpc.invocations.threads: 64
drpc.request.timeout.secs: 600
drpc.childopts: "-Xmx768m"
drpc.http.port: 3774
drpc.https.port: -1
drpc.https.keystore.password: ""
drpc.https.keystore.type: "JKS"
drpc.http.creds.plugin: backtype.storm.security.auth.DefaultHttpCredentialsPlugin
drpc.authorizer.acl.filename: "drpc-auth-acl.yaml"
drpc.authorizer.acl.strict: false

transactional.zookeeper.root: "/transactional"
transactional.zookeeper.servers: null
transactional.zookeeper.port: null

### supervisor.* configs are for node supervisors
# Define the amount of workers that can be run on this machine. Each worker is assigned a port to use for communication
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.childopts: "-Xmx256m"
supervisor.run.worker.as.user: false
#how long supervisor will wait to ensure that a worker process is started
supervisor.worker.start.timeout.secs: 120
#how long between heartbeats until supervisor considers that worker dead and tries to restart it
supervisor.worker.timeout.secs: 30
#how many seconds to sleep for before shutting down threads on worker
supervisor.worker.shutdown.sleep.secs: 1
#how frequently the supervisor checks on the status of the processes it's monitoring and restarts if necessary
supervisor.monitor.frequency.secs: 3
#how frequently the supervisor heartbeats to the cluster state (for nimbus)
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true
supervisor.supervisors: []
supervisor.supervisors.commands: []


### worker.* configs are for task workers
worker.childopts: "-Xmx768m"
worker.gc.childopts: ""
worker.heartbeat.frequency.secs: 1

# control how many worker receiver threads we need per worker
topology.worker.receiver.thread.count: 1

task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10
task.credentials.poll.secs: 30

zmq.threads: 1
zmq.linger.millis: 5000
zmq.hwm: 0


storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 #5MB buffer
# Since nimbus.task.launch.secs and supervisor.worker.start.timeout.secs are 120, other workers should also wait at least that long before giving up on connecting to the other worker. The reconnection period need also be bigger than storm.zookeeper.session.timeout(default is 20s), so that we can abort the reconnection when the target worker is dead.
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

# If the Netty messaging layer is busy(netty internal buffer not writable), the Netty client will try to batch message as more as possible up to the size of storm.messaging.netty.transfer.batch.size bytes, otherwise it will try to flush message as soon as possible to reduce latency.
storm.messaging.netty.transfer.batch.size: 262144
# Sets the backlog value to specify when the channel binds to a local address
storm.messaging.netty.socket.backlog: 500
# We check with this interval that whether the Netty channel is writable and try to write pending messages if it is.
storm.messaging.netty.flush.check.interval.ms: 10

# By default, the Netty SASL authentication is set to false.  Users can override and set it true for a specific topology.
storm.messaging.netty.authentication: false

# default number of seconds group mapping service will cache user group
storm.group.mapping.service.cache.duration.secs: 120

### topology.* configs are for specific executing storms
topology.enable.message.timeouts: true
topology.debug: false
topology.workers: 1
topology.acker.executors: null
topology.tasks: null
# maximum amount of time a message has to complete before it's considered failed
topology.message.timeout.secs: 30
topology.multilang.serializer: "backtype.storm.multilang.JsonSerializer"
topology.skip.missing.kryo.registrations: false
topology.max.task.parallelism: null
topology.max.spout.pending: null
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.builtin.metrics.bucket.size.secs: 60
topology.fall.back.on.java.serialization: true
topology.worker.childopts: null
topology.executor.receive.buffer.size: 1024 #batched
topology.executor.send.buffer.size: 1024 #individual messages
topology.receiver.buffer.size: 8 # setting it too high causes a lot of problems (heartbeat thread gets starved, throughput plummets)
topology.transfer.buffer.size: 1024 # batched
topology.tick.tuple.freq.secs: null
topology.worker.shared.thread.pool.size: 4
topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
topology.sleep.spout.wait.strategy.time.ms: 1
topology.error.throttle.interval.secs: 10
topology.max.error.report.per.interval: 5
topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
topology.trident.batch.emit.interval.millis: 500
topology.testing.always.try.serialize: false
topology.classpath: null
topology.environment: null
topology.bolts.outgoing.overflow.buffer.enable: false

dev.zookeeper.path: "/tmp/dev-storm-zookeeper"

备注:如果你的机器的网络地址存在ipv6的地址,storm启动的时候默认是启用ipv6的地址,但实际上storm是不能使用ipv6的地址,故需在启动脚本中(storm)增加-Djava.net.preferIPv4Stack=true.如果使用了ipv6有可能进程启动是正常的,但是在访问stormui的时候,页面会提示如下错误:

Storm UI getting Internal Server Error :org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused

按照如上的修改后.即可解决如上错误.

7,启动.

  storm nimbus &

  storm supervisor &

  storm ui &

8,测试 ,通过浏览器访问http://{nimbus host}:8080.能够正常访问stormui.且响应了


Storm UI

Cluster Summary

Version Nimbus uptime Supervisors Used slots Free slots Total slots Executors Tasks
0.9.3 1m 45s 1 0 4 4 0 0

Topology summary

Supervisor summary

Id
Host
Uptime
Slots
Used slots
cbe9e1bd-5e43-4749-b187-c9a2c89081ba nacey-master 1m 18s 4 0

Nimbus Configuration


Key
Value
dev.zookeeper.path /tmp/dev-storm-zookeeper
drpc.childopts -Xmx768m
drpc.invocations.port 3773
drpc.port 3772
drpc.queue.size 128
drpc.request.timeout.secs 600
drpc.worker.threads 64
java.library.path /usr/local/lib:/opt/local/lib:/usr/lib
logviewer.appender.name A1
logviewer.childopts -Xmx128m
logviewer.port 8000
nimbus.childopts -Xmx1024m
nimbus.cleanup.inbox.freq.secs

然后可将此storm打包发布到集群中的其他机器即可.

在其他机器上执行storm supervisor即可.



你可能感兴趣的:(storm)