Release Announcement for Storm 0.9.0, the Real-Time Stream Processing Framework (Translated from the Chinese Edition)

Chinese translation of the Storm 0.9.0 release announcement (2013/12/10, by Shao Xianjun of Fujitsu; please let me know about any errors: [email protected] ^_^)

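  We are pleased to announce that Storm 0.9.0 has been released; you can download it from the downloads page. This release is a big step forward for the fast-growing Storm project.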

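  We have added several new features, described in detail below, and the other focus of this release is a large number of stability-related bug fixes. Although many users already run 0.9.x builds of Storm successfully in their own environments, we do not guarantee the stability of those builds. 0.9.0 is the most stable version to date, and we strongly recommend it to everyone, especially users of 0.8.x.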


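  The first major feature is a new transport layer. We have introduced a transport built on Netty and written in pure Java; this work was done by our good friends at Yahoo! Engineering. As you probably know, Storm's core messaging layer is pluggable, but until now ZeroMQ was the only implementation. Storm now ships two implementations of the messaging layer: the original ZeroMQ one and the new Netty one.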
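
  To illustrate what "pluggable" means here, the following is a minimal sketch of selecting an implementation by the class name found in a configuration map. The `Transport` interface and the two implementing classes are hypothetical stand-ins invented for this example, not Storm's real messaging interfaces; only the `storm.messaging.transport` key comes from the release notes.

```java
import java.util.HashMap;
import java.util.Map;

public class PluggableTransportSketch {
    /** Hypothetical stand-in for a transport plugin; not Storm's real interface. */
    public interface Transport {
        String name();
    }

    /** Hypothetical ZeroMQ-backed implementation. */
    public static class ZeroMqTransport implements Transport {
        public String name() { return "zeromq"; }
    }

    /** Hypothetical Netty-backed implementation. */
    public static class NettyTransport implements Transport {
        public String name() { return "netty"; }
    }

    /** Instantiates whichever implementation class the configuration names. */
    static Transport load(Map<String, Object> conf) {
        String className = (String) conf.get("storm.messaging.transport");
        try {
            return (Transport) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load transport " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        // Switching transports only means changing this one string.
        conf.put("storm.messaging.transport", NettyTransport.class.getName());
        System.out.println(load(conf).name());
    }
}
```

  The point of the design is that the rest of the system only ever sees the interface, so a new transport can be added without touching its callers.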







2. High performance: Netty is roughly twice as fast as ZeroMQ. This blog post compares the performance of ZeroMQ and Netty in detail (not yet translated).

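3. Security and authentication: it makes possible the authentication and authorization between worker processes that we plan to add in the future.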


storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
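
The release notes do not spell out how the retry settings interact. The sketch below assumes that `min_wait_ms` and `max_wait_ms` bound the delay between reconnection attempts and that `max_retries` caps the number of attempts. It shows one common way to combine such values, a bounded exponential backoff; it is not Storm's actual Netty client code, and the class and method names are made up for illustration.

```java
public class ReconnectBackoff {
    // Values mirror the storm.yaml fragment above.
    static final long MIN_WAIT_MS = 100;
    static final long MAX_WAIT_MS = 1000;
    static final int MAX_RETRIES = 100;

    /**
     * Delay before retry number `attempt` (0-based): starts at MIN_WAIT_MS,
     * doubles on each attempt, and never exceeds MAX_WAIT_MS.
     */
    static long waitMs(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Cap the shift so the doubling cannot overflow into a negative delay.
        int shift = Math.min(attempt, 20);
        long delay = MIN_WAIT_MS << shift;
        return Math.min(delay, MAX_WAIT_MS);
    }

    /** True while another reconnection attempt is still allowed. */
    static boolean shouldRetry(int attempt) {
        return attempt < MAX_RETRIES;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6 && shouldRetry(attempt); attempt++) {
            System.out.println("retry " + attempt + ": wait " + waitMs(attempt) + " ms");
        }
    }
}
```

Capping the shift matters: an uncapped doubling eventually overflows, which is one way a computed sleep time can turn negative, the kind of problem the changelog below mentions for the Netty client.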



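  The new version of Storm adds a very handy feature for debugging and monitoring topologies: the logviewer daemon. In earlier versions, viewing a worker's logs meant first finding out where the worker was running (host/port), typically through Storm UI, and then using ssh to connect to that host and read the worker's log file there. With the new log viewing mechanism you can easily access the logs of a specific worker: just click the worker's port in Storm UI in your browser.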


    $ storm logviewer





Feature 5: API compatibility and upgrading



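  Another very big change concerns logging. Storm itself makes heavy use of the slf4j API, while some of Storm's dependencies and some Storm users rely on the log4j API. Storm therefore now depends on log4j-over-slf4j, which bridges the log4j API to slf4j. This change affects topologies and topology components that already use the log4j API. In short, use the slf4j API for logging wherever you can.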




  • Update build configuration to force compatibility with Java 1.6
  • Fixed a netty client issue where sleep times for reconnection could be negative (thanks brndnmtthws)
  • Fixed an issue that would cause storm-netty unit tests to fail
  • Added configuration to limit ShellBolt internal _pendingWrites queue length (thanks xiaokang)
  • Fixed a display issue with system stats in Storm UI (thanks d2r)
  • Nimbus now does worker heartbeat timeout checks as soon as heartbeats are updated (thanks d2r)
  • The logviewer now determines log file location by examining the logback configuration (thanks strongh)
  • Allow tick tuples to work with the system bolt (thanks xumingming)
  • Add default configuration values for the netty transport and the ability to configure the number of worker threads (thanks revans2)
  • Added timeout to unit tests to prevent a situation where tests would hang indefinitely (thanks d2r)
  • Fixed an issue in the system bolt where local mode would not be detected accurately (thanks miofthena)
  • Fixed storm jar command to work properly when STORM_JAR_JVM_OPTS is not specified (thanks roadkill001)
  • All logging now done with slf4j
  • Replaced log4j logging system with logback
  • Logs are now limited to 1GB per worker (configurable via logging configuration file)
  • Build upgraded to leiningen 2.0
  • Revamped Trident spout interfaces to support more dynamic spouts, such as a spout that reads from a changing set of brokers
  • How tuples are serialized is now pluggable (thanks anfeng)
  • Added blowfish encryption based tuple serialization (thanks anfeng)
  • Have storm fall back to installed storm.yaml (thanks revans2)
  • Improve error message when Storm detects bundled storm.yaml to show the URLs for offending resources (thanks revans2)
  • Nimbus throws NotAliveException instead of FileNotFoundException from various query methods when topology is no longer alive (thanks revans2)
  • Escape HTML and Javascript appropriately in Storm UI (thanks d2r)
  • Storm's Zookeeper client now uses bounded exponential backoff strategy on failures
  • Automatically drain and log error stream of multilang subprocesses
  • Append component name to thread name of running executors so that logs are easier to read
  • Messaging system used for passing messages between workers is now pluggable (thanks anfeng)
  • Netty implementation of messaging (thanks anfeng)
  • Include topology id, worker port, and worker id in properties for worker processes, useful for logging (thanks d2r)
  • Tick tuples can now be scheduled using floating point seconds (thanks tscurtu)
  • Added log viewer daemon and links from UI to logviewers (thanks xiaokang)
  • DRPC server childopts now configurable (thanks strongh)
  • Default number of ackers to number of workers, instead of just one (thanks lyogavin)
  • Validate that Storm configs are of proper types/format/structure (thanks d2r)
  • FixedBatchSpout will now replay batches appropriately on batch failure (thanks ptgoetz)
  • Can set JAR_JVM_OPTS env variable to add jvm options when calling 'storm jar' (thanks srmelody)
  • Throw error if batch id for transaction is behind the batch id in the opaque value (thanks mrflip)
  • Sort topologies by name in UI (thanks jaked)
  • Added LoggingMetricsConsumer to log all metrics to a file, by default not enabled (thanks mrflip)
  • Add prepare(Map conf) method to TopologyValidator (thanks ankitoshniwal)
  • Bug fix: Supervisor provides full path to workers to logging config rather than relative path (thanks revans2)
  • Bug fix: Call ReducerAggregator#init properly when used within persistentAggregate (thanks lorcan)
  • Bug fix: Set component-specific configs correctly for Trident spouts


