storm运行异常之No output fields defined for component:stream XxxBolt:null疑案追踪

前言

上一篇写了 storm运行异常之No output fields defined for component:stream XxxBolt:null 发现是多线程导致的,但是也有可能是其他原因,今天就来追踪一下。


反查蛛丝马迹

错误log:

Caused by: java.lang.IllegalArgumentException: No output fields defined for component:stream XxxBolt:null
        at backtype.storm.task.GeneralTopologyContext.getComponentOutputFields(GeneralTopologyContext.java:113) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.tuple.TupleImpl.<init>(TupleImpl.java:53) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.serialization.KryoTupleDeserializer.deserialize(KryoTupleDeserializer.java:54) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.daemon.executor$mk_task_receiver$fn__4244.invoke(executor.clj:397) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.disruptor$clojure_handler$reify__1668.onEvent(disruptor.clj:59) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:124) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        ... 6 common frames omitted

从log上看是GeneralTopologyContext的方法抛出,我们来看一下

/**
     * Gets the declared output fields for the specified component/stream.
     */
    public Fields getComponentOutputFields(String componentId, String streamId) {
        Fields ret = _componentToStreamToFields.get(componentId).get(streamId);
        if(ret==null) {
            throw new IllegalArgumentException("No output fields defined for component:stream " + componentId + ":" + streamId);
        }
        return ret;
    }

根据log打印信息可知,是  streamId 为null。一般来说bolt往下emit时,可以指定streamId,如果不指定的话,storm会给定一个默认的default streamId,所以这里streamId为null就是一个奇怪的异常。

继续观察错误stack,发现是executor.clj 的 mk_task_receiver 调用出错。来看看这个方法:

(defn mk-task-receiver [executor-data tuple-action-fn]
  (let [^KryoTupleDeserializer deserializer (:deserializer executor-data)
        task-ids (:task-ids executor-data)
        debug? (= true (-> executor-data :storm-conf (get TOPOLOGY-DEBUG)))
        ]
    (disruptor/clojure-handler
      (fn [tuple-batch sequence-id end-of-batch?]
        (fast-list-iter [[task-id msg] tuple-batch]
          (let [^TupleImpl tuple (if (instance? Tuple msg) msg (.deserialize deserializer msg))]
            (when debug? (log-message "Processing received message " tuple))
            (if task-id
              (tuple-action-fn task-id tuple)
              ;; null task ids are broadcast tuples
              (fast-list-iter [task-id task-ids]
                (tuple-action-fn task-id tuple)
                ))
            ))))))

根据错误stack的行号指示是在 let [^ TupleImpl tuple (if instance? Tuple msg ......)]这行报错。

这里是对Tuple发序列化过程,实例一个TupleImpl,会调用其构造函数:

public TupleImpl(GeneralTopologyContext context, List<Object> values, int taskId, String streamId, MessageId id) {
        this.values = values;
        this.taskId = taskId;
        this.streamId = streamId;
        this.id = id;
        this.context = context;
        
        String componentId = context.getComponentId(taskId);
        Fields schema = context.getComponentOutputFields(componentId, streamId);
        if(values.size()!=schema.size()) {
            throw new IllegalArgumentException(
                    "Tuple created with wrong number of fields. " +
                    "Expected " + schema.size() + " fields but got " +
                    values.size() + " fields");
        }
    }

这里会调用GeneralTopologyContext的getComponentOutputFields方法,传进去的streamId为null


那么这个StreamId是从什么时候传进来的呐??


线索

storm是像spark一样,使用DAG引擎的,关于DAG引擎的优缺点,请看 DAG (directed acyclic graph) 作为大数据执行引擎的优点

DAG就是一个有向图,在createTopology时就创建好了,具体请看

1、我们一般用TopologyBuilder来构建topology,每次setBolt时,都会把指定group方式,grouping里面就保留当前bolt接收上游bolt的streamId

private BoltDeclarer grouping(String componentId, String streamId, Grouping grouping) {
            _commons.get(_boltId).put_to_inputs(new GlobalStreamId(componentId, streamId), grouping);
            return this;
        }

2、调用createTopology()方法, 把所有DAG信息保留下来,运行

public StormTopology createTopology() {
        Map<String, Bolt> boltSpecs = new HashMap<String, Bolt>();
        Map<String, SpoutSpec> spoutSpecs = new HashMap<String, SpoutSpec>();
        for(String boltId: _bolts.keySet()) {
            IRichBolt bolt = _bolts.get(boltId);
            ComponentCommon common = getComponentCommon(boltId, bolt);
            boltSpecs.put(boltId, new Bolt(ComponentObject.serialized_java(Utils.serialize(bolt)), common));
        }
        for(String spoutId: _spouts.keySet()) {
            IRichSpout spout = _spouts.get(spoutId);
            ComponentCommon common = getComponentCommon(spoutId, spout);
            spoutSpecs.put(spoutId, new SpoutSpec(ComponentObject.serialized_java(Utils.serialize(spout)), common));
            
        }
        return new StormTopology(spoutSpecs,
                                 boltSpecs,
                                 new HashMap<String, StateSpoutSpec>());
    }

3、在Topology执行过程中,根据bolt并行度来创建bolt thread,

(defn mk-executor [worker executor-id]
  (let [executor-data (mk-executor-data worker executor-id);; mk-executor-data
        _ (log-message "Loading executor " (:component-id executor-data) ":" (pr-str executor-id))
        task-datas (->> executor-data
                        :task-ids
                        (map (fn [t] [t (task/mk-task executor-data t)]))
                        (into {})
                        (HashMap.))
        _ (log-message "Loaded executor tasks " (:component-id executor-data) ":" (pr-str executor-id))
        report-error-and-die (:report-error-and-die executor-data)
        component-id (:component-id executor-data)

        ;; starting the batch-transfer->worker ensures that anything publishing to that queue 
        ;; doesn't block (because it's a single threaded queue and the caching/consumer started
        ;; trick isn't thread-safe)
        system-threads [(start-batch-transfer->worker-handler! worker executor-data)]
        handlers (with-error-reaction report-error-and-die
                   (mk-threads executor-data task-datas)) ;;这里会调用mk-threads:spout和mk-thread:bolt来创建thread
        threads (concat handlers system-threads)]    
    (setup-ticks! worker executor-data)

在创建mk-thread之前需要准备数据,方法开始调用:

let [executor-data (mk-executor-data worker executor-id);; mk-executor-data

在mk-executor-data方法里有调用mk-grouper的方法的方法,在下面代码的第37行


(defn mk-executor-data [worker executor-id]
  (let [worker-context (worker-context worker)
        task-ids (executor-id->tasks executor-id)
        component-id (.getComponentId worker-context (first task-ids))
        storm-conf (normalized-component-conf (:storm-conf worker) worker-context component-id)
        executor-type (executor-type worker-context component-id)
        batch-transfer->worker (disruptor/disruptor-queue
                                  (str "executor"  executor-id "-send-queue")
                                  (storm-conf TOPOLOGY-EXECUTOR-SEND-BUFFER-SIZE)
                                  :claim-strategy :single-threaded
                                  :wait-strategy (storm-conf TOPOLOGY-DISRUPTOR-WAIT-STRATEGY))
        ]
    (recursive-map
     :worker worker
     :worker-context worker-context
     :executor-id executor-id
     :task-ids task-ids
     :component-id component-id
     :open-or-prepare-was-called? (atom false)
     :storm-conf storm-conf
     :receive-queue ((:executor-receive-queue-map worker) executor-id)
     :storm-id (:storm-id worker)
     :conf (:conf worker)
     :shared-executor-data (HashMap.)
     :storm-active-atom (:storm-active-atom worker)
     :batch-transfer-queue batch-transfer->worker
     :transfer-fn (mk-executor-transfer-fn batch-transfer->worker)
     :suicide-fn (:suicide-fn worker)
     :storm-cluster-state (cluster/mk-storm-cluster-state (:cluster-state worker))
     :type executor-type
     ;; TODO: should refactor this to be part of the executor specific map (spout or bolt with :common field)
     :stats (mk-executor-stats <> (sampling-rate storm-conf))
     :interval->task->metric-registry (HashMap.)
     :task->component (:task->component worker)
     ;; outbound-components方法里outbound-groupings的会调用mk-grouper方法
     ;; mk-grouper method doc => Returns a function that returns a vector of which task indices to send tuple to, or just a single task index.
     :stream->component->grouper (outbound-components worker-context component-id)
     :report-error (throttled-report-error-fn <>)
     :report-error-and-die (fn [error]
                             ((:report-error <>) error)
                             ((:suicide-fn <>)))
     :deserializer (KryoTupleDeserializer. storm-conf worker-context)
     :sampler (mk-stats-sampler storm-conf)
     ;; TODO: add in the executor-specific stuff in a :specific... or make a spout-data, bolt-data function?
     )))



讲到这里读者可能觉得扯的有点远了,没关系,再拉回来

回来再看错误堆栈信息

java.lang.RuntimeException: java.lang.IllegalArgumentException: No output fields defined for component:stream XxxBolt:null
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:127) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:96) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:81) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.daemon.executor$fn__4321$fn__4333$fn__4380.invoke(executor.clj:747) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at backtype.storm.util$async_loop$fn__457.invoke(util.clj:457) ~[storm-core-0.9.3-rc1.jar:0.9.3-rc1]
        at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
        at java.lang.Thread.run(Thread.java:662) [na:1.6.0_45]

这里抛出runtimeexception的地方是:

private void consumeBatchToCursor(long cursor, EventHandler<Object> handler) {
        for(long curr = _consumer.get() + 1; curr <= cursor; curr++) {
            try {
                MutableObject mo = _buffer.get(curr);
                Object o = mo.o;
                mo.setObject(null);
                if(o==FLUSH_CACHE) {
                    Object c = null;
                    while(true) {                        
                        c = _cache.poll();
                        if(c==null) break;
                        else handler.onEvent(c, curr, true);
                    }
                } else if(o==INTERRUPT) {
                    throw new InterruptedException("Disruptor processing interrupted");
                } else {
                    handler.onEvent(o, curr, curr == cursor);
                }
            } catch (Exception e) {
                // 这里抛出的,引起这个异常的地方是上面handler.onEvent()方法
                throw new RuntimeException(e);
            }
        }
        //TODO: only set this if the consumer cursor has changed?
        _consumer.set(cursor);
    }

在看看EventHandler是从哪里传来的呐?其实是在mk-threads:bolt时创建的event-handler,在最后面

(defmethod mk-threads :bolt [executor-data task-datas]
  (let [execute-sampler (mk-stats-sampler (:storm-conf executor-data))
        executor-stats (:stats executor-data)
        {:keys [storm-conf component-id worker-context transfer-fn report-error sampler
                open-or-prepare-was-called?]} executor-data
        rand (Random. (Utils/secureRandomLong))
        tuple-action-fn (fn [task-id ^TupleImpl tuple]
                          ;; synchronization needs to be done with a key provided by this bolt, otherwise:
                          ;; spout 1 sends synchronization (s1), dies, same spout restarts somewhere else, sends synchronization (s2) and incremental update. s2 and update finish before s1 -> lose the incremental update
                          ;; TODO: for state sync, need to first send sync messages in a loop and receive tuples until synchronization
                          ;; buffer other tuples until fully synchronized, then process all of those tuples
                          ;; then go into normal loop
                          ;; spill to disk?
                          ;; could be receiving incremental updates while waiting for sync or even a partial sync because of another failed task
                          ;; should remember sync requests and include a random sync id in the request. drop anything not related to active sync requests
                          ;; or just timeout the sync messages that are coming in until full sync is hit from that task
                          ;; need to drop incremental updates from tasks where waiting for sync. otherwise, buffer the incremental updates
                          ;; TODO: for state sync, need to check if tuple comes from state spout. if so, update state
                          ;; TODO: how to handle incremental updates as well as synchronizations at same time
                          ;; TODO: need to version tuples somehow
                          
                          ;;(log-debug "Received tuple " tuple " at task " task-id)
                          ;; need to do it this way to avoid reflection
                          (let [stream-id (.getSourceStreamId tuple)]
                            (condp = stream-id
                              Constants/METRICS_TICK_STREAM_ID (metrics-tick executor-data (get task-datas task-id) tuple)
                              (let [task-data (get task-datas task-id)
                                    ^IBolt bolt-obj (:object task-data)
                                    user-context (:user-context task-data)
                                    sampler? (sampler)
                                    execute-sampler? (execute-sampler)
                                    now (if (or sampler? execute-sampler?) (System/currentTimeMillis))]
                                (when sampler?
                                  (.setProcessSampleStartTime tuple now))
                                (when execute-sampler?
                                  (.setExecuteSampleStartTime tuple now))
                                (.execute bolt-obj tuple)
                                (let [delta (tuple-execute-time-delta! tuple)]
                                  (task/apply-hooks user-context .boltExecute (BoltExecuteInfo. tuple task-id delta))
                                  (when delta
                                    (builtin-metrics/bolt-execute-tuple! (:builtin-metrics task-data)
                                                                         executor-stats
                                                                         (.getSourceComponent tuple)                                                      
                                                                         (.getSourceStreamId tuple)
                                                                         delta)
                                    (stats/bolt-execute-tuple! executor-stats
                                                               (.getSourceComponent tuple)
                                                               (.getSourceStreamId tuple)
                                                               delta)))))))]
    
    ;; TODO: can get any SubscribedState objects out of the context now

    [(async-loop
      (fn []
        ;; If topology was started in inactive state, don't call prepare bolt until it's activated first.
        (while (not @(:storm-active-atom executor-data))          
          (Thread/sleep 100))
        
        (log-message "Preparing bolt " component-id ":" (keys task-datas))
        (doseq [[task-id task-data] task-datas
                :let [^IBolt bolt-obj (:object task-data)
                      tasks-fn (:tasks-fn task-data)
                      user-context (:user-context task-data)
                      bolt-emit (fn [stream anchors values task]
                                  (let [out-tasks (if task
                                                    (tasks-fn task stream values)
                                                    (tasks-fn stream values))]
                                    (fast-list-iter [t out-tasks]
                                                    (let [anchors-to-ids (HashMap.)]
                                                      (fast-list-iter [^TupleImpl a anchors]
                                                                      (let [root-ids (-> a .getMessageId .getAnchorsToIds .keySet)]
                                                                        (when (pos? (count root-ids))
                                                                          (let [edge-id (MessageId/generateId rand)]
                                                                            (.updateAckVal a edge-id)
                                                                            (fast-list-iter [root-id root-ids]
                                                                                            (put-xor! anchors-to-ids root-id edge-id))
                                                                            ))))
                                                      (transfer-fn t
                                                                   (TupleImpl. worker-context
                                                                               values
                                                                               task-id
                                                                               stream
                                                                               (MessageId/makeId anchors-to-ids)))))
                                    (or out-tasks [])))]]
          (builtin-metrics/register-all (:builtin-metrics task-data) storm-conf user-context)
          (if (= component-id Constants/SYSTEM_COMPONENT_ID)
            (builtin-metrics/register-queue-metrics {:sendqueue (:batch-transfer-queue executor-data)
                                                     :receive (:receive-queue executor-data)
                                                     :transfer (:transfer-queue (:worker executor-data))}
                                                    storm-conf user-context)
            (builtin-metrics/register-queue-metrics {:sendqueue (:batch-transfer-queue executor-data)
                                                     :receive (:receive-queue executor-data)}
                                                    storm-conf user-context)
            )

          (.prepare bolt-obj
                    storm-conf
                    user-context
                    (OutputCollector.
                     (reify IOutputCollector
                       (emit [this stream anchors values]
                         (bolt-emit stream anchors values nil))
                       (emitDirect [this task stream anchors values]
                         (bolt-emit stream anchors values task))
                       (^void ack [this ^Tuple tuple]
                         (let [^TupleImpl tuple tuple
                               ack-val (.getAckVal tuple)]
                           (fast-map-iter [[root id] (.. tuple getMessageId getAnchorsToIds)]
                                          (task/send-unanchored task-data
                                                                ACKER-ACK-STREAM-ID
                                                                [root (bit-xor id ack-val)])
                                          ))
                         (let [delta (tuple-time-delta! tuple)]
                           (task/apply-hooks user-context .boltAck (BoltAckInfo. tuple task-id delta))
                           (when delta
                             (builtin-metrics/bolt-acked-tuple! (:builtin-metrics task-data)
                                                                executor-stats
                                                                (.getSourceComponent tuple)                                                      
                                                                (.getSourceStreamId tuple)
                                                                delta)
                             (stats/bolt-acked-tuple! executor-stats
                                                      (.getSourceComponent tuple)
                                                      (.getSourceStreamId tuple)
                                                      delta))))
                       (^void fail [this ^Tuple tuple]
                         (fast-list-iter [root (.. tuple getMessageId getAnchors)]
                                         (task/send-unanchored task-data
                                                               ACKER-FAIL-STREAM-ID
                                                               [root]))
                         (let [delta (tuple-time-delta! tuple)]
                           (task/apply-hooks user-context .boltFail (BoltFailInfo. tuple task-id delta))
                           (when delta
                             (builtin-metrics/bolt-failed-tuple! (:builtin-metrics task-data)
                                                                 executor-stats
                                                                 (.getSourceComponent tuple)                                                      
                                                                 (.getSourceStreamId tuple))
                             (stats/bolt-failed-tuple! executor-stats
                                                       (.getSourceComponent tuple)
                                                       (.getSourceStreamId tuple)
                                                       delta))))
                       (reportError [this error]
                         (report-error error)
                         )))))
        (reset! open-or-prepare-was-called? true)        
        (log-message "Prepared bolt " component-id ":" (keys task-datas))
        (setup-metrics! executor-data)

        (let [receive-queue (:receive-queue executor-data)
              ;; 这里创建event-handler供disruptorQueue来调用
              event-handler (mk-task-receiver executor-data tuple-action-fn)]
          (disruptor/consumer-started! receive-queue)
          (fn []            
            (disruptor/consume-batch-when-available receive-queue event-handler);;这里开始使用event-handler
            0)))
      :kill-fn (:report-error-and-die executor-data)
      :factory? true
      :thread-name component-id)]))

代码有点多,见谅

====================================


其实追到这里我还是没有找出为什么会报这个异常的原因,哪位大牛如果知道,请留言,或e-mail([email protected])告知,I will appreciate that



你可能感兴趣的:(storm)