Workflow mining: a survey of issues and approaches (7)

5. Which class of workflow processes can be rediscovered? An approach based on Petri net theory

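The first approach we discuss in more detail uses a special class of Petri nets, called workflow nets (WF-nets), as its theoretical basis [1,4]. Some results have been reported in [3,8], and two tools support the approach: EMiT [3] and MiMo [8]. It is worth noting that Little Thumb (see Section 6) also supports this approach and can additionally handle noise.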

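In this more theoretical approach, issues such as noise are not our focus. We assume that the workflow log is free of noise and contains “sufficient” information. Under these ideal conditions we study whether a workflow process can be rediscovered, that is, for which class of workflow models the model can be constructed accurately from its log alone. This is not as simple as it may seem. Take the process model in Fig. 3 as an example. The corresponding workflow log in Table 2 contains no information about the AND-split and the AND-join, yet both are needed to describe the process accurately.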

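We use Fig. 6 to illustrate the rediscovery problem. Suppose we have a log of many executions of the process described by the workflow net WF1, and on the basis of that log a mining algorithm constructs the workflow net WF2. An interesting question is whether WF1 equals WF2. In this paper we only look at the class of workflow nets for which this holds.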

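As shown in [8], it is impossible to rediscover the class of all workflow nets. However, the α algorithm described in [3,8] can successfully rediscover a large class of practically relevant workflow nets. For this result we assume that a log is complete in the following sense: if two events can directly follow each other, they do so at least once in the log. Note that this criterion does not require every possible execution sequence to appear.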

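Four ordering relations that form the basis of the α algorithm can be derived from the log: $>_W$, $\rightarrow_W$, $\#_W$, and $\parallel_W$. Let $W$ be a workflow log over a set of tasks $T$, i.e., $W \in \mathcal{P}(T^*)$. (Put simply, a workflow log is a set of traces, one per case, and we abstract from time, data, and so on.) Let $a, b \in T$: (1) $a >_W b$ if and only if there is a trace $\sigma = t_1 t_2 t_3 \ldots t_{n-1}$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in W$ and $t_i = a$ and $t_{i+1} = b$, (2) $a \rightarrow_W b$ if and only if $a >_W b$ and $b \not>_W a$, (3) $a \#_W b$ if and only if $a \not>_W b$ and $b \not>_W a$, and (4) $a \parallel_W b$ if and only if $a >_W b$ and $b >_W a$. That is, $a >_W b$ holds if in at least one case $a$ is directly followed by $b$. This does not imply a causal relation between $a$ and $b$, because $a$ and $b$ may run in parallel. The relation $\rightarrow_W$ suggests causality, while $\parallel_W$ and $\#_W$ are used to distinguish parallelism from choice. Since all relations can be derived from $>_W$, we assume the log to be complete with respect to $>_W$ (i.e., if one task can directly follow another, the log has recorded this potential behavior).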

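It is interesting to see that the classical limits of Petri net theory also apply to workflow mining. For example, the α algorithm has problems with non-free-choice constructs. It is well known that many problems that are undecidable for general Petri nets are decidable for free-choice nets. This knowledge also points to the limits of workflow mining. Another interesting observation is that there are typically several workflow nets that match a given workflow log. This is not surprising, since two syntactically different workflow nets may have the same behavior. The algorithm constructs the “simplest” workflow net that generates the desired behavior. Take the workflow log in Table 2 as an example: the α algorithm constructs a smaller workflow net (i.e., smaller than the one in Fig. 3) that does not explicitly represent the AND-split and AND-join transitions, because they are not visible in the log. The resulting net is shown in Fig. 7. Note that the behavior of the workflow net in Fig. 7 is equivalent to that of the workflow net in Fig. 3 under trace equivalence when abstracting from the AND-split and AND-join.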

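A limitation of the α algorithm is that certain kinds of loops and multiple tasks with the same name cannot be detected. It seems that the problems related to loops can be resolved. In addition, the α algorithm can mine timed workflow logs and calculate all kinds of performance metrics. All of this is supported by the mining tool EMiT (Enhanced Mining Tool [3]). EMiT can read the XML format and provides translators from Staffware and InConcert to this format. EMiT also fully supports the transactional task model shown in Fig. 5. The output of EMiT is a graphical process model that includes all kinds of performance metrics. Fig. 8 shows a screenshot of EMiT analyzing a Staffware log.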

5. Which class of workflow processes can be rediscovered?––An approach based on Petri net theory

The first approach we would like to discuss in more detail uses a specific class of Petri nets, named workflow nets (WF-nets), as a theoretical basis [1,4]. Some of the results have been reported in [3,8] and there are two tools to support this approach: EMiT [3] and MiMo [8]. Note that the tool Little Thumb (see Section 6) also supports this approach and, in addition, is able to deal with noise.

In this more theoretical approach, we do not focus on issues such as noise. We assume that there is no noise and that the workflow log contains “sufficient” information. Under these ideal circumstances we investigate whether it is possible to rediscover the workflow process, i.e., for which class of workflow models is it possible to accurately construct the model by merely looking at their logs. This is not as simple as it seems. Consider for example the process model shown in Fig. 3. The corresponding workflow log shown in Table 2 does not show any information about the AND-split and the AND-join. Nevertheless, they are needed to accurately describe the process.

To illustrate the rediscovery problem we use Fig. 6. Suppose we have a log based on many executions of the process described by a WF-net WF1. Based on this workflow log and using a mining algorithm we construct a WF-net WF2. An interesting question is whether WF1 = WF2. In this paper, we explore the class of WF-nets for which WF1 = WF2.

As shown in [8] it is impossible to rediscover the class of all WF-nets. However, the α algorithm described in [3,8] can successfully rediscover a large class of practically relevant WF-nets. For this result, we assume logs to be complete in the sense that if two events can follow each other, they will follow each other at least once in the log. Note that this local criterion does not require the presence of all possible execution sequences.
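The local nature of this completeness criterion can be illustrated with a small sketch. The process below is hypothetical (task A, then B, C and D in parallel, then E) and is not taken from the paper; the helper names `directly_follows` and `is_complete` are likewise assumptions made for illustration. The point is only that a subset of the possible traces can already contain every direct succession.

```python
from itertools import permutations


def directly_follows(log):
    """Return all pairs (x, y) such that y directly follows x in some trace."""
    return {(x, y) for trace in log for x, y in zip(trace, trace[1:])}


def is_complete(sample, reference):
    """True if `sample` shows every direct succession that `reference` shows."""
    return directly_follows(sample) == directly_follows(reference)


# Hypothetical process: A first, then B, C, D in any order, then E.
all_traces = [("A", *middle, "E") for middle in permutations("BCD")]

# Four of the six possible traces, chosen by hand for this illustration.
sample = [
    ("A", "B", "C", "D", "E"),
    ("A", "D", "C", "B", "E"),
    ("A", "C", "B", "D", "E"),
    ("A", "D", "B", "C", "E"),
]

print(len(all_traces), len(sample), is_complete(sample, all_traces))
```

With more parallel tasks the gap widens: the number of interleavings grows factorially, while the number of direct-succession pairs grows only quadratically.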

The α algorithm is based on four ordering relations which can be derived from the log: $>_W$, $\rightarrow_W$, $\#_W$, and $\parallel_W$. Let $W$ be a workflow log over a set of tasks $T$, i.e., $W \in \mathcal{P}(T^*)$. (The workflow log is simply a set of traces, one for each case, and we abstract from time, data, etc.) Let $a, b \in T$: (1) $a >_W b$ if and only if there is a trace $\sigma = t_1 t_2 t_3 \ldots t_{n-1}$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in W$ and $t_i = a$ and $t_{i+1} = b$, (2) $a \rightarrow_W b$ if and only if $a >_W b$ and $b \not>_W a$, (3) $a \#_W b$ if and only if $a \not>_W b$ and $b \not>_W a$, and (4) $a \parallel_W b$ if and only if $a >_W b$ and $b >_W a$. In other words, $a >_W b$ holds if for at least one case $a$ is directly followed by $b$. This does not imply that there is a causal relation between $a$ and $b$, because $a$ and $b$ can be in parallel. Relation $\rightarrow_W$ suggests causality, and relations $\parallel_W$ and $\#_W$ are used to differentiate between parallelism and choice. Since all relations can be derived from $>_W$, we assume the log to be complete with respect to $>_W$ (i.e., if one task can follow another task directly, then the log should have registered this potential behavior).
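The four relations can be computed directly from their definitions. The following is a minimal sketch of that step only, not of the full α algorithm; the function name `ordering_relations` and the example log are assumptions made for illustration (the log is hypothetical and is not a copy of Table 2).

```python
from itertools import product


def ordering_relations(log):
    """Derive the four ordering relations from a log given as a list of traces.

    Returns (follows, causal, choice, parallel), each a set of task pairs.
    """
    tasks = {t for trace in log for t in trace}
    # a >_W b: b directly follows a in at least one trace.
    follows = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}

    causal, choice, parallel = set(), set(), set()
    for a, b in product(tasks, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            causal.add((a, b))      # a ->_W b
        elif ab and ba:
            parallel.add((a, b))    # a ||_W b
        elif not ab and not ba:
            choice.add((a, b))      # a #_W b
        # The remaining case (ba only) is recorded as b ->_W a
        # when the loop reaches the pair (b, a).
    return follows, causal, choice, parallel


# Hypothetical log: B and C in parallel between A and D, or E instead.
log = [
    ("A", "B", "C", "D"),
    ("A", "C", "B", "D"),
    ("A", "E", "D"),
]
follows, causal, choice, parallel = ordering_relations(log)
print(sorted(causal))
print(sorted(parallel))
```

Note that $\#_W$ as defined also holds for pairs such as A and D that never directly follow each other in either order, so it is broader than an explicit choice between two tasks.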

It is interesting to observe that classical limits in Petri-net theory also apply in the case of workflow mining. For example, the α algorithm has problems dealing with non-free-choice constructs [18]. It is well-known that many problems that are undecidable for general Petri nets are decidable for free-choice nets. This knowledge has been used to indicate the limits of workflow mining. Another interesting observation is that there are typically multiple WF-nets that match a given workflow log. This is not surprising because two syntactically different WF-nets may have the same behavior. The algorithm will construct the “simplest” WF-net generating the desired behavior. Consider for example the log shown in Table 2. The α algorithm will construct a smaller WF-net (i.e., smaller than the WF-net shown in Fig. 3) without explicitly representing the AND-split and AND-join transitions, as they are not visible in the log. The resulting net is shown in Fig. 7. Note that the behavior of the WF-net shown in Fig. 7 is equivalent to the behavior of the WF-net shown in Fig. 3 using trace equivalence and abstracting from the AND-split and AND-join.
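Trace equivalence after abstraction can be sketched on finite sets of traces: remove the hidden routing tasks from every trace and compare what remains. The traces below are hypothetical and are not generated from the nets in Fig. 3 or Fig. 7; the helper name `visible_traces` is an assumption. The sketch compares two given trace sets and is not a general procedure for deciding equivalence of nets.

```python
def visible_traces(traces, hidden):
    """Drop hidden task labels from every trace and return the resulting set."""
    return {tuple(t for t in trace if t not in hidden) for trace in traces}


# Hypothetical traces of a net with explicit routing transitions.
with_routing = [
    ("A", "AND-split", "B", "C", "AND-join", "D"),
    ("A", "AND-split", "C", "B", "AND-join", "D"),
    ("A", "E", "D"),
]

# Hypothetical traces of a smaller net without routing transitions.
without_routing = [
    ("A", "B", "C", "D"),
    ("A", "C", "B", "D"),
    ("A", "E", "D"),
]

hidden = {"AND-split", "AND-join"}
equivalent = visible_traces(with_routing, hidden) == visible_traces(without_routing, hidden)
print(equivalent)
```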

A limitation of the α algorithm is that certain kinds of loops and multiple tasks having the same name cannot be detected. It seems that the problems related to loops can be resolved. Moreover, the α algorithm can also mine timed workflow logs and calculate all kinds of performance metrics. All of this is supported by the mining tool EMiT (Enhanced Mining Tool [3]). EMiT can read the XML format and provides translators from Staffware and InConcert to this format. Moreover, EMiT fully supports the transactional task model shown in Fig. 5. The output of EMiT is a graphical process model including all kinds of performance metrics. Fig. 8 shows a screenshot of EMiT while analyzing a Staffware log.

 
