Research Lab - Process Mining Techniques

Our software is directly driven by our own research and informed by the latest developments in the process mining academic community and, more broadly, in the business process management academic community.

Below you can find some of the results of our Research Lab. These are experimental tools developed by the Business Process Management Team at The University of Melbourne in collaboration with the Software Engineering & Information Systems Group at the University of Tartu. Many of these tools are later refined and improved to become experimental plugins on top of Apromore Community Edition. From these, we then carefully select those to be productized into commercial-strength plugins for Apromore Enterprise Edition.

Automatable Routines Discoverer

(by A. Bosco, A. Augusto, M. Dumas, M. La Rosa, and G. Fortino)
This tool analyzes user interaction (UI) logs to discover sequences of actions (i.e. routines) that are fully deterministic and can thus be automated. The tool losslessly compresses the user interaction log into a Deterministic Acyclic Finite State Automaton (DAFSA). It then applies an algorithm to decompose biconnected graphs (of which a DAFSA is an exemplar) into Single-Entry Single-Exit (SESE) regions. Some of these SESE regions correspond to sequences of actions. For each such sequence, the tool checks that each action is deterministic. If every action in the sequence is deterministic, the tool tries to discover an activation condition for the sequence using a rule mining technique. If, instead, an action in the middle of the sequence is not deterministic, the sequence is split into subsequences (called subroutines) for which the tool tries to discover activation conditions separately. For each (sub)sequence for which a rule is found, an activation condition is defined, and a routine specification is generated. The tool outputs the list of routine specifications.

Download Automatable Routines Discoverer

BPMN Miner 2.0

(by R. Conforti, A. Augusto, M. Dumas, L. Garcia-Banuelos and M. La Rosa)
BPMN Miner is a tool for the automated discovery of maximally-structured, hierarchical BPMN models containing subprocesses, interrupting and non-interrupting boundary events and activity markers. The tool works on top of a range of flat process discovery algorithms: Heuristics Miner, InductiveMiner, Fodina, ILP Miner and the Alpha algorithm. It employs functional and inclusion dependency discovery techniques in order to elicit a process-subprocess hierarchy from the event log. It requires as input a log in the XES or MXML format and produces a standard BPMN 2.0 model (.bpmn) as output. The tool will identify inclusion dependencies from the log, and ask the user to validate these dependencies before proceeding with the mining of the BPMN model. The identification of the inclusion dependencies is noise-tolerant. Moreover, the tool integrates Structured Miner, meaning that it returns a maximally structured BPMN 2.0 model by combining BPStruct and Extended Oulsnam Structurer (both used with default settings). BPMN Miner has been integrated as a plugin into Apromore.

Download BPMN Miner 2.0

BProVe

(by F. Corradini, F. Fornari, A. Polini, B. Re, F. Tiezzi, A. Vandin, M. La Rosa)
BProVe is a tool supporting the automated verification of BPMN collaboration models. The analysis is based on a formal operational semantics defined for the BPMN 2.0 modeling language and is provided as a freely accessible service that uses open standard formats as input data. In particular, BProVe permits the analysis of correctness of models with respect to domain-independent properties, such as soundness and safeness, as well as domain-dependent properties, e.g. checking the correct exchange of messages or the proper evolution of process activities. BProVe provides diagnostic information that can be easily reported on the diagram in a way that is understandable by process stakeholders. This is especially useful when different parties, with different backgrounds, need to quickly interact on the basis of a model. From a technical point of view, BProVe is based on a running instance of MAUDE loaded with the MAUDE modules implementing the BPMN Operational Semantics and the LTL MAUDE model checker. BProVe has also been integrated as a plugin into Apromore. This way it is possible to check BPMN model correctness from within the Apromore Editor.

Download BProVe

Business Process Clone Detector

(by R. Uba, M. La Rosa, L. Garcia-Banuelos and M. Dumas)
Business Process Clone Detector is a command-line tool for detecting duplicate fragments (a.k.a. clones) in repositories of process models. The tool works with a collection of EPC models as input (at least two models) and returns a DOT image for each identified clone. These images can be opened with ProM 5.2 (www.processmining.org). It is possible to choose the minimum size of a clone, which is 4 nodes by default.

Download Business Process Clone Detector

Infrequent Process Behavior Filter

(by R. Conforti, M. La Rosa and A.H.M. ter Hofstede)
The analysis of business process event logs can be negatively influenced by the presence of outliers, which reflect infrequent behavior or “noise”. In process discovery, where the objective is to automatically extract a process model from an event log, this may result in rarely traveled pathways that clutter the process model. The Infrequent Process Behavior Filter automatically filters out infrequent behavior while minimizing the number of events being removed from the log. The tool accepts as input an event log in XES or MXML format and provides a filtered log as output. This tool has been integrated into Apromore as part of the BPMN Miner plugin, where users can choose whether to filter the input log before process discovery.

Markovian Fitness and Precision (MFP)

(by A. Augusto, A. Armas-Cervantes, R. Conforti, M. Dumas, M. La Rosa, D. Reissner)
Given a BPMN model and an event log in XES or MXML format, this tool computes the fitness and the precision of the model w.r.t. the log. Fitness measures how much of the behavior recorded in the log can be replayed by the process model. The measure ranges from 0 (highly unfitting model) to 1 (fully fitting model). Precision measures the extent of extra model behavior that is not recorded in the log. The measure ranges from 0 (highly-imprecise model) to 1 (highly-precise). Specifically, these measures are computed by relying on the Markovian representations of the log and of the model behavior, which are then compared using graph-edit distance. The Markovian representation may be lossy depending on the k-order parameter.
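The intuition behind these measures can be illustrated with a small Python sketch. It makes several simplifying assumptions that are not taken from the tool: the k-th order Markovian representation is approximated as the set of windows of k + 1 consecutive symbols over each trace (padded with hypothetical start and end markers), the model behavior is supplied as an explicit list of traces, and plain set overlap stands in for the graph-edit-distance comparison.

```python
from typing import Iterable, Sequence, Set, Tuple

START, END = "<start>", "<end>"  # hypothetical boundary markers


def markovian_abstraction(traces: Iterable[Sequence[str]], k: int) -> Set[Tuple[str, ...]]:
    """Collect every window of k + 1 consecutive symbols over the padded traces."""
    windows = set()
    for trace in traces:
        padded = [START] + list(trace) + [END]
        width = min(k + 1, len(padded))
        for i in range(len(padded) - width + 1):
            windows.add(tuple(padded[i:i + width]))
    return windows


def overlap_fitness_precision(log, model_traces, k):
    """Set-overlap stand-in: the share of log windows the model allows (fitness)
    and the share of model windows seen in the log (precision)."""
    log_abs = markovian_abstraction(log, k)
    model_abs = markovian_abstraction(model_traces, k)
    if not log_abs or not model_abs:
        raise ValueError("both the log and the model need at least one trace")
    shared = log_abs & model_abs
    return len(shared) / len(log_abs), len(shared) / len(model_abs)


log = [["a", "b", "c"], ["a", "c"]]
model_traces = [["a", "b", "c"]]
print(overlap_fitness_precision(log, model_traces, k=1))  # (0.8, 1.0)
```

A smaller k yields shorter windows and therefore a coarser (lossier) view of the behavior, which is the trade-off the k-order parameter controls.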

Download Markovian Fitness and Precision (MFP)

Multi-Perspective Process Comparator (MPC)

(by H. Nguyen, M. Dumas, M. La Rosa, A.H.M. ter Hofstede)
Existing approaches to log-based process variant comparison are restricted to intra-case relations, and more specifically, directly-follows relations such as “a task directly follows another one” or “a resource directly hands off to another resource” within the same case. This tool implements a more general approach based on so-called perspective graphs. A perspective graph is a graph-based abstraction of an event log where a node represents any entity in an event log (task, resource, location, etc.) and an arc represents an arbitrary relation between these entities (e.g. directly-follows, co-occurs, hands-off to, works-together with, etc.) within or across cases. Statistically significant differences between two perspective graphs are captured in a so-called differential perspective graph, which allows us to compare two event logs from any given perspective. The tool is packaged as a standalone ProM distribution containing the plugin called “Multi-Perspective Process Comparator”. The inputs to the plugin are two event logs and user-defined parameters for comparison. The output is a matrix-based visualization of the differences between the two logs. Detailed usage instructions are documented in the tool manual.

Nirdizati

(by A. Rozumnyi, I. Verenich, M. La Rosa, M. Dumas, F. Maggi, I. Teinemaa)
Nirdizati is a dashboard-based monitoring tool that is updated periodically based on incoming streams of events. However, unlike classical monitoring dashboards, Nirdizati does not focus on showing the current state of business process executions, but their future state (e.g. when will each case finish). On the backend, Nirdizati uses predictive models trained using machine learning methods, including deep learning. Currently, Nirdizati is processing two predefined event streams corresponding to the Business Process Intelligence Challenges (BPIC 2012 and BPIC 2017). Both logs originate from a financial institute and pertain to a loan application process. For the 2012 BPIC, we are using a classification model to predict whether the case duration will be within a certain threshold and a regression model to predict the remaining cycle time of an ongoing case. In addition, for the 2017 BPIC, we predict whether a customer will accept a loan offer via a classification model. All the predictions are updated automatically as new events arrive.

Download Nirdizati

Optimization Framework for Automated Process Discovery

(by A. Augusto, M. Dumas, M. La Rosa, S.J.J. Leemans, S.K.L.M. vanden Broucke)
This tool implements four optimization metaheuristics (i.e. iterative local search, repetitive local search, tabu search, and simulated annealing) to optimize the automated discovery of process models. Given an event log as input (in MXML, XES.GZ, or XES format), the tool can optimize one of the three automated process discovery algorithms implemented within the framework: Split Miner, Fodina, or Inductive Miner. The optimization is driven by the selected metaheuristic (out of the four available). Specifically, the metaheuristic explores the solution space in a pseudo-random manner, looking for the process model that scores the highest Markovian fitness and precision values. The exploration of the solution space ends when a timeout or a maximum number of exploration iterations has been reached. The tool outputs the best process model found during the solution space exploration, in BPMN 2.0, which can be opened and visualized using different tools, such as Apromore.
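The general shape of such an exploration loop can be sketched in Python as follows. This is a plain hill-climbing loop, shown only to illustrate the two stopping criteria (timeout and maximum number of iterations); it is not one of the four metaheuristics as implemented in the tool. The `score` and `neighbours` callables are hypothetical placeholders: in the tool's setting, a solution would be a discovered model and the score would come from Markovian fitness and precision.

```python
import time


def local_search(score, start, neighbours, max_iterations=100, timeout_s=1.0):
    """Greedy exploration that stops on a timeout, an iteration cap, or a local optimum."""
    deadline = time.monotonic() + timeout_s
    best, best_score = start, score(start)
    for _ in range(max_iterations):
        if time.monotonic() > deadline:
            break  # timeout reached
        candidate = max(neighbours(best), key=score, default=best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # no neighbour improves on the current solution
        best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-in for a solution space: integer pairs with a single optimum at (3, 7).
def toy_score(point):
    x, y = point
    return -(x - 3) ** 2 - (y - 7) ** 2


def toy_neighbours(point):
    x, y = point
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


print(local_search(toy_score, (0, 0), toy_neighbours))  # ((3, 7), 0)
```

Metaheuristics such as simulated annealing or tabu search differ mainly in how they escape local optima, for example by occasionally accepting a worse candidate or by forbidding recently visited solutions.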

Download Optimization Framework for Automated Process Discovery

OptimizeKnockout

(by I. Verenich, M. Dumas, M. La Rosa, F.M. Maggi and C. Di Francescomarino)
OptimizeKnockout is a tool for finding an optimal ordering of check activities in a so-called “knockout section” of a business process in order to minimize overprocessing. Overprocessing waste occurs in a business process when an effort is spent in a way that does not add value to the customer nor to the business. A recurrent overprocessing pattern in business processes happens in the context of “knockout checks”, i.e. activities that classify a case into “accepted” or “rejected”, such that if the case is accepted it proceeds forward, while if rejected, it is canceled and all work performed in the case is considered unnecessary. Thus, when a knockout check rejects a case, the effort spent in other (previous) checks becomes overprocessing waste, according to the Lean classification. This tool implements a fine-grained approach to reorder knockout checks at runtime based on predictive machine learning models.
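To make the cost of a given ordering concrete, the Python sketch below computes the expected effort of a knockout section and reorders the checks with the classical heuristic of sorting by decreasing ratio of rejection probability to effort. It assumes that checks reject independently of each other, and the check names and numbers are invented for illustration. It is not the tool's own method: in a runtime approach like the one described above, the rejection probabilities would be estimated per case by predictive models rather than fixed up front.

```python
def expected_effort(checks):
    """checks: ordered list of (name, effort, reject_probability).

    A check is only performed if all earlier checks accepted the case,
    so its effort is weighted by the probability of reaching it.
    """
    total, p_reach = 0.0, 1.0
    for _, effort, p_reject in checks:
        total += p_reach * effort
        p_reach *= 1.0 - p_reject
    return total


def order_by_ratio(checks):
    """Classical heuristic: cheapest and most-likely-to-reject checks first."""
    return sorted(checks, key=lambda c: c[2] / c[1], reverse=True)


# Hypothetical checks: (name, effort, rejection probability)
checks = [("credit_history", 10.0, 0.1), ("identity", 2.0, 0.5)]
print(expected_effort(checks))                  # effort in the given order
print(expected_effort(order_by_ratio(checks)))  # effort after reordering
```

In this example, running the cheap identity check first lowers the expected effort, because half of the cases are rejected before the expensive check is ever performed.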

Model OptimizeKnockout

Predictive Business Process Monitoring with LSTM

(by N. Tax, I. Verenich, M. La Rosa and M. Dumas)
This tool can be used to perform the following prediction tasks: i) prediction of the next type of activity to be executed in a running process instance; ii) prediction of the timestamp of the next type of activity to be executed; iii) prediction of the continuation of a running instance, i.e. its suffix; and iv) prediction of the remaining cycle time of an instance. The tool trains a Long Short-Term Memory (LSTM)-based predictive model using data about historical process instances. Next, the models are evaluated on running (i.e. incomplete) instances. It assumes the input is a complete log of all traces in CSV format, in which the first column is the case ID, the second the activity name or ID, and the third the activity timestamp. This input log is then split temporally into a training set (66%) and a test set (34%). On the test set, the tool evaluates prediction performance for every size of partial trace (e.g. a test trace cut at the 2nd event, the same trace cut at the 3rd event, and so on) across all four prediction tasks.
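The data preparation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: the CSV is assumed to have no header row, timestamps are assumed to be ISO 8601 strings (so that they sort correctly as text), and the temporal split is assumed to be made per case, ordered by each case's first event. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_traces(csv_text):
    """Group rows (case_id, activity, timestamp) into time-ordered per-case event lists."""
    cases = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        case_id, activity, timestamp = row
        cases[case_id].append((timestamp, activity))
    return {case_id: sorted(events) for case_id, events in cases.items()}


def temporal_split(traces, train_fraction=0.66):
    """Order cases by their first timestamp; earlier cases form the training set."""
    ordered = sorted(traces, key=lambda case_id: traces[case_id][0][0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def prefixes(events, min_len=2):
    """Cut a trace after the 2nd event, the 3rd event, and so on.

    Returns (prefix, suffix) pairs of activity names; the suffix is never empty.
    """
    activities = [activity for _, activity in events]
    return [(activities[:n], activities[n:]) for n in range(min_len, len(activities))]
```

Each (prefix, suffix) pair produced from a test case corresponds to one evaluation point: the prefix plays the role of the running instance and the suffix holds the ground truth for the prediction tasks.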

Download Predictive Business Process Monitoring with LSTM

ProConformance 1.0 (event structures-based)

(by L. Garcia-Banuelos, N. van Beest, M. Dumas and M. La Rosa)
Given a process model and a process execution log, ProConformance 1.0 provides a list of statements in natural language capturing behavior that is present or frequent in the model, while absent or infrequent in the log, and vice versa. This conformance analysis method allows users to diagnose differences between prescriptive process behavior (as captured in the process model) and deviant executions of a process as captured in the log, e.g. for compliance purposes, or between two versions or variants of a process. The model can be provided in BPMN and the log in the MXML or XES format.

Download ProConformance 1.0

ProConformance 2.0 (automata-based)

(by D. Reissner, R. Conforti, M. Dumas, M. La Rosa and A. Armas-Cervantes)
ProConformance 2.0 provides a list of statements in natural language capturing behavior that is present or frequent in the log but not in the model. Moreover, it returns all optimal and one-optimal trace alignments between the log and the model. The difference with ProConformance 1.0 is that the internal structures are based on automata instead of event structures: a Deterministic Acyclic Finite State Automaton (DAFSA) is built from the log and a Reachability Graph is built from the process model. As with ProConformance 1.0, the model can be provided in BPMN and the log in the MXML or XES format.

Download ProConformance 2.0

ProConformance 3.0 (automata-based)

(by D. Reissner, A. Armas-Cervantes, R. Conforti, M. Dumas, D. Fahland, M. La Rosa)
ProConformance 3.0 features a re-engineered ProConformance tool with a number of optimizations. In addition, the package implements several extensions to improve scalability with large datasets. Specifically, there are four options: i) base approach without any extension (Automata); ii) with the S-Components extension (SComp) to tackle concurrent process models; iii) with the S-Components and tandem repeats reduction (TR-SComp) to tackle event logs with lots of repetitions, or iv) a hybrid approach that tries to automatically select the most suitable extension based on the characteristics of the input model and log (Hybrid).

Datasets used in the experiments of the papers “Scalable Alignment of Process Models and Event Logs: An Approach Based on Automata and S-Components” and “Efficient Conformance Checking using Alignment Computation with Tandem Repeats”.
Source code (provided “as is”, under Apache v2.0)

Download ProConformance 3.0

ProDelta

(by N. van Beest, M. Dumas, L. Garcia-Banuelos and M. La Rosa)
Given two process execution logs, ProDelta provides a list of statements in natural language capturing behavior that is present or frequent in one log, while absent or infrequent in the other. This log delta analysis method allows users to diagnose differences between normal and deviant executions of a process or between two versions or variants of a process. The logs can be provided in the MXML or XES format. ProDelta has been integrated into Apromore as part of the Compare plugin.

Download ProDelta

ProDrift 2.5

(by A. Ostovar, A. Maaradji, M. Dumas, M. La Rosa)
ProDrift is a fully automated tool for detecting and characterizing business process drifts. The tool accepts as input a process execution log in MXML or XES format and performs statistical tests over a stream of runs or a stream of events, obtained by replaying the event log. ProDrift accepts an optional window size (specified as a number of traces or events), as well as the possibility of using an adaptive window. If the latter option is chosen, ProDrift will adapt the window size in order to strike a trade-off between classification accuracy and drift detection delay. The output is a list of drifts, each with information on the location in the stream of traces (or events) where the drift occurred, and a list of behavioral relations that have been modified by the drift. Drifts can also be characterized at the level of entire process fragments (i.e. single-entry-single-exit sub-processes containing multiple activities and gateways), in which case the tool will return one or more statements in natural language, describing the fragments being affected by the drift and how they have been changed. ProDrift has also been integrated as a plugin into Apromore.

Synthetic logs used in “Robust Drift Characterization from Event Streams of Business Processes”

Download ProDrift 2.5

ProLoCon

(by A. Armas Cervantes, M. Dumas and M. La Rosa)
ProLoCon is a command-line tool for the computation of local concurrency oracles out of event logs. Given an event log, the tool constructs a state-space representing the behavior captured in the log and identifies parts within such state space, referred to as scopes, where concurrency relations between pairs of events hold. The state-space abstracts the behavior in the log as an acyclic transition graph, where every vertex in the graph denotes an execution state and every transition denotes an event occurrence. Then, a scope is a pair of vertices (execution states) where pairs of events can occur concurrently. The current version of the tool uses the Alpha algorithm for the computation of the concurrency relations between events. The input required for the tool is simply an event log in either XES or MXML format.
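For reference, the concurrency relation of the Alpha algorithm can be sketched in a few lines of Python: two activities are considered concurrent if each is observed directly following the other somewhere in the log. The sketch computes this relation globally over the whole log; it does not build the state space or the scopes that make the tool's oracles local, and the function names are hypothetical.

```python
def directly_follows(traces):
    """All pairs (a, b) such that b immediately follows a in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def alpha_concurrency(traces):
    """Alpha-style concurrency: a || b if both a > b and b > a are observed."""
    follows = directly_follows(traces)
    return {frozenset((a, b)) for (a, b) in follows if a != b and (b, a) in follows}


log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
print(alpha_concurrency(log))  # {frozenset({'b', 'c'})}
```

A local oracle would apply the same kind of test only to the portion of behavior between two execution states, so that a pair of events can be concurrent in one part of the process and ordered in another.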

Download ProLoCon

ProSeqPredict

(by I. Verenich, D. Chasovskyi, M. Dumas, M. La Rosa, F. Maggi and A. Rozumnyi)
ProSeqPredict is a tool to predict the most likely sequence of activities (trace suffix) that will be executed from a partial process instance (trace prefix), based on the information already available on the prefix as well as on the availability of past traces already executed, which are recorded in an event log. It requires as input an event log in CSV format and the length of the prefix to be used. The tool will predict the most likely suffix for each prefix of that length present in the log.
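The input and output of this task can be illustrated with a naive frequency-based baseline in Python. This is not the prediction technique used by ProSeqPredict; it simply returns the suffix most often observed after a given prefix in past traces, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_suffix_index(traces, prefix_len):
    """Count, for every prefix of the given length, the suffixes that followed it."""
    index = defaultdict(Counter)
    for trace in traces:
        if len(trace) > prefix_len:
            index[tuple(trace[:prefix_len])][tuple(trace[prefix_len:])] += 1
    return index


def predict_suffix(index, prefix):
    """Return the most frequently observed suffix, or None for an unseen prefix."""
    counts = index.get(tuple(prefix))
    if not counts:
        return None
    return list(counts.most_common(1)[0][0])


history = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"], ["a", "x", "y"]]
index = build_suffix_index(history, prefix_len=2)
print(predict_suffix(index, ["a", "b"]))  # ['c']
```

Such a baseline cannot generalize to prefixes it has never seen, which is one reason to use a learned predictive model instead.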

Download ProSeqPredict

ProVariant

(by N. van Beest, H. Groefsema, L. Garcia-Banuelos and M. Aiello)
ProVariant allows the automated generation of declarative specifications from a set of business process variants. It takes as input a set of process models in PNML format and returns as output a set of CTL specifications stored in an XML file.

Download ProVariant

Slice Mine Dice (SMD) Process Miner

(by C.C. Ekanayake, M. Dumas, L. Garcia-Banuelos and M. La Rosa)
SMD is a tool for mining a collection of process models from a process log. This tool uses a combination of trace clustering and clone detection techniques to mine a process model collection where similar process sections are extracted as subprocesses. The tool requires as input a log, an existing trace clustering technique (different ones can be chosen), and a complexity threshold. The result is a hierarchical process model collection where the size of each process model is bounded by the threshold. As this tool can detect and extract common sections from discovered process models, the resulting process model collection has a smaller overall size and fewer process models than a collection of process models obtained with a trace clustering technique under the same complexity bound. Furthermore, identification and extraction of similar sections could facilitate better analysis of the generated process model collection.

Download Slice Mine Dice (SMD) Process Miner

Split Miner

(by A. Augusto, R. Conforti, M. Dumas and M. La Rosa)
Split Miner is a tool for fast mining of simple, accurate, and deadlock-free BPMN process models from an event log. The approach works in five steps. The first step discovers the directly-follows graph and identifies loops in the process behavior captured in the input event log. The second step detects parallelism between process activities. The third step filters the graph by removing infrequent behavior. The fourth step detects the split gateways while the last step discovers the join gateways. The event log can be in MXML, XES.GZ, or XES format. The output model, in BPMN 2.0, can be opened and visualized using different tools, such as Apromore.
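The first and third steps can be illustrated with a short Python sketch. It builds a directly-follows graph with arc frequencies, reports self-loops, and drops arcs below a fixed frequency threshold. The fixed threshold is a deliberate simplification and an assumption of this sketch; it is not Split Miner's actual filtering strategy, and the function names are hypothetical.

```python
from collections import Counter


def directly_follows_graph(traces):
    """Count how often each activity is immediately followed by another."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def self_loops(dfg):
    """Activities that directly follow themselves (length-one loops)."""
    return {a for (a, b) in dfg if a == b}


def filter_infrequent(dfg, min_count):
    """Keep only arcs observed at least min_count times (simplified filtering)."""
    return {arc: n for arc, n in dfg.items() if n >= min_count}


log = [["a", "b", "c"], ["a", "b", "b", "c"], ["a", "c"]]
dfg = directly_follows_graph(log)
print(self_loops(dfg))            # {'b'}
print(filter_infrequent(dfg, 2))  # {('a', 'b'): 2, ('b', 'c'): 2}
```

A naive threshold like this one can disconnect the graph, so a practical filter also has to make sure every activity remains reachable from the start and can still reach the end.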

Download Split Miner

Split Miner 2.0

(by A. Augusto, M. Dumas and M. La Rosa)
Split Miner 2.0 is an extended version of the Split Miner algorithm for discovering accurate and deadlock-free BPMN process models from event logs. With respect to the original (2017) version of Split Miner, the main improvements of Split Miner 2.0 are:

Ability to use both start and end timestamps of each activity, in order to identify concurrency more accurately. The original Split Miner algorithm relied only on end timestamps.
Ability to discover BPMN process models with inclusive decision gateways (OR-splits). This leads to process models with simpler branching structures.
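The benefit of having both timestamps can be seen in a small Python sketch: when each activity instance carries a start and an end time, two activities in the same case can be flagged as potentially concurrent whenever their execution intervals overlap. The overlap test below is an illustrative assumption, not the exact criterion used by Split Miner 2.0, and the function name is hypothetical.

```python
from itertools import combinations


def overlapping_pairs(case_events):
    """case_events: list of (activity, start, end) for one case, with comparable timestamps.

    Two activity instances overlap if each one starts before the other ends.
    """
    pairs = set()
    for (a, start_a, end_a), (b, start_b, end_b) in combinations(case_events, 2):
        if a != b and start_a < end_b and start_b < end_a:
            pairs.add(frozenset((a, b)))
    return pairs


case = [("A", 0, 5), ("B", 3, 8), ("C", 8, 10)]
print(overlapping_pairs(case))  # {frozenset({'A', 'B'})}
```

With end timestamps only, the same case would look like the strict sequence A, B, C, and the overlap between A and B would go unnoticed.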

Download Split Miner 2.0

Staged Process Flow Performance Analyzer

(by H. Nguyen, A.H.M. ter Hofstede, M. Dumas, M. La Rosa, F.M. Maggi)
Existing process mining techniques provide summary views of the overall process performance over a period of time, allowing analysts to identify bottlenecks and associated performance issues. However, these tools are not designed to help analysts understand how bottlenecks form and dissolve over time nor how the formation and dissolution of bottlenecks – and associated fluctuations in demand and capacity – affect the overall process performance. Staged Process Flow (SPF) is a ProM plugin offering a number of visualizations that collectively allow process performance evolution to be analyzed from multiple perspectives. The idea underlying this tool is an abstraction of a business process as a series of queues corresponding to stages. SPF has also been integrated as a plugin into Apromore.

Download Staged Process Flow Performance Analyzer

Staged Process Miner

(by H. Nguyen, A.H.M. ter Hofstede, M. Dumas, M. La Rosa, F.M. Maggi)
This is a standalone ProM distribution containing the Staged Process Miner plugin and the Stage-based Process Discovery plugins, as well as plugins for relevant baseline techniques. The Staged Process Miner plugin takes as input an event log in XES or MXML format and returns a partitioning of this log into stages (called “stage model”). The only parameter required is the minimum number of events for each stage. The Stage-based Process Discovery plugin takes as input the stage model and an event log and returns a process model (Petri Net and BPMN). The two baseline techniques for stage mining included in the package are the Divide and Conquer framework (DC) and the Performance Analysis with Simple Precedence Diagram (SPD). The four baseline techniques for stage-based process discovery included in the tool are Decomposed Miner, Region-based Miner (genet tool), Inductive Miner, and Fodina. This ProM distribution also comes with a plugin to visualize the output of the above three techniques (SPM, SPD, and DC). You can also download the SPM plugin directly from the ProM nightly build. The Staged Process Miner plugin has also been integrated as a plugin into Apromore.

Download Staged Process Miner

Structured Miner 1.1

(by A. Augusto, R. Conforti, M. Dumas, M. La Rosa, G. Bruno)
Structured Miner is a tool for mining maximally structured process models in BPMN from an event log. The approach works in two phases. The first phase discovers the BPMN process model from an input log using a baseline discovery algorithm which does not force the discovered model to be structured (currently, Heuristics Miner and Fodina Miner are supported). The second phase structures the discovered model combining BPStruct and Extended Oulsnam Structurer (both used with default settings). The event log can be in MXML or XES format. The discovered model, in BPMN 2.0, can be opened and visualized using different tools. Structured Miner is also part of BPMN Miner 2.0. The difference between the two is that Structured Miner always discovers flat process models whereas BPMN Miner 2.0 discovers hierarchical process models with subprocesses. Structured Miner has also been integrated into Apromore as part of the BPMN Miner plugin.

Download Structured Miner 1.1

Timestamp Repair for Event Logs

(by R. Conforti, M. La Rosa, A.H.M. ter Hofstede, A. Augusto)
This tool automatically corrects timestamp errors in business process execution logs. Specifically, given an input event log, it detects events recorded with the same timestamp. It first repairs the order of these events by relying on correct event log traces (where the same events do not have recording errors), and then computes the likely true timestamp for each event affected by same-timestamp errors. The tool takes as input the affected event log in XES format and outputs the repaired event log.
