Bridging Theory and Practice: The Academic Roots of Split Miner’s Success

An interview with Dr. Adriano Augusto, Principal Product Manager of Apromore

Many groundbreaking inventions that have reshaped the world originated from university research. From everyday essentials like the World Wide Web and Gatorade to transformative innovations like X-ray technology and the COVID-19 mRNA vaccine, these contributions from academic institutions have repeatedly left a lasting mark on human history.

The process mining market has its own history of breakthroughs, and Apromore is proud to be part of it thanks to its deep academic pedigree. Our software incorporates the results of thirteen PhD theses and has been showcased in more than 200 scientific publications, attracting over 2,500 citations. One of these inventions is Split Miner(1), a groundbreaking algorithm designed to discover process models from event logs with high accuracy and efficiency.

Dr. Adriano Augusto is principal product manager at Apromore and obtained his doctoral degree with a thesis titled “Accurate and Efficient Discovery of Process Models from Event Logs”(2) – a joint PhD between the University of Melbourne and the University of Tartu, under the supervision of Prof. Marcello La Rosa (Apromore CEO) and Prof. Marlon Dumas (Apromore CPO). The thesis’ main contribution is the design, implementation, and evaluation of Split Miner, arguably the most sophisticated algorithm for discovering BPMN models from event logs. Today Split Miner underpins Apromore’s automated discovery capability and is also used by other leading process mining vendors. 

We sat down with Dr. Adriano Augusto to reminisce about the research that led to Split Miner and his perspective on how innovation can flourish from academia.

Share the backstory of Split Miner: What prompted or inspired your research and thesis?

Adriano: My journey into process mining began during my master's thesis, where I delved into the state of the art of process discovery algorithms. At that time, algorithms such as Inductive Miner, Evolutionary Tree Miner and Heuristics Miner were highly regarded. However, they were built on the assumption that the data was always clean and well-structured, which is often not the case in real-world scenarios. This led to accuracy issues, low precision and inefficiencies when dealing with actual process data.

As I continued my research with a team that included Professors Marcello La Rosa and Marlon Dumas, among others, we observed that existing tools struggled with real data due to these assumptions. Inductive Miner, although excellent in theory, often fell short when handling the complexities of real-world processes. On the other hand, tools like Heuristics Miner occasionally performed better but were grounded in a number of simplifying assumptions, which could result in disconnected models. So, we explored the question further: Instead of merely extending existing tools like Heuristics Miner, could we design something new that combined the strengths of both Inductive Miner and Heuristics Miner while addressing their weaknesses? Or, in fact, design a new approach that could lift the limitations of those algorithms? This idea eventually led to the creation of Split Miner, which took a semi-structured approach and incorporated a filtering mechanism.

Before Split Miner
Fig 1: Process model discovered by Inductive Miner from a patient treatments dataset.

After Split Miner
Fig 2: Process model discovered by Split Miner from the same dataset.

How did you validate and test Split Miner?

Adriano: We extensively compared Split Miner against six existing algorithms (including Inductive Miner), using twelve real-life logs and an extensive combination of configuration parameters. The evaluation showed that Split Miner convincingly outperforms all existing automated process discovery algorithms in terms of simplicity of the resulting process models (size and structural complexity), in terms of accuracy (using different accuracy measures), and in terms of execution time (over four times faster than all other algorithms on all event logs tested)(3).

The research on Split Miner has been featured at international conferences, receiving numerous awards including the 2022 KAIS Best Paper Award, the 2021 Best Dissertation Award at the ICPM Conference and the 2020 International Conference on Business Process Management Best BPM Dissertation Runner-Up Award.  

Split Miner
Fig 3: Split Miner in Apromore

And, is it true you’re always improving and innovating Split Miner?

Adriano: Over time, we refined Split Miner through various research iterations, and in experiments each version consistently outperformed the other baselines as well as the first incarnation of Split Miner. One of the first improvements we made was to allow discovery directly from clean process data, without mandatory filtering. Initially, filtering was applied in every discovery run, but we later realized that sometimes the data was clean enough to skip filtering altogether.
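To illustrate what optional filtering can look like, here is a minimal Python sketch. It is not Split Miner's actual filtering mechanism: it simply counts how often one activity directly follows another and, if asked, drops rare transitions. The function names, the `min_share` parameter and the toy traces are all assumptions made for this example.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def filter_edges(counts, min_share=None):
    """Keep edges at least min_share as frequent as the most frequent edge.

    min_share is a hypothetical knob; passing None skips filtering entirely.
    """
    if min_share is None or not counts:
        return dict(counts)
    cutoff = min_share * max(counts.values())
    return {edge: n for edge, n in counts.items() if n >= cutoff}


# Toy log: three cases follow A -> B -> C, one case skips B.
traces = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"], ["A", "C"]]
dfg = directly_follows(traces)

unfiltered = filter_edges(dfg)               # clean data: keep every edge
filtered = filter_edges(dfg, min_share=0.5)  # noisy data: drop rare edges
```

With this toy log, the unfiltered graph keeps the rare A to C transition, while the filtered one removes it.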

Another improvement was to leverage both start and end timestamps of activities in the process data. When activities are executed in parallel by different resources, it's challenging to detect this using only the end timestamp of activities. By incorporating both start and end timestamps, we can better capture the overlapping execution of activities, leading to more accurate process models.
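The benefit of having both timestamps can be shown with a small Python sketch. This is an illustration under simplifying assumptions, not the logic Split Miner uses: it treats each activity instance as a time interval and flags two activities as potentially parallel when their intervals overlap. The function name, the tuple layout and the example case are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def overlapping_pairs(case_events):
    """Return activity pairs whose [start, end) intervals overlap in one case."""
    pairs = set()
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(case_events, 2):
        # Two intervals overlap when each one starts before the other ends.
        if a_start < b_end and b_start < a_end:
            pairs.add(tuple(sorted((a, b))))
    return pairs


def at(hhmm):
    """Parse an HH:MM string (the date part is irrelevant for this example)."""
    return datetime.strptime(hhmm, "%H:%M")


# One hypothetical case: (activity, start, end)
case = [
    ("Register", at("09:00"), at("09:10")),
    ("Blood test", at("09:10"), at("09:40")),
    ("X-ray", at("09:20"), at("09:30")),
]

parallel = overlapping_pairs(case)
```

Ordered by end timestamp alone, this case reads as Register, X-ray, Blood test, which looks strictly sequential. With the start timestamps included, the X-ray is seen to run while the blood test is still in progress.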

Did your research on Split Miner lead to other projects or areas of research?

Adriano: Yes, it did. One realization from my experience with Split Miner and other baselines was the importance of optimizing the parameters used for discovery. Typically, these tools are parameterized, requiring users to set up the tool before discovering a model. The model's accuracy can vary based on these parameters, which means users often must manually adjust settings until they achieve a satisfactory result. This process can be time-consuming.

Out of this experience, I envisioned automating the fine-tuning of these parameters to optimize process discovery automatically. While we haven't brought this into the industry yet, it's a promising area of research. However, automated optimization can only go so far; it might not match the nuanced knowledge of a process analyst who might prefer a less accurate model that reveals more interesting insights.
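One simple way to picture automated parameter tuning is an exhaustive grid search, sketched below in Python. This is a generic illustration, not the optimization approach studied in the research: the `discover` and `score` callables, the parameter names `filter_threshold` and `parallelism_threshold`, and the toy scoring function are all assumptions. In practice, `discover` would run a discovery algorithm on an event log and `score` would compute an accuracy measure for the resulting model.

```python
from itertools import product


def grid_search(discover, score, grid):
    """Try every parameter combination and return the best-scoring one."""
    best_params, best_score = None, float("-inf")
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(discover(**params))
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_discover(filter_threshold, parallelism_threshold):
    """Stand-in for a discovery run: just records the parameters used."""
    return {"filter": filter_threshold, "parallelism": parallelism_threshold}


def toy_score(model):
    """Made-up quality measure that peaks at filter=0.4, parallelism=0.1."""
    return 1.0 - abs(model["filter"] - 0.4) - abs(model["parallelism"] - 0.1)


grid = {
    "filter_threshold": [0.0, 0.2, 0.4, 0.6],
    "parallelism_threshold": [0.0, 0.1, 0.2],
}
best_params, best_score = grid_search(toy_discover, toy_score, grid)
```

A search like this only optimizes the score it is given, which is why it cannot replace an analyst who may deliberately choose a lower-scoring model that is more informative.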

Earlier this year SAP started using Split Miner in the Signavio Process Intelligence product. I imagine you must be proud, yes? Care to share your thoughts?

Adriano: After spending almost seven years in academia, both as a lecturer and a researcher, I’m indeed very proud. There’s so much research out there that never makes it to industry, which often limits its impact. That’s one of the reasons I left academia—I wanted to see my work reach beyond the research bubble. Knowing that Split Miner has been acknowledged and adopted by industry players like SAP is incredibly gratifying. It validates the work we’ve done and proves that academic research can have a real-world impact.

I’ve been contacted by several people over the years asking to reimplement Split Miner in different programming languages like Python and R. This shows that different communities of academics are interested in expanding the work on Split Miner.

Anything else you’d like to share about the importance of research in making it from academia to commercial software, or anything about Split Miner that we haven’t discussed?

Adriano: At Apromore we have a very strong research background. In fact, the Apromore platform incorporates research from numerous PhD theses, whose results have been published in the scientific literature and have attracted a large number of citations. Many of us still engage in academic work, including supervising PhD students. For example, Apromore's new Compliance Center is the result of four years of research conducted by Nigel Adams and supervised by me and Marcello. This deep connection between academia and industry is what I believe sets us apart. Research is fundamental, but it’s rare for it to cross the line into industry. Our organization is in a unique position because we’re deeply rooted in academia while actively engaging with the industry.

Thank you, Adriano, for sharing your insights. It’s clear that your work with Split Miner has had a significant impact on both the academic and commercial worlds. We look forward to seeing what’s next.

Explore Apromore’s deep academic roots and the PhD and master’s theses that underpin Apromore’s functionality here.

(1) A. Augusto, R. Conforti, M. Dumas, M. La Rosa, A. Polyvyanyy: Split Miner: Automated Discovery of Accurate and Simple Business Process Models from Event Logs. Knowledge and Information Systems 59, 2019.

(2) A. Augusto: Accurate and Efficient Discovery of Process Models from Event Logs, PhD thesis, The University of Melbourne and University of Tartu (joint PhD), 2020.

(3) A. Augusto, R. Conforti, M. Dumas, M. La Rosa, F.M. Maggi, A. Marrella, M. Mecella, A. Soo: Automated Discovery of Process Models from Event Logs: Review and Benchmark. IEEE Trans. Knowl. Data Eng. 31(4), 2019.