Data Science Techniques Used in Process Mining for Removing Noise
Heinonen, Kalle (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023060116921
Abstract
Process mining helps organisations understand and improve their existing processes by extracting event log data from IT systems and visualising it. This study explores the main challenges associated with processing event log data. One of the most significant challenges is the presence of noise activities, which are infrequent and do not accurately represent the typical behaviour of the process. Process mining techniques generate process models, such as Petri nets, that enable business stakeholders to analyse and verify processes. This thesis investigates methods that are effective at handling noise in event log data. An experiment was conducted on two real-life event logs to evaluate which algorithm is best suited to handling noisy data. Two Petri nets were generated using the Integer Linear Programming (ILP) Miner and the Inductive Miner algorithms. Generalisation and complexity were compared in addition to quality metrics such as the F-score. The results indicate that the Inductive Miner generates a Petri net with a good F-score and suitable complexity metrics. For the more complex event log, which contained more events and more noise, the ILP Miner produced a slightly better F-score by replicating more transitions in the model. Both algorithms perform well in use cases where precise results are not essential. However, in industries such as medicine or fraud detection, where accuracy is critical, it is recommended that an expert reviews the excluded traces to ensure that vital information is not omitted from the discovered model.
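The abstract does not describe how either miner filters noise internally. As a rough illustration of the two ideas it relies on, the sketch below (plain Python, no process-mining library) treats rarely occurring trace variants as noise and returns the excluded traces separately so an expert could review them, and computes an F-score as the harmonic mean of fitness and precision, a common definition in process mining. The function names, the `min_share` threshold and the variant-frequency rule are hypothetical assumptions; this is not the filtering performed inside the Inductive Miner or the ILP Miner.

```python
from collections import Counter

# A trace is the ordered sequence of activity names recorded for one case.
Trace = tuple[str, ...]


def filter_infrequent_variants(
    log: list[Trace], min_share: float
) -> tuple[list[Trace], list[Trace]]:
    """Split a log into kept and excluded traces by variant frequency.

    A variant is a distinct activity sequence. Variants whose share of all
    traces is below min_share are treated as noise (hypothetical rule).
    Excluded traces are returned rather than discarded so they can be reviewed.
    """
    counts = Counter(log)
    total = len(log)
    kept: list[Trace] = []
    excluded: list[Trace] = []
    for trace in log:
        if counts[trace] / total >= min_share:
            kept.append(trace)
        else:
            excluded.append(trace)
    return kept, excluded


def f_score(fitness: float, precision: float) -> float:
    """Harmonic mean of fitness and precision (0.0 if both are zero)."""
    if fitness + precision == 0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)


# Usage on a toy log: one dominant variant and two rare ones.
log = [("a", "b", "c")] * 8 + [("a", "c", "b")] + [("a", "x", "c")]
kept, excluded = filter_infrequent_variants(log, min_share=0.15)
print(len(kept), len(excluded))  # 8 2
print(round(f_score(0.9, 0.6), 2))  # 0.72
```

Returning the excluded traces instead of silently dropping them mirrors the recommendation above: in high-stakes domains, the rare variants flagged as noise are exactly the ones a domain expert should inspect before the discovered model is trusted.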