Research Topic: Merging event logs: Combining granularity levels for process flow analysis.
Supervisor: Prof. Pnina Soffer
Process mining techniques enable the discovery and analysis of business processes and point to opportunities for improvement. However, processes often comprise separately managed procedures, each with its own log file, and such logs cannot be mined in an integrative manner as they stand. Integrative mining requires that the disparate log files first be merged into a unified, comprehensive log. Several obstacles impede such merging. First, the relationships between procedures that the logs reflect vary in complexity, from simple one-to-one inter-log relationships to highly complex many-to-one or many-to-many relationships. Second, complex relationships involve multiple granularity levels, which must be reflected in the merged log. Third, most real-life logs lack a common case ID, so they cannot be merged in a straightforward manner. Fourth, most real-life logs contain free text, which obscures the coherence of the merged data.
This work is neither qualitative nor quantitative but rather applied. It concerns the development of an algorithm designed to overcome the limitations of existing merging techniques and to produce a merged log that facilitates integrative process mining. The suggested approach matches log cases using temporal relations and text mining techniques, accounting for all possible inter-log relationships.
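As a rough illustration of matching cases across logs that share no case ID, the following Python sketch pairs cases from a high-level log with cases from a low-level log. It is not the algorithm developed in this work. It assumes one specific temporal relation (the low-level case falls inside the high-level case's time window) and uses Jaccard word overlap as a crude stand-in for text mining. The names Case, within, jaccard, match_cases and min_similarity, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Case:
    case_id: str
    start: datetime
    end: datetime
    text: str  # free-text description attached to the case


def tokens(text):
    """Lowercased word set; a crude stand-in for real text mining."""
    return {w.strip(".,;:!?()").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def within(child, parent):
    """Assumed temporal relation: child runs inside the parent's time window."""
    return parent.start <= child.start and child.end <= parent.end


def match_cases(high_level, low_level, min_similarity=0.2):
    """Return (high_id, low_id) pairs; a case may appear in several pairs,
    so one-to-many and many-to-many relationships are allowed."""
    pairs = []
    for low in low_level:
        low_tokens = tokens(low.text)
        for high in high_level:
            similar = jaccard(low_tokens, tokens(high.text)) >= min_similarity
            if within(low, high) and similar:
                pairs.append((high.case_id, low.case_id))
    return pairs


# Made-up sample data: two orders (high level), three shipments (low level).
orders = [
    Case("O1", datetime(2020, 1, 1, 9), datetime(2020, 1, 1, 17),
         "Order of laptop for finance team"),
    Case("O2", datetime(2020, 1, 1, 10), datetime(2020, 1, 1, 18),
         "Order of printer paper for office"),
]
shipments = [
    Case("S1", datetime(2020, 1, 1, 11), datetime(2020, 1, 1, 12),
         "ship laptop finance team"),
    Case("S2", datetime(2020, 1, 1, 13), datetime(2020, 1, 1, 14),
         "ship laptop charger finance team"),
    Case("S3", datetime(2020, 1, 1, 13), datetime(2020, 1, 1, 15),
         "ship printer paper"),
]

matches = match_cases(orders, shipments)
```

In this sample, order O1 is matched to two shipments (a one-to-many relationship) and O2 to one. The time windows of O1 and O2 overlap, so the temporal relation alone would be ambiguous; the text overlap is what separates them in this sketch.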
The algorithm was developed based on the hypotheses that (a) a full end-to-end process can be mapped by tracing all granularity levels; and (b) these granularity levels are essential for producing an accurate process map that reveals the performance and flow issues hidden in the data.
The algorithm was evaluated on both synthetic and real-life logs. The evaluation showed that mapping business processes by combining high-level (case) and low-level (instance) views is both possible and useful, particularly for identifying flow problems that occur at the point of integration between the levels under consideration. The algorithm thus enables the identification of process flow problems that could not be detected by previous techniques.
Email: