The CONFINE framework implements a decentralized paradigm for inter-organizational process mining, using trusted applications running within Trusted Execution Environments (TEEs) to preserve data privacy and confidentiality. This repository houses the prototype implementation of the principal components of the framework, namely:
- Provisioners: HTTP servers responsible for delivering event logs designated for mining.
- Secure Miners: EGo Intel SGX trusted applications retrieving and merging event logs to be fed into process mining algorithms.
For more information about the CONFINE approach and toolkit, consult the following publications:
- Valerio Goretti, Davide Basile, Luca Barbaro, Claudio Di Ciccio (2024) Trusted Execution Environment for Decentralized Process Mining. In: CAiSE 2024, 509-527, Springer.
- Paper (DOI): 10.1007/978-3-031-61057-8_30
- Paper (ePrint): arXiv.org/abs/2407.10684
- Presentation: slideshare.net/slideshow/trusted-execution-environment-for-decentralized-process-mining/269604280
- Valerio Goretti, Davide Basile, Luca Barbaro, Claudio Di Ciccio (2024) CONFINE: Preserving Data Secrecy in Decentralized Process Mining. In: ICPM Doctoral Consortium / Demo 2024, 63-68, CEUR-ws.org.
- Paper (URL): ceur-ws.org/Vol-3783/paper_324.pdf (open access)
- Poster: slideshare.net/slideshow/confine-preserving-data-secrecy-in-decentralized-process-mining-demo-poster/272579908
Our framework involves different information systems running on multiple machines. An organization can take at least one of the following roles: provisioning, if it delivers local event logs to be collaboratively mined; mining, if it applies process mining algorithms to event logs retrieved from provisioners. Depending on the role played, nodes are endowed with a Provisioner component, a Secure Miner component, or both. Provisioner nodes host the Provisioner's components, encompassing the Log Recorder and the Log Provider. The Miner node is characterized by two distinct execution environments: the Operating System (OS) and the Trusted Execution Environment (TEE). TEEs establish isolated contexts separate from the OS, safeguarding code and data through hardware-based encryption mechanisms. We leverage the security guarantees provided by TEEs to protect a Trusted App that fulfills the functions of the Secure Miner and its associated sub-components. The Secure Miner exchanges messages with Provisioners according to the CONFINE protocol. After a proper execution of the CONFINE protocol, the trusted app implementing the Secure Miner has retrieved the event logs, merged them, and processed the aggregate entirely within the Trusted Execution Environment.
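To make the interplay concrete, here is a minimal, hypothetical Go sketch of the retrieval-and-merge step performed by the Secure Miner. It is not the actual CONFINE implementation: the real Trusted App runs inside an EGo/Intel SGX enclave, performs remote attestation and TLS-protected transfers, and exchanges XES event logs rather than the simplified JSON events and /log endpoint assumed here.

// Hypothetical sketch: fetch event logs from provisioners and merge them by case ID.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Event is a simplified event record; CaseID plays the role of the merge key
// (e.g., the concept:name case identifier in XES).
type Event struct {
	CaseID   string `json:"case_id"`
	Activity string `json:"activity"`
}

// fetchLog retrieves the event log exposed by one provisioner over a
// hypothetical /log endpoint.
func fetchLog(httpReference string) ([]Event, error) {
	resp, err := http.Get(httpReference + "/log")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var events []Event
	if err := json.NewDecoder(resp.Body).Decode(&events); err != nil {
		return nil, err
	}
	return events, nil
}

// mergeByCase groups the events of all provisioners by their shared case
// identifier, reconstructing inter-organizational traces.
func mergeByCase(logs ...[]Event) map[string][]Event {
	merged := make(map[string][]Event)
	for _, log := range logs {
		for _, event := range log {
			merged[event.CaseID] = append(merged[event.CaseID], event)
		}
	}
	return merged
}

func main() {
	hospitalLog, err := fetchLog("http://localhost:8087")
	if err != nil {
		panic(err)
	}
	clinicLog, err := fetchLog("http://localhost:8088")
	if err != nil {
		panic(err)
	}
	traces := mergeByCase(hospitalLog, clinicLog)
	fmt.Printf("merged %d inter-organizational traces\n", len(traces))
	// In CONFINE, the merged log would now feed a discovery or conformance
	// checking algorithm without ever leaving the TEE.
}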
Below you can find a screencast showing how to set up and run the necessary components:
CONFINE.mp4
The main content of the repository is structured as follows:
- /src/: the root folder of the implementation source code
- /src/secure-miner/ contains the Secure Miner implementation as an EGo Intel SGX application
- /src/provisioner/ contains the Log Provider component implementation in GO
- /src/mining-data/ contains the metadata required for the execution of the CONFINE protocol
- /evaluation/: folder containing datasets and results of our tests
- /evaluation/convergence/ contains the convergence test data
- /evaluation/memoryusage/ includes the memory usage tests data
- /evaluation/scalability/ contains the data of the scalability tests
To run the code, we prepared a CONFINE Docker image containing all the requirements needed to run the framework. To use it, you need to have Docker installed; you can do that by following this page. After installation, you can pull the image by executing the following command.
docker pull valeriogoretti9/confine:latest
Once the image has been downloaded, you can execute the following command to create a container from the CONFINE image. Before execution, replace the <INSERT YOUR VOLUME PATH> tag with the path of the volume on your machine, and set a container name by replacing the <INSERT YOUR DOCKER CONTAINER NAME> tag.
docker run --volume /var/run/aesmd:/var/run/aesmd -v <INSERT YOUR VOLUME PATH>:/volume --name <INSERT YOUR DOCKER CONTAINER NAME> -ti valeriogoretti9/confine:latest
Once the Docker container is created, the following commands allow you to start it.
docker start <INSERT YOUR DOCKER CONTAINER NAME>
docker attach <INSERT YOUR DOCKER CONTAINER NAME>
Once the Docker container is running, you can proceed with the CONFINE setup. Execute the following commands to clone the CONFINE code and access it.
cd volume
apt-get update
apt-get install git
git clone https://github.com/Process-in-Chains/CONFINE.git
cd CONFINE/
The next sections show how to run the Provisioner and the Secure Miner components.
At this point, the container is running.
Enter the folder dedicated to the provisioner data by executing the following command:
cd mining-data/provision-data/process-01/
You have to put the log (in XES format) that you want to provide to the Secure Miners into this folder. We already provide several inter-organizational log samples in this folder.
After that, navigate to `/src/mining-data/provision-data` and modify the `minerList.json` file:
cd ..
nano minerList.json
Append to this file the TLS certificate string of each Secure Miner that you want the provisioner to accept. The Secure Miner section below explains how to obtain this information.
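For illustration, the file could then contain one certificate string per accepted Secure Miner. The exact layout below is an assumption: check the sample minerList.json shipped with the repository, and paste each certificate exactly as the Secure Miner prints it (command 6 of its terminal interface, described below).
[
  "-----BEGIN CERTIFICATE-----\n<PASTE THE SECURE MINER CERTIFICATE HERE>\n-----END CERTIFICATE-----"
]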
Now you are ready to run the provisioner. To facilitate its start-up, we prepared the `runLogServer.sh` shell script in the `/src` folder. Let's navigate there
cd ../..
and run the shell script:
./runLogServer.sh -port 8089 -log testing_logs/motivating/pharma.xes -mergekey concept:name -measurement `ego uniqueid app` -skipattestation true
with the following parameters:
- `port`: the port on which the log server listens for new requests from the Secure Miner. The default value is `8089`.
- `log`: the path of the XES event log inside the `/src/mining-data/provision-data` folder. The default value is `testing_logs/motivating/pharma.xes`.
- `mergekey`: the name of the case identifier attribute inside the provided event log. The default value is `concept:name`.
- `measurement`: the value that identifies the Secure Miner's source code for the remote attestation. The default value, `ego uniqueid app`, uses an EGo command to compute this information from the Secure Miner's source code.
- `skipattestation`: if set to `true`, the remote attestation phase of the CONFINE protocol is skipped. The default value is `true`. If the Secure Miner is running in simulation mode, this must be set to `true` (an example invocation with attestation enabled is shown below).
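For instance, assuming you placed a second XES log in the provision-data folder (the log path below is a placeholder), you could start another log server on port 8088 with remote attestation enabled:
./runLogServer.sh -port 8088 -log <INSERT YOUR XES LOG PATH> -mergekey concept:name -measurement `ego uniqueid app` -skipattestation false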
In order to enable communication with log servers, you need to specify their references in the `logserver-config.json` file. Let's navigate to the file and open it:
cd mining-data/collaborators/process-01/
nano logserver-config.json
Now specify, for each log server, its `http_reference` and its `merge_key` (i.e., the case attribute storing the process instance identifier). Let's assume you have three log servers whose `http_reference` values are "localhost:8087", "localhost:8088", and "localhost:8089", respectively, and whose `merge_key` is "concept:name". You should then have a setting like this:
[
{
"http_reference": "http://localhost:8087",
"merge_key": "concept:name"
},
{
"http_reference": "http://localhost:8088",
"merge_key": "concept:name"
},
{
"http_reference": "http://localhost:8089",
"merge_key": "concept:name"
}
]
Change this file according to your settings.
We are ready to run the Secure Miner. As with the log servers, we provide the `runMiner.sh` shell script to facilitate the deployment of the Secure Miner into the Intel SGX TEE. Navigate to the folder of the script
cd ../../..
and run the Secure Miner using the following command:
./runMiner.sh -port 8094 -segsize 2000 -test true -simulation true
with the following parameters:
- `port`: the port on which the Secure Miner will receive the event log data. The default value is `8084`.
- `segsize`: the segment size in KB employed by the log servers during the data transmission phase of the protocol. The default value is `2000`.
- `test`: if it is `true`, the test data is generated. The default value is `true`.
- `simulation`: if it is `true`, the Intel SGX trusted app runs in simulation mode (no TEE deployment). The default value is `true`. If you are not running on an SGX-enabled machine with a full Intel SGX installation, set this parameter to `true` (see the example below).
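For example, on an SGX-enabled machine with a full Intel SGX installation, you could run the Secure Miner outside simulation mode and without generating test data:
./runMiner.sh -port 8084 -segsize 2000 -test false -simulation false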
If the Secure Miner is running correctly, you should see its terminal interface.
You can interact with the Secure Miner application through six commands:
- `1`: Run the CONFINE protocol with the Incremental Heuristics Miner as the process discovery algorithm of the computation phase. If everything goes right, the Secure Miner outputs the process model as a PNML file in the `/src/mining-data/output` folder.
- `2`: Run the CONFINE protocol with the Heuristics Miner as the process discovery algorithm of the computation phase. If everything goes right, the Secure Miner outputs the process model as a PNML file in the `/src/mining-data/output` folder.
- `3`: Run the CONFINE protocol with Incremental Declare Conformance as the conformance checking algorithm of the computation phase. This command requires you to specify the path of a Declare process model inside the `src/mining-data/input` folder. If everything goes right, the Secure Miner outputs the conformance checking result as a JSON file in the `/src/mining-data/output` folder.
- `4`: Run the CONFINE protocol with Declare Conformance as the conformance checking algorithm of the computation phase. This command requires you to specify the path of a Declare process model inside the `src/mining-data/input` folder. If everything goes right, the Secure Miner outputs the conformance checking result as a JSON file in the `/src/mining-data/output` folder.
- `5`: Classic Heuristics Miner execution using an event log inside the `src/mining-data/input` folder. The CONFINE protocol is not executed.
- `6`: Show the self-signed TLS certificate randomly generated by the Secure Miner at start-up. Log server operators should copy and paste the result of this command into the `minerList.json` file, following the steps described in the log server paragraph above.
The following section describes the experimental test bench used to evaluate the effectiveness of CONFINE, as presented in Section 6 of the paper. The evaluation files can be found in /evaluation/. We conducted a convergence analysis to demonstrate the correctness of the collaborative data exchange process. Moreover, we gauged the memory usage with synthetic and real-life event logs, to observe the trend during the enactment of our protocol and to assess scalability.
To run our Python scripts, the following libraries are required: `os`, `pandas`, `numpy`, `matplotlib`, `scipy`, `sklearn`, and `datetime`.
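Assuming a standard Python setup, the third-party packages can be installed via pip (`os` and `datetime` ship with the Python standard library; `sklearn` is distributed as `scikit-learn`):
pip install pandas numpy matplotlib scipy scikit-learn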
For each test, we used the event log of the motivating scenario and, additionally, the public Sepsis and BPIC 2013 event logs. We further processed these logs to simulate an inter-organizational scenario. For the scalability tests, we made specific modifications to the event logs so as to observe different configurations of the number of events per case, the number of cases, and the number of provisioning organizations.
To experimentally validate the correctness of our approach in the transmission and computation phases, we ran a convergence test. To this end, we created a synthetic event log (available in /event_log/) consisting of 1000 cases with 14 events on average by simulating the inter-organizational process of our motivating scenario, and we partitioned it into three sub-logs (the Hospital, Specialized Clinic, and Pharmaceutical Company event logs, respectively). We ran the stand-alone HeuristicsMiner on the former and processed the latter through our CONFINE toolchain. The convergence results are available in /output/ in the form of a workflow net.
To evaluate the runtime memory utilization of our CONFINE implementation, we ran a memory usage test, split into four different configurations:
- In the first test, we excluded the computation phase by leaving the HeuristicsMiner inactive, so as to isolate the protocol execution from mining-specific operations. In this case, we set the segment size `${SEGSIZE_VALUE}` to 2000 kilobytes and used the same synthetic event log as in our motivating scenario.
- In the second test, on the other hand, we included the computation phase, using the same synthetic event log and the same segment size as in the first test. The results of the first two tests are available in /output_test_motivating_scenario/.
- In the third test, we gauged the runtime memory usage with two public real-world event logs. Since those are intra-organizational event logs, we split their contents to mimic an inter-organizational context. In particular, we separated the Sepsis event log based on the distinction between normal-care and intensive-care paths, as if they were conducted by two distinct organizations. Similarly, we processed the BPIC 2013 event log to sort it into the three departments of the Volvo IT incident management system.
- In the fourth test, we examined the trend of memory usage as the segment size varies, using all the aforementioned event logs. The results of the test are available in /output_test_segment_size/.
We examined the scalability of the Secure Miner, focusing on its capacity to efficiently manage an increasing workload in the presence of limited memory resources. We implemented three distinct test configurations, gauging runtime memory usage on variations of our motivating scenario log.
- To conduct the test on the maximum number of events, we modified the motivating scenario event log by adding a loop back from the final to the initial activity of the process model, progressively increasing the number of iterations. The results of the test are available in /output_test_max_events/.
- Concerning the test on the number of cases, we simulated additional process instances, building new event logs. The results of the test are available in /output_test_cases/.
- Finally, to assess the impact of the number of organizations, the test necessitated distributing the process model activities into a variable number of pools, each representing a different organization. The event logs are available in /log_test_organizations/; the results of the test are available in /output_test_organizations/.