Optimization of consensus-based machine learning processes using Galois closed patterns
This study concerns the extension, development, and implementation of techniques for optimizing machine learning
processes based on Galois closed patterns. It targets consensus-based (ensemble) unsupervised, semi-supervised,
and supervised learning approaches, using heuristic and statistical analysis of the closed concept lattice.
The application context of this study is the optimization of learning paths and the improvement of MOOC content.
The completion rate of a MOOC is around 10% and the certification rate is 5.5%. These very low rates could be
improved by analyzing learners' activity histories, identifying success and failure factors, and using the knowledge
obtained for dynamic improvement of the teaching content and personalized follow-up of learners (recommendations,
learning paths by profile, etc.).
This study focuses on the OULAD dataset [4], which contains anonymized data on courses, learners, and their
interactions with the Virtual Learning Environment (VLE) for seven selected courses (called modules). Courses
start in February or October and are marked with 'B' and 'J' respectively. The dataset consists of 7 tables, each
stored in a CSV file, linked together by unique identifiers according to the conceptual schema below.
Because the dataset combines demographic data with aggregated data on student interactions with the VLE, it
supports the analysis of student behavior as represented by their actions. It contains information on 22 course
presentations (sessions of the seven modules), 32,593 students, their assessment results, and logs of their
interactions with the VLE in the form of daily summaries of student clicks (10,655,280 entries).
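As an illustration of how the daily click summaries could be linked to the learner records, here is a minimal Python sketch using only the standard library. The column names (code_module, code_presentation, id_student, sum_click, final_result) are assumptions about the CSV layout, and the rows are invented toy values, not data from the dataset.

```python
from collections import defaultdict

# Toy rows mimicking two tables; column names are assumed, values are invented.
student_info = [
    {"code_module": "AAA", "code_presentation": "2013J", "id_student": 1, "final_result": "Pass"},
    {"code_module": "AAA", "code_presentation": "2013J", "id_student": 2, "final_result": "Withdrawn"},
]
student_vle = [
    {"code_module": "AAA", "code_presentation": "2013J", "id_student": 1, "date": 0, "sum_click": 4},
    {"code_module": "AAA", "code_presentation": "2013J", "id_student": 1, "date": 1, "sum_click": 6},
    {"code_module": "AAA", "code_presentation": "2013J", "id_student": 2, "date": 0, "sum_click": 3},
]

# Assumed composite identifier linking a learner to one course presentation.
KEY = ("code_module", "code_presentation", "id_student")


def key_of(row):
    """Build the composite key of a row."""
    return tuple(row[k] for k in KEY)


def total_clicks(vle_rows):
    """Sum the daily click counts per learner and course presentation."""
    totals = defaultdict(int)
    for row in vle_rows:
        totals[key_of(row)] += row["sum_click"]
    return dict(totals)


def join_clicks(info_rows, vle_rows):
    """Attach the total click count to each learner record (0 if no activity)."""
    totals = total_clicks(vle_rows)
    return [{**row, "total_clicks": totals.get(key_of(row), 0)} for row in info_rows]


enriched = join_clicks(student_info, student_vle)
```

On the real files, the same join would more likely be done with a dataframe library; the sketch only shows the linking logic.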
The analyses, to be carried out using unsupervised and supervised learning techniques, aim to understand
the criteria for identifying and predicting learner dropout, as well as the success or failure of learners who do not drop
out.
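The two prediction targets described above can be sketched as follows. This is a hedged illustration: it assumes that each learner record carries a final outcome taking the values "Pass", "Distinction", "Fail", or "Withdrawn", which is an assumption about the data encoding, and the function names are hypothetical.

```python
def dropout_label(final_result):
    """Target 1: 1 if the learner dropped out (assumed value 'Withdrawn'), else 0."""
    return 1 if final_result == "Withdrawn" else 0


def success_label(final_result):
    """Target 2: 1 for success, 0 for failure, None for dropouts (excluded)."""
    if final_result == "Withdrawn":
        return None
    return 1 if final_result in {"Pass", "Distinction"} else 0


# Invented example outcomes.
results = ["Pass", "Withdrawn", "Fail", "Distinction"]

dropout = [dropout_label(r) for r in results]
# The second target is only defined for learners who did not drop out.
success = [success_label(r) for r in results if success_label(r) is not None]
```

Keeping the two targets separate mirrors the two questions of the study: who drops out, and among those who stay, who succeeds.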
Descriptive knowledge patterns and predictive knowledge models will be generated from different
categories of attributes: demographic information, assessment results, and learner interaction data with the VLE.
This work aims to reproduce the results of initial studies carried out on these data and to extend them through the
application of closed pattern-based unsupervised, semi-supervised, and supervised learning methods,
providing relevant elements concerning the preparation of the data, the application of learning methods, and the
nature of the knowledge extracted.
• Extraction of knowledge patterns by similarity analysis methods between instances: clustering, outlier detection,
and semi-supervised classification.
• Extraction of knowledge patterns by predictive methods: supervised classification, regression, and deep learning.
• Implementation and extension of closed pattern-based analysis methods in R and/or Python.
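To make the notion of a Galois closed pattern concrete, the following Python sketch computes the closure of an attribute set in a small formal context. It is a minimal illustration under stated assumptions, not the implementation targeted by the project: the learner identifiers and attribute names are invented, and the helper names (extent, intent, closure, is_closed) are hypothetical.

```python
# Toy formal context: each learner (object) is described by a set of binary attributes.
# Identifiers and attribute names are invented for illustration.
context = {
    "s1": {"active_week1", "submitted_a1", "passed"},
    "s2": {"active_week1", "submitted_a1", "passed"},
    "s3": {"active_week1"},
    "s4": {"submitted_a1", "passed"},
}


def extent(pattern, ctx):
    """Objects whose description contains every attribute of the pattern."""
    return {obj for obj, attrs in ctx.items() if set(pattern) <= attrs}


def intent(objects, ctx):
    """Attributes shared by all given objects (all attributes if no object is given)."""
    if not objects:
        return set().union(*ctx.values())
    return set.intersection(*(ctx[o] for o in objects))


def closure(pattern, ctx):
    """Galois closure: the largest attribute set shared by all objects matching the pattern."""
    return intent(extent(pattern, ctx), ctx)


def is_closed(pattern, ctx):
    """A pattern is closed when it equals its own closure."""
    return closure(pattern, ctx) == set(pattern)


# Every learner who submitted the first assessment also passed,
# so {"submitted_a1"} is not closed: its closure adds "passed".
closed_form = closure({"submitted_a1"}, context)
```

Here closure({"submitted_a1"}, context) evaluates to {"submitted_a1", "passed"}, while {"active_week1"} is its own closure and is therefore a closed pattern. The closed patterns, ordered by inclusion, form the concept lattice that the study proposes to analyze.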
Supervisor: Nicolas PASQUIER, University of Côte d'Azur, I3S Laboratory, CNRS, UMR-7172.
Research Associate: Sartaj Hajam, University of Côte d'Azur, I3S Laboratory, CNRS, UMR-7172.
[1] Chen, H., Yin, C., Li, R., Rong, W., Xiong, Z., & David, B. Enhanced learning resource recommendation
based on online learning style model. Tsinghua Science & Technology, 25:348-356, 2020.
[2] Hlioui, F., Aloui, N., & Gargouri, F. A Withdrawal Prediction Model of At-Risk Learners Based on Behavioural
Indicators. IJWLTT, 16(2):32-53, 2021.
[3] Jha, N. I., Ghergulescu, I., & Moldovan, A. N. OULAD MOOC Dropout and Result Prediction using Ensemble,
Deep Learning and Regression Techniques. CSEDU, 2:154-164, 2019.