Predicting Performance in First-Year Required Courses Using Machine Learning
An Analysis of Students' Learning Outcomes Based on At-Enrollment Data
DOI:
https://doi.org/10.52731/lir.v005.473
Keywords:
Data Science Education, students’ performance, Machine learning, Academic IR, early intervention
Abstract
In response to the growing importance of data literacy across disciplines, this study explores the potential of machine learning to predict student performance in first-year information literacy courses using only at-enrollment data. Conducted at Hokuriku University in Japan, the study draws on a dataset covering students' academic background, standardized test scores, cognitive skills assessments, and self-reported academic habits, all collected at the time of admission. Random Forest, Support Vector Machine, and Logistic Regression models are used to identify at-risk students early in the academic year. Random Forest performed best in binary classification, with an AUC of 0.878, and the analysis highlights key predictors such as English proficiency, high school GPA, and conceptual skills. This predictive approach demonstrates the feasibility of early intervention for at-risk students and offers insight into student preparedness and how support can be improved. By identifying critical factors influencing success in mandatory data science education, this study contributes to the global dialogue on improving foundational data science courses and proposes scalable methods to foster equitable academic outcomes.
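To make the reported metric concrete, the following minimal sketch computes AUC for a binary at-risk classifier using its rank interpretation: the probability that a randomly chosen at-risk student receives a higher predicted score than a randomly chosen student who is not at risk. The labels and scores are made-up illustrative values, and the function name `roc_auc` is hypothetical; nothing here reproduces the study's data, models, or code.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = at-risk, 0 = not at-risk.
# The scores stand in for a model's predicted risk and are invented.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.5, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 12 of 15 pairs ranked correctly -> AUC = 0.800
```

Under this reading, an AUC of 0.878 would mean that roughly 88% of such at-risk versus not-at-risk pairs are ordered correctly by the model, while 0.5 corresponds to random ordering.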