Machine learning: An application for School Dropout prevention
DOI:
https://doi.org/10.33448/rsd-v14i6.49029
Keywords:
Retention, Dropout, Machine learning, Naive Bayes, Logistic regression.
Abstract
The retention of students in mathematics subjects remains a major concern for educational institutions. It is a multifaceted challenge, since it directly affects the effectiveness of the teaching process and, consequently, contributes to school dropout. This study presents an application of machine learning to school dropout prevention. Its aim is to estimate the probability of passing or failing for students enrolled in first-semester mathematics subjects, based on historical records of academic performance at the Federal Institute of Ceará (IFCE), Fortaleza campus. To this end, machine learning algorithms are applied to indicate the probability of a student passing or failing mathematics in the first semester, generating reports that support decision making by the coordinators of the courses in which this subject is taught. The Logistic Regression and Naive Bayes algorithms are used, implemented with Scikit-learn, a library available for the Python language. The expected result is that the machine learning algorithms will provide insights for preventive academic, pedagogical, and managerial actions aimed at reducing failure rates and, consequently, school dropout.
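As a rough illustration of the approach the abstract describes, the sketch below fits a Logistic Regression and a Naive Bayes classifier with Scikit-learn and reads off a per-student probability of failing. It is not the authors' pipeline: the data are synthetic, the two features (`entrance_score`, `attendance`) are hypothetical stand-ins for the historical records, and the Gaussian variant of Naive Bayes is an assumption, since the abstract does not say which variant was used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for historical academic records (NOT the IFCE data).
rng = np.random.default_rng(0)
n_students = 400

# Hypothetical features: an entrance score (0-10) and an attendance rate (0.4-1.0).
entrance_score = rng.uniform(0, 10, n_students)
attendance = rng.uniform(0.4, 1.0, n_students)
X = np.column_stack([entrance_score, attendance])

# Hypothetical label: 1 = passed, 0 = failed, drawn from an assumed logistic relationship.
logit = 0.8 * (entrance_score - 5) + 6 * (attendance - 0.7)
y = (rng.uniform(size=n_students) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),  # assumed variant; suits continuous features
}

fail_probability = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # predict_proba columns follow model.classes_; select the column for class 0 (fail).
    fail_col = list(model.classes_).index(0)
    fail_probability[name] = model.predict_proba(X_test)[:, fail_col]
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

Each array in `fail_probability` holds one estimated failure probability per held-out student. A report for course coordinators could, for example, list the students whose estimate exceeds a cutoff chosen by the institution.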
License
Copyright (c) 2025 Valberto Rômulo Feitosa Pereira; Mairton Cavalcante Romeu; Nairys Costa de Freitas; Vinicius Silva Pereira

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.