Machine learning Bureau score for Home Lending in an American finance company
DOI:
https://doi.org/10.33448/rsd-v13i10.47092
Keywords:
Machine learning, Predictive modeling, Home lending, Credit bureau, LMI consumers, Credit risk, Mortgage application.
Abstract
Our client is a leading provider of mortgage financing, originating loans and lines of credit to consumers in the US. Currently, applicants submit personal information with their application, and a soft pull of their FICO score is requested. That score is used to evaluate the applicant’s creditworthiness, to determine conditional approval, and to select the type of product available to the customer, including conventional, FHA, or other mortgage loans. After conditional approval, a formal application is initiated, and underwriters review the information to reach the final application decision. For applications that fall below regulatory and business thresholds, the company intends to approve more applications and increase loan volume, and it expects that an enhanced credit assessment will increase the share of the Low to Moderate Income (LMI) population able to obtain mortgage loans. Both aspects directly affect the firm’s reputation and profits, so they are of pressing importance to the company. This project aims to build an applicant-level, bureau-only score based on upgraded bureau internal attributes. The score will eventually serve as the basis for evaluating a customer’s credit risk before any loan structure or collateral information is considered. It is intended both as a standalone score for the initial customer evaluation, to identify better leads (mortgage inquiries for preapproval), and as an input to a future application-level model.
License
Copyright (c) 2024 Lucero Isabel Izquierdo Munoz; Jose Manuel San Martin Galindo

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
1) Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
2) Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
3) Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.