Crafting an Efficient Credit Risk Alert System: Assessment and Validation of the WOE-Enhanced Logistic Regression Model
DOI:
https://doi.org/10.70693/cjst.v1i1.793
Keywords:
Debt default; Risk management; LR-WOE; Feature engineering
Abstract
Risk management in the credit industry continues to face a series of challenges. Although technological advances have brought significant progress, problems such as high labor costs and insufficient customer authentication persist, underscoring the need for efficient risk prediction models. Using the GiveMeSomeCredit dataset from the Kaggle platform as an example, this study applies feature-engineering techniques to develop a debt default early warning model that identifies potential credit risks in advance. Building on an in-depth optimization of information value (IV) and weight of evidence (WOE) values, logistic regression models and an LR-WOE model were constructed. The models were evaluated with the population stability index (PSI), the Kolmogorov-Smirnov (KS) statistic, and the AUC to confirm the robustness of their risk predictions. The findings are: (1) Family structure plays a crucial role in credit risk assessment. Applicants with 0 or 7 dependents have a higher probability of default than the overall sample, while those with 6 or 8 dependents show relatively lower default risk. (2) The LR-WOE model performed best, indicating that it effectively distinguishes borrowers with different credit profiles and maintains stable predictive performance across thresholds. Integrating WOE transformation with logistic regression can help financial institutions assess credit risk more accurately and optimize risk management strategies.
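For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of how WOE, IV, KS, AUC, and PSI are commonly computed. It is not the authors' implementation. The function names, the smoothing constant `eps`, the sign convention WOE = ln(share of goods / share of bads), and the quantile binning used for PSI are all assumptions made for illustration, and the demo data are synthetic rather than drawn from GiveMeSomeCredit.

```python
import numpy as np


def woe_iv(bins, y, eps=0.5):
    """WOE per bin and total IV for a binned feature; y is 1 for default, else 0.

    Assumed convention: WOE = ln(share of goods / share of bads), so a
    negative WOE marks a bin that is riskier than the overall sample.
    eps is an assumed smoothing count that avoids log(0) in sparse bins.
    """
    bins, y = np.asarray(bins), np.asarray(y)
    total_bad = y.sum()
    total_good = len(y) - total_bad
    woe, iv = {}, 0.0
    for b in np.unique(bins):
        in_bin = bins == b
        bad = y[in_bin].sum()
        good = in_bin.sum() - bad
        p_bad = (bad + eps) / total_bad
        p_good = (good + eps) / total_good
        w = float(np.log(p_good / p_bad))
        woe[b.item()] = w
        iv += float(p_good - p_bad) * w
    return woe, iv


def ks_statistic(y, score):
    """Largest gap between the cumulative bad and good shares, ranked by score.

    Tied scores are not grouped here, which is a simplification.
    """
    y, score = np.asarray(y), np.asarray(score)
    y_sorted = y[np.argsort(-score)]
    cum_bad = np.cumsum(y_sorted) / y_sorted.sum()
    cum_good = np.cumsum(1 - y_sorted) / (1 - y_sorted).sum()
    return float(np.max(np.abs(cum_bad - cum_good)))


def auc(y, score):
    """Probability that a random bad outscores a random good (ties count 0.5).

    Pairwise version, quadratic in memory, so only suited to small samples.
    """
    y, score = np.asarray(y), np.asarray(score)
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index, binned on quantiles of the expected sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    inner = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    e = np.bincount(np.searchsorted(inner, expected, side="right"), minlength=n_bins)
    a = np.bincount(np.searchsorted(inner, actual, side="right"), minlength=n_bins)
    e = np.clip(e / len(expected), eps, None)
    a = np.clip(a / len(actual), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data for illustration only.
    dependents = rng.integers(0, 4, size=1000)
    y = rng.binomial(1, 0.05 + 0.03 * dependents)
    score = 0.05 + 0.03 * dependents + rng.normal(0, 0.02, size=1000)
    woe, iv = woe_iv(dependents, y)
    print("WOE per bin:", woe, "IV:", iv)
    print("KS:", ks_statistic(y, score), "AUC:", auc(y, score))
    print("PSI (first vs second half):", psi(score[:500], score[500:]))
```

In an LR-WOE setup of the kind the abstract describes, each binned feature value would typically be replaced by its WOE before fitting the logistic regression. KS and AUC then measure how well the fitted scores separate defaulters from non-defaulters, and PSI compares the score distribution between two samples.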
License
Copyright (c) 2025 Ran Meng-huan, Zhu Xiao-xin

This work is licensed under a Creative Commons Attribution 4.0 International License.