Research

This page summarizes my research activities, including publications in academic journals and conference papers. Check out my public talks or follow me on Google Scholar to see all of my work.

This page overviews my research activities:

📚 Selected academic publications
📈 My citation statistics
📝 Outlets where I was acting as a reviewer

Publications

2024

Kozodoi, N., Lessmann, S., Alamgir, M., Moreira-Matias, L. Papakonstantinou, K.
Fighting Sampling Bias: A Framework for Training and Evaluating Credit Scoring Models

ArXiv preprint.

Abstract: Scoring models support decision-making in financial institutions. Their estimation and evaluation are based on the data of previously accepted applicants with known repayment behavior. This creates sampling bias: the available labeled data offers a partial picture of the distribution of candidate borrowers, which the model is supposed to score. The paper addresses the adverse effect of sampling bias on model training and evaluation. To improve scorecard training, we propose bias-aware self-learning - a reject inference framework that augments the biased training data by inferring labels for selected rejected applications. For scorecard evaluation, we propose a Bayesian framework that extends standard accuracy measures to the biased setting and provides a reliable estimate of future scorecard performance. Extensive experiments on synthetic and real-world data confirm the superiority of our propositions over various benchmarks in predictive performance and profitability. By sensitivity analysis, we also identify boundary conditions affecting their performance. Notably, we leverage real-world data from a randomized controlled trial to assess the novel methodologies on holdout data that represent the true borrower population. Our findings confirm that reject inference is a difficult problem with modest potential to improve scorecard performance. Addressing sampling bias during scorecard evaluation is a much more promising route to improve scoring practices. For example, our results suggest a profit improvement of about eight percent, when using Bayesian evaluation to decide on acceptance rates.

2023

Kozodoi, N., Zinovyeva, L., Valentin, S., Pereira, J., Agundez, R. (2023).
Probabilistic Demand Forecasting with Graph Neural Networks

In Workshop on ML for Irregular Time Series at ECML PKDD 2023.

Abstract: Demand forecasting is a prominent business use case that allows retailers to optimize inventory planning, logistics, and core business decisions. One of the key challenges in demand forecasting is accounting for relationships and interactions between articles. Most modern forecasting approaches provide independent article-level predictions that do not consider the impact of related articles. Recent research has attempted addressing this challenge using Graph Neural Networks (GNNs) and showed promising results. This paper builds on previous research on GNNs and makes two contributions. First, we integrate a GNN encoder into a state-of-the-art DeepAR model. The combined model produces probabilistic forecasts, which are crucial for decision-making under uncertainty. Second, we propose to build graphs using article attribute similarity, which avoids reliance on a pre-defined graph structure. Experiments on three real-world datasets show that the proposed approach consistently outperforms non-graph benchmarks. We also show that our approach produces article embeddings that encode article similarity and demand dynamics and are useful for other downstream business tasks beyond forecasting.

2022

Kozodoi, N. (2022).
Machine Learning for Credit Risk Analytics

PhD Thesis. Humboldt University of Berlin.

Abstract: The rise of machine learning (ML) and the rapid digitization of the economy has substantially changed decision processes in the financial industry. Financial institutions increasingly rely on ML to support decision-making. Credit scoring is one of the prominent ML applications in finance. The task of credit scoring is to distinguish between applicants who will pay back the loan or default. Financial institutions use ML to develop scoring models to estimate a borrower's probability of default and automate approval decisions. This dissertation focuses on three major challenges associated with building ML-based scorecards in consumer credit scoring: (i) optimizing data acquisition and storage costs when dealing with high-dimensional data of loan applicants; (ii) addressing the adverse effects of sampling bias on training and evaluation of scoring models; (iii) measuring and ensuring the scorecard fairness while maintaining high profitability. The thesis offers a set of tools to remedy each of these challenges and improve decision-making practices in financial institutions.

2021

Kozodoi, N., Jacob, J., & Lessmann, S. (2021).
Fairness in Credit Scoring: Assessment, Implementation and Profit Implications.

European Journal of Operational Research, 297, 1083-1094.

Abstract: The rise of algorithmic decision-making has spawned much research on fair machine learning (ML). Financial institutions use ML for building risk scorecards that support a range of credit-related decisions. Yet, the literature on fair ML in credit scoring is scarce. The paper makes three contributions. First, we revisit statistical fairness criteria and examine their adequacy for credit scoring. Second, we catalog algorithmic options for incorporating fairness goals in the ML model development pipeline. Last, we empirically compare different fairness processors in a profit-oriented credit scoring context using real-world data. The empirical results substantiate the evaluation of fairness measures, identify suitable options to implement fair credit scoring, and clarify the profit-fairness trade-off in lending decisions. We find that multiple fairness criteria can be approximately satisfied at once and recommend separation as a proper criterion for measuring the fairness of a scorecard. We also find fair in-processors to deliver a good balance between profit and fairness and show that algorithmic discrimination can be reduced to a reasonable level at a relatively low cost. The codes corresponding to the paper are available on GitHub.

2020

Kozodoi, N., Lessmann, S. (2020).
Multi-Objective Particle Swarm Optimization for Feature Selection in Credit Scoring.

In Workshop on Mining Data for Financial Applications at ECML PKDD 2020 (pp. 68-76). Springer, Cham.

Abstract: Credit scoring refers to the use of statistical models to support loan approval decisions. An ever-increasing availability of data on potential borrowers emphasizes the importance of feature selection for scoring models. Traditionally, feature selection has been viewed as a single-objective task. Recent research demonstrates the effectiveness of multi-objective approaches. We propose a novel multi-objective feature selection framework for credit scoring that extends previous work by taking into account data acquisition costs and employing a state-of-the-art particle swarm optimization algorithm. Our framework optimizes three fitness functions: the number of features, data acquisition costs and the AUC. Experiments on nine credit scoring data sets demonstrate a highly competitive performance of the proposed framework.

Kozodoi, N., Katsas, P., Lessmann, S., Moreira-Matias, L., & Papakonstantinou, K. (2020).
Shallow Self-Learning for Reject Inference in Credit Scoring.

In ECML PKDD 2019 Proceedings (pp. 516-532). Springer, Cham.

Abstract: Credit scoring models support loan approval decisions in the financial services industry. Lenders train these models on data from previously granted credit applications, where the borrowers’ repayment behavior has been observed. This approach creates sample bias. The scoring model is trained on accepted cases only. Applying the model to screen applications from the population of all borrowers degrades its performance. Reject inference comprises techniques to overcome sampling bias through assigning labels to rejected cases. This paper makes two contributions. First, we propose a self-learning framework for reject inference. The framework is geared toward real-world credit scoring requirements through considering distinct training regimes for labeling and model training. Second, we introduce a new measure to assess the effectiveness of reject inference strategies. Our measure leverages domain knowledge to avoid artificial labeling of rejected cases during evaluation. We demonstrate this approach to offer a robust and operational assessment of reject inference. Experiments on a real-world credit scoring data set confirm the superiority of the suggested self-learning framework over previous reject inference strategies. We also find strong evidence in favor of the proposed evaluation measure assessing reject inference strategies more reliably, raising the performance of the eventual scoring model.

2019

Kozodoi, N., Lessmann, S., Papakonstantinou, K., Gatsoulis, Y., & Baesens, B. (2019).
A Multi-Objective Approach for Profit-Driven Feature Selection in Credit Scoring.

Decision Support Systems, 120, 106-117.

Abstract: In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard techniques treat feature selection as a single-objective task and rely on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators may improve the quality of scoring models for businesses. We extend the use of profit measures to feature selection and develop a multi-objective wrapper framework based on the NSGA-II genetic algorithm with two fitness functions: the Expected Maximum Profit (EMP) and the number of features. Experiments on multiple credit scoring data sets demonstrate that the proposed approach develops scorecards that can yield a higher expected profit using fewer features than conventional feature selection strategies.

Kozodoi, N., Lessmann, S., Baesens, B., & Papakonstantinou, K. (2019).
Profit-Oriented Feature Selection in Credit Scoring Applications.

In Operations Research 2018 Proceedings (pp. 59-65). Springer, Cham.

Abstract: In credit scoring, feature selection aims at removing irrelevant data to improve the performance of the scorecard and its interpretability. Standard feature selection techniques are based on statistical criteria such as correlation. Recent studies suggest that using profit-based indicators for model evaluation may improve the quality of scoring models for businesses. We extend the use of profit measures to feature selection and develop a wrapper-based framework that uses the Expected Maximum Profit measure (EMP) as a fitness function. Experiments on multiple credit scoring data sets provide evidence that EMP-maximizing feature selection helps to develop scorecards that yield a higher expected profit compared to conventional feature selection strategies.

2018

Artinger, F., Kozodoi, N., Wangenheim, F., Gigerenzer, G. (2018).
Recency: Prediction with Smart Data.

In 2018 AMA Winter Academic Conference Proceedings (pp. L2-L6).

Abstract: Since the early 1910s, managers have been using a simple recency-based decision strategy, the hiatus heuristic, to identify valuable customers. This study analyses the role of recency using a library of 60 data sets from business and other areas including weather, sports, and medicine. We find that the hiatus heuristic outperforms complex algorithms from machine learning, stochastic and econometric models in many of these environments. Moreover, if one includes further variables apart from recency in the complex algorithms, their performance does not improve. We show that the results are not so much driven by limited sample size than by the dominant role that recency plays in most of these environments. We conclude that less can be more, that is, relying on smart data such as recency can yield powerful predictions.

Citations

Reviews

I have been acting as an expert reviewer at the following academic outlets: