My portfolio includes three data science projects on different topics, covering tabular data, computer vision and package development. To see more of my work, visit my GitHub page or download my CV.
Profit-driven demand forecasting with gradient boosting
- developed a two-stage demand forecasting pipeline with LightGBM models
- performed thorough cleaning, aggregation and feature engineering on transactional data
- implemented custom loss functions aimed at maximizing the retailer's profit
Forecasting demand is an important managerial task that supports inventory planning. Optimized stock levels can reduce the retailer's costs and increase customer satisfaction through faster delivery times. This project uses historical purchase data to predict future demand for different products.
The project pipeline includes several crucial steps:
- thorough data preparation, cleaning and feature engineering
- aggregation of transactional data into the daily format
- implementation of custom profit-driven loss functions
- two-stage demand forecasting with LightGBM models
- hyperparameter tuning with Bayesian algorithms
- stacking ensemble to further improve performance
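To illustrate the profit-driven loss step, here is a minimal sketch of a custom LightGBM objective. The cost weights and the asymmetric squared-error form are illustrative assumptions, not the project's actual loss: under-forecasting is penalized by the profit margin lost on unmet demand, over-forecasting by the holding cost of excess stock.

```python
import numpy as np

# Illustrative cost weights (assumptions, not the project's actual figures)
PROFIT_MARGIN = 5.0   # penalty weight for under-forecasting (lost sales)
HOLDING_COST = 1.0    # penalty weight for over-forecasting (excess stock)

def profit_driven_objective(y_true, y_pred):
    """Asymmetric squared-error objective returning (gradient, hessian)
    in the form LightGBM expects from a custom objective callable."""
    residual = y_pred - y_true
    # Under-forecast (residual < 0) costs the margin; over-forecast costs holding
    weight = np.where(residual < 0, PROFIT_MARGIN, HOLDING_COST)
    grad = 2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess

# The callable would then be passed to LightGBM, e.g.:
# model = lightgbm.LGBMRegressor(objective=profit_driven_objective)
```

Because the penalty is heavier on the under-forecasting side, the booster is nudged toward slightly higher forecasts, trading a little holding cost for fewer lost sales.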
A detailed walkthrough is provided in this blog post.
Catheter and line position detection with deep learning
- built deep learning models to detect catheter and tube position on X-ray images
- developed a comprehensive PyTorch GPU/TPU computer vision pipeline
- finished in the top 5% of the Kaggle competition leaderboard with a silver medal
Hospital patients can have catheters and tubes inserted during their admission. If tubes are placed incorrectly, serious health complications can occur later. Deep learning helps to automate the detection of malpositioned tubes, which reduces the workload of clinicians and prevents treatment delays.
This project works with a dataset of 30,083 high-resolution chest X-ray images. The images have 11 binary labels indicating normal, borderline or abnormal placement of endotracheal tubes, nasogastric tubes, central venous catheters and Swan-Ganz catheters.
Within the project, I develop a comprehensive GPU/TPU image processing and modeling pipeline written in PyTorch. The solution builds an ensemble of seven CNN models that reaches a mean test AUC of 0.971 and places in the top 5% among the 1,549 competing teams. The code is documented and published on GitHub.
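The final ensembling step can be sketched as follows. This is a simplified illustration assuming each model outputs per-image probabilities for the 11 binary labels and that predictions are combined by a plain mean; the actual solution may weight or rank-average the seven models differently.

```python
import numpy as np

def ensemble_predictions(model_probs):
    """Average per-model probability matrices into one prediction.

    model_probs: list of arrays, each of shape (n_images, 11), holding
    one model's predicted probabilities for the 11 binary labels.
    Returns the element-wise mean across models, shape (n_images, 11)."""
    stacked = np.stack(model_probs, axis=0)   # (n_models, n_images, 11)
    return stacked.mean(axis=0)

# Example: three hypothetical models scoring two images
probs = [np.full((2, 11), p) for p in (0.2, 0.4, 0.9)]
avg = ensemble_predictions(probs)
```

Averaging probabilities across independently trained CNNs typically reduces variance, which is what lifts the ensemble's AUC above that of any single model.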
fairness: Package for computing algorithmic fairness metrics
- developing and actively maintaining an R package for fair machine learning
- the package offers calculation, visualization and comparison of algorithmic fairness metrics
- the package is published on CRAN and has more than 11k total downloads
How does one measure the fairness of a machine learning model? To date, a number of algorithmic fairness metrics have been proposed. Demographic parity, proportional parity and equalized odds are among the most commonly used metrics for evaluating group fairness in binary classification problems.
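As a minimal sketch of two of these group-fairness metrics (written in Python for illustration; the fairness package itself is in R, and its exact definitions and outputs may differ), the quantities compared across sensitive groups can be computed like this:

```python
import numpy as np

def positive_rate(y_pred, group, g):
    """Share of positive predictions within sensitive group g
    (proportional parity compares this rate across groups)."""
    mask = group == g
    return y_pred[mask].mean()

def true_positive_rate(y_true, y_pred, group, g):
    """Sensitivity within group g (equalized odds compares TPRs)."""
    mask = (group == g) & (y_true == 1)
    return y_pred[mask].mean()

# Toy data: binary predictions for two sensitive groups "a" and "b"
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

rates = {g: positive_rate(y_pred, group, g) for g in ("a", "b")}
tprs = {g: true_positive_rate(y_true, y_pred, group, g) for g in ("a", "b")}
```

A model satisfies the corresponding parity notion when these per-group quantities are (approximately) equal; in the toy data above the positive rates match across groups while the true positive rates do not.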
Together with Tibor V. Varga, I developed the
fairness R package for fair machine learning. The package offers tools to calculate, visualize and compare commonly used metrics of algorithmic fairness across sensitive groups. Since publishing the package on CRAN in 2019, I have been actively maintaining it and extending its functionality. A comprehensive overview of
fairness is provided in this blog post.