Original Research

Using machine learning to identify factors associated with practice location of the healthcare workforce

AUTHORS

Jerry Bounsanga
1 MStat, Statistician

Martin S Lipsky
2 MD, Faculty

Eric S. Hon
3 AB, Manager

Frank W Licari
4 DDS, Dean

Clark Ruttinger
5 MBA, Research Director

Andrew Salt
6 BS, Research Specialist

Man Hung
7 PhD, Research Dean *

AFFILIATIONS

1 Quality Outcomes Research and Assessment, School of Medicine, University of Utah Health, Salt Lake City, UT 84108, USA

2 Roseman University of Health Sciences, South Jordan, UT 84095, USA

3 Department of Economics, University of Chicago, Chicago, IL 60637, USA

4, 7 College of Dental Medicine, Roseman University of Health Sciences, South Jordan, UT 84095, USA

5, 6 Utah Medical Education Council, Salt Lake City, UT 84102, USA

ACCEPTED: 12 October 2021


early abstract:

Purpose: Past studies have examined factors associated with rural practice, but none has employed newer machine learning (ML) methods to explore potential predictors. The primary aim of this study was to identify factors related to practice in a rural area. Secondary aims were to gain a more precise understanding of the demographic characteristics of the healthcare professions workforce in Utah, United States of America (USA), and to assess the viability of ML as a predictive tool.
Methods: This study incorporated four datasets: the 2017 dental workforce, the 2016 physician workforce, the 2014 nursing workforce, and the 2017 pharmacy workforce, collected by the Utah Medical Education Council. Supervised ML techniques were used to identify factors associated with practice location, the outcome variable of interest.
Findings: The study sample consisted of 11,259 healthcare professionals with an average age of 46.6 years, of whom 36.6% were male and 94.5% were Caucasian. Four ML methods were applied, and their performance was compared on accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. Of the methods used, the support vector machine performed best (accuracy = 99.7%, precision = 100%, sensitivity = 100%, specificity = 99.4%, and area under the ROC curve = 0.997). The models identified income and rural upbringing as the top factors associated with rural practice.
Conclusions: Income emerged as by far the most important factor associated with rural practice, suggesting that attractive income offers might help rural communities address health professional shortages. Rural upbringing was the next most important predictive factor, validating and updating earlier research. The performance of the ML algorithms suggests that they could be useful tools for modeling other databases for individualized prediction.
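
For readers less familiar with the performance measures named in the Findings, the following is a minimal Python sketch of how accuracy, precision, sensitivity, specificity, and area under the ROC curve are conventionally defined for a binary outcome such as rural versus non-rural practice. It is an illustration only and not the authors' analysis code. The function names `classification_metrics` and `roc_auc` and all counts, labels, and scores are hypothetical and are not drawn from the study data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive cases (e.g., rural practice) predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # also called recall
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def roc_auc(labels, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen positive case scores higher than a randomly chosen
    negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, chosen only to illustrate the definitions.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
auc = roc_auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```

With these made-up inputs, accuracy is 0.85, precision is 0.80, and the area under the ROC curve is 0.75. The pairwise AUC calculation is quadratic in the sample size, so a library routine would be preferable for a dataset as large as the one described here.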