Original Research

Using machine learning to identify factors associated with practice location of the healthcare workforce


Jerry Bounsanga1
MStat, Statistician

Martin S Lipsky2
MD, Faculty

Eric S. Hon3
AB, Manager

Frank W Licari4
DDS, Dean

Clark Ruttinger5
MBA, Research Director

Andrew Salt6
BS, Research Specialist

Man Hung7
PhD, Research Dean *


1 Quality Outcomes Research and Assessment, School of Medicine, University of Utah Health, Salt Lake City, UT 84108, USA

2 Roseman University of Health Sciences, South Jordan, UT 84095, USA

3 Department of Economics, University of Chicago, Chicago, IL 60637, USA

4, 7 College of Dental Medicine, Roseman University of Health Sciences, South Jordan, UT 84095, USA

5, 6 Utah Medical Education Council, Salt Lake City, UT 84102, USA

ACCEPTED: 12 October 2021

early abstract:

Purpose: Past studies have examined factors associated with rural practice, but none has employed newer machine learning (ML) methods to explore potential predictors. The primary aim of this study was to identify factors related to practice in a rural area. Secondary aims were to gain a more precise understanding of the demographic characteristics of the healthcare workforce in Utah in the United States of America (USA) and to assess the viability of ML as a predictive tool.
Methods: This study incorporated four datasets collected by the Utah Medical Education Council: the 2017 dental workforce, the 2016 physician workforce, the 2014 nursing workforce, and the 2017 pharmacy workforce. Supervised ML techniques were used to identify factors associated with practice location, the outcome variable of interest.
Findings: The study sample consisted of 11,259 healthcare professionals with an average age of 46.6 years, of whom 36.6% were male and 94.5% were Caucasian. Four ML methods were applied, and model performance was assessed by comparing accuracy, precision, sensitivity, specificity and area under the receiver operating characteristic (ROC) curve. Of the methods used, the support vector machine performed best (accuracy = 99.7%, precision = 100%, sensitivity = 100%, specificity = 99.4%, and area under the ROC curve = 0.997). The models identified income and rural upbringing as the top factors associated with rural practice.
Conclusions: Income emerged as by far the most important factor associated with rural practice, suggesting that attractive income offers might help rural communities address health professional shortages. Rural upbringing was the next most important predictive factor, validating and updating earlier research. The performance of the ML algorithms suggests that they may be useful as a tool to model other databases for individualized prediction.
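
Illustrative note: the performance measures named in the Findings (accuracy, precision, sensitivity and specificity) are all derived from the four cells of a binary confusion matrix. The minimal Python sketch below shows those standard definitions. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data, and this is not the authors' code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives (e.g. positive = rural practice).
    """
    def ratio(numerator, denominator):
        # Return None instead of raising when a measure is undefined.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),      # positive predictive value
        "sensitivity": ratio(tp, tp + fn),    # recall, true positive rate
        "specificity": ratio(tn, tn + fp),    # true negative rate
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting these measures together, rather than accuracy alone, matters when one class (such as rural practice) is much less common than the other, because a model can then reach high accuracy while missing most cases in the smaller class.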