Purpose: This study integrated several hospital data systems into a case-control database in order to apply big data analytics to identify significant attributes of central line-associated bloodstream infection (CLABSI) and to develop time-varying patient risk scores for CLABSI.
Methods: The study used a case-control design with heterogeneous data collected from medical records for patients with CLABSI among all patients with central lines. A database was built from variables of interest identified through a literature review and consultation with content experts. Training and test sets were created, and multivariate logistic regression was used to fit the binary response (CLABSI positive vs CLABSI negative) to the attributes of the training set. The trained model was then used to classify the cases in the test set. In addition, the Cox proportional hazards model (PHM) was used to infer the hazard rate and risk score for each patient during hospitalization. Because CLABSI and no-CLABSI incidents were highly imbalanced, an oversampling method was applied to generate a balanced dataset.
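The oversampling and classification steps described in the Methods can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the oversampling method, so simple random duplication of minority rows is assumed here, and it is assumed that balancing is applied to the training set only. The two attributes, the synthetic data generator, and all function names are hypothetical, and the Cox PHM step is not covered.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic(n, rng):
    """Hypothetical rows: [in_icu (0 or 1), line manipulations scaled to 0-1]."""
    X, y = [], []
    for _ in range(n):
        in_icu = float(rng.random() < 0.3)
        manipulations = rng.random()
        # Assumed toy relationship: both attributes raise the infection odds.
        p = sigmoid(-4.0 + 1.0 * in_icu + 2.0 * manipulations)
        X.append([in_icu, manipulations])
        y.append(1 if rng.random() < p else 0)
    return X, y


def oversample_minority(X, y, rng):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns [intercept, coefs...]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Predicted probability of the positive class for one row."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def demo(seed=0):
    rng = random.Random(seed)
    X, y = make_synthetic(1000, rng)
    split = int(0.7 * len(X))
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    # Balance the training set only, so the test set keeps its original class ratio.
    X_bal, y_bal = oversample_minority(X_train, y_train, rng)
    w = fit_logistic(X_bal, y_bal)
    flagged = sum(1 for x in X_test if predict_proba(w, x) >= 0.5)
    print(f"training rows after balancing: {len(y_bal)}")
    print(f"test rows flagged as positive: {flagged} of {len(y_test)}")


if __name__ == "__main__":
    demo()
```

Balancing only the training rows is a design choice made for this sketch: duplicating rows before the split would let copies of the same case appear in both sets and inflate test performance.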
Results: Between January 2015 and August 2016, there were 5,779 central line cases associated with 3,947 patients, of which 96 were positive for CLABSI. Significant attributes for CLABSI were ICU location (P = 0.008), time from insertion to CLABSI occurrence (P < 0.001), number of surgeries (P = 0.003), and number of central line manipulations (P = 0.003). Multivariate logistic regression and the Cox PHM provided useful information on patient hazard rates and risk scores.
Conclusions: Data analytic techniques can be used to identify significant risk factors for CLABSI, such as ICU location, time from insertion to CLABSI occurrence, number of surgeries, and number of central line manipulations. Quantitative techniques such as the Cox PHM can be used to develop time-varying patient risk scores for CLABSI. Patient risk scores can help healthcare professionals evaluate a patient's risk for CLABSI and determine the need for preventative care. Further research using big data analytics has the potential to extend these pilot results.
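For reference, the general form of the Cox PHM behind such risk scores can be written as follows. This is the standard textbook formulation, not a formula taken from the study; the symbols, and the assumption that the covariates themselves vary over time, are introduced here for illustration only.

```latex
% Standard Cox proportional hazards model with (assumed) time-varying covariates.
% h_0(t): unspecified baseline hazard; x_i(t): covariate vector of patient i at time t;
% \beta: coefficient vector estimated by partial likelihood.
h_i(t) = h_0(t)\,\exp\!\bigl(\beta^{\top} x_i(t)\bigr)

% A relative risk score for patient i at time t (hypothetical notation r_i):
r_i(t) = \exp\!\bigl(\beta^{\top} x_i(t)\bigr)
```

Under this form, a patient's score rises or falls during hospitalization as covariates such as the number of central line manipulations change, which is one way a time-varying risk score could arise.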