Methods: The following four topics will be addressed:
Data Acquisition and Management: From ethics approval to ensuring individual patient privacy to preventing undesired user access, collecting and storing “big data” is no simple task. The presenter will provide: (a) an overview of key concepts, (b) an exemplar for constructing a data acquisition and management team, and (c) several resources for learning more independently.
Missing Data: Almost all large datasets contain some amount of missing data. Regardless of the amount, finding the cause of missingness is of paramount importance. Approaches to determining a cause will be introduced, and disadvantages of complete case analysis will be described. Advantages and disadvantages of median imputation, multiple imputation, and machine learning imputation will be compared.
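The contrast between complete case analysis and median imputation can be illustrated with a minimal sketch. The heart-rate values and the function names below are hypothetical and chosen only for illustration; they are not taken from the presenters' data or code.

```python
import statistics

# Hypothetical heart-rate readings; None marks a missing value.
heart_rate = [72, 88, None, 95, None, 64, 110, 79]

def complete_cases(values):
    """Complete case analysis: drop every missing observation."""
    return [v for v in values if v is not None]

def median_impute(values):
    """Median imputation: replace each missing value with the observed median."""
    fill = statistics.median(complete_cases(values))
    return [fill if v is None else v for v in values]

observed = complete_cases(heart_rate)   # fewer rows, so information is lost
imputed = median_impute(heart_rate)     # all rows kept, but spread is understated

print(len(observed), len(imputed))
print(statistics.stdev(observed), statistics.stdev(imputed))
```

In this toy example, complete case analysis discards a quarter of the rows, while median imputation keeps every row but shrinks the standard deviation because the filled-in values all sit at the center of the distribution. Multiple imputation is meant to address that second problem by drawing several plausible values for each missing entry instead of a single fixed one.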
Statistical Model Assumptions: Many statistical models are available, and recent advances in machine learning have put more ways of extracting information from data within reach of a wide array of users. An overview of the purpose and requirements of traditional modeling (e.g., logistic and linear regression) and machine learning approaches (e.g., random forests and cluster analyses) will be provided.
Model Evaluation: Determining how well a model performs on the current data and how well it is expected to perform on future data is essential in deciding whether the model is helpful for clinical care. Internal (e.g., bootstrapping and cross-validation) versus external validation (e.g., split-sample and chronological validation) techniques will be presented along with their respective advantages and disadvantages.
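As a rough sketch of how two of these validation schemes partition the data, the example below builds k-fold cross-validation splits and a chronological split. The function names, the `year` field, and the admission records are assumptions made for illustration only.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Return k (train, test) index pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds)
                 if f != held_out for idx in fold]
        splits.append((train, test))
    return splits

def chronological_split(records, cutoff_year):
    """Train on records before the cutoff year, test on the cutoff year and later."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test

splits = k_fold_indices(n_rows=10, k=5)

# Hypothetical admission records spanning 2015 to 2018.
admissions = [{"id": i, "year": 2015 + i % 4} for i in range(10)]
earlier, later = chronological_split(admissions, cutoff_year=2017)
```

In cross-validation every row is held out exactly once, so all of the data contribute to both fitting and testing. A chronological split instead tests only on later records, which is closer to how a model would be used on future patients.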
Results: Our in-hospital cardiopulmonary arrest prediction model required a team-based approach to solving the aforementioned challenges, and the audience will hear not only how we chose to solve the problems but also which other approaches we considered. For data acquisition and management, we found the best approach to be the inclusion of database and informatics specialists, who used Structured Query Language (SQL) to extract the relevant data and store them on a secure organizational server. A simulation study showed that the missing data problem was best resolved by a multiple imputation model that included the outcome variable. Statistical model assumptions were best met by relaxing the linearity assumption with splines while limiting the number of knots. Model evaluation comprised internal bootstrap validation for the regression models and split-sample validation for the machine learning methods.
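The abstract does not state how estimates from the imputed datasets were combined. The conventional choice is Rubin's rules, sketched below under that assumption; the coefficient estimates and variances are made-up numbers, and `pool_rubin` is a hypothetical helper name.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Made-up log-odds estimates for one predictor from three imputed datasets.
pooled, total_var = pool_rubin([0.50, 0.62, 0.56], [0.040, 0.050, 0.045])
print(pooled, total_var ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what single imputation methods omit: it carries the uncertainty about the missing values themselves into the final standard error.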
Conclusion: Arriving at clinically meaningful insights contained within large datasets requires multifaceted expertise and teamwork. Nurses and other clinicians are the best members of the team to identify a problem that “big data” can help solve. To ensure a clinically meaningful solution surfaces from big data efforts, nurses should be aware of common challenges in big data research. As nurses become more knowledgeable, they position themselves to be leaders in these research teams and advocates for implementation of novel findings.