Evaluation of Interrater Reliability on a Clinical Judgment Rubric: A Tale of Three Experts

Friday, April 4, 2014

Mary A. Cazzell, PhD, RN
Cook Children's Medical Center, Fort Worth, TX
Mindi Anderson, PhD, RN, CPNP-PC, CNE, CHSE, ANEF
College of Nursing, University of Texas at Arlington, Arlington, TX
Linda Frye, MSN, RN
Weatherford Community College, Weatherford, TX
Tim Taylor, BSN, RN
Director Critical Care & Cardiology, Methodist Mansfield Medical Center, Mansfield, TX

Introduction: The purpose of this study was to evaluate the interrater reliability of the Lasater Clinical Judgment Rubric (LCJR) used to evaluate nursing student performance during a pediatric medication administration Objective Structured Clinical Evaluation (OSCE). The science of nursing education research in simulation can be advanced only when psychometrically established measures are used.

Methods: Standardized rater training was provided to three raters using an LCJR training video. The raters, of varying backgrounds (academic versus clinical), scored 160 videotaped OSCEs of senior-level nursing students performing pediatric medication administration, using an OSCE checklist correlated to indicators of clinical judgment on the LCJR. The LCJR includes 11 items that rate clinical judgment at four levels of performance (Beginning, Developing, Accomplished, and Exemplary) under four major categories (Noticing, Interpreting, Responding, and Reflecting).

Results: Moderate interrater reliability (ICC = 0.53) was obtained for total LCJR scores across all three raters. Scoring by two raters (one academic, one clinical) achieved the strongest interrater reliability for Information Seeking (ICC = 0.75), Making Sense of Data (ICC = 0.97), and Interpreting (ICC = 0.76). The lowest interrater reliability was for Prioritizing Data across all raters (ICC = 0.05). Using paired-samples t tests, the two raters (academic vs. clinical) demonstrated no significant differences in scoring psychomotor skills (hand hygiene/gloving, intravenous and oral medication administration), affective domain skills (communication, professional behaviors, and dress), or total LCJR scores.
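The abstract does not state which ICC form or software the authors used; as an illustration only, the sketch below computes a common choice for this design, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a subjects-by-raters matrix of scores. All variable names and the synthetic data are assumptions for demonstration, not the study's analysis.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array-like of shape (n_subjects, k_raters),
             e.g. each row is one student's total LCJR score from each rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-subject means
    col_means = x.mean(axis=0)   # per-rater means

    # Two-way ANOVA decomposition of the sums of squares
    ss_total = ((x - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)            # mean square, subjects
    msc = ss_cols / (k - 1)            # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1)) # mean square, error

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores for three subjects rated by two raters:
print(icc2_1([[1, 2], [2, 1], [3, 3]]))
```

With real data, each of the 160 OSCE videos would contribute one row and each of the three raters one column; rubric-item-level ICCs (e.g. for Prioritizing Data) would use that item's scores in place of totals.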

Discussion/Conclusion: The strongest interrater reliability statistics were for “yes/no” performance items. The lowest agreement across all raters was for checking for medications that were due (Prioritizing Data). Considerations for establishing interrater reliability of clinical judgment tools must include the clinical versus academic background of raters, correlation of the simulation scenario to the concepts measured by the evaluation instrument, complexity of the checklist and/or overlap of scoring rubric categories, and consistency of rater training related to expected benchmarks for the student population.
