Patient-Reported Outcome Measures and Test-Retest Reliability in Older Adults: A Systematic Review

Monday, 23 July 2018: 10:20 AM

Myung Sook Park, PhD
Konkuk University, Chungju, Korea, Republic of (South)
Kyung Ja Kang, PhD
Jeju National University, Jeju, Korea, Republic of (South)
Sun Joo Jang, PhD
Eulji University College of Nursing, Daejeon, Korea, Republic of (South)
Sun Ju Chang, PhD, RN
College of Nursing & The Research Institute of Nursing Science, Seoul National University, Seoul, Korea, Republic of (South)

Purpose: Patient-reported outcomes are regarded as useful information for evaluating health-related outcomes, and the importance of patient-reported outcome measures has come to the fore in recent years. With this interest, researchers have paid increasing attention to ensuring the psychometric properties of these measures, including reliability and validity. Test-retest reliability, which evaluates a measure’s stability, can be affected by several factors, including the time interval between the two administrations, participant characteristics such as age, and the statistical methods used. Although diverse patient-reported outcome measures have been used in healthcare settings for older adults, there is a lack of evidence on how test-retest reliability is applied in this population. Hence, this systematic review evaluated the current literature on the quality of test-retest reliability assessment, including the time interval, sample size, and statistical methods used, for patient-reported outcome measures in older adults.

Methods: This study was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Equator Network, 2013) and the systematic review guidelines suggested by the National Evidence-based Healthcare Collaborating Agency (NECA) in Korea. Four electronic databases (MEDLINE, Embase, CINAHL, and PsycINFO) were searched for the period from January 1, 2000 to August 10, 2017, using a combination of keywords chosen on the basis of the PICO statement. Studies were included if they were published in English in peer-reviewed journals, targeted older adults aged 65 years or older, and assessed the test-retest reliability of patient-reported outcome measures. To find eligible studies, four researchers independently applied three steps: identifying duplicate studies, reviewing the titles and abstracts, and reviewing the full texts. The researchers independently assessed the quality of the identified studies using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). Data were extracted according to the PICO framework and synthesized using a narrative synthesis method.

Results: Of the 12,641 studies retrieved from the databases, 95 were finally selected for this systematic review. The methodological quality was rated as good in 1 study, fair in 73, and poor in 21. The most frequently used time interval between the first and second administrations was 14 to 20 days (41.1%). The median time interval, calculated from the 74 studies that reported a single or average time interval, was 14 days. The sample size for test-retest reliability ranged from 10 to 663, and the median ratio of the number of items to sample size was 1:2.6. Finally, the most frequently used statistical method for calculating test-retest reliability from continuous scores was the intraclass correlation coefficient (ICC) (71.5%), followed by correlation coefficients (27.2%) and Kappa coefficients (1.3%). However, among the 63 studies that used the ICC, only 21 reported the ICC model and only 30 reported the 95% confidence interval.

Conclusion: The median time interval of 14 days was consistent with current suggestions. That is, the findings of this systematic review did not provide evidence that the time interval for test-retest reliability needs special consideration in older adults. The finding that the ICC was the most frequently used statistical method for continuous scores agrees with previous studies as well as current suggestions. However, because relatively few studies reported the ICC model and the 95% confidence interval, researchers should take care to report both.
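
To illustrate why the ICC model matters when reporting test-retest reliability, the following is a minimal sketch, not the method of any reviewed study. It computes two common single-measure ICC forms from two-way ANOVA mean squares: ICC(2,1) (absolute agreement) and ICC(3,1) (consistency), using the Shrout and Fleiss labels. The function name `icc_single` and the example scores are hypothetical, and the 95% confidence interval is deliberately left out because it requires F-distribution quantiles.

```python
def icc_single(scores):
    """Single-measure ICCs from a complete participants x administrations table.

    scores: list of rows, one row per participant and one column per
    administration (e.g. [test, retest]). Hypothetical helper for illustration.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of administrations
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-participants mean square
    jms = ss_cols / (k - 1)                # between-administrations mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    # ICC(2,1): penalizes systematic shifts between administrations
    agreement = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # ICC(3,1): ignores systematic shifts between administrations
    consistency = (bms - ems) / (bms + (k - 1) * ems)
    return {"ICC(2,1)": agreement, "ICC(3,1)": consistency}


# Hypothetical data: every retest score is exactly 2 points above the test score.
example = [[10, 12], [14, 16], [18, 20], [22, 24]]
result = icc_single(example)
print(result)
```

With these made-up scores the consistency form equals 1.0, while the absolute-agreement form is about 0.93, because only the latter counts the uniform 2-point shift as disagreement. The same data can therefore yield different ICC values depending on the model, which is why an ICC reported without its model is hard to interpret.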