Are Your Multiple-Choice Tests "FIT"? Using the Fairness of Items Tool (FIT) as a Component of the Test Development Process

Monday, 9 November 2015: 2:25 PM

Nikole Anderson Hicks, PhD, MSN, BSN, RNC, CNE
College of Nursing, University of Cincinnati, Cincinnati, OH, USA


This presentation will discuss the development, validation, and implementation of the Fairness of Items Tool (FIT), which nurse educators can use as a component of the test development process to improve the quality of multiple-choice examinations. The FIT provides clear and concise guidelines for nursing faculty to use in developing unbiased test items.


Multiple-choice examinations are a common assessment method in nursing programs, and conclusions based on these assessments have high-stakes consequences. Faculty members therefore have an obligation to ensure that tests are valid and reliable assessments of student learning. For a test to be fair, valid, and reliable, it must contain well-written items. Constructing and revising test items is difficult and time-consuming, and nursing faculty members lack adequate preparation and sufficient time for examination construction and analysis. Published guidelines are available to assist faculty in creating examination items; however, both faculty-written assessments and textbook item banks contain violations of these guidelines, so flawed test items end up on administered assessments. Developing clear and concise guidelines for nursing faculty to use in developing unbiased test items is one strategy that may improve the quality of nursing assessments, thereby improving the quality of the decisions based on them.


Development and validation of the FIT was a three-phase process grounded in two theoretical frameworks adapted for this research study: the Revised Framework for Quality Assessment and the Conceptual Model for Test Development. In the first phase, the primary investigator developed the tool through an extensive review of published higher education and nursing literature related to item-writing rules, examination bias, and cultural bias. Phases two and three used systematic methods to establish the validity and reliability of the FIT. In phase two, content validity and face validity were established through review by a panel of item-writing experts. In phase three, nursing faculty (N = 488) used the FIT to evaluate sample multiple-choice questions (MCQs), and multiple measures were applied to their ratings to establish reliability and construct validity.

The sample for this research study was drawn from a list of 5,786 names and email addresses systematically sampled from AACN member school websites. Inclusion criteria were active teaching in a nursing program and use of faculty-generated multiple-choice (MC) examinations for student assessment. Faculty-generated MC examinations include those that faculty develop by writing new test items, using test bank items, revising test items from any source, or any combination of these activities. Overall, the demographic characteristics of the sample were fairly representative of the general nursing faculty population, consisting primarily of educated white females over age 45. Compared with the general nursing faculty population, however, participants were more likely to have doctoral preparation, full-time and tenured or tenure-track status, certification in academic nursing education, and higher academic rank. Males were slightly overrepresented in the sample, while African Americans were underrepresented. The sample represented all regions in the United States, over 162 programs of nursing, and diverse clinical specialties.


The results of this research study support the hypothesis that the FIT is a valid and reliable tool for identifying bias in MCQs as a component of a systematic process for test development. The known-groups comparison supported the validity of the FIT as a measure of item bias. Tests for independence demonstrated that FIT scores are not affected by demographic variables. Analysis of agreements provided strong support for equivalence, and the KR-20 supported the stability of the FIT. Cronbach's alpha coefficients demonstrated adequate reliability for a newly developed tool. This research study also demonstrated that participants made similar decisions when using the FIT to evaluate MCQs.


Nurse educators can use the FIT as a component of the test development process to improve the quality of multiple-choice examinations. The FIT provides clear and concise guidelines for writing MCQs and revising textbook test bank items. The FIT provides a means to facilitate systematic research to validate guidelines and testing procedures and to improve the quality of MC test items. Improving the quality of examinations has the potential to improve student success and better prepare graduates for licensure and certification examinations, indirectly increasing the quality, quantity, and diversity of nurses joining the workforce.   

Note: This research study used Research Electronic Data Capture (REDCap), a secure, web-based survey tool and database, supported by the Center for Clinical and Translational Science and Training grant UL1-RR026314.