OBJECTIVE: Comparing observed and expected distributions of baseline continuous variables in randomized controlled trials (RCTs) can be used to assess publication integrity. We explored whether baseline categorical variables could also be used.
STUDY DESIGN AND SETTING: The observed and expected (binomial) distributions of all baseline categorical variables were compared in four sets of RCTs: two control sets and two sets with publication integrity concerns. We also compared the calculated and reported baseline p-values.
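As a rough illustration of the comparison described above, the sketch below computes the expected distribution of the absolute between-group difference in frequency counts for one two-arm baseline variable, treating each arm's count as an independent binomial draw. The function names, the use of the pooled proportion as the common probability, and the example counts are assumptions made for illustration; the abstract does not give the authors' exact procedure.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k events in n participants."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_abs_diff_dist(n1, n2, p):
    """Return {d: P(|X1 - X2| = d)} for independent X1 ~ Bin(n1, p), X2 ~ Bin(n2, p)."""
    pmf1 = [binom_pmf(k, n1, p) for k in range(n1 + 1)]
    pmf2 = [binom_pmf(k, n2, p) for k in range(n2 + 1)]
    dist = {}
    for x1, p1 in enumerate(pmf1):
        for x2, p2 in enumerate(pmf2):
            d = abs(x1 - x2)
            dist[d] = dist.get(d, 0.0) + p1 * p2
    return dist


# Hypothetical baseline variable: 14/50 vs 16/50 participants with the characteristic.
n1, n2 = 50, 50
pooled_p = (14 + 16) / (n1 + n2)  # assumption: pooled proportion estimates the common p
dist = expected_abs_diff_dist(n1, n2, pooled_p)

prob_diff_1_or_2 = dist[1] + dist[2]
prob_diff_gt_4 = sum(prob for d, prob in dist.items() if d > 4)
print(f"P(diff is 1 or 2) = {prob_diff_1_or_2:.3f}")
print(f"P(diff > 4)       = {prob_diff_gt_4:.3f}")
```

Summing such per-variable probabilities across all variables in a set of trials would give expected counts that could then be compared with the observed counts of differences in each category.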
RESULTS: The observed and expected distributions of baseline categorical variables were similar in the control datasets, both for frequency counts (and percentages) and for between-group differences in frequency counts. However, in both sets of RCTs with publication integrity concerns, about twice as many variables as expected had between-group differences in frequency counts of 1 or 2, and far fewer variables than expected had between-group differences of >4 (P<0.001 for both datasets). Furthermore, in trials with publication integrity concerns, about 1 in 6 reported p-values for baseline categorical variables differed by >0.1 from the calculated p-value.
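The check of reported against calculated p-values can be sketched as follows. This is a minimal example that recomputes a two-sided p-value from the reported counts and flags a difference of more than 0.1. The choice of Fisher's exact test is an assumption (the abstract does not state which test was used to calculate p-values), and the function names and example numbers are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, n1, b, n2):
    """Two-sided Fisher's exact p-value for a/n1 events vs b/n2 events."""
    events = a + b
    denom = comb(n1 + n2, events)

    def prob(x):
        # Hypergeometric probability of x events in group 1, margins fixed.
        return comb(n1, x) * comb(n2, events - x) / denom

    lo = max(0, events - n2)
    hi = min(n1, events)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def p_value_discrepant(reported_p, calculated_p, threshold=0.1):
    """True when the reported and calculated p-values differ by more than threshold."""
    return abs(reported_p - calculated_p) > threshold


# Hypothetical baseline variable: 3/4 vs 1/4, with a reported p-value of 0.95.
calculated = fisher_exact_two_sided(3, 4, 1, 4)
flagged = p_value_discrepant(0.95, calculated)
print(f"calculated p = {calculated:.4f}, discrepant = {flagged}")
```

A chi-square test could be substituted where trial reports indicate that it was used; the flagging step is unchanged.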
CONCLUSION: Comparing the observed and expected distributions, and the reported and calculated p-values, of baseline categorical variables may help in assessing the publication integrity of a body of RCTs.
Bibliographical note
Funding: This research received no specific funding. MB is a recipient of an HRC Clinical Practitioners Fellowship. The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health and Social Care Directorates. The authors are independent of the HRC. The HRC had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
- statistical methods
- research integrity
- categorical variables
- data integrity
- fabricated data