Q&A with Members: George Ofosu and Dan Posner

November 16, 2021

In today’s Q&A with Members post, we feature EGAP members George Ofosu (LSE) and Dan Posner (UCLA). We asked them about their paper “Pre-Analysis Plans: An Early Stocktaking,” published in Perspectives on Politics in 2021. The paper is a high-level look at the state of pre-analysis plans submitted during the first few years in which pre-registration became established as a norm for experimental work in political science. It will be followed in the coming months by a second paper looking at pre-analysis plans submitted in recent years.

This is the first paper in a two-paper series looking at studies registered with EGAP and the AEA. This paper focuses on pre-analysis plans (PAPs) submitted with registrations in the early years of each registry’s existence, and the forthcoming second paper will focus on PAPs from more recent registrations. What was the impetus for taking on the project of comparing PAPs over time?

George Ofosu and Dan Posner: This paper focused on the early days of PAPs, from the first registrations in 2011 up through the end of 2016. We analyzed a random sample of 196 PAPs from the EGAP and AEA registries, stratified by registration year, registry, and whether the PAP was initially private/gated. Half of our sample consisted of PAPs that had resulted in a publicly available working paper or published article, so that we could analyze how faithfully the working paper/article hewed to what was pre-registered. The main take-away of our analysis was that PAPs registered during that period varied significantly in their clarity and comprehensiveness, and thus in their ability to tie researchers’ hands and generate more credible findings. Readers of that paper might wonder, however, whether the shortcomings in some of the PAPs we analyzed might simply have reflected the early days of PAP writing, when researchers were still figuring out how to write PAPs and what should go in them. We decided it would be worthwhile to use the same coding rubric to analyze a more recent sample of PAPs, and to compare patterns across the early and present periods.

What are the primary conclusions from your paper looking at the first few years of PAP submissions? What stands out most regarding the state of PAP generation in this stocktaking?

GO and DP: PAPs are a potentially powerful tool for reducing “fishing” (when researchers select models in order to generate positive findings) and “HARKing” (hypothesizing after the results are known), and thus for promoting research credibility. But they can only play this role if they are sufficiently clear, precise, and comprehensive to meaningfully tie researchers’ hands, and if researchers engage with what they pre-registered in the papers they write. Our main finding is that many PAPs are written in a way that permits significant latitude for researchers to select which findings to present, and that many papers that result from pre-registered designs fail to follow what was pre-specified (or, since slavishly sticking to one’s PAP is neither necessary nor always desirable, at least engage with what was pre-registered to highlight and explain the rationale for the deviations). While it is true that even weak PAPs improve research credibility by reducing what Simmons et al. (2011) call “researcher degrees of freedom,” strengthening them will go a long way toward improving the credibility of research findings.

Of course, writing a PAP is time-consuming and (it is often claimed) denies researchers the opportunity to learn unexpected things from their data, so the benefits a PAP may bring for improving research credibility must be balanced against these shortcomings. We discuss this tradeoff at length in the paper and argue that the benefits are worth the additional effort. We also push back against the perception that pre-registration limits a researcher’s ability to explore her data. It just requires that the researcher distinguish in her write-up between the analyses that were pre-specified and those that were exploratory (and thus require further testing).

What role does the review process at academic journals play in establishing norms related to study pre-registration?

GO and DP: As we stress in the paper, writing a PAP makes it possible to identify fishing and HARKing. But actually identifying such practices requires painstaking comparisons of what was pre-registered with what appears in the resulting paper. Such policing is tedious and time-consuming, and the disciplines currently provide little incentive for undertaking such work. The journal review process might play a role in policing, but the demands that this would place on journal editors and reviewers may be too great for this to be a feasible approach. At the very least, journals can encourage pre-registration (as many do) and require authors submitting papers to include (redacted versions of) their PAPs with their submissions. This would make it easier for reviewers who were so inclined to assess whether the paper deviated from what was pre-registered and evaluate whether these deviations are sufficiently documented and well-justified.

Requiring links to the PAP in the published paper would also generate incentives for authors to stay faithful to what they pre-registered, lest their deviations (or at any rate those they do not discuss in the paper) provide fodder for replications by graduate students and others curious to see whether the results hold in their original specifications. Adopting this simple norm seems to us to be a more promising and viable approach to harnessing the review process than requiring reviewers, who already do yeoman’s work for free, to police whether paper authors are following their PAPs.

Another approach we discuss in the paper is providing greater opportunities for researchers who write PAPs to receive feedback on their study designs. EGAP is already a leader in this effort by reserving slots at its meetings for the discussion of PAPs alongside finished papers. Making such presentations more broadly accepted at invited talks or at professional meetings would increase the benefits of writing PAPs and go a long way toward creating incentives for researchers to invest in them.

In the paper, you discuss the lack of standards regarding PAPs. From your analysis, what do you consider to be the key elements of an effective pre-analysis plan?

GO and DP: The lack of common disciplinary standards for what should be included in a PAP is an obstacle to both producing better PAPs and generating incentives for researchers to invest the considerable effort it takes to produce and police them. Our view is that, at minimum, an effective PAP should include four elements: 1) a clearly specified research hypothesis; 2) a clearly specified independent/treatment variable; 3) a clearly specified dependent variable; and 4) a description of the precise statistical model that will be tested. By “clearly specified” we mean a hypothesis or independent/dependent variable that leaves no room for post-hoc adjustment. An effective PAP should be like a recipe that, followed by any competent researcher given the same data, would produce the same analyses and results.

The analysis we present in the Perspectives paper finds that just about half of PAPs meet these four criteria of a complete PAP. We are eager to see whether this has changed in the more recent period.
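
Illustrative sketch (not taken from the paper): one way to picture the four-element completeness standard described above is as a simple checklist applied to each coded PAP. The class name `PapCoding`, its field names, and the function `share_complete` are hypothetical, and the authors' actual coding rubric may well be more detailed than four yes/no judgments.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PapCoding:
    """Hypothetical yes/no codings for one PAP (field names are assumptions)."""
    clear_hypothesis: bool  # 1) clearly specified research hypothesis
    clear_treatment: bool   # 2) clearly specified independent/treatment variable
    clear_outcome: bool     # 3) clearly specified dependent variable
    precise_model: bool     # 4) precise statistical model described

    def is_complete(self) -> bool:
        # A PAP counts as complete only if all four elements are present.
        return all((self.clear_hypothesis, self.clear_treatment,
                    self.clear_outcome, self.precise_model))


def share_complete(codings: Sequence[PapCoding]) -> float:
    """Fraction of coded PAPs that meet all four criteria."""
    if not codings:
        raise ValueError("need at least one coded PAP")
    return sum(c.is_complete() for c in codings) / len(codings)


if __name__ == "__main__":
    # Made-up example codings, purely to show the calculation.
    example = [
        PapCoding(True, True, True, True),
        PapCoding(True, True, False, True),
    ]
    print(share_complete(example))
```

Treating completeness as an all-or-nothing condition mirrors the "at minimum" framing in the answer above: a PAP missing any one element leaves room for post-hoc adjustment.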

Looking ahead to the next paper on PAPs submitted more recently, what outstanding questions are you most excited to answer?

GO and DP: First, we are excited to learn how the patterns we found in the early period compare with what we find in the more recent years. To make such comparisons meaningful, we are investing a lot of energy in inter-coder reliability, having the same coders analyze PAPs from the early and recent periods to rule out the possibility that differences are driven by different subjective codings.

To the extent that we find differences in the quality of PAPs across the early and more recent periods, a key issue will be to explain what lies behind the change. One of the aspects of our analysis that we are especially excited about is our ability, through data we have compiled on PAP authors’ prior experience with pre-registration (specifically, whether, for each PAP in our sample, it is the author’s first, second, fifth, or tenth PAP), to say something about learning and, possibly, about pre-registration fatigue. Many of the PAPs in our 2011-2016 sample were among their authors’ first handful of pre-registrations. By the more recent period, we see a mix of “veteran” PAP users and novices. Our data and coding will allow us to compare PAP quality across these types to say something about the adoption of this relatively new form of scholarly activity.
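
Illustrative sketch (not taken from the paper): the inter-coder reliability concern raised in this answer is often quantified with an agreement statistic. The interview does not say which statistic the authors use; Cohen's kappa is assumed here only as a common choice for two coders, and the function name `cohens_kappa` and the example codings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    items on which the coders agree and p_e is the agreement expected by
    chance given each coder's own label frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up codings: 1 = "hypothesis clearly specified", 0 = "not".
    coder_a = [1, 1, 0, 0]
    coder_b = [1, 0, 0, 0]
    print(cohens_kappa(coder_a, coder_b))
```

A kappa near 1 would suggest that differences between the early and recent samples are unlikely to be an artifact of who did the coding, while values near 0 indicate agreement no better than chance.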