C1 Background and Explanation of Rationale 
A long line of research in political science has shown how the responses of interviewees in face-to-face and telephone surveys can vary depending on the race or gender of the interviewer (Davis, 1997; Davis & Silver, 2003; Hatchett & Schuman, 1975; Cotter et al., 1982; Reese et al., 1986; Huddy et al., 1997). This variation means that the inferences researchers draw and the replicability of the study can depend on who runs the study. Regardless of whether the relationship between characteristics of the interviewer and the observed variation in responses is perceived to be a benefit¹ or a threat,² it is useful for researchers to be cognizant of the circumstances in which this relationship is most likely to occur. For example, it is often suggested that responses in online surveys are less likely to be affected by attributes of the researcher than responses in other survey modes.
In this paper we explore how attributes of the researcher affect responses in online surveys. In particular, we use a survey experiment that explicitly manipulates the race and gender cued by the researcher name on the informed consent page. The informed consent page is generally required by the Institutional Review Board (IRB) of research universities to be displayed at the start of every internet-based survey. Manipulating the researcher name allows us to test how information conveyed about the race and gender of the researcher through the informed consent page affects survey responses. We focus on gender and race because these two factors can be clearly conveyed through names (Bertrand & Mullainathan, 2004; Milkman et al., 2012), and because they have been central to the existing literature on surveys and identity. The experiment is a 2×2 factorial design, where the first factor is the putative gender of the investigator (male or female) and the second is the investigator's putative race (white or black).
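As a sketch, the 2×2 assignment can be implemented by independently randomizing the two factors and mapping each cell to a researcher name. The names below are illustrative placeholders only, with first names in the style of Bertrand & Mullainathan (2004) and hypothetical surnames; they are not the names used in the study:

```python
import random

# Illustrative name pool only; the actual study names are not specified here.
# First names echo Bertrand & Mullainathan (2004); surnames are hypothetical.
NAMES = {
    ("white", "male"): "Greg Walsh",
    ("white", "female"): "Emily Walsh",
    ("black", "male"): "Jamal Washington",
    ("black", "female"): "Lakisha Washington",
}

def assign_condition(rng=random):
    """Independently randomize the two factors of the 2x2 design and
    return the cell plus the researcher name shown on the consent page."""
    race = rng.choice(["white", "black"])
    gender = rng.choice(["male", "female"])
    return race, gender, NAMES[(race, gender)]
```

Because the two factors are randomized independently with equal probability, each of the four cells is assigned to roughly a quarter of respondents in expectation.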

C3 How will these hypotheses be tested?
Study 1: For Hypothesis 1, we estimate two separate treatment effects. The first is the effect of assignment to a putatively female name on the probability that a respondent indicates that they believe that women should have an equal role in the workforce. The second is the effect of assignment to a putatively black name on the respondent's racial resentment scale. We expect the former effect to be positive and the latter to be negative. For Hypothesis 2, we will estimate the effect of assignment to a putatively white and male name on the probability that a respondent correctly completes both of the attention check assignments. We expect this effect estimate to be positive. For estimation, we will fit a linear probability model of the outcome on treatment and compute standard errors via a nonparametric bootstrap. While not needed for identification, we will include respondent-level covariates (e.g., gender, income, education) in the regression model in order to increase the efficiency of our estimator. Because respondents have the option to stop taking the survey after treatment is assigned, there is a concern that an analysis conditional on survey completion will be biased for the average treatment effect if treatment also affects the probability that a respondent drops out. To obtain unbiased treatment effect estimates in this situation, we adopt an estimation strategy similar to that of Rotnitzky & Robins (1995) and weight each respondent observation in the outcome regression by the inverse of its estimated probability of not dropping out of the sample. We estimate this probability via a logistic regression of completion on treatment using the entire set of respondents (those who completed and those who did not complete the survey). Our rejection levels for two-sided hypothesis tests of whether the average treatment effects differ from zero are calibrated to correct for multiple testing.
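A minimal sketch of this estimator, assuming the completion probabilities have already been estimated from the logistic regression of completion on treatment; all function and variable names here are ours, not from the study:

```python
import numpy as np

def ipw_lpm_effect(y, treat, p_complete):
    """Linear probability model of outcome on treatment among completers,
    weighting each observation by 1 / Pr(completion), in the spirit of
    Rotnitzky & Robins (1995)."""
    w = 1.0 / np.asarray(p_complete, dtype=float)
    X = np.column_stack([np.ones(len(treat)), treat]).astype(float)
    Xw = X * w[:, None]                      # row-wise weighting: diag(w) @ X
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # solves (X'WX) b = X'Wy
    return beta[1]                           # coefficient on treatment

def bootstrap_se(y, treat, p_complete, n_boot=500, seed=0):
    """Nonparametric bootstrap SE: resample respondents with replacement
    and take the standard deviation of the re-estimated effects."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        draws[b] = ipw_lpm_effect(y[idx], treat[idx], p_complete[idx])
    return draws.std(ddof=1)
```

In practice the weights would be re-estimated within each bootstrap replicate, and the design matrix would include the pre-treatment covariates mentioned above; the sketch keeps only the treatment indicator for clarity.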
We are willing to tolerate an overall Type I error rate of α = .05. With three main hypothesis tests, we could obtain a conservative rejection threshold for each individual test of .05/3 ≈ .017 using the Bonferroni correction. This controls the familywise Type I error rate and guarantees that the probability of any erroneous rejection in the set of tests is less than or equal to .05. However, this approach sacrifices a significant amount of power. A less conservative but more powerful approach is to set a rejection threshold that controls the False Discovery Rate (FDR). We use the Benjamini-Hochberg procedure to set the rejection level for the hypothesis tests (Benjamini & Hochberg, 1995). This is a step-up procedure in which we order the 3 p-values of the individual hypothesis tests from smallest to largest, p_(1), ..., p_(3), and then set our rejection level to p_(k), where k is the largest value of i that satisfies p_(i) ≤ (i/3)α. This procedure holds the expected share of false rejections out of the total number of rejections to no greater than .05.

We do not specify any ex ante interactions of the treatment effects with baseline covariates. However, because the mechanisms through which any treatment effects operate are of significant interest, we will conduct exploratory analyses of potential treatment effect heterogeneity by estimating models with interactions between treatment and respondent identity variables. Among other interactions, we are interested in whether any average treatment effect is primarily driven by behavior changes among men (in the case of the gender treatment) and white respondents (in the case of the race treatment). We will attempt to replicate any promising results from these exploratory analyses in a follow-up experiment that explicitly registers interactive hypotheses prior to the experiment.
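The step-up rule can be sketched as follows, written generically for m tests (with our three tests, m = 3):

```python
import numpy as np

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean array
    marking which of the m hypotheses are rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # indices that sort p ascending
    thresh = alpha * np.arange(1, m + 1) / m   # (i/m) * alpha for i = 1..m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest i with p_(i) <= (i/m)*alpha
        reject[order[: k + 1]] = True          # reject all tests up to p_(k)
    return reject
```

Note the step-up logic: every hypothesis with a p-value at or below p_(k) is rejected, even if some intermediate ordered p-value exceeds its own threshold.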
In this preregistration plan, we outline an experiment that tests whether investigator characteristics affect subjects' responses and subject effort. This design permits direct tests of these hypotheses. In addition to these primary hypothesis tests, we hope to conduct exploratory analyses of heterogeneous treatment effects, which will serve as the basis for a second experiment to test the mechanisms that we hope to identify in the experiment laid out in this plan. The second experiment will be preregistered separately, given that its design depends on the results of the experiment outlined here.
Study 2: For Hypothesis 1, we estimate two separate treatment effects.³ The first is the effect of assignment to a putatively female name on the probability that a respondent indicates that they believe that women should have an equal role in the workforce. The second is the effect of assignment to a putatively black name on the respondent's racial resentment scale. We expect the former effect to be positive and the latter to be negative. For Hypothesis 2, we will estimate the effect of assignment to a putatively white and male name on the probability that a respondent correctly completes both of the attention check assignments. We expect this effect estimate to be positive. For estimation, we will fit a linear probability model of the outcome on treatment and compute standard errors via a nonparametric bootstrap. While not needed for identification, we will include respondent-level covariates (e.g., gender, income, education) in the regression model in order to increase the efficiency of our estimator. Our rejection levels for two-sided hypothesis tests of whether the average treatment effects differ from zero are calibrated to correct for multiple testing. We are willing to tolerate an overall Type I error rate of α = .05. With three main hypothesis tests, we could obtain a conservative rejection threshold for each individual test of .05/3 ≈ .017 using the Bonferroni correction. This controls the familywise Type I error rate and guarantees that the probability of any erroneous rejection in the set of tests is less than or equal to .05. However, this approach sacrifices a significant amount of power. A less conservative but more powerful approach is to set a rejection threshold that controls the False Discovery Rate (FDR). We use the Benjamini-Hochberg procedure to set the rejection level for the hypothesis tests (Benjamini & Hochberg, 1995).
This is a step-up procedure in which we order the 3 p-values of the individual hypothesis tests from smallest to largest, p_(1), ..., p_(3), and then set our rejection level to p_(k), where k is the largest value of i that satisfies p_(i) ≤ (i/3)α. This procedure holds the expected share of false rejections out of the total number of rejections to no greater than .05. We do not specify any ex ante interactions of the treatment effects with baseline covariates. However, because the mechanisms through which any treatment effects operate are of significant interest, we will conduct exploratory analyses of potential treatment effect heterogeneity by estimating models with interactions between treatment and respondent identity variables. Among other interactions, we are interested in whether any average treatment effect is primarily driven by behavior changes among men (in the case of the gender treatment) and white respondents (in the case of the race treatment). We will attempt to replicate any promising results from these exploratory analyses in a follow-up experiment that explicitly registers interactive hypotheses prior to the experiment.

Additionally, because respondents have the option to stop taking the survey after treatment is assigned but before outcomes are measured, there is a concern that an analysis conditional on survey completion will be biased for the average treatment effect if treatment also affects the probability that a respondent drops out. Although it is not possible to adjust for non-ignorable dropout in the absence of pre-treatment covariates on respondents, we will examine whether there appear to be systematic differences between treatment arms with respect to attrition and employ sensitivity analyses in the vein of Scharfstein et al. (1999) in order to evaluate the robustness of our estimates to this potential source of bias.
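The first of these attrition checks can be sketched as a simple comparison of completion rates across arms, here as a pooled two-proportion z-test (an illustrative diagnostic; the analysis reported in the study may differ):

```python
import numpy as np

def attrition_gap(completed, treat):
    """Difference in completion rates between treatment and control arms,
    with a pooled two-proportion z statistic for differential attrition."""
    completed = np.asarray(completed, dtype=float)
    treat = np.asarray(treat)
    c1, n1 = completed[treat == 1].sum(), (treat == 1).sum()
    c0, n0 = completed[treat == 0].sum(), (treat == 0).sum()
    p1, p0 = c1 / n1, c0 / n0
    pooled = (c1 + c0) / (n1 + n0)          # completion rate under the null
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    return p1 - p0, (p1 - p0) / se
```

A large z statistic would signal differential attrition and motivate the Scharfstein et al. (1999)-style sensitivity analysis described above.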
