Title Scaling Bayesian Probabilistic Record Linkage with Post-Hoc Blocking: An Application to the California Great Registers
Post date 01/16/2019
C1 Background and Explanation of Rationale

We implement the Bayesian record linkage algorithm described in our paper. This pre-registration covers the hand-coding used to assess whether linkages are correct. We seek to compare our record linkage algorithm to fastLink, a software package developed by Enamorado et al.

We split our sample into two categories, movers and non-movers, and draw a stratified sample from each of the two groups as follows:
150 linkages made only by fastLink (i.e. the Bayesian algorithm’s probability of the linkage being correct was less than or equal to .5, while fastLink returned a probability greater than .9)
150 linkages made only by the Bayesian algorithm (i.e. fastLink’s probability of the linkage being correct was less than or equal to .9, while the Bayesian algorithm returned a probability greater than .5)
100 linkages returned by both (with probability greater than .9 for fastLink and greater than .5 for the Bayesian algorithm).
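The stratum rules above can be sketched in code. This is a minimal illustration, not the authors' implementation: the field names `p_fastlink` and `p_bayes`, the helper names, and the fixed seed are all assumptions made for the example. Only the cutoffs (.9 and .5) and the quotas (150, 150, 100) come from the text. The function is meant to be run once per group (movers, non-movers).

```python
import random

# Cutoffs and quotas taken from the sampling description above.
FASTLINK_CUTOFF = 0.9
BAYES_CUTOFF = 0.5
QUOTAS = {"fastlink_only": 150, "bayes_only": 150, "both": 100}


def stratum(p_fastlink, p_bayes):
    """Return the stratum name for a candidate linkage, or None if neither
    algorithm declares it a link."""
    linked_by_fastlink = p_fastlink > FASTLINK_CUTOFF
    linked_by_bayes = p_bayes > BAYES_CUTOFF
    if linked_by_fastlink and linked_by_bayes:
        return "both"
    if linked_by_fastlink:
        return "fastlink_only"
    if linked_by_bayes:
        return "bayes_only"
    return None


def draw_sample(pairs, seed=0):
    """Draw the stratified sample for one group.

    `pairs` is a list of dicts with hypothetical keys 'p_fastlink' and
    'p_bayes'. If a stratum holds fewer pairs than its quota, all of its
    pairs are returned.
    """
    rng = random.Random(seed)
    buckets = {name: [] for name in QUOTAS}
    for pair in pairs:
        name = stratum(pair["p_fastlink"], pair["p_bayes"])
        if name is not None:
            buckets[name].append(pair)
    return {
        name: rng.sample(buckets[name], min(QUOTAS[name], len(buckets[name])))
        for name in QUOTAS
    }
```

Applied to both groups, the quotas give 400 linkages per group, which is consistent with the scale of 800 units reported in C5.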

The researchers then hand-code the linkages, marking them as correct, incorrect, or ambiguous.

C2 What are the hypotheses to be tested? We seek to test whether the Bayesian algorithm made a higher proportion of correct matches, testing movers and non-movers separately. We will drop the ambiguous cases from the analysis.
C3 How will these hypotheses be tested? A test of a difference in proportions conducted at the .05 significance level.
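The registration does not say which test of a difference in proportions will be used, or whether it is one-sided. As an illustration only, the sketch below uses a pooled two-sample z-test with a two-sided p-value, which is one common choice. The function name and arguments are hypothetical, and the sketch assumes independent samples and ignores any stratum weighting. Ambiguous cases are assumed to have been dropped before counting.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Pooled two-sample z-test for a difference in proportions.

    Returns (z, two_sided_p_value). Counts should exclude linkages
    hand-coded as ambiguous.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A call such as `two_proportion_z_test(correct_bayes, n_bayes, correct_fastlink, n_fastlink)` would be made once for movers and once for non-movers, rejecting the null hypothesis when the p-value falls below .05. Because the stated hypothesis is directional, a one-sided version would instead use half of this p-value when `z` is positive.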
C4 Country United States
C5 Scale (# of Units) 800
C6 Was a power analysis conducted prior to data collection? Yes
C7 Has this research received Institutional Review Board (IRB) or ethics committee approval? No
C8 IRB Number n/a
C9 Date of IRB Approval n/a
C10 Will the intervention be implemented by the researcher or a third party? not provided by authors
C11 Did any of the research team receive remuneration from the implementing agency for taking part in this research? No
C12 If relevant, is there an advance agreement with the implementation group that all results can be published? No
C13 JEL Classification(s) not provided by authors