Originally posted: 2023-10-18.

This article is part of the probabilistic linkage training materials

This visualisation aims to clarify the intuition behind the following formula from the Fellegi-Sunter model, which was presented in the article on the maths of Fellegi-Sunter in the main tutorial.

$$\text{posterior match probability} = \frac{\lambda m}{\lambda m + (1 - \lambda) u}$$

where

$$\lambda = \text{prior match probability}$$

and the m and u probabilities are defined in relation to a scenario:

$$m = \operatorname{Pr}(\text{scenario} \mid \text{records match})$$

$$u = \operatorname{Pr}(\text{scenario} \mid \text{records do not match})$$
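To make the definitions concrete, here is a minimal Python sketch of the formula. The function name `posterior_match_probability` and the example values are assumptions made for illustration; they do not come from the article or from any library.

```python
def posterior_match_probability(lam: float, m: float, u: float) -> float:
    """Posterior probability of a match, given that the scenario is observed.

    lam: prior match probability (lambda)
    m:   Pr(scenario | records match)
    u:   Pr(scenario | records do not match)
    """
    for name, value in (("lam", lam), ("m", m), ("u", u)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {value}")

    numerator = lam * m
    denominator = lam * m + (1.0 - lam) * u
    if denominator == 0.0:
        # The scenario can never be observed under these parameters
        raise ValueError("the scenario has zero probability under these parameters")
    return numerator / denominator


# Illustrative values only (assumed, not taken from the article)
example = posterior_match_probability(lam=0.1, m=0.9, u=0.05)
print(round(example, 4))
```

With these assumed values the posterior is two thirds: observing the scenario raises the match probability from the prior of 0.1 to about 0.667.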

This article also explains why the formula is just a restatement of Bayes' Theorem.

Recall that the prior is the probability that two records chosen at random match; it is one of the parameters to be estimated.

We can visualise this parameter by showing how it divides the set of all pairwise record comparisons:

Using the definition of the m and u probabilities above, we can further subdivide this space as follows:

Given our observation that the scenario holds, we can discard the areas in white as no longer applicable given this new information.

Turning this back into formulas we can write:
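As a sketch inferred from the definitions of $\lambda$, $m$ and $u$ above, each remaining region can be read as a share of all pairwise comparisons. Matches where the scenario holds have area $\lambda m$, and non-matches where the scenario holds have area $(1 - \lambda) u$. The posterior is the first area as a fraction of both:

$$
\text{posterior match probability}
= \frac{\text{area: match and scenario}}{\text{area: match and scenario} + \text{area: non-match and scenario}}
= \frac{\lambda m}{\lambda m + (1 - \lambda) u}
$$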

Substituting in numbers we have:
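As an illustration with assumed values (these are not the numbers used in the original visualisation), the same reasoning can be followed by counting comparisons rather than measuring areas. All variable names and values in this sketch are hypothetical.

```python
# Assumed, illustrative values (not the numbers from the original figure)
total_comparisons = 1_000_000
lam = 0.1   # prior: share of comparisons that are matches
m = 0.9     # share of matches in which the scenario holds
u = 0.05    # share of non-matches in which the scenario holds

# First split: matches versus non-matches
matches = lam * total_comparisons
non_matches = (1 - lam) * total_comparisons

# Second split: keep only the comparisons where the scenario holds
matches_with_scenario = m * matches
non_matches_with_scenario = u * non_matches

# Among comparisons where the scenario holds, what share are matches?
posterior = matches_with_scenario / (
    matches_with_scenario + non_matches_with_scenario
)

print(f"matches with scenario:     {matches_with_scenario:,.0f}")
print(f"non-matches with scenario: {non_matches_with_scenario:,.0f}")
print(f"posterior match probability: {posterior:.3f}")
```

The total number of comparisons cancels out of the final ratio, which is why the formula needs only $\lambda$, $m$ and $u$.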

In a more general sense, these visualisations explain the intuition behind Bayes' Theorem.

Recall that Bayes' Theorem is:

$$\text{posterior probability} = \frac{\text{likelihood} \times \text{prior probability}}{\text{evidence}}$$

or:

$$\operatorname{Pr}(a \mid b) = \frac{\operatorname{Pr}(b \mid a)\operatorname{Pr}(a)}{\operatorname{Pr}(b)}$$

In the context of record linkage, we can describe these parts as:

Prior: The overall proportion of comparisons which are matches, $\operatorname{Pr}(\text{match})$

Evidence: We have observed that a scenario holds, $\operatorname{Pr}(\text{scenario})$

Likelihood: The probability that the scenario holds amongst matches, given by $\operatorname{Pr}(\text{scenario} \mid \text{records match})$

So Bayes' Theorem is:

$$\operatorname{Pr}(\text{match} \mid \text{scenario}) = \frac{\operatorname{Pr}(\text{scenario} \mid \text{match})\operatorname{Pr}(\text{match})}{\operatorname{Pr}(\text{scenario})}$$

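but, by the law of total probability, the scenario can arise either among matches (with probability $\lambda m$) or among non-matches (with probability $(1 - \lambda) u$):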

$$\operatorname{Pr}(\text{scenario}) = \lambda m + (1 - \lambda) u$$

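Substituting $\operatorname{Pr}(\text{match}) = \lambda$ and $\operatorname{Pr}(\text{scenario} \mid \text{match}) = m$, Bayes' Theorem is just our original formula: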

$$\text{posterior match probability} = \frac{\lambda m}{\lambda m + (1 - \lambda) u}$$
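The equivalence can also be checked numerically. The sketch below computes the posterior in two ways: with a generic Bayes' Theorem function, using the prior, likelihood and evidence, and with the Fellegi-Sunter formula directly. The function names and parameter sets are assumptions chosen for illustration.

```python
def evidence(lam: float, m: float, u: float) -> float:
    """Pr(scenario), by the law of total probability."""
    return lam * m + (1 - lam) * u


def bayes(likelihood: float, prior: float, evidence_prob: float) -> float:
    """Generic Bayes' Theorem: Pr(a | b) = Pr(b | a) * Pr(a) / Pr(b)."""
    return likelihood * prior / evidence_prob


def fellegi_sunter_posterior(lam: float, m: float, u: float) -> float:
    """Posterior match probability, written directly in lambda, m and u."""
    return lam * m / (lam * m + (1 - lam) * u)


# Assumed, illustrative parameter sets (lambda, m, u)
parameter_sets = [(0.1, 0.9, 0.05), (0.001, 0.95, 0.01), (0.5, 0.5, 0.5)]

for lam, m, u in parameter_sets:
    via_bayes = bayes(likelihood=m, prior=lam, evidence_prob=evidence(lam, m, u))
    direct = fellegi_sunter_posterior(lam, m, u)
    # Both routes should agree up to floating-point rounding
    assert abs(via_bayes - direct) < 1e-12
    print(f"lambda={lam}, m={m}, u={u}: posterior={direct:.4f}")
```

The two functions agree because the likelihood is $m$, the prior is $\lambda$, and the evidence expands to $\lambda m + (1 - \lambda) u$.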

See also this great video!
