Subsampling Bias

One aspect of Gy’s theory is the rule that a sample is guaranteed to be unbiased if and only if the sampling is correct. Gy defines sampling to be correct if every fragment of material in the lot has the same probability of being selected for the sample, and we define sampling to be fair if the sample is guaranteed to be unbiased for any possible values of the critical contents of the fragments. So, Gy’s rule equates correct sampling with fair sampling.

It is easy to see that correctness does not really guarantee an unbiased sample. Suppose the lot consists of only two fragments, and sampling is done by choosing one of them at random. Each fragment has probability 0.5 of being selected, and exactly one fragment is selected, so you could flip a coin to decide which one is chosen. This sampling is correct according to Gy’s definition of correctness. But suppose one of the two fragments (fragment 1) has mass 99 g and critical content 0 (no analyte), and the other (fragment 2) has mass 1 g and critical content 1 (pure analyte). Then the critical content of the lot is (1 g) / (100 g) = 0.01, but the expected critical content of the sample is 0.5(0) + 0.5(1) = 0.5. Clearly, the sample is biased, so the sampling isn’t fair according to McCroan’s definition of fairness.
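The arithmetic of this counterexample can be checked with a short script. The following is only a minimal sketch: the function names lot_critical_content and expected_sample_content are illustrative choices, not part of Gy’s theory, and the code assumes that exactly one fragment is drawn, as in the example.

```python
from fractions import Fraction

# Two-fragment lot from the example: (mass in grams, critical content).
# Fragment 1: 99 g, no analyte. Fragment 2: 1 g, pure analyte.
fragments = [(Fraction(99), Fraction(0)), (Fraction(1), Fraction(1))]


def lot_critical_content(fragments):
    """Mass-weighted mean critical content of the whole lot."""
    total_mass = sum(mass for mass, _ in fragments)
    return sum(mass * content for mass, content in fragments) / total_mass


def expected_sample_content(fragments, probs):
    """Expected critical content when exactly one fragment is drawn,
    fragment i being chosen with probability probs[i] (assumption:
    a single-fragment sample, as in the text)."""
    return sum(p * content for p, (_, content) in zip(probs, fragments))


# "Correct" sampling in Gy's sense: equal selection probabilities.
equal_probs = [Fraction(1, 2), Fraction(1, 2)]

lot = lot_critical_content(fragments)
expected = expected_sample_content(fragments, equal_probs)
bias = expected - lot

print(lot, expected, bias)  # 1/100 1/2 49/100
```

Exact fractions are used instead of floating point so that the bias of 0.49 is not blurred by rounding.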

It is also easy to see what is necessary in this example to ensure zero bias. The probability of selecting each fragment must be proportional to its mass. So, the probability of selecting fragment 1 should be 0.99 and the probability of selecting fragment 2 should be 0.01. If the critical content of fragment 1 is a1 and the critical content of fragment 2 is a2, then the critical content of the lot is:

(a1 (99 g) + a2 (1 g)) / (100 g) = 0.99 a1 + 0.01 a2.

If the fragment selection probabilities are proportional to mass, then there is probability 0.99 of selecting fragment 1, in which case the critical content of the sample is a1, and there is probability 0.01 of selecting fragment 2, in which case the critical content of the sample is a2. So, the expected critical content of the sample is:

0.99 a1 + 0.01 a2

which exactly equals the critical content of the lot. Since this equality holds regardless of the values of a1 and a2, the sample is fair.
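The claim that the equality holds for any a1 and a2 can be spot-checked numerically. The sketch below assumes the same two-fragment lot (99 g and 1 g) and a single-fragment sample; the helper names and the grid of test values are hypothetical, chosen only for illustration. A finite grid is of course not a proof, but the algebra above shows why every case comes out to zero.

```python
from fractions import Fraction
from itertools import product

# Fragment masses in grams, taken from the example.
masses = [Fraction(99), Fraction(1)]
total_mass = sum(masses)

# Selection probabilities proportional to mass: 99/100 and 1/100.
probs = [m / total_mass for m in masses]


def lot_content(contents):
    """Critical content of the lot: mass-weighted mean of a1, a2."""
    return sum(m * a for m, a in zip(masses, contents)) / total_mass


def expected_content(contents):
    """Expected critical content of a one-fragment sample drawn
    with mass-proportional probabilities."""
    return sum(p * a for p, a in zip(probs, contents))


# Illustrative grid of critical contents: 0, 1/4, 1/2, 3/4, 1.
grid = [Fraction(k, 4) for k in range(5)]

# Bias for every (a1, a2) pair on the grid.
biases = [expected_content(c) - lot_content(c) for c in product(grid, repeat=2)]
all_zero = all(b == 0 for b in biases)

print(probs[0], probs[1], all_zero)  # 99/100 1/100 True
```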

Admittedly, this example is somewhat unrealistic, because the lot (i.e., the laboratory sample) usually consists of thousands or millions of particles of varying sizes and no single particle accounts for a significant fraction of the total mass of the sample (i.e., the subsample). In these situations, Gy’s rule is almost true and seems to be the only practical approach for keeping the sampling bias negligible. However, the example shows that the rule is not quite true, and it makes a mathematician wonder what kind of sampling would ensure the bias is exactly zero.