McCroan/Bias

Subsampling Bias (Continued)

This page takes you down the rabbit hole.

Since there is a difference between stating that a sample is unbiased and stating that it is unbiased for all possible values of a₁, a₂, …, a_N, we offer the following definition.

Definition A sample S from lot L is fair with respect to the fragment masses m₁, m₂, …, m_N if E(a_S) = a_L for all possible values of a₁, a₂, …, a_N.

When the sample consists of only one fragment, it is easy to eliminate sampling bias: simply make each fragment’s selection probability proportional to its mass, as in the example described earlier. It is not completely obvious how to generalize this sampling strategy when the sample consists of more than one fragment. However, while studying Gy’s sampling theory and considering his definition of correct sampling, I (Keith) proved the result shown below as Theorem B.1. If you were already aware of this theorem, please email me at Work@mccroan.com. The theorem describes sufficient conditions to ensure that the sampling bias for a single sample is zero. (The conditions are sufficient but not necessary.)

Theorem B.1 Let F be a random member of L, and let R be a random subset of L \ {F} such that:

the index F and the size |R| are independent random variables,
Pr[ F = j ] = m_j / m_L for all j ∈ L, and
for all integers j ∈ L and all subsets G and H of L \ { j }, if |G| = |H|, then Pr[ R = G | F = j ] = Pr[ R = H | F = j ].

Let S = {F} ∪ R, where ∪ denotes union of sets. Then S is fair.

The theorem implies that to ensure the sample is unbiased, it suffices to choose the first fragment for the sample on the basis of mass (i.e., with each fragment’s selection probability proportional to its mass) and then to select all the other fragments at random, with the added condition that the number of fragments selected cannot be dependent on which fragment is chosen first. So, although completely correct sampling as defined by Gy does not guarantee zero bias, there are bias-free methods of sampling that are incorrect only in the manner of selecting the first fragment.

The theorem is presented above in the form I first imagined, thinking of an actual procedure for choosing the fragments of the sample. A mathematically simpler form appears below as Corollary B2.2. The version that appears below makes it explicit that the conditional probability Pr[ R = G | F = j ] in part (iii) above is a function of the number of fragments |G| and does not depend on j.

Note 1: This result obviously doesn’t have much practical usefulness for typical laboratory samples, which contain thousands or millions of fragments.

Note 2: The fact that the portion of material removed from the lot is an unbiased sample does not necessarily imply that the portion of material left behind is also an unbiased sample. How to split a lot into p nonempty fractions such that each fraction is an unbiased sample is another mathematically interesting problem.

Example: Suppose a lot contains 4 fragments, with masses 1, 2, 5, and 12 (grams). How can one take a sample consisting of exactly two fragments in a manner that guarantees the expected critical content of the sample equals the critical content of the lot?

Answer: Choose the first fragment on the basis of mass. Then choose one of the remaining fragments at random, without regard to mass. When the first fragment is selected, the selection probabilities are as follows:

Fragment	Mass / g	Selection probability
A	1	1/20
B	2	1/10
C	5	1/4
D	12	3/5

E.g., suppose fragment D is chosen first (as it will be three times out of five). Then each of the other three fragments (A, B, C) has probability 1/3 of being selected as the second fragment.

Theorem B.1 implies that regardless of the actual critical content of each of the four fragments, the expected critical content of the sample equals the critical content of the lot.

The proof of Theorem B.1 uses the following lemma.

Lemma B.1.1 Let S be a sample from the lot L. Then S is fair if and only if for all j ∈ L the following equation is true:

Proof…

Proof: (⇒, the only if part) First assume S is fair, and hence unbiased for all values of a₁, …, a_N. Fix j ∈ L and for each i ∈ L, let a_i = δ_ij, where δ denotes the Kronecker delta. So,

a₁ = 0, a₂ = 0, …, a_{j − 1} = 0, a_j = 1, a_{j + 1} = 0, …, a_N = 0

Then:

So,

Then divide m_j from both sides to get:

(⇐, the if part) Now let a₁, …, a_N be arbitrary and suppose for all j ∈ L,

Then:

So, S is unbiased for all a₁, …, a_N and thus is fair.

The lemma is unremarkable. If you ask yourself what it takes to ensure the sample is unbiased, you’re led quickly and naturally to this result. On the other hand Theorem B.1 wasn’t so obvious to me. I wondered for some time how to make any use of the lemma but had little insight until I came up with Theorem B.1 as a conjecture. At first it seemed ridiculous, and I thought it would be amazing if true. Then I proved that it was true, and I was amazed.

I no longer include the original proof of Theorem B.1, because I later looked for and found a more general version of the theorem, which seemed rather obvious in retrospect, but whose statement may be harder to understand at first. The more general theorem and its proof appear below.

Theorem B.2 Let F be a random member of L and S a random sample from L such that:

Pr[ F = j ] = m_j / m_L for all j ∈ L, and
there is a function w defined on the nonempty subsets of L, taking values in the interval [0,1], such that for all subsets ∅ ≠ G ⊆ L

Then S is fair.

Proof…

Proof: For any ∅ ≠ G ⊆ L,

For any j ∈ L,

So, for all j ∈ L,

Then Lemma B1.1 implies S is fair.

In fact any fair sample distribution can be realized in the manner of Theorem B.2—i.e., by choosing the first fragment F on the basis of mass and then completing the sample using conditional probabilities as indicated.

Now we can start to consider how to derive Theorem B.1 from Theorem B.2. First just try to rewrite Theorem B.2, like Theorem B.1, in terms of the random fragment F and the set R = S \ {F}. The result is the following corollary.

Corollary B.2.1 Let F be a random member of L and R a random proper subset of L, possibly void, such that:

Pr[ F = j ] = m_j / m_L for all j ∈ L, and
there is a function w defined on the nonempty subsets of L, taking values in the interval [0,1], such that for all proper subsets H ⊂ L,

Let S = {F} ∪ R. Then S is fair.

The preceding corollary is still more general than Theorem B.1, but the next corollary is not.

Corollary B.2.2 Let F be a random member of L and R a random proper subset of L, possibly void, such that:

Pr[ F = j ] = m_j / m_L for all j ∈ L, and
there is a function f : {0, 1, …, N − 1} → [0,1] such that for all proper subsets H ⊂ L,

Let S = {F} ∪ R. Then S is fair.

Corollary B.2.2 is essentially equivalent to Theorem B.1, although its presentation is mathematically simpler. To prove Corollary B.2.2 let w(G) = f (| G | − 1).

And there is the following.

Corollary B.2.3 Let F be a random member of L and R a random subset of L, possibly void, such that:

Pr[ F = j ] = m_j / m_L for all j ∈ L, and
there is a function f : {0, 1, …, N} → [0,1] such that Pr[ R = G | F ] = f (| G |) for all subsets G ⊆ L.

Let S = {F} ∪ R. Then S is fair.

This corollary differs from Corollary B.2.2 in that it allows the possibility that F ∈ R. To prove it from the main theorem, let w(G) = f (| G |) + f (| G | − 1).

Corollary B.2.4 Suppose F is a random member of L such that Pr[ F = j ] = m_j / m_L for all j ∈ L and Pr[ F ∈ S ] = 1. Then S is fair if and only if for all j ∈ L,

Proof: Let w(G) = Pr[ S = G | F ∈ G ]. The fact that Pr[ F = j ] = m_j / m_L for all j ∈ L implies that Pr[ F ∈ G ] = m_G / m_L for all G ⊆ L. And since Pr[ F ∈ S ] = 1, we get Pr[ S = G ] = Pr[ S = G and F ∈ G ] = Pr[ F ∈ G ] Pr[ S = G | F ∈ G ] = w(G) m_G / m_L. The theorem follows.

Choosing even one fragment F on the basis of mass is generally impractical because of the large number of fragments whose masses must be known. Although practicality is not really an important consideration in this analysis, a more practical approach to provably fair sampling would be to partition the N fragments into some relatively small number of nonempty cells and then choose one of those cells at random to be the sample, with each cell’s selection probability proportional to its mass. However, there are many fair sample distributions that cannot be realized in this manner. For example, if N ≥ 3, it is possible to devise a sample distribution that ensures a sample of size N − 1, but the distribution cannot be produced by partitioning the fragments into cells and choosing a cell at random, because in any partition some cells will always have sizes different from N − 1.

Lemma B.1.1 can be restated as follows:

Lemma B.1.1′ Let S be a sample from L and let Γ denote the collection of all nonempty subsets of L:

Then S is fair if and only if there is a function w : Γ → [0,1] such that:

, for all j ∈ L, and

, for all G ∈ Γ.

So, if we define the function w by , then the sample is fair if and only if for all j. In fact, it is easy to show that the sample is fair if and only if there is any constant c such that for all j, because in this case c = 1.

How do we interpret w(G) for a fair sample S? If we implement S by choosing its first fragment F with mass-based selection probabilities and then completing it as described above, then w(G) = Pr[ S = G | F ∈ G ]. Alternatively, if we take a random or fixed partition P of L and then randomly select one of its cells with mass-based selection probabilities as described above, then w(G) = Pr[ G ∈ P ]. In the latter case, the fact that is a simple consequence of the fact that P is a partition of L.

Bias – p. 3

Subsampling Bias (Continued)