Subsampling

Laboratory Subsampling

The French geologist and statistician Pierre Gy developed a theory of material sampling that applies to the subsampling of solid materials in the laboratory for radiochemical analysis. The complete theory is extensive and covers many aspects of sampling for lots of 0, 1, 2, or 3 dimensions. However, a laboratory sample corresponds to what Gy calls a 0-dimensional lot, the simplest kind of lot.

The aspects of the theory of 0-dimensional lots considered here include:

How to reduce or eliminate sampling bias
How to reduce the sampling variance
How to quantify the fundamental sampling variance

Definitions

Certain terms will be used repeatedly and need to be defined up front.

The lot is the collection of solid material whose properties are of interest. The lot consists of N fragments, or particles, where N is typically a very large number. For our purposes the lot is usually a laboratory sample.

The critical component is the component of interest (analyte) in the lot. The critical content of the lot or of any portion of the lot is the ratio of the mass of the critical component to the total mass of the lot or portion. For our purposes the critical component is usually a radionuclide, such as ¹³⁷Cs or ²³⁸Pu, and the critical content is the mass fraction of that radionuclide (which is proportional to the specific activity).

The sample is a random nonempty subset of the fragments of the lot. The sample is taken from the lot, and its measured properties are assumed to represent those of the lot. For our purposes the sample is usually a subsample, or aliquot, taken from a laboratory sample (the lot) for radiochemical analysis. To avoid confusion, we will generally not use the sample/subsample terminology and will speak only of the lot and the sample.

Gy defines a probabilistic sample to be correct if every fragment in the lot has the same probability of being included in the sample. A probabilistic sample is biased if the expected value (or mean) of the critical content of the sample differs from the critical content of the lot; otherwise it is unbiased. To say that a sample is unbiased does not mean that its critical content exactly equals that of the lot. It means only that the sample is selected in such a way that, if the sampling could be repeated many times (with replacement, that is, returning each sample to the lot before the next one is drawn), the average critical content of the samples would equal the critical content of the lot.
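The definition of an unbiased sample can be made concrete by simulating the repeated sampling it describes. The Python sketch below is illustrative only: the helper names `critical_content` and `mean_sample_content`, the use of simple random sampling of k fragments, and the four-fragment lot are hypothetical assumptions, not part of Gy's theory. Each fragment has inclusion probability k/N, so the sample is correct in Gy's sense.

```python
import random

def critical_content(masses, amounts, subset):
    """Critical content a_G = A_G / m_G of the fragments whose indices are in subset."""
    return sum(amounts[i] for i in subset) / sum(masses[i] for i in subset)

def mean_sample_content(masses, amounts, k, trials=100_000, seed=1):
    """Average critical content over many repeated samples (Monte Carlo).

    Hypothetical scheme: each trial draws k distinct fragments uniformly at
    random, so every fragment has inclusion probability k/N. The lot is left
    intact between trials, i.e. each sample is replaced before the next draw.
    """
    rng = random.Random(seed)
    indices = range(len(masses))
    total = 0.0
    for _ in range(trials):
        total += critical_content(masses, amounts, rng.sample(indices, k))
    return total / trials

# Hypothetical lot: four equal-mass fragments, all critical component in fragment 0.
masses = [1.0, 1.0, 1.0, 1.0]   # m_i
amounts = [0.4, 0.0, 0.0, 0.0]  # A_i
a_L = critical_content(masses, amounts, range(len(masses)))
print("lot critical content a_L:", a_L)
print("mean over repeated samples:", mean_sample_content(masses, amounts, k=2))
```

Every individual two-fragment sample here has critical content 0.2 or 0.0, never the lot value a_L = 0.1, yet the average over many repetitions approaches 0.1. That is exactly what "unbiased" does and does not promise.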

It is not always possible to determine from the manner of selecting a sample whether it is biased, because bias also depends on the masses and critical contents of the fragments. For example, if every fragment has exactly the same critical content, then any sample is unbiased; but if a single fragment contains all of the lot's critical component, unbiased sampling requires more care. In practice, one never knows the critical contents of the fragments, so one would like a sample that is unbiased regardless of what those critical contents might be.
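Both extremes in the paragraph above can be checked exactly for a small lot by enumerating every possible sample. The Python sketch below assumes, hypothetically, that the sample is a simple random sample of k fragments (every k-fragment subset equally likely); the function `exact_bias` and the example masses and amounts are illustrative inventions.

```python
from itertools import combinations
from math import comb

def critical_content(masses, amounts, subset):
    """a_G = A_G / m_G for the fragments whose indices are in subset."""
    return sum(amounts[i] for i in subset) / sum(masses[i] for i in subset)

def exact_bias(masses, amounts, k):
    """Exact bias E[a_S] - a_L when S is a simple random sample of k fragments
    (every k-fragment subset equally likely), found by enumerating all subsets."""
    n = len(masses)
    a_L = critical_content(masses, amounts, range(n))
    mean = sum(critical_content(masses, amounts, s)
               for s in combinations(range(n), k)) / comb(n, k)
    return mean - a_L

# Case 1: every fragment has critical content 0.05, masses unequal.
masses = [1.0, 2.0, 3.0, 4.0]
print(exact_bias(masses, [0.05 * m for m in masses], k=2))

# Case 2: one fragment holds all of the critical component.
print(exact_bias([1.0, 1.0, 1.0, 1.0], [0.4, 0.0, 0.0, 0.0], k=2))  # equal masses
print(exact_bias([4.0, 1.0, 1.0, 1.0], [0.4, 0.0, 0.0, 0.0], k=2))  # heavy fragment
```

The first two results are zero up to floating-point rounding. The third is about -0.0171: the expected sample content is 0.04, while a_L = 0.4/7 ≈ 0.0571, so in this hypothetical lot the scheme underestimates the critical content even though every fragment has the same chance of selection.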

We will define a sample to be fair if it is unbiased for any possible values of the critical contents of the fragments. Whether a sample is fair depends on the fragment masses but not on their critical contents.

Although fair sampling is a desirable goal in theory, correct sampling is usually the more reasonable goal in practice.
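Correct and fair are different properties, which is easiest to see when the sample is a single fragment. The Python sketch below compares two hypothetical selection rules on a hypothetical four-fragment lot: choosing each fragment with equal probability 1/N (correct), and choosing fragment i with probability m_i/m_L (not correct unless all masses are equal). For the second rule, E[a_S] = Σ (m_i/m_L)(A_i/m_i) = A_L/m_L = a_L for any values of A_i, so it is fair, and the rule depends only on the masses. This is an illustration of the definitions, not a procedure attributed to Gy.

```python
def expected_content(probs, masses, amounts):
    """E[a_S] for a one-fragment sample; probs[i] is the chance fragment i is chosen."""
    return sum(p * A / m for p, m, A in zip(probs, masses, amounts))

def lot_content(masses, amounts):
    """Critical content of the whole lot, a_L = A_L / m_L."""
    return sum(amounts) / sum(masses)

masses = [4.0, 1.0, 1.0, 1.0]                  # hypothetical fragment masses m_i
n = len(masses)
equal = [1.0 / n] * n                          # correct: same chance for every fragment
by_mass = [m / sum(masses) for m in masses]    # fair, but not correct for unequal masses

# Try several hypothetical distributions of the critical component A_i.
for amounts in ([0.4, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0], [0.2, 0.1, 0.1, 0.0]):
    print(f"a_L={lot_content(masses, amounts):.4f}  "
          f"equal-prob E[a_S]={expected_content(equal, masses, amounts):.4f}  "
          f"mass-prop E[a_S]={expected_content(by_mass, masses, amounts):.4f}")
```

For the first distribution, for example, the equal-probability rule gives E[a_S] = 0.025 while a_L = 0.4/7 ≈ 0.0571; the mass-proportional rule matches a_L in every case.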

Notation

We denote the lot by L. When we write equations for sampling bias and variance, we will assume the fragments of the lot are numbered from 1 to N. Then m_i denotes the mass of fragment i, A_i denotes the mass of critical component in fragment i, and a_i denotes the critical content of fragment i. If G is any nonempty subset of the fragments, then m_G denotes the total mass of G, A_G denotes the mass of critical component in G, and a_G denotes the critical content of G. So, m_L denotes the mass of the entire lot and a_L denotes its critical content.
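Written out, these definitions give the relations below. Because A_i = m_i a_i, the critical content of any group G is the mass-weighted average of its fragments' critical contents. The last line restates the earlier definition of bias, using S (a symbol introduced here only for illustration) for the sample.

```latex
\begin{aligned}
a_i &= \frac{A_i}{m_i}, \qquad
m_G = \sum_{i \in G} m_i, \qquad
A_G = \sum_{i \in G} A_i, \\
a_G &= \frac{A_G}{m_G} = \frac{\sum_{i \in G} m_i a_i}{\sum_{i \in G} m_i}, \\
\operatorname{bias}(S) &= \operatorname{E}[a_S] - a_L .
\end{aligned}
```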