Subsampling


Laboratory Subsampling

French statistician/geologist Pierre Gy has devel­oped a theory of material sam­pling, which is appli­cable to the sub­sampling of solid materials in the labo­ra­tory for radio­chemical analy­sis. The com­plete theory is exten­sive and covers many aspects of sam­pling for lots of 0, 1, 2, or 3 di­men­sions. How­ever, a labo­ra­tory sam­ple cor­re­sponds to what Gy calls a 0-​dimen­sional lot, which is the sim­plest kind of lot.

The aspects of the theory of 0-​dimen­sional lots con­sid­ered here include:

  • How to reduce or elimi­nate sam­pling bias
  • How to reduce the sam­pling variance
  • How to quantify the funda­mental sam­pling variance

Definitions

Certain terms will be used repeatedly and need to be defined up front.

The lot is the collection of solid material whose prop­erties are of inter­est. The lot con­sists of N frag­ments, or par­ti­cles, where N is typi­cally a very large number. For our pur­poses the lot is usually a labo­ra­tory sample.

The critical component is the component of interest (analyte) in the lot. The critical content of the lot or of any portion of the lot is the ratio of the mass of the critical component to the total mass of the lot or portion. For our purposes the critical component is usually a radionuclide, such as ¹³⁷Cs or ²³⁸Pu, and the critical content is the mass fraction of that radionuclide (which is proportional to the specific activity).

The sample is a random nonempty subset of the fragments of the lot. The sample is taken from the lot, and the measured properties of the sample are assumed to represent those of the lot. For our purposes the sample is usually a subsample, or aliquot, taken from a laboratory sample (the lot) for radiochemical analysis. To prevent confusion, we will generally drop the sample/subsample terminology and speak only of the lot and the sample.

Gy defines a prob­abi­lis­tic sam­ple to be correct if every frag­ment in the lot has the same prob­abil­ity of being included in the sample. A prob­abi­lis­tic sam­ple is biased if the ex­pected value (or mean) of the criti­cal content of the sam­ple dif­fers from the criti­cal con­tent of the lot. If the sam­ple is not biased, it is unbiased. To say that a sam­ple is un­biased does not mean that its criti­cal con­tent exactly equals that of the lot. Instead, it means only that the sam­ple is selected in such a way that if the sam­pling could be repeated many times (with replace­ment), the aver­age value of the sam­ple’s criti­cal con­tent would equal the criti­cal con­tent of the lot.
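The repeated-sampling reading of "unbiased" can be illustrated with a small simulation. This is only a sketch under stated assumptions: the lot is synthetic, and the selection scheme (each fragment included independently with the same probability p, redrawing if the sample comes out empty) is one hypothetical way to obtain a correct sample, not a procedure taken from Gy. The function and variable names are ours.

```python
import random

def critical_content(fragments):
    """Return A_G / m_G for a list of (mass, critical_mass) pairs."""
    total_mass = sum(m for m, _ in fragments)
    total_critical = sum(A for _, A in fragments)
    return total_critical / total_mass

def draw_correct_sample(lot, p, rng):
    """Include each fragment independently with probability p.

    Every fragment has the same inclusion probability, so the sample is
    correct in Gy's sense. Empty draws are discarded and redrawn.
    """
    while True:
        sample = [frag for frag in lot if rng.random() < p]
        if sample:
            return sample

def mean_sample_content(lot, p, repeats, seed=0):
    """Average the sample's critical content over many repeated draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        total += critical_content(draw_correct_sample(lot, p, rng))
    return total / repeats

# Synthetic lot (assumed values): 500 fragments with varying masses
# and varying critical contents.
lot_rng = random.Random(1)
lot = []
for _ in range(500):
    m = lot_rng.uniform(0.5, 1.5)   # fragment mass
    a = lot_rng.uniform(0.0, 2e-6)  # fragment critical content
    lot.append((m, m * a))          # store (mass, mass of critical component)

a_lot = critical_content(lot)
a_mean = mean_sample_content(lot, p=0.2, repeats=2000)
print(a_lot, a_mean)
```

In this simulated lot the long-run average of the sample's critical content comes out close to that of the lot, even though individual samples scatter around it. That agreement is a property of this particular example and should not be read as a general guarantee for every lot.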

It is not always pos­sible to de­ter­mine from the man­ner of select­ing a sam­ple whether it is biased. Whether the sam­ple is biased de­pends on the masses and criti­cal con­tents of its frag­ments. For example, if every frag­ment has exactly the same criti­cal con­tent, then any sam­ple is un­biased, but if only one frag­ment con­tains all the lot’s criti­cal com­po­nent, then un­biased sam­pling requires more care. In prac­tice, one never knows the criti­cal con­tents of the frag­ments; so, one would like a sam­ple that is un­biased regard­less of what those criti­cal con­tents might be.
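The two extreme cases mentioned above can be checked by brute force on a toy lot. The sketch below assumes a hypothetical lot of four fragments with made-up masses and enumerates every nonempty subset; it is an illustration only, not a sampling procedure.

```python
import math
from itertools import combinations

def critical_content(fragments):
    """Critical content of a group of (mass, critical_mass) fragments."""
    total_mass = sum(m for m, _ in fragments)
    total_critical = sum(A for _, A in fragments)
    return total_critical / total_mass

def nonempty_subsets(lot):
    """Yield every nonempty subset of the lot as a tuple of fragments."""
    for size in range(1, len(lot) + 1):
        yield from combinations(lot, size)

masses = (0.2, 0.5, 1.0, 2.3)  # assumed fragment masses

# Case 1: every fragment has the same critical content (1e-6).
uniform_lot = [(m, m * 1e-6) for m in masses]
a_uniform = critical_content(uniform_lot)
uniform_ok = all(
    math.isclose(critical_content(s), a_uniform)
    for s in nonempty_subsets(uniform_lot)
)

# Case 2: a single fragment holds all of the critical component.
hot_lot = [(0.2, 0.0), (0.5, 0.0), (1.0, 4e-6), (2.3, 0.0)]
a_hot = critical_content(hot_lot)
hot_values = [critical_content(s) for s in nonempty_subsets(hot_lot)]
misses = sum(1 for v in hot_values if v == 0.0)

print(uniform_ok)
print(a_hot, min(hot_values), max(hot_values), misses)
```

In the first case every possible sample reproduces the lot's critical content, so no selection method can introduce bias. In the second case a sample either misses the critical component entirely or has a critical content at least as large as the lot's, so whether the average comes out right depends entirely on how the samples are selected.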

We will define a sam­ple to be fair if it is un­biased for any pos­sible values of the criti­cal con­tents of the frag­ments. Whether a sample is fair de­pends on the frag­ment masses but not on their criti­cal contents.
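The dependence of fairness on the fragment masses can be seen in an exact calculation on tiny lots. The sketch below assumes a hypothetical selection scheme in which every nonempty subset of the lot is equally likely; this gives each fragment the same inclusion probability, so the sample is correct. The masses and critical contents are made-up values, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations

def lot_content(masses, contents):
    """Exact critical content a_L of the whole lot."""
    m_L = sum(Fraction(m) for m in masses)
    A_L = sum(Fraction(m) * Fraction(a) for m, a in zip(masses, contents))
    return A_L / m_L

def expected_sample_content(masses, contents):
    """Exact expected a_S when every nonempty subset is equally likely."""
    n = len(masses)
    subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
    total = Fraction(0)
    for s in subsets:
        m_S = sum(Fraction(masses[i]) for i in s)
        A_S = sum(Fraction(masses[i]) * Fraction(contents[i]) for i in s)
        total += A_S / m_S
    return total / len(subsets)

# Equal masses: the expected sample content matches the lot.
equal_masses = [1, 1, 1]
contents_a = [Fraction(1, 10), Fraction(0), Fraction(3, 10)]
print(lot_content(equal_masses, contents_a),
      expected_sample_content(equal_masses, contents_a))

# Unequal masses: the same scheme is biased for these critical contents.
unequal_masses = [1, 3]
contents_b = [Fraction(2, 5), Fraction(0)]
print(lot_content(unequal_masses, contents_b),
      expected_sample_content(unequal_masses, contents_b))
```

With equal masses the expected value equals the lot's critical content, and by symmetry this holds whatever critical contents are assigned. With masses 1 and 3 the same correct scheme gives an expected value of 1/6 against a lot value of 1/10, so the sample is correct but not fair.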

Fair sampling is the ideal in theory, but in normal practice correct sampling seems to be the more reasonable goal.

Notation

We denote the lot by $L$. When we write equations for sampling bias and variance, we will assume the fragments of the lot are numbered from 1 to $N$. Then $m_i$ denotes the mass of fragment $i$, $A_i$ denotes the mass of critical component in fragment $i$, and $a_i$ denotes the critical content of fragment $i$. If $G$ is any nonempty subset of the fragments, then $m_G$ denotes the total mass of $G$, $A_G$ denotes the mass of critical component in $G$, and $a_G$ denotes the critical content of $G$. So, $m_L$ denotes the mass of the entire lot and $a_L$ denotes its critical content.
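For reference, the relationships among these quantities can be written out explicitly. The formulas below add nothing new; they follow directly from the definitions of mass, critical component, and critical content given above.

```latex
\begin{align*}
a_i &= \frac{A_i}{m_i}, \qquad i = 1, \dots, N \\
m_G &= \sum_{i \in G} m_i \\
A_G &= \sum_{i \in G} A_i = \sum_{i \in G} a_i m_i \\
a_G &= \frac{A_G}{m_G} = \frac{\sum_{i \in G} a_i m_i}{\sum_{i \in G} m_i} \\
a_L &= \frac{A_L}{m_L} = \frac{\sum_{i=1}^{N} a_i m_i}{\sum_{i=1}^{N} m_i}
\end{align*}
```

Note that the critical content of a group is the mass-weighted average of the critical contents of its fragments, not their simple average.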