# Decay & Ingrowth

### Decay

In the radiochemistry business one often needs to calculate decay and ingrowth factors. A decay factor is the expected fraction of a radionuclide remaining after a specified time t has elapsed, or, equivalently, the probability that an atom of that radionuclide will survive at least until time t without decaying. Calculating a decay factor is fairly trivial. It equals e^(−λt), where λ is the decay constant for the radionuclide. It's as simple as that.

In what follows it will be convenient to have a symbol for the function t ↦ e^(−λt). We will use Dλ, where the D obviously means "decay."
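
As a minimal illustration (a Python sketch; the helper name and the half-life parameterization are my own, not from the text), Dλ(t) can be computed from a half-life via λ = ln 2 / T½:

```python
import math

def decay_factor(t, half_life):
    """Decay factor D_lambda(t) = exp(-lambda * t), with
    lambda = ln(2) / half_life (t and half_life in the same units)."""
    lam = math.log(2.0) / half_life
    return math.exp(-lam * t)
```

With t equal to one half-life, the result is 0.5, as expected.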

### Ingrowth

The problem becomes more interesting when one needs to calculate an ingrowth factor, defined as the expected amount of a decay product that will exist at a later time because of ingrowth from a specified ancestor. The value of the ingrowth factor depends on whether one defines the amount of a nuclide in terms of the number of atoms or the activity. We will use the number of atoms here, but the conversion to an activity basis is simple (because A = λN).

Consider a simple non-branching decay chain:

S1 → S2 → ⋯ → Sn

where each Si denotes a state in the chain, with decay constant λi. The original ancestor is state S1 and the final decay product of interest is state Sn. If the final state is stable, then λn = 0 s^(−1). There are many examples of branching decay, where the final state can be reached by following any of several paths through a branching decay diagram starting from the same initial state. (Mathematically speaking, the decay diagram is a directed acyclic graph, or DAG, because radionuclides decay only to lower energy states, not higher states.) The ingrowth factor in this case is just a weighted sum of the ingrowth factors for the individual paths, with each path's weight equal to its relative probability of being taken (the product of the branching fractions along the path). Each of these paths looks like the simple linear chain shown above. So, we will focus on non-branching decay.

The ingrowth factor IF, when defined in terms of numbers of atoms, equals the probability that an atom in state S1 at time 0 will be in state Sn after time t has elapsed. The probability of this event (En) can be calculated from the distributions of the decay times, T1, T2, …, Tn, of the n states. Each time Ti has an exponential distribution, with a probability density function (pdf) given by

fTi(t) = λi e^(−λi t), for t ≥ 0

The pdf for a sum of independent decay times T1 + T2 + ⋯ + Tn is the convolution of the pdfs of the individual times:

fT1+T2+⋯+Tn(t) = (fT1 ∗ fT2 ∗ ⋯ ∗ fTn)(t)

where the convolution operator ∗ is defined for any two functions f(t) and g(t) by

(f ∗ g)(t) = ∫0t f(τ) g(t − τ) dτ

The operator ∗ is associative and commutative. Associativity implies that if we have functions f1(t), f2(t), …, fn(t), we can write (f1 ∗ f2 ∗ ⋯ ∗ fn)(t) without ambiguity. Commutativity implies that (f ∗ g)(t) = (g ∗ f)(t).

Note: The definition implies that (f ∗ g)(0) = 0. In fact the first n − 2 derivatives of a convolution (f1 ∗ f2 ∗ ⋯ ∗ fn)(t) are also zero at t = 0.

An atom initially in state S1 will be in state Sn after time t has elapsed iff (if and only if) it decays through the first n − 1 states before time t and then remains in state Sn till time t without decaying further. So, the event En occurs iff T1 + T2 + ⋯ + Tn−1 ≤ t and Tn > t − (T1 + T2 + ⋯ + Tn−1). So, the atom ingrowth factor IFatom is equal to the probability Pr[En], given by:

IFatom = Pr[En] = λ1 λ2 ⋯ λn−1 (Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t)
Notice that if Sn is stable, then λn is zero and IFactivity = 0, but IFatom ≠ 0 unless the ingrowth time is zero.
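
For the simplest chain, a parent with a single daughter (n = 2), this probability has the familiar closed form λ1 (e^(−λ1t) − e^(−λ2t)) / (λ2 − λ1). A minimal Python sketch (the function name is mine):

```python
import math

def ingrowth_factor_atom2(t, lam1, lam2):
    """Atom-basis ingrowth factor for a two-member chain S1 -> S2:
    the probability that an atom in state S1 at time 0 is in state S2
    at time t (the classic parent-daughter result)."""
    return lam1 * (math.exp(-lam1 * t) - math.exp(-lam2 * t)) / (lam2 - lam1)
```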

In order to calculate either of these factors, one needs to be able to calculate the convolution of the exponential functions Dλ1(t), Dλ2(t), …, Dλn(t). The theoretical solution to this problem is well known. The difficulties arise when one tries to implement the algorithm efficiently for a computer in a manner that avoids severe round-off error.

If all the decay constants are distinct (the usual case in the real world), the convolution is given explicitly by

(Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) = Σi=1..n e^(−λi t) / ∏j≠i (λj − λi)  (∗)

which is easily coded in a programming language like C, C++, C#, Java, or Fortran. (For the case where some of the λi are repeated, see below.) Equation (∗) may look familiar to those who have dealt with the ingrowth problem before.
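
For example, a direct transcription of Equation (∗) in Python (a sketch; the function name is mine) looks like this, and it inherits the round-off problems discussed below:

```python
import math

def conv_exp_distinct(t, lambdas):
    """(D_l1 * D_l2 * ... * D_ln)(t), computed term by term from
    Equation (*); valid only when all decay constants are distinct."""
    total = 0.0
    for i, li in enumerate(lambdas):
        denom = 1.0
        for j, lj in enumerate(lambdas):
            if j != i:
                denom *= lj - li          # product over j != i of (lj - li)
        total += math.exp(-li * t) / denom
    return total
```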

Note: The value of the convolution is zero at t = 0, as mentioned above, although this fact may not be immediately obvious from Equation (∗). The function's first n − 2 derivatives are also zero at t = 0. So, for k = 0, 1, …, n − 2,

Σi=1..n (−λi)^k / ∏j≠i (λj − λi) = 0

### Rounding Error

Consider the simplest nontrivial example, where n = 2. In this case Equation (∗) simplifies to the following:

(Dλ1 ∗ Dλ2)(t) = (e^(−λ1 t) − e^(−λ2 t)) / (λ2 − λ1)

This convolution can be used when calculating the ingrowth of an immediate daughter product from a parent. Even in this simple example there is the possibility of large rounding error. Suppose t is very small. In this case the numerator evaluates to the difference between two numbers that are nearly equal to 1. The most straightforward implementation of the equation in a computer language will lead to an unnecessary loss of accuracy, possibly even returning zero in situations where the true value, although it may be small, is well within the machine's range of positive floating-point values.

An obvious fix in this simple case when t is small is to use the Maclaurin series for the convolution (see below). However, there is an even easier fix in any programming language that has a library function for the hyperbolic sine (`sinh`):

(Dλ1 ∗ Dλ2)(t) = e^(−λ̄t) sinh(bt) / b

where λ̄ = (λ1 + λ2) / 2, and b = (λ2 − λ1) / 2.

Warning: Do not use this trick in Microsoft Excel. Excel, or at least the 2007 version on my home computer, uses a naïve implementation of `sinh`. It calculates `sinh` with the following equation, which fails for extremely small values of the argument:

sinh(x) = (e^x − e^(−x)) / 2

The Numbers spreadsheet program by Apple uses a smarter implementation. Newer versions of Excel may also do it better.

The C language and some implementations of its derivatives also provide a library function `expm1(x)`, which calculates e^x − 1 accurately. The convolution equation can be written using `expm1` as follows:

(Dλ1 ∗ Dλ2)(t) = −e^(−λ1 t) expm1(−dt) / d

where d = λ2 − λ1. If you have the `expm1` function in your library, you should use it here.
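
Python's `math.expm1` provides the same safeguard, so the rewritten equation can be sketched as follows (the function name is mine):

```python
import math

def conv2_expm1(t, lam1, lam2):
    """(D_l1 * D_l2)(t) = (exp(-l1 t) - exp(-l2 t)) / (l2 - l1),
    rewritten with expm1 to avoid cancellation when d * t is tiny."""
    d = lam2 - lam1
    return -math.exp(-lam1 * t) * math.expm1(-d * t) / d
```

For very small t this returns a value close to t e^(−λ1t), whereas the naive difference of exponentials may round all the way to zero.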

For many of the most common ingrowth examples in the radiochemistry lab, either of the two preceding equations will suffice. If there were no more complicated situations than this, the problem would be completely solved; however, there are more complicated examples where n > 2. For instance, one may want to calculate the ingrowth factor for a descendant of a very long-lived radionuclide, where some of the intermediate states are long-lived and some are short-lived. This is another situation that can lead to severe round-off error. (E.g., try calculating 1 year's worth of 222Rn ingrowth from 238U.)

Equation (∗) has the potential to produce excessive round-off error whenever there is a cluster of distinct values λit that are too close together. This can happen when the values are large, but it is more common when they are small, as in the simple (n = 2) example described above when t is small, or in the 238U example, where several of the half-lives are very long. Whenever λit and λjt are small and almost equal, but not quite, Equation (∗) has large positive and negative terms that almost cancel each other, but not quite. The rounding error in this case may be larger than the actual value.

### The Maclaurin Series

When t is small enough, the Maclaurin series for the convolution gives fast and accurate results. This series can be written as follows:

(Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) = (t^(n−1) / (n−1)!) Σk=0..∞ (−t)^k hk(λ1, λ2, …, λn) / (n)k

where (n)k is the Pochhammer symbol, or the rising factorial, (n)k = (n)(n + 1)(n + 2)⋯(n + k − 1), and where hk(x1, x2, …, xn) denotes the complete homogeneous symmetric polynomial of degree k, in n variables, defined by

hk(x1, x2, …, xn) = Σ over 1 ≤ i1 ≤ i2 ≤ ⋯ ≤ ik ≤ n of xi1 xi2 ⋯ xik

The rising factorial is easy to evaluate incrementally for successive values of k, because (n)k+1 = (n)k (n + k). Fortunately hk(x1, x2, …, xn) is also easy to evaluate incrementally.

• h0(x1, x2, …, xn) = 1
• hk+1(x1) = x1^(k+1) = x1 hk(x1)
• hk+1(x1, x2, …, xi+1) = hk+1(x1, x2, …, xi) + xi+1 hk(x1, x2, …, xi+1)

So, the Maclaurin series is easy to code. It also has the virtue of converging for all values of t, but the convergence is rapid enough to be practical for computation only if all the λit are somewhat smaller than 1. There are many possible scenarios where some of the λit are too large to give rapid convergence of the Maclaurin series while others are so small (and therefore close together) that they lead to unacceptable round-off error in Equation (∗).
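
As a sketch of one way to code it (Python; the names are mine), the series can be summed using the two incremental recurrences above, stopping when a term no longer changes the running sum:

```python
import math

def conv_maclaurin(t, lambdas, tol=1e-16, max_terms=200):
    """(D_l1 * ... * D_ln)(t) summed from the Maclaurin series
    t^(n-1)/(n-1)! * sum_k (-t)^k h_k(lambdas) / (n)_k."""
    n = len(lambdas)
    prefix = [1.0] * n          # prefix[i] = h_k(lambdas[0..i]), k = 0
    poch = 1.0                  # Pochhammer (n)_k, starts at (n)_0 = 1
    power = 1.0                 # (-t)^k
    total = 0.0
    for k in range(max_terms):
        term = power * prefix[-1] / poch
        total += term
        if abs(term) <= tol * abs(total):
            break
        # update h_k -> h_{k+1} across prefixes of the argument list
        new = [lambdas[0] * prefix[0]]
        for i in range(1, n):
            new.append(new[-1] + lambdas[i] * prefix[i])
        prefix = new
        poch *= n + k           # (n)_{k+1} = (n)_k (n + k)
        power *= -t
    return t ** (n - 1) / math.factorial(n - 1) * total
```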

When all the λit are small enough, the convolution is approximated well by the first one or two nonzero terms of the Maclaurin series:

(Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) ≈ (t^(n−1) / (n−1)!) (1 − t(λ1 + λ2 + ⋯ + λn) / n)

The (atom) ingrowth factor is then approximated by

IFatom ≈ λ1 λ2 ⋯ λn−1 (t^(n−1) / (n−1)!) (1 − t(λ1 + λ2 + ⋯ + λn) / n)

An example of a scenario where this approximation is useful is the calculation of ingrowth in the decay chain 234U → 230Th → 226Ra, where all three half-lives are fairly long. Using Equation (∗) as written, with ordinary double-precision arithmetic, to calculate the ingrowth of 226Ra over a time interval shorter than a year is likely to produce a larger rounding error than using just the first two terms of the Maclaurin series.

Note: By comparing the coefficients in the Maclaurin series above with the derivatives of Equation (∗) evaluated at t = 0, one finds that

hk(λ1, λ2, …, λn) = (−1)^k Σi=1..n (−λi)^(n+k−1) / ∏j≠i (λj − λi)

If all the λit are approximately equal but none is small, the following identity allows one to compute the convolution rapidly using a Maclaurin series (e.g., with c equal to the average of the λi):

(Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) = e^(−ct) (Dλ1−c ∗ Dλ2−c ∗ ⋯ ∗ Dλn−c)(t)

However, such a situation is unlikely to arise in practice. So, it seems that the most obvious approach to the problem of round-off error won't completely solve it.

### The Laplace Transform

Before exploring other options, it will help to introduce a powerful tool for dealing with the problem: the Laplace transform. If f(t) is a function of a non-negative real variable t, the Laplace transform of f(t) is the function F(s) defined by

F(s) = ∫0∞ e^(−st) f(t) dt

where s and F(s) are generally considered to be complex.

If F(s) is the Laplace transform of a function f(t), as shown above, and if F(s) is meromorphic with simple poles c1, c2, …, cn, which may be either real or complex, then its inverse Laplace transform is

f(t) = Σi=1..n Res s=ci {e^(st) F(s)}

where Res s=ci {e^(st) F(s)} means the residue of e^(st) F(s) at the pole ci.

Many useful facts about the function f(t) = (Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) can be derived by algebraic manipulation of the transformed function F(s) followed by application of the inverse transform. Although the same facts can be derived in other ways, the derivations using the Laplace transform tend to be easier.

Instead of applying the equations shown above for the Laplace transform and its inverse, it usually helps to remember some common functions and their transforms, or have access to a table of these, and to remember that both the transform and its inverse are linear.

The Laplace transform of an exponential function Dλ(t) = e^(−λt) is particularly simple:

L{Dλ}(s) = 1 / (s + λ)

By setting λ = 0, we see that

L{1}(s) = 1 / s

The Laplace transform of a convolution (as defined above) is just the product of the transforms of the individual functions:

L{f ∗ g}(s) = F(s) G(s)

where F(s) and G(s) are the Laplace transforms of f(t) and g(t), respectively. So, the Laplace transform of our convolution is the rational function:

F(s) = ∏i=1..n 1 / (s + λi)

### Applying the Laplace Transform

The transformed function can be manipulated algebraically to produce new expressions for the original convolution, and some of these new expressions lead to algorithms for computation with less round-off error. Sometimes the new expression is an infinite series, each term of which involves a convolution of exponential functions where each λi from a problematic cluster has been replaced by a single value μ, which might be one of the original λi from the cluster or perhaps the cluster average. Substituting a single value μ for several of the λi produces a convolution in which the decay constants are not all distinct and to which Equation (∗) doesn't apply.

So, now we need the more general form of Equation (∗), which doesn't require all the decay constants to be distinct. For lack of a better notation, define Gμ,n(t) to be the n-fold convolution of the single exponential function Dμ(t):*

Gμ,n(t) = (Dμ ∗ Dμ ∗ ⋯ ∗ Dμ)(t) = t^(n−1) e^(−μt) / (n−1)!

The Laplace transform of Gμ,n(t) is

L{Gμ,n}(s) = 1 / (s + μ)^n

Our next goal is to derive expressions that can be used to calculate the function g(t) defined by:

g(t) = (Gμ1,m1 ∗ Gμ2,m2 ∗ ⋯ ∗ Gμn,mn)(t)

where each mi is a positive integer. The Laplace transform of g(t) is:

G(s) = ∏i=1..n 1 / (s + μi)^mi

The function G(s) is meromorphic, because it is a rational function. In general, if F(s) is the Laplace transform of a function f(t) and F(s) is meromorphic with (real or complex) poles c1, c2, …, cn of orders m1, m2, …, mn, respectively, then

f(t) = Σi=1..n (1 / (mi − 1)!) lim s→ci (d^(mi−1) / ds^(mi−1)) [(s − ci)^mi e^(st) F(s)]

In our problem the function G(s) has poles at s = −μ1, −μ2, …, −μn. So,

where Ai, k is given by:

The expression for Ai, k doesn't look very useful for computation, but with a little work one gets the following:

where the bullet notation m•x indicates that an argument x is repeated m times in the argument list. With another abuse of notation, we can rewrite the preceding equation as:

Then we can write the more general form of Equation (∗):

 (∗∗)

Of course Equation (∗∗) has the same round-off issues as Equation (∗) when the μit are clustered together, but it works well enough when the μit are well separated, and unlike the Maclaurin series, it doesn't have slow convergence (and is less likely to overflow) when the μit are large.

Note: The fact that g(0) = 0 implies:

### Other Series

Some relevant facts about the complete homogeneous symmetric polynomials:

where each ki ≥ 0 and each zi ∈ ℂ.

hk(zσ(1), zσ(2), …, zσ(n)) = hk(z1, z2, …, zn), where σ is any permutation of the set {1, 2, …, n}.
Σk=0..∞ z^k = 1 / (1 − z), for |z| < 1

Σk=0..∞ hk(z1, z2, …, zn) = ∏i=1..n 1 / (1 − zi), if each |zi| < 1

Σk=0..∞ hk(z1, z2, …, zn) x^k = ∏i=1..n 1 / (1 − zi x), if each |zi x| < 1

Using these facts we can easily derive the Maclaurin series for f (t).

With a few changes we can derive other series. Suppose for each λi we choose a μi, which may or may not be equal to λi. Then F(s) can be rewritten as follows:

where the ki are non-negative. Then

 (∗∗∗)

This series is essentially a multivariable Taylor series in the variables λ1, λ2, …, λn. It can be made useful for calculation if we choose the μi carefully. For rapid convergence the differences |μit − λit| must be small (at least smaller than 1). The series will be simpler if there is only one or at most a few distinct μi for which μi ≠ λi, but the number of the μi needed to keep the round-off error small is really determined by the number of λi clusters, because the μi must be well separated. If there is only one cluster, the resulting series will be simple enough. Say the λi are arranged in increasing order, and the first r of them form a cluster. We can let

μ1 = μ2 = ⋯ = μr = μ = (λ1 + λ2 + ⋯ + λr) / r

and μi = λi for i > r. Then write f(t) as follows:

 (∗∗∗∗)

If there is more than one clus­ter of decay con­stants λi, then a more com­pli­cated series is needed, but in many situa­tions Equation (∗∗∗∗) suffices.

Given that μ = (λ1 + λ2 + ⋯ + λr) / r, it is important to note that the second term of this series (k = 1) is zero. So, convergence cannot be tested until k ≥ 2.

Implementing the series efficiently requires an efficient algorithm for updating (Gμ,r+k ∗ Dλr+1 ∗ ⋯ ∗ Dλn)(t) for successive values of k. If λr+1, λr+2, …, λn are distinct, then

where

and for i = r + 1, r + 2, …, n

and where for any N and any x1, x2, …, xp,

Notice that W does not depend on k; so, it is cal­cu­lated only once. Also notice that

The algorithm can employ a simple vector to hold the values of Cr+1,k, Cr+2,k, …, Cn,k and after each Ci,k is used, replace it with Ci,k+1. Most of the work of incrementing k is done in the recalculation of φr+k−1, but there is an incremental algorithm for this calculation, and its complexity remains constant as k increases. Define

Then φ0(t; x1, x2, …, xp) = 1. For N ≥ 0,

and for i = 1, 2, …, p,

So, the time required to re­cal­cu­late φN(t; x1, x2, …, xp) for each new value of N is pro­por­tional to p but not to N.

### The Siewers Algorithm

I discovered the Siewers algorithm, fortunately or unfortunately, only after doing a lot of work on the problem myself. This algorithm is based on the fact that

(Dλ1 ∗ ⋯ ∗ Dλn)(t) = ((Dλ1 ∗ ⋯ ∗ Ďλj ∗ ⋯ ∗ Dλn)(t) − (Dλ1 ∗ ⋯ ∗ Ďλi ∗ ⋯ ∗ Dλn)(t)) / (λj − λi), for λi ≠ λj,

where the check over a symbol (Ď) indicates a function omitted from the convolution. I actually knew this identity already but had not noticed that it could be used to develop an efficient algorithm. At first glance it appears to be O(2^n), but it can be implemented more efficiently than that.

Define

F(x1, x2, …, xn) = (Dx1 ∗ Dx2 ∗ ⋯ ∗ Dxn)(1)

Then

(Dλ1 ∗ Dλ2 ∗ ⋯ ∗ Dλn)(t) = t^(n−1) F(λ1t, λ2t, …, λnt)

So, if we can calculate the function F reliably, we have solved our original problem too.

Notice that the value of F(x1, x2, …, xn) does not depend on the order of the arguments. So, rearrange the arguments if necessary to ensure x1 ≤ x2 ≤ ⋯ ≤ xn. The Siewers algorithm now calculates F(x1, x2, …, xn) using the recursion

F(x1, x2, …, xn) = (F(x1, x2, …, xn−1) − F(x2, x3, …, xn)) / (xn − x1)  (∗∗∗∗∗)

if the difference xn − x1 is large enough (greater than some value δ). If the difference is too small, then

F(x1, x2, …, xn) = e^(−c) F(x1 − c, x2 − c, …, xn − c)

where c = (x1 + xn) / 2, and where the Maclaurin series is used for F(x1 − c, x2 − c, …, xn − c).

Very small values of δ ensure rapid and accurate convergence of the Maclaurin series, while larger values tend to minimize round-off error in the recursive calculation (∗∗∗∗∗). The most appropriate choice of δ balances these two considerations. A value between 1 and 2 is likely to give acceptable results. (I get slightly better accuracy with 2 than with 1.)

The only remaining issue is how to avoid unnecessary recalculation of intermediate quantities of the form F(xi, xi+1, …, xj) that are needed more than once by the algorithm. For s = 1 to n − 1, and for i = 1 to n + 1 − s, if either x(i+s) − xi > δ or x(i+s−1) − x(i−1) > δ, then calculate

fi(s) = F(xi, xi+1, …, xi+s−1)

using the most appropriate method. If s = 1, then of course the most appropriate method is given by

fi(1) = F(xi) = e^(−xi)

If s > 1, and if x(i+s−1) − xi > δ, then

fi(s) = (fi(s−1) − fi+1(s−1)) / (x(i+s−1) − xi)

Otherwise, calculate

fi(s) = e^(−c) F(xi − c, xi+1 − c, …, xi+s−1 − c)

where c = (xi + x(i+s−1)) / 2, using the Maclaurin series.

Finally, calculate and output f1(n), which equals F(x1, x2, …, xn).

In fact a single array of length n can be used to hold the values fi(s) at each step of the iteration (s = 1 to n). The algo­rithm cal­cu­lates the follow­ing quantities as necessary:

|  | i = 1 | i = 2 | ⋯ | i = n − 1 | i = n |
| --- | --- | --- | --- | --- | --- |
| s = 1 | f1(1) = F(x1) | f2(1) = F(x2) | ⋯ | fn−1(1) = F(xn−1) | fn(1) = F(xn) |
| s = 2 | f1(2) = F(x1, x2) | f2(2) = F(x2, x3) | ⋯ | fn−1(2) = F(xn−1, xn) |  |
| ⋮ | ⋮ | ⋮ | ⋰ |  |  |
| s = n − 1 | f1(n−1) = F(x1, x2, …, xn−1) | f2(n−1) = F(x2, x3, …, xn) |  |  |  |
| s = n | f1(n) = F(x1, x2, …, xn) |  |  |  |  |

The quantities in each row after the first are calculated either (a) recursively, using two of the quan­tities from the pre­ceding row or (b) using the Maclaurin series. The last row holds the final result.

### Final Note

Recall Equation (∗∗∗):

This equation would be most use­ful if we had an effi­cient incre­mental al­go­rithm to calculate the sum

for increasing values of k. An incremental algorithm is easy enough to find; it's making it efficient that's hard. (Earlier we obtained efficiency with the assumption that there was only one distinct value of μi for which μi ≠ λi.) There is a simple incremental algorithm to calculate

and in fact there is a very effi­cient incre­mental al­go­rithm to calculate

which follows from the some­what un­expected fact that

Recalculating this sum when increment­ing k requires only a multi­pli­ca­tion by t and a division by the new value of k.

Although this sum is easy to cal­cu­late, we un­for­tunately have no need for it. There may be a sim­ple and effi­cient al­go­rithm for the sum we do need, but this pro­gram­mer doesn’t know it yet.

The Siewers algorithm looks best.

Algorithms: A few explicit algorithms for some of the calculations described above

Example: 238U → 222Rn — to be completed

Application to mean decay or ingrowth during counting — not completed, but simple (include D0(t) ≡ 1 in the convolution and divide the final result by t)

Uncertainty of a decay or ingrowth factor — to be completed (theo­reti­cal value ob­tained by propa­gating un­cer­tainties of time and decay con­stants would be mis­lead­ing in the presence of large and un­known round­ing error)

See also Non-Poisson Counting — The deri­va­tion of equa­tions for the index of dis­per­sion (J factor) in non-Poisson count­ing meas­ure­ments makes use of ingrowth fac­tors for short-lived decay products (e.g., 222Rn progeny). For a Power­Point summary see Non-Poisson Count­ing Statistics, or What’s This J Factor All About?.

Note: (2009-11-10) It’s been more than a year and I still haven’t pro­vided many of these exam­ples.

* The symbol G is used here because the func­tion Gλ, n(t) is so closely related to the pdf for a gamma dis­tri­bu­tion. We choose to use the func­tion Gλ, n(t) instead of the gamma pdf itself so that we can allow λ to be zero, or even negative.