Population attributable fraction: comparison of two mathematical procedures to estimate the annual attributable number of deaths
- Bernard CK Choi
Received: 10 September 2009
Accepted: 31 August 2010
Published: 31 August 2010
Abstract
Objective
The purpose of this paper was to compare two mathematical procedures to estimate the annual attributable number of deaths (the Allison et al procedure and the Mokdad et al procedure), and derive a new procedure that combines the best aspects of both procedures. The new procedure calculates attributable number of deaths along a continuum (i.e. for each unit of exposure), and allows for one or more neutral (neither exposed nor nonexposed) exposure categories.
Methods
Mathematical derivations and real datasets were used to demonstrate the theoretical relationship and practical differences between the two procedures. Results of the comparison were used to develop a new procedure that combines the best features of both.
Findings
The Allison procedure is complex because it directly estimates the number of attributable deaths. This necessitates calculation of probabilities of death. The Mokdad procedure is simpler because it estimates the number of attributable deaths indirectly through population attributable fractions. The probabilities of death cancel out in the numerator and denominator of the fractions. However, the Mokdad procedure is not applicable when a neutral exposure category exists.
Conclusion
By combining the innovation of the Allison procedure (allowing for a neutral category) and the simplicity of the Mokdad procedure (using population attributable fractions), this paper proposes a new procedure to calculate attributable numbers of deaths.
Background
Two mathematical procedures are available to estimate the number of deaths attributable to a risk factor such as obesity, smoking or alcohol consumption. The number of attributable deaths is the number of deaths in a population that could be avoided if the effects of the risk factor were eliminated from the population. The two procedures are those of Allison et al [1] and Mokdad et al [2]. Both assume no confounding and no effect modification, and both can be applied to risk factors with polytomous exposure categories.
The Allison procedure [1], originally developed for obesity attributable deaths, is rather complex. It involves 12 steps, uses hazard ratios, and requires calculating hazard rates by using a mathematical process to solve for an unknown quantity. The Mokdad procedure [2], on the other hand, is simpler. It involves only 6 steps, uses relative risks, and does not require solving for any unknown quantity.
A common belief is that the more complex the procedure, the more accurate the results. Allison et al further stated that their procedure accounts for "complications", because it can estimate attributable deaths for body mass index (BMI) along a continuum (i.e., for each unit of BMI) and can adjust for time by using hazard ratios (HR), which relative risks (RR) cannot do [1]. As a result, Mokdad et al used the Mokdad procedure to estimate the attributable numbers for tobacco and alcohol, but reverted to the more complex Allison procedure to estimate the attributable number for obesity.
A detailed read of Allison et al's paper revealed that two steps in the Allison procedure are not well-documented. First, while the equation for the overall number of deaths attributable to obesity and overweight (ω) is given in their paper, the equation to calculate the number of deaths attributable to each individual BMI category is missing. It is therefore unclear how data in their table three can lead to results in their table four. Second, it is said in their paper that the hazard (λ) can be obtained by numerically solving a complex equation for λ. The actual method, however, is not given. For the less sophisticated users, the Allison procedure is not user-friendly.
There are several questions arising from looking at these two procedures: How are the Allison and Mokdad procedures mathematically and practically different? What are the best aspects of each procedure? Can the underlying equations be combined or modified to take advantage of the best aspects of both?
This paper compares the Allison and Mokdad procedures for the estimation of annual attributable number of deaths, both mathematically and using real data. The paper also "recovers" the missing Allison equation to calculate the individual number of deaths attributable to each BMI category, develops a similar and simpler equation using the logic of the Mokdad procedure, compares estimated number of attributable deaths under the HR and RR models, and looks at several options for numerically solving the equation for λ. This paper also proposes a modified Mokdad procedure that can achieve the same results as the Allison procedure.
Methods
Mathematical derivations from first principles, starting from the population attributable fraction (PAF), defined as the proportion of deaths in a population that can be attributed to the causal effects of a risk factor or set of factors, were used to demonstrate the relationship and differences between the Allison and Mokdad procedures. The missing mathematical equation to estimate the number of deaths attributable to each exposure category was derived for the Allison procedure. A similar equation was created for the Mokdad procedure. The two procedures were then "taken apart" and the logic behind each was examined and compared. Based on this, a new procedure (modified Mokdad) was developed combining the innovation of the Allison procedure and the logic of the Mokdad procedure. Finally, estimation methods under the hazard ratio and relative risk models were compared. Some options for solving for λ were described. Real datasets provided by Allison et al were used to illustrate the practical differences between the Allison procedure and the new procedure proposed in this paper.
Results
Conversion table of notations.
Variable | Allison et al [1] | Mokdad et al [2] | This Paper |
---|---|---|---|
No. of deaths attributable | ω | ω | ω |
Total no. of deaths in population | M | M | M |
Total no. in population | N | | N |
Fraction of population nonexposed | P(R) | P_{0} | f_{0}, 1 - Σf_{i}, 1 - Σf_{i} - f_{q} |
Fraction of population exposed | P(O) | ΣP_{i} | f, Σf_{i} |
Fraction of population exposed to an exposure category (i) | | P_{i} | f_{i} |
Fraction of population exposed to a neutral category (e.g., underweight) | P(Q) | | f_{q} |
Hazard ratio (HR) for an exposure category | h | | h |
Hazard ratio (HR) for a neutral category (e.g., underweight) | q | | q |
Probability of death in a year in population | P(D), M/N | | p, P(D) |
Hazard of death in the nonexposed | λ | | λ |
Conditional probability of death in a year in nonexposed | P(D|R), 1 - e^{-λ} | | p_{0}, P(D|E_{0}) |
Conditional probability of death in a year in various exposure categories | P(D|O), 1 - e^{-hλ} | | R_{i} p_{0} |
Conditional probability of death in a year in a neutral category (e.g., underweight) | P(D|Q), 1 - e^{-qλ} | | p_{q} |
Relative risk | RR, P(D|O)/P(D|R) | RR_{i} | R_{i} |
Population attributable fraction | | PAF | PAF |
1. Mathematical proof that the Allison procedure and Mokdad procedure differ in a neutral exposure category (Q)
Equation 2 is another frequently quoted form of PAF [3].
Equation 4 is Mokdad et al [2] (see Additional file 1, Appendix S1, equation A3), as 1-∑f_{i} is P_{0} (Table 1).
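As a sketch of the algebra behind these forms, the standard derivation can be written in this paper's notation (f_{i}, R_{i}, p_{0} and p from the notation table). The steps are reconstructed from those definitions and are not a reproduction of the numbered equations; they assume that every person is either nonexposed or in exactly one exposed category.

```latex
\begin{align*}
p &= \Bigl(1-\sum_i f_i\Bigr)p_0 + \sum_i f_i R_i\,p_0
   = p_0\Bigl[1+\sum_i f_i (R_i-1)\Bigr] \\
\mathrm{PAF} &= \frac{p-p_0}{p}
   = \frac{\sum_i f_i (R_i-1)}{1+\sum_i f_i (R_i-1)}
   = \frac{\sum_i f_i (R_i-1)}{\bigl(1-\sum_i f_i\bigr)+\sum_i f_i R_i}
\end{align*}
```

The probability p_{0} cancels between numerator and denominator, and writing 1 - Σf_{i} as P_{0} expresses the last form in the Mokdad notation.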
Consideration of a neutral exposure category
 | Nonexposed E_{0} (e.g. normal weight, R) | Exposed Categories E_{1}...E_{i}...E_{k} (e.g. overweight and obese, O) | Neutral Category (e.g. underweight, Q) | Total |
---|---|---|---|---|
Death D | (1 - Σf_{i} - f_{q}) p_{0} | Σf_{i} R_{i} p_{0} | f_{q} p_{q} | p |
No death | (1 - Σf_{i} - f_{q}) (1 - p_{0}) | Σf_{i}(1 - R_{i} p_{0}) | f_{q}(1 - p_{q}) | 1 - p |
Total | 1 - Σf_{i} - f_{q} | Σf_{i} | f_{q} | 1 |
The methodology of Allison et al leads to a modified Levin equation as shown below.
where ω is the number of deaths attributable to exposure categories E_{1} ... E_{i} ... E_{k} combined, M is the total number of deaths, N is the total number in the population, P(R) is f_{0} = 1 - Σf_{i} - f_{q}, P(O) is Σf_{i}, and P(D|R) is p_{0}.
Comparison of equation 8 (derived from Allison et al) with Levin's original equation for PAF (equation 1), which is identical to the Mokdad procedure, shows that the Allison procedure subtracts out a certain weighted proportion of deaths associated with the neutral category (the underweight, Q) from the deaths attributable to the exposure (the overweight and obese). In other words, the Allison procedure allows for a neutral exposure category (neither nonexposed nor exposed), while the Mokdad procedure does not.
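The role of the neutral category can be made explicit with a short derivation. This is a sketch reconstructed from the cell entries of the table above, not a copy of the numbered equations. The symbol R_{q} = p_{q}/p_{0} is introduced here for convenience, and attributable deaths are assumed to be the deaths in the exposed categories in excess of what would occur at the nonexposed risk p_{0}.

```latex
\begin{align*}
p &= \Bigl(1-\sum_i f_i-f_q\Bigr)p_0+\sum_i f_i R_i\,p_0+f_q\,p_q
   = p_0\Bigl[1+\sum_i f_i(R_i-1)+f_q(R_q-1)\Bigr] \\
\frac{\omega}{M} &= \frac{N\sum_i f_i\,(R_i\,p_0-p_0)}{N\,p}
   = \frac{\sum_i f_i(R_i-1)}{1+\sum_i f_i(R_i-1)+f_q(R_q-1)}
\end{align*}
```

When f_{q} = 0 or R_{q} = 1, the extra term in the denominator vanishes and the expression reduces to the Levin form.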
2. Recovery of the Allison equation for number of deaths attributable to each exposure category
where ω_{i} is number of deaths attributable to each exposure category i.
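A minimal sketch of the direct, category-specific calculation, assuming it takes the form ω_{i} = N × P(O_{i}) × [P(D|O_{i}) - P(D|R)], which is consistent with the inputs this paper lists for equation 10 (N, P(D|O_{i}) and P(D|R)). The function name and all numbers are hypothetical and serve only as illustration.

```python
def direct_attributable_deaths(n_population, fractions, death_probs, p_death_ref):
    """Deaths attributable to each exposure category, computed directly.

    fractions[cat]   : P(O_i), fraction of the population in category cat
    death_probs[cat] : P(D|O_i), annual probability of death in category cat
    p_death_ref      : P(D|R), annual probability of death in the nonexposed
    """
    return {
        cat: n_population * fractions[cat] * (death_probs[cat] - p_death_ref)
        for cat in fractions
    }


# Hypothetical inputs, for illustration only (not data from the paper).
omega_by_category = direct_attributable_deaths(
    n_population=1_000_000,
    fractions={"overweight": 0.30, "obese": 0.20},
    death_probs={"overweight": 0.012, "obese": 0.015},
    p_death_ref=0.010,
)
omega_total = sum(omega_by_category.values())
```

The sketch makes the practical burden visible: every probability of death must be known before any attributable number can be computed.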
The next section shows that the Mokdad procedure can be modified to do the same calculations as the Allison procedure, but more simply.
3. Development of an equation for number of deaths attributable to each exposure category for the modified Mokdad procedure
The Allison procedure directly estimates the number of attributable deaths, ω. The Mokdad procedure indirectly estimates ω by first estimating PAF. Using the logic of the Mokdad procedure, we develop a new, modified Mokdad procedure to estimate ω as follows:
Equations 11 and 12 are new equations we created for the exposure category-specific PAF and attributable number, respectively. Because the Mokdad procedure uses the PAF approach, the neutral category Q reappears in equations 11 and 12, since Q is part of the total population. P(Q) is easy to obtain from health surveys. Equation 12 is expected to yield results identical to those of equation 10 (Allison's), because it is derived from equation 10. Equation 12 is simpler to use because it needs only the total deaths (M), the fractions of exposure in each category (P(O_{i}), P(R), P(Q)), which are readily available from health surveys, and the relative risks (RR_{i}, RR_{q}). By contrast, equation 10 is difficult to use because it also requires the total population (N), the probability of death in each of the exposure categories (P(D|O_{i})), and the probability of death in the reference group (P(D|R)). These probabilities of death are difficult to obtain.
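The PAF-based calculation can be sketched as follows, assuming the category-specific fraction is P(O_{i})(RR_{i} - 1) divided by P(R) + Σ P(O_{j}) RR_{j} + P(Q) RR_{q}, and that the attributable number is M times that fraction. This form is inferred from the inputs listed above; the function name and the numbers are hypothetical.

```python
def modified_mokdad_deaths(total_deaths, f_ref, exposed, f_neutral=0.0, rr_neutral=1.0):
    """Category-specific attributable deaths via PAF, allowing a neutral category.

    exposed    : dict mapping category name -> (P(O_i), RR_i)
    f_ref      : P(R), fraction of the population nonexposed
    f_neutral  : P(Q), fraction in the neutral category
    rr_neutral : RR_q, relative risk in the neutral category
    """
    # Overall death probability relative to the nonexposed; p_0 has cancelled out.
    denom = f_ref + sum(f * rr for f, rr in exposed.values()) + f_neutral * rr_neutral
    paf = {cat: f * (rr - 1.0) / denom for cat, (f, rr) in exposed.items()}
    return {cat: total_deaths * value for cat, value in paf.items()}


# Hypothetical inputs, for illustration only (not data from the paper).
exposed = {"overweight": (0.30, 1.2), "obese": (0.20, 1.5)}
omega = modified_mokdad_deaths(
    total_deaths=10_000, f_ref=0.45, exposed=exposed, f_neutral=0.05, rr_neutral=1.1
)
```

In this sketch the neutral category enters only through the denominator, so its deaths count toward the total M but never toward the attributable number.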
4. Difference in the estimated number of attributable deaths under the hazard ratio and the relative risk models
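The relative risk implied by a hazard ratio is RR = P(D|O)/P(D|R) = (1 - e^{-hλ})/(1 - e^{-λ}), where h is the hazard ratio (HR) and λ is the hazard rate in the nonexposed.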
Comparison of hazard ratio (HR) and relative risk (RR) using the notations of Allison et al [1].
Hazard rate in nonexposed λ | Hazard ratio HR = h | Probability of death in nonexposed P(D|R) = 1 - e^{-λ} | Probability of death in exposed P(D|O) = 1 - e^{-hλ} | Relative risk |
---|---|---|---|---|
Theoretical comparison | | | | |
0.01 | 1 | 0.00995 | 0.00995 | 1.00 |
0.01 | 3 | 0.00995 | 0.02955 | 2.97 |
0.01 | 5 | 0.00995 | 0.04877 | 4.90 |
0.01 | 7 | 0.00995 | 0.06761 | 6.79 |
0.10 | 1 | 0.09516 | 0.09516 | 1.00 |
0.10 | 3 | 0.09516 | 0.25918 | 2.72 |
0.10 | 5 | 0.09516 | 0.39346 | 4.13 |
0.10 | 7 | 0.09516 | 0.50341 | 5.29 |
0.008651 | 1.39 | 0.00861368777 | 0.0119528799 | 1.38766116 |
0.008651 | 0.98 | 0.00861368777 | 0.0084421433 | 0.98008466 |
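The table entries can be reproduced with a few lines of code. The sketch below assumes the exponential relationships given in the column headings, P(D|R) = 1 - e^{-λ} and P(D|O) = 1 - e^{-hλ}; the function name is hypothetical.

```python
import math


def relative_risk_from_hazard_ratio(h, lam):
    """Relative risk P(D|O)/P(D|R) implied by hazard ratio h and baseline hazard lam."""
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    p_death_ref = -math.expm1(-lam)
    p_death_exposed = -math.expm1(-h * lam)
    return p_death_exposed / p_death_ref


# HR = 3 at two baseline hazards: RR stays close to HR when death is rare
# and drifts away from it as the baseline hazard grows.
rr_rare = relative_risk_from_hazard_ratio(3, 0.01)
rr_common = relative_risk_from_hazard_ratio(3, 0.10)
```

With a baseline hazard of 0.01 the relative risk stays within about 1% of the hazard ratio of 3, while at 0.10 the gap widens to roughly 9%, which matches the pattern in the table.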
5. Options for numerically solving an equation for the hazard of death in the nonexposed (λ)
Commercial packages such as MATHEMATICA and MAPLE [5] can solve an equation for an unknown quantity. However, these packages have a steep learning curve for beginners and can be quite expensive [6]. We looked into two simpler, non-commercial options that one can easily program at no cost.
The first option is Newton's method (Additional file 1, Appendix S2). Applying Newton's method to the Alameda County Health Study data, provided in Allison et al's table three, gave an estimated λ of 0.008651. The second option is Taylor series (Additional file 1, Appendix S3). Applying Taylor series to the same data gave the same estimated λ of 0.008651. The two options gave virtually the same answers, with an error margin of less than 0.000001.
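A minimal sketch of the Newton's method option, assuming the equation to be solved is P(D) = Σ f_{k}(1 - e^{-h_{k}λ}), summed over the nonexposed (h = 1), the exposure categories and the neutral category, as the notation table suggests. It is not the code of Appendix S2. The function name, the first-order starting value and the input numbers are assumptions made for illustration; the Alameda County data are not reproduced here.

```python
import math


def solve_reference_hazard(p_death, categories, tol=1e-12, max_iter=50):
    """Solve sum_k f_k * (1 - exp(-h_k * lam)) = p_death for lam by Newton's method.

    categories : list of (population fraction, hazard ratio) pairs;
                 the nonexposed category has hazard ratio 1.
    """
    def g(lam):
        return sum(f * (1.0 - math.exp(-h * lam)) for f, h in categories) - p_death

    def g_prime(lam):
        return sum(f * h * math.exp(-h * lam) for f, h in categories)

    # Starting value from the first-order approximation 1 - exp(-x) ~ x.
    lam = p_death / sum(f * h for f, h in categories)
    for _ in range(max_iter):
        step = g(lam) / g_prime(lam)
        lam -= step
        if abs(step) < tol:
            return lam
    raise RuntimeError("Newton's method did not converge")


# Hypothetical inputs: nonexposed, two exposure categories, one neutral category.
categories = [(0.45, 1.0), (0.30, 1.2), (0.20, 1.5), (0.05, 1.1)]
lam = solve_reference_hazard(0.009, categories)
```

Because the left-hand side is smooth, increasing and concave in λ, the iteration settles within a handful of steps for rare outcomes such as annual mortality.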
6. Comparison of the Allison procedure and modified Mokdad procedure with real datasets
We used the real dataset from the Alameda County Health Study provided by Allison et al [1] to compare the results using the Allison procedure and the modified Mokdad procedure, under both the hazard ratio (HR) and the relative risk (RR) models (Additional file 1, Appendix S4).
From Additional file 1, Appendix S4, it can be seen that the results of the Allison procedure and the modified Mokdad procedure, under the hazard ratio (HR) and the relative risk (RR) models, are very similar to each other. The Allison procedure is an HR approach and the Mokdad procedure is an RR approach. Therefore the results of the Mokdad procedure using RR are closer to those of the Allison procedure than are the results of the Mokdad procedure using HR. However, the Mokdad procedure using HR to approximate RR provides good enough estimates of the attributable number of deaths, and it avoids the use of equation 13, in which estimating RR requires the complex estimation of λ.
Discussion
The procedures recommended by Allison et al [1] and Mokdad et al [2] can both be applied to estimate the number, as well as the fraction, of a single outcome (such as death) attributable to a risk factor (such as increased body mass index, BMI) that is polytomous (e.g., overweight, obese, and even stratified by BMI unit). Although not specifically mentioned in the two original articles [1, 2], both procedures can be applied to one or more risk factor combinations (such as BMI and smoking) as long as the risk factor combinations are expressed in independent (i.e., nonoverlapping) exposure categories. Furthermore, both procedures assume no confounding and no effect modification by the risk factors of interest and other covariates (such as age or sex).
The Allison procedure can be applied when there is a nonexposed category, one or more exposure categories, and one or more neutral (neither nonexposed nor exposed) categories. Allowance of a neutral exposure category is a benefit of the Allison procedure from a causal inference perspective, because in reality the population cannot always be dichotomized into nonexposed and exposed. The Mokdad procedure cannot allow for a neutral category. This paper proposes a modified Mokdad procedure that can achieve the same results as the Allison procedure, but in a simpler way.
The Allison procedure involves twelve steps, while the Mokdad procedure involves only six steps (Additional file 1, Appendix S1). The Allison procedure involves more steps because it attempts to estimate the attributable number of deaths directly (equation A1), and this necessitates the estimation of the probabilities of death in the nonexposed, the various exposure and the neutral categories. This in turn necessitates the calculation of the hazard rate in the nonexposed, λ, which requires substantial mathematical skills. The Mokdad procedure, on the other hand, first calculates the population attributable fraction (equation A3), and then obtains the attributable number of deaths by multiplying the PAF by the total number of deaths (equation A2). Our paper (equations 1-4) shows that in the derivation of the equation for PAF in the Mokdad procedure, the probabilities of death cancel each other out in the numerator and denominator, leaving only fractions of exposure and relative risks as necessary input parameters for the estimation of PAF. This greatly simplifies the calculation process in the Mokdad procedure.
The Mokdad procedure, however, breaks down if a neutral category (such as underweight) that is neither nonexposed (such as normal weight) nor exposed (such as overweight and obese) exists. Also, while the Mokdad procedure can calculate the overall number of deaths attributable to a risk factor with multiple exposure categories, it does not calculate the number attributable to each individual exposure category. The Allison procedure, on the other hand, can estimate the individual exposure category attributable numbers (although the exact equation was not given in Allison et al [1]).
Eight steps in our proposed new procedure (modified Mokdad) to calculate number of deaths attributable to a risk factor with multiple exposure categories, allowing for one or more neutral categories.
Let ω be the number of deaths attributable to a risk factor (e.g. overweight and obese); and ω_{i} be the number of deaths attributable to a specific exposure category i (e.g. overweight) of the risk factor. |
Using the notations in Table 2, |
Using the notations of Allison et al, equation 11 can be expressed as |
Using the notations of Mokdad et al, and adding P_{q} (fraction of population underweight) and RR_{q} (relative risk for underweight), equation 11 becomes |
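In the two notations named above, the category-specific fraction and the resulting numbers can be sketched as follows. These expressions are reconstructed from the inputs the paper lists for equations 11 and 12 (M, the exposure fractions, and the relative risks including RR_{q}); they are an illustration under that assumption rather than a reproduction of the original table rows.

```latex
\begin{align*}
\text{Allison notation:}\quad
\mathrm{PAF}_i &= \frac{P(O_i)\,(RR_i-1)}{P(R)+\sum_j P(O_j)\,RR_j+P(Q)\,RR_q} \\
\text{Mokdad notation:}\quad
\mathrm{PAF}_i &= \frac{P_i\,(RR_i-1)}{P_0+\sum_j P_j\,RR_j+P_q\,RR_q} \\
\omega_i &= M\times\mathrm{PAF}_i, \qquad \omega=\sum_i \omega_i
\end{align*}
```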
Both our proposed procedure (Table 6) and the Allison procedure allow for one or more neutral categories (such as underweight). The numbers of deaths associated with the neutral categories are excluded from the calculation of the number of deaths attributable to the risk factor under study. The Allison procedure involves twelve steps while the proposed procedure involves only eight steps. The proposed procedure, using the logic of the Mokdad procedure, does not require calculation of the probabilities of death in the various categories. Therefore no solving for the hazard λ is required. The proposed procedure is expected to produce results similar to those of the more complex Allison procedure. Slight discrepancies in the results, as shown in the real examples provided in this paper, are due to rounding errors in the additional steps of the Allison procedure (estimating the probabilities of death in the nonexposed, exposure and neutral categories, and solving for λ, the hazard in the nonexposed), none of which are required in the proposed procedure. Discrepancies will also occur depending on whether relative risks or hazard ratios are used, but these are expected to be small when the event (e.g., death) is rare (section 4). If one prefers to use the Allison procedure instead of the proposed procedure, this paper discusses a number of options for solving for λ which could be helpful (section 5).
Acknowledgements
The author would like to thank Sunita Narang, Justin Francis and Rita Zhang for statistical and data support for this paper. The author declares there is no conflict of interest. No funding or support was received for this study. The author has full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
References
- Allison DB, Fontaine KR, Manson JE, Stevens J, VanItallie TB: Annual deaths attributable to obesity in the United States. JAMA 1999, 282:1530–1538.
- Mokdad AH, Marks JS, Stroup DF, Gerberding JL: Actual causes of death in the United States, 2000. JAMA 2004, 291:1238–1245.
- Levin ML: The occurrence of lung cancer in man. Acta Unio Internationalis Contra Cancrum 1953, 9:531–541.
- Last JM: A Dictionary of Epidemiology. New York: Oxford University Press; 1995.
- Department of Applied Mathematics: Mathematica, Maple, Matlab, IDL. [http://amath.colorado.edu/computing/mmm/]
- Bitwise Magazine: Maple 10 v Mathematica 5.2. [http://www.bitwisemag.com/copy/reviews/software/maths/maple10_mathematica52.html]
- Wikipedia: Newton's method. [http://en.wikipedia.org/wiki/Newton's_method]
- WolframMathWorld: Newton's method. [http://mathworld.wolfram.com/NewtonsMethod.html]
- Wikipedia: Taylor series. [http://en.wikipedia.org/wiki/Taylor_series]
- WolframMathWorld: Taylor series. [http://mathworld.wolfram.com/TaylorSeries.html]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.