# Population attributable fraction: comparison of two mathematical procedures to estimate the annual attributable number of deaths

## Abstract

### Objective

The purpose of this paper was to compare two mathematical procedures to estimate the annual attributable number of deaths (the Allison et al procedure and the Mokdad et al procedure), and derive a new procedure that combines the best aspects of both procedures. The new procedure calculates attributable number of deaths along a continuum (i.e. for each unit of exposure), and allows for one or more neutral (neither exposed nor nonexposed) exposure categories.

### Methods

Mathematical derivations and real datasets were used to demonstrate the theoretical relationship and practical differences between the two procedures. Results of the comparison were used to develop a new procedure that combines the best features of both.

### Findings

The Allison procedure is complex because it directly estimates the number of attributable deaths. This necessitates calculation of probabilities of death. The Mokdad procedure is simpler because it estimates the number of attributable deaths indirectly through population attributable fractions. The probabilities of death cancel out in the numerator and denominator of the fractions. However, the Mokdad procedure is not applicable when a neutral exposure category exists.

### Conclusion

By combining the innovation of the Allison procedure (allowing for a neutral category) and the simplicity of the Mokdad procedure (using population attributable fractions), this paper proposes a new procedure to calculate attributable numbers of death.

## Background

Two mathematical procedures are available to estimate the number of deaths attributable to a risk factor such as obesity, smoking or alcohol consumption. The number of attributable deaths is the number of deaths in a population that could be avoided if the effects of the risk factor were eliminated from the population. The two procedures are those of Allison et al [1] and Mokdad et al [2]. Both procedures assume no confounding and no effect modification, and both can be applied to risk factors with polytomous exposure categories.

The Allison procedure [1], originally developed for obesity-attributable deaths, is rather complex. It involves 12 steps, uses hazard ratios, and requires calculating hazard rates by numerically solving an equation for an unknown quantity. The Mokdad procedure [2], on the other hand, is simpler. It involves only 6 steps, uses relative risks, and does not require solving for any unknown quantity.

A common belief is that the more complex the procedure, the more accurate the results. Allison et al further stipulated that their procedure accounts for "complications", because it can estimate attributable deaths for body mass index (BMI) along a continuum (i.e., for each unit of BMI), and can adjust for time by using hazard ratios (HR), which relative risks (RR) cannot do [1]. As a result, Mokdad et al used the Mokdad procedure to estimate the attributable numbers for tobacco and alcohol, but reverted to the more complex Allison procedure to estimate the attributable number for obesity.

A detailed reading of Allison et al's paper reveals that two steps in the Allison procedure are not well documented. First, while the equation for the overall number of deaths attributable to obesity and overweight (ω) is given in their paper, the equation to calculate the number of deaths attributable to each individual BMI category is missing. It is therefore unclear how the data in their table three lead to the results in their table four. Second, their paper states that the hazard (λ) can be obtained by numerically solving a complex equation for λ. The actual method, however, is not given. For less sophisticated users, the Allison procedure is not user-friendly.

There are several questions arising from looking at these two procedures: How are the Allison and Mokdad procedures mathematically and practically different? What are the best aspects of each procedure? Can the underlying equations be combined or modified to take advantage of the best aspects of both?

This paper compares the Allison and Mokdad procedures for the estimation of annual attributable number of deaths, both mathematically and using real data. The paper also "recovers" the missing Allison equation to calculate the individual number of deaths attributable to each BMI category, develops a similar and simpler equation using the logic of the Mokdad procedure, compares estimated number of attributable deaths under the HR and RR models, and looks at several options for numerically solving the equation for λ. This paper also proposes a modified Mokdad procedure that can achieve the same results as the Allison procedure.

## Methods

Mathematical derivations from first principles, starting from the population attributable fraction (PAF), were used to demonstrate the relationship and differences between the Allison and Mokdad procedures. PAF is defined as the proportion of deaths in a population that can be attributed to the causal effects of a risk factor or set of factors. The missing equation to estimate the number of deaths attributable to each exposure category was derived for the Allison procedure, and a similar equation was created for the Mokdad procedure. The two procedures were then "taken apart", and the logic behind each was examined and compared. On this basis, a new procedure (modified Mokdad) was developed that combines the innovation of the Allison procedure with the logic of the Mokdad procedure. Finally, estimation methods under the hazard ratio and relative risk models were compared, and some options for solving for λ were described. Real datasets provided by Allison et al were used to illustrate the practical differences between the Allison procedure and the new procedure proposed in this paper.

## Results

Table 1 is a conversion table of the notations used in Allison et al, Mokdad et al, and this paper. The 12 steps in the Allison procedure and the 6 steps in the Mokdad procedure to calculate the attributable number of deaths are summarized in Additional file 1, Appendix S1, using their original notations.

### 1. Mathematical proof that the Allison procedure and Mokdad procedure differ in a neutral exposure category (Q)

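Based on Levin [3], and using the notations in Table 2, the population attributable fraction (PAF) is

$$\mathrm{PAF} = \frac{p - p_0}{p} \qquad (1)$$

where p is the probability of death in the population, or P(D), and p0 is the probability of death among the nonexposed, or P(D|E0). Then using equation T1 (Table 2), $p = (1 - f)\,p_0 + f R\, p_0$, where f is the fraction of the population exposed and R is the relative risk,

$$\mathrm{PAF} = \frac{f\,(R - 1)}{1 + f\,(R - 1)} \qquad (2)$$

Equation 2 is another frequently quoted form of PAF.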

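Extending the same methodology from the case of a dichotomous exposure variable to the case of a polytomous exposure variable, and using equation T2 (Table 3),

$$\mathrm{PAF} = \frac{p - p_0}{p} \qquad (3)$$

$$\mathrm{PAF} = \frac{\left(1 - \sum f_i\right) + \sum f_i R_i - 1}{\left(1 - \sum f_i\right) + \sum f_i R_i} \qquad (4)$$

Equation 4 is the Mokdad et al equation [2] (see Additional file 1, Appendix S1, equation A3), as 1-∑fi is P0 (Table 1).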
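The polytomous PAF can be sketched in a few lines of Python. This is a minimal illustration, assuming the Levin-type form PAF = Σ f_i (R_i − 1) / [1 + Σ f_i (R_i − 1)] that follows from equation T2. The function name and all input values are hypothetical and are not taken from either original paper.

```python
def paf_polytomous(fractions, relative_risks):
    """PAF for exposure categories E1..Ek measured against one nonexposed category E0."""
    if len(fractions) != len(relative_risks):
        raise ValueError("need one relative risk per exposure category")
    # Excess risk contributed by each category, in units of p0 (which cancels out).
    excess = sum(f * (r - 1.0) for f, r in zip(fractions, relative_risks))
    return excess / (1.0 + excess)


# Hypothetical inputs (not from the paper): two exposure categories.
fractions = [0.30, 0.20]      # f_i, fraction of the population in each category
relative_risks = [1.5, 2.0]   # R_i, relative risk of death versus the nonexposed
total_deaths = 100_000        # M, hypothetical total number of deaths

paf = paf_polytomous(fractions, relative_risks)
attributable_deaths = paf * total_deaths
print(f"PAF = {paf:.4f}, attributable deaths = {attributable_deaths:.0f}")
```

Only fractions of exposure and relative risks are needed as inputs, because the probability of death in the nonexposed appears in both the numerator and the denominator and cancels out.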

From equation 3, $\mathrm{PAF} = (p - p_0)/p$. But because $p = (1-\sum f_i)\,p_0 + \sum f_i R_i\, p_0$ (see equation T2 in Table 3), therefore

$$\mathrm{PAF} = \frac{\sum f_i\,(R_i - 1)\,p_0}{p} \qquad (5)$$

Equation 5 is a modified form of the Allison et al equation [1] (see Additional file 1, Appendix S1, equation A1), as shown below. From equations A2 (Additional file 1, Appendix S1) and 5, and given p = M/N,

$$\omega = \mathrm{PAF} \times M = N \sum f_i\,(R_i - 1)\,p_0 = M - N p_0 \qquad (6)$$

Comparing the Mokdad equation (6) with the Allison equation (A1), the Mokdad procedure allows only for a single nonexposed category E0 and a number of exposure categories E1 ... Ei ... Ek (see Table 3), while the Allison procedure in addition allows for a neutral (i.e., neither nonexposed nor exposed) category, in this case the category Q (underweight) (see Table 4).

The methodology of Allison et al leads to a modified Levin equation as shown below.

The original Allison equation (Additional file 1, Appendix S1, equation A1) was written for a three-category exposure (R, the reference group or the nonexposed E0; O, the overweight and obese groups; and Q, the underweight group) (Table 4) and is

$$\omega = M - N\left[P(D \mid R)\,\bigl(P(R) + P(O)\bigr) + P(D \mid Q)\,P(Q)\right]$$

where ω is the number of deaths attributable to exposure categories E1 ... Ei ... Ek combined, M is the total number of deaths, N is the total number in the population, P(R) is f0 = 1 - Σfi - fq, P(O) is Σfi, P(Q) is fq, P(D|R) is p0, and P(D|Q) is pq.

From equation A1, and using the notations of this paper (Table 4), given p = M/N,

$$\omega = N\left[(p - p_0) - f_q\,(p_q - p_0)\right] \qquad (7)$$

where fq is P(Q) and pq is P(D|Q). And since PAF = ω/M (equation A2),

$$\mathrm{PAF} = \frac{p - p_0}{p} - \frac{f_q\,(p_q - p_0)}{p} \qquad (8)$$

Comparing equation 8 (derived from Allison et al) with Levin's original equation for PAF (equation 1), which is identical to the Mokdad procedure, the Allison procedure subtracts a weighted proportion of deaths associated with the neutral category (the underweight, Q) from the deaths attributable to the exposure (the overweight and obese). In other words, the Allison procedure allows for a neutral exposure category (neither nonexposed nor exposed), while the Mokdad procedure does not.
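The effect of the neutral category on PAF can be illustrated numerically. The Python sketch below assumes the three-category form PAF = (p − p0)/p − fq (pq − p0)/p, which follows from Allison's equation A1 together with PAF = ω/M. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def paf_levin(p, p0):
    """Levin's PAF: every death in excess of the nonexposed risk is attributed to exposure."""
    return (p - p0) / p


def paf_with_neutral(p, p0, f_q, p_q):
    """PAF after removing the excess deaths that occur in the neutral category Q."""
    return (p - p0) / p - f_q * (p_q - p0) / p


# Hypothetical population: reference (f_0), one exposed category (f_1), neutral (f_q).
p0 = 0.010                       # probability of death in the reference group
f_0, f_1, f_q = 0.50, 0.40, 0.10
r_1, r_q = 1.5, 1.2              # relative risks versus the reference group
p_q = r_q * p0                   # probability of death in the neutral category
p = (f_0 + f_1 * r_1 + f_q * r_q) * p0   # overall probability of death

print(f"Levin / Mokdad PAF  : {paf_levin(p, p0):.4f}")
print(f"PAF allowing for Q  : {paf_with_neutral(p, p0, f_q, p_q):.4f}")
```

With these inputs the neutral-category form is smaller than Levin's PAF, because the excess deaths among the underweight are no longer counted as attributable to overweight and obesity.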

### 2. Recovery of the Allison equation for number of deaths attributable to each exposure category

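To derive the missing Allison equation, from equation A1, and because the total number of deaths is $M = N\left[P(R)\,P(D \mid R) + P(O)\,P(D \mid O) + P(Q)\,P(D \mid Q)\right]$,

$$\omega = N\,P(O)\left[P(D \mid O) - P(D \mid R)\right] = N \sum_i P(O_i)\left[P(D \mid O_i) - P(D \mid R)\right] \qquad (9)$$

Therefore, the missing equation is

$$\omega_i = N\,P(O_i)\left[P(D \mid O_i) - P(D \mid R)\right] \qquad (10)$$

where ωi is the number of deaths attributable to each exposure category i.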

Equations 9 and 10 are new equations we created for the Allison procedure. It is worth noting that the underweight (Q) category disappears in equations 9 and 10, as it does not play a role in the calculation of the numbers attributable to obesity (O). However, equations 9 and 10 are still difficult to use because the probabilities of death in the exposed and reference groups can only be estimated through a complex process that includes solving for the hazard λ. The next section shows that the Mokdad procedure can be modified to do the same calculations as the Allison procedure, but more simply.

### 3. Development of an equation for number of deaths attributable to each exposure category for the modified Mokdad procedure

The Allison procedure directly estimates the number of attributable deaths, ω. The Mokdad procedure indirectly estimates ω by first estimating PAF. Using the logic of the Mokdad procedure, we develop a new, modified Mokdad procedure to estimate ω as follows:

From equation 10, and using notations in Table 4,

$$\omega_i = N f_i\,(p_i - p_0) = N f_i\,(R_i - 1)\,p_0$$

where pi = Ri p0 is the probability of death in exposure category i. Because M = Np and $p = \left(f_0 + \sum_j f_j R_j + f_q R_q\right) p_0$, the probabilities of death cancel out:

$$\mathrm{PAF}_i = \frac{\omega_i}{M} = \frac{f_i\,(R_i - 1)}{f_0 + \sum_j f_j R_j + f_q R_q}$$

Using Allison's notations, the above equation can also be expressed as

$$\mathrm{PAF}_i = \frac{P(O_i)\,(RR_i - 1)}{P(R) + \sum_j P(O_j)\,RR_j + P(Q)\,RR_q} \qquad (11)$$

It then follows that

$$\omega_i = M \times \mathrm{PAF}_i \qquad (12)$$

Equations 11 and 12 are new equations we created for the exposure category-specific PAF and attributable number, respectively. Because the Mokdad procedure uses the PAF approach, the neutral category Q reappears in equations 11 and 12, as Q is part of the total population. P(Q) is easy to obtain from health surveys.

Equation 12 is expected to yield results identical to those of equation 10 (Allison's), because it is derived from equation 10. Equation 12 is simpler to use, as it needs only the total deaths (M), the fractions of exposure in each category (P(Oi), P(R), P(Q)), which are readily available from health surveys, and the relative risks (RRi, RRq). By contrast, equation 10 is difficult to use because it requires the total population (N), the probability of death in each of the exposure categories (P(D|Oi)), and the probability of death in the reference group (P(D|R)). The probabilities of death are difficult to obtain.
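The equivalence of the two routes can be checked with a short Python sketch. It assumes the direct form ω_i = N P(O_i)[P(D|O_i) − P(D|R)] for the Allison route and ω_i = M P(O_i)(RR_i − 1) / [P(R) + Σ P(O_j) RR_j + P(Q) RR_q] for the modified Mokdad route. Function names and all input values are hypothetical, and the probabilities of death are invented only so that both routes can be run side by side.

```python
def allison_direct(n_population, f_exposed, p_death_exposed, p_death_ref):
    """Deaths attributable to each exposure category, computed from probabilities of death."""
    return [n_population * f * (p - p_death_ref)
            for f, p in zip(f_exposed, p_death_exposed)]


def modified_mokdad(total_deaths, f_ref, f_exposed, rr_exposed, f_neutral, rr_neutral):
    """Deaths attributable to each exposure category, computed via category-specific PAFs."""
    denominator = (f_ref
                   + sum(f * rr for f, rr in zip(f_exposed, rr_exposed))
                   + f_neutral * rr_neutral)
    return [total_deaths * f * (rr - 1.0) / denominator
            for f, rr in zip(f_exposed, rr_exposed)]


# Hypothetical inputs (not from the Alameda County data).
N = 1_000_000
p_ref = 0.010                          # P(D|R), needed only for the direct route
f_ref, f_neutral = 0.50, 0.10          # P(R), P(Q)
f_exposed = [0.25, 0.15]               # P(O_i)
rr_exposed, rr_neutral = [1.5, 2.0], 1.2

# Total deaths implied by the inputs above, so that both routes describe one population.
p_overall = p_ref * (f_ref + sum(f * r for f, r in zip(f_exposed, rr_exposed))
                     + f_neutral * rr_neutral)
M = N * p_overall

direct = allison_direct(N, f_exposed, [r * p_ref for r in rr_exposed], p_ref)
via_paf = modified_mokdad(M, f_ref, f_exposed, rr_exposed, f_neutral, rr_neutral)
print([round(x, 1) for x in direct])
print([round(x, 1) for x in via_paf])
```

The second function never uses N or any probability of death, which is the practical advantage of the PAF route: with real data, M comes from vital statistics and the fractions from health surveys.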

### 4. Difference in the estimated number of attributable deaths under the hazard ratio and the relative risk models

Relative risk (RR) is an estimate of hazard ratio (HR). HR is the ratio of the hazard rates (instantaneous incidence rates) in the exposed to the nonexposed at a point in time. RR is the ratio of the average risks of disease or death in the exposed to the nonexposed over a period of time. For a constant hazard over one unit of time, the two are related by

$$RR = \frac{1 - e^{-h\lambda}}{1 - e^{-\lambda}} \qquad (13)$$

where h is the hazard ratio (HR) and λ is the hazard rate in the nonexposed.

In theory, as pointed out by Allison et al, RR estimates "without adjustment for time can bias results (though the bias may be small)" [1]. The question is whether, in practice, it matters if RR or HR is used. Table 5 is a theoretical comparison of HR and RR using equation 13, based on hazard rates of 0.01 and 0.10 and HRs of 1, 3, 5 and 7. When the hazard rate is low (e.g., 0.10 or below), HR and RR are close to each other, and the lower the hazard rate (e.g., 0.01), the closer they are. In the real data from the Alameda County Health Study provided by Allison et al, an HR of 1.39 corresponds to an RR of 1.38766116, and an HR of 0.98 to an RR of 0.98008466 (Table 5). In this real setting there is no practical difference in the estimated number of attributable deaths under the HR and RR models.
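The comparison of HR and RR can be reproduced with a short Python sketch. It assumes that equation 13 takes the constant-hazard form RR = (1 − exp(−hλ)) / (1 − exp(−λ)) over one unit of time; this form reproduces the two Alameda County values quoted above when λ = 0.008651, the hazard estimated for those data in section 5. The function name is hypothetical.

```python
import math


def relative_risk(hazard_ratio, baseline_hazard):
    """RR over one unit of time implied by a hazard ratio and a constant baseline hazard."""
    risk_exposed = 1.0 - math.exp(-hazard_ratio * baseline_hazard)
    risk_nonexposed = 1.0 - math.exp(-baseline_hazard)
    return risk_exposed / risk_nonexposed


# Theoretical grid: hazard rates of 0.01 and 0.10, hazard ratios of 1, 3, 5 and 7.
for lam in (0.01, 0.10):
    for h in (1, 3, 5, 7):
        print(f"lambda = {lam:.2f}  HR = {h}  RR = {relative_risk(h, lam):.4f}")

# Alameda County values quoted in the text, using lambda = 0.008651.
print(f"{relative_risk(1.39, 0.008651):.8f}")
print(f"{relative_risk(0.98, 0.008651):.8f}")
```

For HR above 1 the RR is always slightly below the HR, and the gap widens as the baseline hazard grows, which is why the approximation works best for rare events.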

### 5. Options for numerically solving an equation for the hazard of death in the nonexposed (λ)

Commercial packages such as MATHEMATICA and MAPLE are available for solving an equation for an unknown quantity. However, these packages have a steep learning curve for beginners and can be quite expensive. We looked into two simpler non-commercial options that one can easily program at no cost.

The first option is Newton's method (Additional file 1, Appendix S2). Applying Newton's method to the Alameda County Health Study data, provided in Allison et al's table three, gave an estimated λ of 0.008651. The second option is Taylor series (Additional file 1, Appendix S3). Applying Taylor series to the same data gave the same estimated λ of 0.008651. The two options gave virtually the same answers, with an error margin of less than 0.000001.
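Newton's method for λ can be sketched in Python as follows. The exact equation solved in the original procedure is given only in the appendices, so this sketch assumes it has the form p = Σ f_k [1 − exp(−h_k λ)], where p is the overall probability of death and f_k and h_k are the population fraction and hazard ratio of category k (with h = 1 for the reference group). The function name and the input values are hypothetical and are not the Alameda County data.

```python
import math


def solve_baseline_hazard(p_death, fractions, hazard_ratios, tol=1e-12, max_iter=50):
    """Solve p_death = sum_k f_k * (1 - exp(-h_k * lam)) for lam by Newton's method."""
    lam = p_death  # starting guess: for rare events the risk is close to the hazard
    for _ in range(max_iter):
        # g(lam) is the model-implied probability of death minus the observed one.
        g = sum(f * (1.0 - math.exp(-h * lam))
                for f, h in zip(fractions, hazard_ratios)) - p_death
        # Derivative of g with respect to lam.
        dg = sum(f * h * math.exp(-h * lam) for f, h in zip(fractions, hazard_ratios))
        step = g / dg
        lam -= step
        if abs(step) < tol:
            return lam
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical inputs: reference, two exposed categories, one neutral category.
fractions = [0.50, 0.25, 0.15, 0.10]
hazard_ratios = [1.0, 1.5, 2.0, 1.2]

# Build a probability of death from a known hazard, then recover that hazard.
true_lam = 0.009
p_death = sum(f * (1.0 - math.exp(-h * true_lam))
              for f, h in zip(fractions, hazard_ratios))
estimated_lam = solve_baseline_hazard(p_death, fractions, hazard_ratios)
print(f"estimated lambda = {estimated_lam:.6f}")
```

Because the left-hand side is smooth, increasing and concave in λ, the iteration converges in a handful of steps from this starting guess.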

### 6. Comparison of the Allison procedure and modified Mokdad procedure with real datasets

We used the real dataset from the Alameda County Health Study provided by Allison et al [1] to compare the results using the Allison procedure and the modified Mokdad procedure, under both the hazard ratio (HR) and the relative risk (RR) models (Additional file 1, Appendix S4).

Additional file 1, Appendix S4 shows that the results of the Allison procedure and the modified Mokdad procedure, under the hazard ratio (HR) and the relative risk (RR) models, are very similar to each other. The Allison procedure is an HR approach and the Mokdad procedure is an RR approach. Therefore the results of the Mokdad procedure using RR are closer to those of the Allison procedure than are the results of the Mokdad procedure using HR. However, the Mokdad procedure using HR to approximate RR provides good enough estimates of the attributable number of deaths, and it avoids equation 13, in which converting HR to RR requires the complex estimation of λ.

## Discussion

The procedures recommended by Allison et al [1] and Mokdad et al [2] can both be applied to estimate the number, as well as the fraction, of a single outcome (such as death) attributable to a risk factor (such as increased body mass index, BMI) that is polytomous (e.g., overweight, obese, or even stratified by BMI unit). Although not specifically mentioned in the two original articles [1, 2], both procedures can be applied to combinations of risk factors (such as BMI and smoking) as long as the combinations are expressed as mutually exclusive (i.e., nonoverlapping) exposure categories. Furthermore, both procedures assume no confounding and no effect modification by the risk factors of interest and other covariates (such as age or sex).

The Allison procedure can be applied to the situation when there is a nonexposed category, one or more exposure categories, and one or more neutral (neither nonexposed nor exposed) categories. Allowance of a neutral exposure category is a benefit of the Allison procedure from a causal inference perspective, because in reality the population cannot always be dichotomized into nonexposed and exposed. The Mokdad procedure cannot allow for a neutral category. This paper proposes a modified Mokdad procedure that can achieve the same results as the Allison procedure, but through a simpler way.

The Allison procedure involves twelve steps, while the Mokdad procedure involves only six steps (Additional file 1, Appendix S1). The Allison procedure involves more steps because it attempts to estimate the attributable number of deaths directly (equation A1), and this necessitates the estimation of the probabilities of death in the nonexposed, exposure and neutral categories. This in turn necessitates the calculation of the hazard rate in the nonexposed, λ, which requires substantial mathematical skills. The Mokdad procedure, on the other hand, first calculates the population attributable fraction (equation A3), and then obtains the attributable number of deaths by multiplying the PAF by the total number of deaths (equation A2). Our paper (equations 1-4) shows that in the derivation of the equation for PAF in the Mokdad procedure, the probabilities of death cancel out in the numerator and denominator, leaving only the fractions of exposure and the relative risks as necessary input parameters for the estimation of PAF. This greatly simplifies the calculation process in the Mokdad procedure.

The Mokdad procedure, however, breaks down if there is a neutral category (such as underweight) that is neither nonexposed (such as normal weight) nor exposed (such as overweight and obese). Also, while the Mokdad procedure can calculate the overall number of deaths attributable to a risk factor with multiple exposure categories, it does not calculate the number attributable to each individual exposure category. The Allison procedure, on the other hand, can estimate the numbers attributable to individual exposure categories (although the exact equation was not given in Allison et al [1]).

By combining the innovation of the Allison procedure [1] (i.e., allowing for a neutral category which is neither nonexposed nor exposed) and the simplicity of the Mokdad procedure [2] (i.e., calculating attributable numbers indirectly through population attributable fractions), this paper proposes a new procedure (modified Mokdad) to calculate population attributable fractions and numbers (Table 6). This paper also "recovers" the missing equation in Allison et al that calculates the individual exposure category attributable numbers (equation 10), and develops a similar equation using the Mokdad et al approach (equation 12). Furthermore, this paper extends the concept and equation of population attributable fraction, originally proposed by Levin [3], from a two-category risk factor (nonexposed, exposed) (equation 1) to a three-category risk factor (nonexposed, exposed, and neutral) (equation 8).

Both our proposed procedure (Table 6) and the Allison procedure allow for one or more neutral categories (such as underweight). The deaths associated with the neutral categories are excluded from the calculation of the number of deaths attributable to the risk factor under study. The Allison procedure involves twelve steps, while the proposed procedure involves only eight. The proposed procedure, using the logic of the Mokdad procedure, does not require calculation of the probabilities of death in the various categories, so no solving for the hazard λ is required. The proposed procedure is expected to produce results similar to those of the more complex Allison procedure. Slight discrepancies in the results, as shown in the real examples provided in this paper, are due to rounding errors in the additional steps that the Allison procedure needs to estimate the probabilities of death in the nonexposed, exposure and neutral categories and to solve for λ, the hazard in the nonexposed; none of these steps is required in the proposed procedure. Discrepancies will also occur depending on whether relative risks or hazard ratios are used, but these are expected to be small when the event (e.g., death) is rare (section 4). For those who prefer to use the Allison procedure instead of the proposed procedure, this paper discusses a number of options for solving for λ that could be helpful (section 5).

## Acknowledgements

The author would like to thank Sunita Narang, Justin Francis and Rita Zhang for statistical and data support for this paper. The author declares there is no conflict of interest. No funding or support was received for this study. The author has full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

## References

1. Allison DB, Fontaine KR, Manson JE, Stevens J, VanItallie TB: Annual deaths attributable to obesity in the United States. JAMA 1999, 282:1530–1538.

2. Mokdad AH, Marks JS, Stroup DF, Gerberding JL: Actual causes of death in the United States, 2000. JAMA 2004, 291:1238–1245.

3. Levin ML: The occurrence of lung cancer in man. Acta Union International Contra Cancrum 1953, 9:531–541.

4. Last JM: A Dictionary of Epidemiology. New York: Oxford University Press; 1995.

5. Department of Applied Mathematics: Mathematica, Maple, Matlab, IDL. [http://amath.colorado.edu/computing/mmm/]

6. bitwise magazine: Maple 10 v Mathematica 5.2. [http://www.bitwisemag.com/copy/reviews/software/maths/maple10_mathematica52.html]

7. Wikipedia: Newton's method. [http://en.wikipedia.org/wiki/Newton's_method]

8. WolframMathWorld: Newton's method. [http://mathworld.wolfram.com/NewtonsMethod.html]

9. Wikipedia: Taylor series. [http://en.wikipedia.org/wiki/Taylor_series]

10. WolframMathWorld: Taylor series. [http://mathworld.wolfram.com/TaylorSeries.html]

## Author information

### Corresponding author

Correspondence to Bernard CK Choi.

### Competing interests

The authors declare that they have no competing interests.

## Electronic supplementary material

Additional file 1: File containing Appendices S1-S4. (DOC 222 KB)
