Categorisation of continuous risk factors in epidemiological publications: a survey of current practice
 Elizabeth L Turner,
 Joanna E Dobson and
 Stuart J Pocock
DOI: 10.1186/1742-5573-7-9
© Turner et al. 2010
Received: 12 April 2010
Accepted: 15 October 2010
Published: 15 October 2010
Abstract
Background
Reports of observational epidemiological studies often categorise (group) continuous risk factor (exposure) variables. However, there has been little systematic assessment of how categorisation is practiced or reported in the literature and no extended guidelines for the practice have been identified. Thus, we assessed the nature of such practice in the epidemiological literature.
Methods
Two months (December 2007 and January 2008) of five epidemiological and five general medical journals were reviewed. All articles that examined the relationship between continuous risk factors and health outcomes were surveyed using a standard proforma, with the focus on the primary risk factor. Using the survey results we provide illustrative examples and, combined with ideas from the broader literature and from experience, we offer guidelines for good practice.
Results
Of the 254 articles reviewed, 58 were included in our survey. Categorisation occurred in 50 (86%) of them. Of those, 42% also analysed the variable continuously and 24% considered alternative groupings. Most (78%) used 3 to 5 groups. No articles relied solely on dichotomisation, although it did feature prominently in 3 articles. The choice of group boundaries varied: 34% used quantiles, 18% equally spaced categories, 12% external criteria, 34% other approaches and 2% did not describe the approach used. Categorical risk estimates were most commonly (66%) presented as pairwise comparisons to a reference group, usually the highest or lowest (79%). Reporting of categorical analysis was mostly in tables; only 20% in figures.
Conclusions
Categorical analyses of continuous risk factors are common. Accordingly, we provide recommendations for good practice. Key issues include predefining appropriate choice of groupings and analysis strategies, clear presentation of grouped findings in tables and figures, and drawing valid conclusions from categorical analyses, avoiding injudicious use of multiple alternative analyses.
Background
A primary goal of observational epidemiology is to assess the strength, direction and shape of relationships between risk factors (exposures) and disease outcomes using appropriate statistical methods. Reports of such studies often categorise (group) continuous variables i.e. risk factors, health outcomes or confounders. In this article we focus on the categorisation of continuous risk factors.
There has been much methodological research into the practice of categorisation covering topics such as dichotomisation [1, 2], efficiency loss and the effects of categorisation [3–9], reasons not to categorise [10, 11] and flexible modelling methods to avoid categorisation [12, 13]. A survey of reporting practices in the epidemiological literature [14] found categorisation to be common (84% of articles with a continuous risk factor used some form of categorisation). However, other than the limited information provided by this previous study, there is no documented evidence of how categorisation is carried out in published epidemiology and whether its planning, analysis and presentation are performed satisfactorily.
In this article we present an illustrative survey of recent epidemiological literature regarding such categorisation, including examples of both good and bad practice. We then present a series of recommendations for the practice of categorisation drawing from the lessons provided by the survey, from the broader literature and from experience. Such guidelines complement the STROBE guidelines [15] for the reporting of observational studies in epidemiology, in particular recommendation 11 which suggests that authors "Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen and why".
Methods
For five major epidemiological journals (American Journal of Epidemiology, Annals of Epidemiology, Epidemiology, International Journal of Epidemiology, Journal of Clinical Epidemiology) and five general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet and New England Journal of Medicine) we reviewed two months' issues (December 2007 and January 2008), identifying all articles in observational epidemiology that examined in individuals the relationship between continuous risk factors and health outcomes. For the purpose of this survey, we defined a "continuous" risk factor as one which had at least 10 levels on an ordinal scale. If an article contained more than one main continuous risk factor, the first mentioned in the abstract was chosen. If more than one categorisation was performed, we focused on the most prominently featured (typically the first mentioned in the abstract). We considered only original research and excluded articles of randomised controlled trials, meta-analyses, studies where the analyses were not performed in individuals, agreement studies and studies where the main research question related to an effect modifier or interaction. Eligible articles had reporting of sufficient detail to ascertain how the continuous risk factor was analysed.
Two authors (ELT and JED) independently completed a proforma (piloted and agreed by SJP) for each eligible study using a standard form with pre-coded boxes and open text fields (Appendix 1 and Additional file 1). Any inconsistencies were reconciled by agreement where possible, or were resolved by the third author (SJP).
Results
Overall survey findings
Table 1. Main features of the 58 eligible articles with a continuous risk factor.
Characteristic/Feature | Number of Articles (n = 58)
Journal (number of articles reviewed^{a}) |
American Journal of Epidemiology (48) | 31 (53%)
Annals of Epidemiology (23) | 9 (16%)
Epidemiology (16) | 6 (10%)
Journal of the American Medical Association (26) | 4 (7%)
New England Journal of Medicine (32) | 3 (5%)
Annals of Internal Medicine (11) | 2 (3%)
British Medical Journal (24) | 1 (2%)
International Journal of Epidemiology (16) | 1 (2%)
Lancet (49) | 1 (2%)
Journal of Clinical Epidemiology (9) | 0
Number of participants |
< 1,000 | 10 (17%)
1,000-5,000 | 23 (40%)
5,000-20,000 | 8 (14%)
20,000-100,000 | 11 (19%)
> 100,000 | 6 (10%)
Study design |
Cohort | 31 (53%)
Cross-sectional | 17 (29%)
Case-control | 10 (17%)
Type of outcome |
Binary | 27 (47%)
Time to event | 18 (31%)
Continuous | 9 (16%)
Ordered categorical | 2 (3%)
Unordered categorical | 2 (3%)
Type of analysis for continuous exposure variable |
Categorically only | 29 (50%)
Both continuously and categorically | 21 (36%)
Continuously only | 8 (14%)
Characteristics of Categorisation
Table 2. Categorisation characteristics of the main continuous risk factor.
Categorisation Characteristics | Number of Articles (n = 50)
Number of categorisations used |
One | 38 (76%)
Two | 12 (24%)
Three or more | 0
Number of categories^{a} |
2 | 3 (6%)
3 | 9 (18%)
4 | 17 (34%)
5 | 13 (26%)
6 | 4 (8%)
7 to 10 | 3 (6%)
Unknown^{b} | 1 (2%)
Choice of category boundaries |
Quantiles | 17 (34%)
Equally spaced intervals | 9 (18%)
External criteria | 6 (12%)
Other | 17 (34%)
Unknown^{b} | 1 (2%)
"Zero/Never" category | 10 (20%)
Presentation of categorical results |
Tables only | 37 (74%)
Figures only | 5 (10%)
Both Tables and Figures | 5 (10%)
Neither | 3 (6%)
Table 3. Example of categorisation from the survey (1).
Formaldehyde levels (ppb) | Prevalence (%) | Adjusted Odds Ratio (95% CI), four groups | Adjusted Odds Ratio (95% CI), lower three groups combined as reference
< 18 | 15/298 (5.0) | 1.00 |
18-27 | 15/299 (5.0) | 1.03 (0.47-2.29) | 1.00
28-46 | 17/301 (5.7) | 1.11 (0.50-2.42) |
≥ 47 | 10/100 (10.0) | 2.36 (0.92-6.09) | 2.25 (1.01-5.01)
P-value for trend = 0.08
Table 4. Example of categorisation from the survey (2).
Inflammation | No. | 34 to < 37 weeks: Prevalence (%) | 34 to < 37 weeks: Adjusted Odds Ratio | 34 to < 37 weeks: 95% CI | < 34 weeks: Prevalence (%) | < 34 weeks: Adjusted Odds Ratio | < 34 weeks: 95% CI
No | 279 | 20.8 | 1.0 | | 8.6 | 1.0 |
Yes | 58 | 31.0 | 1.9 | 1.0, 3.7 | 15.5 | 2.0 | 0.8, 4.9
Table 5. Example of categorisation from the survey (3).
% optimal birth weight | Adjusted Odds Ratio | 95% CI
< 75 | 2.42 | 1.93, 3.05
75-84 | 1.73 | 1.47, 2.02
85-94 | 1.09 | 0.95, 1.26
95-104 | 1 (referent) |
105-114 | 0.98 | 0.83, 1.15
115-124 | 0.97 | 0.76, 1.24
> 124 | 1.15 | 0.78, 1.69
Table 6. Example of categorisation from the survey (4).
Exposure group | No. of cases | No. of controls | Odds Ratio | 95% CI
No bereavement | 2589 | 12722 | Referent |
Time since bereavement (yrs) | | | |
≤ 5 | 24 | 180 | 0.6 | 0.4, 1.0
6-10 | 18 | 116 | 0.8 | 0.5, 1.2
11-15 | 8 | 107 | 0.4 | 0.2, 0.8
16-20 | 11 | 80 | 0.7 | 0.4, 1.3
≥ 21 | 44 | 265 | 0.8 | 0.6, 1.1
Overall, we note that there was usually little discussion of why categorisation was performed or justification for the number of categories chosen. Justification provided could be for testing of nonlinearity [17, 24] or to detect a threshold effect [25].
Choice of category boundaries. Boundaries for the categories were chosen using quantiles (n = 17, 34%), equally spaced intervals (n = 9, 18%), external criteria (n = 6, 12%), or by other means (n = 17, 34%) (Table 2). One article (2%) did not provide details.
All 10 cohort studies that used the 'quantiles' approach based it on the risk factor distribution in the entire cohort, whilst the two case-control studies that used quantiles (Chen et al. [26] used quintiles and Tworoger et al. [27] quartiles) based it on the distribution in the controls only. For skewed risk factor distributions, groups of unequal size could be used. For example, in a cross-sectional study, Matsunga et al. [19] used the 30^{th}, 60^{th} and 90^{th} percentiles of the distribution of the exposure in all subjects as the cut-offs to create 4 groups (Table 3). As noted above, the authors also provided results from a two-group categorisation which combined the lower three groups to compare with the upper tenth of the distribution.
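The quantile-based choices described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names and the exposure values are hypothetical and are not taken from any surveyed article.

```python
import bisect
import statistics


def percentile_cutpoints(values, percentiles):
    """Return cutpoints at the given whole-number percentiles (1-99) of `values`."""
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return [cuts[p - 1] for p in percentiles]


def assign_group(value, cutpoints):
    """Return the 0-based group index; a value equal to a cutpoint joins the upper group."""
    return bisect.bisect_right(cutpoints, value)


# Hypothetical exposure values (assumption, for illustration only).
controls = [float(v) for v in range(1, 101)]   # 1.0 ... 100.0
cases = [35.0, 62.5, 95.0, 99.0]

# Case-control convention described above: boundaries come from the controls only.
quartile_cuts = percentile_cutpoints(controls, [25, 50, 75])
case_groups = [assign_group(x, quartile_cuts) for x in cases]

# Unequal groups for a skewed exposure: 30th, 60th and 90th percentiles.
skewed_cuts = percentile_cutpoints(controls, [30, 60, 90])
group_sizes = [
    sum(1 for v in controls if assign_group(v, skewed_cuts) == g) for g in range(4)
]
print(case_groups, group_sizes)
```

In a cohort study the same functions would simply be applied to the exposure values of the whole cohort rather than to the controls alone.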
Equally spaced boundaries, for example 5- or 10-year age bands, were used by 9 (18%) articles. Such equally spaced boundaries were also used for variables expressed as percentages. For example, in assessing the relationship between percentage of optimal birth weight and intellectual disability, Leonard et al. [21] used groups of width 10% (Table 5).
An 'external criteria' approach to categorisation was classified as one which used well-recognised, published boundaries for the risk factor. Six articles (12%) used such external groupings. For example, Brunner Huber et al. [28] used WHO guidelines for body mass index (underweight: < 18.5; normal weight: 18.5-24.9; overweight: 25-29.9; obese: ≥ 30 kg/m^{2}) to examine the effect of obesity on oral contraceptive failure.
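Applying externally defined boundaries amounts to a lookup against fixed cutpoints. The sketch below uses the WHO body mass index boundaries quoted above; the function name is hypothetical, and treating each cutpoint as the lower bound of the next group (so that an unrounded value such as 24.95 counts as normal weight) is an assumption of this illustration.

```python
import bisect

# WHO body mass index boundaries (kg/m^2) as quoted in the text.
BMI_CUTPOINTS = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal weight", "overweight", "obese"]


def bmi_category(bmi):
    """Map a BMI value to its category; each cutpoint is the lower bound of the next group."""
    return BMI_LABELS[bisect.bisect_right(BMI_CUTPOINTS, bmi)]


for value in (17.0, 18.5, 24.9, 25.0, 29.9, 30.0):
    print(value, bmi_category(value))
```

Because the boundaries are fixed in advance and published, they do not depend on the data in hand, which is what distinguishes this approach from quantile-based grouping.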
Other approaches to categorisation were used by 17 (34%) articles. For example, in assessing the relationship between blood levels of vitamin D and fracture, Roddam et al. [29] categorised the exposure according to "proposed levels of vitamin D deficiency" whilst Park et al. [30] used "predefined categories of total calcium and dairy food intakes to maximise contrasts and ensure comparability with other studies" in assessing the effect of calcium on prostate cancer.
"Zero/Never" categories. Ten (20%) of the fifty categorisations described were of risk factors with a spike/clumping at the zero level of that risk factor or with a 'never' exposed category (Table 2). In the former, the spike at zero was used to form a 'zero' category, for example, 'pack-years of smoking' [31] and 'average weekly drinks' [32]. An example of the latter was a case-control study to assess the effect of time since bereavement due to loss of a child on the risk of ALS [23], where most subjects (94%) had not lost a child. Thus, the exposure 'years since bereavement' contained a 'never' category (Table 6).
Presentation of categorical results. The majority (n = 37, 74%) of the fifty articles with a categorical analysis presented the results in tables only (Table 2). Another 5 (10%) also included a figure with a table, whilst 5 (10%) used figures only. A few articles (n = 3) provided neither tables nor figures. Such articles referred to having used categorisation simply for the purposes of exploring the relationship between the exposure and outcome rather than to present the results, e.g. Inskip et al. [25].
An example of a figure which has been used to summarise results from both a continuous and categorical analysis is presented in Figure 1. With this figure, Rosenlund et al. [17] clearly convey the relationship between NO_{2} exposure and incidence of first coronary event.
Estimation and Inference
Table 7. Estimation and statistical testing by analysis type^{a}.
 | Analysis type: Continuous (n = 8) | Analysis type: Categorical (n = 29) | Analysis type: Both (n = 21) | Overall (n = 58)
Type of estimate (n) | | | |
Continuous | 7 | 0 | 16 | 23 (40%)
By group for all groups | 0 | 4 | 6 | 10 (17%)
By group relative to ref group | 0 | 26 | 12 | 38 (66%)
Other | 1^{b} | 1^{c} | 3^{d} | 5 (9%)
Type of statistical test (n) | | | |
Continuous | 8 | 0 | 19 | 27 (47%)
Score trend test | 0 | 11 | 1 | 12 (21%)
Median/mean trend | 0 | 7 | 1 | 8 (14%)
Pairwise | 0 | 17 | 9 | 26 (45%)
Global | 0 | 3 | 6 | 9 (16%)
Other | 0 | 0 | 1^{e} | 1 (2%)
Fifty articles presented categorical analyses (29 categorical only and 21 both continuous and categorical). Of these, 10 (20%) presented an estimate by group for all groups, usually (50%) accompanied by a CI or SE. Thirty-eight (76%) articles presented an estimate by group relative to a reference group, accompanied by a CI or SE in all but one case. Of those, most (n = 30, 79%) selected an extreme category (highest: n = 5, 13%; lowest: n = 25, 66%) as the reference group, whilst 21% (n = 8) chose a category in the middle of the distribution. When the categories are of unequal size, the largest group may be selected as the reference group. For example, in categorising calcium intake to assess its effect on prostate cancer risk, Park et al. [30] selected the second of six predefined groups which, we note, is the largest group.
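As an illustration of estimates presented by group relative to a reference group, the sketch below computes crude odds ratios with Woolf (log-based) 95% confidence intervals from the case and control counts in the time-since-bereavement example (Table 6). The function name is hypothetical. The calculation is unadjusted, so it is not the method of the original study and its results need not match the published estimates exactly.

```python
import math


def odds_ratio_ci(cases, controls, ref_cases, ref_controls, z=1.96):
    """Crude odds ratio versus the reference group with a Woolf (log-based) 95% CI."""
    or_ = (cases * ref_controls) / (controls * ref_cases)
    se_log = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Counts of cases and controls from the time-since-bereavement example (Table 6).
reference = (2589, 12722)            # 'no bereavement' reference group
groups = {
    "<= 5": (24, 180),
    "6-10": (18, 116),
    "11-15": (8, 107),
    "16-20": (11, 80),
    ">= 21": (44, 265),
}

results = {name: odds_ratio_ci(a, b, *reference) for name, (a, b) in groups.items()}
for name, (or_, lower, upper) in results.items():
    print(f"{name:>6} yrs: OR {or_:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

The standard error of each log odds ratio includes the reciprocals of the reference group counts, so a small reference group widens every interval. This is one reason a large group can be a sensible reference.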
Inference. Most articles reported results in terms of statistical significance, either by formal hypothesis testing (i.e. by reporting p-values) or by an inferred hypothesis test via interpretation of confidence intervals (Table 7). Of the 29 articles with a continuous analysis, 27 performed statistical testing either formally by use of p-values (19 articles) or implicitly by interpreting confidence intervals (8 articles). Of the 50 articles with a categorical analysis, 20 used some kind of trend test: most commonly by assigning a score to each category (where a 1-unit increase in the score corresponded to moving to the next highest category in the order of categories). One alternative was to assign a category mean or median to each member of a category and then analyse that as a continuous variable in the appropriate statistical model, e.g. Park et al. [30]. Overall, global tests across all categories without use of their ordering were used in 8 articles, all using p-values. Pairwise tests comparing all groups to a reference group were used in 26 articles, 11 by p-values and 15 via interpretation of confidence intervals.
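A score-based trend test of the kind described above can be sketched with the Cochran-Armitage test for trend in proportions, here applied to the prevalence counts from the formaldehyde example (Table 3). The choice of this particular test and the function name are assumptions made for illustration. The calculation is unadjusted, so its p-value need not agree with the published p-value for trend.

```python
import math
from statistics import NormalDist


def score_trend_test(events, totals, scores=None):
    """Cochran-Armitage test for linear trend in proportions across ordered groups."""
    k = len(events)
    if scores is None:
        scores = list(range(k))          # 1-unit step between adjacent categories
    n = sum(totals)
    p_bar = sum(events) / n              # overall event proportion
    # Numerator: score-weighted sum of observed minus expected events.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    sum_ns = sum(m * s for s, m in zip(scores, totals))
    sum_ns2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n)
    z = t / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided
    return z, p_value


# Events and group sizes from the formaldehyde example (Table 3).
events = [15, 15, 17, 10]
totals = [298, 299, 301, 100]
z, p_value = score_trend_test(events, totals)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Passing category medians or means as `scores` instead of the default integer scores gives the alternative form of trend test mentioned above.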
Discussion
Understanding the relationship between a continuous exposure variable (risk factor) and a health outcome involves determining the direction, strength and shape of that relationship. Our survey demonstrates that categorisation is commonly used (86% of such articles surveyed), as was seen in an earlier survey (84% of such articles surveyed) [14]. With its breadth of information, the current survey has shown that there is great diversity in practice. The categorisation of continuous confounding variables and continuous outcome variables, although not addressed in this article, also plays an important role in the analysis of epidemiological data.
Motivation for categorisation
Much research exists detailing the advantages and disadvantages of categorisation vs. analysing variables continuously (Appendix 2). From a statistical viewpoint, categorisation of a continuous variable can often result in a loss of statistical efficiency [4–6]. Some authors [10, 11] advise against the practice of categorisation irrespective of the number of categories.
More practical considerations, however, may favour categorisation: ease of interpretation of parameter estimates, ease of presentation to less statistically minded public health professionals, and the availability of clinically relevant 'cut points' where they exist. Categorisation is often used in conjunction with a continuous analysis, for example, to check for, or model, nonlinear effects. Some authors [12, 13] have advocated the use of alternative, more flexible modelling approaches using the continuous variable, such as spline regression modelling and generalised additive models, to model nonlinear outcome-exposure relationships, which avoid the need to categorise. Such modelling approaches are more statistically complex, which may limit their widespread use, and can pose difficulties of interpretation, e.g. whether statistical uncertainty is adequately expressed and whether the model extrapolates beyond the observed range of the data.
The decision to categorise a continuous risk factor should be made in light of the various advantages and disadvantages and will differ for each specific situation. For example, in examining the relationship between biomarkers of inflammation and mortality using a continuous analysis only [16], the analyses may have benefitted from a categorical analysis: a hazard ratio per 1 unit increase in the biomarker is not as readily interpretable.
Despite recommendation 11 of the STROBE statement [15] which advises authors to explain how continuous variables were analysed and to describe how and why groupings were chosen, few authors in our survey described their reasons for categorising a continuous variable. Ideally, choice and rationale for categorisation should be made prior to data analysis and documented in the publication.
Specific Choice of Categorisation
When categorisation of a continuous risk factor is performed, decisions on the nature of the categorisation are needed i.e. the number of categories and the choice of cutpoints. These may differ depending on the reason for categorisation and the size of the study. For example, a larger number of groups may be used when the study is large or when the purpose of categorisation is to check for nonlinear effects. When the purpose is to assess effect modification (subgroup analyses) lack of statistical power may necessitate fewer groups.
There is much theoretical work on the nature of categorisation [4–6]. Equally sized groups (such as using quartiles) have the merit of objective simplicity but are not the most statistically efficient choice. For instance, with Normal data it is more efficient to have larger groups in the centre of the distribution making the extreme groups smaller and hence even more extreme. Statistical efficiency increases with the number of groups [4, 6], i.e. the more crude the grouping (e.g. 2 groups), the greater the loss of statistical power. Statistical efficiency is usually greatest using a continuous analysis, provided the model fit (e.g. linearity assumption) is good.
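The gain in efficiency with the number of groups can be illustrated numerically. The sketch below assumes a standard Normal exposure, a linear exposure-outcome relationship, equally sized groups and group-mean scores, and takes efficiency to be the squared correlation between the exposure and its grouped score. These are simplifying assumptions for illustration, not a reproduction of the calculations in [4–6], and the function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def relative_efficiency(k):
    """Squared correlation between a standard Normal exposure and its group-mean
    score when the exposure is cut into k equally sized groups."""
    # Interior boundaries at the 1/k, 2/k, ... quantiles; outer limits are infinite.
    cuts = [STD_NORMAL.inv_cdf(i / k) for i in range(1, k)]
    # Density at each boundary; it is zero at minus and plus infinity.
    dens = [0.0] + [STD_NORMAL.pdf(c) for c in cuts] + [0.0]
    p = 1.0 / k
    # Mean of a standard Normal within (a, b) is (pdf(a) - pdf(b)) / P(a < X < b).
    means = [(dens[i] - dens[i + 1]) / p for i in range(k)]
    return sum(p * m * m for m in means)


for k in (2, 3, 4, 5, 10):
    print(k, round(relative_efficiency(k), 3))
```

For two groups split at the median the expression reduces to 2/π, roughly 0.64, and the value rises towards 1 as the number of groups increases.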
In selecting the category boundaries it is necessary to determine how many categories will be formed as sufficient individuals/events are needed in each group. Sensibly, only studies with large sample size or a strong exposure-outcome relationship would be able to support a large number of categories. If 'natural' or clinically important cutpoints are relevant for the exposure variable then it is still important to verify that sufficient information is present in each of the categories for a robust analysis. If not, merging of some of the categories may be required.
Dichotomisation of the exposure variable is strongly advised against. In terms of statistical power, it is equivalent to discarding a third of the data [1, 2] and makes it impossible to detect nonlinearity in the exposure-outcome relationship. In our survey, no article solely presented results from dichotomisation, but three articles did place more emphasis on dichotomised results.
Multiple alternative categorisations should be undertaken with care and interpreted cautiously, as deliberate or subconscious data dredging could lead to a choice of grouping that accentuates an association thus increasing the risk of a false positive finding, and/or an exaggerated estimate of the exposure/outcome relationship. However, additional investigation of effect modification (subgroup analyses) may necessitate a secondary categorisation with fewer groups.
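The risk described above can be demonstrated with a small simulation in which the exposure and outcome are generated independently, so that any 'significant' result is a false positive. The sample size, the candidate cutpoints, the outcome prevalence and the use of a chi-squared test on a dichotomised exposure are all hypothetical choices made for this sketch.

```python
import math
import random


def chi2_p_value(a, b, c, d):
    """P-value of the 1-df chi-squared test for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 1.0                           # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-squared with 1 df


def p_for_cutpoint(x, y, cut):
    """Dichotomise exposure x at `cut` and test its association with binary outcome y."""
    a = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi >= cut and not yi)
    c = sum(1 for xi, yi in zip(x, y) if xi < cut and yi)
    d = sum(1 for xi, yi in zip(x, y) if xi < cut and not yi)
    return chi2_p_value(a, b, c, d)


def false_positive_rates(n_sims=400, n=200, alpha=0.05, seed=1):
    """Simulate data with NO true association and compare one prespecified cutpoint
    with the 'best' of several candidate cutpoints."""
    rng = random.Random(seed)
    candidate_cuts = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical alternative groupings
    fixed_hits = best_hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.random() < 0.3 for _ in range(n)]   # outcome independent of x
        p_values = [p_for_cutpoint(x, y, cut) for cut in candidate_cuts]
        fixed_hits += p_values[2] < alpha            # prespecified cut at 0
        best_hits += min(p_values) < alpha           # cut chosen after looking
    return fixed_hits / n_sims, best_hits / n_sims


fixed_rate, best_rate = false_positive_rates()
print(f"prespecified cutpoint: {fixed_rate:.3f}; best of five cutpoints: {best_rate:.3f}")
```

By construction the second rate can never be lower than the first, since the prespecified cutpoint is one of the candidates. How far it exceeds the nominal level depends on how many alternative groupings are tried and how strongly they overlap.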
Inevitably readers can only assess categorisations reported in publications: other categorisation choices may have been analysed but not included. As a result, our survey of current practice is limited to what is published with no awareness of authors' selections in what they chose to report so that a full critical evaluation of those choices is not possible here. Authors usually did not explicitly report the reasons for the number of categories and choice of category boundaries, or specify whether these were chosen prior to analysis and whether they were the only categories explored.
Estimation and Inference
Practice varies as to which contrasts are best used for estimation and for inference in a categorical analysis. Pairwise comparisons relative to a reference group were the most commonly reported approach in our survey, and they are easy to interpret. For such estimates, the highest or lowest category is usually (79%) chosen as the reference group, possibly for ease of interpretation. However, if the largest group is not an extreme category, that may be a better choice of reference category for both statistical efficiency and practical reasons. Such multiple pairwise tests increase the chance of false positive results. Hence, a global test of the relationship between the categorised risk factor and the outcome is desirable. If the relationship between the risk factor and outcome appears to be monotonic, a trend test will be substantially more powerful and insightful than an unordered global test across multiple groups. It may also be helpful to report the estimate from a continuous analysis. All estimates, whether pairwise or otherwise, should be accompanied by either their standard error or 95% CI as an indication of statistical uncertainty.
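The inflation of false positive results from multiple pairwise tests can be approximated with a short calculation. The sketch below assumes independent tests, which is a simplification: comparisons that share a reference group are correlated, so the first function gives only an approximate figure, while the Bonferroni bound holds regardless. Both function names are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_bound(alpha, n_tests):
    """Upper bound on the familywise error rate that holds without independence."""
    return min(1.0, alpha * n_tests)


# Five categories compared with one reference group give four pairwise tests.
for n_tests in (1, 2, 4, 6):
    print(
        n_tests,
        round(familywise_error_rate(0.05, n_tests), 3),
        round(bonferroni_bound(0.05, n_tests), 3),
    )
```

With four pairwise tests at the 5% level, the chance of at least one false positive under independence is about 19%, which is the motivation for a single global or trend test.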
Presentation and reporting of results
Most articles in our survey reported the results of their categorical analysis in table form only (74%), whereas only 20% (10 articles) used figures and 6% (3 articles) used neither. We would encourage a greater use of figures as a valuable way of visually conveying information across categories to the reader, as demonstrated by the example shown in Figure 1 [17]. Ideally, the number of patients and, if relevant, the number of events and estimates in each category should also be tabulated on the figure, e.g. as numbers under the x-axis. If multiple exposure variables are of interest, space constraints may prevent all results being reported as figures.
Care should be taken in the choice of results to report in the article's abstract. These should accurately reflect the analysis as a whole and avoid only reporting statistically significant results, particularly if multiple categorisations of the same risk factor were undertaken with differing conclusions.
Recommendations
1. Be aware of the advantages and disadvantages of categorisation (Appendix 2).
2. Define (as far as possible) the chosen categories prior to analysis but be careful to not miss interesting hypothesis-generating opportunities, especially in large studies.
3. Consider the distribution of the data when choosing categories: for skewed exposure distributions, consider cutpoints which sensibly capture the tail of the distribution; for more symmetric distributions, refer to the theoretical literature [5, 6] in considering whether to deviate from grouping into equal sized groups.
4. Report clearly the reasons for categorisation and the specific chosen boundaries.
5. Take care when choosing the number of categories, bearing in mind the extent of data available i.e. large studies may permit a large number of categories.
6. It is best to avoid dichotomisation.
7. Be wary of injudicious use of multiple alternative categorisations, especially if done to artificially accentuate associations.
8. Consider use of figures to more clearly visualise the pattern of outcomes across categories.
9. Provide numbers of participants and events by category in appropriate tables and figures.
10. Consider use of an appropriate trend test across groups.
11. Provide confidence intervals for point estimates within each group, or appropriate estimates and confidence intervals of pairwise between-group differences.
12. Take care in choosing the appropriate estimates (and significance tests) for association, especially regarding the choice of reference group for pairwise group comparisons or the wisdom of a more global monotonic trend across groups.
13. Consider presenting results from continuous analysis including statistical modelling to account for nonlinear associations (e.g. spline modelling, generalised additive models), though beware of potential model instabilities and over interpretation.
Conclusions
There exists a healthy debate concerning the advantages and disadvantages of categorisation. Some are of the opinion that categorisation should not be used, even going so far as to state that "categorisation of continuous data is not necessary, and indeed is not a natural way of analyzing continuous data for most statisticians" [10, p. 566]. We venture the alternative view that the categorisation of continuous risk factors has played, and will likely continue to play, an important role in the analysis of epidemiological data.
In this article we have focused on the diversity of current practice in the use of categorisation in epidemiological studies. We hope that our survey, critical appraisal and consequent recommendations regarding how categorisation (grouping) is, and should be, presented will be of value to future authors and journals in enhancing the quality of epidemiological publications.
Appendix 1
Survey information collected from eligible epidemiological publications
Type of study design (case-control, cohort, cross-sectional)
Main outcome
    Type (binary, time-to-event, ordered categorical, unordered categorical, continuous and other)
Main continuous risk factor
    Characteristics:
        Primary outcome measure (e.g. odds ratio, rate ratio)
        Type of analysis i.e. treated as a continuous variable only, as a categorical variable only or both; if 'both', was emphasis on continuous or categorised form
        Number of other continuous risk factors categorised in the same manner
    Nature of the categorisation:
        Number of categories
        Criteria used to select boundaries of categories (i.e. quantiles; equally spaced groups; external criteria where an explicit reference to well-recognised boundaries was provided; other)
        Number of alternative categorisations
        Inclusion of a "zero/never" category
    Details of the analysis:
        Estimation: type of estimate (continuous; by group for all groups; by group relative to reference group; other) and reporting of confidence intervals
        Statistical testing: type of test (continuous analysis; trend test; pairwise comparisons; global test; other) and reporting of p-values
    Presentation of categorical results:
        Tables, figures or both
Appendix 2
Advantages and disadvantages of categorisation of continuous risk factors
Advantages of categorisation

Presentation of results may be simpler to understand by non-statisticians. For instance, some people may find risks presented relative to a reference group easier to interpret than regression coefficients or correlation coefficients.

Results may relate more directly to individuals and thus be more readily interpretable. For example, a relative risk for a high category versus a low category subject could be obtained.

There may be a natural or conventional form of categories that should be used. For example, SBP < 140, 140-159, ≥ 160 mmHg.

Categorisation may remove the need for any parametric assumptions regarding the shape (e.g. linearity) of the outcome/exposure relationship.

A 'never' or 'zero' exposed category can be easily incorporated into a categorical analysis e.g. 'no bereavement' for the exposure 'years since bereavement'.
Disadvantages of categorisation

No single "right answer", as different choices of categories may lead to somewhat different findings, and sometimes conclusions may actually differ.

No agreed objective criteria on the number of groups to choose or the boundaries (cutoff points) for grouping.

No agreement on which contrasts to use for inference, e.g. whether to use trend test or pairwise comparisons.

No agreement whether to use an extreme (i.e. lowest or highest) or middle (most common) group as the reference.

Deliberate or subconscious data dredging could lead to a choice of grouping that accentuates an association thus increasing the risk of a false positive finding, and/or an exaggerated estimate of the exposure/outcome relationship.

Statistical power/efficiency is lost compared to a continuous variable in regression.

Continuous modelling can potentially give greater insight.
Declarations
Acknowledgements
We thank Tim Collier for his helpful advice during the initial stages of the study and the two reviewers whose helpful comments improved the overall clarity of the article.
References
 Altman DG, Royston P: The cost of dichotomising continuous variables. Br Med J 2006, 332:1080.View ArticleGoogle Scholar
 Royston P, Altman DG, Sauerbrei W: Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 2006, 25:127–141.View ArticlePubMedGoogle Scholar
 Cochran WG: The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 1968, 24:295–313.View ArticlePubMedGoogle Scholar
 Connor RJ: Grouping for testing trends in categorical data. J Am Stat Assoc 1972, 67:601–604.View ArticleGoogle Scholar
 Cox DR: Note on grouping. J Am Stat Assoc 1957, 52:543–547.View ArticleGoogle Scholar
 Lagakos SW: Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat Med 1988, 7:257–274.View ArticlePubMedGoogle Scholar
 Morgan TM, Elashoff RM: Effect of categorising a continuous covariate on the comparison of survival time. J Am Stat Assoc 1986, 81:919–921.View ArticleGoogle Scholar
 Taylor JMG, Yu M: Bias and efficiency loss due to categorising an explanatory variable. J Multivar Anal 2002, 83:248–263.View ArticleGoogle Scholar
 Zhao PZ, Kolonel LN: Efficiency loss from categorising quantitative exposures into qualitative exposures in casecontrol studies. Am J Epidemiol 1992, 136:464–474.PubMedGoogle Scholar
 Altman DG: Categorizing continuous variables. In Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. Chicester: John Wiley and Sons; 1998:563–567.Google Scholar
 Dinero TE: Seven Reasons why you should not categorise continuous data. J Health Soc Policy 1996, 8:63–72.View ArticlePubMedGoogle Scholar
 Greenland S: Doseresponse and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology 1995, 6:356–365.View ArticlePubMedGoogle Scholar
 Greenland S: Avoiding power loss associated with categorisation and ordinal scores in doseresponse and trend analysis. Epidemiology 1995, 6:450–454.View ArticlePubMedGoogle Scholar
 Pocock SJ, Collier TJ, Dandero KJ, de Stavola BL, Goldman MB, Kalish LA, Kasten LE, McCormack VA: Issues in the reporting of epidemiological studies: a survey of recent practice. Br Med J 2004, 329:883–888.View ArticleGoogle Scholar
 von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, STROBE Initiative: The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007, 370:1453–1457.
 Vidula H, Tian L, Liu K, Criqui MH, Ferrucci L, Pearce WH, Greenland P, Green D, Tan J, Garside DB, Guralnik J, Ridker PM, Rifai N, McDermott MM: Biomarkers of inflammation and thrombosis as predictors of near-term mortality in patients with peripheral arterial disease: a cohort study. Ann Intern Med 2008, 148:85–93.
 Rosenlund M, Picciotto S, Forastiere F, Stafoggia M, Perucci CA: Traffic-related air pollution in relation to incidence and prognosis of coronary heart disease. Epidemiology 2008, 19:121–128.
 Tsai SP, Ahmed FS, Wendt JK, Bhojani F, Donnelly RP: The impact of obesity on illness absence and productivity in an industrial population of petrochemical workers. Ann Epidemiol 2008, 18:8–14.
 Matsunaga I, Miyake Y, Yoshida T, Miyamoto S, Ohya Y, Sasaki S, Tanaka K, Oda H, Ishiko O, Hirota Y, The Osaka Maternal and Child Health Study Group: Ambient formaldehyde levels and allergic disorders among Japanese pregnant women: baseline data from the Osaka Maternal and Child Health Study. Ann Epidemiol 2008, 18:78–84.
 Catov JM, Bodnar LM, Ness RB, Barron SJ, Roberts JM: Inflammation and dyslipidemia related to risk of spontaneous preterm birth. Am J Epidemiol 2007, 166:1312–1319.
 Leonard H, Nassar N, Bourke J, Blair E, Mulroy S, de Klerk N, Bower C: Relation between intrauterine growth and subsequent intellectual disability in a ten-year population cohort of children in Western Australia. Am J Epidemiol 2008, 167:103–111.
 Cauley JA, Hochberg MC, Lui LY, Palermo L, Ensrud KE, Hillier TA, Nevitt MC: Long-term risk of incident vertebral fractures. JAMA 2007, 298:2761–2767.
 Fang F, Ye W, Fall K, Lekander M, Wigzell H, Sparen P, Adami HO, Valdimarsdóttir U: Loss of a child and the risk of amyotrophic lateral sclerosis. Am J Epidemiol 2008, 167:203–210.
 Bartali B, Frongillo EA, Guralnik JM, Stipanuk MH, Allore HG, Cherubini A, Bandinelli S, Ferrucci L, Gill TM: Serum micronutrient concentrations and decline in physical function among older persons. JAMA 2008, 299:308–315.
 Inskip HM, Dunn N, Godfrey KM, Cooper C, Kendrick T, Southampton Women's Survey Study Group: Is birth weight associated with risk of depressive symptoms in young women? Evidence from the Southampton women's survey. Am J Epidemiol 2008, 167:164–168.
 Chen H, O'Reilly EJ, Schwarzschild MA, Ascherio A: Peripheral inflammatory biomarkers and risk of Parkinson's disease. Am J Epidemiol 2008, 167:90–95.
 Tworoger SS, Lee IM, Buring JE, Hankinson SE: Plasma androgen concentrations and risk of incident ovarian cancer. Am J Epidemiol 2008, 167:211–218.
 Brunner Huber LR, Toth JL: Obesity and oral contraceptive failure: findings from the 2002 national survey of family growth. Am J Epidemiol 2007, 166:1306–1311.
 Roddam AW, Neale R, Appleby P, Allen NE, Tipper S, Key TJ: Association between plasma 25-hydroxyvitamin D levels and fracture risk: the EPIC-Oxford study. Am J Epidemiol 2007, 166:1327–1336.
 Park Y, Mitrou PN, Kipnis V, Hollenbeck A, Schatzkin A, Leitzmann MF: Calcium, dairy foods, and risk of incident and fatal prostate cancer: the NIH-AARP diet and health study. Am J Epidemiol 2007, 166:1270–1279.
 Kifley A, Liew G, Wang JJ, Kaushik S, Smith W, Wong TY, Mitchell P: Long-term effects of smoking on retinal microvascular caliber. Am J Epidemiol 2007, 166:1288–1297.
 Mukamal KJ, Kennedy M, Cushman M, Kuller LH, Newman AB, Polak J, Criqui MH, Siscovick DS: Alcohol consumption and lower extremity arterial disease among older adults: the cardiovascular health study. Am J Epidemiol 2008, 167:34–41.
 Auchincloss AH, Diez Roux AV, Brown DG, Erdmann CA, Bertoni AG: Neighborhood resources for physical activity and healthy foods and their association with insulin resistance. Epidemiology 2008, 19:146–157.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.