Choosing an appropriate bacterial typing technique for epidemiologic studies


A wide variety of bacterial typing systems are currently in use that vary greatly with respect to the effort required, cost, reliability and ability to discriminate between bacterial strains. No one technique is optimal for all forms of investigation. We discuss the desired level of discrimination and need for a biologic basis for grouping strains of apparently different types when using bacterial typing techniques for different epidemiologic applications: 1) confirming epidemiologic linkage in outbreak investigations, 2) generating hypotheses about epidemiologic relationships between bacterial strains in the absence of epidemiologic information, and 3) describing the distributions of bacterial types and identifying determinants of those distributions. Inferences made from molecular epidemiologic studies of bacteria depend upon both the typing technique selected and the study design used; thus, choice of typing technique is pivotal for increasing our understanding of pathogenesis and transmission and, eventually, for preventing disease.


Ever since Koch discovered how to grow bacteria in pure culture, the laboratory has been an integral component of epidemiologic studies of bacterial diseases. Over time, our ability to discriminate among bacterial strains from the same species has increased, enhancing outbreak investigations and surveillance, studies of the natural history of infection, and our understanding of the transmission, pathogenesis and phylogeny of bacteria.


Bacterial typing systems

Traditional typing systems for discriminating between bacteria from a single species have been based on phenotype, such as serotype, biotype, phage type, or antibiogram (susceptibility to one or more antibiotics). More recently, techniques have been developed based on indirect measures of genetic sequence (such as pulsed-field gel electrophoresis (PFGE)) and direct measures of genetic sequence (such as multilocus sequence typing (MLST)). Sequencing an entire bacterial genome and, using microarray technologies, comparing strains to a reference strain (comparative genomic hybridization) are now technically feasible; however, the cost and time required limit their applicability for most epidemiologic studies. For example, in 2005, total genomic sequencing costs roughly 100 to 500 times more per strain than comparative hybridization (~$100,000 to $500,000 versus ~$1000 to $2000), and MLST (~$140) is quite costly compared to PFGE (~$20). Further, we have yet to characterize the range of variability among bacterial strains of a single species by various techniques, and thus lack an appropriate context for interpreting the observed variation.
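To make the cost comparison concrete, the following minimal sketch multiplies the approximate 2005 per-strain costs quoted above by a study size. The function name `study_cost` and the 200-isolate study are hypothetical and serve only as an illustration; actual costs vary by laboratory and have changed since 2005.

```python
# Approximate 2005 per-strain costs in US dollars, as (low, high) ranges
# taken from the figures quoted in the text.
COST_PER_STRAIN = {
    "whole-genome sequencing": (100_000, 500_000),
    "comparative genomic hybridization": (1_000, 2_000),
    "MLST": (140, 140),
    "PFGE": (20, 20),
}


def study_cost(technique, n_isolates):
    """Return the (low, high) total typing cost for n_isolates strains."""
    low, high = COST_PER_STRAIN[technique]
    return low * n_isolates, high * n_isolates


n = 200  # hypothetical number of isolates in a study
for name in COST_PER_STRAIN:
    low, high = study_cost(name, n)
    print(f"{name}: ${low:,} to ${high:,} for {n} isolates")
```

At these prices, typing 200 isolates by MLST costs seven times as much as typing them by PFGE, and either is orders of magnitude cheaper than sequencing every genome.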

Understanding the strengths and weaknesses of the chosen bacterial typing technique enhances interpretation and generalization of study results. A summary of common typing techniques and their relative discriminatory power, repeatability (same test result, given random error, for the same analysis on the same sample in the same laboratory), reproducibility (same test result, given random error, for the same analysis on the same sample in a different laboratory), timing and cost is presented in Table 1; techniques have been recently reviewed elsewhere [1–3]. We have ordered techniques from those with the highest to the lowest discriminatory power, that is, ability to distribute strains into the greatest number of groups. Thus, if the entire genome of a bacterium is sequenced, we will be able to detect even very small differences between strains, for example, changes in gene sequence that do not cause changes in the expressed proteins, such as point mutations that naturally occur over time as the bacteria divide. Common typing techniques used in epidemiologic studies sequence one or more genetic regions (for example, MLST) or use enzymes to cut part or all of the genome into pieces (for example, PFGE). The number and size of the pieces correspond to the number and location of restriction sites cut by the enzymes, and thus are an indirect measure of sequence. Other common techniques use the polymerase chain reaction targeted to specific sequences, for example ERIC-PCR; the resulting reactions yield fragments of different sizes, which can be used to discriminate between bacterial types. Generally speaking, sequence-based methods are the most repeatable and reproducible. Gel-based methods are less so, because of the inherent variability of the technique [2, 3].
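The idea that fragment patterns are an indirect measure of sequence can be illustrated with a toy in-silico digest. This is a sketch under stated assumptions, not a model of a real PFGE run: the three short sequences are invented, the recognition site GAATTC (that of EcoRI) is chosen only for illustration, whereas PFGE typically uses rare-cutting enzymes on whole genomes, and the cut is placed at the start of each site for simplicity.

```python
def digest(sequence, site):
    """Cut a linear sequence at every occurrence of a recognition site.

    Simplification: the cut is placed at the start of each site; real
    enzymes cut at a fixed position within or near the site.
    Returns fragment lengths sorted from largest to smallest, loosely
    mimicking the bands seen on a gel.
    """
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return sorted((f for f in fragments if f > 0), reverse=True)


SITE = "GAATTC"  # illustrative recognition site (assumption)

# Three invented toy "genomes" that differ by single point mutations.
strain_a = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TTTT"
strain_b = "AAAA" + "GAATTC" + "CCTCCC" + "GAATTC" + "TTTT"  # change outside any site
strain_c = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TTTT"  # change destroys a site

print(digest(strain_a, SITE))  # [12, 10, 4]
print(digest(strain_b, SITE))  # [12, 10, 4]
print(digest(strain_c, SITE))  # [22, 4]
```

Strains A and B differ in sequence yet give identical fragment patterns, while a single base change that destroys a restriction site in strain C alters the pattern. Full sequencing would separate all three strains; the fragment-based method separates only two groups.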

Table 1 Comparison of Common Bacterial Typing Techniques by Relative Discriminatory Power, Reproducibility, Repeatability, and Whether They Give Information on Dispersed or Focal Parts of the Genome, Time Required and Cost

Our intention is not to focus on a particular technique, as the techniques continue to change rapidly. Instead, we discuss the strengths and weaknesses of current bacterial typing techniques for particular epidemiologic applications, and provide some insight into what characteristics a typing technique should have when applied to a specific research question. We recognize that choice of a molecular tool is often up to laboratory personnel and not the epidemiologist; however, laboratorians are not always involved in study design or the interpretation of study results (although this is highly desirable). A laboratorian, whose expertise is in a particular typing technique, cannot be expected to give appropriate advice if s/he does not understand the research question asked. Similarly, an epidemiologist cannot appropriately analyze and interpret results of a typing technique if s/he does not understand what it is measuring. Furthermore, if there is a mismatch between typing technique and research question, the study results are less likely to answer the research question. Unfortunately, epidemiologists and laboratorians often have little training in each other's fields, do not share a common vocabulary, and have very different research perspectives. Thus, our goal is to provide guidance for the epidemiologist about working collaboratively with laboratories to choose the appropriate bacterial typing technique, and for interpreting the results.

Epidemiologic Applications of Bacterial Typing Techniques

Discriminatory power is the average probability that a typing system will assign different strain types to two unrelated strains randomly sampled from the population of interest. In a typical analysis, epidemiologists use questionnaire data to discriminate between groups. For example, if investigating a foodborne outbreak associated with a picnic, then the variable 'ate food at the picnic' will be a poor discriminator of disease risk (as probably all ate), but 'ate potato salad' or even 'ate potatoes' might accurately classify individuals into high and low risk groups (if an ingredient in the potato salad, such as the eggs or mayonnaise, was the culprit). If we classify individuals into groups by all variables measured simultaneously (e.g., age, gender, food preferences, medical history, etc.), then our measure will be highly discriminatory (as each individual might fall into a separate group), although not necessarily informative with respect to disease risk. Thus, the most discriminatory grouping is not necessarily the most informative, particularly if the groupings are not associated with the outcome of interest.
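One common way to put a number on discriminatory power is Simpson's index of diversity, as adapted to typing systems by Hunter and Gaston: the probability that two isolates drawn at random, without replacement, from a sample are assigned different types. The sketch below is a minimal illustration; the function name and the example type assignments are hypothetical.

```python
from collections import Counter


def discriminatory_index(types):
    """Simpson's index of diversity for a list of type assignments.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates of type j and N is the total
    number of isolates. D = 0 when all isolates share one type and
    D = 1 when every isolate has its own type.
    """
    n_total = len(types)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical results for the same ten isolates typed two ways.
coarse = ["A"] * 6 + ["B"] * 4            # two large groups
fine = [f"type{i}" for i in range(10)]    # every isolate unique

print(round(discriminatory_index(coarse), 3))  # 0.533
print(discriminatory_index(fine))              # 1.0
```

The second scheme is maximally discriminatory, but, as with classifying picnic attendees by every measured variable at once, a grouping in which each isolate stands alone says nothing by itself about which isolates belong to the same outbreak.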

Bacterial typing techniques are analogous, but may or may not provide an appropriately discriminatory grouping (similar to 'ate potato salad'). We have identified three purposes for which molecular typing techniques are applied in epidemiologic studies (Table 2). For each purpose, we give an example of a related research goal and assess the required discriminatory power and the need to infer genetic relationships and/or population structure for that application. Each purpose is discussed, in turn, below.

Table 2 Required Discriminatory Power and Need to Infer Genetic Relationships and/or Population Structure for Various Epidemiologic Applications of Bacterial Typing Techniques

First, however, we wish to point out that bacterial typing is not always the correct classification tool, as outbreaks are not always caused by a single, virulent clone. Contamination of the water or food supply by sewage can lead to an outbreak of diarrhea caused by a variety of different agents [4–6], although clonal outbreaks also occur following sewage contamination [7]. Other examples are the breakdown of abattoir procedures that lead to contamination from cows colonized with diverse agents, or of nursery hygiene procedures allowing transmission from visitors to children.

Further, strain typing results must be interpreted in the context of epidemiologic evidence as well as the characteristics of the bacteria. Neither laboratory nor epidemiologic evidence is definitive, but each validates the other. When epidemiologic evidence suggests contamination arising from diverse sources, stricter molecular typing criteria should not be used to classify cases as epidemic related. If typing data suggest a high degree of similarity, epidemiologic evidence relevant to a single contamination episode should be sought.

Confirm Epidemiologic Linkage

One of the most common applications of bacterial typing in an epidemiologic study is in the context of an outbreak investigation. Bacterial typing is used to confirm or refute epidemiologic evidence that cases are linked or that a particular food item, water source, or fomite was the source of infection. In this situation the laboratory data are essentially confirmatory, and the required discriminatory power and the need to infer genetic relationships or structure are low. If there is strong epidemiologic evidence linking a specific food item with disease (common or point source), for example, we often make public health decisions based on that evidence alone, even if there is no supporting laboratory evidence. In the vast majority of foodborne outbreaks, the suspected food is not available for culture and a definitive linkage cannot be demonstrated [8]. Nonetheless, these investigations often successfully identify correctable breaks in hygiene practice. When isolates are available, even modestly discriminatory techniques are useful, since the laboratory evidence confirms the epidemiologic findings. For this type of confirmation, a rapid and inexpensive technique (like ERIC-PCR) might be preferred, since the cost and time associated with a more definitive technique (like MLST) would add little to our understanding of the source of infection or to the ultimate policy decision.

Generate hypotheses about epidemiologic relationships between bacterial strains in the absence of epidemiologic data

Molecular typing has increased the power of surveillance data to detect outbreaks. PulseNet, the national molecular subtyping network for foodborne disease surveillance coordinated by the Centers for Disease Control and Prevention, uses pulsed-field gel electrophoresis to type surveillance isolates of several foodborne pathogens, including E. coli O157:H7, nontyphoidal Salmonella serotypes, Listeria monocytogenes and Shigella [9]. Bacterial typing of space-time clusters has identified unsuspected linkages, triggering investigations, and has demonstrated that other apparent clusters were unrelated, ruling out the need for investigation [10].

Molecular typing also facilitates the detection of chains of transmission. Molecular typing led to a reassessment of the epidemiology of tuberculosis in the United States by establishing that tuberculosis does not require prolonged contact but can be transmitted in casual settings [11]. Typing also allows us to relate clinical outcome to strain types, distinguishing recent tuberculosis infection from reactivation of disease [12], and establishing that an individual can be infected with a second, different tuberculosis strain following initial infection [13].

When the investigator needs to identify potential outbreaks by typing surveillance isolates, or to distinguish between point source and propagated outbreaks, a more discriminatory technique is required. In a common or point source outbreak we expect the causative agent to be similar in all infected persons. Therefore, determining whether a space-time cluster of isolates detected via surveillance represents a potential outbreak requires a more discriminatory technique than typing isolates that are already epidemiologically linked. In a propagated outbreak, or when tracking chains of transmission, the genetic sequence of the bacteria may be slightly different at the end of the outbreak than at the beginning (how fast this occurs depends on the bacteria, however). If the bacteria are naturally competent, i.e., readily take up DNA from other members of the species, as does non-typeable Haemophilus influenzae [14], a highly discriminatory typing technique may misclassify epidemic cases identified at the end of the epidemic as non-epidemic, particularly if there are no endemic strains available for comparison. Using a typing technique that allows classification consistent with phylogenetic relationships (e.g., MLST) or, if the bacterium is highly recombinant, with clonal complexes, is helpful, as there is a biologically meaningful way to group strains (that is, to logically collapse groups of related strains). Unfortunately, many typing techniques, e.g., ERIC-PCR, are analogous to nominal scales: the groups are different from each other, but we cannot say which of the identified groups are more similar than others. Even for PFGE, which can be used to assess relatedness, similarity may vary with the choice or number of restriction enzymes used. Further, the published criteria for PFGE relatedness (based on number of matching bands) were intended solely for outbreak situations in which isolates were collected over a short time period (<1 year) and there is an implied epidemiologic linkage [15].
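The notion of logically collapsing related strains can be sketched with MLST allelic profiles. The example below groups sequence types (STs) whose profiles differ at no more than one locus, chaining such links together (single linkage). This is a deliberately simplified illustration: the one-locus threshold is one common convention and is an assumption here, the profiles are invented, and the function names are hypothetical rather than taken from any published tool.

```python
from itertools import combinations


def locus_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def clonal_complexes(profiles, max_diff=1):
    """Group STs linked by chains of profiles differing at <= max_diff loci."""
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the representative of the group.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    return sorted(sorted(group) for group in groups.values())


# Invented seven-locus allelic profiles (allele numbers are arbitrary).
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),  # differs from ST1 at one locus
    "ST3": (1, 1, 1, 1, 1, 3, 2),  # differs from ST2 at one locus
    "ST4": (5, 6, 7, 8, 9, 4, 4),
    "ST5": (5, 6, 7, 8, 9, 4, 1),  # differs from ST4 at one locus
}

print(clonal_complexes(profiles))
# [['ST1', 'ST2', 'ST3'], ['ST4', 'ST5']]
```

Five distinct types collapse into two groups. ST1 and ST3 differ at two loci but are joined through ST2, which is the kind of gradual drift expected along a chain of transmission. A purely nominal typing result offers no comparable rule for deciding which differing types to merge.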

Describe distribution of bacterial types and identify the determinants of that distribution

Advances in molecular genetics have facilitated the description of the genetic diversity of bacterial populations. Molecular genetic techniques have been used to distinguish whether antibiotic resistance arose through independent spontaneous mutations or was transmitted between strains via a mobile genetic element. In other applications, molecular genetic techniques have determined the flow of infection from one group to another. These descriptive molecular epidemiologic studies often use strains collected from disparate areas, and the epidemiologic and clinical information is minimal or non-contributory. In this case the chosen bacterial typing technique must be interpretable in terms of genetic distance (phylogeny) for the given time period and organism. Further, the technique should reflect whether the hypothesis is of clonal spread of a strain or of spread of a mobile genetic element (e.g., a plasmid).

Some typing techniques are based on conserved genes within the bacterial genome, e.g., genes associated with metabolism or other 'housekeeping' functions, and others on more variable genes, e.g., genes associated with virulence. On average, when bacterial strains are compared using a genetic typing technique, there are fewer genetic differences between bacterial strains in the conserved genes than in the variable genes. Thus, typing techniques based on differences in conserved genes, such as MLST, will place strains into fewer, larger groups than typing techniques based on more variable genes, such as PFGE. Put another way, PFGE is generally more discriminatory than MLST.

For bacterial characteristics that depend on both the conserved and the variable portions of the genome, such as virulence, the use of multiple typing techniques may be helpful (see, for example, [16]). Selecting the appropriate typing technique and validly interpreting the results in studies of the distribution of bacterial types and its determinants is easiest when at least some preliminary data are available. For example, knowledge of the rarity of the observed groups in the community, the propensity of the species to acquire insertion elements or phage, the timing of strain collection, and the evolutionary clock of the organism (that is, how quickly mutations occur or horizontal elements are acquired) provides important information for both technique selection and interpretation of the resulting findings.

The identification of pathogenic factors is an exercise in identifying what differs between strains that cause disease and strains that do not. This identification proceeds in the manner of a case-control study with the bacterial agent as the unit of analysis (see, for example, [17]). Standard epidemiologic study design issues apply: the study population must include both disease-causing and commensal isolates. A disease-causing strain will usually predominate in a culture, whereas non-pathogenic, or commensal, organisms often comprise a mixture of strains of the same species. The investigator must select isolates for study accordingly. For example, E. coli is a common bowel inhabitant and is also the most common cause of urinary tract infection. Typically an individual has several E. coli strains in the bowel flora, but urinary tract infection among outpatients is almost always caused by a single strain. The investigator must decide if the predominant isolate in the bowel flora is the one of interest or if several isolates should be selected for testing. If the objective were to link the bowel to the urinary tract flora, then choosing only the predominant bowel strain would not be sufficient. Identifying common elements generating pathogenicity may be the study objective: when the typing technique is unable to discriminate between pathogenic and diverse commensal isolates, epidemiologic and clinical information should be used to make that distinction, such as grouping together E. coli that cause urinary tract infection.
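Treating the isolate as the unit of analysis in a case-control comparison can be sketched as a 2×2 table of gene presence by isolate source. The counts below are invented for illustration, the function name is hypothetical, and the confidence interval uses the standard log-odds (Woolf) approximation, which assumes that isolates are independent; that assumption would be violated if several isolates from the same person were included.

```python
import math


def odds_ratio(present_path, absent_path, present_comm, absent_comm):
    """Odds ratio and approximate 95% CI for gene presence in pathogenic
    versus commensal isolates.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe
    correction) so that the ratio and its standard error are defined.
    """
    cells = [present_path, absent_path, present_comm, absent_comm]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Invented counts: a candidate gene is found in 60 of 100 urinary tract
# infection isolates and in 30 of 100 fecal (commensal) isolates.
estimate, lower, upper = odds_ratio(60, 40, 30, 70)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these invented counts the odds of carrying the gene are 3.5 times higher among the infection isolates. Whether that reflects a pathogenic role depends on how the commensal isolates were selected, which is exactly the design question raised above.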

Pathogenicity determinants are often present on transferable genetic material, such as plasmids, pathogenicity islands and phages. Transferable genetic material has a genetic history distinct from that of the rest of the host bacterial genome. In this case, phylogenetic analyses of these elements can provide useful information. For example, pathogenicity islands (PAIs) have been associated with a variety of conditions, including diarrhea and urinary tract infection [18–20]; specific virulence factor genes found on the PAIs encode proteins that contribute directly to disease.


The application and interpretation of bacterial typing tools in epidemiologic studies requires understanding of both the strengths and limitations of the chosen bacterial typing technique as well as the epidemiologic study design to answer the research question. Beyond standard reliability, validity and cost considerations, key characteristics of a typing technique are 1) the ability to discriminate between strains and 2) a biologic basis for grouping strains with apparently different types. The level of discrimination required and need to be able to group strains depends on the research question. Similar to the desirability of including a statistician in the design phase so that the study design will result in appropriate data for the desired analysis, integrating an expert in the different typing techniques during the design phase will improve how well the research protocol fits the question(s) of interest.


  1. Tenover FC, Arbeit RD, Goering RV, the Molecular Typing Working Group of the Society for Healthcare Epidemiology of America: How to select and interpret molecular typing methods for epidemiological studies of bacterial infections: a review for healthcare epidemiologists. Infect Control Hosp Epidemiol 1997, 18:426–39.

  2. Gurtler V, Mayall BC: Genomic approaches to typing, taxonomy and evolution of bacterial isolates. Int J Syst Evol Microbiol 2001, 51(Pt 1):3–16.

  3. van Belkum A: High-throughput epidemiologic typing in clinical microbiology. Clin Microbiol Infect 2003, 9:86–100.

  4. Kapadia CR, Bhat P, Baker SJ, Mathan VI: A common-source epidemic of mixed bacterial diarrhea with secondary transmission. Am J Epidemiol 1984, 120:743–9.

  5. Berkelman RL, Cohen ML, Yashuk J, Barrett T, Wells JG, Blake PA: Traveler's diarrhea at sea: two multi-pathogen outbreaks caused by food eaten on shore visits. Am J Pub Health 1983, 73:770–2.

  6. Starko KM, Lippy EC, Dominguez LB, Haley CE, Fisher HJ: Campers' diarrhea outbreak traced to water-sewage link. Pub Health Rep 1986, 101:527–31.

  7. Rosenberg ML, Koplan JP, Wachsmuth IK, Wells JG, Gangarosa EJ, Guerrant RL, Sack DA: Epidemic diarrhea at Crater Lake from enterotoxigenic Escherichia coli. A large waterborne outbreak. Ann Intern Med 1977, 86:714–8.

  8. Bean NH, Griffin PM, Goulding JS, Ivey CB: Foodborne Disease Outbreaks, 5-Year Summary, 1983–1987. MMWR 1990, 39(SS01):15–23.

  9. Centers for Disease Control and Prevention: PulseNet Home Page [].

  10. Bender JB, Hedberg CW, Besser JM, Boxrud DJ, MacDonald KL, Osterholm MT: Surveillance by molecular subtype for Escherichia coli O157:H7 infections in Minnesota by molecular subtyping. N Engl J Med 1997, 337:388–94.

  11. Golub JE, Cronin WA, Obasanjo OO, Coggin W, Moore K, Pope DS, Thompson D, Sterling TR, Harrington S, Bishai WR, Chaisson RE: Transmission of Mycobacterium tuberculosis through casual contact with an infectious case. Arch Intern Med 2001, 161:2254–8.

  12. Small PM, Hopewell PC, Singh SP, Paz A, Parsonnet J, Ruston DC, Schecter GF, Daley CL, Schoolnik GK: The epidemiology of tuberculosis in San Francisco. A population-based study using conventional and molecular methods. N Engl J Med 1994, 330:1703–9.

  13. van Rie A, Warren R, Richardson M, Victor TC, Gie RP, Enarson DA, Beyers N, van Helden PD: Exogenous reinfection as a cause of recurrent tuberculosis after curative treatment. N Engl J Med 1999, 341:1174–9.

  14. Gilsdorf JR, Marrs CF, Foxman B: H. influenzae virulence factors: epidemiology and diversity. Infect Immun 2004, 72:2457–61.

  15. Tenover FC, Arbeit RD, Goering RV, Mickelsen PA, Murray BE, Persing DH, Swaminathan B: Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing. J Clin Microbiol 1995, 33:2233–39.

  16. Beres SB, Sylva GL, Sturdevant DE, Granville CN, Liu M, Ricklefs SM, Whitney AR, Parkins LD, Hoe NP, Adams GJ, Low DE, DeLeo FR, McGeer A, Musser JM: Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections. Proc Natl Acad Sci USA 2004, 101:11833–8.

  17. Zhang L, Foxman B, Manning SD, Tallman P, Marrs CF: Molecular epidemiologic approaches to UTI gene discovery in uropathogenic Escherichia coli. Infect Immun 2000, 68:2009–15.

  18. McDaniel TK, Jarvis KG, Donnenberg MS, Kaper JB: A genetic locus of enterocyte effacement conserved among diverse enterobacterial pathogens. Proc Natl Acad Sci USA 1995, 92:1664–68.

  19. Lee CA: Pathogenicity islands and the evolution of bacterial pathogens. Infect Agen Dis 1996, 5:1–7.

  20. Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H: Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 1997, 23:1089–97.


The authors thank the members of the Center for Molecular and Clinical Epidemiology of Infectious Diseases faculty discussion group for their insights into this topic. This work was supported by RO1 DK35368 (BF), R21 AI44868 (BF) and R01 DK 55496 (CFM).

Author information



Corresponding author

Correspondence to Betsy Foxman.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

BF took the lead on drafting the manuscript, LZ took the lead on Table 1, and JSK outlined Table 2. All authors contributed to discussions leading to the manuscript, critiqued multiple drafts and approved the final manuscript.

About this article

Cite this article

Foxman, B., Zhang, L., Koopman, J.S. et al. Choosing an appropriate bacterial typing technique for epidemiologic studies. Epidemiol Perspect Innov 2, 10 (2005).

  • molecular epidemiology
  • methods
  • bacteria
  • typing