Table 1 Description of Scenarios Used in Two Sets of Simulation Studies.

From: The use of complete-case and multiple imputation-based analyses in molecular epidemiology studies that assess interaction effects

| Table | Set | Scenario | Median % Missing X1 | Nature of Missing | Auxiliary Relationship | Type of Missing Variable |
|---|---|---|---|---|---|---|
| 3a. Impact of Auxiliary Relationship Under Condition 1 | 1 | A | 22% | Condition 1ᵃ | None³ | Binary |
| | 1 | B | 22% | Condition 1ᵃ | Moderate⁴ | Binary |
| | 1 | C | 22% | Condition 1ᵃ | Strong⁵ | Binary |
| 3b. Impact of Auxiliary Relationship Under Condition 2 | 1 | D | 20% | Condition 2ᵇ | None³ | Continuous |
| | 1 | E | 20% | Condition 2ᵇ | Moderate⁶ | Continuous |
| | 1 | F | 20% | Condition 2ᵇ | Strong² | Continuous |
| 3c. Impact of Auxiliary Relationship Under Condition 3 | 1 | G | 20% | Condition 3ᶜ | None³ | Continuous |
| | 1 | H | 20% | Condition 3ᶜ | Moderate⁶ | Continuous |
| | 1 | I | 20% | Condition 3ᶜ | Strong² | Continuous |
| 4a. Impact of Conditions Under Set of Auxiliary Variables with Varying Strength When Missing Genotype Data | 2 | J | 21% | Condition 1ᵈ | Realistic⁷ | Binary |
| | 2 | K | 24% | Condition 3ᵉ | Realistic⁷ | Binary |
| 4b. Impact of Conditions Under Set of Auxiliary Variables with Varying Strength When Missing Exposure Data | 2 | L | 21% | Condition 1ᵈ | Realistic⁷ | Binary |
| | 2 | M | 21% | Condition 2ᶠ | Realistic⁸ | Ordinal |

  a. Condition 1: X1 is 12.2 times more likely to be missing if X1 = 1.
  b. Condition 2: Extreme values of X1 are more likely to be missing. The log odds of missing X1 is a quadratic function of X1: log odds of missing X1 = γ0 + γ1 X1 + γ2 (X1)², where γ1 = -1 and γ2 = 2.
  c. Condition 3: A 1-unit increase in X1 corresponds to a 7.4-fold decrease in the probability of missing for controls, but a 7.4-fold increase for cases.
  d. Condition 1: Those with fast metabolizing genotype are 12.2 times more likely to be missing data on genotype. Missingness is also related to other observed covariates (mammogram, education, race, breastfeeding, oral contraceptive use, hormone therapy use, and smoking status) as informed by the real data set for all subjects.
  e. Condition 3: Exposed controls with fast genotype are 7.4 times more likely to be missing genotype than those without, while exposed cases with fast genotype are 7.4 times less likely to be missing genotype. Unexposed subjects with fast genotype are 2.7 times more likely to be missing genotype. Missingness is also related to other observed covariates (mammogram, education, race, breastfeeding, oral contraceptive use, hormone therapy use, and smoking status) as informed by the real data set for all subjects.
  f. Condition 2: Extreme values of exposure are more likely to be missing. The log odds of missing X1 is a quadratic function of X1: log odds of missing X1 = γ0 + γ1 X1 + γ2 (X1)², where γ1 = -3 and γ2 = 1. Missingness is also related to other observed covariates (mammogram, education, race, breastfeeding, oral contraceptive use, hormone therapy use, and smoking status) as informed by the real data set for all subjects.
  1. Strong: Those with X1 = 1 have Z values (SD = 1) that are 3 units higher on average than those with X1 = 0.
  2. Strong: Average correlation between X1 and Z is 0.97.
  3. None: X1 and Z are independent variables.
  4. Moderate: Those with X1 = 1 have Z values (SD = 1) that are 1 unit higher on average than those with X1 = 0.
  5. Strong: Those with X1 = 1 have Z values (SD = 1) that are 4 units higher on average than those with X1 = 0.
  6. Moderate: Average correlation between X1 and Z is 0.57.
  7. Realistic: Auxiliary variables are informed by real data set and include all variables used to generate missing data mechanism of genotype (case-control status, exposure, mammogram, education, race, breastfeeding behavior, oral contraceptive use, hormone therapy use, and smoking status) as well as those that relate to genotype (exposure, race, breastfeeding behavior), a subset of the former.
  8. Realistic: Auxiliary variables are informed by real data set and include all variables used to generate missing data mechanism of exposure (case-control status, genotype, mammogram, education, race, breastfeeding behavior, oral contraceptive use, hormone therapy use, and smoking status) as well as those that relate to exposure (genotype, education, race, oral contraceptive use, hormone therapy use, and smoking status), a subset of the former.
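
The quadratic missingness mechanism described for Condition 2 in Set 1 (γ1 = -1, γ2 = 2) can be illustrated with a short Python sketch. This is not the authors' simulation code. The table does not report γ0 or the distribution of X1, so the sketch assumes a standard normal X1 and picks γ0 by bisection so that the average missingness probability is about 20%, in line with the median reported for scenarios D to F. The function names `missing_prob` and `calibrate_intercept` are hypothetical.

```python
import numpy as np

def missing_prob(x1, gamma0, gamma1=-1.0, gamma2=2.0):
    """P(X1 missing | X1) when the log odds are quadratic in X1."""
    log_odds = gamma0 + gamma1 * x1 + gamma2 * x1**2
    return 1.0 / (1.0 + np.exp(-log_odds))

def calibrate_intercept(x1, target=0.20, lo=-20.0, hi=20.0, iters=60):
    """Bisection on gamma0 (an assumed, unreported parameter) so that the
    mean missingness probability over the sample equals `target`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if missing_prob(x1, mid).mean() > target:
            hi = mid  # too much missingness: lower the intercept
        else:
            lo = mid  # too little missingness: raise the intercept
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(10_000)      # assumption: X1 ~ N(0, 1)
gamma0 = calibrate_intercept(x1)      # assumption: target 20% missing
p = missing_prob(x1, gamma0)

# Draw the missingness indicator and mask the affected values of X1.
is_missing = rng.random(x1.size) < p
x1_observed = np.where(is_missing, np.nan, x1)
```

With these coefficients the log odds are lowest at X1 = -γ1 / (2γ2) = 0.25 and rise in both directions, so values far from 0.25 are the ones most likely to be set to missing.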