5 Observational Designs

Learning Objectives

• To understand what is meant by observational design.
• To understand the difference between retrospective and prospective research designs.
• To understand the various types of observational designs, including cross-sectional, retrospective cohort, and case-control designs.
• To understand threats to the validity of observational studies, including confounding and effect modifiers.
• To understand strategies for reporting observational designs.

Introduction

This chapter focuses on observational, or “nonexperimental,” designs.

Nonexperimental Designs

Nonexperimental methods are approaches in which the investigator does not alter or manipulate the circumstances of the participants in the research study. Nonexperimental designs can be comparative, such as case-control studies, or noncomparative, such as case series. In contrast to experimental studies (randomized controlled trials [RCTs]), observational designs do not randomize subjects to treatment groups, interventions, or exposures. For example, an investigator may be interested in comparing two treatments or procedures and retrospectively reviews the outcomes of patients who received these interventions. While such a study is comparative, the researcher has no influence over which patients received the treatments or procedures being compared, nor over how they were administered.

Although nonexperimental designs have low internal validity and are generally not considered the optimal approach for comparing therapies or procedures, they are frequently used. One reason is that in some circumstances observational studies are the only practical strategy. This may occur when an RCT is impractical, for example, when the condition under investigation is rare.
Observational approaches may also be used earlier in the experience with a new treatment, when the ethical imperative of equipoise (evidence of similar effectiveness of the treatments to be compared) has not yet been met for an RCT to be undertaken. Results from observational designs may in fact form the justification for RCTs. In addition, observational studies are less expensive than experimental studies and are therefore the most commonly utilized research design in health care. Observational studies may also be more reflective of “real life,” as they often have less restrictive inclusion criteria than experimental designs and often include the entire spectrum of patients with a particular illness, rather than selecting patients with specific characteristics, which limits the generalizability of the results.1

Retrospective versus Prospective Studies

The distinction between retrospective and prospective studies is often assumed to be inherent to a particular research design. This is partly true, in that some types of studies can only be prospective (for example, a clinical trial), but many study designs, such as a cohort study, can be conducted in either a retrospective or a prospective manner. In contrast to a retrospective study, a prospective study is planned before the treatment (and outcome) occurs, allowing the investigators greater control over the treatments or exposures evaluated in the study as well as over what data will be collected. Since the specific variables to be collected can be determined in advance, the investigator can ensure that the data necessary to best evaluate the research question are available. On the other hand, since none of the treatments (or outcomes) have occurred and no data are available at the outset of the study, the researcher may need to wait a number of years before being able to address the question of interest.
Therefore, the completeness of the data afforded by a prospective study needs to be balanced against the additional time and cost of this approach.

Types of Nonexperimental Designs

There are different types of observational studies whose results yield different levels of evidence: case reports and case series, cross-sectional studies, cohort studies, and case-control studies. Case reports and case series offer lower evidence than cross-sectional, cohort, and case-control studies, and prospective studies generally afford a higher level of evidence than retrospective designs. Evidence from nonexperimental designs ranks below that from experimental designs such as RCTs. Whenever possible, the strongest design should be used. Although studies evaluating diagnostic tests are an important type of nonexperimental design in radiology, this chapter focuses on other types of nonexperimental designs; diagnostic tests are covered in Chapter 3. For each of the main types of nonexperimental designs, we briefly describe the key features of the design and provide an example. While each design is considered separately, studies may blend different types of nonexperimental methods.

Case Reports and Case Series

The simplest observational study designs are case reports and case series. The key role of these studies is to report findings of a rare disease, or a novel observation or presentation of a common condition, in one or more patients. The results generate hypotheses that require subsequent formal testing; these are therefore considered “hypothesis-generating studies.” Such studies are then used to design subsequent studies to confirm the observation seen in the case series. Without subsequent confirmatory studies, observations made in case series cannot be generalized to a larger population of patients.
In radiology, case reports and case series can serve as educational tools displayed as pictorial reviews of the imaging characteristics of a disease or group of diseases, such as those published in Radiographics, which represent an example of a case series.

Cross-Sectional Designs

Cross-sectional designs relate to observations of variables in a population where all measurements are obtained at a single point in time. Therefore, no follow-up period is expected. Subjects are not selected based on exposure or outcome, since both are assessed at the same time point (Fig. 5.1). Studies that use this design are relatively inexpensive and easy to carry out, since there is no waiting for outcomes to occur. However, they can only establish an association, not a cause–effect relationship; they are also not practical for the assessment of rare diseases and cannot estimate disease incidence.2,3,4 Cross-sectional studies may generate hypotheses that require further investigation. Studies on diagnostic tests typically follow a cross-sectional design, as discussed in Chapter 3.

An example of a cross-sectional study is as follows. A group of investigators wish to answer the question, “Is the frequency of ionizing (CT, fluoroscopy, and X-ray) and nonionizing (ultrasound and MRI) diagnostic examinations performed in children age 10–18 similar in India when compared with Canada?” They designed a cross-sectional study in which they enrolled 20,000 Canadian and 20,000 Indian randomly selected children aged 10 to 18 years from national medical insurance databases (which include all imaging studies performed) and obtained data on all imaging examinations performed. Although the imaging studies were not performed at a single time point (e.g., the same day), they were done within a limited time frame (the 12 months prior to the survey). This was done because the investigators felt that a single day of assessment would not provide a meaningful comparison, and a year was decided to be the optimal time frame.
However, a longer time frame was not chosen, and studies were all treated the same regardless of when in the year they occurred, as the investigators were not interested in comparing predictors and outcomes over time, as would be done in cohort and/or case-control studies. The investigators then compared the frequencies of the various types of procedures, as well as of ionizing versus nonionizing procedures, between the two countries.

Cohort Designs

In cohort designs, subjects are selected based on an exposure of interest (e.g., a type of intervention or procedure, a disease, a drug, an imaging technique). Subjects are then assigned to a group, or cohort (exposed vs. nonexposed), on this basis. Both exposed and nonexposed cohorts are then followed over time, in real time if prospective or through existing records if retrospective, to determine how many patients in each group developed the outcome of interest (Fig. 5.2). The exposure is an independent variable and the outcome a dependent variable.3 This design is suitable for exploring causality and can be used to evaluate incidence, cause, and prognosis.

For example, if investigators are interested in examining whether a CT scan during childhood leads to future impairment in cognitive function, they could perform a cohort study. In this case, the investigators decide to perform a retrospective study in which they enroll 4,000 15- to 18-year-old adolescents attending grades 10 to 12, collect data on the number of head CT scans the adolescents had undergone in childhood (0–7 years), and obtain tests of cognitive function. The investigators hypothesized that radiation from previous CT scans could be associated with reduced future cognitive function, and this hypothesis was tested using the available data. The steps performed by the investigators as part of this study were as follows: 1.
Identify a suitable cohort: Ask questions of the population of interest about previous medical history, which can be systematically collected and recorded.
2. Collect data about predictor variables: Ask questions about the cohort’s previous medical history (prior CT scans) and other potential risk factors.
3. Collect data about subsequent outcomes that occurred at a later time: Retrospectively explore the relationship between cognitive performance (outcome) and history of head CT scan in childhood (exposure).

This same question could be addressed with a prospective cohort study, although the steps required would be slightly different, as follows:

1. Assemble a cohort: Enroll children of a certain age range (e.g., 0–7 years) who would have the potential to undergo head CT scans (for head injury, seizures, etc.) during the conduct of the study.
2. Measure exposure variable(s) and potential confounders: Follow up on the number of CT scans and other potential risk factors to which the cohort is exposed during the follow-up period. This could be done by review of the medical records or by ongoing contact with the subjects (e.g., periodic telephone calls or surveys).
3. Follow up to measure outcomes: Follow the patient cohort until the age of 18 years and collect data on cognitive performance at age 18.

As you can see, while the retrospective and prospective cohort studies address the same question, the prospective study would take many years to complete, whereas the retrospective study could be completed fairly quickly. However, recall of the number of head CT scans done in childhood may be difficult for parents, and especially for adolescents themselves; this may limit the accuracy of the data collected in the retrospective study.
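Because a cohort study knows the full denominators of exposed and unexposed subjects, its analysis typically compares the frequency of the outcome between the two groups as a risk ratio. A minimal sketch in Python, using invented counts (not data from the chapter):

```python
# Minimal sketch of the measure of association for the retrospective cohort
# example above. All counts are hypothetical, invented for illustration.

def risk(events, total):
    """Proportion of subjects in a group who developed the outcome."""
    return events / total

# Hypothetical 2x2 summary: exposure = head CT in childhood,
# outcome = cognitive impairment in adolescence.
exposed_events, exposed_total = 40, 500        # outcome among those with a head CT
unexposed_events, unexposed_total = 210, 3500  # outcome among those without

risk_exposed = risk(exposed_events, exposed_total)
risk_unexposed = risk(unexposed_events, unexposed_total)
risk_ratio = risk_exposed / risk_unexposed  # >1 suggests an exposure-outcome association

print(f"Risk in exposed:   {risk_exposed:.3f}")
print(f"Risk in unexposed: {risk_unexposed:.3f}")
print(f"Risk ratio:        {risk_ratio:.2f}")
```

A cohort design can report a risk ratio because the denominators (all exposed and all unexposed subjects) are known; a case-control study, where the investigator fixes the numbers of cases and controls, can only report an odds ratio.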
The prospective study may allow for more complete data, although one of the major challenges of a long-term study is loss to follow-up and attrition, whereby subjects who enroll may move or decide over time that they no longer want to be part of the study.

Case-Control Designs

Case-control studies aim to compare groups with and without an outcome of interest and to explore the role of different exposures and the potential association between these exposures and the outcome of interest (Fig. 5.3).3 Since cases (who experienced the outcome) and controls (who did not) are defined by the occurrence of the outcome, case-control studies are by definition all retrospective: the outcome has to have occurred before the study begins, and all exposures must be in the past.5 Case-control studies can therefore be used to identify associations between exposures and outcomes in rare diseases, in conditions with rare outcomes, or when there is a long time lag between the exposure and the outcome. Case-control studies are relatively inexpensive but are vulnerable to unmeasured confounding variables. Confounding will be discussed later in this chapter and in Chapter 6. This vulnerability can be addressed, to some extent, by matching cases and controls on particular characteristics. Matching refers to enrolling controls who have characteristics similar to the cases, such as age or sex. However, variables used to match cases and controls cannot be included in analyses as potential predictors, as the investigator has determined the distribution of these variables within the groups. Similarly, since the number of subjects who experience the outcome is determined by the researcher (i.e., the number of cases enrolled relative to the number of controls enrolled), case-control studies are not suitable for estimating disease incidence.
Case-control studies are, however, useful for exploring associations between exposures and outcomes and for generating hypotheses for further studies to explore the relationships observed. As an example, an investigator is interested in exploring the relationship between cognitive impairment in Canadian children and exposure to head CT. The investigators have identified 332 subjects between 15 and 18 years old with cognitive impairment (intelligence quotient [IQ] < 85), who are designated as cases, and 3,668 age- and sex-matched control subjects without cognitive impairment (IQ > 100) from an existing database. The investigators collect data on both cases and controls on a number of variables that may be associated with cognitive deficits in adolescence, including whether the subject had undergone head CT scans between the ages of 0 and 7 years. The steps performed by the investigator as part of this study were as follows:

1. Select cases: Identify 332 adolescents with cognitive deficit from an institutional database.
2. Select controls: Identify 3,668 adolescents without cognitive deficit, matched for age and sex.
3. Measure the predictor variable: Look back to determine how many individuals (cases and controls) were exposed to the potential predictor (head CT) in childhood.

Case-control studies can also be performed on data collected in a cohort study, in what is referred to as a “nested case-control study” because the case-control study is nested within the cohort study.6 In a cohort study, both exposed and nonexposed subjects are included, and data are collected on them as part of the follow-up for the cohort study. However, within the cohort relatively few subjects will experience the disease (i.e., the outcome). Using the data of those who did and did not experience the outcome, the investigator can define cases and controls and perform a nested case-control study.
One of the advantages of this approach is that, since cases and controls are part of a prospective cohort study, the data available may be more complete than if they had been collected as part of a stand-alone retrospective case-control study. Nested case-control studies are potentially powerful designs, and important findings can be derived from them. For example, the relationship between mammographic density and breast cancer risk was investigated as part of a nested case-control study using data from the 14,291 women in the New York University Women’s Health Study, a cohort design. Investigators estimated the relation of mammographic patterns/densities and breast size to breast cancer risk using the archived mammograms of 197 women who developed breast cancer (cases) and of 521 age- and sex-matched controls from the same cohort. Using this design, a significantly increased risk was found for specific mammographic patterns.7

Bias and Confounding

Bias is an error in research methodology that may cause the results of a study to deviate from the “truth.” Although numerous types of bias have been reported in the literature,8 the main sources of bias in radiology studies are the patient, the intervention/exposure, the data gathering, and the interpretation/reporting approaches. Bias can occur at any point during a study, including recruitment, intervention, data collection, analysis, and publication (Table 5.1).9,10,11,12,13 It is important to understand the potential sources of bias when designing a study so that their impact can be minimized. While some advanced statistical techniques can ameliorate the impact of bias introduced by errors in study design, it is best to avoid introducing bias in the first place, as it may not be possible to minimize its impact on the results once introduced.
Confounding

Confounding is a particular type of bias that results from a factor that is associated with both the exposure and the outcome but is not part of the causal pathway between the exposure of interest and the outcome, yet impacts the outcome (Fig. 5.4).14 The concept of confounding is discussed in Chapter 6. As discussed in that chapter, confounding is a particular concern in nonexperimental designs, in which factors other than random chance determine which treatment a subject receives, and it is these factors (confounders), rather than differences in the treatments received, that may impact the outcome. This type of confounding is referred to as confounding by indication, and it is a key reason why randomization, by avoiding confounding by indication, allows for unbiased assessment of treatment response. Although randomization is one method to control for confounding, other approaches can be taken, such as specification (restricting enrollment to subjects with only certain characteristics), matching of subjects on potential confounders as discussed for case-control studies, and various analytic techniques.
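Why randomization removes confounding by indication can be illustrated with a small simulation. The following Python sketch is purely illustrative (the "severity" model and all numbers are assumptions, not from the chapter): a severity factor drives both which treatment a patient receives and the outcome, while the treatment itself has no true effect.

```python
# Illustrative simulation of confounding by indication; all numbers and the
# "severity" model are assumptions for this sketch, not from the chapter.
# The outcome depends only on severity; the treatment has no true effect.
import random

random.seed(0)

def has_outcome(severity):
    # Probability of the outcome rises with severity, regardless of treatment.
    return random.random() < (0.1 + 0.4 * severity)

n = 20000
obs = {"A": [0, 0], "B": [0, 0]}  # treatment -> [events, total], observational
rct = {"A": [0, 0], "B": [0, 0]}  # same, under randomized assignment

for _ in range(n):
    severity = random.random()
    event = has_outcome(severity)
    # Observational: sicker patients are channeled toward treatment B.
    obs_arm = "B" if random.random() < severity else "A"
    # RCT: a coin flip decides the arm, independent of severity.
    rct_arm = "B" if random.random() < 0.5 else "A"
    for table, arm in ((obs, obs_arm), (rct, rct_arm)):
        table[arm][0] += event
        table[arm][1] += 1

def event_rate(table, arm):
    events, total = table[arm]
    return events / total

print("Observational:", {a: round(event_rate(obs, a), 3) for a in "AB"})
print("Randomized:   ", {a: round(event_rate(rct, a), 3) for a in "AB"})
```

In this simulation the observational arms show clearly different event rates even though the treatment does nothing, because severity (the confounder) determines treatment assignment; the randomized arms show nearly identical rates.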
Table 5.1

Type | Systematic error/bias in |
Focus: Patient/exposure-intervention | |
• Selection bias | Including study subjects (sicker, milder cases, health care workers, volunteers) |
• Sampling bias, referral bias | Some members of society are more likely to be included, referred to the center than others (profession, socioeconomic status, access) |
• Image-based selection bias | Study enrollment mandated a specific image, patients are included based on the availability of such imaging study, study population is selected from a true target population |
• Study examination bias | Study enrollment limited to technically excellent studies, resulting in overestimation of sensitivity and specificity (removal of false-negatives and false-positives, respectively) |
• Disease spectrum bias | Within patient groups, one end of the spectrum gets investigated only |
• Self-selection bias, “healthy volunteer bias” | Study enrollment on the basis of self-selection, limits generalizability |
• Channeling bias | Patient prognostic factors or degree of illness leads to inclusion into one study over another |
• Participation bias | Unequal response to additional factors required to actually join a study (distance to center, financial burden, other personal constraints) |
• Transfer bias/loss to follow-up bias | Unequal loss to follow-up between groups gets treated similarly in the analysis |
• Language bias | Inappropriate definition of the eligible population |
• Confounding by indication | Association with the exposure without being the consequence of the exposure and with the outcome independently of the exposure |
Focus: Data generation/gathering | |
• Recall bias | Outcomes of treatment rely on subjects’ recollections of events prior to or during the treatment process, information of exposure is systematically classified differently between groups |
• Information bias | Wrong, incomplete, or inexact recording of variables |
• Interviewer bias | Interviewer makes systematic distinctions on how information is solicited, recorded, or interpreted between patient groups, typically when the interviewer is also an investigator |
• Misclassification bias | Exposure itself is poorly defined or if proxies of exposure are utilized, leading to wrong assignment to grouping |
• Verification bias, workup bias | Systematic difference in the manner in which disease status is defined between groups causes dependency of tests or unblinding to results of the reference standard |
• Performance bias | Experience of one or more people delivering exposure measures systematically different from experience of remainder operators or readers |
• Chronology bias | Historic controls are used as a comparison group |
• Follow-up bias/medical surveillance bias | Groups have systematically different follow-up regimens based on test results; missing data are not randomly distributed; disease-free cases were less thoroughly investigated; motivation to respond to all questions may differ between groups |
• Contamination bias | Execution of an intervention, by using one agent as part of a diagnostic test (e.g., contrast material), may impact the result of another diagnostic test |
Focus: Interpretation/reporting | |
• Reviewer bias | Inappropriate blinding of person reviewing the results, reviewer of a new test may be aware of the results of the reference test |
• Diagnostic-review bias | Reference tests are not definite, reviewer utilizes the new diagnostic test result to modify the interpretation of the reference test |
• Test-review bias | Diagnosis is known to reviewer at time of review, which may influence the test interpretation |
• Incorporation bias | Reviewer incorporates the new diagnostic test result to make a diagnosis |
• Imperfect standard bias | Reference tests are imperfect, therefore some study subjects will have a second, surrogate reference test based on their characteristics (socioeconomic status, age) |
• Reader-order bias | Reviewer retained knowledge of the first test and incorporated it in the interpretation of the second test |
• Measurement bias | Discrepancy in the measurement approach between two groups of tests |
• Cluster bias, repeat measurement bias | Multiple measurements are taken from the same subject and are considered independent |
• Context bias | Altered disease prevalence influences test characteristics, important for interpretation of equivocal test results |
• Publication/citation bias | Researchers and sponsors unwilling to publish unfavorable results |
The issue of confounding by indication is of concern in the case-control example provided earlier. In that study, investigators enrolled adolescents with cognitive impairment (cases) as well as controls without cognitive impairment and retrospectively explored the association between cognitive impairment in adolescence and the performance of head CT scans in childhood. In this hypothetical example, the investigators determined that those with cognitive impairment were more likely to have had a head CT as a child. It is important to recognize that since this is a case-control study, this finding is merely an association, and one should not conclude that the performance of head CT scans leads to cognitive impairment. In fact, it is likely that this result can be explained by confounding by indication: adolescents with neurological problems were more likely to have undergone a head CT scan in childhood, and these same adolescents are more likely to have a lower IQ because of their comorbid neurological concerns. Neurological disease is a confounder in this study, as it is associated with both the outcome (lower IQ) and the exposure (head CT).
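This reasoning can be made concrete with a stratified analysis. The sketch below uses invented counts (only the totals of 332 cases and 3,668 controls come from the example above): the crude odds ratio suggests a strong CT–impairment association, but within each stratum of neurological disease the association disappears.

```python
# Hypothetical stratified analysis of the case-control example above.
# Only the totals (332 cases, 3,668 controls) come from the text; the
# stratum counts are invented to illustrate confounding by indication.

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)

# Stratum 1: subjects with neurological disease (head CT common in both groups).
with_neuro = dict(a=60, b=40, c=120, d=80)
# Stratum 2: subjects without neurological disease (head CT rare in both groups).
without_neuro = dict(a=10, b=222, c=150, d=3318)

# Crude table: collapse the two strata (totals: 332 cases, 3,668 controls).
crude = {k: with_neuro[k] + without_neuro[k] for k in "abcd"}

print("Crude OR:               ", round(odds_ratio(**crude), 2))
print("OR with neuro disease:  ", round(odds_ratio(**with_neuro), 2))
print("OR without neuro disease:", round(odds_ratio(**without_neuro), 2))
```

The crude odds ratio is well above 1, but the stratum-specific odds ratios are approximately 1: the apparent CT–impairment association is produced entirely by neurological disease, which is associated with both the exposure and the outcome.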