7 Systematic Reviews, Evidence-Based Imaging, and Knowledge Translation

“It is surely a great criticism of our profession that we have not organized a critical summary, by specialty or subspecialty, adapted periodically, of all relevant randomized controlled trials.” —Professor Archibald Cochrane (1909–1988), 1973

Learning Objectives

• To provide basic concepts on definitions of unstructured reviews, systematic reviews, meta-analyses, pooled analyses, evidence-based imaging, and guidelines.
• To outline and describe the steps for conducting a systematic review and meta-analysis and for developing clinical practice guidelines based on evidence derived from available systematic reviews/meta-analyses.
• To discuss the available tools for assessing the quality of reporting and methodology of papers and systematic reviews in diagnostic imaging.
• To introduce concepts on implementation and knowledge translation, T1 and T2 translational research, key guideline questions for knowledge translation activities, and the role of this new science in accelerating the dissemination of knowledge in the radiological sciences.

Introduction

Evidence-based medicine is “the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine means integrating individual clinical expertise with the best available external evidence from systematic research.”1,2 Evidence-based imaging (EBI), in contrast to the traditional paradigm, acknowledges that intuition, unsystematic clinical experience, and pathophysiologic rationale are insufficient grounds for clinical decision making, and stresses the critical examination of evidence from clinical research. EBI suggests that a formal set of rules must complement medical training and common sense if clinicians are to interpret the results of clinical research effectively. Finally, EBI places a lower value on authority than the traditional paradigm of medical practice does.3,4

The evidence-based process involves a series of steps: (1) formulation of the clinical question, (2) identification of the medical literature, (3) critical appraisal of the literature, (4) summary or synthesis of the evidence, and (5) application of the evidence to derive an appropriate course of action.5 An evidence-based practitioner must be able to understand the patient’s circumstances or predicament (including issues such as social supports and financial resources); identify knowledge gaps and frame questions to fill those gaps; conduct an efficient literature search; critically appraise the research evidence; and apply that evidence to patient care.6,7

The overall number of structured review articles in medicine has increased more than 40-fold in the last two decades, according to a search of the publication terms “meta-analysis” (MeSH or tw) or “systematic review” (tw) in MEDLINE, derived from Ovid: from 3,255 articles published prior to 1994, to 22,302 articles up to 2004, to 122,232 articles up to 2015 (searched in late December 2015). However, of the review articles catalogued up to 2015, only approximately 4,470 (3.7%) evaluated radiology-related topics, including conventional radiography, ultrasonography, computed tomography, magnetic resonance imaging, and radionuclide imaging. The proportion of systematic review/meta-analysis articles in radiology relative to the overall number in the entire field of medicine has slowly increased over the last two decades (1.6% in 1994 and 2.6% in 2004).
However, these numbers still indicate a paucity of articles that summarize the best estimates of procedures’ effects, imaging as an outcome measure in clinical effectiveness studies, diagnostic tests, or economic evaluations in radiology.8 This could be secondary to an insufficient body of primary evidence available for review on some topics in radiology, and/or to the fact that some reviews containing a substantial number of low-quality primary studies may provide contradictory evidence on the effectiveness of interventions or the accuracy of diagnostic tests.

Definitions and Types of Reviews

Reviews are essential tools for researchers and clinicians who want to keep up with the evidence that has accumulated in their fields. They can be unstructured (narrative reviews or commentary) or structured (systematic reviews, meta-analyses, pooled analyses). The latter type enables assessment of the existing evidence on a topic of interest and can conclude in support of a practice, in refutation of a practice, or by identifying areas in which additional studies are needed. The most generic type of review article is the narrative overview, followed by systematic reviews, meta-analyses, and pooled analyses. A narrative overview is a potentially biased, nonstructured literature review on a specific topic that raises a broad research question; it provides a qualitative summary of the literature in the field.9 The Radiology series “State of the Art” and “How I Do It,” for which experts are invited to write an article for the journal, are typical examples of narrative reviews.2 A systematic review is a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies included in the review.9 Systematic reviews are classified as “secondary” literature and should be distinguished from original published journal articles, which are classified as “primary” literature.10,11 Table 7.1 compares the characteristics of narrative and systematic reviews. Systematic reviews aim at estimating summary effects (a synthetic goal) from a qualitative perspective and may or may not include a meta-analysis. The term meta-analysis is used when an attempt is made to estimate summary effects (a synthetic goal) and differences (an analytic goal) from a quantitative perspective by applying statistical methods.12 A pooled analysis is a meta-analysis based on individual-level patient or study data.13 Although pooling the results of multiple studies reduces random error and increases the applicability of results across a broad range of patients, it risks violating the initial assumption of the analysis, which is to provide a nonbiased single best estimate of a patient’s prognosis, of the effect of a treatment or diagnostic procedure, or of the accuracy of a diagnostic test. The solution to this dilemma is to evaluate the extent to which results differ from study to study, namely, the heterogeneity of study results,14 which is further discussed in Chapter 16. The art of conducting and evaluating a systematic review or meta-analysis requires prior knowledge of evidence-based concepts. Unsystematic observations by clinicians constitute one source of evidence, and physiologic experiments another. Unsystematic clinical observations are limited by small sample sizes and by the limitations of human inference.15 Predictions about intervention effects on clinically important outcomes from physiologic experiments are usually right, but can be wrong.
Observational studies are inevitably limited by the possibility that apparent differences in treatment effect are really due to differences in patients’ prognosis in the treatment and control groups.3 Given the limitations of unsystematic clinical observations and physiologic rationale, EBI is desirable.
Table 7.1 Characteristics of narrative reviews versus systematic reviews

| Characteristic | Narrative review | Systematic review |
| --- | --- | --- |
| Scope of overarching research question(s) | Broad | Narrow (focused) |
| Authorship | Typically one or a small number of authors from a given discipline | Typically multiple authors from different disciplines related to the review scope |
| Source | Data provided by the authors, therefore often biased | Data obtained systematically to identify all relevant literature, therefore less prone to bias |
| Data appraisal | No objective assessment of the quality of the data reviewed | Objective assessment of the quality of the primary studies |
| Data synthesis | No statistical analysis of primary studies | Statistical analysis by meta-analysis where possible; if the primary studies are heterogeneous, meta-analysis is not possible and the reasons for data heterogeneity should be explained |
| Data reliability | Replication of results not expected | Replication of results expected when the presented methods are applied to the primary studies |
| Inferences | Conclusions reflect the perspectives of a small group of experts, therefore may be biased | Conclusions on the overarching questions based on objective data analysis |
Relevance

The overall goal of a systematic review or meta-analysis is to combine the results of previous studies to arrive at summary conclusions about a body of research.16 A properly conducted systematic review/meta-analysis can summarize large amounts of data, and for health care providers, consumers, and policy makers who are interested in the bottom line of the evidence, systematic reviews can help make sense of conflicting research results. In radiology, systematic reviews or meta-analyses can be used to provide a summary estimate of the effect size of a treatment that used imaging data to assess outcomes in observational or randomized controlled clinical trials, to estimate the clinical effectiveness of an imaging-guided therapy procedure, to synthesize the results of economic evaluations that have used imaging data, or to evaluate the summary diagnostic accuracy of an imaging test. With regard to the latter purpose, clinicians, policy makers, and patients would like to know whether applying the test improves the outcome, what test to use or to recommend in practice guidelines, and how to interpret test results.17 Well-designed diagnostic accuracy studies can help in making these decisions, provided that they fully report their participants, tests, methods, and results.
Steps for Conducting a Systematic Review
The steps involved in a systematic review are similar to the phases of any other research undertaking: formulation of the problem to be addressed; collection, critical appraisal (quality assessment), and analysis of data from observational or randomized studies; and interpretation of the results (assessment of heterogeneity, sensitivity analyses, and subgroup analyses).
Protocol Phase
The initial step in the protocol phase is to define the main outcome of interest, such as the clinical effectiveness of diagnostic procedures or drugs in studies that use imaging as the outcome measure; the performance of diagnostic tests; or the cost–benefit, cost-effectiveness, or cost-utility of treatment strategies or health care programs that involve diagnostic imaging tools. Before the start of the study, a detailed review protocol should be established that clearly states the question to be addressed, the subgroups of interest, the methods and criteria to be used to identify and select relevant studies, and how information will be extracted and analyzed. Without such a protocol, unexpected or undesired results can be excluded by post hoc changes to the inclusion criteria; any such changes should be documented and justified in the review. Eligibility criteria for the review should define the study participants, interventions, outcomes, study designs, and methodologic quality of the studies to be included. An example of how authors can establish a systematic review/meta-analysis protocol to evaluate the diagnostic performance of ultrasonography (US) and computed tomography (CT) for the diagnosis of appendicitis in pediatric and adult populations18 is provided in Table 7.2.
Review Phase
Identification of Studies
The search strategy for identifying relevant studies should be clearly defined considering multiple database sources: MEDLINE, EMBASE, EBM reviews, Cochrane Controlled Clinical Trials Register (CCTR), and bibliographic databases specific to such disciplines as nursing (CINAHL), behavioral sciences (PsycINFO), alternative medicine (MANTIS, AMED), physiotherapy (PeDRO), and oncology (CANCERLIT); checking of reference lists and personal files; hand searching of key journals; and personal communication with experts in the field. The search process should include MeSH terms pertaining to the population, intervention, comparison groups, and outcomes of interest as described in Chapter 10.
A comprehensive search should consist of the following steps:
1. Meta-analysis (pt)
2. Meta-anal: (textword)
3. Metaanal: (textword)
4. Quantitative: review: OR quantitative: overview: (textword)
5. Systematic: review: OR systematic: overview: (textword)
6. Methodologic: review: OR methodologic: overview: (textword)
7. Review (pt) AND Medline (text word) [and other databases]
8. 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7
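Once a filter like the one above has been designed, the search can also be run programmatically. Below is a minimal sketch using the NCBI E-utilities API for PubMed; the query string is a hypothetical, simplified PubMed translation of the Ovid-style filter (the (pt)/(tw) tags and truncation symbols above do not transfer directly to PubMed syntax), not a validated strategy.

```python
# A minimal sketch of running a structured search against PubMed via the
# NCBI E-utilities esearch endpoint. The search term is an illustrative,
# hypothetical PubMed rendering of the filter above, not a validated filter.
import json
import urllib.parse
import urllib.request

TERM = (
    '(meta-analysis[pt] OR meta-analysis[tiab] OR "systematic review"[tiab]) '
    'AND radiology[tiab]'
)

url = (
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    + urllib.parse.urlencode(
        {"db": "pubmed", "term": TERM, "retmode": "json", "retmax": 20}
    )
)

with urllib.request.urlopen(url) as response:
    result = json.load(response)["esearchresult"]

print("Records found:", result["count"])
print("First PMIDs:", result["idlist"])
```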
It is highly recommended that authors prepare a flow diagram showing the search and selection process used for the identification and quality assessment of articles (Fig. 7.1). Moreover, when feasible, investigators should use a “topic-only” search strategy and avoid restricting the search to articles written in certain languages, in an attempt to prevent language bias.19,20
Selection of Studies
Decisions regarding the inclusion or exclusion of individual studies often involve some degree of subjectivity. It is therefore useful to have at least two readers checking the eligibility of candidate studies, with disagreements resolved by consensus or by a third reviewer.
It is also advisable to keep a log of excluded studies, with the reasons for exclusion, which should be available on request from the authors of the review.
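Chance-corrected agreement between the two readers is commonly summarized with Cohen's kappa. A minimal sketch, using hypothetical eligibility judgments:

```python
# Cohen's kappa for chance-corrected agreement between two readers who
# independently judged candidate studies as "include" or "exclude".
# The judgment lists are hypothetical illustrative data.
from collections import Counter

reader1 = ["include", "exclude", "include", "include", "exclude", "include"]
reader2 = ["include", "exclude", "exclude", "include", "exclude", "include"]

n = len(reader1)
observed = sum(a == b for a, b in zip(reader1, reader2)) / n  # observed agreement

# Expected agreement by chance, from each reader's marginal frequencies.
marg1, marg2 = Counter(reader1), Counter(reader2)
expected = sum(
    (marg1[c] / n) * (marg2[c] / n) for c in set(reader1) | set(reader2)
)

kappa = (observed - expected) / (1 - expected)
print(f"Observed agreement: {observed:.2f}, kappa: {kappa:.2f}")
```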
Assessment of “Risk of Bias” for Study Quality
Independent assessment of the methodologic quality of individual studies by more than one reader is recommended. Blinding of readers to investigators’ names and institutions, journal names, and acknowledgments is controversial because it is time consuming, and the potential benefits may not always justify the additional costs.21 The quality of primary studies can be measured with scales or checklists. Use of scales involves assigning each item a numerical score; the sum of the item scores then determines the overall quality of the study.22 Use of checklists involves scoring items as “yes” or “no” and assigning one point for each “yes” item; the final score for a study corresponds to the sum of these “yes” items.23 Theoretical considerations24,25 suggest that scales generally should not be used to assess the quality of trials in meta-analyses and that checklists are preferable. A grading display of the strength of the evidence level according to the design of the studies is shown in Fig. 7.2. In general, the type of design affects the overall quality of a primary study. Double-blind, randomized, controlled clinical trials provide the strongest evidence for a causal relationship. Conversely, the indirect evidence found in case reports, expert opinion, and consensus committees provides the weakest evidence.26 Potential sources of bias in primary studies include selection bias (caused by incomplete randomization or allocation of patients to the alternative and standard care groups), performance bias (caused by differences in the care provided to patients exposed or not exposed to an intervention or diagnostic procedure), detection bias (caused by differences in outcome assessment between two groups of patients), and attrition bias (caused by differences in withdrawal or participation rates of patients in randomized controlled trials).27
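To make the scale-versus-checklist distinction just described concrete, the sketch below tallies a checklist-style score; the items shown are hypothetical placeholders rather than a validated instrument:

```python
# Checklist-style quality scoring: each item is judged "yes" or "no",
# each "yes" contributes one point, and the study score is the tally.
# Items and judgments are hypothetical placeholders for illustration.
study_checklist = {
    "random sequence generation described": "yes",
    "allocation concealment described": "no",
    "outcome assessors blinded": "yes",
    "withdrawals and dropouts reported": "yes",
}

score = sum(answer == "yes" for answer in study_checklist.values())
print(f"Checklist score: {score}/{len(study_checklist)}")
```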
Table 7.2 Example of eligibility criteria from a systematic review/meta-analysis protocol evaluating US and CT for the diagnosis of appendicitis in pediatric and adult populations18

| Study feature | Inclusion criteria | Exclusion criteria |
| --- | --- | --- |
| Participants | Segmentation of results according to age groups, with a maximum age of 20 years for children and young adults and a minimum age of 13 years for adults; if this criterion was not fully met, the proportion of patients with outlying ages could not exceed 5% of the total sample size. Inclusion of both female and male patients (i.e., ratio of one sex to the other, <3:1) | Data for pregnant women |
| Target disorder | Appendicitis. Studies with a 15%–75% sample prevalence of appendicitis derivable from the reported results (i.e., true-positive plus false-negative results, divided by the total of true-positive, true-negative, false-positive, and false-negative results), as arbitrarily determined and checked by means of sensitivity analysis | — |
| Research design of primary studies | Prospective or retrospective studies evaluating the performance of abdominal ultrasound and/or CT. Availability of data for the absolute numbers of true-positive, true-negative, false-positive, and false-negative findings, either reported, derivable from the results, or communicated by the authors in response to our request. No language restriction | Case reports, case series, reviews, pictorial essays, unpublished data, abstracts, and letters to the editor. Focus on topics other than diagnostic test assessment, such as management decision issues or cost-effectiveness analyses. When more than one study uses the same data or when the durations of studies overlap, the study with the larger sample size is selected to avoid duplication of data |
| Prior tests | — | Studies in which patients had a prior diagnosis of appendicitis (interval appendectomies) |
| Ultrasound test methods | Criteria for positive and negative test results defined. Imaging criteria for positivity for appendicitis included visualization of an inflamed (diameter >6 mm), noncompressible appendix at ultrasound or, in the case of nonvisualization of the appendix, presence of inflammatory signs of appendicitis, such as an appendicolith, cecal thickening, arrowhead sign, or cecal bar (as seen on CT images). Experience of operators described | Performance of more than one ultrasound examination per patient |
| CT test methods | Criteria for positive and negative test results defined. In studies evaluating the performance of CT scanning, a description of the technique used (namely, the use of oral, rectal, and intravenous contrast material with a limited or complete scan). Experience of operators described | Performance of more than one CT examination per patient |
| Reference test | Surgical/anatomopathologic or follow-up results. Criteria for positive and negative test results defined | — |
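The protocol in Table 7.2 repeatedly refers to true-positive, true-negative, false-positive, and false-negative counts. A minimal sketch of the accuracy measures derivable from such a 2×2 table, using hypothetical counts:

```python
# Diagnostic accuracy measures from a 2x2 table of index-test results
# against the reference standard. Counts are hypothetical.
tp, fp, fn, tn = 85, 10, 15, 190  # true/false positives, false/true negatives

total = tp + fp + fn + tn
prevalence = (tp + fn) / total          # as defined in the protocol above
sensitivity = tp / (tp + fn)            # proportion of diseased detected
specificity = tn / (tn + fp)            # proportion of non-diseased cleared

print(f"Prevalence:  {prevalence:.2f}")
print(f"Sensitivity: {sensitivity:.2f}")
print(f"Specificity: {specificity:.2f}")
```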
During data extraction, the following elements are typically abstracted from each primary study (a sketch of a fixed extraction record follows this list):

• Study question
• Study population
• Randomization
• Blinding
• Intervention
• Outcomes
• Statistical analysis
• Results
• Discussion
• Funding (if appropriate)
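One way to operationalize this list is to give every reader an identical, fixed extraction record per study. A minimal sketch, with hypothetical field values mirroring the elements above:

```python
# A fixed data-extraction record mirroring the abstraction elements listed
# above, so that every reader captures the same fields for every study.
# Field values are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractionRecord:
    study_question: str
    study_population: str
    randomization: str
    blinding: str
    intervention: str
    outcomes: str
    statistical_analysis: str
    results: str
    discussion: str
    funding: Optional[str] = None  # recorded only if appropriate

record = ExtractionRecord(
    study_question="US vs CT for suspected appendicitis",
    study_population="Adults presenting to the emergency department",
    randomization="Not applicable (diagnostic accuracy cohort)",
    blinding="Readers blinded to reference-standard results",
    intervention="Graded-compression ultrasound",
    outcomes="TP/FP/FN/TN counts against surgical/follow-up reference",
    statistical_analysis="Sensitivity and specificity with 95% CIs",
    results="Sensitivity 0.85, specificity 0.95",
    discussion="Operator experience reported",
)
print(record.study_question)
```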
Data Synthesis of Studies of Clinical or Technology Implementation Effectiveness
For data synthesis of randomized controlled clinical trials as primary studies, risk-of-bias checklists (e.g., the Cochrane Collaboration’s tool for assessing risk of bias)28 (Supplementary Table 2) can be used to assess the quality of the primary studies. After studies have been selected and critically appraised and the data have been extracted, the characteristics of the included studies and their individual results should be expressed in a standardized format to allow comparison between studies. If the outcome is binary (e.g., disease vs. no disease; intervention vs. standard practice), odds ratios, relative risks, or risk differences can be calculated. If the outcome is continuous (e.g., percentage enhancement of a tissue after contrast administration), the mean difference, standardized mean difference, or correlation coefficients can be used. Odds ratios have convenient mathematical properties: they do not have the inherent range limitations associated with high baseline rates29 and are suitable for statistical manipulation (e.g., as the antilog of logistic regression coefficients). Details are available in Chapter 16. Nevertheless, relative risks usually are preferred over odds ratios because they are more intuitively understandable.30,31 Before pooling the results of individual studies using an effect measure (e.g., the odds ratio or relative risk), the investigator should evaluate the presence of heterogeneity within and between studies.
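As an illustration of the pooling and heterogeneity assessment described above, the sketch below applies fixed-effect (inverse-variance) pooling to log odds ratios and computes Cochran's Q and I²; the 2×2 counts are hypothetical, and the choice of a fixed-effect model is a simplifying assumption:

```python
# Fixed-effect (inverse-variance) pooling of log odds ratios across studies,
# with Cochran's Q and the I^2 statistic as heterogeneity checks.
# The 2x2 counts (events/non-events per arm) are hypothetical.
import math

# (events_treated, no_events_treated, events_control, no_events_control)
studies = [(15, 85, 25, 75), (8, 92, 14, 86), (30, 70, 45, 55)]

log_ors, weights = [], []
for a, b, c, d in studies:
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d   # variance of the log odds ratio
    log_ors.append(log_or)
    weights.append(1 / var)               # inverse-variance weight

pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
df = len(studies) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled OR: {math.exp(pooled):.2f}")
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```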
Data Synthesis of Studies of Diagnostic Tests
Methodologic quality assessment of individual studies in systematic reviews is necessary to identify potential sources of bias and to limit the effects of these biases on the estimates and conclusions of the review. The methodologic quality of a study has been defined as “the extent to which all aspects of a study’s design and conduct can be shown to protect against systematic bias, non-systematic bias that may arise in poorly performed studies, and inferential error.”32
The Standards for Reporting of Diagnostic Accuracy (STARD) checklist was not developed as a tool to assess the methodologic quality of diagnostic studies.33 Rather, this 25-item checklist has been used to evaluate the quality of reporting of diagnostic studies by verifying that all relevant information is present.
However, many items in the checklist are included in more recently developed tools for the quality assessment of diagnostic accuracy (the Quality Assessment of Diagnostic Accuracy Studies [QUADAS-2] tool34) and reliability (the Quality Appraisal of Reliability Studies [QAREL] tool,35 Supplementary Table 3). The QUADAS-2 tool is structured as a list of two domains (risk of bias and applicability) and 14 questions on diagnostic accuracy, each of which is answered “yes,” “no,” or “unclear.” Under both domains, items cover patient selection, the index test, the reference standard, verification and review bias, clinical review bias, incorporation bias, test execution, study withdrawals, and intermediate results; the risk-of-bias domain additionally covers flow and timing. The QAREL tool includes 11 items that explore seven principles, covering the spectrum of subjects, the spectrum of examiners, examiner blinding, order effects of examination, the suitability of the time interval between repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis.
The results of the quality appraisal can be summarized to offer a general impression of the validity of the available evidence. Review authors should not use an overall quality score, because different shortcomings may generate biases of different magnitudes, even in opposing directions, making it very hard to attach sensible weights to each quality item. One way to summarize the quality assessment is shown in Fig. 7.3, where stacked bars are used for each QUADAS-2 item (the underlying tally can be computed as in the sketch at the end of this section). Another way of presenting the quality assessment results is to tabulate the results of the individual QUADAS-2 items for each individual study. The effects of the STARD guidelines for complete and transparent reporting are only gradually becoming visible in the literature.36,37
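The per-domain tally behind such a stacked-bar display can be computed as follows; the study judgments are hypothetical:

```python
# Tallying risk-of-bias judgments per QUADAS-2 domain across studies,
# yielding the proportions behind a stacked-bar summary.
# Study judgments are hypothetical.
from collections import Counter

# judgments[study][domain] in {"low", "high", "unclear"}
judgments = {
    "Study A": {"patient selection": "low", "index test": "unclear",
                "reference standard": "low", "flow and timing": "high"},
    "Study B": {"patient selection": "low", "index test": "low",
                "reference standard": "unclear", "flow and timing": "low"},
    "Study C": {"patient selection": "high", "index test": "low",
                "reference standard": "low", "flow and timing": "low"},
}

domains = ["patient selection", "index test", "reference standard",
           "flow and timing"]
n = len(judgments)
for domain in domains:
    tally = Counter(study[domain] for study in judgments.values())
    shares = ", ".join(f"{k}: {tally[k] / n:.0%}" for k in ("low", "high", "unclear"))
    print(f"{domain:20s} {shares}")
```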