A Systematic Approach to Breast Imaging
Basic Principles: Detection Versus Diagnosis
To use breast imaging technologies appropriately, the difference between detection and diagnosis should be understood. Detection is the process of finding anomalies among which a significant number prove to be malignant. Diagnosis is the ability to characterize a detected anomaly as benign or malignant. A useful detection technique may have no value for diagnostic evaluation; conversely, a diagnostic test may be of little use until an anomaly has been detected.
Screening and Threshold Sensitivity
Screening is an evaluation to detect unsuspected disease. There are various levels of screening. For a woman who has a lump detected on clinical examination, the uninvolved areas of breast tissue can be screened for breast cancer. An individual, asymptomatic woman can be screened, as can large populations.
The term screening is derived from the process of filtering or sifting. A screen over a window is used to filter out bugs while letting air through. It is common to use a screen to sift a material such as soil to remove rocks. To accomplish this, a screen is used with openings that are chosen to allow the desired-size particles to pass through while holding back the undesirable particles. The size of the openings will determine the size of the rocks that will be trapped and prevented from passing through the screen. If the soil clumps are similar to small rocks, then openings that are too small will trap good soil as well as the unwanted rocks. If the openings are made larger, more soil will pass through, but rocks will also be allowed to pass through the screen.
The size of the openings in a sifter can be compared to the threshold sensitivity of a breast cancer screening program. The thresholds that are used by the radiologist to raise concern over a lesion are like the openings in the sifter. Just as with the stones and soil, the lower the radiologist’s threshold, the more cancers will be detected, but the higher the percentage of benign lesions (soil) that will be trapped by the screen. If the threshold for intervention is raised (the openings enlarged), fewer benign lesions will be investigated (false positives), but more cancers will be missed (1,2) as false negatives (see “Positive Predictive Value”).
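The sifter analogy can be made concrete with a toy example. In this sketch each lesion carries a hypothetical suspicion score. The scores are invented for illustration and are not drawn from any study. Every lesion scoring at or above the radiologist’s threshold is “trapped” for workup. The function name `sieve` is likewise hypothetical.

```python
# Hypothetical suspicion scores (0-100), invented purely to illustrate the trade-off.
cancer_scores = [35, 55, 70, 85, 90]
benign_scores = [10, 20, 30, 40, 45, 50, 60, 65]


def sieve(threshold: int) -> tuple[int, int, int]:
    """Return (cancers detected, cancers missed, benign lesions worked up).

    A lesion is 'trapped' by the screen when its score is at or above the threshold,
    so a lower threshold corresponds to smaller openings in the sifter.
    """
    detected = sum(score >= threshold for score in cancer_scores)
    false_positives = sum(score >= threshold for score in benign_scores)
    return detected, len(cancer_scores) - detected, false_positives


for threshold in (30, 50, 70):
    detected, missed, false_positives = sieve(threshold)
    print(f"threshold {threshold}: {detected} cancers found, {missed} missed, "
          f"{false_positives} benign lesions worked up")
```

With these made-up scores, a threshold of 30 finds all 5 cancers but sends 6 benign lesions for workup. A threshold of 70 sends no benign lesions for workup but misses 2 cancers.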
If screening is to be successful, cancers must be found at a smaller size and earlier stage than they would be without screening. Ultimately, screening for breast cancer is efficacious only if mortality can be deferred or prevented.
Detection must precede diagnosis, and detection of breast cancer at an earlier stage is the most important function of an imaging technique. Mammography is an excellent detection technique, but it is not diagnostic unless the lesion has the typical characteristics of a malignant process or characteristic benign calcifications, or is an encapsulated fat-containing lesion defining a benign mass. If intervention is to be undertaken earlier, many benign lesions will require biopsy because they frequently cannot otherwise be distinguished from cancer. Until preventive measures can be discovered or improvements in therapy developed, reductions in mortality from breast cancer will come only with improvements in detection capability.
Improvements in diagnostic capability will benefit women by reducing the number of biopsies of benign lesions that must be performed. Ultrasound is an example of a useful diagnostic test in its capacity to determine that a mass is a cyst. However, the fact that it is useful to further characterize a lesion once it has been detected has no bearing on its ability to detect a lesion de novo. Diagnostic studies are of use only once the lesion has been detected. Because breast biopsy (needle or excisional) is a relatively safe procedure with very low morbidity, a noninvasive diagnostic test must be extremely accurate in separating benign from malignant lesions if, by using it, a biopsy is to be avoided.
X-ray mammography is currently the only imaging modality with proven efficacy for screening in that it is the only imaging test that has been shown in properly performed trials to decrease the death rate from breast cancer. Given the increasing prevalence of breast cancer with age, the concomitant decreasing radiation risk with increasing age, and the evidence of mortality reduction, it is reasonable for screening to begin for women by age 40. An annual mammographic study in conjunction with a careful physical examination should be coupled with reinforcement of the potential benefits of breast self-examination.
Providing Low-Cost, High-Quality Screening
Efforts should be made to reduce the cost of mammographic screening so that all women can be tested, but image quality must not be sacrificed. Low-cost screening does not imply low-quality mammography. Because the screening study is the earliest opportunity for detection, it should use the highest-quality mammography. Well-trained technologists are required to position the breast properly, and experienced radiologists are needed to interpret the studies.
In an effort to reduce the expense of screening and make it available to all segments of the population, efficiency must be improved while quality is maintained. The screening session is the time at which the full benefit of mammography is realized, and mammographic imaging at screening must be optimized. Although the psychological aspects of breast cancer detection and diagnosis are unique, the possible outcomes of the screening mammogram are the same as those of cervical cancer screening with the Papanicolaou test: the study is negative, requires close follow-up, requires recall of the patient for additional evaluation, or is sufficiently suspicious to require a biopsy.
Educating Women on Approaches to Screening
Because of the fears associated with breast cancer, there is great psychological pressure on the screened woman. The media have promoted the idea that the screening mammogram should be reviewed immediately by the radiologist and that a report should be immediately available to the screened woman. Given the relatively slow progression of most breast cancers, there is no biologic reason to rush the interpretation of the mammogram. It is more important to avoid overlooking an early breast cancer than it is to provide an immediate report. Unfortunately, in an effort to promote screening, women have been led to believe that a negative mammogram is reassurance that they do not have breast cancer. This mistaken belief is the source of the pressure for immediate interpretation.
The reality is that a negative mammogram does not ensure that a woman does not have breast cancer. In fact, as many as 20% of cancers become clinically evident during the year following a mammogram on which they were not seen (3). Mammography has little if any efficacy for excluding cancer, and a negative mammogram provides little reassurance. Its value is its ability to demonstrate many cancers at an earlier stage, not its ability to exclude cancer. Having a radiologist on site during the screening phase will reduce the need to recall patients with questionable abnormalities, but it will significantly increase the cost of screening. Moreover, rushing the review of the mammogram to provide an immediate report could result in early cancers being overlooked.
Accuracy and Cost-Effectiveness of On-Site Reading
Ghate et al found that their recall rate decreased from 18% to 14% when they changed from online reading to batch reading (4). Their recall rates were high with both approaches, but online reading resulted in far more recalls with no significant increase in cancer detection. Most well-trained radiologists can read 40 to 50 mammograms in a 2-hour “batch-reading” session if those studies are organized on an alternator along with available previous examinations so that film handling by the radiologist is minimized. If the interpreting radiologist is experienced, between 93% and 96% of the studies will be read as negative, and 4% to 7% of women screened will be recalled for additional study. Having a radiologist on site to read the same cases as they are performed and to discuss the mostly negative (93% to 96%) results with each patient would avoid the 4% to 7% of recalls, but the same number of cases would then take 8 hours of radiologist time. This inefficient approach is not cost-effective.
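The paragraph above gives enough figures to express the trade-off as radiologist time per recall avoided. This is a rough sketch, not a cost model. It uses only the 2-hour batch session, the 8-hour on-site day, 40 to 50 cases, and the 4% to 7% recall rate quoted in the text. As a best case, it assumes that on-site reading avoids every recall. The function name `online_vs_batch` is hypothetical.

```python
def online_vs_batch(cases: int, recall_rate: float,
                    batch_hours: float = 2.0, online_hours: float = 8.0):
    """Return (extra radiologist minutes, recalls avoided, extra minutes per recall avoided).

    Best-case assumption: reading online avoids every recall that batch reading would generate.
    """
    extra_minutes = (online_hours - batch_hours) * 60
    recalls_avoided = cases * recall_rate
    return extra_minutes, recalls_avoided, extra_minutes / recalls_avoided


for cases, rate in [(40, 0.04), (50, 0.07)]:
    extra, avoided, per_recall = online_vs_batch(cases, rate)
    print(f"{cases} cases, {rate:.0%} recall: {extra:.0f} extra min, "
          f"{avoided:.1f} recalls avoided, {per_recall:.0f} min per recall avoided")
```

Under these assumptions, each recall avoided costs roughly 100 to 225 extra minutes of radiologist time.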
Batch reading is not only more efficient but also permits more than one review, which facilitates double reading at little added expense with a demonstrated benefit in detection. In our practice, we found that our detection of clinically occult malignancy increased by over 7% just by having a second radiologist quickly review the batch and look for abnormalities that were overlooked by the primary reader.
If a radiologist is asked to consult on each study as it is taken, he or she will be occupied for a full 8-hour day interpreting the same number of studies. The charge for screening will have to be increased to account for this inefficient use of radiologist time as well as for the delay in patient throughput created by such a system. Double reading in such an “online” system will increase the cost further.
It is more cost-effective to separate the screening component from the diagnostic or lesion evaluation function of breast imaging. The less frequently used, often more expensive modalities, such as ultrasound- and imaging-guided biopsy technologies, are more efficiently used when grouped centrally. Multiple, highly efficient peripheral screening sites should ultimately be linked to a central comprehensive evaluation center to expedite the analysis of any abnormality found at screening. The comprehensive center will permit the rapid evaluation of a woman with any type of breast problem in a cost-effective manner. The separate screening centers need only produce high-quality contact mammographic studies. Film/screen images can be sent to the main center for review. This may be facilitated electronically if a digital mammogram is obtained. Magnification mammography, ultrasound, computed tomography (CT), magnetic resonance imaging (MRI), fine-needle aspiration cytology, core needle biopsy, localization, and excisional biopsy capabilities can be concentrated in a centralized breast evaluation center fed by multiple screening sites. The evaluation center should also be used to assess women with clinically evident signs or symptoms of breast cancer and can facilitate the multidisciplinary care of women diagnosed with breast cancer. Organizing breast cancer detection and diagnosis as separate functions is likely to be more cost-effective in the long run.
The negative implication of this approach is that 4% to 7% of women may be asked to return for additional study after an abnormal screening test, with the attendant increased anxiety and some additional cost. The number of recalls decreases with repetitive screening (5). The breast, by mammography, is quite stable from year to year, and it is only changes, with their higher likelihood of malignancy, that elicit recall among women who are repetitively screened. Women and physicians must be educated to understand these issues so that costs may be kept down, permitting high-quality screening to be available to all women.
Perception and Double Reading
Double reading is strongly recommended. It is a psychovisual fact that all observers (even the most experienced) fail to perceive significant abnormalities. This has been demonstrated in other types of image interpretation, and mammographic interpretation is no exception. Studies have shown that skilled observers fail to detect pulmonary abnormalities that are visible in retrospect (6), and even experts will fail to perceive significant bowel lesions on barium enemas and fractures on bone films.
This phenomenon is clearly demonstrated in daily life by the common experience of searching for a known object such as car keys. The harder one searches, the more frustrated one may become, yet another observer may easily point out that the keys are lying on a table in clear view. Why the first observer fails to perceive them is unclear, but the experience is common to all. The failure to see a cancer that in retrospect may even be obvious is not negligence but appears to be an immutable psychovisual threshold of the human visual process.
Failure to Detect a Cancer
It is not always clear to those outside of radiology that a radiologist is not negligent for failing to see a cancer that is visible in retrospect. It is helpful to explain the problem by using a more commonly understood comparison. The caricatures drawn by the artist Al Hirschfeld are a good example. The artist periodically included his daughter’s name (Nina) in his work (Fig. 11-1). He aided the search by writing, beside his signature, the number of times her name was incorporated in the drawing. The visual search for “Ninas” is similar to the radiologist’s search for cancers. The difference is that the radiologist is given no indication of which images contain an abnormality or of what it will look like. Compounding the difficulty is that only 2 to 10 cancers will be found among every 1,000 women screened. This means that the radiologist must review 4,000 images (two of each breast) to find the small number that contain a visible cancer. It is virtually inevitable that cancers will be missed, despite careful image review.
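The arithmetic in the paragraph above can be restated as the number of images reviewed per cancer found. This minimal sketch uses the text’s four images per woman (two views of each breast) and its range of 2 to 10 cancers per 1,000 women screened. The function name `images_per_cancer` is hypothetical.

```python
IMAGES_PER_WOMAN = 4  # two views of each breast, as stated in the text


def images_per_cancer(cancers_per_1000_women: float) -> float:
    """Average number of screening images reviewed for each cancer found."""
    return 1000 * IMAGES_PER_WOMAN / cancers_per_1000_women


for rate in (2, 10):
    print(f"{rate} cancers per 1,000 women -> about "
          f"{images_per_cancer(rate):,.0f} images per cancer")
```

At these rates the reader must search roughly 400 to 2,000 images for each cancer present.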
Different observers tend to miss different lesions on mammography. Fortunately, just as the second observer easily saw the keys on the table, the failure to perceive a cancer can be reduced by having a study reviewed by a second reader (double reading). Using a double-reading system, Bird (7) had a 5% improvement in cancer detection. Tabar et al (8) and Thurfjell et al (9) found that double reading increased the detection rate of breast cancer by 15%.
Although there are benefits from double reading, it is not clear that it should be the standard of care. Double reading can lead to an increase in the number of false-positive results. Unless it is practiced in an efficient manner, double reading will increase the cost of screening, and increased cost may force a reduction in access for many women. To keep costs to a minimum, efficient interpretation with a minimum of added effort is required. Double reading with little or no increased cost can be accomplished if screening is provided in a rapid-throughput, high-quality setting, with the film interpretation deferred and performed by batch reading.
Massachusetts General Hospital System
As noted above, in our study at Massachusetts General Hospital, a second reader increased the detection rate of malignancy by 7%. Our system involves mounting the screening studies on an alternator or reviewing digital images on the monitor. The main reader reviews the studies methodically and uses the previous studies as well as the history to render an interpretation, entering short codes into a computer beside the multiviewer. When given the completion command, the computer translates the codes into a full written report. When uninterrupted, the main reader takes 1 to 2 hours to review approximately 30 to 40 studies (this can be faster if computer interactions are kept to a minimum). The main reader circles any suspicious areas with a wax marker (or circles the finding on the digital examination and prints out the images showing the lesion; we do not save the circles in the PACS).
Once the main reader’s review is complete, the “quick reader” reviews the cases, trying to find significant abnormalities that the main reader may have overlooked. The quick reader can review the same cases very rapidly because the quick reader has no paperwork to do and does not even interact with the computer unless a problem is found. The quick reader does not routinely look at previous studies. The quick reader’s review can be completed in 10 to 15 minutes or less. If the quick reader finds an abnormality overlooked by the main reader, the computer code is changed, and the quick reader becomes the interpreter and is responsible for that case.
When the quick review is complete, the computer is directed to finalize the reports. They are sent to the hospital reporting system as the permanent record and to the referring physicians, and a letter with the results is sent to the patient. Double reading is not reimbursed; we double read all studies at no charge for the benefit of the patient. The report names only one reader, since there is no reason to provide a lawyer with two people to sue if a cancer is missed.
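The bookkeeping in this workflow can be sketched in a few lines. The main reader enters a short code, the quick reader changes the record only when an overlooked abnormality is found and then becomes the sole named interpreter, and the completion command expands the codes into report text. This is a minimal illustration under assumed names. The two codes, the report wording, and the names `ScreeningCase`, `quick_review`, and `finalize` are all hypothetical, because the text does not describe the actual code set or software.

```python
from dataclasses import dataclass

# Hypothetical short codes; the codes actually used in practice are not given in the text.
REPORT_TEXT = {
    "N1": "Negative. No mammographic evidence of malignancy.",
    "R1": "Incomplete. Additional imaging evaluation is recommended.",
}


@dataclass
class ScreeningCase:
    patient_id: str
    code: str          # short code entered by the main reader
    interpreter: str   # the single radiologist named on the report


def quick_review(case: ScreeningCase, quick_reader: str, finds_overlooked: bool) -> None:
    """Second, rapid pass: the record is touched only if an overlooked finding is seen."""
    if finds_overlooked:
        case.code = "R1"                 # change the code to a recall
        case.interpreter = quick_reader  # quick reader takes responsibility for the case


def finalize(batch: list[ScreeningCase]) -> dict[str, str]:
    """Completion command: expand each short code into full report text."""
    return {c.patient_id: f"{REPORT_TEXT[c.code]} Interpreted by {c.interpreter}."
            for c in batch}


# Usage: two cases read as negative by the main reader; the quick reader flags case B.
batch = [ScreeningCase("A", "N1", "Main reader"), ScreeningCase("B", "N1", "Main reader")]
quick_review(batch[1], "Quick reader", finds_overlooked=True)
reports = finalize(batch)
for pid, text in reports.items():
    print(pid, text)
```

Keeping a single `interpreter` field mirrors the practice described above of naming only one reader on each report.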
In our prospective study of 5,899 women screened by mammography, a total of 39 malignant lesions were detected (6 to 7 per 1,000 women screened). The main reader detected 36 of the 39; an additional 3 were detected by the quick reader. This clearly showed us that double reading can improve cancer detection rates. Even experienced readers periodically fail to detect a lesion that is visible to a second reader. When we reviewed the results of double reading among radiologists with a minimum of 4 years of staff experience, double reading still detected 2% more cancers (presented at the RSNA, 2004).
Our prospective study of double reading also pointed out an important additional phenomenon: not surprisingly, reading quickly can lead to mistakes. The quick (second) reader, who read faster than the main reader, missed eight cancers that were detected by the slower, more methodical, main reader. Clearly, reading too quickly can lead to overlooking cancers. These data show that if a single reader is used, methodical review is needed (slow is better than fast), but it is preferable to have a second observer because this second rapid review can improve the detection rate of breast cancer at minimal cost. These data also argue against online reading. We found that when we read online, the patients became impatient in the waiting room waiting for their results. They complained to the technologists, who would pressure the radiologists in the reading room to interpret the studies more quickly, raising the risk of overlooking a lesion. Rushing the radiologist can lead to errors, and this is another reason to discourage online interpretation in favor of batch reading.
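The counts reported for the prospective double-reading study can be cross-checked with a few lines of arithmetic. The derived figures (28 cancers found by both readers, 31 by the quick reader) assume, as the text implies, that each of the 39 cancers was found by at least one reader. Interval cancers missed by both readers are not counted, so the proportions are shares of screen-detected cancers, not true sensitivities.

```python
# Counts from the prospective double-reading study described above.
WOMEN_SCREENED = 5_899
TOTAL_CANCERS = 39              # cancers detected by at least one reader
MAIN_READER_FOUND = 36
FOUND_ONLY_BY_QUICK = 3         # overlooked by the main reader
QUICK_MISSED_FOUND_BY_MAIN = 8  # detected by the main reader, missed by the quick reader

found_by_both = MAIN_READER_FOUND - QUICK_MISSED_FOUND_BY_MAIN  # 36 - 8 = 28
quick_reader_found = found_by_both + FOUND_ONLY_BY_QUICK         # 28 + 3 = 31
assert MAIN_READER_FOUND + FOUND_ONLY_BY_QUICK == TOTAL_CANCERS  # counts are consistent

rate_per_1000 = 1000 * TOTAL_CANCERS / WOMEN_SCREENED
added_by_second_reader = FOUND_ONLY_BY_QUICK / TOTAL_CANCERS

print(f"Cancers detected: {rate_per_1000:.1f} per 1,000 women")
print(f"Share found only by the second reader: {added_by_second_reader:.1%}")
print(f"Main reader found {MAIN_READER_FOUND / TOTAL_CANCERS:.1%} of the 39")
print(f"Quick reader found {quick_reader_found / TOTAL_CANCERS:.1%} of the 39")
```

The second reader’s share (about 7.7%) agrees with the gain of over 7% cited earlier. The quick reader’s lower share (about 79.5% versus 92.3% for the main reader) is consistent with the observation that reading too quickly leads to missed cancers.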
Diagnostic Mammography
Although the term diagnostic mammography has come into general use, it is really a misnomer. The greatest benefit from mammography is derived from its use as a screening test in the evaluation of asymptomatic women. Mammography is capable of detecting breast cancers 1.5 to 4.0 or more years earlier than they might ordinarily be found (10,11). Detection at a smaller size and earlier stage has been shown to interrupt the natural history and lethal progression of the disease and result in reduced or delayed mortality (8,12). Mammography can be of some diagnostic value, heightening the concern raised by clinical examination when the mammogram and physical examination are concordant and both are suspicious. However, it cannot be used to reliably differentiate benign from malignant processes and cannot be used to exclude cancer.
In the symptomatic patient, mammography occasionally provides a useful check on a suspicious clinical finding. There is the theoretical possibility that a palpable lesion may not be of sufficient concern on clinical examination to prompt a biopsy by itself. If the mammogram demonstrates a suspicious finding, it might prompt earlier intervention. There are anecdotal cases of this happening, but it has never been documented in a prospective scientific fashion and in our experience is very rare.
Cancer, even when palpable, can be missed at biopsy. The simultaneous demonstration of a mammographically suspicious lesion can be used to reduce a possible delay in diagnosis when, for whatever reason, the biopsy of a suspicious mass produces benign results (13). If the mammogram suggests a highly suspicious lesion (Fig. 11-2) and the biopsy fails to confirm malignancy, the mammogram should be repeated as soon as possible to confirm that the suspicious abnormality has indeed been removed or to permit rebiopsy if necessary.
Mammography can on occasion be used to avoid the biopsy of a benign mass. The demonstration of a calcifying fibroadenoma (Fig. 11-3) or an encapsulated lesion containing fat, such as a lipoma (Fig. 11-4), oil cyst, or hamartoma, is sufficiently diagnostic that biopsy can be avoided. However, given the importance of early breast cancer diagnosis, the relatively low morbidity from a tissue diagnosis, and the fact that mammography does not reveal all cancers and that some cancers have benign characteristics on mammography, it is usually unsafe to rely on a mammogram to exclude the possibility of cancer.
As many as 60% of palpable abnormalities are not visible on the mammogram (14). Even with tangential, spot-compression views, palpable cancers may not be defined on the mammogram. Ultimately, just as a mammographically detected abnormality must be resolved satisfactorily, a clinically detected abnormality must be resolved satisfactorily. In reality, among women in whom malignancy is clinically suspected, mammography is used primarily to screen the remainder of the breast in question as well as the contralateral breast for clinically unsuspected cancer (15).
The major benefit from mammography is the ability to detect clinically occult, nonpalpable anomalies in the breast. Many of these prove to be malignant, but others with similar morphologies prove to be benign. Because mammography and physical examination provide information based on different tissue characteristics (16,17), concerns raised by one can rarely be negated by lack of corroboration from the other and, as has been often reiterated, a negative mammogram should not result in delayed intervention when a clinically suspicious lesion is present.
Similarly, a negative physical examination should not delay diagnosis in the breast containing a mammographically suspicious abnormality.
Figure 11-3 Calcified fibroadenoma. A calcifying, involuting fibroadenoma requires no additional evaluation, even if palpable.
Figure 11-4 Lipoma. This radiolucent, encapsulated lesion (arrows) is a benign lipoma and requires no further evaluation, even if palpable.
Screening Terminology
Many of the measures of a screening program involve statistical concepts. The following is a short summary of several important terms:
Prevalence is the number of cancers in a population at a given point in time. If there has been no screening, a number of cancers will be growing undetected in the population. When screening begins, the cancers that have been building in the population (over many years) will be found, as well as the cancers that could only be detected for the first time in the population in that year (the incident cancers). Thus, in the first year of screening (prevalence year), there should be more cancers detected than in subsequent years (incident years).
Incidence is the number of cancers that are diagnosed each year subsequent to a prevalence screen. It is a measure of the number of cancers that become detectable, at the detection threshold in effect at the time, over a given period (usually 1 year). The number of incident cancers depends on the ability of the screen to detect small cancers and the growth rate of the cancers in the population.
Lead time is a measure of how much earlier a screening method can detect cancer than a comparison method (e.g., usual medical care). It can be estimated by subtracting the number of cancers found at the incidence screen from the number found at the prevalence screen and dividing the result by the number found at the incidence screen: approximate lead time (in years) = (prevalence – incidence)/incidence. The reasoning is that the prevalence screen finds two groups of cancers. The first is the backlog of older cancers that could have been detected in earlier years at the new threshold. The second is the cancers that have become detectable for the first time that year. Subtracting the incidence removes the second group, and dividing the backlog by the number of new cancers detectable each year expresses it in years. For example, assume the prevalence screen detects six cancers per 1,000 women, and the incidence screen detects two cancers per 1,000 women. The approximate lead time for the screen is (6 – 2)/2 = 2 years. Put another way, the lead time is a measure of how many incidence years (as measured by the screen) of growth occur before a lesion is detected in a prevalence screen.
Prior probability of cancer is how many cancers are expected to occur in a given population over a period of time. This is a measure of the expected prevalence and incidence of cancer in a population. For example, because cancer is more common among older women, the prior probability of cancer is higher among older women than among younger women.
Positive predictive value (PPV) is a measure of the discriminating power of the method being analyzed. It is defined as the number of true cancers divided by the number of tests called positive for cancer. It may be applied at the recall stage, as the proportion of mammograms read as abnormal and needing additional imaging that prove to be cancer. It may also be applied at the biopsy stage, as a measure of the aggressiveness of intervention. For the latter, PPV for biopsy recommended = the number of cancers diagnosed divided by the number of lesions recommended for biopsy.
True-positive results are the lesions that are called cancer and prove to be cancer.
False-positive results are lesions that are called cancer that prove to be benign.
False-negative results are mammograms or lesions that are called negative or benign but prove to be cancer.
True-negative results are mammograms or lesions that are called negative and prove to be negative.
Interval cancers are cases where no cancer is detected at screening but a lesion is diagnosed before the next screen. The interval is usually 1 or 2 years.
Sensitivity is a measure of the test’s capability of finding cancers that are in the population. It is calculated by dividing the number of cancers correctly diagnosed by the screen by the total number of cancers that are actually present in the population: sensitivity = true positives divided by (true positives + false negatives). Sensitivity decreases as false negatives increase.
Specificity is a measure of how successful a screen is at saying that cancer is not present when it really is not present. It is calculated by dividing the number of cases correctly called negative by the number of cases that actually are negative for cancer: specificity = true negatives divided by (true negatives + false positives). Specificity diminishes as false positives increase.
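The formulas in the terms above translate directly into code. The sketch below is illustrative only, and the function and class names are hypothetical. The lead-time call reuses the worked example from the text. The 2 × 2 counts are invented solely to exercise the formulas; they imply a 5% recall rate, within the 4% to 7% range cited earlier. Here “positive” means read as abnormal at screening, so `ppv()` gives the recall-stage PPV. Counting biopsy recommendations as positives instead would give the biopsy-stage PPV.

```python
from dataclasses import dataclass


def lead_time_years(prevalence_per_1000: float, incidence_per_1000: float) -> float:
    """Approximate lead time in years = (prevalence - incidence) / incidence."""
    return (prevalence_per_1000 - incidence_per_1000) / incidence_per_1000


@dataclass
class ScreeningOutcomes:
    tp: int  # called positive, proved to be cancer
    fp: int  # called positive, proved benign
    fn: int  # called negative, proved to be cancer
    tn: int  # called negative, proved negative

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


print(lead_time_years(6, 2))  # 2.0, the worked example in the text

# Hypothetical counts for 1,000 screened women, chosen only to exercise the formulas.
outcomes = ScreeningOutcomes(tp=5, fp=45, fn=1, tn=949)
print(f"PPV {outcomes.ppv():.1%}, sensitivity {outcomes.sensitivity():.1%}, "
      f"specificity {outcomes.specificity():.1%}")
```

Note how the large number of true negatives keeps specificity high even though most recalls (45 of 50) prove benign. This is why PPV, rather than specificity, is the more telling measure of how many benign lesions a screening threshold traps.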
Gauging the Success of a Screening Program