Evaluating intra and inter-observer bias in the cosmetic rating for random vs. serial assessment of breast photographs

Abstract

Background

This study was done to assess inter and intra-rater bias in rating of cosmesis, when breast photographs were evaluated serially or randomly by a panel of six members having varying years of experience.

Methods

Cosmetic assessment was done subjectively for 175 images [of 50 unilateral breast cancer patients for whom at least 3 images were collected], that were arranged serially from baseline to follow up in chronological order termed ‘serial assessment setting’ [SAS]. For ‘random assessment setting’ [RAS], all images was randomly arranged for assessment. Objectively assessment was also done using BCCT.core. Kappa index was calculated for agreement between the RAS and SAS rating for the 3 panellists’ groups and with BCCT.core.

Results

Good agreement [kappa 0.659] was found between the mean panel cosmetic scores for both SAS and RAS. Fair agreement was found when subjective RAS [k = 0.301] and SAS [k = 0.343] scores were compared with the BCCT.core, which was highest for the most experienced panellists with SAS k = 0.387 and RAS k = 0.436. Both SAS and RAS had good intra-rater reliability.

Conclusions

SAS improves the agreement with BCCT.core rating and may be used if validated in a larger cohort. The clinical experience of the panellist impacts cosmetic rating and must be considered before forming a panel.

1 Introduction

The impact of Breast Conserving therapy [BCT] has been monumental on quality of life. Assessment of Cosmesis and Quality of life forms an important endpoint for clinical research. Various studies have demonstrated high correlation between poor cosmetic outcome and low scores on quality of life indicators [ , ].

Subjective or objective cosmetic evaluation is integral to clinical trials related to breast conserving therapy. Subjective assessment may be done utilizing the Harris scale [ ] The Harris scale reports the outcome on a 4 point-Likert scale [excellent, good, fair and poor] based on the degree of asymmetry between the treated and untreated breasts. Even though the Harris scale is the simplest method used to subjectively score cosmesis, it is prone to high variability due to various factors like the professional background of the rater, method of capturing and processing the image, etc.

The BCCT.core is a commonly employed software for objective evaluation that processes images and rates the cosmesis on a 4-point scale [Excellent, Good, Fair or Poor] using a semi-automatic software utilizing predetermined points placed on a frontal image. It is developed and owned by Cardoso J, Cardoso M and INESC Porto Breast Research Group. The main objective was to develop a reproducible and widely available methodology for evaluation of aesthetic results in breast cancer conservative treatment, enabling effective comparison of outcome between centers [ ].

The BCCT.core software is easy to use and user friendly. It requires an upright breast photograph with a light background and good contrast resolution. The outcome of the assessment is based on various different factors, predominantly the asymmetry between the treated and untreated breasts. The software utilizes fixed points on the image (sternal notch, and a fixed distance below the sternal notch), and breast contour as set by the user on the image, to give a global cosmetic rating. An illustration of the software is shown in the supplementary file as figure no. 1.

Breast Reconstruction Assessment [BRA] is also an objective method of cosmetic evaluation [ ]. However, all of these assessment scales come with inherent inconsistency and bias. Pezner et al. have demonstrated that it is difficult to obtain consensus of cosmetic outcome with the commonly employed scales [ ].

Cosmetic rating on breast photographs is commonly done in clinical trials, mostly retrospectively by a panel of experts like in the UK START Trial B [ ]. If the rating is done using breast photographs, factors affecting the image quality will also affect the cosmetic rating. Use of standardized methods to acquire the photographs under uniform room lighting conditions, distance of the camera from the patient, and specifications of the camera can minimize this variability. The rating done by a clinician is also often different from the rating done by patients [ ]. It also varies with the experience [ ], and fatigue of the rater [when a bulk of images have to be rated].

It is also possible that if a clinician rates the images in serial order chronologically, one patient at a time, the rating of consequent images might change after rating the first image, either negatively or positively, thereby creating a bias. It is noteworthy that routine cosmetic evaluation done in high volume clinics is random since the clinician is unlikely to compare the current cosmetic status with a previously taken image. Hence, when cosmetic analysis is done retrospectively, and images of the patient taken at different time points are available to influence the decision of the rater, the resultant rating might be different from actual assessments done in clinics [which are random]. At the same time, serial images may also guide the clinician to give a better informed global cosmetic rating to ultimately know the temporal changes [worsening or improvement] in the treated breast.

In an IEC approved study being conducted in our center [CTRI/2020/01/022871], images of breast cancer patients who underwent breast conserving surgery [BCS] and adjuvant Radiotherapy have been collected at baseline and follow up visits, for cosmetic evaluation by a panel. These images were utilized in the current study with the objective of evaluating inter and intra-rater bias for assessment of cosmesis in the serial vs. random assessment settings, and if the experience of the rater impacts the global cosmetic rating.

2 Materials and Method

2.1 Study population and photographs

The study population consisted of 50 women diagnosed with unilateral breast cancer treated with Breast conserving surgery and adjuvant Radiation Therapy at Tata memorial Centre Mumbai, who had at least 3 photographs [baseline and any 2 follow up visits], taken at least 6 months apart. The demographic details are given in table no 1 .

Table no. 1

Demographic data of the study population [n = 50].

Laterality	Age	Type of BCS	Axillary Surgery
Right: 26	<40 years: 15	Open Cavity: 33	Axillary sampling: 11
Left: 24	41–50 years: 22	Type I Oncoplasty: 13	Axillary clearance [I-II]: 8
	51–60 years: 8	Type II Oncoplasty [LD flap]: 2	Axillary clearance [I-III]:31
	61–70 years: 5	Type II Oncoplasty [LICAP flap]: 1
		Type III Oncoplasty [Reduction]: 1

BCS: Breast Conserving Surgery, LD: Latissimus Dosri, LICAP: Lateral Intercostal Artery perforator.

Frontal breast photographs in sitting position, light background, hands on hip, excluding the face were uniformly captured using a 13 megapixel camera on the trial device by the trial coordinator. No professional photographer was hired. The images were incorporated in Google forms for rating by the panel.

2.2 Sample size

Cardoso et al. found a kappa value of 0.59 between the rating given by experienced panellists using the Harris scale, and the BCCT.core software. The sample size was 55 patients [1 image for each patient] [ ]. Similarly, Heil et al. found the inter-rater agreement as 0.41 [mk] when the same panellists evaluated the same images one hour apart, for the images taken immediately after surgery. The sample size was 50 patients [2 images for each patient, just after surgery and 1 year after surgery] [ ].

In a test for agreement between two raters using the Kappa statistic, a sample size of 174 subjects achieves 81% power to detect a true Kappa value of 0.55 based on a significance level of 0.05. Therefore, in the current study, 175 photographs were taken for assessment.

2.3 Panel

The assessment was done by a total of 6 panellists, 2 Senior Breast RO [experience of over 10 years in Breast Oncology], 2 Junior RO [less than 2 years of experience in Breast Oncology], and 2 registered nurses [RN]. Thus three groups of panellists were created with varied levels of experience.

2.4 Evaluation

Subjective Assessment: This was performed using Harris scale [shown in Table no 2 ] by the 6 panellists.

Table no. 2

Harris scale for Cosmetic Rating.

Rating	Score	Interpretation
Excellent	1	Treated breast nearly identical to untreated breast
Good	2	Treated breast slightly different from untreated breast
Fair	3	Treated breast clearly different from untreated breast but not seriously distorted
Poor	4	Treated breast seriously distorted

The assessment was done using Google forms in which the Harris scale was populated and breast photographs were incorporated as per the study identification number.

When the photographs of each patient were arranged serially from baseline to follow-up, it was termed as ‘serial assessment setting’ [SAS] whereas in the ‘random assessment setting’ [RAS], all 175 images were randomly arranged. Both SAS and RAS were done by the six panellists independently using the 4 point Harris scale.

To avoid fatigue, the images were rated in 2 sets of 25 patients each [SAS followed by RAS]. A washout time of one week was given between SAS and RAS.

Objective Assessment: Cosmetic rating was done using BCCT.core software for each image by one of the panellists and the global rating noted. Various other indices were also produced by the software which are beyond the utility of this study and hence not reported.

2.5 Statistical analysis

●

The assessments acquired using subjective and objective methods were entered into IBM SPSS Version 24.
●

Mean Panel score was calculated for all images for SAS and RAS. Fractions were rounded off to the nearest integer.
●

Cohen Kappa [shown in table no 3 ] was used to determine the agreement between the following [A p value of 0.05 was considered statistically significant]:
- ○
  
  Mean SAS and Mean RAS
- ○
  
  Mean SAS with BCCT.core
- ○
  
  Mean RAS with BCCT.core
- ○
  
  Mean SAS and RAS for 3 panellists groups with BCCT.core
- ○
  
  Mean SAS and RAS with BCCT separately for simple BCS (Open cavity) and Oncoplasty
Table no. 3

Only gold members can continue reading. Log In or Register to continue
Share this:
X
Facebook
Related posts:

Assessing the reliability of ChatGPT4 in the appropriateness of radiology referrals Treatment of follicular lymphoma with a focus on radiotherapy Dual energy CT for the diagnosis of gout: Evaluating the optimal Hounsfield unit setting for dual energy processing Determinants of aesthetic outcome after breast conserving surgery: A prospective cohort study Age is a risk factor for long-term effects of chemotherapy on dental development in childhood cancer survivors Percutaneous biopsy of bowel lesions: Technique, efficacy and safety

Stay updated, free articles. Join our Telegram channel

Tags: The Royal College of Radiologists Open Volume 2

May 20, 2025 | Posted by drzezo in GENERAL RADIOLOGY | Comments Off

Radiology Key

Fastest Radiology Insight Engine

Evaluating intra and inter-observer bias in the cosmetic rating for random vs. serial assessment of breast photographs

Abstract

Background

Methods

Results

Conclusions

1

Introduction

2

Materials and Method

2.1

Study population and photographs

2.2

Sample size

2.3

Panel

2.4

Evaluation

2.5

Statistical analysis

Stay updated, free articles. Join our Telegram channel

Full access? Get Clinical Tree

Radiology Key

Fastest Radiology Insight Engine

Evaluating intra and inter-observer bias in the cosmetic rating for random vs. serial assessment of breast photographs

Abstract

Background

Methods

Results

Conclusions

1

Introduction

2

Materials and Method

2.1

Study population and photographs

2.2

Sample size

2.3

Panel

2.4

Evaluation

2.5

Statistical analysis

Share this:

Related posts:

Stay updated, free articles. Join our Telegram channel

Full access? Get Clinical Tree