Research methodology and biostatistics

1.21: Research methodology and biostatistics

Amarnath C, Arun Murugan, Sam Kamelsen, Rajashree R

Statistics is the Grammar of Science.

Karl Pearson

The goal of any clinical research is to improve patient care. Published research studies evaluate the utility of imaging modalities and their applications, and they are important for directing clinical recommendations and developing practice guidelines.

Biostatistics is defined as the application of statistical principles and techniques to biological, medical and public health research. Knowledge of biostatistical concepts helps the practicing radiologist to conduct scientifically rigorous studies, to critically evaluate the literature and to make informed clinical decisions. Exposure to research methodology in general, and to statistics in particular, is limited during radiology training. The clinical workload of radiologists, the scarcity of resources on research methods tailored for radiologists and a lack of interest in learning greatly hinder the development of these skills. Poor knowledge of biostatistics and its inappropriate application can lead to studies with incorrect results and potentially wrong clinical implications. This chapter is an effort to introduce radiologists to research methodology, with a focus on biostatistics. Guidelines for manuscript writing are discussed in Chapter 1.31.

A good study design forms the foundation of research from which clinical implications can be drawn. A poorly performed statistical analysis can be repeated, whereas a carelessly conducted study cannot be redeemed. The steps in planning and executing a study include (Table 1.21.1):

  1. Deciding a research topic: A good research topic should address a single question. It should include details on the population, primary objective and study design.
  2. Performing a review of available literature: An exhaustive literature search should be done to find novel ideas and to compare the designs, outcomes and limitations of previous studies.
  3. Framing a research question: The PICO approach (Population, Intervention, Control, Outcome) can be useful for formulating the research question.
  4. Formulating a research hypothesis: The research question has to be converted into a specific research hypothesis that summarizes the main elements of the study and establishes the basis for statistical tests of significance. An analytical study should state a research hypothesis, which is called the alternative hypothesis. The null hypothesis, which is its negation (usually stating that there is no difference or association), is the hypothesis that is tested statistically.
  5. Planning a research design: This should include the type of study, whether descriptive, analytical or experimental. The target population should be defined along with the sample size, and the inclusion and exclusion criteria should be stated. The study design should comply with ethical standards.
  6. Collection of data: Data collection should be organized and include questionnaires and informed consent forms. If retrospective data are used, consent may or may not be necessary, but adequate care should be taken to maintain patient confidentiality.
  7. Data processing and presentation: Data should be presented in a simplified form, using appropriate tables, graphs and diagrams, for better understanding.
  8. Data analysis: This consists of choosing the appropriate statistical test and performing the analysis.
  9. Data interpretation: Statistical significance should be interpreted in the light of the clinical information.
  10. Discussion: The discussion should present the study outcomes (which may be new conclusions, or may refute or confirm previous conclusions), the limitations of the study and the scope for further studies.

TABLE 1.21.1

Research Design – Outline

Planning Execution

Research topic

Review of literature

Research question

Research hypothesis

Research Design

Data collection

Data processing and presentation

Data analysis

Data interpretation


A detailed checklist from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement is provided in the annexure.

Study types

Research studies can be broadly classified as observational or experimental, based on whether the investigator assigns the exposure/intervention (Fig. 1.21.1).

  • In observational studies, the investigator does not assign an exposure/intervention. Observational studies can be further classified as descriptive or analytical, based on whether the study analyzes relationships between variables.
  • Descriptive studies are limited to a description of the occurrence/prevalence of a disease in a population and are at the bottom of the pyramid of evidence-based medicine. However, a descriptive study is often the first step in an investigation and can lead to the formulation of a hypothesis. Case reports (of rare observations) and case series are descriptive studies.
  • An analytical study (Table 1.21.2) goes further by analyzing relationships between health status and other variables, such as exposure and outcome or cases and controls. Analytical studies are used to draw inferences by testing hypotheses. Case–control studies and cohort studies are purely analytical, whereas cross-sectional studies usually have limited analytical value.
  • Experimental or interventional studies involve an active attempt to change a disease determinant, such as an exposure or the progress of a disease through treatment. Experimental studies help to confirm a hypothesis. Table 1.21.2 presents a comprehensive summary of the study designs.

Fig. 1.21.1 Overview of study types.

TABLE 1.21.2

Summary of Study Designs and Their Characteristics

Cross-sectional study
Example: Prevalence of radiological changes in the cervical spine of patients with rheumatoid arthritis.
Characteristics: Data are collected at a single point in time; there is no follow-up period.
Temporality: One point in time (also called a transverse study).
Applications: Determining the prevalence of a disease in a population and evaluating diagnostic procedures.
Advantages: Short study period; economical.

Case–control study
Example: A study to identify whether head injury is associated with cervical spine fractures. Cases: trauma patients with cervical spine fracture; controls: randomly selected trauma patients without cervical spine fracture. The frequency of head injury (exposure) is compared between the two groups.
Characteristics: A group of patients with the disease or outcome of interest and a suitable control group of people not affected by it are compared regarding the occurrence of possible causative factors.
Temporality: Retrospective.
Applications: Studying rare diseases and diseases with a long latency period between exposure and outcome.
Advantages: Suitable for small sample sizes; the odds ratio can be calculated; may be performed as a pilot study that forms the basis for a more complete study later.

Cohort study
Example: A study comparing long-term graft and patient survival after percutaneous angioplasty (PTA) or stent placement for transplant renal artery stenosis (TRAS) with a control cohort without TRAS.
Characteristics: A group of people free of the disease is selected and then classified into subgroups according to their exposure to a potential causative factor of the outcome/disease.
Temporality: Retrospective or prospective.
Applications: Understanding the causative factors and risk factors of a disease; the same group is followed prospectively or retrospectively.
Advantages: Enables calculation of incidence, relative risk and attributable risk, and estimation of the time between exposure and outcome.

Randomized controlled trial (RCT)
Example: Robotic versus Freehand Needle Positioning in CT-guided Ablation of Liver Tumours: A Randomized Controlled Trial.
Characteristics: Subjects in the study population are randomly allocated to intervention and control groups, and the results are assessed by comparing outcomes.
Temporality: Prospective.
Applications: Evaluating the efficacy of an intervention (procedure or treatment of interest).
Advantages: Strong evidence-based design; minimizes bias, particularly when blinded; provides strong evidence of a causal relationship.

Experimental study
Preclinical imaging trials involve imaging living animals for research purposes, such as disease models and drug development. Imaging modalities help in observing changes at the molecular, cellular, tissue or organ level in animals responding to physiological or pathological stimuli, and they have the advantage of being non-invasive and in vivo. They can be broadly classified into anatomical and functional imaging techniques. Anatomical imaging is performed using ultrasound, magnetic resonance imaging and computed tomography, whereas optical imaging, perfusion and other functional cross-sectional imaging, positron emission tomography (PET) and single photon emission computed tomography (SPECT) are usually used for functional imaging.

Clinical trials are experimental studies used to introduce a new intervention, conventionally a new drug, although they can also be used to evaluate a new stent or a new type of coil for embolization. They are conducted in four phases (Table 1.21.3).

TABLE 1.21.3

Phases of a Clinical Trial

Phase Population Purpose
Phase 1 Healthy volunteers Document the safety of the intervention in humans
Phase 2 Small group of real patients Focus on efficacy while still providing information on safety
Phase 3 Randomized controlled trials involving large group of real patients Efficacy of different modes of administration of the intervention to patients
Phase 4 Post-marketing studies of the intervention Possible long-term adverse events not yet documented

Example: Comparison of uterine artery embolization and high intensity focussed ultrasound for the management of uterine fundal leiomyomas.

Multi-centric study: A multi-centric study is conducted at more than one research or medical centre. Its advantages include a large number of geographically varied subjects and the ability to compare results between these groups. The most common disadvantage of randomized clinical trials is their high cost; hence, multi-centric trials that rely on cooperation between many research centres are becoming more common. For example, MDCT characterization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pneumonia has been performed as a multi-centric trial in order to obtain and analyze information on the pathogenesis of this novel infectious disease across geographic boundaries.

Systematic reviews and meta-analysis

Reviews are vital instruments for researchers and clinicians who want to keep up to date with the evidence accumulated in their fields. They can be structured (systematic reviews, meta-analyses and pooled analyses) or unstructured (narrative reviews/commentaries). Structured reviews enable evaluation of the existing evidence on a topic of interest and conclude by confirming/supporting a practice, refuting a practice or identifying areas in which additional investigations are needed.

A systematic review involves selecting, assessing and synthesizing all available data, whereas a meta-analysis employs statistical methods to combine the evidence/data derived from a systematic review of multiple studies that address the same clinical and scientific question. For example, in "MRI for Detecting Root Avulsions in Traumatic Adult Brachial Plexus Injuries: A Systematic Review and Meta-Analysis of Diagnostic Accuracy", Ryckie G. Wade et al. included 11 studies of 275 adults (mean age, 27 years; 229 men) performed between 1992 and 2016.

Planning a study

The initial steps in planning a study include:

  1. Defining the research question, which should be novel, feasible and relevant.
  2. Conducting an exhaustive literature search to find similar studies and to chart out the methodology of the proposed study.
  3. Estimating the sample size, to assess whether the proposed study is practicable in a single centre or needs to be extended into a multi-centric study because a large sample is required.

Research question

PICO (Population, Intervention, Control group and Outcome) is a useful acronym for formulating a research question for various study designs (Table 1.21.4).

TABLE 1.21.4

PICO Approach for Formulating a Research Question: (Control Group not Necessary for all Studies)

Population Who is the study population?
Intervention What is the intervention/diagnostic test of interest?
Control or comparison What is the intervention/diagnostic test for comparison?
Outcome What is the outcome of interest?

The characteristics of a good research question can be summarized using the acronym FINER. F: Feasible (adequate sample, affordable in time and money); I: Interesting (getting the answer excites the investigator, peers and community); N: Novel (confirms, refutes or extends previous findings); E: Ethical (beneficial without causing harm and in accordance with ethical principles); R: Relevant (advances scientific knowledge and health policy, and enables future research). Table 1.21.5 presents study designs with research questions formulated using the PICO approach.

Reproduced from D. Sackett, W.S. Richardson, W. Rosenberg, R.B. Haynes. How to practice and teach evidence based medicine, second ed., Churchill Livingstone, 1997.

TABLE 1.21.5

Study Designs With Research Questions Formulated Using PICO Approach

“Is ultrasound (index test) as accurate as MRI (reference standard) to determine the position of the placenta in pregnant women with clinically suspected placenta previa?”
Population/cases: Pregnant women with clinically suspected placenta previa
Intervention/exposure: None
Control: None
Outcome: Position of placenta (whether placenta previa)
Design: Cross-sectional

“Is Gadobenate Dimeglumine able to cause nephrogenic systemic fibrosis in patients with chronic kidney disease?”
Population/cases: Patients with chronic kidney disease
Intervention/exposure: Gadobenate Dimeglumine
Control: None
Outcome: Nephrogenic systemic fibrosis
Design: Prospective cohort

“Is fetal MRI performed during the third trimester of gestation able to cause learning disability in this population by the time they reach school age?”
Population/cases: Children of school age who underwent a fetal MRI in their third trimester of gestation
Intervention/exposure: Fetal MRI performed during the third trimester of gestation
Control: None
Outcome: Learning disability at school age
Design: Retrospective cohort

“Does hormone replacement therapy increase the risk of having a mammographic density?”
Population/cases: Patients with mammographic density
Intervention/exposure: Risk factor: hormone replacement therapy
Control: Age/sex matched persons without mammographic density
Outcome: Mammographic density
Design: Case control

“Does the use of radiofrequency ablation for solitary unresectable hepatocellular carcinoma result in better outcome compared to trans-arterial chemoembolization?”
Intervention/exposure: Experimental intervention: radiofrequency ablation
Control: Standard intervention: trans-arterial chemoembolization
Outcome: Primary endpoint: shrinkage of tumour size after 3 months
Design: Randomized trial; patients with unresectable hepatocellular carcinoma are randomized to (i) radiofrequency ablation or (ii) trans-arterial chemoembolization treatment groups

Sampling techniques

A sample is a group of individuals who are the study subjects and should be representative of the entire population. As one cannot practically study the entire population, sampling is done. Sampling should be done in such a manner that the results can be generalized to the entire population.

Samples can be drawn from the entire population through various methods, broadly classified into two groups (Fig. 1.21.2).

  1. Non-random sampling: Selection is not based on a known probability and is at least partly subjective. Subtypes include convenience sampling (e.g. outpatients attending the ultrasound OPD), which is the most commonly used method in radiological studies, and snowball sampling, in which a few individuals are identified initially and they in turn refer more similar individuals.
  2. Random/probability sampling: Every item in the population has a known chance or probability of being chosen.

Fig. 1.21.2 Types of sampling.

In simple random sampling, every individual has an equal chance of being selected for the study. Usually a computer-generated list of random numbers is used to select a simple random sample.

A systematic random sample is one in which every nth item is selected; for example, every fifth patient scanned is included in a study. In stratified random sampling, the population is first divided into subgroups (strata), and a random sample is then drawn from each stratum. In cluster sampling, naturally occurring groups (clusters) rather than individuals are randomly selected; cluster samples are used more often in epidemiologic research than in clinical studies.
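The probability sampling schemes above can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure: the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`) are hypothetical, the population is assumed to be a plain list of patient identifiers, and the stratified version draws a fixed number from each stratum (proportional allocation would be an equally valid choice).

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, k, start=None, seed=None):
    """Take every k-th member, beginning at a random position among the first k."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for key in sorted(strata):  # sorted so results are reproducible for a given seed
        members = strata[key]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical example: 100 consecutive patient IDs, stratified by scanner type
patient_ids = list(range(1, 101))
print(simple_random_sample(patient_ids, 10, seed=1))
print(systematic_sample(patient_ids, k=5, seed=1))
print(stratified_sample(patient_ids, lambda pid: "CT" if pid % 2 else "MRI", 5, seed=1))
```

For example, `systematic_sample(patient_ids, k=5)` corresponds to including every fifth patient scanned, starting from a random position among the first five.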

Both random samples and randomization involve the use of the probability sampling method, but are different concepts (Table 1.21.6).

TABLE 1.21.6

Difference Between Random Sampling and Randomization

Random Sampling Randomization
Random sampling determines which individuals will be included in the sample. Randomization, or random assignment, determines which subjects are assigned to the treatment or control group.
It is related to sampling and external validity. It is related to design and internal validity.
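To make the distinction in Table 1.21.6 concrete, the following minimal sketch randomizes subjects who have already been enrolled (however they were sampled) into two equal-sized arms. The helper `randomize` is hypothetical, and shuffle-and-split is only the simplest scheme; trials commonly use block or stratified randomization instead.

```python
import random


def randomize(subject_ids, seed=None):
    """Hypothetical helper: shuffle enrolled subjects and split them 1:1 into two arms.
    With an odd number of subjects, the control arm receives the extra subject."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"treatment": ids[:half], "control": ids[half:]}


# Hypothetical example: 20 subjects already enrolled in a trial
arms = randomize(range(1, 21), seed=42)
print(arms["treatment"])
print(arms["control"])
```

Every enrolled subject ends up in exactly one arm, and chance alone decides which; this protects internal validity but says nothing about how representative the enrolled subjects are.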

Sample size: It is vital to design studies with an adequate probability of detecting effects when they are truly present and of not detecting effects when they are absent. Statistical power is the probability that a statistical test will indicate a significant difference when there truly is one. For a given effect size, variability and significance criterion, statistical power is determined by the sample size, so the sample size must be large enough for the study to have appropriate statistical power. A number of study design factors influence the calculation of sample size, such as the prevalence of the disease, the minimum expected difference between the two means and the acceptable error. Many formulae and computer applications simplify the task considerably.

Sample size estimation for descriptive studies

In studies designed to estimate a mean, the equation for sample size is:

$$N = \frac{4\sigma^2 z_{\text{crit}}^2}{D^2} \tag{1.21.1}$$

where:

  1. N is the sample size of the single study group,
  2. σ is the assumed SD for the group,
  3. zcrit is the standard normal value corresponding to the desired significance criterion (1.960 for a 95% CI), and
  4. D is the total width of the expected CI.

For example, suppose a radiologist wants to determine the mean fetal biparietal diameter in a group of patients in the second trimester of pregnancy. The required limits of the 95% confidence interval are the mean biparietal diameter ±1 mm. From existing studies, the SD for the measurement is known to be 4 mm. Based on these assumptions, D = 2 mm, σ = 4 mm and zcrit = 1.960 (for a 95% CI):

$$N = \frac{4 \times 4^2 \times 1.96^2}{2^2} = \frac{4 \times 16 \times 3.84}{4} = 61.4$$

Eq. (1.21.1) yields N = 61.4, which is rounded up to the next whole number, because rounding down would leave the study slightly short of the required precision. Therefore, 62 fetuses should be examined in the study.
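Eq. (1.21.1) is simple enough to compute directly. The following sketch (the function name `n_for_mean` is hypothetical) derives zcrit from the confidence level with the standard-library `statistics.NormalDist` instead of a printed table, and returns the unrounded N so that rounding up is an explicit step.

```python
import math
from statistics import NormalDist


def n_for_mean(sd, ci_width, confidence=0.95):
    """Eq. 1.21.1: N = 4 * sd^2 * z_crit^2 / D^2 (unrounded).

    sd: assumed standard deviation; ci_width: total width D of the confidence interval.
    z_crit is the two-sided standard normal value for the confidence level (about 1.96 for 95%).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 4 * sd**2 * z_crit**2 / ci_width**2


# Biparietal diameter example: SD 4 mm, 95% CI of +/-1 mm, so D = 2 mm
n = n_for_mean(sd=4, ci_width=2)
print(f"N = {n:.2f}, enrol {math.ceil(n)}")
```

With σ = 4 mm and D = 2 mm this gives about 61.5 (the small difference from 61.4 comes from not rounding 1.96² to 3.84), which rounds up to 62.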

In studies where the scale of measurement is in terms of a proportion, the equation for sample size is given by Eq. (1.21.2)

$$N = \frac{4 z_{\text{crit}}^2 \, p(1-p)}{D^2} \tag{1.21.2}$$

where p is a pre-study estimate of the proportion to be measured, and N, zcrit and D are defined as they are for Eq. (1.21.1). For example, suppose an investigator would like to determine the accuracy of a diagnostic test with a 95% CI of ±10%, and the results of earlier studies suggest an accuracy of 80%. With these assumptions, D = 0.20, p = 0.80 and zcrit = 1.960:

$$N = \frac{4 \times 1.96^2 \times 0.8 \times (1 - 0.8)}{0.2^2} = \frac{4 \times 3.84 \times 0.8 \times 0.2}{0.04} = 61.4$$

Eq. (1.21.2) yields N = 61.4, which is rounded up to 62. Therefore, 62 patients should be examined in the study.
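A similar sketch for Eq. (1.21.2), taking zcrit directly (1.96 by default, as in the text). The function name is hypothetical. Because p(1 − p) is largest at p = 0.5, using 0.5 when no prior estimate is available gives the most conservative (largest) N.

```python
import math


def n_for_proportion(p, ci_width, z_crit=1.96):
    """Eq. 1.21.2: N = 4 * z_crit^2 * p * (1 - p) / D^2 (unrounded).

    p: pre-study estimate of the proportion; ci_width: total CI width D (as a fraction).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return 4 * z_crit**2 * p * (1 - p) / ci_width**2


# Diagnostic accuracy example: expected accuracy 80%, 95% CI of +/-10%, so D = 0.20
n = n_for_proportion(p=0.80, ci_width=0.20)
print(f"N = {n:.2f}, enrol {math.ceil(n)}")

# With no prior estimate, p = 0.5 maximizes p * (1 - p) and gives the largest N
print(f"Conservative N = {n_for_proportion(p=0.5, ci_width=0.20):.2f}")
```

With p = 0.80 and D = 0.20 the unrounded result is about 61.5, which rounds up to 62.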

Sample size estimation for comparative studies

For a study in which two means are compared, sample size can be determined using Eq. (1.21.3)

$$N = \frac{4\sigma^2 (z_{\text{crit}} + z_{\text{pwr}})^2}{D^2} \tag{1.21.3}$$

where:

  1. N denotes the total sample size (both comparison groups combined),
  2. σ is the assumed standard deviation of each group (assumed to be equal for both groups),
  3. zpwr and zcrit are values from the standard normal distribution corresponding to the specified statistical power and significance criterion, respectively, and
  4. D denotes the minimum expected difference between the two means.

Example: Consider a study planned to compare renal artery stenting versus medical therapy in lowering the mean blood pressure of patients with secondary hypertension due to renal artery stenosis. Based on existing literature, the investigators estimate that renal artery stenting may lower blood pressure by 20 mm Hg, whereas medical therapy may lower it by only 10 mm Hg. The standard deviation (SD) of the blood pressure reduction is also estimated from the literature to be 20 mm Hg.

According to the normal distribution, this SD indicates an expectation that about 95% of the patients in either group will experience a blood pressure reduction within 40 mm Hg (2 SDs) of the mean. A significance criterion of .05 and a power of 0.80 are chosen.

With these assumptions, D = 20 − 10 = 10 mm Hg, σ = 20 mm Hg, zcrit = 1.960 and zpwr = 0.842:

$$N = \frac{4 \times 20^2 \times (1.96 + 0.842)^2}{10^2} = \frac{4 \times 400 \times 2.802^2}{100} = \frac{1600 \times 7.85}{100} = 125.6$$

Eq. (1.21.3) yields a sample size of N ≈ 126.

Therefore, a total of 126 patients should be enrolled in the study: 63 to undergo renal artery stenting and 63 to receive medical therapy.
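For Eq. (1.21.3), the sketch below derives both z values from the chosen significance criterion and power using `statistics.NormalDist` (two-sided α; the power quantile is about 0.8416 for 80% power). The function name is hypothetical; it returns the unrounded total N for both groups combined, and rounding up is done per group.

```python
import math
from statistics import NormalDist


def n_two_means(sd, diff, alpha=0.05, power=0.80):
    """Eq. 1.21.3: total N = 4 * sd^2 * (z_crit + z_pwr)^2 / D^2 (both groups, unrounded).

    z_crit: two-sided critical value for the significance criterion alpha (about 1.960 for 0.05).
    z_pwr: standard normal quantile for the desired power (about 0.842 for 0.80).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    z_pwr = std_normal.inv_cdf(power)
    return 4 * sd**2 * (z_crit + z_pwr)**2 / diff**2


# Renal artery stenting example: expected reductions 20 vs 10 mm Hg, SD 20 mm Hg
n_total = n_two_means(sd=20, diff=20 - 10)
per_group = math.ceil(n_total / 2)  # round up within each group
print(f"N = {n_total:.1f}; {per_group} per group, {2 * per_group} in total")
```

With σ = 20 mm Hg and D = 10 mm Hg this returns about 125.6, i.e. 63 per group and 126 in total.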

For a study in which two proportions are compared with the chi-square test, the sample size is given by Eq. (1.21.4):

$$N = \frac{2\left[z_{\text{crit}}\sqrt{2\bar{P}(1-\bar{P})} + z_{\text{pwr}}\sqrt{P_1(1-P_1) + P_2(1-P_2)}\right]^2}{D^2} \tag{1.21.4}$$

where P1 and P2 denote pre-study estimates of the two proportions to be compared, the minimum expected difference is D = P1 − P2, P̄ = (P1 + P2)/2, and N, zcrit and zpwr are as previously defined.

The calculation assumes equal numbers of subjects in the two groups (N/2 each) and a two-tailed statistical analysis. Note that in this case N depends on the magnitudes of the two proportions in addition to their difference.

Hence, Eq. (1.21.4) requires estimates of P1 and P2, as well as their difference, before the study is performed. However, Eq. (1.21.4) does not require a separate estimate of the standard deviation, because it is calculated within the equation from P1 and P2.

For example, suppose USG has an accuracy of 80% for the diagnosis of focal liver lesions. A study is proposed to evaluate contrast-enhanced USG, which may have greater accuracy. On the basis of their experience, the investigators decide that contrast-enhanced USG would have to be at least 90% accurate to be considered significantly better than standard USG.

A significance criterion of .05 and a power of 0.80 are chosen. With these assumptions, P1 = 0.80, P2 = 0.90, D = 0.10, P̄ = (P1 + P2)/2 = 0.85, zcrit = 1.960 and zpwr = 0.842.

Eq. (1.21.4) yields a sample size of N = 398. Therefore, a total of 398 patients should be enrolled: 199 to undergo USG and 199 to undergo contrast-enhanced USG.
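The two-proportion formula can be sketched the same way. This assumes the standard form of the formula, in which the zcrit term is multiplied by the square root of the pooled variance 2P̄(1 − P̄) and the zpwr term by the square root of P1(1 − P1) + P2(1 − P2); zpwr = 0.842 corresponds to a power of 0.80. The function name is hypothetical.

```python
import math


def n_two_proportions(p1, p2, z_crit=1.96, z_pwr=0.842):
    """Eq. 1.21.4 (two-tailed, equal group sizes), returning the unrounded total N:
    N = 2 * [z_crit * sqrt(2 * pbar * (1 - pbar)) + z_pwr * sqrt(p1(1 - p1) + p2(1 - p2))]^2 / D^2
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    pbar = (p1 + p2) / 2
    d = p1 - p2  # the sign does not matter because D is squared
    term = (z_crit * math.sqrt(2 * pbar * (1 - pbar))
            + z_pwr * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return 2 * term**2 / d**2


# Contrast-enhanced USG example: 80% vs 90% accuracy, alpha 0.05, power 0.80
n_total = n_two_proportions(0.80, 0.90)
print(f"N = {n_total:.1f}, about {round(n_total / 2)} per group")
```

With P1 = 0.80 and P2 = 0.90 it returns about 398 in total, i.e. 199 per group.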

Potential errors in studies

Bias is systematic (non-random) error. Unlike random error, which varies unpredictably and tends to average out, bias consistently shifts results in one direction. For example, an improperly calibrated CT scanner might report attenuation values 10 HU higher than the actual values for all tissues.

Bias reduces the accuracy of study results and is not reduced by increasing the sample size. Sources of bias are multiple, ranging from selection of the sample to performance of the diagnostic test/reference standard, interpretation of test results and analysis of the results. This makes studies difficult to design and interpret. Researchers should plan ways to avoid bias or to minimize its effect during the initial phase of planning.
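The point that bias does not shrink with sample size can be illustrated with a small simulation. All numbers are hypothetical: a true attenuation of 40 HU, a constant +10 HU calibration offset (as in the CT scanner example) and random measurement noise with an SD of 5 HU. As the number of readings grows, the random part averages out, but the mean error converges to the 10 HU offset rather than to zero.

```python
import random
import statistics


def mean_error(n, true_hu=40.0, bias_hu=10.0, noise_sd=5.0, seed=0):
    """Simulate n CT attenuation readings = true value + constant bias + random noise,
    and return the average error of the readings relative to the true value."""
    rng = random.Random(seed)
    readings = [true_hu + bias_hu + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings) - true_hu


# Random error shrinks as n grows; the systematic +10 HU offset does not
for n in (10, 100, 10_000):
    print(n, round(mean_error(n), 2), round(mean_error(n, bias_hu=0.0), 2))
```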

Common biases include, but are not limited to, the following:

  1. Selection bias: The intended sample does not adequately reflect the target population. It can also occur in the control group when the controls are not matched appropriately.
  2. Loss to follow-up/attrition bias: In cohort studies, subjects who are lost to follow-up differ from those who remain.
  3. Measurement bias: Groups of patients are assessed by different measurement methods. Tests performed without appropriate blinding result in review bias, which is a subtype of measurement bias.
  4. Pygmalion effect (bias): Higher expectations lead to an increase in performance.
  5. Screening bias: Patients who volunteer for screening studies generally tend to be healthier and have better outcomes than non-volunteers.
  6. Overdiagnosis bias: A screening test may detect a disease that would have remained asymptomatic until the person’s death from other causes.
  7. Lead-time bias: In screening, lead time is the interval between the diagnosis of a disease at screening and the time at which it would have been detected through the onset of clinical symptoms. Early diagnosis always appears to improve survival by at least this interval, even when treatment is ineffective.
  8. Length-time bias: Screening tests preferentially detect diseases with long preclinical phases, which usually have favourable prognoses.

A confounder is a factor that is associated with both the exposure and the outcome. Confounders change or distort the real association between the exposure and outcome variables. For example, consider a study done to assess whether alcohol use (exposure) is related to an increased incidence of lung cancer. The result may be confounded by smoking, which is associated with both the exposure and the outcome. Confounding can be reduced by measures such as matching and stratified analysis.

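An effect modifier is a factor that changes the magnitude of the association between the exposure and the outcome, so that the association differs across levels of the modifier. Unlike a confounder, an effect modifier need not be associated with the exposure. For example, in a study to assess the association of thorotrast (exposure) with hepatic malignancy (outcome), hepatitis B viral infection may be considered an effect modifier, as it is associated with the outcome but not with the exposure.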

Basic terms and definitions used in biostatistics

A rate is a ratio in which the denominator includes a unit of time, e.g. mortality rate. Rates and proportions are both ratios, though they are not synonyms.

A ratio is the value obtained by dividing one quantity by another, e.g. the ratio of male to female patients visiting the USG OPD, or the signal-to-noise ratio.

A proportion is a ratio in which the numerator is included in the population defined by the denominator. It may be used for measured data as well as for counts/frequencies. As the numerator is part of the denominator, a proportion has no dimension, but it can be expressed as a percentage. For example, the proportion of male patients is obtained by dividing the number of male patients by the total number of patients; other examples are the proportion of patients with a positive diagnosis and the proportion of lung area affected on CT.

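Incidence is the number of new cases in a population at risk per unit of time (usually per year). It is a rate.

Prevalence is the proportion of a population that has the disease at a given point in time (the number of existing cases divided by the population). It is a proportion.

Prevalence of a disease ≈ Incidence rate × average duration of the disease (an approximation that holds when the disease is relatively rare and the incidence and duration are stable).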
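As a worked illustration of this relationship, take the hypothetical figures of 2 new cases per 1000 person-years and an average disease duration of 5 years: prevalence ≈ 0.002 × 5 = 0.01, i.e. about 1%. The helper name below is hypothetical, and the steady-state, rare-disease assumption is stated in its docstring.

```python
def approx_prevalence(incidence_rate, mean_duration):
    """Prevalence ~= incidence rate x average duration of the disease.
    Assumes a stable (steady-state) population and a relatively rare disease."""
    return incidence_rate * mean_duration


# Hypothetical figures: 2 new cases per 1000 person-years, average duration 5 years
prevalence = approx_prevalence(2 / 1000, 5)
print(f"Approximate prevalence: {prevalence:.1%}")
```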

Odds is the ratio of the probability of occurrence of an event to that of its non-occurrence, or the ratio of the probability that something is one way to the probability that it is another way: Odds = probability/(1 − probability). Consider a hypothetical situation in which 60 out of 100 interventional radiologists develop a cataract and the remaining 40 do not. The odds in favour of developing a cataract are 60 to 40, or 1.5; this is different from the risk of developing a cataract, which is 60 over 100, or 0.6.

Odds ratio is the ratio between two odds.

Odds ratio = ( a/c ) / ( b/d ) = ad / bc


The exposure odds ratio for data obtained from cross-sectional or case–control studies is the ratio of the odds in favour of exposure among the cases (a/c) to the odds in favour of exposure among the controls (b/d). This simplifies to ad/bc (Table 1.21.7). The odds ratio obtained in a case–control study closely approximates the risk ratio when the outcome in the population being studied is rare.

TABLE 1.21.7

Two-By-Two Table for a Case–Control Study

Group Cases Controls
Exposed a b
Unexposed c d
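The odds and the exposure odds ratio for Table 1.21.7 can be computed as below. The cell labels follow the table (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls); the function names and the example counts are hypothetical.

```python
def odds(probability):
    """Odds = p / (1 - p)."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)


def odds_ratio(a, b, c, d):
    """Exposure odds ratio for a case-control 2x2 table laid out as in Table 1.21.7:
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
    OR = (a / c) / (b / d) = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Cataract example from the text: 60 of 100 radiologists affected
print(odds(60 / 100))
# Hypothetical case-control counts
print(odds_ratio(a=30, b=10, c=20, d=40))
```

For the cataract example, `odds(0.6)` is 1.5 up to floating-point rounding. With the hypothetical counts, the odds ratio is (30 × 40)/(10 × 20) = 6.0.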

The relative risk (also called the risk ratio) is the ratio of the risk of occurrence of a disease among exposed people to that among the unexposed. Table 1.21.8 demonstrates tabulation of data in a cohort study to obtain the relative risk.

Risk ratio = [ a / ( a + b ) ] / [ c / ( c + d ) ]


TABLE 1.21.8

Two-By-Two Table for a Cohort Study

Group Outcome present Outcome absent
Exposed a b
Unexposed c d

The risk ratio (RR) provides an estimate of the strength of association between the exposure and the outcome. RR > 1 indicates a positive association between the exposure and the outcome; RR < 1 indicates that the exposure is associated with a decreased risk of the outcome; RR = 1 indicates that the exposure is not associated with the risk of the outcome.

Attributable risk is the rate of an outcome in exposed individuals that can be attributed to the exposure, i.e. the risk in the exposed minus the risk in the unexposed, [a/(a + b)] − [c/(c + d)]. It is more useful from a public health perspective.
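A corresponding sketch for the cohort table (Table 1.21.8), with a and b the exposed subjects with and without the outcome, and c and d the unexposed subjects with and without it. Attributable risk is computed here as the risk difference (risk in the exposed minus risk in the unexposed). Function names and counts are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """Cohort 2x2 table as in Table 1.21.8: a, b = exposed with / without the outcome;
    c, d = unexposed with / without the outcome. RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))


def attributable_risk(a, b, c, d):
    """Risk difference: risk in the exposed minus risk in the unexposed."""
    return a / (a + b) - c / (c + d)


# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the outcome
print(round(risk_ratio(30, 70, 10, 90), 3), round(attributable_risk(30, 70, 10, 90), 3))
```

Here the risks are 0.3 and 0.1, so RR is 3 (a positive association) and the attributable risk is 0.2, i.e. 20 additional cases per 100 exposed individuals.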

Association and causation: An association is a measurable change in one variable that occurs together with changes in another variable. Statistical methods can establish evidence of a relationship between exposure and outcome, but this evidence must be evaluated with clinical judgement and sound reasoning about the study before causation can be established. Causation requires temporal precedence (the exposure precedes the outcome), non-spuriousness and plausibility, in addition to a strong association between the variables in question.

Data and descriptive statistics

A variable is a measure whose value can vary, e.g. age, gender, weight, liver size, vessel diameter, etc. The values of these variables can change from one individual to another. A dependent/outcome variable depends on variations in another variable. An independent/predictor variable can be manipulated to affect variations or responses in another variable. Covariates are independent variables that act along with the predictors to influence outcomes; because they may be confounders, they should be taken into account during the planning phase of the study.

Data are simply a collection of observations of variables. Data and descriptive statistics form the building blocks for inferential statistics and are essential for choosing appropriate advanced analytical methods. Descriptive statistics provide cursory insight into almost any type of data and help identify patterns that lead to the generation of hypotheses.

Unorganized data are known as raw data. The data collected in studies can be voluminous and difficult to comprehend. To be meaningful and amenable to further analysis, data have to be summarized using tools such as tables, graphs and measures of central tendency and dispersion, as appropriate. This process of summarizing data is known as descriptive statistics. The data collected from studies can be analyzed further as depicted in Fig. 1.21.3.

Fig. 1.21.3 Data presentation and analysis.

Data collected from studies can be classified into two types: quantitative data and qualitative data. Qualitative (categorical) data can be further classified as nominal, binary or ordinal. Quantitative data can be classified as discrete or continuous (Table 1.21.9).

TABLE 1.21.9

Types of Data

Qualitative, nominal
Examples: More than two categories without a meaningful order, such as imaging modality (US, CT, MR), ethnicity, months in a year, days of the week, etc.
Best measure of central tendency: Mode

Qualitative, dichotomous/binary
Examples: Only two categories, such as a disease being present/absent
Best measure of central tendency: Mode

Qualitative, ordinal
Examples: More than two categories with a logical order, e.g. TNM tumour staging, USG echogenicity (anechoic, hypoechoic, hyperechoic), BI-RADS® score system, grading of severity as mild, moderate and severe, Arnold–Hilgartner radiographic scale for assessment of the severity of haemophilic arthropathy
Best measure of central tendency: Median
Best measure of dispersion: Inter-quartile range

Quantitative, discrete
Examples: Can take only a limited number of values and usually involves counts, e.g. number of CT scans performed in 1 year, number of liver lesions
Best measure of central tendency: Mean (normally distributed data) or median (skewed data)
Best measure of dispersion: Standard deviation or inter-quartile range, respectively

Quantitative, continuous
Examples: Can take an unlimited number of values, e.g. lesion size, organ volume, artery diameter, signal-to-noise ratio (SNR), CT dose, time duration
Best measure of central tendency: Mean (normally distributed data) or median (skewed data)
Best measure of dispersion: Standard deviation or inter-quartile range, respectively

Interval and ratio scales are an alternative classification of quantitative data.

Interval scale: On an interval scale, the values have an order and the differences between them are meaningful. However, there is no true zero; the zero point is arbitrary. Ratio scale: It is similar to the interval scale but, in addition, has a true zero, so ratios are meaningful (e.g. a 20 mm lesion is twice the size of a 10 mm lesion). For example, temperature in degrees Celsius is interval data rather than ratio data, because 20 °C is not twice as hot as 10 °C.
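The temperature example can be made concrete with a little arithmetic. The sketch below compares the "ratio" of two temperatures on two interval scales (Celsius, Fahrenheit) and on a ratio scale with a true zero (Kelvin). The helper function names are illustrative only.

```python
# Ratios of two temperatures on interval scales (Celsius, Fahrenheit)
# versus a ratio scale with a true zero (Kelvin).

def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

warm, cool = 20.0, 10.0  # degrees Celsius

ratio_c = warm / cool                                                # 2.0
ratio_f = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)  # 68 / 50
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)          # 293.15 / 283.15

# The same pair of temperatures gives different "ratios" depending on where the
# arbitrary zero lies, so ratios are not meaningful on an interval scale.
# Differences stay consistent: a 10 degree Celsius change is always 18 degrees Fahrenheit.
print(f"Celsius ratio:    {ratio_c:.3f}")
print(f"Fahrenheit ratio: {ratio_f:.3f}")
print(f"Kelvin ratio:     {ratio_k:.3f}")
```

Only the Kelvin ratio describes how much "hotter" one temperature is than the other, because only that scale has a true zero.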

If the data are normally distributed, they are described by the mean and SD. The mean and standard deviation are meaningless for qualitative (categorical) data, because such data do not have meaningful numeric values. Qualitative data are usually summarized as frequencies and proportions (percentages). The measure of central tendency used for categorical data is the mode.
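A minimal sketch of summarizing categorical data as frequencies, percentages and the mode, using only Python's standard library. The list of modalities is made-up illustrative data, not study results.

```python
from collections import Counter

# Hypothetical imaging modality recorded for 10 examinations (nominal data)
modalities = ["CT", "MR", "CT", "US", "CT", "MR", "US", "CT", "MR", "CT"]

counts = Counter(modalities)   # frequency of each category
n = len(modalities)

# Frequency and percentage for each category, most frequent first
for modality, freq in counts.most_common():
    print(f"{modality}: {freq} ({100 * freq / n:.0f}%)")

# Mode: the most frequent category, the only meaningful "average" for nominal data
mode_category, mode_freq = counts.most_common(1)[0]
print("Mode:", mode_category)
```

If two categories are equally frequent, the data are multimodal and `most_common(1)` reports only one of them; `statistics.multimode` returns all of them.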

Numerical data have specific numbered values and may be classified as continuous or discrete variables. Continuous data can take any value within a range, e.g. weight can be 45, 45.1, 45.2, etc., whereas discrete variables take only separate values, usually whole numbers. The same data can be measured using different scales. For example, the NASCET criteria (North American Symptomatic Carotid Endarterectomy Trial, 1991), commonly used to classify carotid artery stenosis, have the following categories: less than 30% denotes mild stenosis, 30%–69% moderate stenosis and 70% or greater severe stenosis. Here, percentage stenosis, a continuous variable, is converted into three categories (i.e. ordinal data), which is a different scale of measurement.
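Converting a continuous measurement into ordinal categories, as in the NASCET example, amounts to applying fixed cut-offs. The sketch below uses the three categories described above, with values of 70% and above treated as severe. The function name `nascet_category` is hypothetical. The sketch assumes the percentage stenosis has already been measured and does not cover how that measurement is made.

```python
def nascet_category(stenosis_percent: float) -> str:
    """Map percent stenosis (continuous) to an ordinal category.

    Cut-offs follow the three categories described in the text;
    70% and above is treated as 'severe' (an assumption at the boundary).
    """
    if not 0 <= stenosis_percent <= 100:
        raise ValueError("stenosis must be between 0 and 100%")
    if stenosis_percent < 30:
        return "mild"
    if stenosis_percent < 70:
        return "moderate"
    return "severe"

# Continuous values become ordinal data: the order is kept, the exact values are lost.
for value in (15.0, 45.5, 69.9, 70.0, 85.0):
    print(value, "->", nascet_category(value))
```

Note the loss of information: 31% and 69% stenosis both become "moderate" once the data are categorized.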

Descriptive statistics – displaying statistical data

Visual representation of data can be done using graphs/charts (Table 1.21.10), which can be classified into the following four categories based on their specific purpose:

  1. Comparison – bar charts, line charts and their variants.
  2. Relationship – scatter chart, bubble chart.
  3. Composition – pie chart, stacked column chart, stacked area chart.
  4. Distribution – histogram, line histogram, scatter chart, 3D area chart.

TABLE 1.21.10

Displaying Data

Bar chart

Used to compare categories of data; the gaps between the bars indicate that the data are categorical or discrete. Error bars are measures of uncertainty and may represent the standard deviation, confidence interval or standard error. They should not be used on their own to draw conclusions about statistical significance.

Fig. 1.21.4
Clustered/stacked bar chart

Used to represent data split by more than one category, e.g. male and female patients with varying grades of disease.

Fig. 1.21.5
Histogram (e.g. frequency table of aortic diameters in adults 20–40 years, Table 1.21.11)

Similar to a bar chart, but there are no gaps between the bars because the data are continuous.

It helps in assessing the distribution of the data, e.g. whether it is approximately normal.

Fig. 1.21.6
Line diagram

Helps in visualization of the trends in data over intervals of time, e.g. number of monthly MRI examinations in a year.

Fig. 1.21.7
Box and Whisker plot

Box-and-whisker plots consist of a box with a horizontal dividing line representing the median. The bottom of the box represents the 25th percentile (first quartile) and the top the 75th percentile (third quartile), so the box spans the interquartile range. The whiskers show the range of values, and outliers are also represented.

Fig. 1.21.8
Scatter diagram

Each point represents a pair of values, showing the relationship between two continuous variables.

Fig. 1.21.9
Pie chart

Used to represent the proportions of categorical data. The share of each category is proportional to the angle made at the centre of the circle and thus to the area of its sector.

Fig. 1.21.10
Forest Plot

A forest plot is used to display the results of the individual studies included in a systematic review or meta-analysis.

The individual studies are listed on the vertical axis, and the outcome measure (e.g. the odds ratio or risk ratio) is shown on the horizontal axis. Each study's result is displayed as a box whose position represents the point estimate, with a horizontal line through it whose length represents the 95% confidence interval for the outcome measure being used. The area of each box is proportional to the study's weight, which reflects its sample size/precision. The diamond at the bottom represents the combined (pooled) result. If it does not cross the vertical line of no effect, the combined result is statistically significant.

Fig. 1.21.11

TABLE 1.21.11

Frequency Table of Aortic Diameters in Adults 20–40 Years, Showing a Normal Distribution

Aortic Diameter | Number of People (Frequency)
26.0–26.9 | 1
27.0–27.9 | 3
28.0–28.9 | 4
29.0–29.9 | 13
30.0–30.9 | 17
31.0–31.9 | 7
32.0–32.9 | 3
33.0–33.9 | 2
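Table 1.21.11 gives only grouped counts, so summary values can be approximated from the class midpoints. The sketch below reproduces the counts from the table, draws a text histogram and estimates a grouped mean. It assumes each class midpoint is the lower limit plus 0.5, which depends on how the diameters were rounded. The result is therefore an approximation, not the true sample mean.

```python
# Counts from Table 1.21.11: (lower limit of class, number of people)
classes = [(26.0, 1), (27.0, 3), (28.0, 4), (29.0, 13),
           (30.0, 17), (31.0, 7), (32.0, 3), (33.0, 2)]

n = sum(freq for _, freq in classes)

# Text histogram: one '#' per person; the bars are adjacent because the data are continuous
for lower, freq in classes:
    print(f"{lower:.1f}-{lower + 0.9:.1f} | {'#' * freq}")

# Grouped mean: weight each class midpoint (assumed lower limit + 0.5) by its count
grouped_mean = sum((lower + 0.5) * freq for lower, freq in classes) / n

# Modal class: the class with the highest frequency
modal_class = max(classes, key=lambda c: c[1])[0]

print(f"n = {n}, grouped mean ≈ {grouped_mean:.1f}, modal class starts at {modal_class}")
```

The text histogram peaks in the middle and tails off on both sides, which is the roughly symmetrical shape the table's title refers to.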

Other descriptive methods include the frequency distribution, which is a table that shows a body of data grouped according to numeric values (e.g. Table 1.21.11), and the frequency polygon, which is a graphic method of presenting a frequency distribution.

Range: It is the difference between the largest and smallest values. It is a measure of dispersion.

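Percentile: Percentiles are the values that divide the ordered data into 100 equal parts. Quartiles divide the data into four parts, each containing 25% of the data. The first quartile (Q1) is the 25th percentile, the second (Q2) is the median and the third (Q3) is the 75th percentile. The interquartile range (IQR) is a measure of dispersion equal to the difference between the 75th and 25th percentiles: IQR = Q3 − Q1.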
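The sketch below computes the range, quartiles and IQR for the CT attenuation values used in Table 1.21.12 (4, 5, 6, 7, 8, 10, 13, 13, 15, 17, 23), using Python's `statistics.quantiles`. Several conventions exist for computing quartiles and statistical packages differ, so results may vary slightly; this uses the library's default ('exclusive') method. The 1.5 × IQR fences at the end follow the common Tukey convention for flagging outliers on a box plot. That convention is an assumption and is not stated in the text.

```python
import statistics

# CT attenuation values (the example data set used in Table 1.21.12)
values = [4, 5, 6, 7, 8, 10, 13, 13, 15, 17, 23]

value_range = max(values) - min(values)       # largest minus smallest value

# Quartiles Q1, Q2 (median), Q3 using the default 'exclusive' method
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                                 # IQR = Q3 - Q1

# Tukey fences (a common box-plot convention): points beyond them are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"Range = {value_range}")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}; outliers: {outliers}")
```

The range depends only on the two extreme values, whereas the IQR describes the spread of the middle half of the data and is less affected by outliers.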


A distribution is a representation of the frequencies of the values of a measurement obtained from specified groups. Out of all the possible values that the measurement can have, the distribution tells what proportion of the group was found to have each value (or each range of values).

Normal distribution

Distributions have certain properties as depicted in Table 1.21.12.

TABLE 1.21.12

Properties of a Distribution

Measures of Central Tendency

Mean = sum of values/total number of values. It is the measure most affected by outliers (extreme values). The population mean is represented by $\mu$ and the sample mean by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. E.g. consider a set of CT attenuation values: 4, 5, 6, 7, 8, 10, 13, 13, 15, 17, 23.

Mean = 121/11 = 11

Median = middle value of a list sorted from least to greatest value.

Mode = most repeated value.

Measures of Dispersion (the degree to which the observations are spread or dispersed)

Standard deviation (SD, σ) denotes the variability of a set of values around the mean of the set:

$$ SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where n = the number of data points, $\bar{x}$ = the mean of the $x_i$ and $x_i$ = each of the values of the data.

Variance = square of the standard deviation.

Standard error (SE): an estimate of the variability that exists in a set of sample means around the true population mean. Standard error of the sample mean: $SE(\text{mean}) = \frac{SD}{\sqrt{n}}$

Shape of the Distribution

Skewness: An asymmetric distribution is also called a skewed distribution. The tail of the distribution determines the direction of skewness: a tail to the right indicates positive skew and a tail to the left indicates negative skew (Fig. 1.21.12).

Kurtosis: the extent to which a unimodal distribution is peaked.

Fig. 1.21.12 Skewness of a distribution.
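The measures in Table 1.21.12 can be checked on the same CT attenuation values with Python's standard `statistics` module. `statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator; `pvariance` and `pstdev` would give the population versions. The standard error is computed as SD/√n.

```python
import math
import statistics

# CT attenuation values from the example in Table 1.21.12
values = [4, 5, 6, 7, 8, 10, 13, 13, 15, 17, 23]
n = len(values)

mean = statistics.mean(values)           # 121 / 11 = 11
median = statistics.median(values)       # middle (6th) sorted value
mode = statistics.mode(values)           # most repeated value
variance = statistics.variance(values)   # sample variance, divides by n - 1
sd = statistics.stdev(values)            # square root of the variance
se = sd / math.sqrt(n)                   # standard error of the mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, SD={sd:.2f}, SE={se:.2f}")

# Mean > median here: the single high value (23) pulls the mean to the right,
# suggesting a positively skewed (right-tailed) set of values.
```

Because the mean is pulled towards the extreme value while the median is not, comparing the two gives a quick, informal check of skewness.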

The normal distribution (Fig. 1.21.13), also called the Gaussian distribution, describes many clinical measurements. It is symmetrical and bell-shaped, with the mean, median and mode equal to each other. About two-thirds of the values under a normal distribution curve fall within one standard deviation (SD) of the mean, and approximately 95% fall within two standard deviations of the mean. The tails of a normal distribution approach the horizontal axis but never touch it.

Fig. 1.21.13 A normal distribution, with standard deviations represented on the horizontal axis and probability density on the vertical axis.

Two parameters specify an individual normal distribution: the mean, μ, and the standard deviation, σ. The standard normal distribution (whose test statistic is z) has μ = 0 and σ = 1. Any value x from a normal distribution can be converted to a z-score as z = (x − μ)/σ.

According to the empirical rule, the proportion of values lying within a given number of standard deviations of the mean is the same for all normal distributions: roughly 68%, 95% and 99.7% within 1, 2 and 3 standard deviations, respectively.
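The empirical rule can be checked numerically with `statistics.NormalDist` (Python 3.8 and later). The same class can be used to convert a raw value into a z-score. The mean and SD in the second half of the sketch are hypothetical and chosen only to illustrate the conversion.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # standard normal distribution (z)

# Proportion of values within k SDs of the mean: P(-k < Z < k)
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within ±{k} SD: {p:.1%}")   # about 68.3%, 95.4%, 99.7%

# z-score of an individual value, given a hypothetical mean and SD
mu, sigma = 30.0, 1.5                   # hypothetical parameters
x = 33.0
z = (x - mu) / sigma                    # on Python 3.9+: NormalDist(mu, sigma).zscore(x)
print(f"z = {z}")                       # two SDs above the mean
```

Because these proportions depend only on z, the same 68/95/99.7 figures apply whatever the mean and SD of the measurement.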

The normal distribution is the most commonly encountered distribution in statistics. It arises in natural and physiological processes that are influenced by many small, unrelated random effects. Examples include the height or weight of individuals of the same gender, test scores and blood pressure. Other distributions include the binomial, t, chi-square, Poisson, exponential, lognormal and uniform distributions.

Properties of Diagnostic Tests: Diagnostic studies examine the ability of diagnostic tests to discriminate correctly between patients with and without particular medical conditions. Properties of a diagnostic test can be broadly classified under two categories:

  1. Validity/Accuracy
  2. Reliability/Precision.

The ideal diagnostic test should be accurate as well as precise (Fig. 1.21.14A).

Fig. 1.21.14 The concepts of accuracy and precision: (A) accurate and precise; (B) precise but inaccurate; (C) accurate but imprecise; (D) inaccurate and imprecise.

Reliability (precision) and validity (accuracy) can be illustrated by four scenarios using a target. The centre of the target denotes the true value and the stars represent the measurements being tested. Scenario (A) shows measurements that are accurate and precise, (B) shows measurements that are precise but inaccurate, (C) shows measurements that are accurate but imprecise and (D) shows measurements that are both inaccurate and imprecise.


Validity, also called accuracy, is the ability of a test to correctly identify patients with the disease and those without the disease (Table 1.21.16).

Face validity evaluates whether the test appears to measure the concept of interest; for example, an MR study of the abdomen will not help in diagnosing the cause of a seizure. Predictive validity refers to the ability of a test to correctly predict (or correlate with) an outcome (e.g. an infiltrative lesion on imaging predicting malignancy on subsequent histopathology). Content validity is the extent to which the indicator reflects the full area of interest (e.g. resolution of an extradural haemorrhage might be indicated by haemorrhage height, width or both). Construct validity is the extent to which one measure correlates with other measures of the same concept (e.g. correlation of a positive MR study for demyelination with clinical findings).

Disease status is usually determined by a reference standard, which is expected to be the best existing measure of true disease status. Tissue diagnosis or an invasive investigation such as angiography usually serves as the reference standard. Diagnostic accuracy is determined by comparing the test results with the appropriate reference standard.

Sensitivity and specificity are intrinsic measures of diagnostic accuracy, i.e. they are not affected by the prevalence of the disease. However, they are influenced by disease factors, such as the grade of severity and anatomic features (e.g. smaller lesions are less likely to be detected by imaging), and by patient factors (e.g. the efficacy of transabdominal USG in evaluating uterine and adnexal lesions depends on the degree of bladder filling). For this reason, the sensitivity and specificity of individual diagnostic tests should not be considered absolute. They should be interpreted in light of both disease and patient factors.

Example: The sensitivity (95% confidence interval in brackets) of ultrasonography (USG) for the diagnosis of appendicitis in children is 85% (82%, 90%). That is, USG correctly classifies as positive about 85% of children who truly have appendicitis. Similarly, the specificity is 94% (92%, 96%); that is, USG correctly classifies as negative about 94% of children who do not have appendicitis.
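A confidence interval such as the one quoted for sensitivity is computed from the underlying counts of true positives and false negatives. The sketch below uses the Wilson score interval, which is one of several accepted methods for a proportion; the exact Clopper-Pearson interval is another. The function name is illustrative, and the counts are hypothetical, not the data behind the appendicitis example.

```python
import math
from statistics import NormalDist

def wilson_interval(successes: int, total: int, confidence: float = 0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical: 170 of 200 children with appendicitis have a positive USG,
# i.e. TP = 170 and TP + FN = 200, so sensitivity = 85%.
low, high = wilson_interval(170, 200)
print(f"sensitivity = {170 / 200:.0%}, 95% CI {low:.1%} to {high:.1%}")
```

With fewer patients the same 85% estimate would have a wider interval, which is why the interval should be reported alongside the point estimate.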

Sensitivity and specificity can be estimated for tests with ordinal or continuous results only after a decision threshold (termed a cut-off point) has been defined to separate positive from negative results. For example, a mammogram may be interpreted as normal, benign, probably benign, suspicious or malignant, and a positive test result may be defined anywhere along this spectrum other than normal. The cut-off depends on the research context. A screening test will have its cut-off set for higher sensitivity (a lower threshold, so that more results are called positive). A confirmatory/diagnostic test will have its cut-off set for higher specificity (a higher, stricter threshold).
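The effect of moving the cut-off can be sketched with an ordinal scale such as the mammography categories above. The readings and disease labels below are invented for illustration. For each threshold, a result is called positive when its category is at or above the threshold. Sensitivity is the fraction of diseased cases called positive, and specificity is the fraction of disease-free cases called negative.

```python
# Ordinal reading categories, in increasing order of suspicion
CATEGORIES = ["normal", "benign", "probably benign", "suspicious", "malignant"]
RANK = {name: i for i, name in enumerate(CATEGORIES)}

# Hypothetical readings: (category, disease present according to the reference standard)
readings = [
    ("normal", False), ("benign", False), ("benign", False),
    ("probably benign", False), ("probably benign", True), ("suspicious", False),
    ("suspicious", True), ("malignant", True), ("malignant", True), ("benign", True),
]

def sens_spec(threshold: str):
    """Sensitivity and specificity when 'positive' means category >= threshold."""
    cut = RANK[threshold]
    tp = sum(1 for cat, diseased in readings if diseased and RANK[cat] >= cut)
    fn = sum(1 for cat, diseased in readings if diseased and RANK[cat] < cut)
    tn = sum(1 for cat, diseased in readings if not diseased and RANK[cat] < cut)
    fp = sum(1 for cat, diseased in readings if not diseased and RANK[cat] >= cut)
    return tp / (tp + fn), tn / (tn + fp)

# Lowering the cut-off raises sensitivity at the cost of specificity, and vice versa.
for threshold in CATEGORIES[1:]:
    sens, spec = sens_spec(threshold)
    print(f"positive if >= {threshold!r}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

No single threshold maximizes both measures at once; the choice reflects whether missed disease or false alarms are more costly in the clinical setting.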

The results of a diagnostic test with binary (positive or negative) results are represented in a 2 × 2 contingency table, from which sensitivity and specificity are calculated. The columns show the test results and the rows show the disease status, as assessed using a reference standard. The table has four cells:

  1. True-positive (TP) is the number of individuals who actually have the disease and have a positive test result.
  2. False-positive (FP) is the number of individuals who have a positive test result but are truly free of the disease.
  3. False-negative (FN) is the number of individuals who have the disease but have a negative test result.
  4. True-negative (TN) is the number of individuals who do not have the disease and have a negative test result.
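From these four cells, sensitivity is TP/(TP + FN) and specificity is TN/(TN + FP). The sketch below shows the calculation with hypothetical counts that are not taken from any study. The class name `TwoByTwo` is illustrative.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts from a 2 x 2 table of test result versus reference-standard disease status."""
    tp: int  # diseased, test positive
    fp: int  # disease-free, test positive
    fn: int  # diseased, test negative
    tn: int  # disease-free, test negative

    @property
    def sensitivity(self) -> float:
        # Proportion of diseased individuals who test positive
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of disease-free individuals who test negative
        return self.tn / (self.tn + self.fp)

# Hypothetical study: 100 patients with and 200 without the disease
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
print(f"sensitivity = {table.sensitivity:.0%}, specificity = {table.specificity:.0%}")
```

Each measure uses only the diseased group (sensitivity) or only the disease-free group (specificity). For this reason, changing the ratio of diseased to disease-free patients, and hence the prevalence, does not by itself change either value.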

Mar 25, 2024 | Posted in CARDIOVASCULAR IMAGING