
Biostatistics and Evidence-Based Medicine: A Comprehensive Study Guide

  • Writer: Dr Lavanya Narayanan
  • Aug 3
  • 19 min read
Section 1: Fundamentals of Biostatistics: Data and Descriptive Measures

1.1 Defining and Differentiating Measures of Central Tendency and Dispersion


Measures of Central Tendency: These are single values that represent the center of a dataset.

  • Mean: The arithmetic average. Calculated by summing all values and dividing by the count. Sensitive to outliers.

  • Median: The middle value in an ordered dataset. Robust to outliers and skewed data.

  • Mode: The most frequently occurring value. Useful for categorical data.


Measures of Dispersion: These quantify the spread or variability of data points.

  • Range: Difference between maximum and minimum values. Limited utility as it only considers extremes.

  • Interquartile Range (IQR): The range of the middle 50% of data (Q3 - Q1). Robust to outliers.

  • Variance: Average squared deviation from the mean. Units are squared, making direct interpretation difficult.

  • Standard Deviation (SD): Square root of the variance. Returns spread to original units. Most common measure for normally distributed data.
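The measures above can be sketched with Python's standard-library `statistics` module; the glucose values below are hypothetical.

```python
import statistics

values = [88, 92, 95, 95, 100, 104, 110]  # hypothetical fasting glucose, mg/dL

mean = statistics.mean(values)            # arithmetic average
median = statistics.median(values)        # middle value of the ordered data
mode = statistics.mode(values)            # most frequent value
data_range = max(values) - min(values)    # maximum minus minimum
variance = statistics.variance(values)    # average squared deviation (sample)
sd = statistics.stdev(values)             # square root of the variance

print(mean, median, mode, data_range, round(sd, 1))
```

Note that the SD (about 7.5 mg/dL here) is in the original units, while the variance (about 56) is in squared units, which is why SD is the usual reporting choice.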


1.2 A Taxonomy of Data: Understanding Categorical and Numerical Variables


Categorical (Qualitative) Data: Describes qualities or attributes.

  • Nominal Data: Categories with no intrinsic order (e.g., blood type, gender). Dichotomous variables are a special type with two categories.

  • Ordinal Data: Categories with a meaningful order, but unequal intervals between ranks (e.g., pain scales, cancer stages).


Numerical (Quantitative) Data: Represents measurable quantities.

  • Discrete Data: Countable, distinct values, typically whole numbers (e.g., number of hospital admissions).

  • Continuous Data: Can take any value within a range, including decimals; measured on a continuous scale (e.g., height, blood pressure).


1.3 The Whole and The Part: Distinguishing Populations from Samples


  • Population: The entire group of individuals, items, or events a researcher wants to study. Described by parameters.

  • Sample: A smaller, manageable subset of the population from which data is collected. Described by statistics.

  • Inferential Statistics: Using sample data to make conclusions about a population.

  • Sampling Error: The difference between a sample statistic and the true population parameter. Minimized by increasing sample size.


1.4 Analytical Application: Selecting the Appropriate Measure of Central Tendency in Clinical Datasets

  • Symmetrical (Normally Distributed) Numerical Data: Mean is preferred as it uses information from every data point.

  • Skewed Numerical Data: Median is preferred as it's robust to outliers. (e.g., hospital length of stay, income).

  • Ordinal Data: Median is most suitable as it identifies the middle rank without assuming equal intervals.

  • Nominal Data: Mode is the only applicable measure, identifying the most frequent category.
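A minimal sketch of the skewed-data rule above: adding a few long-stay outliers to a hypothetical length-of-stay dataset pulls the mean up sharply while the median barely moves.

```python
import statistics

typical = [2, 3, 3, 4, 4, 5, 5]   # hypothetical length of stay, days
skewed = typical + [30, 45]        # two complicated admissions added

# Mean jumps from ~3.7 to ~11.2 days; the median stays at 4 days.
print(statistics.mean(typical), statistics.median(typical))
print(statistics.mean(skewed), statistics.median(skewed))
```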

Section 2: The Normal Distribution in Clinical Data

2.1 Properties of the Gaussian (Normal) Distribution


  • Shape and Symmetry: Symmetrical, bell-shaped curve. 50% of values below the mean, 50% above.

  • Measures of Central Tendency: Mean, median, and mode are all equal and at the center.

  • Defining Parameters: Completely described by its mean (μ) (location) and standard deviation (σ) (spread).

  • Area Under the Curve: Total area equals 1 (100% of outcomes).

Normal distribution curve illustrating the empirical rule, showing that approximately 68.3% of data falls within one standard deviation, 95.4% within two, and 99.7% within three standard deviations from the mean.

2.2 The Empirical Rule (68-95-99.7): A Practical Guide


For a normal distribution:

  • 68% of data lies within ±1 SD of the mean.

  • 95% of data lies within ±2 SD of the mean. (Commonly used for "normal" or reference ranges in clinical labs).

  • 99.7% of data lies within ±3 SD of the mean.


Clinical Implication: 5% of healthy individuals will have results outside the statistically derived 95% "normal" range. Results must be interpreted in clinical context.


2.3 Applied Biostatistics: Calculating Population Ranges from Mean and Standard Deviation


Example: Mean SBP = 120 mmHg, SD = 10 mmHg. 95% range = 120 ± 2(10) = 100 to 140 mmHg.
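The worked SBP example as a tiny helper: mean ± k·SD gives the interval expected to contain roughly 68%, 95%, or 99.7% of a normally distributed population.

```python
def empirical_range(mean, sd, k):
    """Return the interval mean ± k standard deviations."""
    return (mean - k * sd, mean + k * sd)

print(empirical_range(120, 10, 2))  # → (100, 140), the 95% range
```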


2.4 Analyzing Distributions: Differentiating Normal vs. Skewed Data

  • Normal Distribution: Mean ≈ Median (and Mode).

  • Positive Skew (Right-skewed): Long tail to the right; Mean > Median. Outliers pull the mean up (e.g., income, hospital stay).

  • Negative Skew (Left-skewed): Long tail to the left; Mean < Median. Outliers pull the mean down.


Section 3: Evaluating Diagnostic and Screening Tests

3.1 Core Metrics of Test Accuracy: Sensitivity, Specificity, PPV, and NPV


2×2 Contingency Table and Test Characteristics


| | Disease Present | Disease Absent | Total |
| --- | --- | --- | --- |
| Test Positive | a (True Positive) | b (False Positive) | a + b |
| Test Negative | c (False Negative) | d (True Negative) | c + d |
| Total | a + c | b + d | a + b + c + d |

Diagnostic Accuracy Measures

| Measure | Formula | Interpretation |
| --- | --- | --- |
| Sensitivity | a / (a + c) | True Positive Rate: Ability to correctly identify those with the disease. "SnOUT" – Sensitive test, Negative result rules OUT. |
| Specificity | d / (b + d) | True Negative Rate: Ability to correctly identify those without the disease. "SpIN" – Specific test, Positive result rules IN. |
| Positive Predictive Value (PPV) | a / (a + b) | Probability that a person with a positive test truly has the disease. |
| Negative Predictive Value (NPV) | d / (c + d) | Probability that a person with a negative test truly does not have the disease. |

3.2 The Influence of Prevalence on Predictive Values

  • Prevalence: Proportion of individuals with the disease in a population (pre-test probability).

  • Relationship: As prevalence increases, PPV increases and NPV decreases.

  • As prevalence decreases, PPV decreases and NPV increases.

  • Clinical Importance: A highly accurate test can have low PPV in low-prevalence settings due to many false positives. Test utility depends on pre-test probability.
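The prevalence effect can be sketched with Bayes' theorem, holding sensitivity and specificity fixed at 90% and 95% (hypothetical values): the same test gives a very different PPV at 1% versus 50% prevalence.

```python
def ppv(sens, spec, prevalence):
    """PPV via Bayes: true positives over all positives."""
    true_pos = sens * prevalence               # P(test+ and disease+)
    false_pos = (1 - spec) * (1 - prevalence)  # P(test+ and disease-)
    return true_pos / (true_pos + false_pos)

for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.95, prev):.1%}")
```

At 1% prevalence the PPV is only about 15%, despite the test's good accuracy, because false positives from the large disease-free group swamp the true positives.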


3.3 Practical Application: Calculating Test Metrics from a 2x2 Contingency Table


| | Influenza Positive (by gold standard) | Influenza Negative | Total |
| --- | --- | --- | --- |
| Test Positive | True Positive (TP) = 80 | False Positive (FP) = 20 | 100 |
| Test Negative | False Negative (FN) = 10 | True Negative (TN) = 90 | 100 |
| Total | 90 | 110 | 200 |

📊 Calculations:

  • Sensitivity = TP / (TP + FN) = 80 / (80 + 10) = 88.9%

  • Specificity = TN / (TN + FP) = 90 / (90 + 20) = 81.8%

  • Positive Predictive Value (PPV) = TP / (TP + FP) = 80 / (80 + 20) = 80.0%

  • Negative Predictive Value (NPV) = TN / (TN + FN) = 90 / (90 + 10) = 90.0%


3.4 The Receiver Operating Characteristic (ROC) Curve: Interpreting Overall Diagnostic Accuracy

  • Plots: Sensitivity (y-axis) vs. 1 - Specificity (False Positive Rate) (x-axis) across all possible cut-off points.

  • Interpretation: Closer to the upper-left corner (100% sensitivity, 0% FPR) indicates better accuracy.

  • Area Under the Curve (AUC): Single metric of overall diagnostic performance.

  • Ranges from 0.5 (chance) to 1.0 (perfect discrimination).

  • AUC is independent of disease prevalence.
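A sketch of how AUC is obtained from an ROC curve: apply the trapezoidal rule to (FPR, sensitivity) points. The five operating points below are hypothetical cut-offs, not from any real test.

```python
def auc(points):
    """Trapezoidal area under (false_positive_rate, sensitivity) pairs,
    sorted by FPR from (0, 0) to (1, 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area

roc = [(0.0, 0.0), (0.1, 0.60), (0.2, 0.85), (0.5, 0.95), (1.0, 1.0)]
print(round(auc(roc), 3))
```

The diagonal line from (0, 0) to (1, 1) gives an AUC of exactly 0.5, the "coin-flip" baseline against which any real curve is judged.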


3.5 Evaluative Framework: Justifying Test Selection (SnOUT vs. SpIN)

  • SnOUT (High Sensitivity, Negative rules OUT): Choose when the cost of missing the disease (false negative) is high (e.g., screening for a dangerous, treatable condition like PE with D-dimer).

  • SpIN (High Specificity, Positive rules IN): Choose when the cost of a false diagnosis (false positive) is high (e.g., confirmatory test for cancer with a biopsy).


Section 4: A Primer on Common Statistical Tests

4.1 An Overview of Common Statistical Tests and Their Purpose

  • T-test: Compares the means of exactly two groups.

  • ANOVA (Analysis of Variance): Compares the means of three or more groups. (Requires post-hoc tests to identify specific differences).

  • Chi-squared (χ²) Test: Analyzes categorical data to determine association between two categorical variables.

  • Correlation (e.g., Pearson's r): Measures strength and direction of linear relationship between two continuous variables (r ranges from -1 to +1).


4.2 Parametric vs. Non-Parametric Tests: Assumptions and Use Cases

  • Parametric Tests: Assume data are normally distributed, with homogeneity of variances and independence of observations.

  • Used for continuous data. Examples: t-test, ANOVA.

  • More powerful if assumptions are met.

  • Non-Parametric Tests (Distribution-free): Used when parametric assumptions are violated (non-normal, skewed, or ordinal data).

  • Robust to outliers; often work with ranks.

  • Alternatives: Mann-Whitney U (independent t-test), Wilcoxon Signed-Rank (paired t-test), Kruskal-Wallis (ANOVA), Spearman's Correlation (Pearson's).
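To show what "working with ranks" means in practice, here is a bare-bones Mann-Whitney U statistic in plain Python. This computes U only; a real analysis would also need a p-value from software tables.

```python
def mann_whitney_u(a, b):
    """U statistic for two independent samples, using average ranks for ties."""
    combined = sorted(a + b)

    def rank(v):
        # Average rank: first position of v plus half the number of ties.
        first = combined.index(v) + 1
        return first + (combined.count(v) - 1) / 2

    r_a = sum(rank(v) for v in a)              # rank sum of group a
    u_a = r_a - len(a) * (len(a) + 1) / 2      # U for group a
    u_b = len(a) * len(b) - u_a                # U for group b
    return min(u_a, u_b)

# Complete separation (every value in one group below the other) gives U = 0.
print(mann_whitney_u([1, 2, 3], [10, 20, 30]))
```

Because only ranks enter the calculation, replacing 30 with 3,000 changes nothing, which is exactly the robustness to outliers described above.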


4.3 Paired vs. Unpaired Tests: Comparing Related and Independent Groups

  • Unpaired (Independent) Test: Compares two separate, unrelated groups (e.g., Drug A group vs. Placebo group).

  • Paired Test: Compares related measurements.

  • "Before-and-After" studies (same individuals measured twice).

  • Matched-Pairs studies (individuals in groups are matched).


4.4 Applied Analysis: Selecting the Correct Statistical Test from a Study Abstract


🧘‍♀️ Example: Mindfulness Program and SBP (Paired T-test)


🎯 Research Question:

Does participation in a 4-week mindfulness program significantly reduce systolic blood pressure (SBP) in hypertensive patients?


🧪 Study Design:

  • Before-and-after measurements on the same participants

  • Sample size: 10 hypertensive patients

  • Outcome: Systolic BP before and after the program


| Participant | SBP Before (mmHg) | SBP After (mmHg) | Difference (Before - After) |
| --- | --- | --- | --- |
| 1 | 148 | 138 | 10 |
| 2 | 142 | 135 | 7 |
| 3 | 150 | 140 | 10 |
| 4 | 145 | 138 | 7 |
| 5 | 160 | 148 | 12 |
| 6 | 155 | 142 | 13 |
| 7 | 149 | 140 | 9 |
| 8 | 152 | 144 | 8 |
| 9 | 151 | 140 | 11 |
| 10 | 153 | 145 | 8 |

🧮 Paired T-test Applied:

  • Null Hypothesis (H₀): There is no mean difference in SBP before and after the mindfulness program.

  • Alternative Hypothesis (H₁): The mindfulness program reduces SBP (mean difference > 0).

  • Mean of differences (d̄): 9.5 mmHg

  • Standard deviation of differences (SD): ~2.1

  • T statistic: t = d̄ / (SD / √n) ≈ 9.5 / (2.07 / √10) ≈ 14.5

  • p-value: < 0.001 (from software or a t-table, df = 9)


Conclusion: Since p < 0.05, we reject H₀ and conclude that the mindfulness program significantly reduced systolic BP.
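The paired t-test arithmetic for the table above in plain Python: t = d̄ / (SD / √n), computed on the within-patient differences.

```python
import math
import statistics

before = [148, 142, 150, 145, 160, 155, 149, 152, 151, 153]
after = [138, 135, 140, 138, 148, 142, 140, 144, 140, 145]

diffs = [pre - post for pre, post in zip(before, after)]  # before minus after
d_bar = statistics.mean(diffs)                            # mean difference
sd = statistics.stdev(diffs)                              # sample SD of diffs
t = d_bar / (sd / math.sqrt(len(diffs)))                  # paired t statistic

print(round(d_bar, 1), round(sd, 2), round(t, 1))
```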


4.5 Deconstructing a Clinical Vignette: Identifying Variables to Determine the Correct Statistical Test


Clinical Scenario:

A hospital is conducting a study to evaluate whether the type of ward (Medical vs Surgical) affects the incidence of hospital-acquired pneumonia (HAP).

They collect data over 6 months:


| | Developed HAP | Did Not Develop HAP | Total |
| --- | --- | --- | --- |
| Medical Ward | 30 | 120 | 150 |
| Surgical Ward | 20 | 130 | 150 |
| Total | 50 | 250 | 300 |

Step-by-Step Deconstruction:

  1. Identify the Independent Variable (IV):

    • Ward Type (Categorical: Medical vs Surgical)

  2. Identify the Dependent Variable (DV):

    • Occurrence of HAP (Categorical: Yes or No)

  3. Nature of Both Variables:

    • Categorical (Nominal) × Categorical (Nominal)

  4. Statistical Test:

    • Chi-squared test: Used to determine if there is a significant association between ward type and HAP occurrence.


Interpretation:

If the p-value < 0.05, we reject the null hypothesis and conclude that the type of ward is significantly associated with hospital-acquired pneumonia.
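The chi-squared arithmetic for the ward/HAP table above in plain Python. Expected counts come from the marginal totals: E = (row total × column total) / N.

```python
observed = [[30, 120],   # Medical ward:  HAP, no HAP
            [20, 130]]   # Surgical ward: HAP, no HAP

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, the 5% critical value is 3.84; chi2 = 2.4 here
# falls short, so this particular table would NOT reach significance.
print(round(chi2, 2))
```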


Summary Table for Test Selection:

| Research Goal | Independent Variable Type | Dependent Variable Type | Number of Groups / Conditions | Parametric Test | Non-Parametric Alternative |
| --- | --- | --- | --- | --- | --- |
| Compare 1 mean to hypothesized value | N/A | Numerical | 1 | One-Sample T-test | One-Sample Wilcoxon Signed-Rank Test |
| Compare means of 2 independent groups | Categorical (2 levels) | Numerical | 2 (Independent) | Independent T-test | Mann-Whitney U Test |
| Compare means of 2 related measurements | Categorical (2 levels) | Numerical | 2 (Paired) | Paired T-test | Wilcoxon Signed-Rank Test |
| Compare means of 3+ independent groups | Categorical (3+ levels) | Numerical | 3+ (Independent) | One-Way ANOVA | Kruskal-Wallis Test |
| Assess association between 2 categorical variables | Categorical | Categorical | N/A | Chi-Squared (χ²) Test | Fisher's Exact Test (for small samples) |
| Assess linear relationship between 2 numerical variables | Numerical | Numerical | N/A | Pearson Correlation | Spearman Correlation |

Section 5: Statistical Inference and Hypothesis Testing

5.1 The Language of Inference: Defining P-Value, Confidence Interval, and Errors (Type I & II)

  • P-value: Probability of observing a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true (no effect). p < 0.05 is typically significant.

  • Confidence Interval (CI): A range of values likely to contain the true population parameter. 95% CI is common.

  • Magnitude of effect: The range itself.

  • Precision of estimate: Narrow CI = precise; Wide CI = imprecise.

  • Preferred over p-values as they provide more information.

  • Type I Error (α): False positive; incorrectly rejecting a true null hypothesis (concluding an effect exists when it doesn't). Probability = α (e.g., 0.05).

  • Type II Error (β): False negative; failing to reject a false null hypothesis (concluding no effect exists when one does).


5.2 Understanding Statistical Power: The Probability of Detecting a True Effect

  • Definition: The probability a statistical test will correctly detect a true effect when one exists (1 - β).

  • Adequate Power: Generally ≥ 80% (meaning ≤ 20% risk of Type II error).

  • Determinants: sample size, effect size, and significance level.

  • Sample Size: Larger sample = higher power.

  • Effect Size: Larger true effect = higher power.

  • Significance Level (α): Higher α increases power but also Type I error risk.
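A rough power calculation illustrating these determinants, using the normal approximation (an assumption for illustration; exact answers use the t distribution): for a one-sample or paired test of standardized effect size d at two-sided α = 0.05, power ≈ Φ(d·√n − 1.96).

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(effect, n, z_crit=1.96):
    """Approximate power for standardized effect size `effect` and sample n."""
    return normal_cdf(effect * math.sqrt(n) - z_crit)

# Doubling the sample (n = 16 -> 32) lifts power from ~52% to ~81%
# for the same moderate true effect (d = 0.5).
print(round(approx_power(0.5, 16), 2), round(approx_power(0.5, 32), 2))
```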


5.3 Quantifying Risk and Benefit: Calculating ARR, RR, OR, and NNT

2×2 Table: Intervention vs. Control for Outcome


| | Outcome Occurs | Outcome Does Not Occur | Total |
| --- | --- | --- | --- |
| Intervention Group | a | b | a + b |
| Control Group | c | d | c + d |
| Total | a + c | b + d | a + b + c + d |

Relative Risk (RR) Formula

RR = [a / (a + b)] / [c / (c + d)], i.e., the risk in the intervention group divided by the risk in the control group.

  • RR = 1: No difference between intervention and control

  • RR < 1: Intervention reduces risk of outcome

  • RR > 1: Intervention increases risk of outcome


Odds Ratio (OR): (a/b) / (c/d) = ad / bc. Used in case-control studies; approximates RR for rare outcomes.

Absolute Risk Reduction (ARR): [c / (c + d)] - [a / (a + b)]. The absolute difference in risk between groups.

Number Needed to Treat (NNT): 1 / ARR. The number of patients to treat to prevent one additional adverse outcome.
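All four measures computed from a hypothetical trial 2×2 table (10/100 events on the intervention arm vs 20/100 on the control arm):

```python
a, b = 10, 90   # intervention: outcome occurs / does not occur
c, d = 20, 80   # control:      outcome occurs / does not occur

risk_int = a / (a + b)            # risk in intervention group (0.10)
risk_ctl = c / (c + d)            # risk in control group (0.20)

rr = risk_int / risk_ctl          # relative risk
arr = risk_ctl - risk_int         # absolute risk reduction
nnt = 1 / arr                     # number needed to treat
odds_ratio = (a * d) / (b * c)    # (a/b) / (c/d) = ad / bc

print(rr, arr, round(nnt), round(odds_ratio, 2))
```

Here RR = 0.5 (risk halved), ARR = 0.10, and NNT = 10: treat ten patients to prevent one additional outcome. The OR (≈0.44) is more extreme than the RR because the outcome is not rare.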


5.4 Analyzing Trial Results: Determining Statistical Significance from P-Values and Confidence Intervals

  • P-value: Statistically significant if p < 0.05.

  • Confidence Interval: Statistically significant if the CI does NOT include the null value.

  • Null value for differences (ARR, mean difference): 0

  • Null value for ratios (RR, OR): 1


5.5 The Evaluative Conclusion: Differentiating Statistical Significance from Clinical Significance

  • Statistical Significance: Reliability of the result against chance (influenced by sample size).

  • Clinical Significance (Clinical Importance): Magnitude of the effect and its practical relevance to patients.

  • Pitfall: A large study can find a statistically significant but clinically trivial effect. A small, underpowered study might miss a clinically important effect that is not statistically significant.

  • Recommendation: Focus on effect size and its confidence interval (CI) for comprehensive interpretation, as CI provides both magnitude and precision.


Section 6: Understanding Epidemiological Study Designs

6.1 A Catalogue of Study Designs: From Case-Control to RCTs

  • Cross-Sectional Study: Observational "snapshot" at one point in time. Measures prevalence. Cannot infer causality due to lack of temporality.

  • Case-Control Study: Retrospective observational. Starts with outcome (cases vs. controls), looks back at past exposures. Efficient for rare diseases. Susceptible to recall and selection bias.

  • Cohort Study: Observational, follows groups over time.

  • Prospective: Identifies exposure in present, follows into future for outcomes. Establishes temporality.

  • Retrospective: Uses historical data to identify past exposure and current outcomes.

  • Strongest observational for aetiology/prognosis. Vulnerable to loss to follow-up bias.

  • Randomised Controlled Trial (RCT): Experimental, gold standard for interventions. Randomly assigns participants to intervention/control. Minimizes bias, allows strong causal inference.


6.2 The Hierarchy of Evidence: Ranking Study Designs by Rigor

  • Peak: Systematic Reviews & Meta-Analyses (of RCTs) - Most reliable.

  • High: Randomised Controlled Trials (RCTs) - Strongest for cause-effect of interventions.

  • Middle: Observational Studies (Cohort > Case-Control > Cross-Sectional).

  • Base: Anecdotal Evidence & Expert Opinion (Case Series, Case Reports) - Weakest.


6.3 Key Concepts in Randomized Controlled Trials: Randomisation, Allocation Concealment, Blinding, and Intention-to-Treat Analysis

  • Randomisation: Chance-based assignment to balance baseline characteristics, minimizing selection bias and confounding.

  • Allocation Concealment: Preventing investigators from knowing the next assignment before enrollment. Prevents selection bias.

  • Blinding (Masking): Withholding treatment allocation knowledge after randomization. Prevents performance and ascertainment bias.

  • Single: Participant unaware.

  • Double: Participant & investigator unaware.

  • Triple: Participant, investigator, & data analyst unaware.

  • Intention-to-Treat (ITT) Analysis: Analyze all participants in their originally randomized group, regardless of adherence or dropout. Preserves randomization's balance, provides pragmatic estimate.


6.4 Applied Appraisal: Identifying Study Designs from Research Abstracts

(Example provided in source: Burnout survey - Cross-Sectional Study).


6.5 Analyzing Methodological Flaws: Identifying Potential Sources of Bias

  • Case-Control Studies: Recall Bias (differential memory of exposure), Selection Bias (unrepresentative control group).

  • Cohort Studies: Selection Bias (non-comparable exposed/unexposed groups), Loss to Follow-up Bias (differential dropout).

Note: Quality of execution matters more than just design type.


6.6 The Final Evaluation: Justifying the Optimal Study Design for a Clinical Question

  • Therapy/Intervention Efficacy: RCT (strongest for causality).

  • Aetiology of Common Disease/Prognosis: Prospective Cohort Study (establishes temporality).

  • Aetiology of Rare Disease/Rare Exposure: Case-Control Study (most efficient).


| Study Design | Primary Use | Starting Point | Time Direction | Strengths | Limitations |
| --- | --- | --- | --- | --- | --- |
| Randomized Controlled Trial (RCT) | Evaluate therapy/intervention efficacy | Assign intervention and control groups | Forward (prospective) | Strongest for causality; minimizes bias/confounding | Expensive, time-consuming; may lack generalizability; ethical limits |
| Prospective Cohort Study | Assess aetiology of common diseases and prognosis | Start with exposed and unexposed groups | Forward (prospective) | Establishes temporality; good for multiple outcomes | Inefficient for rare diseases; loss to follow-up possible |
| Case-Control Study | Assess aetiology of rare diseases or rare exposures | Start with diseased and non-diseased groups | Backward (retrospective) | Most efficient for rare diseases; quick and inexpensive | Prone to recall and selection bias; cannot calculate incidence directly |
| Cross-Sectional Study | Assess prevalence, explore associations | Snapshot of population at one point | None (simultaneous) | Fast, inexpensive; useful for hypothesis generation | Cannot infer causality or temporality |
| Ecological Study | Generate hypotheses at population level | Group-level data | Variable | Quick, uses existing data; useful for public health trends | Ecological fallacy risk; no individual-level data |


Quiz: Biostatistics and Evidence-Based Medicine

Answer each question in 2-3 sentences.

  1. Explain why the median is often a more appropriate measure of central tendency than the mean for skewed numerical data, providing a clinical example.

  2. Differentiate between nominal and ordinal data, providing a unique clinical example for each.

  3. In the context of the empirical rule, what does it mean for a clinical laboratory test to define its "normal range" as values within two standard deviations of the mean?

  4. A new diagnostic test has very high sensitivity. If a patient tests negative with this test, what conclusion can a clinician draw, and what mnemonic supports this?

  5. Explain how disease prevalence influences the positive predictive value (PPV) of a diagnostic test.

  6. Describe the primary purpose of an ANOVA test and explain when it would be used instead of a t-test.

  7. What is the key difference between an "unpaired" and a "paired" statistical test, and when would each be applied?

  8. Define a p-value and explain its conventional threshold for statistical significance in medical research.

  9. Why is a confidence interval generally considered more informative than a p-value when interpreting study results?

  10. What is the main advantage of a Randomised Controlled Trial (RCT) over observational study designs like cohort studies for evaluating treatment efficacy?


Quiz Answer Key
  1. The median is preferred for skewed numerical data because it is not significantly affected by extreme values (outliers), unlike the mean. For example, in a dataset of hospital length of stay, a few patients with very long stays due to complications would inflate the mean, making the median a better representation of the typical patient's stay.

  2. Nominal data consists of categories without any intrinsic order or ranking, such as blood type (A, B, AB, O). Ordinal data, in contrast, has categories with a meaningful order, but the intervals between them are not necessarily equal, like a patient satisfaction rating scale from "very dissatisfied" to "very satisfied."

  3. Defining a "normal range" within two standard deviations of the mean means that approximately 95% of healthy individuals are expected to fall within this range, assuming the variable is normally distributed. This implies that 5% of perfectly healthy individuals will still have results outside this statistically defined normal range, warranting clinical interpretation rather than automatic pathology.

  4. If a patient tests negative with a test that has very high sensitivity, a clinician can be very confident that the patient does not have the disease. This is supported by the mnemonic "SnOUT," which stands for: if a test has high Sensitivity, a Negative result effectively rules OUT the disease.

  5. As disease prevalence increases, the positive predictive value (PPV) of a diagnostic test also increases. This is because in a high-prevalence setting, a greater proportion of positive test results are likely to be true positives, making the test more useful for confirming the presence of the disease. Conversely, in low-prevalence settings, PPV decreases.

  6. The primary purpose of an ANOVA test is to compare the means of three or more groups to determine if there is a statistically significant difference among them. It would be used instead of a t-test when the research question involves comparing more than two groups, as a t-test is strictly limited to two-group comparisons.

  7. An "unpaired" test compares two separate, independent groups where observations in one group have no direct relationship to the other, such as comparing the blood pressure of two different sets of patients. A "paired" test, however, is used when measurements are related, typically from the same individuals measured before and after an intervention, or from matched individuals.

  8. A p-value is the probability of observing results as extreme as, or more extreme than, those obtained in a study, assuming the null hypothesis (no effect/difference) is true. Conventionally, a p-value of less than 0.05 (p<0.05) is used as the threshold for statistical significance, meaning there's less than a 5% chance the result occurred by random chance.

  9. A confidence interval (CI) provides more information than a p-value because it not only indicates statistical significance (if it doesn't cross the null value) but also quantifies the plausible range for the magnitude of the effect. This allows clinicians to assess both the reliability and the clinical importance (precision) of the finding, unlike a p-value which only provides a dichotomous "significant/not significant" answer.

  10. The main advantage of a Randomised Controlled Trial (RCT) is its use of random assignment, which helps to balance both known and unknown confounding factors between the intervention and control groups. This minimizes bias and allows for the strongest causal inference, making RCTs the gold standard for evaluating treatment efficacy.

Glossary of Key Terms
  • Absolute Risk Reduction (ARR): The absolute difference in the risk of an event between an intervention group and a control group.

  • Allocation Concealment: The process of preventing researchers and participants from knowing which treatment arm a participant will be assigned to before they are enrolled in a study, to prevent selection bias.

  • ANOVA (Analysis of Variance): A statistical test used to compare the means of three or more groups.

  • Area Under the Curve (AUC): A measure of the overall diagnostic accuracy of a test, derived from the Receiver Operating Characteristic (ROC) curve. Ranges from 0.5 (no discrimination) to 1.0 (perfect discrimination).

  • Blinding (Masking): The process of keeping study participants, investigators, or outcome assessors unaware of treatment assignments after randomization to reduce bias.

  • Case-Control Study: A retrospective observational study design that starts by identifying individuals with an outcome (cases) and a comparison group without the outcome (controls), then looks back to compare past exposures.

  • Categorical Data (Qualitative Data): Data that describe qualities or characteristics, sorted into distinct categories rather than measured numerically. Includes nominal and ordinal data.

  • Chi-squared (χ²) Test: A statistical test used to determine if there is a statistically significant association between two categorical variables.

  • Clinical Significance: The practical importance or magnitude of a treatment effect, referring to whether a finding makes a real, palpable, and noticeable difference in a patient's life or clinical practice.

  • Cohort Study: An observational study design that follows a group of individuals (a cohort) over time, often classified by exposure status, to observe the incidence of an outcome. Can be prospective or retrospective.

  • Confidence Interval (CI): A range of values calculated from sample data that is likely to contain the true, unknown value of a population parameter. Provides an estimate of both the magnitude and precision of an effect.

  • Continuous Data: A type of numerical data that can take any value within a given range, including fractions and decimals, and is measured on a continuous scale (e.g., height, weight).

  • Correlation: A statistical measure (e.g., Pearson's r) that quantifies the strength and direction of the linear relationship between two continuous variables.

  • Cross-Sectional Study: An observational study design that collects data on both exposure and outcome simultaneously from a population at a single point in time, providing a "snapshot."

  • Descriptive Statistics: Statistical methods used to describe and summarize the key characteristics of a dataset, including measures of central tendency and dispersion.

  • Dichotomous Variable: A special type of nominal variable that has only two possible categories (e.g., yes/no, alive/dead).

  • Discrete Data: A type of numerical data consisting of distinct, separate, and countable values, typically whole numbers (e.g., number of hospital admissions).

  • Empirical Rule (68-95-99.7 Rule): A rule stating that for a normal distribution, approximately 68% of data falls within 1 standard deviation, 95% within 2 standard deviations, and 99.7% within 3 standard deviations of the mean.

  • Evidence-Based Medicine (EBM): The conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.

  • False Negative (FN): A test result that indicates the absence of a condition when it is actually present.

  • False Positive (FP): A test result that indicates the presence of a condition when it is actually absent.

  • Gold Standard: A definitive and accepted method for accurately determining the true disease status of a patient, used as a comparison for new diagnostic tests.

  • Hierarchy of Evidence: A conceptual model that ranks different study designs according to their methodological rigor and ability to minimize bias, typically depicted as a pyramid.

  • Inferential Statistics: Statistical methods used to draw conclusions or make inferences about a larger population based on data collected from a smaller sample.

  • Intention-to-Treat (ITT) Analysis: A principle in Randomized Controlled Trials where all participants are analyzed in the group to which they were originally randomized, regardless of whether they completed the treatment or adhered to the protocol.

  • Interquartile Range (IQR): A measure of dispersion representing the middle 50% of a dataset, calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1).

  • Mean: The arithmetic average of a dataset, calculated by summing all values and dividing by the total number of values. Highly sensitive to outliers.

  • Measures of Central Tendency: Single values that attempt to identify the central position within a set of data (mean, median, mode).

  • Measures of Dispersion: Statistics that quantify the extent to which individual data points deviate from the central value and from each other (range, IQR, variance, standard deviation).

  • Median: The middle value in a dataset that has been arranged in order of magnitude. Robust to outliers and skewed data.

  • Mode: The value that appears most frequently in a dataset.

  • Nominal Data: A type of categorical data where categories have no intrinsic order or quantitative value (e.g., blood type, gender).

  • Normal Distribution (Gaussian Distribution/Bell Curve): A symmetrical, bell-shaped probability distribution widely used in statistics, characterized by its mean and standard deviation.

  • Number Needed to Treat (NNT): A highly practical measure representing the number of patients who need to be treated with an intervention for a specific time to prevent one additional adverse outcome. It is the reciprocal of the ARR.

  • Numerical Data (Quantitative Data): Data that represent measurable quantities and are expressed as numbers, suitable for arithmetic operations. Includes discrete and continuous data.

  • Odds Ratio (OR): A measure of association comparing the odds of an outcome occurring in one group to the odds of it occurring in another group. Commonly used in case-control studies.

  • Ordinal Data: A type of categorical data where categories have a natural, meaningful order or rank, but the intervals between the ranks are not necessarily equal (e.g., pain scales, cancer stages).

  • Outlier: An extreme value in a dataset that is significantly different from other values and can disproportionately influence measures like the mean.

  • Paired Test: A statistical test used when comparing related measurements, such as before-and-after measurements on the same individuals or measurements from matched pairs.

  • Parameter: A numerical characteristic that describes an entire population (e.g., population mean, μ).

  • Parametric Tests: Statistical tests that make certain assumptions about the population distribution from which the sample was drawn, typically assuming normal distribution (e.g., t-test, ANOVA).

  • Population: The entire group of individuals, items, or events that a researcher wants to study and about which they wish to draw conclusions.

  • Positive Predictive Value (PPV): The probability that a patient who tests positive actually has the disease.

  • Prevalence: The proportion of individuals in a population who have a particular disease or condition at a specific point in time.

  • P-value: The probability of obtaining a result as extreme as, or more extreme than, the one observed in a study, assuming the null hypothesis is true.

  • Randomised Controlled Trial (RCT): An experimental study design considered the gold standard for evaluating interventions, where participants are randomly assigned to intervention or control groups.

  • Randomization: The process of assigning participants to different treatment groups in a study using a method based on chance, to ensure groups are comparable and minimize bias.

  • Range: The simplest measure of dispersion, defined as the difference between the highest and lowest values in a dataset.

  • Receiver Operating Characteristic (ROC) Curve: A graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied, plotting sensitivity against (1 - specificity).

  • Relative Risk (RR): A measure of association comparing the risk of an outcome in an intervention group to the risk in a control group. It is the ratio of the two risks.
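
With a hypothetical 2×2 table (the counts below are invented for illustration), both the relative risk and the odds ratio reduce to simple ratios, and the example shows how the OR overstates the RR when the outcome is not rare:

```python
# Hypothetical 2x2 table of counts (illustrative only):
#                 outcome    no outcome
# exposed         a = 20     b = 80
# unexposed       c = 10     d = 90
a, b, c, d = 20, 80, 10, 90

risk_exposed   = a / (a + b)                     # 0.20
risk_unexposed = c / (c + d)                     # 0.10
relative_risk  = risk_exposed / risk_unexposed   # 2.0: risk is doubled

odds_ratio = (a / b) / (c / d)                   # 2.25: note OR > RR here
```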

  • Sample: A specific, smaller, and manageable subset of the population from which data are actually collected.

  • Sampling Error: The difference between a sample statistic and the true population parameter, due to the fact that a sample is only a part of the whole population.

  • Sensitivity (True Positive Rate): The ability of a diagnostic test to correctly identify those individuals who truly have the disease. Often remembered by "SnNOut": a highly Sensitive test with a Negative result rules the disease OUT.

  • Skewed Distribution: An asymmetrical data distribution where the tail of the distribution is longer on one side than the other (e.g., positive/right-skew, negative/left-skew).

  • Specificity (True Negative Rate): The ability of a diagnostic test to correctly identify those individuals who truly do not have the disease. Often remembered by "SpPIn": a highly Specific test with a Positive result rules the disease IN.
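
Using a hypothetical screen of 1,000 people (the counts below are invented for illustration), sensitivity, specificity, and PPV all fall out of the four cells of the 2×2 table:

```python
# Hypothetical screening results for 1,000 people (illustrative only):
tp, fn = 90, 10      # 100 people truly have the disease
fp, tn = 30, 870     # 900 people truly do not

sensitivity = tp / (tp + fn)   # 0.90  — P(test + | disease)
specificity = tn / (tn + fp)   # ~0.97 — P(test - | no disease)
ppv         = tp / (tp + fp)   # 0.75  — P(disease | test +)
```

Note that PPV depends on prevalence: the same test applied in a lower-prevalence population would yield a lower PPV even with identical sensitivity and specificity.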

  • Standard Deviation (SD): The most common measure of dispersion for normally distributed data, representing the typical or average distance of individual data points from the mean. It is the square root of the variance.
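
The descriptive measures above can be computed directly with Python's standard `statistics` module; the small dataset below is invented for illustration:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # small illustrative dataset

mean   = statistics.mean(values)     # 5: sum (40) divided by count (8)
median = statistics.median(values)   # 4.5: midpoint of the ordered data
mode   = statistics.mode(values)     # 4: most frequent value
sd     = statistics.pstdev(values)   # 2.0: population SD, sqrt of variance

q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles
iqr = q3 - q1                        # spread of the middle 50%
```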

  • Statistic: A numerical characteristic that describes a sample (e.g., sample mean, x̄).

  • Statistical Power: The probability that a statistical test will correctly detect a true effect when one actually exists (1 - Type II error probability).

  • Statistical Significance: An indication that an observed result is unlikely to have occurred by random chance, typically determined by a p-value below a certain threshold (e.g., p<0.05).

  • Systematic Review: A rigorous process of identifying, appraising, and synthesizing all available evidence on a specific research question. A meta-analysis may statistically combine the results.

  • T-test: A statistical test used to compare the means of exactly two groups.
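
For two independent (unpaired) groups, the t statistic can be sketched by hand; the samples below are invented, and the formula shown is Welch's variant, which does not assume equal variances:

```python
import math
import statistics

# Two independent (unpaired) samples — hypothetical measurements:
group_a = [5.1, 5.4, 5.8, 6.0, 6.3]
group_b = [4.2, 4.5, 4.9, 5.0, 5.2]

m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)  # sample variances
n1, n2 = len(group_a), len(group_b)

# Welch's t statistic: difference in means over its standard error
t = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
```

In practice the statistic would be referred to a t distribution (with Welch-adjusted degrees of freedom) to obtain a p-value; a paired test would instead analyse the within-pair differences.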

  • True Negative (TN): A test result that correctly indicates the absence of a condition when it is truly absent.

  • True Positive (TP): A test result that correctly indicates the presence of a condition when it is truly present.

  • Type I Error (α): The error of incorrectly rejecting a true null hypothesis (a false positive). The probability of this error is denoted by alpha.

  • Type II Error (β): The error of failing to reject a false null hypothesis (a false negative). The probability of this error is denoted by beta.

  • Unpaired Test: A statistical test used when comparing two separate, independent, and unrelated groups.

  • Variance: A measure of dispersion that quantifies the average squared deviation of each data point from the mean.

