Box plots

Topic summary

Boxplots are graphical representations of a dataset’s distribution. They display key statistics, including the minimum, lower quartile (\(Q_1\)), median, upper quartile (\(Q_3\)), and maximum, along with any outliers. Boxplots are useful for visualising and comparing distributions.

1. Drawing a Boxplot:

To construct a boxplot, follow these steps:

  1. Organise the data: Arrange the data in ascending order and calculate the five key values:
    • Minimum: The smallest value that is not an outlier (outliers are identified in step 2).
    • \(Q_1\): The lower quartile (25th percentile).
    • Median: The middle value (50th percentile).
    • \(Q_3\): The upper quartile (75th percentile).
    • Maximum: The largest value that is not an outlier.
  2. Identify outliers: Calculate the interquartile range (IQR): \[ \text{IQR} = Q_3 - Q_1 \] Use the formulas below to find thresholds for outliers:
    • Lower threshold: \(Q_1 - 1.5 \times \text{IQR}\).
    • Upper threshold: \(Q_3 + 1.5 \times \text{IQR}\).

    Any data points outside these thresholds are considered outliers.

  3. Plot the key values: Draw a number line and mark the minimum, \(Q_1\), median, \(Q_3\), and maximum. Draw a box from \(Q_1\) to \(Q_3\) with a line across it at the median, then draw whiskers from the box to the minimum and maximum (excluding outliers).
  4. Plot outliers: Represent outliers as individual points beyond the whiskers.
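
The steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation: the function name `box_plot_summary` and the sample data are made up for the example, and it assumes the "inclusive" quartile convention of the standard library's `statistics.quantiles`. Other quartile conventions can give slightly different values for \(Q_1\) and \(Q_3\).

```python
import statistics

def box_plot_summary(values):
    """Return the statistics needed to draw a box plot of `values`."""
    data = sorted(values)  # Step 1: arrange the data in ascending order
    # Assumption: quartiles use the "inclusive" convention.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Step 2: IQR and the outlier thresholds
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_threshold": lower, "upper_threshold": upper,
        # Steps 3 and 4: whiskers end at the extreme non-outlier values,
        # and outliers are plotted as individual points.
        "whisker_min": min(inside), "whisker_max": max(inside),
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50])
print(summary["outliers"])  # [50]
```

Here the upper threshold is \(7 + 1.5 \times 4 = 13\), so \(50\) is an outlier and the upper whisker stops at \(8\).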

2. Interpreting a Boxplot:

Boxplots summarise a dataset’s key characteristics:

  • The box represents the middle 50% of the data, bounded by \(Q_1\) and \(Q_3\).
  • The median line within the box indicates the centre of the distribution.
  • The whiskers extend to the smallest and largest non-outlier values, showing the range of the data once outliers are excluded.
  • Outliers, plotted as individual points, highlight unusual data values.

Key features to consider when interpreting a boxplot include:

  • The spread of the data (indicated by the length of the box and whiskers).
  • The presence and location of outliers.
  • Skewness: If the median is closer to \(Q_1\) than to \(Q_3\), the data may be positively skewed; if it is closer to \(Q_3\), the data may be negatively skewed.
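
One rough way to read skewness off the box is to compare the distance from \(Q_1\) to the median with the distance from the median to \(Q_3\). The sketch below does this; the function name `skew_hint` is hypothetical, and the result is only a hint, since a box plot cannot confirm skewness on its own. The values passed in are the quartiles from the example in section 4.

```python
def skew_hint(q1, median, q3):
    """Rough skew indication from the median's position inside the box."""
    lower_part = median - q1  # width of the box below the median
    upper_part = q3 - median  # width of the box above the median
    if upper_part > lower_part:
        return "possible positive (right) skew"
    if lower_part > upper_part:
        return "possible negative (left) skew"
    return "roughly symmetric box"

print(skew_hint(5, 9, 14))  # possible positive (right) skew
```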

3. Comparing Boxplots:

When comparing multiple boxplots, look for differences in:

  • Medians: Indicate differences in central tendency.
  • Spread: Compare the lengths of the boxes and whiskers to see which dataset is more variable.
  • Outliers: Examine the number and positions of outliers for each dataset.
  • Skewness: Look for asymmetry in the box and whiskers to assess skewness in the data.
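
A comparison of this kind can be sketched numerically. The two datasets below are hypothetical and exist only for illustration; the helper name `centre_and_spread` is also made up, and the quartiles again assume the "inclusive" convention of `statistics.quantiles`.

```python
import statistics

def centre_and_spread(values):
    """Return (median, IQR) for a dataset."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return median, q3 - q1

# Hypothetical datasets, invented for this example
group_a = [52, 55, 58, 60, 61, 63, 67, 70, 74]
group_b = [40, 48, 55, 62, 66, 71, 78, 85, 93]

median_a, iqr_a = centre_and_spread(group_a)
median_b, iqr_b = centre_and_spread(group_b)

print("Higher median:", "A" if median_a > median_b else "B")  # Higher median: B
print("Larger IQR:", "A" if iqr_a > iqr_b else "B")           # Larger IQR: B
```

In this made-up case group B has the higher median (\(66\) against \(61\)) and the wider box (IQR \(23\) against \(9\)), so its values are typically larger but less consistent.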

4. Example:

Consider the following dataset: \(2, 4, 5, 7, 9, 12, 14, 18, 22\).

  • Step 1: Arrange the data in ascending order (already arranged).
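  • Step 2: Find the key values (with 9 ordered values, \(Q_1\) is taken here as the 3rd value and \(Q_3\) as the 7th; other quartile conventions can give slightly different results):
    • Minimum: \(2\)
    • \(Q_1\): \(5\)
    • Median: \(9\)
    • \(Q_3\): \(14\)
    • Maximum: \(22\)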
  • Step 3: Calculate the IQR: \(Q_3 - Q_1 = 14 - 5 = 9\).
  • Step 4: Find thresholds for outliers:
    • Lower threshold: \(5 - 1.5 \times 9 = -8.5\).
    • Upper threshold: \(14 + 1.5 \times 9 = 27.5\).
  • Step 5: Identify outliers: There are no outliers, as all data points lie between \(-8.5\) and \(27.5\).

Plot the key values: draw a box from \(5\) to \(14\) with a line at the median \(9\), then draw whiskers from the box out to \(2\) and \(22\).
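
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: it assumes the "inclusive" quartile convention of `statistics.quantiles`, which reproduces the quartiles used above for this dataset, and the variable names are illustrative.

```python
import statistics

data = [2, 4, 5, 7, 9, 12, 14, 18, 22]

# Assumption: "inclusive" quartiles, which match Q1 = 5 and Q3 = 14 here
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(q1, median, q3)     # 5.0 9.0 14.0
print(iqr, lower, upper)  # 9.0 -8.5 27.5
print(outliers)           # []
```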

5. Summary:

  • Boxplots visualise the distribution of a dataset, highlighting key statistics and outliers.
  • Use the IQR method to identify outliers and ensure they are represented on the plot.
  • Compare boxplots to analyse differences in central tendency, spread, and outliers between datasets.
