Box plots

Topic summary

A boxplot is a graphical summary of a dataset's distribution. It displays five key statistics: the minimum, lower quartile (\(Q_1\)), median, upper quartile (\(Q_3\)), and maximum, and marks any outliers as separate points. Boxplots are useful for visualising a single distribution and for comparing several distributions on the same scale.

1. Drawing a Boxplot:

To construct a boxplot, follow these steps:

  1. Organise the data: Arrange the data in ascending order and calculate the five key values:
    • Minimum: The smallest value (excluding outliers).
    • \(Q_1\): The lower quartile (25th percentile).
    • Median: The middle value (50th percentile).
    • \(Q_3\): The upper quartile (75th percentile).
    • Maximum: The largest value (excluding outliers).
  2. Identify outliers: Calculate the interquartile range (IQR): \[ \text{IQR} = Q_3 - Q_1 \] Use the formulas below to find thresholds for outliers:
    • Lower threshold: \(Q_1 - 1.5 \times \text{IQR}\).
    • Upper threshold: \(Q_3 + 1.5 \times \text{IQR}\).
    Any data points outside these thresholds are considered outliers.
  3. Plot the key values: Draw a number line and mark the minimum, \(Q_1\), median, \(Q_3\), and maximum. Draw a box from \(Q_1\) to \(Q_3\) with a vertical line at the median, then draw whiskers from the ends of the box to the minimum and maximum (excluding outliers).
  4. Plot outliers: Represent outliers as individual points beyond the whiskers.
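The construction steps above can be sketched in code. This is a minimal illustration, not a definitive method: the function name `box_plot_stats` and the sample data are hypothetical, and the quartiles use Python's `statistics.quantiles` with the "inclusive" convention, which is an assumption (other quartile conventions can give slightly different values of \(Q_1\) and \(Q_3\)).

```python
from statistics import quantiles


def box_plot_stats(data):
    """Return the values needed to draw a boxplot using the 1.5 x IQR rule."""
    values = sorted(data)  # Step 1: arrange the data in ascending order

    # Assumption: "inclusive" quartile convention; other conventions exist.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")

    # Step 2: interquartile range and outlier thresholds
    iqr = q3 - q1
    lower_threshold = q1 - 1.5 * iqr
    upper_threshold = q3 + 1.5 * iqr

    # Points strictly outside the thresholds are outliers
    outliers = [v for v in values if v < lower_threshold or v > upper_threshold]
    inside = [v for v in values if lower_threshold <= v <= upper_threshold]

    return {
        "minimum": inside[0],    # end of lower whisker (excluding outliers)
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": inside[-1],   # end of upper whisker (excluding outliers)
        "iqr": iqr,
        "lower_threshold": lower_threshold,
        "upper_threshold": upper_threshold,
        "outliers": outliers,    # Step 4: plotted as individual points
    }


# Hypothetical data containing one unusually large value
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 50])
print(stats)
```

For this hypothetical dataset the value 50 lies above the upper threshold, so it is reported as an outlier and the upper whisker stops at 8.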

2. Interpreting a Boxplot:

Boxplots summarise a dataset's key characteristics:

  • The box represents the middle 50% of the data, bounded by \(Q_1\) and \(Q_3\).
  • The median line within the box indicates the centre of the distribution.
  • The whiskers extend to the smallest and largest non-outlier values, showing the range of the data once outliers are excluded.
  • Outliers, plotted as individual points, highlight unusual data values.

Key features to consider when interpreting a boxplot include:

  • The spread of the data (indicated by the length of the box and whiskers).
  • The presence and location of outliers.
  • Skewness: If the median sits closer to \(Q_1\) than to \(Q_3\), the data may be positively (right) skewed; if it sits closer to \(Q_3\), the data may be negatively (left) skewed.
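The skewness check can be expressed as a comparison of the two parts of the box. The sketch below is a rough heuristic only, and the function name `skew_hint` is hypothetical; it looks at the position of the median within the box and ignores the whiskers.

```python
def skew_hint(q1, median, q3):
    """Suggest a direction of skew from the median's position in the box."""
    lower_part = median - q1   # width of the box below the median
    upper_part = q3 - median   # width of the box above the median
    if upper_part > lower_part:
        return "possible positive skew"   # median closer to Q1
    if lower_part > upper_part:
        return "possible negative skew"   # median closer to Q3
    return "roughly symmetric"


# Quartiles of 5, 9 and 14: the median is 4 above Q1 and 5 below Q3
print(skew_hint(5, 9, 14))  # prints "possible positive skew"
```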

3. Comparing Boxplots:

When comparing multiple boxplots, look for differences in:

  • Medians: Indicate differences in central tendency.
  • Spread: Compare the lengths of the boxes and whiskers to see which dataset is more variable.
  • Outliers: Examine the number and positions of outliers for each dataset.
  • Skewness: Look for asymmetry in the box and whiskers to assess skewness in the data.
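As an illustration of such a comparison, the sketch below summarises two datasets by median and IQR. The datasets `group_a` and `group_b` and the helper `summarise` are made up for this example, and the "inclusive" quartile convention is an assumption.

```python
from statistics import quantiles


def summarise(data):
    """Return the median and IQR, the two values most often compared."""
    # Assumption: "inclusive" quartile convention.
    q1, median, q3 = quantiles(sorted(data), n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1}


# Hypothetical datasets, invented purely for illustration
group_a = [52, 55, 58, 60, 61, 63, 65, 70, 72]
group_b = [40, 48, 55, 59, 66, 71, 78, 85, 93]

summary_a = summarise(group_a)
summary_b = summarise(group_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(f"Group {name}: median = {summary['median']}, IQR = {summary['iqr']}")
```

Here group B has the higher median (66 against 61) and the larger IQR (23 against 7), so its values are higher on average but more spread out.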

4. Example:

Consider the following dataset: \(2, 4, 5, 7, 9, 12, 14, 18, 22\).

  • Step 1: Arrange the data in ascending order (already arranged).
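  • Step 2: Find the key values. With 9 values, the median is the 5th value; here \(Q_1\) and \(Q_3\) are taken as the 3rd and 7th values (quartile conventions vary between textbooks and software):
    • Minimum: \(2\)
    • \(Q_1\): \(5\)
    • Median: \(9\)
    • \(Q_3\): \(14\)
    • Maximum: \(22\)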
  • Step 3: Calculate the IQR: \(Q_3 - Q_1 = 14 - 5 = 9\).
  • Step 4: Find thresholds for outliers:
    • Lower threshold: \(5 - 1.5 \times 9 = -8.5\).
    • Upper threshold: \(14 + 1.5 \times 9 = 27.5\).
  • Step 5: Identify outliers: There are no outliers, as all data points lie between \(-8.5\) and \(27.5\).

Plot the key values: draw the box from \(5\) to \(14\) with the median line at \(9\), and extend the whiskers to \(2\) and \(22\).
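The arithmetic in this example can be checked with a short script. This is a sketch using Python's standard `statistics` module; the choice of `method="inclusive"` is an assumption made so that the quartiles match the values used above.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 9, 12, 14, 18, 22]

# Assumption: the "inclusive" convention reproduces Q1 = 5 and Q3 = 14.
# The default "exclusive" method would give 4.5 and 16 for this data.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_threshold or x > upper_threshold]

print("Q1, median, Q3:", q1, median, q3)                  # 5.0 9.0 14.0
print("IQR:", iqr)                                        # 9.0
print("Thresholds:", lower_threshold, upper_threshold)    # -8.5 27.5
print("Outliers:", outliers)                              # []
```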

5. Summary:

  • Boxplots visualise the distribution of a dataset, highlighting key statistics and outliers.
  • Use the \(1.5 \times \text{IQR}\) rule to identify outliers, and plot them as individual points beyond the whiskers.
  • Compare boxplots to analyse differences in central tendency, spread, and outliers between datasets.
