Boxplots are graphical representations of a dataset's distribution. They display key statistics, including the minimum, lower quartile (\(Q_1\)), median, upper quartile (\(Q_3\)), and maximum, along with any outliers. Boxplots are useful for visualising and comparing distributions.
1. Drawing a Boxplot:
To construct a boxplot, follow these steps:
- Organise the data: Arrange the data in ascending order and calculate the five key values:
  - Minimum: The smallest value (excluding outliers).
  - \(Q_1\): The lower quartile (25th percentile).
  - Median: The middle value (50th percentile).
  - \(Q_3\): The upper quartile (75th percentile).
  - Maximum: The largest value (excluding outliers).
- Identify outliers: Calculate the interquartile range (IQR): \[ \text{IQR} = Q_3 - Q_1 \] Then use the IQR to find the outlier thresholds:
  - Lower threshold: \(Q_1 - 1.5 \times \text{IQR}\).
  - Upper threshold: \(Q_3 + 1.5 \times \text{IQR}\).
  Any data point below the lower threshold or above the upper threshold is considered an outlier.
- Plot the key values: Draw a number line, marking the minimum, \(Q_1\), median, \(Q_3\), and maximum. Draw a box from \(Q_1\) to \(Q_3\) with a line across it at the median, then draw whiskers from the box to the minimum and maximum (excluding outliers).
- Plot outliers: Represent outliers as individual points beyond the whiskers.
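The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function name `boxplot_stats` and the dataset are hypothetical, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which is one of several accepted ways to compute \(Q_1\) and \(Q_3\).

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return the quantities needed to draw a boxplot (1.5 x IQR rule)."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions can give
    # slightly different Q1 and Q3 for small datasets.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values that are not outliers.
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "lower_threshold": lower, "upper_threshold": upper,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical dataset with one unusually large value.
stats = boxplot_stats([2, 4, 5, 7, 9, 12, 14, 18, 60])
print(stats["whisker_high"], stats["outliers"])  # → 18 [60]
```

Here the upper threshold is \(27.5\), so \(60\) is plotted as an individual point and the upper whisker stops at \(18\), the largest value that is not an outlier.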
2. Interpreting a Boxplot:
Boxplots summarise a dataset's key characteristics:
- The box represents the middle 50% of the data, bounded by \(Q_1\) and \(Q_3\).
- The median line within the box indicates the centre of the distribution.
- The whiskers extend to the smallest and largest non-outlier values, showing the range of the data once outliers are set aside.
- Outliers, plotted as individual points, highlight unusual data values.
Key features to consider when interpreting a boxplot include:
- The spread of the data (indicated by the length of the box and whiskers).
- The presence and location of outliers.
- Skewness: If the median sits closer to \(Q_1\), the data may be right (positively) skewed; if it sits closer to \(Q_3\), the data may be left (negatively) skewed.
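The skewness check can be expressed as a comparison of the two parts of the box. The sketch below is a rough heuristic only, with a hypothetical function name `skew_hint`; it ignores the whiskers and outliers, which also carry information about skew.

```python
def skew_hint(q1, median, q3):
    """Rough skewness hint from the median's position inside the box."""
    lower_part = median - q1  # width of the box below the median
    upper_part = q3 - median  # width of the box above the median
    if upper_part > lower_part:
        return "possible right (positive) skew"
    if upper_part < lower_part:
        return "possible left (negative) skew"
    return "box is symmetric about the median"


# Q1 = 5, median = 9, Q3 = 14: the upper part (5) is wider than the lower (4).
print(skew_hint(5, 9, 14))  # → possible right (positive) skew
```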
3. Comparing Boxplots:
When comparing multiple boxplots, look for differences in:
- Medians: A difference in medians indicates a difference in central tendency.
- Spread: Compare the lengths of the boxes and whiskers to identify differences in variability.
- Outliers: Examine the number and positions of outliers for each dataset.
- Skewness: Look for asymmetry in the box and whiskers to assess skewness in the data.
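A numerical comparison can back up what the plots show. The sketch below compares the median and IQR of two datasets; the class scores are invented purely for illustration, `summarise` is a hypothetical helper, and the quartiles again assume the "inclusive" convention of `statistics.quantiles`.

```python
from statistics import quantiles


def summarise(values):
    """Return the median and IQR of a dataset (inclusive quartiles)."""
    q1, med, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return {"median": med, "iqr": q3 - q1}


# Hypothetical test scores for two classes, invented for illustration.
class_a = [52, 55, 58, 60, 61, 63, 65, 68, 70]
class_b = [40, 48, 55, 62, 66, 71, 78, 85, 93]

a, b = summarise(class_a), summarise(class_b)
for name, s in (("Class A", a), ("Class B", b)):
    print(f"{name}: median={s['median']:g}, IQR={s['iqr']:g}")
# → Class A: median=61, IQR=7
# → Class B: median=66, IQR=23
```

In this made-up example, Class B has the higher median but a much wider box, so its scores are higher on average and far less consistent.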
4. Example:
Consider the following dataset: \(2, 4, 5, 7, 9, 12, 14, 18, 22\).
- Step 1: Arrange the data in ascending order (already arranged).
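- Step 2: Find the key values. Here each quartile is the median of its half of the data, with the overall median included in both halves; other quartile conventions can give slightly different values.
  - Minimum: \(2\)
  - \(Q_1\): \(5\)
  - Median: \(9\)
  - \(Q_3\): \(14\)
  - Maximum: \(22\)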
- Step 3: Calculate the IQR: \(Q_3 - Q_1 = 14 - 5 = 9\).
- Step 4: Find thresholds for outliers:
  - Lower threshold: \(5 - 1.5 \times 9 = -8.5\).
  - Upper threshold: \(14 + 1.5 \times 9 = 27.5\).
- Step 5: Identify outliers: There are no outliers, as all data points lie between \(-8.5\) and \(27.5\).
- Step 6: Plot the key values: Draw the box from \(5\) to \(14\) with the median line at \(9\), and extend the whiskers to \(2\) and \(22\).
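The final plotting step can be approximated in text form. The sketch below is a hypothetical helper, `ascii_boxplot`, that maps the five key values of this example onto a single line of characters; it assumes there are no outliers and that the maximum is larger than the minimum, and it is meant only to show where each value falls, not to replace a proper plotting tool.

```python
def ascii_boxplot(low, q1, med, q3, high, width=41):
    """Render a one-line text boxplot with whiskers from low to high."""
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low

    def col(x):
        # Map a data value to a character column in [0, width - 1].
        return round((x - low) * (width - 1) / span)

    line = [" "] * width
    for i in range(col(low), col(high) + 1):
        line[i] = "-"  # whiskers
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="  # box spanning Q1 to Q3
    line[col(low)] = "|"   # end of lower whisker
    line[col(high)] = "|"  # end of upper whisker
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(med)] = "M"   # median marker
    return "".join(line)


# Key values from the example: min 2, Q1 5, median 9, Q3 14, max 22.
print(ascii_boxplot(2, 5, 9, 14, 22))
```

With the default width of 41 characters, each unit on the number line takes two columns, so the longer upper whisker and the wider upper part of the box are visible at a glance.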
5. Summary:
- Boxplots visualise the distribution of a dataset, highlighting key statistics and outliers.
- Use the \(1.5 \times \text{IQR}\) rule to identify outliers, and plot them as individual points beyond the whiskers.
- Compare boxplots to analyse differences in central tendency, spread, and outliers between datasets.