Two-Sample t-Test in R

Statistics
R
Data Analysis
Hypothesis Testing
Author

Farhan Khalid

Published

February 6, 2025

Keywords

two-sample t-test, R programming, data analysis, hypothesis testing, mtcars

Introduction

The two-sample t-test is a statistical method used to compare the means of two independent groups. In this post, we explore whether manual and automatic cars in the mtcars dataset have different miles per gallon (mpg) values. The test examines the null hypothesis that the two group means are equal, against the alternative hypothesis that they differ.

Hypotheses

For the two-sample t-test, the hypotheses are formulated as follows:

  • Null Hypothesis (H₀): The means of both groups are equal (μ₁ = μ₂).
  • Alternative Hypothesis (H₁): The means differ (μ₁ ≠ μ₂).

Data Subsetting and Preparation

We first subset the data into two groups based on the transmission type. In the mtcars dataset, the variable am indicates the transmission type:
- am == 0 for automatic
- am == 1 for manual

# Subset data for manual and automatic cars
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

Performing the Two-Sample t-Test

Assumption of Normality

Verify that the data within each group are approximately normally distributed. When sample sizes are small, deviations from normality can affect the reliability of the t-test.

Assuming equal variances between the two groups, we conduct the two-sample t-test using the following code:

# Perform two-sample t-test assuming equal variances
t_test_result <- t.test(auto, manual, var.equal = TRUE)
t_test_result

    Two Sample t-test

data:  auto and manual
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.84837  -3.64151
sample estimates:
mean of x mean of y 
 17.14737  24.39231 
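The normality assumption noted earlier can be checked informally before relying on the result. The following is a minimal sketch, assuming the Shapiro-Wilk test and Q-Q plots are acceptable diagnostics for samples this small; the object names shapiro_auto and shapiro_manual are introduced here for illustration and are not part of the original analysis.

```r
# Re-create the two groups so this snippet runs on its own
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Shapiro-Wilk test: the null hypothesis is that the sample
# comes from a normal distribution
shapiro_auto <- shapiro.test(auto)
shapiro_manual <- shapiro.test(manual)
shapiro_auto
shapiro_manual

# Q-Q plots as a visual complement to the formal tests
par(mfrow = c(1, 2))
qqnorm(auto, main = "Automatic")
qqline(auto)
qqnorm(manual, main = "Manual")
qqline(manual)
par(mfrow = c(1, 1))
```

A small Shapiro-Wilk p-value would point to a departure from normality. With only 19 and 13 observations the test has limited power, so the Q-Q plots are worth reading alongside it.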

Interpretation of Results

  • t-Statistic and Degrees of Freedom:
    The t-statistic is -4.1061 with 30 degrees of freedom. The negative value indicates that the mean mpg for automatic cars is lower than that for manual cars.

  • p-Value:
    The p-value of 0.000285 is much smaller than the conventional significance level of 0.05. This provides strong evidence against the null hypothesis, suggesting that there is a statistically significant difference between the means of the two groups.

  • Confidence Interval:
    The 95% confidence interval for the difference in means (automatic minus manual), from -10.84837 to -3.64151, does not include 0, further reinforcing the conclusion that the group means differ.
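To see where these numbers come from, the equal-variance t-statistic can be reproduced by hand from the pooled variance. This is a sketch of the standard textbook formula rather than a replacement for t.test(); the names ending in _by_hand are introduced here for illustration.

```r
# Re-create the two groups so this snippet runs on its own
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n1 <- length(auto)
n2 <- length(manual)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var <- ((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2)

# Standard error of the difference in means under equal variances
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-statistic, degrees of freedom, two-sided p-value, and 95% confidence interval
mean_diff <- mean(auto) - mean(manual)
t_by_hand <- mean_diff / se_diff
df_by_hand <- n1 + n2 - 2
p_by_hand <- 2 * pt(-abs(t_by_hand), df = df_by_hand)
ci_by_hand <- mean_diff + c(-1, 1) * qt(0.975, df_by_hand) * se_diff

c(t = t_by_hand, df = df_by_hand, p = p_by_hand)
ci_by_hand
```

These values should match the t.test() output shown above, which makes the role of the equal-variance assumption explicit: it enters only through pooled_var.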

Checking the Equality of Variances

Before finalizing our analysis, it is important to check whether the assumption of equal variances holds. We use Levene’s Test for this purpose. Note that Levene’s test is available in the car package.

Variance Equality

Always check for homogeneity of variances using tests like Levene’s Test. If the test indicates unequal variances, consider using Welch’s t-test by setting var.equal = FALSE.

# Load the car package, which provides leveneTest()
library(car)
# Perform Levene's Test to check homogeneity of variances
levene_result <- leveneTest(mpg ~ factor(am), data = mtcars)
levene_result
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)  
group  1  4.1876 0.04957 *
      30                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of approximately 0.04957 falls just below the 0.05 threshold, so the test rejects the hypothesis of equal variances at the 5% level, though only marginally. With such a borderline result, it is safer to use Welch’s t-test (by setting var.equal = FALSE), which does not assume equal variances.
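Welch’s t-test can be run with the same t.test() function. The sketch below assumes the same two groups as before; welch_result is a name introduced here for illustration.

```r
# Re-create the two groups so this snippet runs on its own
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Welch's t-test: var.equal = FALSE is also the default for t.test()
welch_result <- t.test(auto, manual, var.equal = FALSE)
welch_result
```

In the output, the degrees of freedom are no longer a whole number, because Welch’s test estimates them from the two sample variances instead of pooling them. If the p-value and confidence interval lead to the same conclusion as the equal-variance test, the finding does not hinge on the variance assumption.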

Comparison of MPG between Automatic and Manual Cars

We first calculated summary statistics for miles per gallon (mpg) based on transmission type (am). Using dplyr, we computed the mean, standard deviation, sample size, standard error, and 95% confidence intervals for each group.

Next, we created a comprehensive visualization using ggplot2, combining violin plots (to show the distribution), boxplots (to display quartiles), and jittered points (for individual observations). Additionally, the plot includes a title, labels, and a caption displaying the p-value from the two-sample t-test. The color scheme distinguishes between automatic and manual transmissions, making the comparison visually intuitive.

Interpretation of Effect Size

Beyond statistical significance, consider reporting the effect size to assess the practical significance of the difference between groups.

# Load packages for data manipulation and plotting
library(dplyr)
library(ggplot2)

# Create summary statistics for each group
mpg_summary <- mtcars %>%
          group_by(am) %>%
          summarise(
                    mean_mpg = mean(mpg),
                    sd_mpg = sd(mpg),
                    n = n(),
                    se = sd_mpg / sqrt(n),
                    ci_lower = mean_mpg - qt(0.975, n - 1) * se,
                    ci_upper = mean_mpg + qt(0.975, n - 1) * se
          )
mpg_summary

# Generate the visualization (uses t_test_result from the t-test above)
ggplot(mtcars, 
       aes(x = factor(am, labels = c("Automatic", "Manual")), 
           y = mpg, 
           fill = factor(am))) +
          geom_violin(alpha = 0.4, trim = FALSE) +
          geom_boxplot(width = 0.1, outlier.shape = NA, alpha = 0.5) +
          geom_jitter(width = 0.15, alpha = 0.7, size = 2) +
          labs(title = "Comparison of MPG between Automatic and Manual Cars",
               subtitle = "Violin plots, boxplots, and individual observations",
               x = "Transmission Type",
               y = "Miles Per Gallon (MPG)",
               caption = paste("Two-sample t-test (equal variances) p-value:",
                               signif(t_test_result$p.value, 3))) +
          # guide = "none" hides the legend; guide = FALSE is deprecated in ggplot2
          scale_fill_manual(values = c("orange", "skyblue"), guide = "none") +
          theme_minimal(base_size = 14)

For the Automatic cars group (am = 0), the average miles per gallon (mpg) is 17.15, with a standard deviation of 3.83 based on 19 observations. The estimated standard error of the mean is 0.88, yielding a 95% confidence interval ranging from 15.30 to 19.00 mpg. This interval suggests that we can be 95% confident that the true mean mpg for automatic cars lies between these values.

For the Manual cars group (am = 1), the average mpg is considerably higher at 24.39, with a standard deviation of 6.17 calculated from 13 observations. The standard error for this group is 1.71, and the corresponding 95% confidence interval spans from 20.67 to 28.12 mpg.

The two confidence intervals do not overlap: the upper bound for automatic cars (19.00) lies below the lower bound for manual cars (20.67). This is consistent with the statistically significant difference in fuel efficiency found by the t-test, which remains the formal basis for that conclusion. Specifically, the mean mpg for manual cars is substantially higher than that for automatic cars.
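To report an effect size alongside the test, one common choice is Cohen’s d, the difference in means divided by the pooled standard deviation. The sketch below computes it in base R; the choice of Cohen’s d and the names pooled_sd and cohens_d are assumptions made here for illustration, not part of the original analysis.

```r
# Re-create the two groups so this snippet runs on its own
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n1 <- length(auto)
n2 <- length(manual)

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))

# Cohen's d: standardized difference in means (manual minus automatic)
cohens_d <- (mean(manual) - mean(auto)) / pooled_sd
cohens_d
```

On this data d comes out at about 1.48. By Cohen’s conventional rule of thumb, values above 0.8 count as large, although such cutoffs are only rough guides.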

Conclusion

The two-sample t-test conducted on the mtcars dataset suggests a statistically significant difference in mpg between automatic and manual cars. The t-test results, with a t-statistic of -4.1061 and a p-value of 0.000285, indicate that the mean mpg for manual cars is significantly higher than that for automatic cars. Because Levene’s Test for equality of variances gives a borderline result (p ≈ 0.04957, just below the 0.05 threshold), it is advisable to confirm the analysis with Welch’s t-test, which does not assume equal variances. Overall, the analysis supports the conclusion that transmission type is associated with a significant difference in fuel efficiency.
