Two-Sample t-Test in R
two-sample t-test, R programming, data analysis, hypothesis testing, mtcars
Introduction
The two-sample t-test is a statistical method used to compare the means of two independent groups. In this post, we explore whether manual and automatic cars in the mtcars dataset differ in their mean miles per gallon (mpg). The test evaluates the null hypothesis that the two group means are equal against the alternative hypothesis that they differ.
Hypotheses
For the two-sample t-test, the hypotheses are formulated as follows, where μ₁ and μ₂ are the population mean mpg of automatic and manual cars, respectively:
- Null Hypothesis (H₀): The means of both groups are equal (μ₁ = μ₂).
- Alternative Hypothesis (H₁): The means differ (μ₁ ≠ μ₂).
Data Subsetting and Preparation
We first subset the data into two groups based on transmission type. In the mtcars dataset, the variable am indicates the transmission type:
- am == 0 for automatic
- am == 1 for manual
# Subset mpg values for automatic (am == 0) and manual (am == 1) cars
auto <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
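Before running the test, it can help to confirm the group sizes and check the normality assumption raised in the next section. The sketch below uses the Shapiro-Wilk test (`shapiro.test()`) and normal Q-Q plots from base R; these are one common choice, not necessarily what the original analysis used, and `sw_auto` / `sw_manual` are illustrative names. With only 19 and 13 observations, the Q-Q plots are often more informative than the test's p-value.

```r
# Subset mpg by transmission type (am: 0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Group sizes: confirm the split before testing
table(mtcars$am)
c(automatic = length(auto), manual = length(manual))

# Shapiro-Wilk test of normality within each group
# (a small p-value suggests a departure from normality)
sw_auto   <- shapiro.test(auto)
sw_manual <- shapiro.test(manual)
c(automatic = sw_auto$p.value, manual = sw_manual$p.value)

# Normal Q-Q plots side by side as a visual check
op <- par(mfrow = c(1, 2))
qqnorm(auto, main = "Automatic"); qqline(auto)
qqnorm(manual, main = "Manual"); qqline(manual)
par(op)  # restore the previous plotting layout
```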
Performing the Two-Sample t-Test
Verify that the data within each group are approximately normally distributed. When sample sizes are small, deviations from normality can affect the reliability of the t-test.
Assuming equal variances between the two groups, we conduct the two-sample t-test using the following code:
# Perform two-sample t-test assuming equal variances
t_test_result <- t.test(auto, manual, var.equal = TRUE)
t_test_result
Two Sample t-test
data: auto and manual
t = -4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.84837 -3.64151
sample estimates:
mean of x mean of y
17.14737 24.39231
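For reference, the equal-variance (pooled) t-test behind this output can be written as follows, with group 1 the automatic cars (x in the output) and group 2 the manual cars (y). This is the standard textbook form, given here as a sketch rather than taken from the original post:

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
\mathrm{df} = n_1 + n_2 - 2 = 19 + 13 - 2 = 30

\text{95\% CI for } \mu_1 - \mu_2: \quad
(\bar{x}_1 - \bar{x}_2) \pm t_{0.975,\,\mathrm{df}} \; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
```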
Interpretation of Results
t-Statistic and Degrees of Freedom:
The t-statistic is -4.1061 with 30 degrees of freedom (19 + 13 - 2 = 30). The negative value indicates that the mean mpg for automatic cars is lower than that for manual cars.
p-Value:
The p-value of 0.000285 is much smaller than the conventional significance level of 0.05. This provides strong evidence against the null hypothesis, indicating a statistically significant difference between the means of the two groups.
Confidence Interval:
The 95% confidence interval for the difference in means (automatic minus manual), from -10.84837 to -3.64151, does not include 0, which reinforces the conclusion that the group means differ.
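To make these numbers concrete, the sketch below recomputes the pooled-variance t-statistic, degrees of freedom, p-value, and confidence interval by hand and compares them with `t.test()`. The variable names (`dof`, `sp2`, `se_diff`, and so on) are illustrative; each `all.equal()` comparison should return `TRUE`.

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

n1  <- length(auto)
n2  <- length(manual)
dof <- n1 + n2 - 2                                   # 19 + 13 - 2 = 30

# Pooled variance: sample variances weighted by their degrees of freedom
sp2     <- ((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / dof
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))             # SE of the mean difference

diff_means <- mean(auto) - mean(manual)              # automatic minus manual
t_stat <- diff_means / se_diff
p_val  <- 2 * pt(-abs(t_stat), dof)                  # two-sided p-value
ci     <- diff_means + c(-1, 1) * qt(0.975, dof) * se_diff

c(t = t_stat, df = dof, p = p_val)
ci

# Compare with the built-in equal-variance test
tt <- t.test(auto, manual, var.equal = TRUE)
all.equal(t_stat, unname(tt$statistic))
all.equal(p_val, tt$p.value)
all.equal(ci, as.numeric(tt$conf.int))
```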
Checking the Equality of Variances
Because the test above assumed equal variances, it is important to check whether that assumption holds before finalizing the analysis. We use Levene's test for this purpose; it is available as leveneTest() in the car package.
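If the car package is not available, the median-centered statistic that `leveneTest()` reports can, as we understand its definition, be reproduced in base R: it is a one-way ANOVA on the absolute deviations of each observation from its group median (often called the Brown-Forsythe variant). The sketch below assumes that definition; `grp`, `meds`, `abs_dev`, and `levene_manual` are hypothetical names introduced here, and its F value should match the `leveneTest()` output if the assumption holds.

```r
# Group factor with readable labels (am: 0 = automatic, 1 = manual)
grp <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Median mpg of each group, then each car's absolute deviation from its group median
meds    <- tapply(mtcars$mpg, grp, median)
abs_dev <- abs(mtcars$mpg - meds[grp])   # indexing by a factor uses its integer codes

# One-way ANOVA on the absolute deviations
levene_manual <- anova(lm(abs_dev ~ grp))
levene_manual

F_value <- levene_manual[["F value"]][1]
p_value <- levene_manual[["Pr(>F)"]][1]
```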
Always check for homogeneity of variances using tests like Levene's test. If the test indicates unequal variances, consider using Welch's t-test by setting var.equal = FALSE.
# Perform Levene's Test to check homogeneity of variances (requires the car package)
library(car)
levene_result <- leveneTest(mpg ~ factor(am), data = mtcars)
levene_result
Levene's Test for Homogeneity of Variance (center = median)
      Df F value  Pr(>F)
group  1  4.1876 0.04957 *
      30
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value of 0.04957 falls just below the conventional 0.05 threshold, so the evidence that the variances differ is borderline. In such cases, it is advisable to use Welch's t-test (by setting var.equal = FALSE), which does not assume equal variances.
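A minimal sketch of the Welch alternative, with `auto` and `manual` redefined so the snippet runs on its own. It also recomputes the Welch-Satterthwaite degrees of freedom by hand to show why they are smaller than the pooled value of 30; `welch_result`, `v1`, `v2`, and `welch_df` are illustrative names, not from the original post.

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Welch's t-test: var.equal = FALSE (this is also the default in t.test())
welch_result <- t.test(auto, manual, var.equal = FALSE)
welch_result

# Welch-Satterthwaite degrees of freedom, computed from each group's squared SE
v1 <- var(auto) / length(auto)
v2 <- var(manual) / length(manual)
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(auto) - 1) + v2^2 / (length(manual) - 1))

welch_df
all.equal(welch_df, unname(welch_result$parameter))
```

Because the Welch degrees of freedom depend on the sample variances, they are usually not an integer.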
Comparison of MPG between Automatic and Manual Cars
We first calculated summary statistics for miles per gallon (mpg) by transmission type (am). Using dplyr, we computed the mean, standard deviation, sample size, standard error, and 95% confidence interval for each group.
Next, we created a visualization using ggplot2 that combines violin plots (to show the distribution), boxplots (to display quartiles), and jittered points (for individual observations). The plot also includes a title, axis labels, and a caption reporting the p-value from the two-sample t-test. The color scheme distinguishes automatic from manual transmissions, making the comparison visually intuitive.
Beyond statistical significance, consider reporting the effect size to assess the practical significance of the difference between groups.
# Load the packages used below
library(dplyr)
library(ggplot2)

# Create summary statistics for each group
mpg_summary <- mtcars %>%
  group_by(am) %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n(),
    se = sd_mpg / sqrt(n),
    ci_lower = mean_mpg - qt(0.975, n - 1) * se,
    ci_upper = mean_mpg + qt(0.975, n - 1) * se
  )
mpg_summary

# Generate the visualization (t_test_result comes from the t-test above)
ggplot(mtcars,
       aes(x = factor(am, labels = c("Automatic", "Manual")),
           y = mpg,
           fill = factor(am))) +
  geom_violin(alpha = 0.4, trim = FALSE) +
  geom_boxplot(width = 0.1, outlier.shape = NA, alpha = 0.5) +
  geom_jitter(width = 0.15, alpha = 0.7, size = 2) +
  # Group means with their 95% confidence intervals, taken from mpg_summary
  geom_pointrange(data = mpg_summary,
                  aes(x = factor(am, labels = c("Automatic", "Manual")),
                      y = mean_mpg, ymin = ci_lower, ymax = ci_upper),
                  inherit.aes = FALSE, colour = "black",
                  position = position_nudge(x = 0.25)) +
  labs(title = "Comparison of MPG between Automatic and Manual Cars",
       subtitle = "Violin plots, boxplots, and 95% Confidence Intervals for Group Means",
       x = "Transmission Type",
       y = "Miles Per Gallon (MPG)",
       caption = paste("Two-sample t-test (equal variances) p-value:",
                       signif(t_test_result$p.value, 3))) +
  # guide = "none" replaces the deprecated guide = FALSE
  scale_fill_manual(values = c("orange", "skyblue"), guide = "none") +
  theme_minimal(base_size = 14)
For the Automatic cars group (am = 0), the average miles per gallon (mpg) is 17.15, with a standard deviation of 3.83 based on 19 observations. The estimated standard error of the mean is 0.88, yielding a 95% confidence interval ranging from 15.30 to 19.00 mpg. This interval suggests that we can be 95% confident that the true mean mpg for automatic cars lies between these values.
For the Manual cars group (am = 1), the average mpg is considerably higher at 24.39, with a standard deviation of 6.17 calculated from 13 observations. The standard error for this group is 1.71, and the corresponding 95% confidence interval spans from 20.67 to 28.12 mpg.
The two confidence intervals do not overlap (the automatic interval ends at 19.00 mpg, while the manual interval starts at 20.67 mpg), which is consistent with the significant t-test result: the mean mpg for manual cars is substantially higher than that for automatic cars. Note that overlapping intervals would not by themselves rule out a significant difference; the t-test remains the formal comparison.
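Following the earlier suggestion to report an effect size, here is a minimal sketch of Cohen's d, one common choice (the original post does not say which measure it has in mind). It standardizes the difference in means by the pooled standard deviation used in the equal-variance t-test; `sd_pooled` and `cohens_d` are illustrative names. By Cohen's conventional benchmarks, values of about 0.8 or above are usually described as large.

```r
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

n1 <- length(auto)
n2 <- length(manual)

# Pooled standard deviation (same pooling as the equal-variance t-test)
sd_pooled <- sqrt(((n1 - 1) * var(auto) + (n2 - 1) * var(manual)) / (n1 + n2 - 2))

# Cohen's d: standardized mean difference, manual minus automatic
cohens_d <- (mean(manual) - mean(auto)) / sd_pooled
round(cohens_d, 2)
```

Since both quantities use the same pooled standard deviation, d can also be recovered from the t-statistic as -t * sqrt(1/n1 + 1/n2).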
Conclusion
The two-sample t-test conducted on the mtcars dataset indicates a statistically significant difference in mpg between automatic and manual cars. With a t-statistic of -4.1061 and a p-value of 0.000285, the mean mpg for manual cars is significantly higher than that for automatic cars. Because Levene's test for equality of variances gives a borderline result (p ≈ 0.04957), it is advisable to confirm the analysis with Welch's t-test if variance heterogeneity is a concern. Overall, the analysis supports the conclusion that transmission type is associated with a significant difference in fuel efficiency.