Welch’s t-Test: The Reliable Way to Compare 2 Population Means with Unequal Variances | by Vito Rihaldijiran | Jun, 2024

Discover why Welch’s t-Test is the go-to method for accurate statistical comparison, even when variances differ.

Vito Rihaldijiran
Towards Data Science
Photo by Simon Maage on Unsplash

Part 1: Background

In the first semester of my postgrad, I took the course STAT7055: Introductory Statistics for Business and Finance. The course was exhausting at times, but what I learned about applying statistical methods in different situations was priceless. In the 8th week of lectures, one topic caught my attention: hypothesis testing for comparing two populations. I found it fascinating how the approach differs depending on whether the samples are independent or paired and on whether the population variances are known or unknown, and how to test two proportions. However, one scenario wasn’t covered in the material, and it kept me wondering: how do we test two population means when the variances are unknown and unequal? The method for this case is known as the Welch t-Test.

To see how the Welch t-Test is applied, we will work through an example case. Every stage of the process uses the same real-world dataset.

Part 2: The Dataset

The dataset I’m using contains real-world data on World Agricultural Supply and Demand Estimates (WASDE) that are regularly updated. The WASDE dataset is put together by the World Agricultural Outlook Board (WAOB). It is a monthly report that provides annual predictions for various global regions and the United States when it comes to wheat, rice, coarse grains, oilseeds, and cotton. Furthermore, the dataset also covers forecasts for sugar, meat, poultry, eggs, and milk in the United States. It is sourced from the Nasdaq website, and you are welcome to access it for free here: WASDE dataset. There are 3 datasets, but I only use the first one, which is the Supply and Demand Data. Column definitions can be seen here:

Figure 1: Column Definitions by NASDAQ

I am going to use two different samples from specific regions, commodities, and items to simplify the testing process. Additionally, we will be using the R Programming Language for the end-to-end procedure.

Now let’s prepare the data:


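# Load dplyr for %>%, select() and filter()
library(dplyr)

# Read and preprocess the dataframe
# NOTE: wasde_path is a placeholder; set it to wherever you saved the
# Supply and Demand file (read.csv assumes a CSV export)
wasde_data <- read.csv(wasde_path) %>%
  select(-min_value, -max_value, -year, -period) %>%
  filter(item == "Production", commodity == "Wheat")

# Filter data for Argentina and Australia
wasde_argentina <- wasde_data %>%
  filter(region == "Argentina")

wasde_oz <- wasde_data %>%
  filter(region == "Australia")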

I split the data into two samples by region, namely Argentina and Australia. The focus is wheat production.

Now we’re set. But wait..

Before delving further into the application of the Welch t-Test, I can’t help but wonder why it is necessary to test whether the two population variances are equal or not.

Part 3: Testing Equality of Variances

When comparing two population means without knowing the population variances, it’s important to check whether the variances are equal in order to select the appropriate statistical test. If the variances turn out to be equal, we opt for the pooled variance t-test; otherwise, we use Welch’s t-test. Using the wrong test can lead to wrong conclusions, because the actual Type I and Type II error rates may differ from the ones we intended. Checking the variances first makes sure the test’s assumptions match the data, which leads to more dependable and valid conclusions.

Then how do we test the two population variances?

We have to generate two hypotheses as below:

Figure 2: null and alternative hypotheses for testing equality variances by author
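In case the figure does not render, here is a sketch of the two hypotheses in LaTeX. The symbols are my own notation (an assumption, since the original figure is an image): \sigma_1^2 and \sigma_2^2 stand for the population variances of the two groups.

```latex
H_0 : \sigma_1^2 = \sigma_2^2
\qquad \text{versus} \qquad
H_1 : \sigma_1^2 \neq \sigma_2^2
```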

The rule of thumb is very simple:

  1. If the test statistic falls into the rejection region, we reject H0 (the null hypothesis).
  2. Otherwise, we fail to reject H0.

We can set the hypotheses like this:

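# Hypotheses: Variance Comparison
h0_variance <- "Population variance of Wheat production in Argentina equals that in Australia"
h1_variance <- "Population variance of Wheat production in Argentina differs from that in Australia"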

Now we need a test statistic. How do we get it? We use the F-test.

An F-test is any statistical test whose test statistic follows an F-distribution when the null hypothesis is true and the customary assumptions about the error term hold. Here we use it to compare the variances of two samples through the ratio of their sample variances.

Figure 3: Illustration Probability Density Function (PDF) of F Distribution by Wikipedia

We obtain the test statistic by dividing one sample variance by the other:

Figure 4: F test formula by author

The rejection region is:

Figure 5: Rejection Region of F test by author

Here n is the sample size and α is the significance level. When the F value falls into either of these rejection regions, we reject the null hypothesis.
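For readers who cannot see the figures, a sketch of the F statistic and its two-sided rejection region in LaTeX follows. The notation is my assumption: s_1^2 and s_2^2 are the sample variances, n_1 and n_2 the sample sizes, and F_{p,\,a,\,b} is the value of the F-distribution with a and b degrees of freedom that leaves an area of p in the upper tail.

```latex
F = \frac{s_1^2}{s_2^2}

\text{Reject } H_0 \text{ if } \quad
F > F_{\alpha/2,\; n_1 - 1,\; n_2 - 1}
\quad \text{or} \quad
F < F_{1 - \alpha/2,\; n_1 - 1,\; n_2 - 1}
```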


The trick is this: the labeling of sample 1 and sample 2 is arbitrary, so we can always place the larger sample variance on top. This way, our F-statistic will always be at least 1, and we only need to refer to the upper cut-off: we reject H0 at significance level α whenever the F-statistic exceeds that cut-off.

In R, the sample variances and the F-statistic are computed as follows:
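Here is a minimal sketch of that trick in R. It runs on simulated numbers rather than the WASDE data, and the names x, y, f_stat, df_num and df_den are made up for illustration. Note that the degrees of freedom have to follow whichever sample ends up in the numerator.

```r
# Simulated stand-ins for the two samples (not the WASDE data)
set.seed(42)
x <- rnorm(30, mean = 20, sd = 3)
y <- rnorm(25, mean = 20, sd = 6)

alpha <- 0.05
var_x <- var(x)
var_y <- var(y)

# Put the larger sample variance in the numerator; its df goes first
if (var_x >= var_y) {
  f_stat <- var_x / var_y
  df_num <- length(x) - 1
  df_den <- length(y) - 1
} else {
  f_stat <- var_y / var_x
  df_num <- length(y) - 1
  df_den <- length(x) - 1
}

# Only the upper cut-off is needed, still at alpha / 2 because the
# underlying test is two-sided
f_upper <- qf(1 - alpha / 2, df_num, df_den)
reject_h0 <- f_stat > f_upper

cat("F =", f_stat, "| upper cut-off =", f_upper, "| reject H0:", reject_h0, "\n")
```

The decision should agree with a two-sided F-test at the same α, because flipping the ratio simply maps the lower rejection region onto the upper one.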

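# Calculate sample variances
# (assumes the estimates are stored in the `value` column)
sample_var_argentina <- var(wasde_argentina$value)
sample_var_oz <- var(wasde_oz$value)

# Calculate F calculated value
# (the two-sided rule used afterwards works for either ordering of the ratio)
f_calculated <- sample_var_argentina / sample_var_oz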

We’ll use a 5% significance level (α = 0.05), so the decision rule is:

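# Define significance level and degrees of freedom
alpha <- 0.05
alpha_half <- alpha / 2
n1 <- nrow(wasde_argentina)
n2 <- nrow(wasde_oz)
df1 <- n1 - 1
df2 <- n2 - 1

# Calculate critical F values
f_value_lower <- qf(alpha_half, df1, df2)
f_value_upper <- qf(1 - alpha_half, df1, df2)

# Variance comparison result
if (f_calculated > f_value_lower && f_calculated < f_value_upper) {
  cat("Fail to Reject H0: ", h0_variance, "\n")
  equal_variances <- TRUE
} else {
  cat("Reject H0: ", h1_variance, "\n")
  equal_variances <- FALSE
}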

The result is that we reject the null hypothesis at the 5% significance level. In other words, the data provide evidence that the variances of the two populations are not equal. Now we know why we should use the Welch t-Test instead of the pooled variance t-Test.
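As a sanity check, base R ships `var.test()` in the stats package, which runs the same F-test in one call. The sketch below uses simulated data (not the WASDE samples), and the names sample_a and sample_b are made up, so treat it as an illustration of the idea rather than a reproduction of the result above.

```r
# Simulated stand-ins for the two samples (not the WASDE data)
set.seed(1)
sample_a <- rnorm(40, mean = 15, sd = 2)
sample_b <- rnorm(40, mean = 15, sd = 5)

# Manual F statistic: ratio of the two sample variances
f_manual <- var(sample_a) / var(sample_b)

# Built-in two-sided F-test for equality of variances
result <- var.test(sample_a, sample_b, alternative = "two.sided")

cat("Manual F:   ", f_manual, "\n")
cat("var.test F: ", unname(result$statistic), "\n")
cat("p-value:    ", result$p.value, "\n")
```

If the p-value is below the chosen α, the conclusion matches the rejection-region approach: reject the hypothesis of equal variances.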

Part 4: The main course, Welch t-Test

The Welch t-test, also called Welch’s unequal variances t-test, is a statistical method for comparing the means of two independent samples. Unlike the standard pooled variance t-test, it does not assume that the two populations have equal variances. Instead, it estimates each variance separately and adjusts the degrees of freedom accordingly, which gives a more accurate evaluation of the difference between the two sample means when the variances differ. This makes the Welch t-test a dependable choice for real-world data, where the equal variances assumption often does not hold.

The test statistic formula is:

Figure 6: test statistic formula of Welch t-Test by author


The degrees of freedom are defined like this:

Figure 7: Degree of Freedom formula by author
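Since the two formulas above are images, here is a LaTeX sketch of the standard Welch statistic and the Welch–Satterthwaite degrees of freedom. The symbols are my assumption about the figures' notation: \bar{x}_i, s_i^2 and n_i are the sample mean, sample variance and sample size of group i, and the null hypothesis is that the two population means are equal.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
           {\dfrac{\left( s_1^2 / n_1 \right)^{2}}{n_1 - 1}
            + \dfrac{\left( s_2^2 / n_2 \right)^{2}}{n_2 - 1}}
```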

The rejection region for the Welch t-test depends on the chosen significance level and whether the test is one-tailed or two-tailed.

Two-tailed test: The null hypothesis is rejected if the absolute value of the test statistic |t| is greater than the critical value from the t-distribution with ν degrees of freedom at α/2.

One-tailed test: The null hypothesis is rejected if the test statistic t is greater than the critical value from the t-distribution with ν degrees of freedom at α for an upper-tailed test, or if t is less than the negative critical value for a lower-tailed test.

  • Upper-tailed test: t > tα,ν
  • Lower-tailed test: t < −tα,ν
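The critical values in these rules come from the quantile function of the t-distribution, `qt()` in R. Below is a small sketch; the value of nu is a made-up example of Welch degrees of freedom, which usually are not a whole number.

```r
alpha <- 0.05
nu <- 17.3  # hypothetical Welch degrees of freedom (need not be an integer)

# Two-tailed test: reject H0 if |t| exceeds this value
t_crit_two_tailed <- qt(1 - alpha / 2, df = nu)

# Upper-tailed test: reject H0 if t exceeds this value
t_crit_upper <- qt(1 - alpha, df = nu)

# Lower-tailed test: reject H0 if t is below this (negative) value
t_crit_lower <- qt(alpha, df = nu)

cat("Two-tailed:", t_crit_two_tailed, "\n")
cat("Upper-tailed:", t_crit_upper, "\n")
cat("Lower-tailed:", t_crit_lower, "\n")
```

Because the t-distribution is symmetric around zero, the lower-tailed cut-off is just the negative of the upper-tailed one.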

So let’s do one example with One-tailed Welch t-Test.

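Let’s generate the hypotheses:

h0_mean <- "Population mean of Wheat production in Argentina equals that in Australia"
h1_mean <- "Population mean of Wheat production in Argentina is greater than that in Australia"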

This is an upper-tailed test, so the rejection region is: t > tα,ν

Applying the formulas given above at the same significance level (0.05):

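# Calculate sample means
# (assumes the estimates are stored in the `value` column)
sample_mean_argentina <- mean(wasde_argentina$value)
sample_mean_oz <- mean(wasde_oz$value)

# Welch's t-test (unequal variances)
# s1 and s2 hold the sample variances of the two groups
s1 <- sample_var_argentina
s2 <- sample_var_oz
t_calculated <- (sample_mean_argentina - sample_mean_oz) / sqrt(s1 / n1 + s2 / n2)
df <- (s1 / n1 + s2 / n2)^2 / ((s1 / n1)^2 / (n1 - 1) + (s2 / n2)^2 / (n2 - 1))
t_value <- qt(1 - alpha, df)  # upper-tailed critical value

# Mean comparison result
if (t_calculated > t_value) {
  cat("Reject H0: ", h1_mean, "\n")
} else {
  cat("Fail to Reject H0: ", h0_mean, "\n")
}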

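The result is that we fail to reject H0 at the 5% significance level: the data do not provide sufficient evidence against the hypothesis that the population mean of wheat production in Argentina equals that in Australia. Keep in mind that failing to reject H0 does not prove the two means are equal.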
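If you want to double-check a manual calculation like this one, base R’s `t.test()` performs the Welch test when `var.equal = FALSE` (which is its default). The sketch below uses simulated data instead of the WASDE samples, and the names group_1 and group_2 are made up for illustration.

```r
# Simulated stand-ins for the two samples (not the WASDE data)
set.seed(7)
group_1 <- rnorm(35, mean = 18, sd = 3)
group_2 <- rnorm(28, mean = 17, sd = 7)

n1 <- length(group_1)
n2 <- length(group_2)
v1 <- var(group_1)
v2 <- var(group_2)

# Manual Welch statistic and Welch-Satterthwaite degrees of freedom
t_manual <- (mean(group_1) - mean(group_2)) / sqrt(v1 / n1 + v2 / n2)
df_manual <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Built-in upper-tailed Welch test (H1: mean of group_1 > mean of group_2)
welch <- t.test(group_1, group_2, alternative = "greater", var.equal = FALSE)

cat("Manual t:", t_manual, "| t.test t:", unname(welch$statistic), "\n")
cat("Manual df:", df_manual, "| t.test df:", unname(welch$parameter), "\n")
cat("p-value:", welch$p.value, "\n")
```

The manual statistic and degrees of freedom should match what `t.test()` reports, and comparing the p-value with α gives the same decision as the rejection-region rule.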

That’s how to conduct Welch t-Test. Now your turn. Happy experimenting!

Part 5: Conclusion

When comparing two population means during hypothesis testing, it is important to start by checking whether the variances are equal. This initial step helps in deciding which statistical test to use, which in turn supports precise and dependable outcomes. If it turns out that the variances are indeed equal, you can go ahead and apply the standard t-test with pooled variances. However, in cases where the variances are not equal, it is recommended to go with Welch’s t-test.

Welch’s t-test provides a strong solution for comparing means when the assumption of equal variances does not hold. By adjusting the degrees of freedom to account for the unequal variances, Welch’s t-test gives a more accurate and dependable evaluation of the statistical significance of the difference between two sample means. This adaptability makes it a popular choice in practical situations where sample sizes and variances can differ substantially.

In conclusion, checking for equality of variances and using Welch’s t-test when needed supports the validity of hypothesis testing. This approach helps keep the Type I and Type II error rates close to their intended levels, resulting in more reliable conclusions. By selecting the appropriate test based on the equality of variances, we can analyze the findings with more confidence and make well-informed decisions grounded in empirical evidence.

