Johnny Deng's Column: The Wilcoxon Rank Sum Statistic II

The Wilcoxon Rank Sum Statistic

This test is also called the Mann-Whitney U test, the Mann-Whitney-Wilcoxon (MWW) test, or the Wilcoxon-Mann-Whitney test.

Let W be the sum of the ranks of the observations in the first sample (in our example, the weed-free plots), where the ranks are assigned in the combined sample of N = n_1 + n_2 observations. If the two populations have the same continuous distribution and all observations take different values (i.e., there are no ties when the observations are ranked), the exact distribution of W has mean

m_w = n_1*(N+1)/2

and standard deviation:

s_w = SQRT[n_1*n_2*(N+1)/12]
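
The two formulas above can be checked with a few lines of Python. This is only a minimal sketch: the function name rank_sum_moments is our own, and the sample sizes passed in (two samples of four observations each) are chosen for illustration.

```python
import math

def rank_sum_moments(n1, n2):
    """Mean and standard deviation of W under the null hypothesis (no ties)."""
    N = n1 + n2                                # combined sample size
    mean_w = n1 * (N + 1) / 2                  # m_w = n_1*(N+1)/2
    sd_w = math.sqrt(n1 * n2 * (N + 1) / 12)   # s_w = SQRT[n_1*n_2*(N+1)/12]
    return mean_w, sd_w

mean_w, sd_w = rank_sum_moments(4, 4)
print(mean_w, round(sd_w, 4))  # 18.0 3.4641
```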

The Wilcoxon rank sum test rejects the hypothesis that the two populations have identical distributions when the observed rank sum W is far from its mean.

The Normal Approximation

The rank sum statistic W becomes approximately normal as the two sample sizes increase. That is, we can use the z-statistic

z = (W - m_w)/ s_w = [W - n_1*(N+1)/2 ] / SQRT[n_1*n_2*(N+1)/12]

to carry out the test.

For a fixed level alpha test, reject Ho if:

z > z* when Ha : m₁ > m₂
z < -z* when Ha : m₁ < m₂
|z| > z* when Ha : m₁ ≠ m₂

Note: z* represents the corresponding upper critical value of a standard normal distribution: the upper alpha point for the one-sided tests, and the upper alpha/2 point for the two-sided test.
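
The three rejection rules can be sketched in Python using the standard library's NormalDist. The function name reject_h0 and the labels "greater", "less" and "two-sided" are our own choices for this illustration, not part of any statistics package, and the sketch assumes the two-sided rule uses the upper alpha/2 critical value.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, alternative="greater"):
    """Apply the fixed-level decision rule to an observed z-statistic."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if alternative == "greater":        # Ha : m1 > m2
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":           # Ha : m1 < m2
        return z < std_normal.inv_cdf(alpha)
    if alternative == "two-sided":      # Ha : m1 != m2
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Critical values at alpha = 0.05
print(round(NormalDist().inv_cdf(0.95), 3))   # 1.645 (one-sided)
print(round(NormalDist().inv_cdf(0.975), 3))  # 1.96  (two-sided)
```

For a made-up value such as z = 2.1, reject_h0(2.1) returns True for the upper test, while reject_h0(2.1, alternative="less") returns False.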

EXCEL Formulas for carrying out a Wilcoxon Rank Sum test using normal approximation, without the continuity correction.

Wilcoxon Rank Sum Test : EXAMPLE


Sample1	Sample2	Combined Sample	Population	Rank
166.7	158.6	153.1	2	1
172.2	176.4	156.0	2	2
165.0	153.1	158.6	2	3
176.9	156.0	165.0	1	4
		166.7	1	5
		172.2	1	6
		176.4	2	7
		176.9	1	8
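
The ranking in the table above can be reproduced outside the spreadsheet. The following Python sketch pools the two samples, sorts them, and sums the ranks that belong to Sample1. It assumes there are no ties, as is the case for these data; tied values would need averaged ranks, which this sketch does not handle.

```python
sample1 = [166.7, 172.2, 165.0, 176.9]
sample2 = [158.6, 176.4, 153.1, 156.0]

# Tag every observation with its population, then sort the combined sample.
combined = sorted([(x, 1) for x in sample1] + [(x, 2) for x in sample2])

# Ranks run from 1 to N in sorted order (valid only when there are no ties).
ranked = [(value, population, rank)
          for rank, (value, population) in enumerate(combined, start=1)]

# W is the sum of the ranks held by the first sample.
W = sum(rank for _, population, rank in ranked if population == 1)
print(W)  # 23
```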



User Input/Output			Excel Formulas
alpha	0.05
Summary Statistics
n_1	4		=COUNT(Sample1)
n_2	4		=COUNT(Sample2)
Calculations
W	23		=SUMIF(Population, "=1", Rank)
mu	18		=n_1*(n_1+n_2+1)/2
sigma	3.4641		=SQRT(n_1*n_2*(n_1+n_2+1)/12)
z	1.443		=(W-mu)/sigma
Lower Test
lower_z	-1.645		=NORMSINV(alpha)
Decision	Do Not Reject Ho		=IF(z<lower_z, "Reject Ho", "Do Not Reject Ho")
Pvalue	0.9255		=NORMSDIST(z)
Upper Test
upper_z	1.645		=NORMSINV(1-alpha)
Decision	Do Not Reject Ho		=IF(z>upper_z, "Reject Ho", "Do Not Reject Ho")
Pvalue	0.0745		=1-NORMSDIST(z)
Two-Sided Test
two_z	1.960		=ABS(NORMSINV(alpha/2))
Decision	Do Not Reject Ho		=IF(ABS(z)>two_z, "Reject Ho", "Do Not Reject Ho")
Pvalue	0.1489		=2*(1-NORMSDIST(ABS(z)))
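
As a cross-check on the spreadsheet, the same calculations can be sketched in Python. Here NormalDist().cdf stands in for NORMSDIST; the variable names are our own, and, as in the Excel version, no continuity correction is applied.

```python
import math
from statistics import NormalDist

n1, n2, W = 4, 4, 23                    # values taken from the example above

mu = n1 * (n1 + n2 + 1) / 2             # mean of W under Ho
sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation of W
z = (W - mu) / sigma                    # normal approximation, no continuity correction

phi = NormalDist().cdf                  # standard normal CDF (Excel: NORMSDIST)
p_lower = phi(z)                        # lower test
p_upper = 1 - phi(z)                    # upper test
p_two_sided = 2 * (1 - phi(abs(z)))     # two-sided test

print(round(z, 3), round(p_lower, 4), round(p_upper, 4), round(p_two_sided, 4))
```

The printed values should agree with the z and Pvalue rows of the table to the precision shown there.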

Interpreting The Results

Since the alternative hypothesis for this test is

Ha : Yields are higher in weed-free plots (i.e., m₁ > m₂)

We read off the Excel output that corresponds to the Upper Test rows in the example output above.

Conclusion: At the 5% level, the result is not statistically significant. The Pvalue of 0.0745 is not less than alpha (0.05, or 5%) and, equivalently, the observed z value of 1.443 is not greater than the upper_z value of 1.645, so we do not reject the null hypothesis for this type of test.

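The researcher's question was: Does the presence of small numbers of weeds reduce the yield of corn? Our response should be that, based on the data at hand, there is not sufficient evidence at the 5% level to conclude that the presence of small numbers of weeds reduces the yield of corn. Failing to reject Ho is not the same as showing that weeds have no effect.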