Hi all, this blog is an English archive of my PhD experience at Imperial College London, mainly logging my research and working process, along with some visual records.

Tuesday 28 August 2007

The Wilcoxon Rank Sum Statistic II - Reinforce

The Wilcoxon Rank Sum Statistic


Also called the Mann-Whitney U test, Mann-Whitney-Wilcoxon (MWW), or Wilcoxon-Mann-Whitney test.

Let W be the sum of the ranks of the observations in the first sample (in our example, the weed-free plots), where the ranks are assigned within the combined sample of N = n_1 + n_2 observations. If the two populations have the same continuous distribution and all the observations in both samples take different values (i.e., there are no ties when the observations are ranked), the exact distribution of W has mean

mw = n_1*(N+1)/2

and standard deviation:

sw = SQRT[n_1*n_2*(N+1)/12]

The Wilcoxon rank sum test rejects the hypothesis that the two populations have identical distributions when the observed rank sum W is far from its mean.
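As a quick sanity check on the two formulas above, here is a minimal Python sketch that enumerates the exact null distribution of W and compares its mean and standard deviation with the closed-form values. The sample sizes n_1 = n_2 = 4 are borrowed from the example later in this post; the variable names are my own and are only illustrative.

```python
from itertools import combinations
import math

n1, n2 = 4, 4          # sample sizes borrowed from the example in this post
N = n1 + n2

# Under the null hypothesis with no ties, every choice of n1 ranks out of
# 1..N is equally likely to be the set of ranks of the first sample.
rank_sums = [sum(ranks) for ranks in combinations(range(1, N + 1), n1)]

# Mean and (population) standard deviation of the exact distribution of W.
exact_mean = sum(rank_sums) / len(rank_sums)
exact_sd = math.sqrt(
    sum((w - exact_mean) ** 2 for w in rank_sums) / len(rank_sums)
)

# Closed-form values from the formulas mw and sw.
formula_mean = n1 * (N + 1) / 2
formula_sd = math.sqrt(n1 * n2 * (N + 1) / 12)

print(len(rank_sums))                             # 70
print(exact_mean, formula_mean)                   # 18.0 18.0
print(round(exact_sd, 4), round(formula_sd, 4))   # 3.4641 3.4641
```

Enumeration is only practical for small samples, since the number of rank subsets grows very quickly, which is one reason the normal approximation is convenient.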

The Normal Approximation

The rank sum statistic W becomes approximately normal as the two sample sizes increase. That is, we can use the z-statistic

z = (W - mw)/ sw = [W - n_1*(N+1)/2 ] / SQRT[n_1*n_2*(N+1)/12]

to carry out the test.
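The z-statistic can be sketched in a few lines of Python. The function name rank_sum_z is hypothetical (it is not from any library), and the inputs W = 23, n_1 = 4, n_2 = 4 are the values from the example in this post.

```python
import math


def rank_sum_z(w, n1, n2):
    """z-statistic for the rank sum W (normal approximation, no continuity correction)."""
    total = n1 + n2                                   # N in the formulas
    mean_w = n1 * (total + 1) / 2                     # mw
    sd_w = math.sqrt(n1 * n2 * (total + 1) / 12)      # sw
    return (w - mean_w) / sd_w


z = rank_sum_z(23, 4, 4)
print(round(z, 3))  # 1.443
```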

For a fixed level α test, reject Ho if:

  • z > z* when Ha : μ1 > μ2
  • z < -z* when Ha : μ1 < μ2
  • |z| > z* when Ha : μ1 ≠ μ2

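Note: z* represents the corresponding upper critical value of a standard normal distribution: the value with upper-tail area alpha for the one-sided tests, and with upper-tail area alpha/2 for the two-sided test.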
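The three rejection rules can be written as one small Python function. This is only a sketch: the function name reject_h0 and the labels "greater", "less" and "two-sided" are my own choices, and the critical values come from NormalDist in Python's standard statistics module.

```python
from statistics import NormalDist


def reject_h0(z, alpha=0.05, alternative="greater"):
    """Return True if Ho is rejected at level alpha for the given alternative."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "greater":
        # upper one-sided test: reject when z > z*
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        # lower one-sided test: reject when z < -z*
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        # two-sided test: z* has upper-tail area alpha/2
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


print(reject_h0(1.443, alternative="greater"))    # False
print(reject_h0(1.443, alternative="two-sided"))  # False
```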

The example below gives Excel formulas for carrying out a Wilcoxon rank sum test using the normal approximation, without the continuity correction.

Wilcoxon Rank Sum Test: EXAMPLE

Sample1   Sample2   Combined Sample   Population   Rank
166.7     158.6     153.1             2            1
172.2     176.4     156.0             2            2
165.0     153.1     158.6             2            3
176.9     156.0     165.0             1            4
                    166.7             1            5
                    172.2             1            6
                    176.4             2            7
                    176.9             1            8
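The Combined Sample, Population and Rank columns can be reproduced with a short Python sketch. It assumes there are no ties (true for these eight values), and the variable names are mine rather than anything from the spreadsheet.

```python
sample1 = [166.7, 172.2, 165.0, 176.9]  # population 1
sample2 = [158.6, 176.4, 153.1, 156.0]  # population 2

# Pool the observations, remembering which population each came from,
# then sort so that position in the list (starting at 1) is the rank.
combined = sorted([(x, 1) for x in sample1] + [(x, 2) for x in sample2])

for rank, (value, population) in enumerate(combined, start=1):
    print(value, population, rank)

# W is the sum of the ranks that belong to population 1.
w = sum(
    rank
    for rank, (_, population) in enumerate(combined, start=1)
    if population == 1
)
print(w)  # 23
```

With tied observations the usual convention is to assign average ranks, which this sketch does not handle.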















User Input/Output                         Excel Formulas

alpha                 0.05

Summary Statistics
n_1                   4                   =COUNT(Sample1)
n_2                   4                   =COUNT(Sample2)

Calculations
W                     23                  =SUMIF(Population, "=1", Rank)
mu                    18                  =n_1*(n_1+n_2+1)/2
sigma                 3.4641              =SQRT(n_1*n_2*(n_1+n_2+1)/12)
z                     1.443               =(W-mu)/sigma

Lower Test
lower_z               -1.645              =NORMSINV(alpha)
Decision              Do Not Reject Ho    =IF(z
Pvalue                0.9255              =NORMSDIST(z)

Upper Test
upper_z               1.645               =NORMSINV(1-alpha)
Decision              Do Not Reject Ho    =IF(z>upper_z, "Reject Ho", "Do Not Reject Ho")
Pvalue                0.0745              =1-NORMSDIST(z)

Two-Sided Test
two_z                 1.960               =ABS(NORMSINV(alpha/2))
Decision              Do Not Reject Ho    =IF(ABS(z)>two_z, "Reject Ho", "Do Not Reject Ho")
Pvalue                0.1489              =2*(1-NORMSDIST(ABS(z)))
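For readers without Excel, the same numbers can be cross-checked in Python. This is a sketch, not a replacement for the spreadsheet: NormalDist().cdf plays the role of NORMSDIST and NormalDist().inv_cdf the role of NORMSINV, and the variable names simply mirror the labels in the output above.

```python
import math
from statistics import NormalDist

alpha = 0.05
n_1, n_2 = 4, 4
W = 23

mu = n_1 * (n_1 + n_2 + 1) / 2
sigma = math.sqrt(n_1 * n_2 * (n_1 + n_2 + 1) / 12)
z = (W - mu) / sigma

std_normal = NormalDist()  # standard normal distribution

# Critical values (NORMSINV in the spreadsheet).
lower_z = std_normal.inv_cdf(alpha)
upper_z = std_normal.inv_cdf(1 - alpha)
two_z = abs(std_normal.inv_cdf(alpha / 2))

# P-values (NORMSDIST in the spreadsheet).
p_lower = std_normal.cdf(z)
p_upper = 1 - std_normal.cdf(z)
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

print(round(mu, 4), round(sigma, 4), round(z, 3))         # 18.0 3.4641 1.443
print(round(lower_z, 3), round(upper_z, 3), round(two_z, 3))  # -1.645 1.645 1.96
print(round(p_lower, 4), round(p_upper, 4), round(p_two_sided, 4))
```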


Interpreting The Results

Since the alternative hypothesis for this test is

Ha : Yields are significantly higher in weed-free plots (i.e., μ1 > μ2)

we read off the Excel output in the Upper Test rows of the example output above.

Conclusion: At the 5% level, the result is not significant: the Pvalue of 0.0745 is not less than alpha (0.05, or 5%), and, equivalently, the observed z value of 1.443 is not greater than the upper_z value of 1.645, as would be required to reject the null hypothesis for this type of test.

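The answer to the researcher's question: Does the presence of small numbers of weeds reduce the yield of corn? Our response should be that, based on the data at hand, there is not sufficient evidence at the 5% level that the presence of small numbers of weeds reduces the yield of corn. Note that failing to reject Ho does not show that weeds have no effect, especially with only four plots in each group.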
