The Wilcoxon Rank Sum Statistic
Also called the Mann-Whitney U test or the Mann-Whitney-Wilcoxon test.
Let W be the sum of the ranks of the observations in the first sample (in our example, the weed-free plots), where the ranks are assigned within the combined sample of N = n_1 + n_2 observations. If the two populations have the same continuous distribution and all observations take different values (i.e., there are no ties when the observations are ranked), the exact distribution of W has mean
mw = n_1*(N+1)/2
and standard deviation:
sw = SQRT[n_1*n_2*(N+1)/12]
The Wilcoxon rank sum test rejects the hypothesis that the two populations have identical distributions when the observed rank sum W is far from its mean.
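As a rough illustration of the definitions above, the following Python sketch ranks the combined sample and computes W together with its mean and standard deviation under the null hypothesis. It uses the eight yield values from the example in this post; the function name `rank_sum_stats` is made up for this sketch, and the code assumes there are no ties.

```python
import math

def rank_sum_stats(sample1, sample2):
    """Return (W, mean, sd) for the rank sum of sample1, assuming no ties."""
    combined = sorted(sample1 + sample2)
    if len(set(combined)) != len(combined):
        raise ValueError("ties present; this sketch assumes all values differ")
    # Rank 1 goes to the smallest value in the combined sample.
    rank = {value: i + 1 for i, value in enumerate(combined)}
    n1, n2 = len(sample1), len(sample2)
    N = n1 + n2
    W = sum(rank[x] for x in sample1)        # rank sum of the first sample
    mw = n1 * (N + 1) / 2                    # mean of W under Ho
    sw = math.sqrt(n1 * n2 * (N + 1) / 12)   # standard deviation of W under Ho
    return W, mw, sw

sample1 = [166.7, 172.2, 165.0, 176.9]  # first sample (weed-free plots)
sample2 = [158.6, 176.4, 153.1, 156.0]  # second sample
W, mw, sw = rank_sum_stats(sample1, sample2)
print(W, mw, round(sw, 4))  # 23 18.0 3.4641
```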
The Normal Approximation
The rank sum statistic W becomes approximately normal as the two sample sizes increase. That is, we can use the z-statistic
z = (W - mw)/sw = [W - n_1*(N+1)/2] / SQRT[n_1*n_2*(N+1)/12]
to carry out the test.
For a fixed level α test, reject Ho if:
- z > z* when Ha : μ1 > μ2
- z < -z* when Ha : μ1 < μ2
- |z| > z* when Ha : μ1 ≠ μ2
Note: z* represents the corresponding upper critical value of a standard normal distribution: the upper alpha point for the one-sided tests and the upper alpha/2 point for the two-sided test.
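The decision rule can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `reject_h0`, its `alternative` labels, and the z value of 2.1 in the usage line are hypothetical choices made for this illustration only.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, alternative="greater"):
    """Apply the fixed-level decision rule to an observed z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Critical values z* at alpha = 0.05 for one-sided and two-sided tests.
z_star_one_sided = NormalDist().inv_cdf(0.95)
z_star_two_sided = NormalDist().inv_cdf(0.975)
print(round(z_star_one_sided, 3), round(z_star_two_sided, 3))  # 1.645 1.96

# Hypothetical z value, only to show the call.
print(reject_h0(2.1, alternative="greater"))  # True
```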
EXCEL Formulas for carrying out a Wilcoxon Rank Sum test using normal approximation, without the continuity correction.
Wilcoxon Rank Sum Test : EXAMPLE | ||||
Sample1 | Sample2 | Combined Sample | Population | Rank |
166.7 | 158.6 | 153.1 | 2 | 1 |
172.2 | 176.4 | 156.0 | 2 | 2 |
165.0 | 153.1 | 158.6 | 2 | 3 |
176.9 | 156.0 | 165.0 | 1 | 4 |
| | 166.7 | 1 | 5 |
| | 172.2 | 1 | 6 |
| | 176.4 | 2 | 7 |
| | 176.9 | 1 | 8 |
User Input/Output | Excel Formulas | |||
alpha | 0.05 | |||
Summary Statistics | ||||
n_1 | 4 | =COUNT(Sample1) | ||
n_2 | 4 | =COUNT(Sample2) | ||
Calculations | ||||
W | 23 | =SUMIF(Populati | ||
mu | 18 | =n_1*(n_1+n_2+1 | ||
sigma | 3.4641 | =SQRT(n_1*n_2*( | ||
z | 1.443 | =(W-mu)/sigma | ||
Lower Test | ||||
lower_z | -1.645 | =NORMSINV(alpha | ||
Decision | Do Not Reject Ho | =IF(z | ||
Pvalue | 0.9255 | =NORMSDIST(z) | ||
Upper Test | ||||
upper_z | 1.645 | =NORMSINV(1-alp | ||
Decision | Do Not Reject Ho | =IF(z>upper_z, "Reject Ho", "Do Not Reject Ho") | ||
Pvalue | 0.0745 | =1-NORMSDIST(z) | ||
Two-Sided Test | ||||
two_z | 1.960 | =ABS(NORMSINV(a | ||
Decision | Do Not Reject Ho | =IF(ABS(z)>two_ | ||
Pvalue | 0.1489 | =2*(1-NORMSDIST |
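As a cross-check on the worksheet, the three Pvalue cells can be reproduced outside Excel. The sketch below is one way to do it in Python, assuming `statistics.NormalDist().cdf` plays the role of NORMSDIST; the variable names are chosen for this example and are not part of the worksheet.

```python
from statistics import NormalDist

# Values taken from the worksheet: W = 23, mu = 18, sigma = SQRT(12).
z = (23 - 18) / 12 ** 0.5
phi = NormalDist().cdf  # standard normal CDF, the counterpart of NORMSDIST

p_lower = phi(z)                      # Lower Test
p_upper = 1 - phi(z)                  # Upper Test
p_two_sided = 2 * (1 - phi(abs(z)))   # Two-Sided Test

print(round(z, 3), round(p_lower, 4), round(p_upper, 4), round(p_two_sided, 4))
# 1.443 0.9255 0.0745 0.1489
```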
Interpreting The Results
The alternative hypothesis for this test is
Ha : Yields are systematically higher in weed-free plots (i.e., μ1 > μ2)
so we read off the Excel output in the Upper Test rows of the example above.
Conclusion: At the 5% level, we do not reject the null hypothesis. The Pvalue of 0.0745 is not less than alpha (0.05 or 5%), and, equivalently, the observed z value of 1.443 is not greater than the upper_z value of 1.645, as would be required to reject the null hypothesis for this type of test.
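The answer to the researcher's question: Does the presence of small numbers of weeds reduce the yield of corn? Our response should be that, based on the data at hand, there is not enough evidence at the 5% level to conclude that small numbers of weeds reduce the yield of corn. Failing to reject the null hypothesis is not proof that weeds have no effect, especially with only four plots in each sample.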
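With only four observations per sample, the normal approximation is rough, so it may be worth checking the upper-tail Pvalue against the exact distribution of W. The sketch below is an illustration rather than part of the Excel worksheet: it enumerates every way of assigning 4 of the 8 ranks to the first sample, which are equally likely under the null hypothesis when there are no ties.

```python
from itertools import combinations

n1, n2 = 4, 4
observed_w = 23
ranks = range(1, n1 + n2 + 1)

# Under Ho with no ties, every choice of n1 ranks for the first sample
# is equally likely, so the exact Pvalue is a simple proportion.
rank_sums = [sum(subset) for subset in combinations(ranks, n1)]
exact_p_upper = sum(w >= observed_w for w in rank_sums) / len(rank_sums)

print(len(rank_sums), exact_p_upper)  # 70 0.1
```

The exact upper-tail Pvalue of 0.10 is somewhat larger than the approximate 0.0745, and it leads to the same decision at the 5% level.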