I. Introduction
Nonparametric tests are sometimes called distribution free statistics because they do not require that the data fit a normal distribution. More generally, nonparametric tests require less restrictive assumptions about the data. Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data.
- Why Not Used All the Time?
Since nonparametric tests require fewer assumptions and can be used with a broader range of data types, the question becomes, "Why not use them all of the time?" Parametric tests are often preferred because:
- They are robust.
- They have greater power efficiency, in other words, they have greater power relative to the sample size.
- They provide unique information (e.g., the interaction in a factorial design).
- Parametric and nonparametric tests often address two different types of questions.
- Relation to Parametric Tests
The Summary of Statistical Tests should help put into perspective where nonparametric tests fit into what we have learned. For example, we have already learned about the binomial test for the simplest case of nominal data and Spearman's Rho for correlations involving rank data. In this unit, we will learn about the chi-square test. The other tests listed in the table (that we have not yet covered) are beyond the scope of the course.
It is important to note that even with metric data, if assumptions are badly violated, nonparametric tests are likely to be employed.
The chi-square statistic is used to test expected versus observed frequencies. There are two situations in which it is used.
- One Variable (or Sample) Case
This is sometimes called the goodness of fit test. Consider an example.
- Research Question
Do people have a preference for movie type?
- Hypotheses
In words:
Ho: The observed distribution fits the expected or, in other words, there is no preference.
Ha: The observed distribution does not fit that expected (there is a preference).
Notice that there is no mention made of parameters.
- Assumptions
- The sample is chosen randomly.
- The scores are independent (i.e., each subject is allowed only one preference).
- The null hypothesis.
- Decision rules
Let c equal the number of columns. In this case, there are four preferences or columns. Thus, df=c-1 or 4-1=3 and with an α level of .05 the critical value of chi square is 7.82 (see table).
If χ²obs > 7.82, reject Ho.
If χ²obs < 7.82, do not reject Ho.
- Computation
The appropriate descriptive statistics are the percentages of people preferring each type of movie.
If it looks like these percents are worthy of additional analysis, we must first determine the expected frequencies. If we are asking folks which of four movie types they prefer and there is no preference, we would expect 25% to prefer each type. Let:
Ej = the expected frequency in the j-th column.
Oj = the observed frequency in the j-th column.
In our example, j indexes the types of movies (j = 1, ..., c, with c = 4). Then:
$$\chi^2_{obs} = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$$
Now let's consider the following data:
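            Comedy   Horror   Drama   Sci fi
Expected      25       25      25       25     (as %s)
Observed      35       30      20       15     (so n=100)
%             35       30      20       15
Substituting the numbers in the formula gives:
$$\chi^2_{obs} = \frac{(35-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(15-25)^2}{25} = 4 + 1 + 1 + 4 = 10.00$$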
- Decision
Since χ²obs (10.00) > χ²crit (7.82), we reject Ho and conclude that folks do have a preference for which type of movie they like best. They like comedy the best and sci fi the least.
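The goodness of fit computation above can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the function name chi_square_gof is made up for illustration, and the critical value is simply copied from the decision rule in the text rather than looked up by the code.

```python
# Goodness of fit chi-square for the movie-preference example.
# chi_square_gof is an illustrative name, not part of the course materials.

def chi_square_gof(observed, expected):
    """Return the sum over columns of (O_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


movie_types = ["Comedy", "Horror", "Drama", "Sci fi"]
observed = [35, 30, 20, 15]                 # observed frequencies, n = 100
n = sum(observed)

# Under Ho (no preference) each of the c columns is expected to get n / c.
expected = [n / len(observed)] * len(observed)

chi2_obs = chi_square_gof(observed, expected)
df = len(observed) - 1                      # df = c - 1
chi2_crit = 7.82                            # df = 3, alpha = .05, taken from the text

print(f"chi-square obs = {chi2_obs:.2f}, df = {df}")
print("reject Ho" if chi2_obs > chi2_crit else "do not reject Ho")
```

Changing the expected list is all that is needed to test against a different hypothesized distribution, as long as the expected frequencies still sum to n.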
- Two Variable (or Sample) Case
This test goes by several names. It is most commonly called the Pearson Chi Square, but is sometimes called a test of independence between two variables or crosstabs.
Consider the following data (called a contingency table) on drug usage that I collected when I was a student in college.
Contingency Table
                                      Frequency of Marijuana Use
Categories of Other Drugs Tried    < 3 times/week    > 3 times/week    Total
1-3                                      26                 6            32
4-6                                      17                25            42
Total                                    43                31            74
It looks like folks that smoked marijuana more frequently also tried more categories of other drugs.
- Research Question
Is frequency of marijuana smoking related to number of other drugs tried?
- Hypotheses
In words:
Ho: There is no relationship (or contingency) between the two variables, that is, they are independent.
Ha: The two variables are related.
Again, notice that there is no mention made of parameters.
- Assumptions
- The individuals in each sample are chosen randomly.
- The scores are independent (i.e., each subject fits in only one cell of the table).
- For a 2x2 table, all expected cell frequencies should be at least equal to 10 (for larger tables, this value is 5).
- The null hypothesis.
- Decision rules
Again, let c equal the number of columns. Since we are also considering another variable, let r equal the number of rows. Thus, df=(c-1)(r-1) or (2-1)(2-1)=1 and with an α level of .05 the critical value of chi square is 3.84 (see table).
If χ²obs > 3.84, reject Ho.
If χ²obs < 3.84, do not reject Ho.
- Computation
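First we must determine the expected frequencies. Let:
Ejk = the expected frequency of the cell defined by the j-th column and the k-th row.
Ojk = the observed frequency of the cell defined by the j-th column and the k-th row.
Where j indexes the columns (j = 1, ..., c) and k indexes the rows (k = 1, ..., r). And:
$$E_{jk} = \frac{(\text{total of row } k)(\text{total of column } j)}{N}$$
Note, a helpful check is that the sum of the expected cell frequencies is equal to N, that is:
$$\sum_{j=1}^{c} \sum_{k=1}^{r} E_{jk} = N$$
Then:
$$\chi^2_{obs} = \sum_{j=1}^{c} \sum_{k=1}^{r} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$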
So, let's compute the Ejks for the data above.
Contingency Table (expected frequencies in parentheses)
                                      Frequency of Marijuana Use
Categories of Other Drugs Tried    < 3 times/week    > 3 times/week    Total
1-3                                  26 (18.59)         6 (13.41)        32
4-6                                  17 (24.41)        25 (17.59)        42
Total                                43                31                74
To be clear, E11 = (32*43)/74 = 18.59
and checking our work, 18.59 + 13.41 + 24.41 + 17.59 = 74.00 = N.
So 6/32 or about 19% of folks who had tried 1-3 other drugs smoked marijuana frequently whereas 25/42 or about 60% of folks who had tried 4-6 other drugs smoked frequently. These percentages are the relevant descriptive statistics that give us the reason for performing the chi square test.
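The expected frequencies, the check against N, and the two descriptive percentages can be reproduced with a short Python sketch. The helper name expected_frequencies and the row-then-column layout of the nested list are assumptions made for this illustration; the notes index cells as Ejk with j for the column and k for the row.

```python
# Expected cell frequencies for the drug-usage contingency table.
# Layout assumed here: observed[row][column]
#   rows    = categories of other drugs tried (1-3, 4-6)
#   columns = frequency of marijuana use (less frequent, more frequent)
observed = [
    [26, 6],
    [17, 25],
]


def expected_frequencies(table):
    """Each expected cell = (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])

# Helpful check: the expected frequencies should add up to N.
total_expected = sum(sum(row) for row in expected)
print("sum of expected =", round(total_expected, 2))

# Descriptive statistics: percent of each row in the "more frequent" column.
pct_frequent = [100 * row[1] / sum(row) for row in observed]
print([round(p) for p in pct_frequent])
```

Keeping the expected values unrounded inside the list and rounding only when printing avoids the small drift that shows up when rounded cell values are added by hand.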
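Substituting the values in the formula gives:
$$\chi^2_{obs} = \frac{(26-18.59)^2}{18.59} + \frac{(6-13.41)^2}{13.41} + \frac{(17-24.41)^2}{24.41} + \frac{(25-17.59)^2}{17.59} \approx 12.4$$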
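As a check on this substitution, the whole two variable computation can be sketched in Python. The function name chi_square_independence is hypothetical, the table is again assumed to be stored row by row, and the critical value 3.84 is copied from the decision rules above rather than computed.

```python
# Pearson chi-square for a contingency table, using only the standard library.
# chi_square_independence is an illustrative name, not a library function.

def chi_square_independence(table):
    """Return (chi-square, df) for a table stored as table[row][column]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for k, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[k] * col_totals[j] / n   # expected frequency of the cell
            chi2 += (o - e) ** 2 / e

    df = (len(col_totals) - 1) * (len(row_totals) - 1)   # (c-1)(r-1)
    return chi2, df


observed = [
    [26, 6],     # tried 1-3 other drugs
    [17, 25],    # tried 4-6 other drugs
]

chi2_obs, df = chi_square_independence(observed)
chi2_crit = 3.84                                   # df = 1, alpha = .05, from the text

print(f"chi-square obs = {chi2_obs:.2f}, df = {df}")
print("reject Ho" if chi2_obs > chi2_crit else "do not reject Ho")
```

With unrounded expected frequencies the statistic comes out at about 12.40, which is above the critical value of 3.84, so under the decision rules stated earlier Ho would be rejected for these data.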