low frequencies
  – When a cell in a contingency table has an expected value of less than 5, Fisher’s Exact test is more reliable
  – In this case, SPSS computes Fisher’s exact significance level automatically when the chi-square test is selected
• SPSS Releases 15 and 16 have removed the Fisher’s Exact test module, which can be purchased separately

Fisher’s Exact test
• Don’t forget to weight cases!

Force an FE test

Practice
• Use both the UCREL/Xu’s LL calculator and SPSS to determine whether the difference in the frequencies of passives in the CLEC and LOCNESS corpora is statistically significant
  – CLEC: 7,911 instances in 1,070,602 words
  – LOCNESS: 5,465 instances in 324,304 words

Collocation statistics
• Collocation: the habitual or characteristic co-occurrence patterns of words
  – Can be identified using a statistical approach in CL, e.g. Mutual Information (MI), t test, z score
  – Can be computed using tools like SPSS, WordSmith, AntConc, Xaira
  – Only a brief introduction here
• More discussion of collocation statistics to follow

Mutual information
• Computed by dividing the observed frequency of the co-occurring word in the defined span for the search string (the so-called node word), e.g. a 4:4 window, by the expected frequency of the co-occurring word in that span, and then taking the logarithm to the base 2 of the result
• A measure of collocational strength
• The higher the MI score, the stronger the link between two items
  – An MI score of 3 or higher is conventionally taken as evidence that two items are collocates
• The closer the MI score gets to 0, the more likely it is that the two items co-occur by chance
• A negative MI score indicates that the two items tend to shun each other

The t test
• Computed by subtracting the expected frequency from the observed frequency and then dividing the result by the standard deviation
• A t score of 2 or higher is normally considered to be statistically significant
• The specific probability level can be looked up in a t-distribution table

The z score
• The z score is the number of standard deviations from the mean frequency
• The z test compares the observed frequency with the frequency expected if only chance is affecting the distribution
• A higher z score indicates a greater degree of collocability of an item with the node word
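
Worked example (a sketch, not the exact procedure of any particular tool)
The three collocation measures described above can be tried out in a few lines of Python. Several things here are assumptions rather than part of the slides: the expected frequency is estimated as node frequency × collocate frequency × span size ÷ corpus size; the standard deviation is approximated by the square root of the observed frequency for the t score and by the square root of the expected frequency for the z score (common simplifications; individual tools may differ); and all counts are made up for illustration. The function names are hypothetical.

```python
import math


def expected_frequency(node_freq, collocate_freq, corpus_size, span):
    """Co-occurrences expected by chance (assumed estimate:
    node frequency x collocate frequency x span size / corpus size)."""
    return node_freq * collocate_freq * span / corpus_size


def mi_score(observed, expected):
    """Mutual information: log base 2 of observed / expected."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """(O - E) / standard deviation, with the standard deviation
    approximated by sqrt(O) (an assumption)."""
    return (observed - expected) / math.sqrt(observed)


def z_score(observed, expected):
    """(O - E) / standard deviation, with the standard deviation
    approximated by sqrt(E) (an assumption)."""
    return (observed - expected) / math.sqrt(expected)


# Made-up counts, for illustration only
node_freq = 1_000         # occurrences of the node word in the corpus
collocate_freq = 500      # occurrences of the candidate collocate
corpus_size = 1_000_000   # corpus size in tokens
span = 8                  # a 4:4 window: 4 words to the left, 4 to the right
observed = 32             # co-occurrences actually found within the span

expected = expected_frequency(node_freq, collocate_freq, corpus_size, span)
mi = mi_score(observed, expected)
t = t_score(observed, expected)
z = z_score(observed, expected)

print(f"expected = {expected:.2f}")  # expected = 4.00
print(f"MI = {mi:.2f}")              # MI = 3.00
print(f"t  = {t:.2f}")               # t  = 4.95
print(f"z  = {z:.2f}")               # z  = 14.00
```

With these invented counts the pair co-occurs eight times more often than chance predicts, so the MI score is 3 and the t score is well above 2. The three measures do not always agree on real data, which is one reason to report more than one of them.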