I used DIF analysis in Winsteps in my paper. The reviewer liked it, but he raised a good point: he questioned the results of Winsteps Tables 30.2 and 30.3. He said that the expected baseline measure (the expected overall item difficulty without DIF) is estimated from data that include the sub-sample (class) too, so we can't compare the local and overall item difficulties directly with Welch's test, because the two estimates are dependent (correlated). I think I probably have to agree.
Am I mistaken, or is this really a problem in Winsteps? Do you have a solution to this problem, or can you suggest a response to the reviewer?
Thank you and thank you also for your wonderful software, which I really like :-)
Hynek, yes, so use Winsteps Table 30.1 in the situation you describe. This compares the classes. Usually one class (typically the biggest) is used as the reference class, and all the others are the focal classes.
Table 30.2 is for situations in which there are many classes of about equal size (for instance, age groups) and we want to identify classes that depart from the consensus. When an idiosyncratic class is identified, use the Output Tables menu and the DIF specifications box, www.winsteps.com/winman/difspecifications.htm , to lump all the other classes into one code. Then you can look at Table 30.1 for the idiosyncratic class against the rest. OK?
Thank you for your response and for the good suggestions. However, in the case above, I compare 6 age classes with similar sample sizes. Comparing the local difficulties to the mean difficulties seems to be the best way to test my hypothesis, but there is the problem mentioned above of dependent parameter estimates. Is it possible to estimate how large the bias is in this case? I know the test will have less power, but how much... An alternative is to compare the local difficulty in each class with the overall item difficulty in all the other classes; I think I can do it by hand as t = (B1 - (1/5)*(B2+B3+…+B6)) / sqrt(S1 + (1/25)*(S2+S3+…+S6)), where B1-B6 are the difficulties in the six classes and S1-S6 are their error variances (the factor 1/25 arises because the error variance of a mean of five independent estimates is the sum of their error variances divided by 5 squared). The problem is that my test consists of 16 scales, and without any automation this approach would take a huge amount of time. However, the reviewer liked my paper, and corrections to the analysis are probably not necessary. So the question is, how large could the bias in Table 30.2 be with 6 classes? Or is there any quick way in Winsteps to extract the local item difficulties and their error variances for all classes? Thank you, Hynek
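The by-hand comparison of one class against the mean of the others could be automated once the per-class item difficulties and their standard errors have been copied out of the DIF output. The following is a minimal Python sketch under stated assumptions, not a Winsteps feature: the function name `leave_one_out_t` and the example numbers are hypothetical, the class estimates are treated as independent, and the "rest" baseline is taken as the unweighted mean of the other classes. Under that independence assumption, the error variance of a mean of k estimates is the sum of their error variances divided by k squared.

```python
import math


def leave_one_out_t(measures, std_errors):
    """Return one t-statistic per class for a single item.

    Each class difficulty is compared with the unweighted mean of the
    difficulties in all the other classes. Class estimates are assumed
    to be independent, so their error variances add.
    """
    if len(measures) != len(std_errors) or len(measures) < 2:
        raise ValueError("need matching lists with at least two classes")

    variances = [se ** 2 for se in std_errors]
    total_measure = sum(measures)
    total_variance = sum(variances)
    k = len(measures) - 1  # number of classes in the "rest" group

    t_values = []
    for b, v in zip(measures, variances):
        rest_mean = (total_measure - b) / k
        # Variance of a mean of k independent estimates: sum / k^2.
        rest_variance = (total_variance - v) / k ** 2
        t_values.append((b - rest_mean) / math.sqrt(v + rest_variance))
    return t_values


if __name__ == "__main__":
    # Hypothetical difficulties (logits) and standard errors for one item
    # in six age classes; these are not taken from any real analysis.
    b = [0.42, 0.10, 0.05, 0.18, -0.02, 0.11]
    se = [0.21, 0.20, 0.22, 0.19, 0.23, 0.20]
    for cls, t in enumerate(leave_one_out_t(b, se), start=1):
        print(f"class {cls}: t = {t:.2f}")
```

Calling the function once per item, inside a loop over the items of each of the 16 scales, would remove most of the manual work. Because the focal class is excluded from the baseline, the two quantities in the numerator do not share data, which is the dependence the reviewer objected to.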