You could use a weighted kappa, but probably the most conventional way to deal with this problem is to use a correlation (Spearman's or Pearson's, depending on whether you want to treat the data as ordinal or interval).
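To make the options concrete, here is a minimal sketch in plain Python that computes all three statistics for one pair of rating vectors. The function names (`pearson`, `spearman`, `weighted_kappa`) and the example ratings are made up for illustration, and the kappa uses quadratic weights, which is only one common choice.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation: treats the ratings as interval data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson on the ranks (ordinal data)."""
    return pearson(average_ranks(x), average_ranks(y))

def weighted_kappa(x, y, categories):
    """Cohen's kappa with quadratic disagreement weights (an assumption)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(x)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(x, y):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    num = sum((i - j) ** 2 * obs[i][j] for i in range(k) for j in range(k))
    den = sum((i - j) ** 2 * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - num / den

# Hypothetical ratings on a 5-point scale, for illustration only.
first = [1, 2, 3, 4, 5, 3, 2, 4]
second = [1, 3, 3, 4, 4, 3, 2, 5]
print("Pearson:", pearson(first, second))
print("Spearman:", spearman(first, second))
print("Weighted kappa:", weighted_kappa(first, second, [1, 2, 3, 4, 5]))
```

Note that kappa measures agreement (the same rating both times), while the correlations only measure association, so the two can differ when ratings shift systematically.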

I have a related question. I had twenty subjects rate 200 words twice on a Likert scale, and I would like to examine test-retest reliability across the twenty subjects. For each subject, I computed a correlation coefficient between the first and second ratings, and then tested whether these twenty coefficients deviate significantly from zero. Is this reasonable? Or are there more appropriate ways to examine test-retest reliability across subjects? Thanks!

Veda
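For readers following along, the procedure described in this post could be sketched roughly as below. Everything here is an assumption for illustration: the data are simulated (a 7-point scale, random noise on the retest), the helper names are made up, and the Fisher z-transform before the t-test is a common convention rather than something the post specifies.

```python
import math
import random
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1

def clip(v, lo=1, hi=7):
    """Keep a simulated rating inside the scale."""
    return max(lo, min(hi, v))

# Simulated data: 20 subjects, 200 words, rated twice (illustration only).
random.seed(0)
rs = []
for _ in range(20):
    first = [random.randint(1, 7) for _ in range(200)]
    second = [clip(v + random.choice([-1, 0, 0, 1])) for v in first]
    rs.append(pearson(first, second))

# Fisher z-transform so the coefficients are closer to normal before testing.
# math.atanh raises ValueError if any r is exactly +1 or -1.
zs = [math.atanh(r) for r in rs]
t, df = one_sample_t(zs)
mean_r = math.tanh(mean(zs))  # back-transformed average correlation

print(f"mean r = {mean_r:.3f}, t({df}) = {t:.2f}")
```

A test against zero only shows that the retest correlation is not nil, so reporting the average coefficient (and its spread across subjects) is usually more informative than the p-value alone.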

Veda,

So did you compute 20 correlations, one per subject? If so, perhaps something along the lines of Fleiss's kappa for multi-rater reliability may be a useful alternative.