Ishii-sensei's "list of papers on studies that have tried to predict human ratings of L2 writing from some set of variables" is also a useful reference.
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater® V.2. The Journal of Technology, Learning, and Assessment, 4(3). [PDF]
Attali, Y., & Powers, D. (2008). A developmental writing scale (Research Report No. ETS RR-08-19). [PDF]
Burstein, J., Chodorow, M., & Leacock, C. (2004). Automated essay evaluation: The Criterion online writing service. AI Magazine, 25(3), 27-36. [PDF]
Chodorow, M., & Burstein, J. (2004). Beyond essay length: Evaluating e-rater®'s performance on TOEFL® essays (TOEFL Research Reports No. 73). [PDF]
Lee, Y., Gentile, C., & Kantor, R. (2008). Analytic scoring of TOEFL CBT essays: Scores from humans and e-rater® (TOEFL Research Reports No. RR-81). [PDF]
Powers, D., Burstein, J., Chodorow, M., Fowles, M., & Kukich, K. (2001). Stumping E-Rater: Challenging the validity of automated essay scoring (GRE Board Professional Report No. 98-08bP). [PDF]
Quinlan, T., Higgins, D., & Wolff, S. (2009). Evaluating the construct-coverage of the e-rater® scoring engine (Research Report No. ETS RR-09-01). [PDF]
Ramineni, C., Trapani, C. S., Williamson, D. M., Davey, T., & Bridgeman, B. (2012). Evaluation of the e-rater® scoring engine for the TOEFL® independent and integrated prompts (Research Report No. ETS RR-12-06). [PDF]
Yang, Y., Buckendahl, C., Juszkiewicz, P., & Bhola, D. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391-412. [PDF]
田地野彰, 細越響子, 川西慧, 日髙佑郁, 髙橋幸, 金丸敏幸. (2011). 「アカデミックライティング授業におけるフィードバックの研究-Criterion®を導入した授業実践からの示唆-」. 京都大学高等教育研究, 17, 97-108. [PDF]
細越響子, 金丸敏幸, 髙橋幸, 田地野彰. (2012). 「英文産出に与えるフィードバックの効果検証-Criterion®とピア・フィードバックに焦点をあてて-」. 言語処理学会第18回年次大会発表論文集, 1158-1161. [PDF]