The testing and certification of foreign language proficiency have become increasingly important in education. High-stakes language tests are frequently used as gatekeeping tools, serving as key requirements for admission to and completion of educational programs. A standard feature of modern language assessment is the inclusion of open-ended tasks that require candidates to produce extended samples of writing and speech. However, assessing such tasks poses significant challenges: it is time-consuming, costly, and often suffers from reliability issues. As a result, automated assessment of both written and spoken production has become a critical area of research and commercial development.
Several automated language assessment systems have been developed. The most widely known are e-rater and SpeechRater, created by Educational Testing Service (ETS) and used to score the TOEFL exam. Research indicates that these systems achieve high levels of accuracy, validity, and reliability. However, they provide little meaningful feedback to examiners and test-takers about the specific linguistic features that influenced the assessment result. This raises concerns about accountability, particularly when candidates seek justification for their scores.
This presentation will review research on the role of linguistic features in the automated assessment of non-native speakers’ written and spoken production. It will include results from my own studies, which examined how lexical and phraseological characteristics of EFL learners’ written output contributed to the prediction of scores assigned by human raters.
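To make the prediction setup concrete, the sketch below shows one way such a study could be operationalized: a single lexical feature (type-token ratio, a common measure of lexical diversity) is extracted from learner texts and fitted against human rater scores with simple least-squares regression. The feature choice, essay texts, scores, and band scale are all invented for illustration; the actual studies used different features and models.

```python
# Illustrative sketch (NOT the actual study design): predicting a
# human-assigned score from one lexical feature, type-token ratio,
# via closed-form simple linear regression. All data are invented.

def type_token_ratio(text: str) -> float:
    """Ratio of unique word forms (types) to total words (tokens)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def fit_simple_regression(xs, ys):
    """Closed-form least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical learner texts paired with human rater scores (1-6 scale).
essays = [
    ("the cat sat on the mat the cat sat", 2.0),
    ("students often struggle with academic writing tasks", 4.5),
    ("linguistic diversity enriches every learner's productive vocabulary", 5.5),
]
xs = [type_token_ratio(text) for text, _ in essays]
ys = [score for _, score in essays]
a, b = fit_simple_regression(xs, ys)
print(f"predicted score for TTR=0.8: {a + b * 0.8:.2f}")
```

In practice, such studies typically combine many lexical and phraseological features in a multivariate model and evaluate how well the predicted scores agree with those of human raters; this one-feature version only illustrates the basic feature-to-score mapping.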