Developing an Online Test to Measure Writing and Speaking Skills Automatically

Gordon Bateson
DOI: 10.4018/IJCALLT.2021100105

Abstract

As a result of the Japanese Ministry of Education's recent edict that students' written and spoken English should be assessed in university entrance exams, there is an urgent need for tools to help teachers and students prepare for these exams. Although some commercial tools already exist, they are generally expensive and inflexible. To address these shortcomings, a new open-source, online test for assessing English ability was developed. The test features the automatic grading not only of reading and listening, but also of speaking and writing. Thus, the general English ability of large numbers of students can be checked quickly online, making the test suitable for use in entrance exams and placement tests. It is based around the Moodle LMS and features several new plugins to automatically grade speaking and writing. This paper details plugin development, shows preliminary samples, and explains how test reliability will be verified by comparing students' scores with human ratings and widely used tests such as IELTS, TOEIC, and CASEC.

1. Introduction

For language teachers and institutions in Japan, the shift toward promoting English communication skills by including speaking and writing questions in university entrance exams presents some significant problems. Not least of these is how to develop and test English speaking and writing skills on a large scale, because typically the development of these skills requires frequent and personalized feedback from the teacher to each student, while testing requires each student's work to be read or heard individually by the assessor.

Table 1 compares several tools for testing English ability.

Table 1. Comparison of general English ability tests

As can be seen, the major commercial tests are all now available online, and most include speaking and writing sections that are graded by human assessors. Notably, the TOEFL test combines the human rating with the score from a proprietary auto-grading system called e-rater. While the use of human assessors brings reliability to the test results and has led to their widespread acceptance as valid measures of ability, it also means that the fees for taking these tests are high (5,000 yen is approximately 50 US dollars) and that the results only become available a few weeks after the test has been taken.

To a certain extent, these two drawbacks, the cost of taking the test and the time needed to produce results, have recently been overcome by several free and open-source software applications. They are free to use and provide instant results by grading the student's writing and speaking automatically: essay text is typed into the computer via a keyboard, while speech is dictated to the computer via a microphone. However, there is currently no easy way to create writing and speaking “questions” that can be compiled into a test, and there is no open-source repository of questions or tests that covers the four skills of listening, reading, writing, and speaking.
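By way of illustration, the short sketch below shows how a browser might capture a spoken answer as text using the Web Speech API (exposed in Chromium-based browsers as webkitSpeechRecognition) before submitting it for grading. It is only a generic sketch of the dictation step described above, not the code of the plugins developed in this project, and the answer element it writes to is hypothetical.

```typescript
// Minimal sketch: capture a spoken answer as text in the browser using the
// Web Speech API, so the transcript can be graded like a typed essay answer.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;

function dictateAnswer(onTranscript: (text: string) => void): void {
  if (!SpeechRecognitionImpl) {
    console.warn("Speech recognition is not supported in this browser.");
    return;
  }

  const recognition = new SpeechRecognitionImpl();
  recognition.lang = "en-US";          // students speak their answers in English
  recognition.interimResults = false;  // report only finalised results
  recognition.maxAlternatives = 1;

  recognition.onresult = (event: any) => {
    // Concatenate the recognised segments into a single transcript.
    let transcript = "";
    for (let i = 0; i < event.results.length; i++) {
      transcript += event.results[i][0].transcript + " ";
    }
    onTranscript(transcript.trim());
  };

  recognition.onerror = (event: any) => {
    console.error("Recognition error:", event.error);
  };

  recognition.start();
}

// Example usage: place the transcript into a (hypothetical) answer box on the test page.
dictateAnswer((text) => {
  const answerBox = document.getElementById("answer") as HTMLTextAreaElement | null;
  if (answerBox) {
    answerBox.value = text;
  }
});
```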

Therefore, the aim of this research project is to investigate the extent to which it is possible to measure automatically, via a freely-available online test, the English reading, writing, listening and speaking ability of a large number of Japanese students. As a by-product of creating this four-skills test, it will also be necessary to develop software to create and administer questions to test English writing and speaking ability.

One major consideration when planning this project was whether or not the auto-grading of writing and speaking is reliable, effective, or useful. In discussing this point, Maccrea (2013) notes that “the similarities between the auto-grades and the hand-scored grades are remarkable”. However, she also concedes that “kids will always find a way to game these systems”, because once test takers realize which target vocabulary and discourse features score points, they will use “well-constructed sentences and million-dollar words” to get high scores regardless of the value of the ideas, or the strength of the argument, in the writing. Greene (2018) voices a similar concern when he says, “the biggest problem with robo-grading continues to be the algorithm's inability to distinguish between quality and drivel.”
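To see why such gaming is possible, consider the deliberately naive scoring sketch below, which rewards only surface features, essay length and long “million-dollar” words, and knows nothing about the quality of the ideas. It is purely illustrative; real systems such as e-rater use far richer feature sets, but the same basic vulnerability applies.

```typescript
// Deliberately naive essay scorer: it rewards surface features (length and
// long words) and knows nothing about meaning, so well-formed drivel scores well.
function naiveEssayScore(essay: string): number {
  const words = essay.toLowerCase().match(/[a-z']+/g) ?? [];
  if (words.length === 0) {
    return 0;
  }

  // Feature 1: essay length, capped so very long essays stop gaining points.
  const lengthScore = Math.min(words.length / 250, 1);

  // Feature 2: proportion of "million-dollar" words (8+ letters),
  // a crude stand-in for vocabulary sophistication.
  const longWords = words.filter((w) => w.length >= 8).length;
  const vocabScore = Math.min(longWords / words.length / 0.2, 1);

  // Weighted sum on a 0-100 scale.
  return Math.round(100 * (0.6 * lengthScore + 0.4 * vocabScore));
}

// A coherent short answer can score lower than padded, vocabulary-heavy nonsense.
console.log(naiveEssayScore("Tests should be fair and easy to understand."));
console.log(naiveEssayScore(
  "Notwithstanding multitudinous considerations, " +
  "comprehensive evaluations necessitate extraordinarily sophisticated frameworks. ".repeat(20)
));
```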
