    How to game a robotic essay grader: big words matter, factual accuracy doesn't

    MIT writing director Les Perelman discusses the limits of automated essay grading software, which has recently been found to provide scores similar to those of human graders.

    A recent study (PDF) at the University of Akron in Ohio has determined that there's little difference between human essay graders and their machine counterparts, at least when looking at high-volume SAT-style papers. After evaluating thousands of essays either with human readers or with one of nine automated scoring engines, researchers found that scores for the same sets of essays had generally similar means and standard deviations regardless of who or what had graded them.

    MIT writing director Les Perelman, however, has found an amusing way to exploit the limitations of machine grading algorithms. After testing the Educational Testing Service e-Rater program and looking at the algorithms behind it, Perelman found that the elements behind a high-scoring essay were simple. Above all, e-Rater is looking for size, whether that's big words, long sentences with certain connective phrases like "however," or essays above a certain word count.
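The features Perelman describes can be sketched as a toy scorer. To be clear, this is not ETS's actual e-Rater algorithm — the feature names, thresholds, and weights below are all hypothetical — but it shows how a grader that rewards only surface features like length, vocabulary, and connective phrases can be gamed, since none of them touch factual accuracy:

```python
# Toy surface-feature essay scorer. NOT ETS's e-Rater -- a hypothetical
# illustration of grading on size alone: word count, "big words," and
# connective phrases. No feature here checks whether anything is true.

CONNECTIVES = {"however", "moreover", "therefore", "furthermore"}

def toy_score(essay: str) -> int:
    """Return a 1-6 score from surface features only (made-up weights)."""
    words = essay.lower().replace(",", " ").replace(".", " ").split()
    if not words:
        return 1
    word_count = len(words)
    big_words = sum(1 for w in words if len(w) >= 8)       # "big words"
    connectives = sum(1 for w in words if w in CONNECTIVES)
    # Hypothetical weighting: sheer length dominates, as Perelman observed.
    raw = word_count / 100 + big_words / 5 + connectives
    return max(1, min(6, round(1 + raw)))
```

Under this sketch, a long essay stuffed with polysyllabic words and "however" outscores a short, accurate one every time — which is exactly the weakness Perelman exploited.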

    E-Rater includes no method for fact-checking or rhetorical analysis, though, so even nonsensical essays can achieve good marks. Take, for example, this piece arguing that college tuition is high because of greedy teaching assistants. "The average teaching assistant makes six times as much money as college presidents... In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures." The essay was given the top score of 6.

    In real life, the designers of grading software don't intend for it to be the sole grader of high-stakes exams, and the Akron study's authors are quick to point out that it can't catch nuanced or creative writing. "If you go to a business school or an engineering school, they’re not looking for creative writers," says Akron Dean Mark Shermis. "They’re looking for people who can communicate ideas." Likewise, composition exams aren't usually meant to check for factual accuracy, although egregious examples like the above would probably get noticed. Like any time- or cost-saving tool, though, the software is likely to be overused on occasion, so it's worth at least knowing what an automated system can't do.