One of the world’s most prestigious machine learning conferences has banned authors from using AI tools like ChatGPT to write scientific papers, triggering a debate about the role of AI-generated text in academia.
The International Conference on Machine Learning (ICML) announced the policy earlier this week, stating, “Papers that include text generated from a large-scale language model (LLM) such as ChatGPT are prohibited unless the produced text is presented as a part of the paper’s experimental analysis.” The news sparked widespread discussion on social media, with AI academics and researchers both defending and criticizing the policy. The conference’s organizers responded by publishing a longer statement explaining their thinking. (The ICML responded to requests from The Verge for comment by directing us to this same statement.)
According to the ICML, the rise of publicly accessible AI language models like ChatGPT (a general-purpose AI chatbot that launched on the web last November) represents an “exciting” development that nevertheless comes with “unanticipated consequences [and] unanswered questions.” The ICML says these include questions about who owns the output of such systems (they are trained on public data, which is usually collected without consent, and they sometimes regurgitate this information verbatim) and whether text and images generated by AI should be “considered novel or mere derivatives of existing work.”
Are AI writing tools just assistants or something more?
The latter question connects to a tricky debate about authorship: who “writes” an AI-generated text, the machine or its human controller? This is particularly important given that the ICML is only banning text “produced entirely” by AI. The conference’s organizers say they are not prohibiting the use of tools like ChatGPT “for editing or polishing author-written text” and note that many authors already use “semi-automated editing tools” like the grammar-correcting software Grammarly for this purpose.
“It is certain that these questions, and many more, will be answered over time, as these large-scale generative models are more widely adopted. However, we do not yet have any clear answers to any of these questions,” write the conference’s organizers.
As a result, the ICML says its ban on AI-generated text will be reevaluated next year.
The questions the ICML is addressing may not be easily resolved, though. The availability of AI tools like ChatGPT is causing confusion for many organizations, some of which have responded with their own bans. Last year, coding Q&A site Stack Overflow banned users from submitting responses created with ChatGPT, while New York City’s Department of Education blocked access to the tool for anyone on its network just this week.
AI language models are autocomplete tools with no inherent sense of factuality
In each case, there are different fears about the harmful effects of AI-generated text. One of the most common is that the output of these systems is simply unreliable. These AI tools are vast autocomplete systems, trained to predict which word comes next in any given sentence. As such, they have no hard-coded database of “facts” to draw on, just the ability to write plausible-sounding statements. This means they have a tendency to present false information as truth, since a sentence that sounds plausible is not guaranteed to be factual.
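To make the autocomplete point concrete, here is a deliberately tiny sketch of next-word prediction. It is an illustration only, not how ChatGPT works internally: real language models are very large neural networks, while this toy simply counts which word most often follows another in a made-up corpus. The function names (`train_bigram_model`, `autocomplete`) and the example sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def autocomplete(followers, prompt, max_words=5):
    """Extend a non-empty prompt by repeatedly picking the most common next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # the model has never seen anything follow this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Made-up training text: the model only ever sees word sequences, never "facts".
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the capital of spain is madrid",
    "the largest city of italy is rome",
]

model = train_bigram_model(corpus)
completion = autocomplete(model, "the capital of australia is")
print(completion)  # the capital of australia is rome
```

The completion reads fluently and is wrong. The toy model picks “rome” only because that word most often followed “is” in its training text, which is the gap between plausibility and factuality described above.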
In the case of ICML’s ban on AI-generated text, another potential challenge is distinguishing between writing that has only been “polished” or “edited” by AI and that which has been “produced entirely” by these tools. At what point do a number of small AI-guided corrections constitute a larger rewrite? What if a user asks an AI tool to summarize their paper in a snappy abstract? Does this count as freshly generated text (because the text is new) or mere polishing (because it’s a summary of words the author did write)?
Before the ICML clarified the remit of its policy, many researchers worried that a potential ban on AI-generated text could also be harmful to those who don’t speak or write English as their first language. Professor Yoav Goldberg of Bar-Ilan University in Israel told The Verge that a blanket ban on the use of AI writing tools would be an act of gatekeeping against these communities.
“There is a clear unconscious bias when evaluating papers in peer review to prefer more fluent ones, and this works in favor of native speakers,” said Goldberg. “By using tools like ChatGPT to help phrase their ideas, it seems that many non-native speakers believe they can ‘level the playing field’ around these issues.” Such tools may be able to help researchers save time, said Goldberg, as well as better communicate with their peers.
But AI writing tools are also qualitatively different from simpler software like Grammarly. Deb Raji, an AI research fellow at the Mozilla Foundation, told The Verge that it made sense for the ICML to introduce a policy specifically aimed at these systems. Like Goldberg, she said she’d heard from non-native English speakers that such tools can be “incredibly useful” for drafting papers, but added that language models have the potential to make more drastic changes to text.
“I see LLMs as quite distinct from something like auto-correct or Grammarly, which are corrective and educational tools,” said Raji. “Although it can be used for this purpose, LLMs are not explicitly designed to adjust the structure and language of text that is already written — it has other more problematic capabilities as well, such as the generation of novel text and spam.”
Goldberg said that while he thought it was certainly possible for academics to generate papers entirely using AI, “there is very little incentive for them to actually do it.”
“At the end of the day the authors sign on the paper, and have a reputation to hold,” he said. “Even if the fake paper somehow goes through peer review, any incorrect statement will be associated with the author, and ‘stick’ with them for their entire careers.”
This point is particularly important given that there is no completely reliable way to detect AI-generated text. Even the ICML notes that foolproof detection is “difficult” and that the conference will not be proactively enforcing its ban by running submissions through detector software. Instead, it will only investigate submissions that have been flagged by other academics as suspect.
In other words: in response to the rise of disruptive and novel technology, the organizers are relying on traditional social mechanisms to enforce academic norms. AI may be used to polish, edit, or write text, but it will still be up to humans to assess its worth.