After months of rumors and speculation, OpenAI has announced GPT-4: the latest in its line of AI language models that power applications like ChatGPT and the new Bing.
The company claims the model is “more creative and collaborative than ever before” and “can solve difficult problems with greater accuracy.” It can parse both text and image input, though it can only respond via text. OpenAI also cautions that the system retains many of the same problems as earlier language models, including a tendency to make up information (or “hallucinate”) and the capacity to generate violent and harmful text.
OpenAI says it’s already partnered with a number of companies to integrate GPT-4 into their products, including Duolingo, Stripe, and Khan Academy. The new model is available to the general public via ChatGPT Plus, OpenAI’s $20 monthly ChatGPT subscription, and is powering Microsoft’s Bing chatbot. It will also be accessible as an API for developers to build on. (There is a waitlist here, which OpenAI says will start admitting users today.)
In a research blog post, OpenAI said the distinction between GPT-4 and its predecessor GPT-3.5 is “subtle” in casual conversation (GPT-3.5 is the model that powers ChatGPT). OpenAI CEO Sam Altman tweeted that GPT-4 “is still flawed, still limited” but that it also “still seems more impressive on first use than it does after you spend more time with it.”
The company says GPT-4’s improvements are evident in the system’s performance on a number of tests and benchmarks, including the Uniform Bar Exam, LSAT, SAT Math, and SAT Evidence-Based Reading & Writing exams. In the exams mentioned, GPT-4 scored in the 88th percentile and above, and a full list of exams and the system’s scores can be seen here.
Speculation about GPT-4 and its capabilities has been rife over the past year, with many suggesting it would be a huge leap over previous systems. However, judging from OpenAI’s announcement, the improvement is more iterative, as the company previously warned.
“People are begging to be disappointed and they will be,” said Altman in an interview about GPT-4 in January. “The hype is just like... We don’t have an actual AGI and that’s sort of what’s expected of us.”
The rumor mill was further energized last week after a Microsoft executive let slip in an interview with the German press that the system would launch this week. The executive also suggested the system would be multimodal (that is, able to generate not only text but other mediums). Many AI researchers believe that multimodal systems that integrate text, audio, and video offer the best path toward building more capable AI systems.
GPT-4 is indeed multimodal, but in fewer mediums than some predicted. OpenAI says the system can accept both text and image inputs and emit text outputs. The company says the model’s ability to parse text and image simultaneously allows it to interpret more complex input. In the samples below, you can see the system explaining memes and unusual images:
It’s been a long journey to get to GPT-4, with OpenAI — and AI language models in general — building momentum slowly over several years before rocketing into the mainstream in recent months.
The original research paper describing GPT was published in 2018, with GPT-2 announced in 2019 and GPT-3 in 2020. These models are trained on huge datasets of text, much of it scraped from the internet, which is mined for statistical patterns. These patterns are then used to predict what word follows another. It’s a relatively simple mechanism to describe, but the end result is flexible systems that can generate, summarize, and rephrase writing, as well as perform other text-based tasks like translation or generating code.
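To make the idea of predicting "what word follows another" concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not how GPT models are actually built (they use large neural networks rather than simple counts), and the function names and the tiny sample text are invented purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequently observed follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; real models train on vastly larger datasets.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_counts(corpus)
print(predict_next(model, "the"))  # prints "cat" (seen twice after "the")
```

Even this crude counter captures the basic principle: the statistics of the training text determine the prediction. Scaling that principle up to huge datasets and far richer models is, loosely speaking, what gives systems like GPT their flexibility.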
OpenAI originally delayed the release of its GPT models for fear they would be used for malicious purposes like generating spam and misinformation. But in late 2022, the company launched ChatGPT — a conversational chatbot based on GPT-3.5 that anyone could access. ChatGPT’s launch triggered a frenzy in the tech world, with Microsoft soon following it with its own AI chatbot Bing (part of the Bing search engine) and Google scrambling to catch up.
As predicted, the wider availability of these AI language models has created problems and challenges. The education system is still adapting to the existence of software that writes respectable college essays; online sites like Stack Overflow and sci-fi magazine Clarkesworld have had to close submissions due to an influx of AI-generated content; and early uses of AI writing tools in journalism have been rocky at best. But some experts have argued that the harmful effects have still been less than anticipated.
In its announcement of GPT-4, OpenAI stressed that the system had gone through six months of safety training, and that in internal tests, it was “82 percent less likely to respond to requests for disallowed content and 40 percent more likely to produce factual responses than GPT-3.5.”
However, that doesn’t mean the system doesn’t make mistakes or output harmful content. For example, Microsoft revealed that its Bing chatbot has been powered by GPT-4 all along, and many users were able to break Bing’s guardrails in all sorts of creative ways, getting the bot to offer dangerous advice, threaten users, and make up information. GPT-4 also still lacks knowledge about events “that have occurred after the vast majority of its data cuts off” in September 2021.