GitHub and OpenAI have launched a technical preview of a new AI tool called Copilot, which lives inside the Visual Studio Code editor and autocompletes code snippets.
Copilot does more than just parrot back code it’s seen before, according to GitHub. It instead analyzes the code you’ve already written and generates new matching code, including specific functions that were previously called. Examples on the project’s website include automatically writing the code to import tweets, draw a scatterplot, or grab a Goodreads rating.
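To picture that workflow, consider a hypothetical illustration, written by hand for this article and not actual Copilot output. The programmer supplies a function signature and a short description of intent, and the tool proposes a body that fits. The function name `days_between` and its logic are invented for this sketch.

```python
from datetime import date


# Written by the programmer: a signature plus a docstring describing intent.
def days_between(start: str, end: str) -> int:
    """Return the number of days between two ISO dates (YYYY-MM-DD)."""
    # The kind of body a completion tool might propose.
    # Hypothetical: this body was hand-written for illustration.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    return abs((end_date - start_date).days)
```

The point is the division of labor: the human states what the function should do, and the suggested code fills in how.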
A descendant of GPT-3
GitHub sees this as an evolution of pair programming, where two coders work on the same project to catch each other’s mistakes and speed up the development process. With Copilot, one of those coders is virtual.
This project is the first major result of Microsoft’s $1 billion investment in OpenAI, the research firm now led by former Y Combinator president Sam Altman. Since Altman took the reins, OpenAI has pivoted from nonprofit status to a “capped-profit” model, taken on the Microsoft investment, and started licensing its GPT-3 text-generation algorithm.
Copilot is built on a new algorithm called OpenAI Codex, which OpenAI CTO Greg Brockman describes as a descendant of GPT-3.
GPT-3 is OpenAI’s flagship language-generating algorithm, which can generate text that is sometimes indistinguishable from human writing. It’s able to write so convincingly because of its sheer size: 175 billion parameters, or adjustable knobs that allow the algorithm to capture relationships between letters, words, phrases, and sentences.
While GPT-3 generates English, OpenAI Codex generates code. OpenAI plans to release a version of Codex through its API later this summer so developers can build their own apps with the tech, a representative for OpenAI told The Verge in an email.
Codex was trained on terabytes of openly available code pulled from GitHub, as well as English language examples.
While testimonials on the site rave about the productivity gains Copilot provides, GitHub implies that not all of the code used in training was vetted for bugs, insecure practices, or personal data. The company writes that it has put a few filters in place to prevent Copilot from generating offensive language, but that they might not be perfect.
“Due to the pre-release nature of the underlying technology, GitHub Copilot may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs,” Copilot’s website says.
Given criticisms of GPT-3’s bias and abusive language patterns, it seems that OpenAI hasn’t found a way to prevent algorithms from inheriting their training data’s worst elements.
The company also warns that the model could suggest email addresses, API keys, or phone numbers, but that this is rare and the data has been found to be synthetic or pseudo-randomly generated by the algorithm. However, the code generated by Copilot is largely original. A test performed by GitHub found that only 0.1 percent of generated code could be found verbatim in the training set.
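GitHub’s description here does not spell out how the test was run. As a rough sketch of what measuring verbatim overlap could look like, the code below counts how many generated snippets appear word for word in a set of training files once whitespace differences are ignored. The function names and the whitespace normalization are assumptions made for illustration, not GitHub’s method.

```python
from typing import List


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences do not hide a match."""
    return " ".join(code.split())


def verbatim_rate(generated: List[str], training_files: List[str]) -> float:
    """Fraction of generated snippets found verbatim in any training file.

    Hypothetical helper: a simple substring check, not GitHub's methodology.
    """
    if not generated:
        return 0.0
    corpus = [normalize(text) for text in training_files]
    hits = 0
    for snippet in generated:
        needle = normalize(snippet)
        # Skip empty snippets, which would trivially match everything.
        if needle and any(needle in doc for doc in corpus):
            hits += 1
    return hits / len(generated)
```

Under this kind of check, a result of 0.001 would correspond to the 0.1 percent figure GitHub cites.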
This isn’t the first project to try to automatically generate code to help toiling programmers. The startup Kite pitches very similar functionality and is available in more than 16 code editors.
Right now, Copilot is in a restricted technical preview, but you can sign up on the project’s website for a chance to access it.