I mostly cover machines with brains for The Verge, despite being a human without one.
The capacity of AI to generate content is overwhelming the web
That’s according to a report from The Information on YouTube’s value to Google as an AI training dataset. The fact that OpenAI scraped YouTube isn’t surprising, but the company is famously secretive about its training data, partly for competitive reasons and partly, it’s thought, to stymie potential lawsuits.
YouTube’s terms of service forbid using content for anything other than “personal, non-commercial use,” but it’s an open secret in the AI industry that everyone is scraping the web constantly. If Google protests too much, it will end up incriminating itself.
The spread of AI-generated misinformation is worrying lots of us, including Ars Technica’s AI reporter Benj Edwards. That’s why he bought the only remaining physical encyclopedia that is still regularly updated. As Edwards explains, it’s expensive, sure, but at least it’s reliable, and it won’t suffer link rot or stealth edits. Is it time to stock up on facts while we can?