The Sunlight Foundation and the Media Standards Trust have released a new browser extension along with a website called "Churnalism." The purpose of the tool is to automatically check websites for plagiarized content. The tool works by parsing the text on the site and comparing it to a specialized database that includes Wikipedia, press release sites, government sources, and a few others. Originally launched in the UK in 2011, it is now available in the US.
Essentially, Churnalism automatically highlights text that it believes was copied from other sources, though it's not smart enough to distinguish between quotes and actual plagiarism. Even though that could be seen as a failing, it does sometimes give users a chance to trace quoted material that wasn't otherwise clearly attributed. In a statement (which, yes, Churnalism is likely to know came from a press release), Media Standards Trust director Martin Moore said, "If we're going to reward original news we have to be able to distinguish it from content that's just copy-pasted."
The technology behind Churnalism's database is interesting in its own right: rather than using a brute-force text search across its 20 GB corpus of source documents, it uses an algorithm to "hash" text into strings of numbers that can be searched more quickly. Theoretically, the process should make it easier to expand Churnalism's sources in the future.
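To give a rough sense of how hashing can stand in for a brute-force text search, here is a minimal Python sketch. It is not Churnalism's actual algorithm, which the description above does not spell out. It assumes one common approach: break each source document into overlapping runs of words, hash each run to a number, and store those numbers in a lookup table. The five-word window, the use of SHA-1, and the function names `shingle_hashes`, `build_index`, and `find_overlaps` are all illustrative choices.

```python
import hashlib
import re
from collections import defaultdict

WINDOW = 5  # words per run; an assumed value, not taken from Churnalism


def shingle_hashes(text, window=WINDOW):
    """Hash every overlapping run of `window` words into a 64-bit integer."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    hashes = set()
    for i in range(len(words) - window + 1):
        chunk = " ".join(words[i:i + window])
        digest = hashlib.sha1(chunk.encode("utf-8")).digest()
        # Keep only the first 8 bytes so each run becomes one compact number.
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def build_index(corpus):
    """Map each hash to the set of source documents that contain that run."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for h in shingle_hashes(text):
            index[h].add(doc_id)
    return index


def find_overlaps(article, index):
    """Count, per source document, how many word runs the article shares."""
    counts = defaultdict(int)
    for h in shingle_hashes(article):
        for doc_id in index.get(h, ()):
            counts[doc_id] += 1
    return dict(counts)
```

Checking an article then costs one table lookup per word run instead of a scan of every source document, and adding a new source only means hashing it and inserting its numbers into the same table. Like the tool described above, this sketch cannot tell a properly attributed quote from copied text; it only reports that the same words appear in both places.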