Diffbot is a machine learning system that can read any web page and extract useful, structured data from it. Today it announced a $2 million series A round of funding from an impressive list of names, including the founders of Earthlink and Sun Microsystems alongside executives from Twitter, Facebook and AOL.
The main function of Diffbot is to turn the open web into an easily digestible API. AOL, for example, is one of Diffbot's clients. It has numerous online media properties running on different content management systems. Rather than trying to reconcile all of those systems, it uses Diffbot to read new articles from every property and expose that data through one simple API. That way its new tablet magazine app, Editions, can pull stories from across all AOL properties and display them on the iPad in real time.
"Diffbot understands a web page no matter how often it is redesigned."
So far Diffbot has been limited to understanding front pages and article pages, but founder Michael Tung told us that the company is now expanding to cover a much wider range. "So for example we can now understand recipes," Tung explained. "So you could build an app that lets users bookmark recipes from any page on the web, and Diffbot could pull the recipe and break it down into ingredients and instructions."
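To make Tung's recipe example concrete, here is a minimal sketch of what a structured recipe record might look like once a page has been broken down into ingredients and instructions. The `Recipe` class, its field names, and the `to_api_response` helper are hypothetical illustrations, not Diffbot's actual output format or API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Recipe:
    """Hypothetical structured record for a recipe extracted from a web page."""
    url: str
    title: str
    ingredients: List[str] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def to_api_response(recipe: Recipe) -> str:
    """Serialize the record as JSON, the way an app might receive it from an API."""
    return json.dumps(asdict(recipe), indent=2)


# Example data is invented for illustration only.
pancakes = Recipe(
    url="https://example.com/pancakes",
    title="Simple Pancakes",
    ingredients=["1 cup flour", "1 egg", "1 cup milk"],
    instructions=["Whisk the ingredients together.", "Cook on a hot griddle."],
)
print(to_api_response(pancakes))
```

A bookmarking app of the kind Tung describes could store records like this regardless of how the original page was laid out.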
The company currently offers two basic services. It can scan URLs that a customer sends it, or it can monitor a URL for a customer and alert them to changes, something Tung says many clients are using to keep an eye on their competition. The company works on a freemium model, with the first 10,000 API calls per month being free and tiered pricing after that. "Right now we are processing over 100 million API calls per month," said Tung.
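The monitoring service can be pictured with a simple sketch. The article does not say how Diffbot detects changes, so the approach below, comparing a hash of the page content against the last one seen, is an assumption chosen for illustration, and `PageMonitor` and `fingerprint` are hypothetical names. Fetching the page is left out; the caller passes in the content.

```python
import hashlib
from typing import Dict


def fingerprint(content: str) -> str:
    """Return a SHA-256 hex digest that stands in for the page's current state."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class PageMonitor:
    """Hypothetical change detector: remembers the last fingerprint per URL."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, str] = {}

    def check(self, url: str, content: str) -> bool:
        """Record the content and return True only if it differs from the last check.

        The first check of a URL returns False because there is no baseline yet.
        """
        new = fingerprint(content)
        old = self._last_seen.get(url)
        self._last_seen[url] = new
        return old is not None and old != new


monitor = PageMonitor()
monitor.check("https://example.com/pricing", "Plan A: $10")          # baseline
if monitor.check("https://example.com/pricing", "Plan A: $12"):
    print("Page changed, alert the customer")
```

A raw hash flags any byte-level difference, including rotating ads or timestamps. A system that understands page structure could instead compare only the extracted fields, which is presumably closer to what a customer watching a competitor would want.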
The new funds, says Tung, will be used to scale the company's servers to keep up with demand and to hire machine learning experts in a crowded and competitive marketplace.