When I wrote my post on natural user interfaces a couple of months back (described by some as a "real barnstormer"), I glossed over most current forms of voice recognition. The problem, as I saw it and still see it, is that voice recognition has to be bulletproof to be truly useful. Where computers fail at listening is that they can't fill in the blanks. Even if I don't catch every syllable (and I don't, I'm a pretty bad listener), I usually know what someone's saying -- sometimes halfway through the sentence -- based on context and logic. The other place where I win is that if I don't understand someone, I can usually ask them to clarify. Voice systems like Ford SYNC ask you "did you mean…" but when you yell "no!" for the tenth time, you end up right where you started, the AI functioning as a tabula rasa. What's great about Apple's new Siri virtual personal assistant is that it specifically addresses all three of these hangups.
Siri has access to your current location, your to-do list, and your calendar. Since it's not operating blind, you don't have to be as "computer precise" with your instructions. In the Siri section of Apple's iPhone 4S demo video, our coif-tacular runner asks to move a meeting to noon, but Siri warns him that there's already a meeting at that time. And thanks to Siri's access to the address book, when you ask for directions home, it already knows what you're talking about.
More importantly, Siri has conversational context. In one example, packing-for-a-trip girl asks if it's going to be chilly in San Francisco. When Siri responds no (including the temperature low), PFATG asks, "what about Napa Valley?" Siri knows she's asking about the weather in Napa Valley from the context of the preceding conversation. This gets more important when you want to drill through the process of finding a restaurant, reading reviews, and finally booking a reservation -- it's all one conversation.
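The mechanics of that carry-over can be sketched very roughly: a dialogue manager remembers the last intent, so a follow-up that names only a new entity ("what about Napa Valley?") inherits the intent from the previous turn. This is strictly a toy illustration, not Apple's implementation; the `DialogueContext` class and its behavior are entirely hypothetical.

```python
class DialogueContext:
    """Carries the last intent so follow-up questions can omit it."""

    def __init__(self):
        self.last_intent = None  # e.g. "weather"

    def interpret(self, intent, entity):
        # A follow-up like "what about Napa Valley?" supplies only an
        # entity; reuse the previous turn's intent to complete it.
        if intent is None:
            intent = self.last_intent
        self.last_intent = intent
        return (intent, entity)

ctx = DialogueContext()
print(ctx.interpret("weather", "San Francisco"))  # ('weather', 'San Francisco')
print(ctx.interpret(None, "Napa Valley"))         # ('weather', 'Napa Valley')
```

The whole trick is that the second call never mentions weather at all -- the context object fills that blank, which is exactly what SYNC-style systems can't do.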
Recognizing context is a simple kind of logic, and very important for listening. When it comes to finding a good answer, however, a whole new level of logic is needed, and obviously this is the crux of Siri's AI -- all the hundreds of millions of dollars worth of it. One aspect I'm particularly excited about is the Wolfram Alpha integration. Wolfram Alpha makes more sense integrated into a personal assistant (like Siri) than subbing in as a search engine (like Google or Bing), and asking Siri to convert units or time zones, or to compare the land speed of Usain Bolt to a cheetah (my primary use for Wolfram Alpha revolves around such comparisons) is going to be endlessly useful / entertaining.
As per Apple's writeup of Siri: "Siri is proactive, so it will question you until it finds what you’re looking for." This is incredibly important. The promise of the other features is tantalizing enough, but Siri will only be more than a gimmick if it can fail well, and keep asking me questions until it gets it right. It's Siri's ability to rephrase and clarify that will allow for truly natural conversation patterns -- SYNC-style canned phrases are never going to cut it.
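To see why "failing well" matters, here's a minimal sketch of a clarify-until-resolved loop, assuming a hypothetical `ask` callback that returns the user's reply. Nothing here reflects Siri's or SYNC's actual code -- it just shows the difference between a dead-end "did you mean…" and folding the answer back into the search.

```python
def resolve(query, candidates, ask):
    """Narrow candidates by asking follow-up questions until one remains."""
    matches = [c for c in candidates if query.lower() in c.lower()]
    while len(matches) > 1:
        # Instead of a canned "did you mean...?" dead end, feed the
        # user's answer back into the filter and try again.
        query = ask("Which one: " + ", ".join(matches) + "?")
        matches = [c for c in matches if query.lower() in c.lower()]
    return matches[0] if matches else None

# A scripted user who answers the one follow-up question:
replies = iter(["Smith"])
contact = resolve("John", ["John Smith", "John Doe"], lambda q: next(replies))
print(contact)  # John Smith
```

Every pass through the loop gets the system closer to the right answer, rather than dumping you back at square one.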
Ultimately, there's nothing new about these concepts. They just haven't been executed really well, and there's no real trick to AI execution -- it's just a lot of hard work. Siri started out in 2003 as "CALO" (Cognitive Assistant that Learns and Organizes), a DARPA-funded project with 300 crack researchers, which was then spun off as Siri in 2007. With eight or so years of development and millions of dollars of investment, Siri is pretty mature as far as AI goes -- for comparison's sake, IBM's Watson was started in 2005, and only had a staff of 15 researchers working on DeepQA (with 30 or so additional researchers working on other aspects).
Yesterday 9to5 Mac spoke with Norman Winarsky, who was involved in spinning off Siri after the CALO project ended, and who has been relentlessly self-promoting since rumors of Apple's re-release of Siri got hot and heavy. He calls Apple's "mainstreaming" of Siri a "world-changing event." That might be a little hyperbolic, but I really hope it's not.
Only two problems…
It all sounds very thrilling, but I have two worries. First off, I don't see any reasonable excuse from Apple for making this work only on the iPhone 4S. I'm sure Apple will say that only the A5 processor is somehow capable of this AI processing... but why can't I just wait a couple more seconds for the decision? If the original Siri could run on a 3GS, why can't the new version at least work on the iPhone 4? It seems super cynical and shortsighted to me.
My second problem is that from all the demos I've seen, there's no way to input text into Siri -- it's voice only. Outside of the mere inconvenience of that (you're in a loud place, you're in a need-to-be-quiet place, you've been gagged by your kidnappers), it feels a bit like a vote of no confidence from Apple. If Siri's AI is truly magnificent, wouldn't it be the preferred method of input for most things, most of the time? The lack of text input implies to me that Apple only thinks Siri is an efficient way to get things done when you're running or driving or folding laundry. So I wonder, is it really an efficient way to get things done at all?
Interestingly, the original version of Siri (which has been pulled from the App Store, and will expire for existing users on October 16th) has a field for text input, so… take that for what you will. Maybe I'm making too much of this, or maybe I'm missing some perfectly legitimate reasoning, but what I'm basically saying is this: for AI to be more than a gimmick, it has to be treated like more than a gimmick. Until it's vital and essential to an entire device experience, it's only going to be a cool party trick.
I'll leave you on this heartwarming note:
We made it, guys!