It is discussed in the context of a client (with a powerful processor) using Google to search media without direct text input from the user. It should be doable on Android, in a PS3 XMB app, or on iPhone OS.
In that case, I would agree... but this doesn't even require a powerful processor. The powerful processors live in the cloud, and you just need a fat pipe to that cloud (along with image and sound recording capability).
My original HTC/T-Mobile G1, with the following lowly specs:
http://www.htc.com/www/product/g1/specification.html
is enough to perform voice and image recognition, as long as I have WiFi or 3G access... thanks to the already-published APIs in the android.speech package and yet-to-be-released APIs such as Google Goggles:
http://www.techradar.com/news/phone...e-plan-to-open-up-our-goggles-platform-683454
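
To make the thin-client point concrete, here is a rough sketch of what the device side amounts to: capture some bytes and wrap them in an upload. This is not the android.speech or Goggles API. The endpoint URL, the function name `build_recognition_request`, and the `audio/amr` content type are all made up for illustration, and the request is only built, never sent.

```python
import urllib.request

# Hypothetical endpoint: a stand-in for whatever cloud recognition service is used.
RECOGNIZE_URL = "https://recognizer.example.com/v1/recognize"


def build_recognition_request(media_bytes, content_type, url=RECOGNIZE_URL):
    """Wrap raw captured media in an HTTP POST; no recognition happens locally."""
    return urllib.request.Request(
        url,
        data=media_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


# The device's only jobs: capture the bytes, then ship them upstream.
# The payload here is placeholder data, not real audio.
request = build_recognition_request(b"\x00\x01fake-audio", "audio/amr")

# urllib.request.urlopen(request) would send it; that call is left out
# because the endpoint above is invented.
```

Nothing in there is CPU-heavy, which is the whole argument: the phone needs a microphone or camera and bandwidth, and the cloud does the rest.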