PeakStream acquired by Google
B3D News
05-Jun-2007, 21:59
GPGPU middleware developer PeakStream has been acquired by Google, according to The Register. Where this leaves the GPGPU development platform is unclear, although PeakStream has stopped selling the product for the moment.
Read the full news item (http://www.beyond3d.com/content/news/247)
digitalwanderer
05-Jun-2007, 22:03
Uhm, why? :???:
Tim Murray
05-Jun-2007, 22:06
Nobody really knows. Obviously, Google is basically using parallel processing for *everything*, but how much GPGPU impacts them is unknown. Or, maybe I just missed some papers they've released.
3dilettante
05-Jun-2007, 22:36
I did not see that one coming.
Sounds like a long-term investment with Google's eye on the expanding amount of data it intends to index/harvest.
This sounds like there's going to be a huge amount of data mining going on.
With an owned subsidiary producing the tools, a lot of proprietary data can be more securely held, and any special things Google wants can be added.
Of course, if they have that much money to burn, they could go whole-hog and buy AMD, or wait for the liquidation sale.
I wonder if they'll switch all their indexing servers to GPGPU now. . . man, that'd cause some heartburn at Intel. "This is war!", indeed.
Killer-Kris
06-Jun-2007, 02:04
I wonder if they'll switch all their indexing servers to GPGPU now. . . man, that'd cause some heartburn at Intel. "This is war!", indeed.
Last I heard, PeakStream wasn't being designed exclusively for GPUs but would work on massively multi-core CPUs as well. They can still purchase whatever performs best for any given workload, and I imagine within Google there would be use for both types of throughput processors.
Sure, but what's ahead right now? What Google wants to do, as I understand it, has already been "pulled to the left" (to use Doug Carmean's language), or at least it naturally sat there anyway. And while Intel obviously plans an "Empire Strikes Back" counter-offensive, it's still a bit in the distance. . . and it's not exactly going to catch anyone by surprise either (i.e. the gpgpu folks won't be resting on their laurels between now and then).
Last I heard, PeakStream wasn't being designed exclusively for GPUs but would work on massively multi-core CPUs as well. They can still purchase whatever performs best for any given workload, and I imagine within Google there would be use for both types of throughput processors.
Google has more than enough experience with massively multi-core work. That's the entire point of their MapReduce (http://labs.google.com/papers/mapreduce.html) library. The only reason they would buy PeakStream would be for GPGPU work.
Now comes the question: what GPU is Google going to use? Peakstream already has a CTM backend so that looks likely, but I'm sure Nvidia would be willing to give some developer support if it means selling 1,000 or 10,000 cards.
Last estimate I saw had Google at 450k servers. Tho my understanding is that is unofficial. Presumably they wouldn't turn them all over in two months or somesuch, but still. If that all went to one specific high-end GPU in a one year period. . . sweet Jaysus.
If I'm Google I damn sure am going to make sure that backend supports both G80 and R600, because I want that bargaining leverage when I go shopping for high-end GPUs in the hundreds of thousands of units. . . .
Tim Murray
06-Jun-2007, 02:32
Last estimate I saw had Google at 450k servers. Tho my understanding is that is unofficial. Presumably they wouldn't turn them all over in two months or somesuch, but still. If that all went to one specific high-end GPU in a one year period. . . sweet Jaysus.
I will predict exactly zero chance of that. Power consumption is too high, plus I doubt that they have THAT much work that can be done with a GPU. Simply having a data-parallel problem isn't enough; you need a decent algorithm (or enough of a speed increase using a worse algorithm) to make it worthwhile. And that's not always easy to do with a GPU, considering certain penalties (memory access, branching, etc.).
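To make the branching penalty concrete, here is a deliberately simplified cost model, offered as an assumption rather than a description of any particular GPU: lanes in a SIMD group run in lockstep, so a group whose lanes disagree on an if/else pays for both sides. The group width of 16 and the cycle costs of 10 are made-up illustration values.

```python
from typing import List


def simd_group_cost(predicates: List[bool], cost_if_true: int, cost_if_false: int) -> int:
    """Cycles one lockstep group spends on an if/else, given each lane's branch outcome."""
    cost = 0
    if any(predicates):      # at least one lane needs the 'if' side
        cost += cost_if_true
    if not all(predicates):  # at least one lane needs the 'else' side
        cost += cost_if_false
    return cost


def total_cost(predicates: List[bool], width: int, cost_if_true: int, cost_if_false: int) -> int:
    """Split the lanes into groups of `width` and add up each group's cost."""
    return sum(
        simd_group_cost(predicates[i:i + width], cost_if_true, cost_if_false)
        for i in range(0, len(predicates), width)
    )


if __name__ == "__main__":
    lanes, width = 64, 16
    coherent = [i < 32 for i in range(lanes)]       # every group agrees
    divergent = [i % 2 == 0 for i in range(lanes)]  # every group is split
    print(total_cost(coherent, width, 10, 10))   # 40
    print(total_cost(divergent, width, 10, 10))  # 80
```

With the same 64 lanes and the same amount of useful work, the alternating pattern costs twice as many cycles as the coherent one under this model.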
Frankly, the more I think about this. . . I'm getting actual chills. If this is a wholesale move by Google to gpgpu. . . I just can't think of a more serious validation of the whole gpgpu idea. And that actually scares me a little bit, even after everything else that's happened in the last year more or less hinting that all the big players are taking this very seriously indeed.
Why? Because there's a very old saying. . . "When you strike at a king, you must kill him." Because if you don't, he shall surely kill you. GPGPU is striking at King CPU in a very serious way now.
Killer-Kris
06-Jun-2007, 02:44
I will predict exactly zero chance of that. Power consumption is too high, plus I doubt that they have THAT much work that can be done with a GPU.
Even searching IN pictures and videos?
I will predict exactly zero chance of that. Power consumption is too high.
Err, what? Is it? What are the FLOPS/watt of high end CPUs vs high-end GPUs right now? If that was true, Peakstream wouldn't have been worth buying in the first place.
Killer-Kris
06-Jun-2007, 02:50
The way I'm looking at this whole situation is that this might merely be a future investment so that they have ultimate flexibility to choose the hardware best suited for their tasks, hopefully without much reworking between platforms. So for highly coherent tasks like searching within pictures and videos, a GPU might be advantageous, while the tasks that their current cluster farm is doing might be better suited to a throughput CPU.
Now obviously Google thinks they have something to gain through this purchase, but I wouldn't be surprised if it was PeakStream who initiated it. After all, life is much safer and more secure when you are part of a large, growing, and profitable company. Not to mention it's a whole lot easier to try and break new ground in that situation as well.
Tim Murray
06-Jun-2007, 04:05
Err, what? Is it? What are the FLOPS/watt of high end CPUs vs high-end GPUs right now? If that was true, Peakstream wouldn't have been worth buying in the first place.
FLOPS/watt isn't the only consideration here. There are specific algorithms that you can use on a GPU versus a CPU, and even if you have 100x the FLOPS on a GPU, it won't mean diddly squat if you have to use some algorithm on the GPU that is a thousand times slower than your CPU algorithm. I have my doubts that the GPU is faster as a whole for Google's purposes, but there are certainly some things where it is faster. KK(ris, not utaragi) brings up an interesting point with regard to searching in video. I don't know too much about the algorithms used here (or if you even use traditional image comparison algorithms; maybe you use neural nets or something like that, which would certainly work well on a GPU). However, even with GPGPU, I don't know that they have anywhere close to the processing power to do this on a large scale.
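Tim's FLOPS-versus-algorithm argument reduces to one division: if run time is roughly operations divided by sustained FLOPS, the GPU's raw advantage is cancelled by however much extra work its algorithm needs. The sketch below uses his hypothetical 100x and 1000x figures; the absolute FLOPS and operation counts are invented so the ratios come out that way, and real code rarely sustains peak on either chip.

```python
def run_time(operations: float, sustained_flops: float) -> float:
    """Idealised run time in seconds: work divided by sustained throughput."""
    return operations / sustained_flops


def effective_speedup(cpu_flops: float, gpu_flops: float, cpu_ops: float, gpu_ops: float) -> float:
    """CPU time divided by GPU time; above 1.0 means the GPU finishes first."""
    return run_time(cpu_ops, cpu_flops) / run_time(gpu_ops, gpu_flops)


if __name__ == "__main__":
    cpu_flops, gpu_flops = 1e10, 1e12  # hypothetical: GPU has 100x the FLOPS
    cpu_ops, gpu_ops = 1e9, 1e12       # hypothetical: GPU-friendly algorithm does 1000x the work
    print(effective_speedup(cpu_flops, gpu_flops, cpu_ops, gpu_ops))  # about 0.1, i.e. 10x slower
```

The net ratio is simply 100 / 1000, so under this model the GPU only wins when its algorithm's extra work stays below its raw throughput advantage.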
I'll do some research into that, though. I think there was a group working on this at CMU, and I'll try to track them down.
I think the point of Google buying Peakstream is not gpgpu, but to make all their parallel software hardware-independent. Since future hardware architecture is not 100% decided (as you can see from the different approaches of Intel, Nvidia and AMD), the value of Peakstream is that they won't need to rewrite thousands of man-years of software when new hardware architectures become available.
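The hardware-independence argument can be sketched as a kernel written once against a small dispatch layer, with the target chosen at run time. Everything here (`execute`, `BACKENDS`, the backend names) is hypothetical and is not PeakStream's API; the thread-pool backend merely stands in for a real multi-core or GPU target, since pure-Python code gains little from threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Kernel = Callable[[float], float]
Backend = Callable[[Kernel, Sequence[float]], List[float]]


def run_serial(kernel: Kernel, data: Sequence[float]) -> List[float]:
    """Reference backend: apply the kernel to each element on one core."""
    return [kernel(x) for x in data]


def run_threaded(kernel: Kernel, data: Sequence[float]) -> List[float]:
    """Stand-in parallel backend: same kernel, spread over a thread pool.
    pool.map returns results in input order, so output matches run_serial."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(kernel, data))


# Hypothetical registry: supporting new hardware means adding an entry here,
# not rewriting the kernels themselves.
BACKENDS: Dict[str, Backend] = {"serial": run_serial, "threaded": run_threaded}


def execute(kernel: Kernel, data: Sequence[float], backend: str = "serial") -> List[float]:
    """Run one kernel on whichever backend is selected by name."""
    return BACKENDS[backend](kernel, data)


def scale_and_shift(x: float) -> float:
    """Example element-wise kernel with no dependence on neighbouring elements."""
    return 2.0 * x + 1.0


if __name__ == "__main__":
    data = [float(i) for i in range(8)]
    assert execute(scale_and_shift, data, "serial") == execute(scale_and_shift, data, "threaded")
```

The design choice being illustrated is that kernels only see elements, never the execution strategy, so the backend can change underneath them.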
Ok, to me, this all makes sense.
Why?
Searching, financial calculations, genetics, image recognition, and A.I. (or, as some people like to call them, intelligent agents)!
Here are just a few of the things I've dug up.
GPU-based Sorting in PostgreSQL - http://www.andrew.cmu.edu/user/ngm/15-823/project/Draft.pdf
Bio-Sequence Database Scanning on a GPU - http://www.hicomb.org/papers/HICOMB2006-01.pdf
Compute-Intensive, Highly Parallel Applications and Uses - http://www.intelceleron.net/technology/itj/2005/volume09issue02/vol09_iss02.pdf
Certainly, if Google indeed has that many servers, then they have something to gain from this technology. And if they choose to eventually upgrade those servers to GPGPU technology, then the stock will soar. Also, if they choose this path and a competitor like Microsoft had bought Peakstream instead of them, they could be in trouble. Each of these points makes sense on its own, but together they make a whole lot of sense.
It also reminds me that I think I read somewhere that Cell was pretty good at XML parsing and even certain fileserver duties (of the type where you are sure you can't keep that many in memory anyway). GPGPU and related streaming capabilities are definitely going to go somewhere in the future, that's for sure.
Last estimate I saw had Google at 450k servers. Tho my understanding is that is unofficial. Presumably they wouldn't turn them all over in two months or somesuch, but still. If that all went to one specific high-end GPU in a one year period. . . sweet Jaysus.
There's no way they can drop in GPUs to their current servers. Their current servers are just solid CPUs + RAM, no expansion slots of any sort. Plus they're 1U so there's no space to fit a GPU.
If they're going to do any GPGPU, they're building a cluster from the ground up. The problem is with density. A standard 2U server will let you have 2 GPUs, but that's fairly lame (lots of wasted space). With 2 Quadro Plexs (3U together) and a 1U server controlling the pair you can have 4 GPUs in 4U, or with a GX2 style card 8 GPUs in 4U. 1GPU/U really doesn't seem very compelling. 2GPU/U does, but there is no GX2 G80 (yet?).
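The density figures in this post can be checked with a few lines of arithmetic. The GPU and rack-unit counts per building block come from the post; the 42U rack is an assumption (a common full-height size), and the per-rack count ignores space for switches and power distribution.

```python
RACK_UNITS = 42  # assumed full-height rack; ignores switches and power distribution

# (layout, GPUs per building block, rack units per building block); figures from the post
LAYOUTS = [
    ("2U server with 2 GPUs", 2, 2),
    ("2x Quadro Plex (3U) + 1U host", 4, 4),
    ("same, with GX2-style dual-GPU cards", 8, 4),
]


def gpus_per_u(gpus: int, units: int) -> float:
    """GPU density of one building block."""
    return gpus / units


def gpus_per_rack(gpus: int, units: int, rack_units: int = RACK_UNITS) -> int:
    """Only whole building blocks fit, so the block count is rounded down."""
    return (rack_units // units) * gpus


if __name__ == "__main__":
    for name, gpus, units in LAYOUTS:
        print(f"{name}: {gpus_per_u(gpus, units):.1f} GPU/U, {gpus_per_rack(gpus, units)} GPUs per rack")
```

Under those assumptions the three layouts give 42, 40 and 80 GPUs per rack, which is why only a dual-GPU card really changes the picture.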
Google must have done this as a fairly forward-looking purchase. The market simply is not mature enough at this point to go out and build a large (100+) GPGPU cluster. It'll get there fairly soon (especially if lots of money starts being thrown at problems), but we're talking on the order of a year or two out.
Bouncing Zabaglione Bros.
06-Jun-2007, 10:11
An interesting question is whether Peakstream is now "off the table" and will exclusively be working on proprietary tech for Google's internal use, or if the cash being poured in will see Peakstream products available for the whole market at some point in the future.
I think that actually doesn't even matter that much. If Google uses it only internally, but it is perceived by the external market to be successful, then more money will pour into R&D with rivalling companies all over the place. But for now there is little reason for Google to keep it all to themselves. There is always a right time and price. ;)
NocturnDragon
06-Jun-2007, 13:18
Just to add more ideas to the thread:
http://glinden.blogspot.com/2006/06/four-petabytes-in-memory.html
Google reportedly (http://glinden.blogspot.com/2006/04/100k-new-servers-per-quarter-at-google.html) had an estimated 450k machines two months ago and adds machines at roughly 100k per quarter. In 2004, each of these machines had 2-4G (http://en.wikipedia.org/wiki/Google_platform) of memory, and, two years later, likely are up to 8G standard.
While it's true that they cannot just add GPUs to their current servers, they could start using servers with GPUs in them, or just whole racks full of GPUs. (Isn't lasso supposed to be just that?)
I think the 100k also counts upgraded machines, not only new ones.
So 400K-800K GPUs a year wouldn't be too exaggerated from this POV.
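The back-of-the-envelope numbers here can be reproduced directly. The machine counts and the 8G-per-machine figure are the blog's estimates quoted above, the 1 or 2 GPUs per machine is the assumption implied by the 400K-800K range, and decimal units are used for the memory total.

```python
MACHINES_NOW = 450_000          # blog estimate quoted above
MACHINES_PER_QUARTER = 100_000  # blog estimate quoted above
GB_PER_MACHINE = 8              # blog's guess for 2006


def aggregate_memory_pb(machines: int, gb_per_machine: int) -> float:
    """Total RAM in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return machines * gb_per_machine / 1_000_000


def gpus_per_year(machines_per_quarter: int, gpus_per_machine: int) -> int:
    """GPUs bought per year if every machine deployed in a quarter gets GPUs."""
    return machines_per_quarter * 4 * gpus_per_machine


if __name__ == "__main__":
    print(aggregate_memory_pb(MACHINES_NOW, GB_PER_MACHINE))  # 3.6, roughly "four petabytes"
    for per_machine in (1, 2):
        print(per_machine, gpus_per_year(MACHINES_PER_QUARTER, per_machine))  # "1 400000" then "2 800000"
```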
Creating new GPGPU-friendly algorithms for indexing and data mining is certainly going to be difficult, but if they bought Peakstream they surely know how to use it.
But some other workloads could start to take advantage of it right away.
http://arstechnica.com/news.ars/post/20070530-facial-recognition-slipped-into-google-image-search.html
Today it's facial recognition; tomorrow, object identification?
There's no way they can drop in GPUs to their current servers. Their current servers are just solid CPUs + RAM, no expansion slots of any sort. Plus they're 1U so there's no space to fit a GPU.
If they're going to do any GPGPU, they're building a cluster from the ground up. The problem is with density. A standard 2U server will let you have 2 GPUs, but that's fairly lame (lots of wasted space). With 2 Quadro Plexs (3U together) and a 1U server controlling the pair you can have 4 GPUs in 4U, or with a GX2 style card 8 GPUs in 4U. 1GPU/U really doesn't seem very compelling. 2GPU/U does, but there is no GX2 G80 (yet?).
PeakStream was an AMD launch partner for professional gpgpu, and surely AMD has given this some thought and made progress on the matter. Rumor has it that Nvidia will be launching a productized gpgpu solution in the near future, which would also indicate they must have been giving some thought to how to package the hardware for industrial use. Surely it would come as a surprise to neither that a company interested in gpgpu applications would likely be looking at deploying more than 2 or 3 GPUs to support it.
Voltron
07-Jun-2007, 05:41
This article claims that the reason for the acquisition is x86 multi-threading rather than GPGPU. Not that The Reg or anyone else has a perfect track record on speculation concerning Google.
http://www.theregister.com/2007/06/06/google_peakstream_server/
Wow, that was amazingly melodramatic, but I simply don't see it. The point of PeakStream is to make parallel programming targeted at multi-core and GPGPUs easy. Google has no need for the compiler or multi-core work the article talks about, at least not one worth buying a company over. Sure, if the x86 compiler guys they got could make things run 5% faster, it'd be nice, but that really doesn't matter to Google. What matters is getting it to run on 2x as many machines with 2x as much data, which is the point of their MapReduce I linked above. Also, any multi-core work PeakStream has done is a complete joke compared to MapReduce: running something on a single quad-core versus across a 10,000-node cluster.
Now I'm not saying PeakStream didn't have smart x86 guys who'll be put to good use at Google, but they're just a nice thing on the side. The only thing that distinguished PeakStream from any other compiler company was the GPGPU work, and that's the only sensible reason Google would buy them.
Edit: I was looking through some Google papers and found a nice quote (page 7) in this MapReduce (http://www.cs.virginia.edu/~pact2006/program/mapreduce-pact06-keynote.pdf) presentation that supports my point:
Single-thread performance doesn't matter
We have large problems and total throughput/$ more important than peak performance.
The only reason Google cares about PeakStream is if their work can give them higher throughput/$ than their current MapReduce. The only way it can do that is GPGPU.
DemoCoder
07-Jun-2007, 07:58
Actually, Google cares about throughput per watt more than anything else. They are facing serious power density problems, and Google is not interested in being an environmental power hog just because they've got enough money to build their own power plant if they wish.
MapReduce is great at distributing functional bits over a large cluster (actually, it's MapReduce + GFS + a component not known by the public yet) but it does nothing to help boost the throughput per watt per cluster node.
It is highly wasteful to keep scaling by adding more cases, more PSUs, more mainboard chipsets, more RAM, more local disk, more everything because you are not leveraging Moore's law in your favor.
Google knows the future is in throughput computing. They've looked seriously at Sun Niagara, at Azul Vega, at BlueGene, in trying to pack more throughput per box. Eventually, Google will need to move in the future to nodes with throughput oriented CPUs or a high number of non-SMP cores.
There's no way they're going to install GPUs on 450k machines (their current machines have no GPU acceleration) and there's a fat chance they're gonna install a G80 or R600 power-sucking GPU on any of their nodes.
My guess is the acquisition boils down to the following:
1) Google is always looking to hire very smart teams; they sometimes buy companies just to get their engineers
2) PeakStream may be useful to them for the purpose of distributing MapReduce jobs *written in a platform-independent*, mostly-functional, kernel-oriented language that compiles regardless of the target (x86, POWER, SPARC, etc.). Remember, MapReduce programs still need to run on a node, and there's no reason that they have to be written in C.
3) prevent anyone else from using PeakStream (e.g. MS)
They may not have a GPU plan in mind at the moment, and frankly, given the kinds of work that Google does today with M/R, I don't see those jobs running on GPUs. There's tons of scatter/gather going on within an MR job. MapReduce != Stream Computing. The "kernels" of work that a given node deals with in MR can involve dozens of operations on gigabytes of data (for example, graph traversal jobs). Remember, Google is tossing around *Petabytes* of data on MapReduce.
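The distinction drawn here between MapReduce and stream computing shows up even in a single-process toy. The sketch below follows the word-count example from the MapReduce paper but is not Google's library or API; the point is the `shuffle` step, where any mapper's output can be routed to any reducer (the scatter/gather mentioned above), whereas a stream kernel only ever reads the element at its own index.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group values by key. On a cluster this is the all-to-all
    scatter/gather step, since any mapper may emit any key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine all values that share a key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


def stream_kernel(xs: List[float]) -> List[float]:
    """Stream-style kernel: output i depends only on input i, so no data moves between elements."""
    return [x * x for x in xs]
```

For example, `word_count(["a b a", "b c"])` returns `{'a': 2, 'b': 2, 'c': 1}`; the reduce for "a" needs values produced from anywhere in the input, which is exactly what a pure stream kernel never does.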
Tim Murray
07-Jun-2007, 16:09
Actually, Google cares about throughput per watt more than anything else. They are facing serious power density problems, and Google is not interested in being an environmental power hog just because they've got enough money to build their own power plant if they wish.
I agree with you there, which is why I don't agree with this:
There's no way they're going to install GPUs on 450k machines (their current machines have no GPU acceleration) and there's a fat chance they're gonna install a G80 or R600 power-sucking GPU on any of their nodes.
Google is constantly expanding its search capabilities, and if they want to get into things like searching inside pictures or video they could certainly use GPUs for some of that.
I think what's most likely is that we'll see a GPGPU-esque MapReduce. GPGPU workloads are generally data-parallel to begin with, and the PeakStream execution model already created kernels on the fly. Translating this to something that could run across, say, 1000 RV630s (there's no reason to use something like R600/G80, since you're looking at 4x the power consumption for 2-3x the performance) doesn't seem like much of a leap to me. And if you have 1000 RV630s, you're looking at a theoretical peak of 192 TFLOPS for 45 kW.
Sure, you aren't going to use RV630s, but I think you see my point.
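Tim's RV630 numbers are easy to check. The per-board breakdown (120 ALUs at 800 MHz, with a multiply-add counted as two FLOPs) is the commonly quoted peak for the Radeon HD 2600 XT and is an assumption here, as is the 45 W per board implied by 45 kW over 1000 boards; these are theoretical peaks, not sustained rates. The last function turns "4x the power consumption for 2-3x the performance" into performance per watt relative to the smaller chip.

```python
ALUS = 120               # assumed RV630 stream processors
CLOCK_MHZ = 800          # assumed HD 2600 XT core clock
FLOPS_PER_ALU_CYCLE = 2  # one multiply-add counted as two FLOPs
BOARD_WATTS = 45         # implied by 45 kW across 1000 boards
BOARDS = 1000


def board_peak_gflops() -> float:
    """Theoretical peak of one board in GFLOPS."""
    return ALUS * CLOCK_MHZ * FLOPS_PER_ALU_CYCLE / 1000


def cluster_peak_tflops(boards: int = BOARDS) -> float:
    return board_peak_gflops() * boards / 1000


def cluster_power_kw(boards: int = BOARDS) -> float:
    return BOARD_WATTS * boards / 1000


def relative_efficiency(perf_ratio: float, power_ratio: float) -> float:
    """Performance per watt of the bigger chip relative to the smaller one."""
    return perf_ratio / power_ratio


if __name__ == "__main__":
    print(cluster_peak_tflops(), "TFLOPS for", cluster_power_kw(), "kW")  # 192.0 TFLOPS for 45.0 kW
    print(board_peak_gflops() / BOARD_WATTS, "GFLOPS per watt per board")
    print(relative_efficiency(2, 4), relative_efficiency(3, 4))  # 0.5 0.75
```

Both relative-efficiency values come out below 1, which is the sense in which the smaller chip is the better buy per watt.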
NocturnDragon
07-Jun-2007, 16:51
I've found a nice presentation about MapReduce to understand it a bit better:
Experiences with MapReduce, an abstraction for large-scale computation (http://www.google.com/search?lr=&ie=UTF-8&oe=UTF-8&q=Experiences+with+MapReduce%2C+an+abstraction+for+large-scale+computation+Dean), Jeffrey Dean (http://labs.google.com/people/jeff/), Proc. 15th International Conference on Parallel Architectures and Compilation Techniques, 2006