GPU vs Multi-Core CPU

epicstruggle

Just out of curiosity, would Intel be able to use multi-core CPUs to compete against GPUs? There was some info released that Intel plans to release a 32-core CPU before the end of the decade. Could those be used in desktop PCs to take over some graphics work?

epic
 
I don't think so. Firstly, consider that GPUs are going to get more and more parallel over the next few years. Secondly, there's still a fair amount of fixed-function hardware in GPUs that would take a large number of cycles to emulate on a CPU (e.g. triangle setup, bilinear filtering).
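Just to put a rough number on that last point, here is a minimal sketch in plain C++ of what a single bilinear fetch costs when emulated in software (the texture layout and clamping behaviour are my own assumptions, and there's no caching, wrapping modes or mipmapping):

```cpp
#include <cmath>
#include <cstdint>

struct Texel { uint8_t r, g, b, a; };

// Hypothetical linear RGBA8 texture, clamped at the edges. A GPU does all
// of this (plus addressing modes, format conversion and texture caching)
// in dedicated hardware for every single fetch.
Texel bilinear_fetch(const Texel* tex, int w, int h, float u, float v)
{
    float x = u * w - 0.5f, y = v * h - 0.5f;
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    float fx = x - x0, fy = y - y0;

    auto clampi = [](int i, int lo, int hi) { return i < lo ? lo : (i > hi ? hi : i); };
    int x1 = clampi(x0 + 1, 0, w - 1), y1 = clampi(y0 + 1, 0, h - 1);
    x0 = clampi(x0, 0, w - 1);
    y0 = clampi(y0, 0, h - 1);

    const Texel t00 = tex[y0 * w + x0], t10 = tex[y0 * w + x1];
    const Texel t01 = tex[y1 * w + x0], t11 = tex[y1 * w + x1];

    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    Texel out;
    out.r = (uint8_t)lerp(lerp(t00.r, t10.r, fx), lerp(t01.r, t11.r, fx), fy);
    out.g = (uint8_t)lerp(lerp(t00.g, t10.g, fx), lerp(t01.g, t11.g, fx), fy);
    out.b = (uint8_t)lerp(lerp(t00.b, t10.b, fx), lerp(t01.b, t11.b, fx), fy);
    out.a = (uint8_t)lerp(lerp(t00.a, t10.a, fx), lerp(t01.a, t11.a, fx), fy);
    return out;
}
```

That's four scattered memory reads plus a dozen or so multiplies and adds per fetch, per pixel, before any actual shading has happened.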

So, it's unlikely that CPUs will ever match GPUs in terms of raw processing power, and even if they did, the CPU would be spending lots of its time performing operations that have little to no cost on a GPU.

For the low-cost market, performance-per-watt will still favor a GPU over emulating the operations on the CPU.

And finally, such a setup would be totally useless for games, as the game would want to make use of the CPU for non-graphics work.
 
I think it would be safe to say a GPU is a few hundred times faster at what it does, everything considered, than a CPU. It's tailored for what it does, and trying to do that with a CPU would just be pointlessly inefficient.
 
epicstruggle said:
Just out of curiosity, would Intel be able to use multi-core CPUs to compete against GPUs? There was some info released that Intel plans to release a 32-core CPU before the end of the decade. Could those be used in desktop PCs to take over some graphics work?

epic

I think a Cell-like architecture (rather than symmetric multi-core) has potential as an embedded GPU/media processor.

Sony initially intended to follow a PS2-style strategy and use two Cell chips with a minimal Toshiba GPU instead of the RSX. They decided to go for a single Cell and RSX partly because the Cell would not be quite as efficient for graphics rendering as a proper GPU, although it would be far more versatile. However, I understand the main reason was that it would take ages to develop all the dev tools for Cell that the RSX already had, and utilising Cell for graphics rendering would require a similar learning curve to the PS2 - something Sony wanted to avoid.

The advantage of using a Cell-like architecture for embedded graphics is its flexibility. For media applications, some of the cores can be used for media acceleration. For CAD applications, some cores can be used for floating-point acceleration. For games you can plug in a graphics card, and the embedded graphics capability of the cores doesn't go to waste - it can be reallocated to serve as a PPU.
 
I'm still horrified that such a thread exists here, be ashamed epic, be very ashamed ;)

So, in order to improve the discussion's level a bit (that is, making it B3D-quality, and not that of other forums I won't name), I propose to discuss it at a slightly deeper level. First of all, it has to be considered that no matter how many CPU cores you got, and no matter their level of SSE-like performance, the assumption is that each core still works on serial data. This implies very expensive scheduling and branch prediction systems, in addition to expensive branch misses. The advantage, of course, is that just about any kind of code runs at a "reliable" speed.

Now, consider a GPU. For the sake of argument, let us take the R600 and G80, so the first is... Err, I mean, let's take the G70 instead. The VS is a MIMD processor with relatively cheap branching and no "branch miss"-like behaviour. It can hide some latency by switching threads, but it remains quite limited in that, so VTF performance is less than optimal. Overall, though, that's likely to improve, because one of the possible ways to improve branching after a certain point is also to increase latency tolerance.

On the other hand, the G70 PS is a SIMD "monster" that tries to minimize scheduling overhead and maximize latency tolerance, at least in cases of low register counts (possibly in the hope that shaders with higher register requirements might have more inherent parallelism to hide the latency anyway). Branching performance just isn't there at all because the batches are so huge that the required coherence is nowhere to be found, since it uses an all-or-nothing scheme. It's questionable whether you really need it for more than optimizations, though.
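To make the all-or-nothing scheme concrete, here is a toy sketch (batch size and scheduling entirely made up) of what batch branching amounts to: unless the condition is uniform across the whole batch, both sides execute and the results are masked per pixel:

```cpp
#include <cstddef>
#include <vector>

// Toy model of all-or-nothing batch branching: one program counter for the
// whole batch. If the condition isn't uniform across the batch, both sides
// execute and results are masked per pixel, so you pay for both paths.
// This only illustrates the cost model, not any real scheduler.
void shade_batch(std::vector<float>& x, const std::vector<bool>& cond)
{
    bool any_true = false, any_false = false;
    for (bool c : cond) { any_true |= c; any_false |= !c; }

    if (any_true)   // "then" side runs if even one pixel needs it
        for (std::size_t i = 0; i < x.size(); ++i)
            if (cond[i]) x[i] = x[i] * 2.0f + 1.0f;

    if (any_false)  // "else" side runs if even one pixel needs it
        for (std::size_t i = 0; i < x.size(); ++i)
            if (!cond[i]) x[i] = x[i] * 0.5f;
}
```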

So, the PS is going to be nearly unbeatable by any CPU, ever. The FLOPs/mm² are downright insane (CELL is a weakling in comparison) and the latency hiding is downright incredible compared to what you could get on a modern CPU, because as I said in the beginning, CPUs assume things to be serial. If you wanted to have hundreds of "threads" in flight to hide latency like that on a CPU, you'd need at least 50x as many registers as are currently available. Other schemes such as a creative use of L1 or L2 might seem attractive, but they don't quite cut it either, imo.
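Some back-of-the-envelope numbers, with every figure assumed rather than taken from any real chip, to show where that 50x comes from:

```cpp
// Back-of-the-envelope only: every figure below is assumed, not measured.
const int latency_cycles    = 200;                                 // assumed memory latency to hide
const int cycles_per_thread = 2;                                   // assumed independent ALU work per thread between fetches
const int threads_in_flight = latency_cycles / cycles_per_thread;  // ~100 threads resident
const int regs_per_thread   = 8;                                   // assumed live registers per thread
const int regs_total        = threads_in_flight * regs_per_thread; // ~800 registers
// versus the ~16 architectural GPRs of an x86-64 core: roughly the 50x
// figure mentioned above.
```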

So if we put the PS out of the equation, what's left? The VS, the GS and the fixed-function stuff. I think we can firstly safely conclude that there's no way in hell a CPU can get a perf/mm² or perf/watt ratio within 5-10% of that of a GPU for things like triangle setup or rasterization. It's about as extreme a case as you can imagine, and it's no accident that the first GPUs accelerated that, even before proper bilinear!
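For reference, here is roughly the most stripped-down software rasterizer inner loop you can write; the structure and names are mine, and it skips clipping, fill rules and perspective entirely, yet it still burns several integer ops per candidate pixel:

```cpp
#include <algorithm>
#include <cstdint>

struct Vtx { int x, y; };

// Bare-bones half-space rasterizer: flat colour, no clipping, no fill rules,
// no perspective, consistent winding assumed. Even this spends several
// integer ops per candidate pixel on work that dedicated setup/raster
// hardware performs alongside everything else.
void raster_tri(uint32_t* fb, int pitch, Vtx a, Vtx b, Vtx c, uint32_t color)
{
    auto edge = [](Vtx p, Vtx q, int x, int y) {
        return (q.x - p.x) * (y - p.y) - (q.y - p.y) * (x - p.x);
    };

    int minx = std::min({a.x, b.x, c.x}), maxx = std::max({a.x, b.x, c.x});
    int miny = std::min({a.y, b.y, c.y}), maxy = std::max({a.y, b.y, c.y});

    for (int y = miny; y <= maxy; ++y)
        for (int x = minx; x <= maxx; ++x)
            if (edge(a, b, x, y) >= 0 && edge(b, c, x, y) >= 0 && edge(c, a, x, y) >= 0)
                fb[y * pitch + x] = color;
}
```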

The final question thus is that of the VS and GS units. As is nicely explained by Bob and others in another recent thread, the GS is rather icky to get parallelism out of by increasing the number of threads, due to the temporary storage requirements. So fundamentally, what you want there, on a GPU, is ILDP (Instruction Level Distributed Processing). There is an excellent patent from NVIDIA that uses instruction buffers, btw, and that'd fit nicely, although they present it as a generic solution for PS or VS too, unified or not.
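A quick storage estimate shows why that's icky; the 1024-scalar cap is the real D3D10 limit on GS output per invocation, while the other figures are assumed for illustration:

```cpp
// The 1024-scalar cap is the D3D10 limit on GS output per invocation;
// everything else is an assumed, illustrative configuration.
const int scalars_per_invocation = 1024;                            // D3D10 upper bound (32-bit values)
const int bytes_per_invocation   = scalars_per_invocation * 4;      // 4 KiB of worst-case output buffering
const int threads_in_flight      = 256;                             // assumed, to hide latency by threading
const int buffer_bytes = threads_in_flight * bytes_per_invocation;  // 1 MiB of on-chip storage
// A megabyte of buffering just for worst-case GS output is why brute-force
// threading doesn't scale here, and why something ILDP-like looks attractive.
```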

But the scheduling cost is obviously higher, and CPUs themselves have some rather basic kinds of instruction level parallelism, and that's (for some parts) what makes it possible for them to be pipelined and yet not work on as many threads as there are pipeline stages. So the gap is much smaller there. Still, if you needed a texture fetch-like operation in your GS, the fact that a CPU core is fundamentally serial means you'll dance on your head before you can properly hide the latency. I doubt those will be as used as on the PS or VS, though.

But the GS sits straight in the middle of the programmable pipeline, so unless the PCI-E bandwidth situation improved appreciably, there's no way you could "offload" your GS to the CPU without putting your VS there too. And it's far from impossible to make VS-like operations highly efficient on a properly engineered CPU (see: CELL). I don't believe it's as efficient when it comes to perf/mm², but that's not really the discussion here, as long as it remains viably possible for anything but the high end.
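A rough traffic estimate, with the per-frame figures assumed purely for illustration, of what shipping post-VS/GS data over the bus would look like:

```cpp
// All per-frame figures assumed for illustration; only the PCI-E x16 peak
// is the published first-generation spec (~4 GB/s per direction).
const double verts_per_frame   = 1.0e6;   // post-GS vertices per frame (assumed)
const double bytes_per_vertex  = 64.0;    // position plus a few interpolants (assumed)
const double frames_per_second = 60.0;
const double gbytes_per_second = verts_per_frame * bytes_per_vertex
                               * frames_per_second / 1.0e9;   // ~3.8 GB/s one way
// A CPU-side VS/GS would already be flirting with the bus limit before
// textures, command traffic or anything else is counted.
```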

Personally, as I'm sure a fair few people will have noticed, I'm a big fan of ILDP-like architectures for certain kinds of workloads, and I'd tend to believe there will be a serious convergence between CPUs and GPUs because of MIMD + ILDP in the VS/GS architectures of at least one IHV. But at the same time, that remains to be seen, and CPU manufacturers don't seem to be moving in that direction anyway.

In conclusion, there could be a convergence in the coming few years that'd make IGP-level or even entry-level chips once again not require VS or GS capability to any serious degree. But for the PS, that's downright unthinkable, and any attempt would be at least a few orders of magnitude slower. And all of that depends on which direction CPU manufacturers are truly taking; personally, I don't feel even the VS/GS possibility should be taken very seriously, at least not quite yet. And should they ever manage it, it might already be time for a whole new book of algorithms that wouldn't fit quite as nicely anymore...


Uttar
P.S.: The above poster's discussion about CELL is obviously a fair point, but as I explained, I don't believe CELL can be anywhere near efficient enough for proper pixel shading with some basic, nicely-cached bilinear, let alone texture filtering with trilinear+AF. So its rumored usage in PS3 as a GPU, which always was bogus anyway, is completely irrelevant to any proper discussion imo. As for embedded platforms, besides consoles you really only have handheld-like ones, and if you look at it from a perf/watt pov, that makes it irrelevant for handheld applications imo. Using CELL for graphics doesn't make sense unless you use REYES imo, and don't even get me started on that...
 
Why do I get the sense that Uttar feels that epic just farted in the Church of the GPU (B3D)? :LOL:
 
geo said:
Why do I get the sense that Uttar feels that epic just farted in the Church of the GPU (B3D)? :LOL:
It was worth it. :)

epic
ps I plan on destroying Uttar's post, point by point, using only lies, innuendo, and name calling. ;)
 
Yes, CPU + GPU will become unified in the future; it's inevitable, the question is when.
I recall Tim Sweeney saying he believes the next generation of consoles will be like this, i.e. PS4.
Sorry, I couldn't find the quote, but I found the following piece:

Jacob - And ten years from now, do you envision that we will see... GPUs handling graphics, PPUs handling physics, the CPU doing A.I. and that kind of thing, or do you think we will see some kind of blend of the three technologies, or maybe two of them?

Sweeney - Looking at the long-term future, the next 10 years or so, my hope and expectation is that there will be a real convergence between the CPU, GPU and non-traditional architectures like the PhysX chip from Ageia, the Cell technology from Sony. You really want all those to evolve into the kind of large-scale multicore CPU that has a lot of the non-traditional computing power a GPU has now. A GPU processes a huge number of pixels in parallel using relatively simple control flow; CPUs are extremely good at random-access logic, lots of branching, handling cache and things like that. I think, really, essentially, graphics and computing need to evolve together to the point where future renderers, I hope and expect, will look a lot more like a software renderer from previous generations than a fixed-function rasterizer pipeline and the stuff we have currently. I think GPUs will ultimately end up being... you know, when we look at this 10 years from now, we will look back at GPUs as being kind of a temporary fixed-function hardware solution to a problem that ultimately was just general computing.
 
I've always thought Sweeney was a bit off in his predictions for future PC hardware. As an example, this cost him significantly with the original Unreal engine, which was designed for software rendering. He had properly anticipated the advance of CPUs, but had completely underestimated the progression of GPUs.

Edit: Note that I do have great respect for the guy, and think he's especially excellent for creating Unreal Script, but I think Carmack has always been much better at visualizing the future of gaming hardware.
 
Combining the GPU and CPU will be for the low end and for fixed systems (consoles). In the medium to high end it would mean having to buy them as a bundle and not being able to pick which one you wanted, or mix combinations, as we are able to today.

I know GPU + CPU combo parts (more like a CPU plus media accelerator) are in the future, but I doubt their impact on the high-end/gamer market on the PC.
 
Skrying said:
Combining the GPU and CPU will be for the low end and for fixed systems (consoles). In the medium to high end it would mean having to buy them as a bundle and not being able to pick which one you wanted, or mix combinations, as we are able to today.

I know GPU + CPU combo parts (more like a CPU plus media accelerator) are in the future, but I doubt their impact on the high-end/gamer market on the PC.

While I'm not willing to reject the idea entirely right now, I would prefer if the future holds higher efficiency and quality in the low end graphics sector than it does today. I'm sceptical that something like that could be achieved through any sort of combo parts.

As for consoles, one has to wonder why the Xbox 360 and/or PS3, despite having multi-core CPUs, still have quite powerful graphics units.
 
Chalnoth said:
I've always thought Sweeney was a bit off in his predictions for future PC hardware. As an example, this cost him significantly with the original Unreal engine, which was designed for software rendering. He had properly anticipated the advance of CPUs, but had completely underestimated the progression of GPUs.

Edit: Note that I do have great respect for the guy, and think he's especially excellent for creating Unreal Script, but I think Carmack has always been much better at visualizing the future of gaming hardware.

I don't think any of us fails to respect Sweeney and his work; au contraire. Every time in the past that I was puzzled by his comments, people used to say that I was misinterpreting his statements.

I recall another statement where he predicted something along the lines of GPUs becoming redundant for the lower-end parts of the market and continuing to exist only as luxury items for the high end, for antialiasing and other added functionality.

Interpret the above as you like; my problem here is that I personally would prefer features like AA/AF, and in the foreseeable future float HDR, to become an always-on standard for the entire market rather than being treated as luxury items for those who can actually afford them.

I have no doubt that floating-point processing power and the number of cores in future CPUs will scale to unbelievable heights; what I fail to see, though, is how the final result is going to come even close overall to the constantly increasing number of units in GPUs. How many ALUs did we have in GPUs five years ago, and with what theoretical processing power, and how does the picture look today? I fail to see how that gap could ever close, unless I'm missing a few serious details about future prospects.
 
Ailuros said:
While I'm not willing to reject the idea entirely right now, I would prefer if the future holds higher efficiency and quality in the low end graphics sector than it does today. I'm sceptical that something like that could be achieved through any sort of combo parts.
Well, I expect it will basically be a combo part in that the part is designed much like today's multi-core CPUs, except that one region of the die holds a GPU instead of housing more CPU real estate. Intel should be able to do this relatively easily, and such a thing would make an excellent supplement to their current Celeron line of budget processors. AMD would have a harder time, for obvious reasons, but may be able to do such a thing through some sort of agreement with nVidia, ATI, or VIA (VIA may be the most likely).

I somewhat expect that this sort of thing would be implemented by connecting the CPU to the GPU via an on-die PCI Express bus, to make it easier for OSes to deal with the novel hardware configuration (such a thing obviously wouldn't limit performance, but may cost a bit of die area), as well as to let the architecture have as much in common as possible with its add-in-board equivalent.
 
That would rather make IGPs somewhat redundant, not lowest-end standalone GPUs, and I still don't see how the quality and performance bar would rise overall with solutions like that. Cheaper, most likely yes, yet better than what exactly?
 
Since I have started a D3D10 software rasterizer as a spare-time project (very low priority) that uses a shader-to-SSE JIT approach, maybe we can compare the shader power in the future. As far as I can see, the math power of a CPU will not be so bad, but as already said, the texture fetches will kill you. I don't think that this is the only question here, though. A 32-core CPU can maybe beat a current GPU when there is less texture access, but how many quads will a GPU process the day we get 32-core CPUs?
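To give an idea of the general direction (this is hand-written and simplified, not actual output from my project), the code a shader-to-SSE JIT emits processes a 2x2 pixel quad at once, with one SSE register per shader variable:

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hand-written illustration of the structure-of-arrays idea a shader-to-SSE
// JIT follows: one SSE register holds the same shader variable for a whole
// 2x2 pixel quad, so "out = a * b + c" becomes one packed multiply and one
// packed add for four pixels.
void mad_quad(const float* a, const float* b, const float* c, float* out)
{
    __m128 va = _mm_loadu_ps(a);   // variable a for pixels 0..3
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    __m128 vr = _mm_add_ps(_mm_mul_ps(va, vb), vc);
    _mm_storeu_ps(out, vr);        // result for pixels 0..3
}
```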

PS: In case someone thinks I am crazy to write a software rasterizer for D3D10: the first reason is that the D3D10 RefRast is a real pain, but the real point is to have a future option to decide where I run some non-graphics processing. With a D3D10 device that works on a CPU, I can easily move jobs between the CPU, GPU or IGP.
 
Ailuros said:
That would rather make IGPs somewhat redundant, not lowest-end standalone GPUs, and I still don't see how the quality and performance bar would rise overall with solutions like that. Cheaper, most likely yes, yet better than what exactly?
Oh, pretty much just cheaper. I mean, there's always the possibility that new technologies like MRAM (which supposedly has the speed of SRAM and the potential density of DRAM) will change the landscape significantly and allow an integrated GPU to perform admirably, but I think this sort of thing will be just as you say, a replacement for the IGP.
 
Demirug said:
PS: In case someone thinks I am crazy to write a software rasterizer for D3D10...
Well, personally, I think it makes more sense than writing a software rasterizer for old, fully fixed-function hardware. With the advent of programmable GPUs, the gap between GPUs and CPUs has definitely narrowed (though CPUs spent a few years stalled there before we started going multicore; that's going to change soon, and in the near future we should see nearly parallel gains in power between CPUs and GPUs).

It's just that you're still left with the tasks that a GPU does with ease but a CPU would have to spend many cycles on, so such a thing will never make sense for 3D games. But it may make a lot of sense in the GPGPU market if CPUs get powerful/parallel enough: with such a setup, the D3D10 software renderer on the CPU would just be another target to send a processing thread to.
 
Chalnoth said:
Well, personally, I think it makes more sense than writing a software rasterizer for old, fully fixed-function hardware. With the advent of programmable GPUs, the gap between GPUs and CPUs has definitely narrowed (though CPUs spent a few years stalled there before we started going multicore; that's going to change soon, and in the near future we should see nearly parallel gains in power between CPUs and GPUs).

I think it's still too early to tell who will win the power-increase challenge. Maybe in a few years a diagram that compares the number of shading units with the number of CPU cores over time could be interesting.

Chalnoth said:
It's just that you're still left with the tasks that a GPU does with ease but a CPU would have to spend many cycles on, so such a thing will never make sense for 3D games. But it may make a lot of sense in the GPGPU market if CPUs get powerful/parallel enough: with such a setup, the D3D10 software renderer on the CPU would just be another target to send a processing thread to.

I was about to make a joke about porting the whole thing to Red Storm. Unfortunately, the typical player doesn't have such a system in the attic. And if they have the money, investing it in a large grid of multi-GPU systems could give them more bang for the buck.
 
The type of CPU core is also important. Cell SPE type cores (DSPs with very fast local store) can perhaps be used to substitute for a GPU. A conventional multi-core SMP CPU - forget it.

I think there is an application for such cores where you need strong media-handling capabilities but only low-end 3D acceleration (no games), e.g. video editing, MPEG-4 encoding and multiple simultaneous MPEG-4 decodes. Of course, as I said before, you can add high-end 3D acceleration to get a really nice games machine by plugging in a graphics card, and get physics acceleration as well from the DSP cores that were being used to handle GPU tasks.

The question is whether the level of multimedia acceleration this allows will justify the use of the more expensive DSP cores rather than a low-end embedded GPU.

I think on a media-heavy, Vista-based media PC it may be viable. On the base machine (onboard graphics) you have three scenarios: basic 3D acceleration for the Vista OS only, with simultaneous multimedia acceleration; 3D games requiring low-end graphics acceleration (e.g. MS Flight Simulator, with no media acceleration); and plugging in a high-end graphics card to get a high-end games machine with physics acceleration. Although the DSP cores are more expensive when used as a GPU, their flexibility could make them cheaper for the same capability, because a separate GPU/media acceleration solution will underutilise part of its resources in some or all of the three scenarios.
 