GPU vs Multi-Core CPU

Demirug said:
I think it's still too early to tell who will win the race for more processing power. Maybe in a few years a diagram comparing the number of shading units with the number of CPU cores over time could be interesting.
Well, I think they'll be roughly similar. CPUs have the advantage that they've only very recently started to implement thread-level parallelism (for consumer processors). Leveraging this more could prove a tremendous advantage. GPUs have the advantage that they've not yet been engineered as carefully to allow for high clock speeds. It could go either way, I think, but in very rough overall terms, I expect the gains to be similar.

I was about to make a joke about porting the whole thing to Red Storm. Unfortunately the typical player doesn't have such a system in the attic. And if they have the money, investing it in a large grid of multi-GPU systems could give them more bang for the buck.
Yes! Play UT2007 on Red Storm! Hehe :)
More seriously, though, it could possibly be useful. For GPGPU apps, CPUs are much closer to the performance of GPUs, even in the best cases for GPU performance, than they would be for normal software rendering. With the CPUs in your typical system becoming more and more powerful, being able to send threads to both the CPU and the GPU could potentially be useful for supercomputing.

One might envision, for example, a future supercomputer that has, on each node, two GPUs and four CPUs. The GPUs could be there for people who can make their algorithms run well, likely through third-party libraries, within OpenGL 3.0 (Windows on a supercomputer seems unlikely), and the CPUs for those who can't. One might gain a maximal amount of power, then, by leveraging both.

P.S. No quotes on links in UBB :)
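
Purely as an illustration of sending work to both the CPU and the GPU on such a node, here's a minimal C++ sketch that partitions one job between a GPU queue and four CPU worker threads. Note that submit_to_gpu() is a hypothetical placeholder, not any real API; in a real system it would wrap whatever the node exposes (OpenGL, a vendor GPGPU library, etc.), and here it simply runs on the CPU so the sketch stays self-contained.

// Hypothetical sketch: splitting one workload between CPU worker threads and a
// GPU queue on a single node. submit_to_gpu() is a made-up placeholder; in a
// real system it would wrap whatever GPU API the node exposes. Here it just
// runs on the CPU so the sketch compiles and runs as-is.
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Placeholder for real GPU submission; squares its slice of the data.
void submit_to_gpu(std::vector<float>& data, size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i) data[i] *= data[i];
}

// CPU worker doing the same work on its own slice.
void cpu_worker(std::vector<float>& data, size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i) data[i] *= data[i];
}

int main() {
    std::vector<float> data(1 << 20, 3.0f);

    // Hand half the problem to the "GPU" and split the rest across
    // four CPU cores (two GPUs + four CPUs per node, as above).
    const size_t gpu_share = data.size() / 2;
    std::thread gpu_thread(submit_to_gpu, std::ref(data), size_t{0}, gpu_share);

    const size_t cpu_threads = 4;
    const size_t chunk = (data.size() - gpu_share) / cpu_threads;
    std::vector<std::thread> workers;
    for (size_t t = 0; t < cpu_threads; ++t) {
        size_t begin = gpu_share + t * chunk;
        size_t end = (t + 1 == cpu_threads) ? data.size() : begin + chunk;
        workers.emplace_back(cpu_worker, std::ref(data), begin, end);
    }

    gpu_thread.join();
    for (auto& w : workers) w.join();
    std::printf("first element after processing: %f\n", data[0]);
    return 0;
}

The interesting scheduling problem, of course, is deciding which parts of an algorithm fit the GPU at all; the split above is hard-coded only to keep the sketch short.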
 
SPM said:
The type of CPU core is also important. Cell SPE type cores (DSPs with very fast local store) can perhaps be used to substitute for a GPU. A conventional multi-core SMP CPU - forget it.
Not for graphics work. While a Cell-like CPU might approach a GPU's shader math capabilities, it'll still spend far too much time on texture filtering, triangle setup, FSAA, z-buffer testing, etc.
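
To make the texture-filtering point concrete, here's a rough C++ sketch of what a single bilinear texture fetch costs in software: four texel reads plus three lerps per channel, repeated for every sample of every pixel, which a GPU's texture units do in dedicated hardware. The texture layout and function names are purely illustrative.

// Rough sketch of one bilinear texture sample in software (illustrative only).
// Four memory reads and three lerps per channel, per sample, per pixel --
// work that a GPU handles in fixed-function texture units.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Texel { float r, g, b; };

struct Texture {
    int width, height;
    std::vector<Texel> data;
    const Texel& at(int x, int y) const { return data[y * width + x]; }
};

static float lerp(float a, float b, float t) { return a + (b - a) * t; }

// u, v in [0, 1). Wrapping/clamping modes are omitted to keep the sketch short.
Texel bilinear_sample(const Texture& tex, float u, float v) {
    float x = u * (tex.width - 1);
    float y = v * (tex.height - 1);
    int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
    int x1 = std::min(x0 + 1, tex.width - 1), y1 = std::min(y0 + 1, tex.height - 1);
    float fx = x - x0, fy = y - y0;

    const Texel& t00 = tex.at(x0, y0); // four texel reads...
    const Texel& t10 = tex.at(x1, y0);
    const Texel& t01 = tex.at(x0, y1);
    const Texel& t11 = tex.at(x1, y1);

    Texel out;                         // ...and three lerps per channel
    out.r = lerp(lerp(t00.r, t10.r, fx), lerp(t01.r, t11.r, fx), fy);
    out.g = lerp(lerp(t00.g, t10.g, fx), lerp(t01.g, t11.g, fx), fy);
    out.b = lerp(lerp(t00.b, t10.b, fx), lerp(t01.b, t11.b, fx), fy);
    return out;
}

int main() {
    Texture tex{2, 2, {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {1, 1, 0}}};
    Texel s = bilinear_sample(tex, 0.5f, 0.5f);
    std::printf("sampled: %f %f %f\n", s.r, s.g, s.b);
}

And that's only bilinear; trilinear or anisotropic filtering multiplies the cost again.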
 
Uttar said:
I'm still horrified that such a thread exists here, be ashamed epic, be very ashamed ;)

So, in order to improve the discussion's level a bit (that is, making it B3D-quality, and not that of other forums I won't name), I propose to discuss it at a slightly deeper level.

Show off. :p
 
SPM said:
They decided to go for a single Cell and RSX partly because the Cell would not be quite as efficient for graphics rendering as a proper GPU, although it would be far more versatile.
I think the most likely reason is that they realised they don't have enough expertise with making rasterising engines (ably demonstrated with the mess that is the PS2 backend).

There really are only two companies in the world left with the ability to build high-performance render back-ends - particularly if you consider the additional absolute requirements for cost-effectiveness and R&D/manufacturing execution.

I'm pretty certain games companies all over the world breathed a huge sigh of relief the day it was announced.

Although certain parts of the industry seem to believe CPUs will take over at some point, I remain unconvinced - texture filtering and memory bandwidth efficiency in particular seem to be problems with no viable solution.
 
epicstruggle said:
Just out of curiosity, would Intel be able to use multi-core CPUs to compete against GPUs? There was some info released that Intel plans to release a 32-core CPU before the end of the decade; could they be used in desktop PCs to take over some graphics work?

epic
Probably not in its current form.

But what if Intel is hard pressed and decides to try to change the paradigm again?
Intel tried to change it some time ago with the i860: http://en.wikipedia.org/wiki/Intel_i860

edited: SGI used the i860 in the Reality Engine 2 http://hardware.majix.org/computers/sgi.onyx/images/onyx.33.big.jpg

Imagine a chip with 32 new custom-designed VLIW processors capable of doing some graphics and DSP work. Add a small number of specialized processors and an improved memory architecture (wider buses, faster memory soldered onto the PCB, an on-chip memory controller, etc.).

Back to the drawing board...
 
Chalnoth said:
Oh, pretty much just cheaper. I mean, there's always the possibility that new technologies like MRAM (which supposedly has the speed of SRAM and the potential density of DRAM) will change the landscape significantly and allow an integrated GPU to perform admirably, but I think this sort of thing will be just as you say, a replacement for the IGP.

If bandwidth is the only deciding factor for increasing overall performance in embedded devices (whether IGPs or a multi-core CPU plus graphics unit), then another idea would be to simply use deferred renderers.

Frankly, I don't see it being the only deciding factor, though; transistor budgets (mainly due to the need to keep those types of devices as cost-efficient as possible) are usually incredibly low, so more bandwidth alone won't necessarily be enough to make either one a real 3D-capable device in the end.

That shouldn't mean, though, that lowest-end graphics cannot get a lot better than they are today.
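
For readers unfamiliar with why deferred renderers are attractive when bandwidth is tight, here's a toy C++ sketch of the tile-based part: triangles are binned to screen tiles first, then each tile is resolved entirely in a small on-chip buffer and written to external memory exactly once. The data structures, and the bounding-box "coverage" standing in for real rasterization and hidden-surface removal, are gross simplifications of my own, not any actual architecture.

// Toy sketch of tile binning, the core of a tile-based deferred renderer.
// Triangles are sorted into screen tiles up front; each tile is then resolved
// in a small on-chip buffer and the finished tile is written out exactly once.
// Bounding-box coverage stands in for real rasterization/HSR to keep it short.
#include <algorithm>
#include <cstdio>
#include <vector>

constexpr int SCREEN_W = 640, SCREEN_H = 480, TILE = 32;
constexpr int TILES_X = SCREEN_W / TILE, TILES_Y = SCREEN_H / TILE;

struct Tri { float x[3], y[3], z; unsigned color; };

int main() {
    std::vector<Tri> scene = {
        {{10, 100, 50}, {10, 20, 120}, 0.5f, 0xff0000},
        {{300, 620, 400}, {200, 250, 460}, 0.3f, 0x00ff00},
    };

    // Pass 1: bin each triangle into every tile its bounding box touches.
    std::vector<std::vector<const Tri*>> bins(TILES_X * TILES_Y);
    for (const Tri& t : scene) {
        int x0 = std::max(0, (int)(*std::min_element(t.x, t.x + 3)) / TILE);
        int x1 = std::min(TILES_X - 1, (int)(*std::max_element(t.x, t.x + 3)) / TILE);
        int y0 = std::max(0, (int)(*std::min_element(t.y, t.y + 3)) / TILE);
        int y1 = std::min(TILES_Y - 1, (int)(*std::max_element(t.y, t.y + 3)) / TILE);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                bins[ty * TILES_X + tx].push_back(&t);
    }

    // Pass 2: resolve each tile in an "on-chip" buffer; one external write per tile.
    std::vector<unsigned> framebuffer(SCREEN_W * SCREEN_H, 0);
    for (int ty = 0; ty < TILES_Y; ++ty) {
        for (int tx = 0; tx < TILES_X; ++tx) {
            unsigned tile_color[TILE * TILE] = {0};
            float tile_depth[TILE * TILE];
            std::fill(tile_depth, tile_depth + TILE * TILE, 1.0f);
            for (const Tri* t : bins[ty * TILES_X + tx]) {
                // Stand-in coverage: treat the triangle as filling the tile.
                for (int i = 0; i < TILE * TILE; ++i)
                    if (t->z < tile_depth[i]) { tile_depth[i] = t->z; tile_color[i] = t->color; }
            }
            // Burst-write the finished tile to external memory once.
            for (int y = 0; y < TILE; ++y)
                for (int x = 0; x < TILE; ++x)
                    framebuffer[(ty * TILE + y) * SCREEN_W + tx * TILE + x] = tile_color[y * TILE + x];
        }
    }
    std::printf("center pixel color: %06x\n",
                framebuffer[(SCREEN_H / 2) * SCREEN_W + SCREEN_W / 2]);
}

The external z-buffer and overdraw traffic disappear; what appears instead is the per-frame storage and bandwidth for the binned geometry, which is exactly the trade-off argued over below.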
 
I still don't think deferred renderers will ever be useful in the PC space, even for low-end designs. In particular, they'll still have to draw just as many polygons as any other GPU.
 
Chalnoth said:
I still don't think deferred renderers will ever be useful in the PC space, even for low-end designs. In particular, they'll still have to draw just as many polygons as any other GPU.
Checking for visibility is not the same as drawing them. Anyway, how would that be a disadvantage?

As long as we're using rasterization for 3D graphics, I don't see CPUs replacing GPUs, even in the low end. Graphics cores integrated into CPUs, maybe, but that might be simply due to power savings and mobile devices (and living-room consoles) taking over many of the typical low-end PC applications in the long run.
 
Xmas said:
Checking for visibility is not the same as drawing them. Anyway, how would that be a disadvantage?
Polygons take more storage space than z-buffer pixels, and will thus require more memory and memory bandwidth as the average triangle size gets small. Since the poly counts are going to be increasing steadily, I expect that immediate-mode renderers will have an easier time with higher polycounts.
 
Chalnoth said:
Polygons take more storage space than z-buffer pixels, and will thus require more memory and memory bandwidth as the average triangle size gets small. Since the poly counts are going to be increasing steadily, I expect that immediate-mode renderers will have an easier time with higher polycounts.
Rendering resolutions, samples per pixel and colour depth are increasing as well. I wonder what figures you expect to see wrt average vertex size and pixels per polygon.
 
Xmas said:
Rendering resolutions, samples per pixel and colour depth are increasing as well. I wonder what figures you expect to see wrt average vertex size and pixels per polygon.
Right, but rendering resolutions, samples per pixel and color depth are all scalable things. We're talking about low-end hardware here.
 
Chalnoth said:
Right, but rendering resolutions, samples per pixel and color depth are all scalable things. We're talking about low-end hardware here.
So is polygon count. Otherwise low-end hardware wouldn't be able to get away with a quarter of VS performance.
 
Xmas said:
So is polygon count. Otherwise low-end hardware wouldn't be able to get away with a quarter of VS performance.
Poly count isn't nearly as scalable as resolution.
 
Chalnoth said:
Polygons take more storage space than z-buffer pixels, and will thus require more memory and memory bandwidth as the average triangle size gets small. Since the poly counts are going to be increasing steadily, I expect that immediate-mode renderers will have an easier time with higher polycounts.

Of course, how difficult a time you might have does depend on how many of those polygons you think you might need to actually store, and how you might actually store them.

Naturally, IMRs have yet to address the issue of performant OIT and other functions requiring more complex per-pixel data structures, which are becoming even more important with increasing poly counts.

So, IMO it isn't anywhere near as clear-cut as you indicate.

John.
 
Chalnoth said:
I still don't think deferred renderers will ever be useful in the PC space, even for low-end designs. In particular, they'll still have to draw just as many polygons as any other GPU.

For the time being DRs haven't yet escaped the PDA/mobile realm. Ironically, there they do exceptionally well with polygon throughput, unlike some competing solutions with just funky paper claims (the Falanx/ARM work being a possible exception until I see it in real time).
 
Uttar said:
blah, blah, blah

That doesn't mean you can't have, in the future, a multi-core CPU with one core being a CPU, another core being a GPU, and a wicked memory controller tying them together. Imagine wiping out the latency of transferring objects between main memory and GPU memory.

Or

You could add parallel execution units and new instructions for parallel execution. That is essentially what GPUs are doing and moving towards.
:)
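
That "parallel execution units and new instructions" route already exists in embryonic form as SIMD extensions. Purely as an illustration (x86-specific, and nothing to do with any particular GPU design), here is a tiny C++ snippet using SSE intrinsics to operate on four floats per instruction:

// Illustration of "parallel execution units + new instructions" on a CPU:
// SSE intrinsics processing four floats per instruction (x86 only).
#include <cstdio>
#include <xmmintrin.h>

int main() {
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float out[4];

    __m128 va = _mm_load_ps(a);                     // load four floats at once
    __m128 vb = _mm_load_ps(b);
    __m128 vr = _mm_add_ps(_mm_mul_ps(va, vb), va); // a*b + a on four lanes
    _mm_store_ps(out, vr);

    std::printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}

A GPU fragment pipeline applies the same idea across hundreds of pixels at once, plus the fixed-function bits (texturing, ROPs) discussed above.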
 
Ailuros said:
For the time being DRs haven't yet escaped the PDA/mobile realm. Ironically, there they do exceptionally well with polygon throughput, unlike some competing solutions with just funky paper claims (the Falanx/ARM work being a possible exception until I see it in real time).
I still don't buy that that would translate well to a PC part. The rendering loads are rather different.
 
Chalnoth said:
Poly count isn't nearly as scalable as resolution.
Given a somewhat useful geometry LOD system and a game targeting high-end GPUs running at 1600x1200 with a certain polygon count, you should be able to divide the polygon count by 4 or 6, while dividing the resolution by 6.25 (to 640x480, which is probably the lowest sensible resolution even in the low end).

Anyway, please answer these questions for yourself:
What's the average number of pixels per polygon you expect in the future?
What's the average data per polygon/vertex you expect?
What's the percentage of polygons and vertices that need to be stored?

Then you can do the math on how small triangles need to get for storing them to become inefficient. If they become even smaller, triangle rasterization as a whole will become inefficient.
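
For anyone who wants to actually plug numbers into those three questions, the storage comparison being argued here boils down to one breakeven. The notation is my own, just a formalisation of the polygons-versus-z-buffer-pixels framing above:

N_{\text{break}} \;=\; \frac{W \cdot H \cdot s \cdot b_z}{b_v \cdot v}

where $W \times H$ is the render resolution, $s$ the samples per pixel, $b_z$ the bytes per depth sample, $b_v$ the bytes per stored vertex, and $v$ the vertices stored per triangle. Scene triangle counts above $N_{\text{break}}$ mean the stored geometry outweighs a full multisampled z-buffer.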
 
Xmas said:
Given a somewhat useful geometry LOD system and a game targeting high-end GPUs running at 1600x1200 with a certain polygon count, you should be able to divide the polygon count by 4 or 6, while dividing the resolution by 6.25 (to 640x480, which is probably the lowest sensible resolution even in the low end).
More than that, because subsample resolution is also important, and the level of FSAA used will vary greatly.

Anyway, please answer these questions for yourself:
What's the average number of pixels per polygon you expect in the future?
What's the average data per polygon/vertex you expect?
What's the percentage of polygons and vertices that need to be stored?

Then you can do the math on how small triangles need to get for storing them to become inefficient. If they become even smaller, triangle rasterization as a whole will become inefficient.
Unfortunately, I don't feel like doing this now (rather tired, sorry). But I remember doing this ages and ages ago, and I got a number on the order of a few hundred thousand triangles per scene as the breakeven point in terms of storage space (it'll be more or less depending upon the number of attributes for each triangle... I seem to remember the figure I used for this calculation was 90 bytes per vertex, one vertex per triangle). You'd have to go a bit higher for the bandwidth breakeven.

And we're really either there already, if this old calculation is correct, or on the cusp of it.
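
As a rough sanity check on that old figure: the resolution, AA level and depth-sample size below are my assumptions (the original calculation's inputs aren't given), and the 90 bytes per vertex with one vertex per triangle is the figure quoted above. A trivial C++ back-of-envelope then lands at roughly 340,000 triangles per scene, which is indeed "a few hundred thousand":

// Back-of-envelope check of the storage breakeven quoted above.
// Resolution, AA level and depth-sample size are assumptions; 90 bytes per
// vertex (one vertex per triangle) is the figure quoted in the post.
#include <cstdio>

int main() {
    const double width = 1600, height = 1200;   // assumed render target
    const double samples = 4;                   // assumed 4x multisampling
    const double bytes_per_depth_sample = 4;    // assumed 32-bit Z
    const double bytes_per_triangle = 90;       // quoted: 90 B/vertex, 1 vertex/tri

    double zbuffer_bytes = width * height * samples * bytes_per_depth_sample;
    double breakeven_triangles = zbuffer_bytes / bytes_per_triangle;

    std::printf("z-buffer storage: %.1f MB\n", zbuffer_bytes / 1e6);
    std::printf("breakeven: ~%.0f triangles per scene\n", breakeven_triangles);
    return 0;
}

Change the assumed resolution, AA level or bytes per vertex and the breakeven moves accordingly, which is exactly the point being contested in the following replies.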
 
Chalnoth said:
More than that, because subsample resolution is also important, and the level of FSAA used will vary greatly.


Unfortunately, I don't feel like doing this now (rather tired, sorry). But I remember doing this ages and ages ago, and I got a number on the order of a few hundred thousand triangles per scene as the breakeven point in terms of storage space (it'll be more or less depending upon the number of attributes for each triangle... I seem to remember the figure I used for this calculation was 90 bytes per vertex, one vertex per triangle). You'd have to go a bit higher for the bandwidth breakeven.

And we're really either there already, if this old calculation is correct, or on the cusp of it.

And what resolution and AA level was that figure arrived at for? Did you take into account off-screen, back-face and non-sample-point-crossing culling? Did you take into account any other potential technologies that reduce the number and size of stored primitives?

Sorry to be critical, but the reality is that it's very easy to draw incorrect conclusions about a technology, particularly if you're not familiar with the techniques that might be employed by a state-of-the-art implementation.

And to reiterate what I said previously, there is also a set of problems, becoming increasingly pertinent, for which IMRs are not currently able to provide a performant solution.

John.
 