GPU vs Multi-Core CPU

Chalnoth said:
More than that, because subsample resolution is also important, and the level of FSAA used will vary greatly.
How much supersampling do you think is feasible at the mid-high-end?

And we're really either there already, if this old calculation is correct, or on the cusp of it.
I doubt it. A lot of triangles submitted for rendering don't need to be stored at all. Show me one wireframe shot of a current game where the triangles are really that small.
 
Well, if DSP-type CPU cores functioning as GPUs take off, I can tell you where it will happen first: in mobile phones, mobile phones with movie-clip cameras, and HDTVs. These devices will have pretty awesome DSPs to get every bit of compression/decompression possible out of movies, and when you want to play games on them (not the most cutting-edge, of course), you won't be using the DSPs for decoding/encoding, which means you might as well use them as a GPU rather than add hardware for that.
 
Xmas said:
How much supersampling do you think is feasible at the mid-high-end?
I'm not talking about supersampling, just multisampling, and multisampling has the same effect on z-buffer accesses as supersampling (which is the starkest difference between TBDRs and IMRs).


I doubt it. A lot of triangles submitted for rendering don't need to be stored at all. Show me one wireframe shot of a current game where the triangles are really that small.
Well, two things:
1. You can only effectively remove submitted triangles that are either backface-culled or clipped entirely outside the view area (both are cheap per-triangle tests; a sketch follows below).
2. Since a triangle requires significantly more storage than a pixel, triangles can be quite a bit larger than pixel-sized. Exactly how much would obviously depend upon how many attributes are used.
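As a minimal sketch of the two rejection tests from point 1, assuming an OpenGL-style clip volume and counter-clockwise front faces (illustrative only, not any particular GPU's implementation):
[code]
#include <stdbool.h>

typedef struct { float x, y, z, w; } Vec4;  /* clip-space vertex */

/* Backface test: sign of the doubled screen-space triangle area. */
static bool is_backfacing(Vec4 a, Vec4 b, Vec4 c)
{
    float ax = a.x / a.w, ay = a.y / a.w;
    float bx = b.x / b.w, by = b.y / b.w;
    float cx = c.x / c.w, cy = c.y / c.w;
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay) <= 0.0f;
}

/* Trivial rejection: all three vertices outside the same frustum plane. */
static bool is_fully_outside(const Vec4 v[3])
{
    for (int axis = 0; axis < 3; axis++) {
        bool all_below = true, all_above = true;
        for (int i = 0; i < 3; i++) {
            float c = (axis == 0) ? v[i].x : (axis == 1) ? v[i].y : v[i].z;
            if (c >= -v[i].w) all_below = false;  /* inside the low plane  */
            if (c <=  v[i].w) all_above = false;  /* inside the high plane */
        }
        if (all_below || all_above) return true;
    }
    return false;
}
[/code]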

Anyway, don't expect any game tests from me for at least another week. I'm off in Santa Fe for a conference until then :)
 
SPM said:
Well, if DSP-type CPU cores functioning as GPUs take off, I can tell you where it will happen first: in mobile phones, mobile phones with movie-clip cameras, and HDTVs. These devices will have pretty awesome DSPs to get every bit of compression/decompression possible out of movies, and when you want to play games on them (not the most cutting-edge, of course), you won't be using the DSPs for decoding/encoding, which means you might as well use them as a GPU rather than add hardware for that.
No possible way. Mobile phones are exceptionally sensitive to power requirements, and thus you'd definitely want to go for the power advantages of having specialized hardware.
 
Chalnoth said:
No possible way. Mobile phones are exceptionally sensitive to power requirements, and thus you'd definitely want to go for the power advantages of having specialized hardware.

Are you suggesting that having a GPU and a DSP on a mobile phone is going to consume less power than the DSP on its own?

Are you suggesting that DSPs can't be power efficient, but GPUs can?

Mobile phones have exceptionally high bandwidth costs and low bandwidth requirements for video, so bandwidth, not battery life, is the real limiting factor. This is also the reason why Toshiba and Sony have been talking about the use of Cell in mobile phones (probably one or two special low-power SPEs teamed up with a separate embedded low-power CPU like an ARM processor).
 
Chalnoth said:
I'm not talking about supersampling, just multisampling, and multisampling has the same effect on z-buffer accesses as supersampling (which is the starkest difference between TBDRs and IMRs).
But multisampling can be an "always-on" feature in the low-end as well, especially on a deferred renderer where it is even cheaper than on an IMR. So that does not increase the "resolution span".

Well, two things:
1. You can only effectively remove submitted triangles that are either backface-culled or clipped entirely outside the view area.
2. Since a triangle requires significantly more storage than a pixel, triangles can be quite a bit larger than pixel-sized. Exactly how much would obviously depend upon how many attributes are used.
1. There's more. If you compare the list of submitted triangles to the list of those triangles contributing to the final rendering, you will find triangles missing from the latter for several reasons: outside the view area, backfacing, not hitting a single pixel/sample, hidden by other triangles, rejected by stencil test, and maybe others.

While it is extremely hard to get the ideal, minimal list of contributing triangles under all circumstances, all of the above reasons can in some way be used to reduce the number of triangles that have to be stored.
For example, if a game doesn't use a geometry LOD system and a 10,000-triangle object viewed at a distance happens to cover only 20 pixels/samples, at least 9,980 triangles can be safely culled. Of course such a case usually means really bad aliasing and should be avoided.
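As a rough illustration of what such a geometry LOD system does (the names and the stepping scheme here are hypothetical), the selection itself can be as simple as:
[code]
/* Hypothetical distance-based LOD pick: choose a coarser mesh as the
 * object recedes, so a distant object never submits thousands of
 * sub-pixel triangles in the first place. */
typedef struct { int triangle_count; /* vertex data omitted */ } Mesh;

static const Mesh *select_lod(const Mesh *lods, int num_lods,
                              float distance, float lod_step)
{
    int level = (int)(distance / lod_step);  /* one coarser level per step */
    if (level < 0) level = 0;
    if (level >= num_lods) level = num_lods - 1;
    return &lods[level];                     /* lods[0] is full detail */
}
[/code]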
There are also methods where an object may be stored but never read, like predicated rendering.

2. Obviously. And on how much bandwidth deferred rendering can save in other areas. But don't forget that, as triangles get smaller, IMRs lose efficiency as well.
 
JF_Aidan_Pryde said:
A few games are starting to use a fully deferred rendering engine (e.g. Stalker). How does this relate to/affect deferred rendering hardware?
It depends. If the API allows the application to tell the driver that MRTs used to store surface properties are to be read 1:1 later, a TBDR can keep that data entirely on-chip, thus saving a huge amount of bandwidth and memory space. It could even be combined with multisampling.
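For reference, the "surface properties" such an engine stores per sample might look something like this (a made-up layout for illustration, not Stalker's actual format):
[code]
/* Illustrative per-sample G-buffer contents for a deferred shading engine.
 * If the API can promise the driver that these MRTs are read back 1:1,
 * a TBDR could keep them in on-chip tile memory instead of external RAM. */
typedef struct {
    float albedo[3];   /* diffuse colour     */
    float normal[3];   /* surface normal     */
    float depth;       /* eye-space depth    */
    float specular;    /* specular intensity */
} GBufferTexel;
[/code]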

A Z-first pass hurts a TBDR because it means doing the same work twice. However it should be very easy for any application to skip such a pass.
 
SPM said:
Are you suggesting that having a GPU and a DSP on a mobile phone is going to consume less power than the DSP on its own?

Are you suggesting that DSPs can't be power efficient, but GPUs can?
Absolutely not. What I'm suggesting is that you just can't make generalized hardware as power-efficient as specialized hardware. If you want to make the DSP do the graphics as well, you're either going to have very subpar graphics (which sort of defeats the entire point), or have to have a much more powerful DSP. And not only that, but dedicated hardware can do much better at managing things like memory bandwidth than generalized hardware.

The primary concern with dedicated hardware, of course, is that parts that are not in use may still draw power. For this you need good power-saving circuit design, but the technology is already there.
 
Xmas said:
1. There's more. If you compare the list of submitted triangles to the list of those triangles contributing to the final rendering, you will find triangles missing from the latter for several reasons: outside the view area, backfacing, not hitting a single pixel/sample, hidden by other triangles, rejected by stencil test, and maybe others.
I don't think you can possibly do the stencil/depth tests while binning the triangles, except perhaps in a very gross sense (Hierarchical-Z, for instance). To do so would require you to have an external z-buffer. Your comment about sub-pixel triangles makes some sense, but due to the aliasing inherent in that, I don't think that's something that IHVs should seek to optimize for.

2. Obviously. And on how much bandwidth deferred rendering can save in other areas. But don't forget that, as triangles get smaller, IMRs lose efficiency as well.
Yes, but the things that cause a loss of efficiency in IMRs with small triangles will cause a similar loss in efficiency with TBDRs (e.g. you need a quad to calculate the partial derivatives for texture coordinates).
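To make the quad point concrete, here is a sketch of how hardware typically gets texture-LOD derivatives by differencing across a 2x2 quad (the quad layout here is an assumption for illustration):
[code]
/* Texture LOD needs screen-space derivatives of the texture coordinates;
 * hardware approximates them by differencing across a 2x2 quad. Even a
 * triangle covering a single pixel must be shaded as a full quad so that
 * these differences exist. Assumed layout: q[0..3] = TL, TR, BL, BR. */
typedef struct { float u, v; } TexCoord;

static void quad_derivatives(const TexCoord q[4],
                             float *dudx, float *dvdx,
                             float *dudy, float *dvdy)
{
    *dudx = q[1].u - q[0].u;  /* horizontal neighbours, one pixel apart */
    *dvdx = q[1].v - q[0].v;
    *dudy = q[2].u - q[0].u;  /* vertical neighbours, one pixel apart   */
    *dvdy = q[2].v - q[0].v;
}
[/code]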
 
Xmas said:
A Z-first pass hurts a TBDR because it means doing the same work twice. However it should be very easy for any application to skip such a pass.
You need to do a Z-first pass for stencil shadows, though.
 
Chalnoth said:
I don't think you can possibly do the stencil/depth tests while binning the triangles, except perhaps in a very gross sense (Hierarchical-Z, for instance).
No more gross than the hierarchical Z used on any modern IMR.

To do so would require you to have an external z-buffer.
Only a low-resolution version; depending on tile size, this may even fit on-chip easily.

Your comment about sub-pixel triangles makes some sense, but due to the aliasing inherent in that, I don't think that's something that IHVs should seek to optimize for.

It's actually about culling triangles that don't cross sample points; this does not cause aliasing, as by definition they are never rasterised.

Yes, but the things that cause a loss of efficiency in IMRs with small triangles will cause a similar loss in efficiency with TBDRs (e.g. you need a quad to calculate the partial derivatives for texture coordinates).

That isn't actually a given; there is some dependency on the arrangement of your pipeline and the presence of certain other provisions.

Cheers,
John.
 
Chalnoth said:
I don't think you can possibly do the stencil/depth tests while binning the triangles, except perhaps in a very gross sense (Hierarchical-Z, for instance). To do so would require you to have an external z-buffer.
Hierarchical Z doesn't require an external Z-buffer on IMRs.

Your comment about sub-pixel triangles makes some sense, but due to the aliasing inherent in that, I don't think that's something that IHVs should seek to optimize for.
But if you have a well-working geometry LOD system, you shouldn't have much trouble with very small triangles requiring too much bandwidth.
Sub-pixel triangles do happen, unfortunately, and "optimizing" for them can be cheaper than keeping them. Degenerate triangles need to be detected as well (since many people use them to generate long triangle strips).
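The strip-stitching degenerates at least are a trivial check; a sketch in C (assuming indexed triangles, with strips stitched via repeated indices):
[code]
#include <stdbool.h>

/* A stitching triangle in a strip repeats a vertex index and thus has
 * zero area; it can be dropped as soon as the indices are assembled. */
static bool is_degenerate(unsigned i0, unsigned i1, unsigned i2)
{
    return i0 == i1 || i1 == i2 || i0 == i2;
}
[/code]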

For the rest, see JohnH's answers.


JohnH said:
It's actually about culling triangles that don't cross sample points; this does not cause aliasing, as by definition they are never rasterised.
It's not the culling itself that causes aliasing, but the fact that only one (almost randomly selected) of many small triangles contributes to the final image. By ignoring part of the information you get "sampling holes". The only way around this is massive supersampling or a geometry LOD system.
 
Xmas said:
It's not the culling itself that causes aliasing, but the fact that only one (almost randomly selected) of many small triangles contributes to the final image. By ignoring part of the information you get "sampling holes". The only way around this is massive supersampling or a geometry LOD system.

If a triangle doesn't cross a sample point it's not rasterised anyway, so optimising it out doesn't cause holes to appear that wouldn't have been present anyway. This behaviour comes from clearly defined rasterisation rules; any resulting holes are actually the result of an incorrectly defined mesh (generally non-shared, non-exactly-equal vertices on common edges).

Regards,
John.
 
JohnH said:
If a triangle doesn't cross a sample point it's not rasterised anyway, so optimising it out doesn't cause holes to appear that wouldn't have been present anyway. This behaviour comes from clearly defined rasterisation rules; any resulting holes are actually the result of an incorrectly defined mesh (generally non-shared, non-exactly-equal vertices on common edges).
I'm not referring to holes in the rendered geometry, but "sampling holes" as in the distance between two samples being too large in relation to the frequency of the sampled signal, i.e. geometry. A highly complex mesh covering only a few pixels (sampled only at a few points) often results in aliasing. As I said, it's not the culling that causes aliasing.
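(In sampling terms this is just the Nyquist limit: with uniform sample spacing dx, the highest representable spatial frequency is f_max = 1 / (2 * dx). A mesh packing several triangles into one pixel carries detail far above f_max, so which triangle wins each sample is effectively arbitrary.)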
 
Xmas said:
I'm not referring to holes in the rendered geometry, but "sampling holes" as in the distance between two samples being too large in relation to the frequency of the sampled signal, i.e. geometry. A highly complex mesh covering only a few pixels (sampled only at a few points) often results in aliasing. As I said, it's not the culling that causes aliasing.

The only way of fixing that is to increase your sample rate, i.e. apply AA. Maybe I've missed Chalnoth's original point, but it still remains valid to cull non-sample-crossing polys.

Later,
John.
 
Xmas said:
Hierarchical Z doesn't require an external Z-buffer on IMRs.
Hence the word, "except."

Xmas said:
But if you have a well-working geometry LOD system, you shouldn't have much trouble with very small triangles requiring too much bandwidth.
Sub-pixel triangles do happen, unfortunately, and "optimizing" for them can be cheaper than keeping them. Degenerate triangles need to be detected as well (since many people use them to generate long triangle strips).
Degenerate tris are easy. But omitting triangles that are within the view area yet don't cross any sample points would require a significant amount of computation per triangle, for information that would most likely have to be thrown away.

JohnH said:
That isn't actually a given; there is some dependency on the arrangement of your pipeline and the presence of certain other provisions.
Of course there is. But it's not like you don't have these problems in a TBDR as well. It's all about optimizing for small triangles, which is going to have to happen either way.
 
Chalnoth said:
But omitting triangles that are within the view area yet don't cross any sample points would require a significant amount of computation per triangle, for information that would most likely have to be thrown away.
Not true; it's actually very cheap to discard the bulk of small, non-sample-point-crossing triangles.
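For instance, one cheap conservative test (a sketch, not necessarily the hardware's actual logic) is to snap the triangle's screen-space bounding box to the sample grid; if no pixel centre falls inside it, the triangle can never be rasterised:
[code]
#include <math.h>
#include <stdbool.h>

/* Conservative cull: if the screen-space bounding box contains no pixel
 * centre (centres at integer + 0.5), the triangle cannot cross a sample
 * point and can be discarded before binning. Triangles that pass this
 * test may still miss every sample and get culled later. */
static bool bbox_misses_all_samples(float x0, float y0, float x1, float y1,
                                    float x2, float y2)
{
    float minx = fminf(x0, fminf(x1, x2)), maxx = fmaxf(x0, fmaxf(x1, x2));
    float miny = fminf(y0, fminf(y1, y2)), maxy = fmaxf(y0, fmaxf(y1, y2));

    float cx = ceilf(minx - 0.5f) + 0.5f;  /* first centre at or right of minx */
    float cy = ceilf(miny - 0.5f) + 0.5f;  /* first centre at or below miny    */
    return cx > maxx || cy > maxy;         /* no centre inside the box */
}
[/code]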
Of course there is. But it's not like you don't have these problems in a TBDR as well. It's all about optimizing for small triangles, which is going to have to happen either way.
There are techniques that require HW that already exists in one form in certain tilers; that's not to say they can't be done on an IMR, the relative area cost is just higher.

John.
 
Is it safe to assume that AMD noted this thread, and, seeing the threat, decided that ATI needed to be bought? I think so. :cool:
 
epicstruggle said:
Is it safe to assume that AMD noted this thread, and, seeing the threat, decided that ATI needed to be bought? I think so. :cool:
Of course. You didn't know that we here at B3D control the entire industry?
 