Cube Maps and Next Gen Consoles

Acert93

Cube maps are one way to do reflections on modern GPUs. From what I have read it can take 6 passes to render a cube map on current GPUs, although with D3D10 this can be done in a single pass (thanks to geometry shaders and stream-out).

Because of the unique environment consoles offer (a stable, exploitable platform that in many cases offers features and performance not found in the average desktop), we see console devs do some really amazing stuff.

So my question: with the current hardware in the PS3 and Xbox 360, are there ways to speed up cube mapping? And are there better or comparable techniques that can be done faster than cube mapping? (After seeing nAo's ingenuity in using NAO32 for HDR effects I wouldn't be surprised!)
 
If DX10 does it in 1 pass then shouldn't the PS3? nAo already said that Cell+RSX vertex shaders are way more flexible than DX10's geometry shaders :) Or am I completely misunderstanding him?
 
Single-pass cubemap rendering is a totally boneheaded idea. You forfeit the ability to do per-face culling for approximately zero gain, not to mention all the unnecessary complications you're causing for the hardware guys. It's just silly differentiating-your-graphics-API-with-a-hammer.
 
If DX10 does it in 1 pass then shouldn't the PS3?
DX10 is an API. It does nothing. Devices and drivers can do something.
Nobody here is prepared to tell you how fast that feature will be implemented in hardware, or if it's just going to be a bunch of software hacks in the drivers.
 
If I understand correctly, since a triangle can face at most 3 sides of a cube map, they use the GS to determine what these sides are? Kind of like dynamically assigning a triangle to texture(s)/render targets based on its transform. It would be pretty cool if the GS can do this. The CPU could do this if vertex transforms were done on the CPU.
 
If I understand correctly, since a triangle can face at most 3 sides of a cube map, they use the GS to determine what these sides are? Kind of like dynamically assigning a triangle to texture(s)/render targets based on its transform. It would be pretty cool if the GS can do this. The CPU could do this if vertex transforms were done on the CPU.

A triangle could be visible on 5 sides of a cubemap, if it were large/close enough.

Not that this changes or answers your question :)
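To make the "which faces can a triangle land on" question concrete, here's a hypothetical CPU-side sketch (my own illustration, not anything from the thread): each cube face sees a 90-degree frustum of four planes through the cube-map origin, and a triangle conservatively cannot reach a face if all three vertices fall outside any single one of those planes.

```python
# Conservative per-face test for a triangle in cube-map space
# (camera at the origin). Plane normals point into each face's frustum.
FACES = {
    "+X": [(1, -1, 0), (1, 1, 0), (1, 0, -1), (1, 0, 1)],
    "-X": [(-1, -1, 0), (-1, 1, 0), (-1, 0, -1), (-1, 0, 1)],
    "+Y": [(-1, 1, 0), (1, 1, 0), (0, 1, -1), (0, 1, 1)],
    "-Y": [(-1, -1, 0), (1, -1, 0), (0, -1, -1), (0, -1, 1)],
    "+Z": [(-1, 0, 1), (1, 0, 1), (0, -1, 1), (0, 1, 1)],
    "-Z": [(-1, 0, -1), (1, 0, -1), (0, -1, -1), (0, 1, -1)],
}

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def faces_possibly_hit(tri):
    """Return the faces this triangle could appear on (conservative)."""
    hit = []
    for face, planes in FACES.items():
        # Reject only if some plane has the whole triangle outside it.
        rejected = any(all(dot(v, n) < 0 for v in tri) for n in planes)
        if not rejected:
            hit.append(face)
    return hit
```

Being a per-vertex plane test, this over-includes for large triangles near the origin, which is exactly the case where a triangle really can show up on 5 or all 6 faces.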
 
Single-pass cubemap rendering is a totally boneheaded idea. You forfeit the ability to do per-face culling for approximately zero gain, not to mention all the unnecessary complications you're causing for the hardware guys. It's just silly differentiating-your-graphics-API-with-a-hammer.

... what culling? if you render onto a cubemap there is none.. and this is why it's very useful: it doesn't transform the triangle 6 times and then cull it 3 to 5 times for nothing (and render state is set up once instead of 6 times, etc).
 
Forgive this naive question, but isn't Xenos sort of close to DX10? If so, would it be able to pull this off in fewer than 6 passes?
 
... what culling? if you render onto a cubemap there is none.. and this is why it's very useful: it doesn't transform the triangle 6 times and then cull it 3 to 5 times for nothing (and render state is set up once instead of 6 times, etc).

Well, there might be some savings, but basically you'd still end up with culling somewhere - identifying which faces to render to is still a culling operation, and you'd still need to rasterise the triangle separately for each face.

I'm not sure how DX10 or DX10-grade hardware will help or hinder this, nor whether it's a good idea. I can also see ways of achieving it on earlier (or different, such as Cell+RSX in the PS3) hardware anyway.

For example, on PS3 I might process the triangle stream on the SPUs, letting an SPU project the triangles to all 6 faces and reject the ones that aren't visible on a face while submitting the rest. Instead of rendering to cubemap faces, render to a single buffer containing all six faces (you'd need to clip the triangles or otherwise mask them to stop them crossing faces, though this can be done with vertex setup and per-pixel ops if you want to avoid a real clip or a state change). One pass and a bunch of SPU work, and you have all six cube faces packed into a single texture. You'll probably have to blit them around to get them onto a real cubemap, but that should be pretty cheap.
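One way to picture the "single buffer containing all six faces" idea is a 3x2 atlas of viewports inside one render target. The layout and face order here are purely assumptions for illustration, not how any real engine did it:

```python
# Hypothetical 3x2 packing of cube faces into one render target,
# face order [+X, -X, +Y / -Y, +Z, -Z].
FACE_ORDER = ["+X", "-X", "+Y", "-Y", "+Z", "-Z"]

def face_viewport(face_index, face_size):
    """Viewport rectangle (x, y, w, h) of one cube face inside the atlas."""
    col = face_index % 3
    row = face_index // 3
    return (col * face_size, row * face_size, face_size, face_size)

def atlas_size(face_size):
    """Overall render-target size for the 3x2 packing."""
    return (3 * face_size, 2 * face_size)
```

A scissor rectangle matching each face's viewport would be one cheap way of masking triangles so they can't bleed across faces, along the lines the post suggests.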

Damned if I know if that's a win on anything though.
 
Damned if I know if that's a win on anything though.
You'd only need one "full" transformation per vertex per cube map, to orient it with the first viewing direction. All subsequent transformations are then trivial. <shrug>
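To illustrate why the subsequent transformations are trivial: with one common (D3D-style) set of cube-face orientations, once a vertex has had its single full transform into cube-map space, each face's view-space position is just a swizzle and some sign flips of the same three components - no further matrix multiplies. The exact sign conventions below are an assumption for illustration:

```python
def face_space(v, face):
    """Position of cubemap-space vertex v in one face's view space
    (x right, y up, z into the face), via swizzle/negate only."""
    x, y, z = v
    return {
        "+X": (-z, y, x),
        "-X": (z, y, -x),
        "+Y": (x, -z, y),
        "-Y": (x, z, -y),
        "+Z": (x, y, z),
        "-Z": (-x, y, -z),
    }[face]
```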
 
You'd only need one "full" transformation per vertex per cube map, to orient it with the first viewing direction. All subsequent transformations are then trivial. <shrug>

well, what stops you from doing that yourself when you deem it apt? i still don't get the point of throwing a special-case optimisation technique right into the API. is that hip or something? or have they ironed out all the bumps in their API so they're now in 'random customer request fulfilment' mode?
 
i still don't get the point of throwing a special-case optimisation technique right into the API.
That's exactly the first thing I thought the first time I heard about it, but I'm sure it's there for a (political) reason.
 
i still don't get the point of throwing a special-case optimisation technique right into the API.


That's part of creating a "standard" API format. You want everyone to do it the same way, so even special-case optimizations need to be included in your standards.
 
well, what stops you from doing that yourself when you deem it apt?
How do you read the vertex/triangle once and output it up to 6 times with different clipping behaviour in an efficient manner?
i still don't get the point of throwing a special-case optimisation technique right into the API. is that hip or something?
I guess if a lot of ISVs are using (dynamic) cubemaps in their software then it makes sense to move it into the API.

One might ask why bilinear/trilinear filtering is part of the API. After all, it could all be done with multiple point sample passes...
 
Forget about reflection maps and think about cubemaps as accumulation buffers or offscreen buffers.

If I want to combine cube maps in different orientations, for example, it's very useful.
All it really is, is the ability to address an array of render targets, and I can think of uses beyond cubemaps for that.
 
How do you read the vertex/triangle once and output it up to 6 times with different clipping behaviour in an efficient manner?

read it once and output it 6 times? - unless the hw has explicit support for this, i can't see how an API can do it more efficiently than i can do it myself. same with the 1 full transform and 5 subsequent optimised transforms - on hw that has no deliberate facilities for this, i can't see how the API 'abstracting' it for me can speed things up at all. i'd really prefer an API that makes life easier on its bread'n'butter pipeline and lets me worry about optimisations like that, over one that is rough on the fundamentals but covers everything in orange marmalade. what if i don't want to draw some of the sides of the cube because i know i have a large occluder standing there?

I guess if a lot of ISVs are using (dynamic) cubemaps in their software then it makes sense to move it into the API.

if the API does allow for good old-fashioned dynamic cubemaps in a reasonable manner, this 'abstraction' buys me negligibly little.

One might ask why bilinear/trilinear filtering is part of the API. After all, it could all be done with multiple point sample passes...

if those point samples were as efficient as the higher-tap versions, there'd be no reason to require the latter. usually that's not the case, though.


edit: ok, i just took the time to trace the origins of the discussed feature, and from all i can tell it's just one speculated use for geometry shaders - if that's indeed the case and there's no extra associated burden on the API interfaces, and moreover the hw does go to the full lengths to do the vertex instancing more efficiently than i can with local buffers, then i'm perfectly ok with that, as in the worst case of a sub-par hw implementation i can always neglect it.
 
A triangle could be visible on 5 sides of a cubemap, if it were large/close enough.
It's actually possible to construct a triangle that is visible on all the 6 sides of the cubemap.

read it once and output it 6 times? - unless the hw has explicit support for this, i can't see how an API can do it more efficiently than i can do it myself. same with the 1 full transform and 5 subsequent optimised transforms - on hw that has no deliberate facilities for this, i can't see how the API 'abstracting' it for me can speed things up at all. i'd really prefer an API that makes life easier on its bread'n'butter pipeline and lets me worry about optimisations like that, over one that is rough on the fundamentals but covers everything in orange marmalade. what if i don't want to draw some of the sides of the cube because i know i have a large occluder standing there?
Bandwidth. This API allows you to prevent the vertex shader from actually reading in the vertex 6 times; this can be a rather serious bandwidth savings, even if it doesn't cut down the amount of calculations done. As for occluders: if you know that a cube face will never be visible, you can abstain from binding it to a render-target, and if you know that a specific object will be occluded, you can abstain from passing the object to the vertex shader in the first place.
 
Bandwidth. This API allows you to prevent the vertex shader from actually reading in the vertex 6 times; this can be a rather serious bandwidth savings, even if it doesn't cut down the amount of calculations done.

yes, that's what i meant by 'deliberate hw support' with the vertex instancing.

As for occluders: if you know that a cube face will never be visible, you can abstain from binding it to a render-target, and if you know that a specific object will be occluded, you can abstain from passing the object to the vertex shader in the first place.

exactly that latter point is the questionable aspect - say, i know that object A is well occluded at face +X but otherwise visible elsewhere; with a straight dumb 1-in-6-out approach i'll still have to pass it down for vertex handling at face +X - can this be optimised or not?
 
exactly that latter point is the questionable aspect - say, i know that object A is well occluded at face +X but otherwise visible elsewhere; with a straight dumb 1-in-6-out approach i'll still have to pass it down for vertex handling at face +X - can this be optimised or not?

Optimizing such a case in the presence of a geometry shader can be done in a few different ways:
  • Unbind the render target corresponding to the +X face.
  • Set up a geometry shader that conditionally skips outputting to the +X face based on the value of a uniform or vertex attribute.
Either way, you will end up with no polygons being sent to clipping/triangle-setup/rendering/etc for the +X face.
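A CPU-side model of the second option may help (this is my own illustration, not real shader code - in actual D3D10 each emitted copy would carry SV_RenderTargetArrayIndex to select the cube face, and the "face mask" would be a shader constant):

```python
def emit_to_faces(tri, face_mask):
    """Model of the geometry-shader trick: the triangle is read once,
    and one copy is emitted per cube face whose mask bit is set."""
    for face_index in range(6):
        if face_mask & (1 << face_index):
            yield (face_index, tri)
```

With face order [+X, -X, +Y, -Y, +Z, -Z], skipping the occluded +X face from the example means passing a mask of 0b111110: the triangle is still read once, but no copy is ever emitted toward +X, so nothing reaches clipping or setup for that face.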
 
Set up a geometry shader that conditionally skips outputting to the +X face based on the value of a uniform or vertex attribute.

ok, fair enough. if the geometry shader implementations can also do their vertex instancing without spilling output to temporary local buffers, then that could be a win over current approaches. otherwise it would be little more than a semantic gimmick in the context of this particular task.
 