Cube Maps and Next Gen Consoles

Acert93

Cube maps are one way to do reflections on modern GPUs. From what I have read it can take 6 passes to render a cube map on current GPUs, although with D3D10 this can be done in a single pass (thanks to geometry shaders and stream-out).

Because of the unique environment consoles offer (a stable, exploitable platform that in many cases offers features and performance not found in the average desktop), we see console devs do some really amazing stuff.

So my question: with the current hardware in the PS3 and Xbox 360, are there ways to speed up cube mapping? And are there better or comparable techniques that can be done faster than cube mapping? (After seeing nAo's ingenuity in using NAO32 for HDR effects I wouldn't be surprised!)
 
If DX10 does it in 1 pass then shouldn't the PS3? nAo already said that Cell+RSX vertex shaders are way more flexible than DX10's geometry shaders :) Or am I completely misunderstanding him?
 
Single-pass cubemap rendering is a totally boneheaded idea. You forfeit the ability to do per-face culling for approximately zero gain, not to mention all the unnecessary complications you're causing for the hardware guys. It's just silly differentiating-your-graphics-API-with-a-hammer.
 
If DX10 does it in 1 pass then shouldn't the PS3?
DX10 is an API. It does nothing. Devices and drivers can do something.
Nobody here is prepared to tell you how fast that feature will be implemented in hardware, or if it's just going to be a bunch of software hacks in the drivers.
 
If I understand correctly, since a triangle can face at most 3 sides of a cube map, they use the GS to determine what these sides are? Kind of like dynamically assigning a triangle to texture(s)/render targets based on its transform. It would be pretty cool if the GS can do this. The CPU could do this if vertex transforms were done on the CPU.
 
If I understand correctly, since a triangle can face at most 3 sides of a cube map, they use the GS to determine what these sides are? Kind of like dynamically assigning a triangle to texture(s)/render targets based on its transform. It would be pretty cool if the GS can do this. The CPU could do this if vertex transforms were done on the CPU.

A triangle could be visible on 5 sides of a cubemap, if it were large/close enough.

Not that this changes or answers your question :)
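To make the "which faces can a triangle land on" question concrete, here's a hypothetical CPU-side sketch (my own illustration, not anything from the thread): each cube face sees a 90-degree frustum of four planes through the cube-map origin, and a triangle conservatively cannot reach a face if all three vertices fall outside any single one of those planes.

```python
# Conservative per-face test for a triangle in cube-map space
# (camera at the origin). Plane normals point into each face's frustum.
FACES = {
    "+X": [(1, -1, 0), (1, 1, 0), (1, 0, -1), (1, 0, 1)],
    "-X": [(-1, -1, 0), (-1, 1, 0), (-1, 0, -1), (-1, 0, 1)],
    "+Y": [(-1, 1, 0), (1, 1, 0), (0, 1, -1), (0, 1, 1)],
    "-Y": [(-1, -1, 0), (1, -1, 0), (0, -1, -1), (0, -1, 1)],
    "+Z": [(-1, 0, 1), (1, 0, 1), (0, -1, 1), (0, 1, 1)],
    "-Z": [(-1, 0, -1), (1, 0, -1), (0, -1, -1), (0, 1, -1)],
}

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def faces_possibly_hit(tri):
    """Return the faces this triangle could appear on (conservative)."""
    hit = []
    for face, planes in FACES.items():
        # Reject only if some plane has the whole triangle outside it.
        rejected = any(all(dot(v, n) < 0 for v in tri) for n in planes)
        if not rejected:
            hit.append(face)
    return hit
```

Being a per-vertex plane test, this over-includes for large triangles near the origin, which is exactly the case where a triangle really can show up on 5 or all 6 faces.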
 
Single-pass cubemap rendering is a totally boneheaded idea. You forfeit the ability to do per-face culling for approximately zero gain, not to mention all the unnecessary complications you're causing for the hardware guys. It's just silly differentiating-your-graphics-API-with-a-hammer.

... what culling? if you render onto a cubemap there is none.. and this is why it's very useful: it doesn't transform the triangle 6 times and then cull it 3 to 5 times for nothing (and render state is set up once instead of 6 times, etc).
 
Forgive this naive question, but isn't Xenos sort of close to DX10? If so, would it be able to pull this off in fewer than 6 passes?
 
... what culling? if you render onto a cubemap there is none.. and this is why it's very useful: it doesn't transform the triangle 6 times and then cull it 3 to 5 times for nothing (and render state is set up once instead of 6 times, etc).

Well, there might be some savings, but basically you'd still end up with culling somewhere - identifying which faces to render to is still a culling operation, and you'd still need to rasterise the triangle separately for each face.

I'm not sure how DX10 or DX10-grade hardware will help or hinder this, nor whether it's a good idea. I can also see ways of achieving it on earlier (or different, such as Cell+RSX in the PS3) hardware anyway.

For example, on PS3 I might process the triangle stream on the SPUs, letting an SPU project the triangles to all 6 faces and reject the ones that aren't visible on a face while submitting the rest. Instead of rendering to cubemap faces, render to a single buffer containing all six faces (you'd need to clip the triangles or otherwise mask them to stop them crossing faces, though this can be done with vertex setup and per-pixel ops if you want to avoid a real clip or a state change). One pass and a bunch of SPU work, and you have all six cube faces packed into a single texture. You'll probably have to blit them around to get them onto a real cubemap, but that should be pretty cheap.
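One way to picture the "single buffer containing all six faces" idea is a 3x2 atlas of viewports inside one render target. The layout and face order here are purely assumptions for illustration, not how any real engine did it:

```python
# Hypothetical 3x2 packing of cube faces into one render target,
# face order [+X, -X, +Y / -Y, +Z, -Z].
FACE_ORDER = ["+X", "-X", "+Y", "-Y", "+Z", "-Z"]

def face_viewport(face_index, face_size):
    """Viewport rectangle (x, y, w, h) of one cube face inside the atlas."""
    col = face_index % 3
    row = face_index // 3
    return (col * face_size, row * face_size, face_size, face_size)

def atlas_size(face_size):
    """Overall render-target size for the 3x2 packing."""
    return (3 * face_size, 2 * face_size)
```

A scissor rectangle matching each face's viewport would be one cheap way of masking triangles so they can't bleed across faces, along the lines the post suggests.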

Damned if I know if that's a win on anything though.
 
Damned if I know if that's a win on anything though.
You'd only need one "full" transformation per vertex per cube map, to orient it with the first viewing direction. All subsequent transformations are then trivial. <shrug>
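To illustrate why the subsequent transformations are trivial: with one common (D3D-style) set of cube-face orientations, once a vertex has had its single full transform into cube-map space, each face's view-space position is just a swizzle and some sign flips of the same three components - no further matrix multiplies. The exact sign conventions below are an assumption for illustration:

```python
def face_space(v, face):
    """Position of cubemap-space vertex v in one face's view space
    (x right, y up, z into the face), via swizzle/negate only."""
    x, y, z = v
    return {
        "+X": (-z, y, x),
        "-X": (z, y, -x),
        "+Y": (x, -z, y),
        "-Y": (x, z, -y),
        "+Z": (x, y, z),
        "-Z": (-x, y, -z),
    }[face]
```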
 
You'd only need one "full" transformation per vertex per cube map, to orient it with the first viewing direction. All subsequent transformations are then trivial. <shrug>

well, what stops you from doing that yourself when you deem it apt? i still don't get the point of throwing a special-case optimisation technique right into the API. is that hip or something? or have they ironed out all the bumps in their API so they're now in 'random customer request fulfilment' mode?
 
i still don't get the point of throwing a special-case optimisation technique right into the API.
That's exactly the first thing I thought the first time I heard about it, but I'm sure it's there for a (political) reason.
 
i still don't get the point of throwing a special-case optimisation technique right into the API.


That's part of creating a "standard" API format. You want everyone to do it the same way, so even special-case optimizations need to be included in your standards.
 
well, what stops you from doing that yourself when you deem it apt?
How do you read the vertex/triangle once and output it up to 6 times with different clipping behaviour in an efficient manner?
i still don't get the point of throwing a special-case optimisation technique right into the API. is that hip or something?
I guess if a lot of ISVs are using (dynamic) cubemaps in their software then it makes sense to move it into the API.

One might ask why bilinear/trilinear filtering is part of the API. After all, it could all be done with multiple point sample passes...
 
Forget about reflection maps and think about cubemaps as accumulation buffers or offscreen buffers.

If I want to combine cube maps in different orientations, for example, it's very useful.
All it really is, is the ability to address an array of render targets, and I can think of uses beyond cubemaps for that.
 
How do you read the vertex/triangle once and output it up to 6 times with different clipping behaviour in an efficient manner?

read it once and output it 6 times? - unless the hw has explicit support for this, i can't see how an API can do it more efficiently than i can do it myself. same with the 1 full transform and 5 subsequent optimised transforms - on hw that has no deliberate facilities for this, i can't see how the API 'abstracting' it for me can speed things up at all. i'd really prefer an API that makes life easier on its bread'n'butter pipeline and lets me worry about optimisations like that, over one that is rough on the fundamentals but covers everything in orange marmalade. what if i don't want to draw some of the sides of the cube because i know i have a large occluder standing there?

I guess if a lot of ISVs are using (dynamic) cubemaps in their software then it makes sense to move it into the API.

if the API does allow for good old-fashioned dynamic cubemaps in a reasonable manner, this 'abstraction' buys me negligibly little.

One might ask why bilinear/trilinear filtering is part of the API. After all, it could all be done with multiple point sample passes...

if those point samples were as efficient as the higher-tap versions, there'd be no reason to require the latter. usually that's not the case, though.


edit: ok, i just took the time to trace the origins of the discussed feature, and from all i can tell it's just one speculated use for geometry shaders - if that's indeed the case and there's no extra associated burden on the API interfaces, and moreover the hw does go to the full lengths to do the vertex instancing more efficiently than i can with local buffers, then i'm perfectly ok with that, as in the worst case of a sub-par hw implementation i can always neglect it.
 
A triangle could be visible on 5 sides of a cubemap, if it were large/close enough.
It's actually possible to construct a triangle that is visible on all the 6 sides of the cubemap.

read it once and output it 6 times? - unless the hw has explicit support for this, i can't see how an API can do it more efficiently than i can do it myself. same with the 1 full transform and 5 subsequent optimised transforms - on hw that has no deliberate facilities for this, i can't see how the API 'abstracting' it for me can speed things up at all. i'd really prefer an API that makes life easier on its bread'n'butter pipeline and lets me worry about optimisations like that, over one that is rough on the fundamentals but covers everything in orange marmalade. what if i don't want to draw some of the sides of the cube because i know i have a large occluder standing there?
Bandwidth. This API allows you to prevent the vertex shader from actually reading in the vertex 6 times; this can be a rather serious bandwidth savings, even if it doesn't cut down the amount of calculations done. As for occluders: if you know that a cube face will never be visible, you can abstain from binding it to a render-target, and if you know that a specific object will be occluded, you can abstain from passing the object to the vertex shader in the first place.
 
Bandwidth. This API allows you to prevent the vertex shader from actually reading in the vertex 6 times; this can be a rather serious bandwidth savings, even if it doesn't cut down the amount of calculations done.

yes, that's what i meant by 'deliberate hw support' with the vertex instancing.

As for occluders: if you know that a cube face will never be visible, you can abstain from binding it to a render-target, and if you know that a specific object will be occluded, you can abstain from passing the object to the vertex shader in the first place.

exactly that latter point is the questionable aspect - say, i know that object A is well occluded at face +X but otherwise visible elsewhere; with a straight dumb 1-in-6-out approach i'll still have to pass it down for vertex handling at face +X - can this be optimised or not?
 
exactly that latter point is the questionable aspect - say, i know that object A is well occluded at face +X but otherwise visible elsewhere; with a straight dumb 1-in-6-out approach i'll still have to pass it down for vertex handling at face +X - can this be optimised or not?

Optimizing such a case in the presence of a geometry shader can be done in a few different ways:
  • Unbind the render target corresponding to the +X face.
  • Set up a geometry shader that conditionally skips outputting to the +X face based on the value of a uniform or vertex attribute.
Either way, you will end up with no polygons being sent to clipping/triangle-setup/rendering/etc for the +X face.
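A CPU-side model of the second option may help (this is my own illustration, not real shader code - in actual D3D10 each emitted copy would carry SV_RenderTargetArrayIndex to select the cube face, and the "face mask" would be a shader constant):

```python
def emit_to_faces(tri, face_mask):
    """Model of the geometry-shader trick: the triangle is read once,
    and one copy is emitted per cube face whose mask bit is set."""
    for face_index in range(6):
        if face_mask & (1 << face_index):
            yield (face_index, tri)
```

With face order [+X, -X, +Y, -Y, +Z, -Z], skipping the occluded +X face from the example means passing a mask of 0b111110: the triangle is still read once, but no copy is ever emitted toward +X, so nothing reaches clipping or setup for that face.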
 
Set up a geometry shader that conditionally skips outputting to the +X face based on the value of a uniform or vertex attribute.

ok, fair enough. if the geometry shader implementations can also do their vertex instancing without spilling output to temporary local buffers, then that could be a win over current approaches. otherwise it would be little more than a semantic gimmick in the context of this particular task.
 