REYES on unified shaders

Frank

This is probably a stupid question, but could REYES be done at all on unified shaders, if they have enough ROPs and buffers for them? Are they able to skip the fragment processing altogether and add the triangle/pixel coverage directly to the frame buffer? Although we probably have to wait for SM 4.0 before the vertex shaders can do the tessellation. And we would need them to be able to tessellate curves.

My next question would be: can you get texture filtering in the vertex shaders that way? We probably have to wait for SM 4.0 for that as well.

And my last question would be: are the vertex shaders able to do complex lighting and shadows? I think so, but I'm not sure.

It would be pretty nifty to have Pixar RenderMan-like quality through hardware (albeit surely not in real time). And if it's possible at all, it might encourage the hardware vendors to keep it in the back of their heads when designing their next hardware.

Or am I talking nonsense?

:)


Edit: and would it be possible to do the tessellation on the CPU, especially with Cell? Although the memory bandwidth would likely keep things pretty slow.
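To make the tessellation part concrete, here's roughly what the REYES "dice" step looks like as plain CPU code: evaluate a bicubic Bezier patch on a uniform grid, and every grid cell becomes a micropolygon. Just a minimal sketch - the names (Vec3, dicePatch) are made up, and a real dicer would pick the grid resolution from the patch's screen size:

```cpp
// Hypothetical sketch of the REYES "dice" step on the CPU: uniformly
// evaluate a bicubic Bezier patch into a grid of micropolygon vertices.
// Names (Vec3, dicePatch) are invented for illustration.
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 operator*(float s, const Vec3& v) { return {s*v.x, s*v.y, s*v.z}; }
static Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x+b.x, a.y+b.y, a.z+b.z}; }

// Cubic Bernstein basis at parameter t
static void bernstein(float t, float b[4]) {
    float s = 1.0f - t;
    b[0] = s*s*s; b[1] = 3*s*s*t; b[2] = 3*s*t*t; b[3] = t*t*t;
}

// Dice a 4x4 control mesh into an (n+1)x(n+1) grid of vertices; each grid
// cell becomes one micropolygon (a quad roughly a pixel or less on screen).
std::vector<Vec3> dicePatch(const Vec3 cp[4][4], int n) {
    std::vector<Vec3> grid;
    grid.reserve((n+1)*(n+1));
    for (int j = 0; j <= n; ++j) {
        float bv[4]; bernstein(float(j)/n, bv);
        for (int i = 0; i <= n; ++i) {
            float bu[4]; bernstein(float(i)/n, bu);
            Vec3 p = {0, 0, 0};
            for (int r = 0; r < 4; ++r)
                for (int c = 0; c < 4; ++c)
                    p = p + (bv[r]*bu[c]) * cp[r][c];
            grid.push_back(p);
        }
    }
    return grid;
}

int main() {
    Vec3 cp[4][4];
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            cp[r][c] = {float(c), float(r), float((r*c) % 3)};  // toy control mesh
    std::vector<Vec3> grid = dicePatch(cp, 8);
    std::printf("diced %zu micropolygon vertices\n", grid.size());  // 81
}
```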
 
FWIW, i asked Mike Doggett not long ago whether they had carried out any experiments on xenos in the direction of micropolygons - the answer was they hadn't.
 
After thinking it over a bit more, I think the only two real problems would be the ROPs not being able to add the coverage, and the tessellation. The first could be handled by an extra fragment pass, although that would slow things down quite a bit. The texture lookups should use the same mechanism and opcodes in a vertex shader as in a pixel shader, and the same goes for the lighting. And the FIFOs are almost surely large enough, as they only matter for the draw commands, which are limited by memory bandwidth or the speed of the tessellation. As far as my understanding goes, everything but the direct tessellation of the curved surfaces can be done by the GPU; the only issue would be the performance hit.
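For what I mean by "adding the coverage", here's a toy version of that extra fragment pass: splat each micropolygon's fractional pixel coverage into a float framebuffer. Only a sketch (invented names, no visibility sorting), but it shows why plain additive ROPs would suffice if the coverage weights came in from somewhere:

```cpp
// Hedged sketch of "ROPs adding coverage": splat a micropolygon's
// fractional pixel coverage into a float framebuffer. Visibility and
// sorting are ignored; all names are invented for illustration.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Framebuffer {
    int w, h;
    std::vector<float> rgb;       // accumulated color * coverage
    std::vector<float> coverage;  // accumulated coverage weight
    Framebuffer(int w_, int h_) : w(w_), h(h_), rgb(3*w_*h_, 0), coverage(w_*h_, 0) {}
};

// Accumulate an axis-aligned micro-quad [x0,x1]x[y0,y1] (screen space,
// in pixels) with a flat color; overlap with each pixel cell is the weight.
void splat(Framebuffer& fb, float x0, float y0, float x1, float y1, const float rgb[3]) {
    int px0 = std::max(0, int(x0)), px1 = std::min(fb.w - 1, int(x1));
    int py0 = std::max(0, int(y0)), py1 = std::min(fb.h - 1, int(y1));
    for (int py = py0; py <= py1; ++py)
        for (int px = px0; px <= px1; ++px) {
            float ox = std::min(x1, px + 1.0f) - std::max(x0, float(px));
            float oy = std::min(y1, py + 1.0f) - std::max(y0, float(py));
            float wgt = std::max(0.0f, ox) * std::max(0.0f, oy);  // fractional coverage
            fb.coverage[py*fb.w + px] += wgt;
            for (int c = 0; c < 3; ++c)
                fb.rgb[3*(py*fb.w + px) + c] += wgt * rgb[c];
        }
}

int main() {
    Framebuffer fb(4, 4);
    float red[3] = {1, 0, 0};
    splat(fb, 0.5f, 0.5f, 1.25f, 1.25f, red);  // micropoly straddling 4 pixels
    std::printf("coverage at (1,1): %.4f\n", fb.coverage[1*4 + 1]);  // 0.0625
}
```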

Darkblu, do you think there are other showstoppers - things that Pixar's RenderMan can do, but that cannot be done with shaders?
 
While we're on Xenos, forgive me for going OT, but I feel it doesn't deserve a whole thread.
First, am I right to assume that Xenos will be the first hardware that can do really fast and good vertex texturing?
Second, do we know for sure now whether the ALUs feature extra logic or mini-ALUs for pixel shading? Dave hinted at it in his article and gave a 33% increase figure, but has there been more info about this since, Dave?
 
DiGuru, i'm not an expert on rman but you're right with regard to the vertex shaders' texture sampling abilities - those need to be fully filtered, plus all parameters' partial derivatives should be made available there as well; as you remember, rman does much of its filtering work per-vertex/micropolygon.
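to illustrate what 'fully filtered plus derivatives' buys you, here's a cpu sketch of the two pieces - a bilinear fetch, and lod selection from the (u,v) partial derivatives, the latter being exactly the input a derivative-less vertex shader is missing. all names here are mine, it's just the mechanics:

```cpp
// sketch of the two pieces filtered vertex texturing needs; everything
// here (Texture, sampleBilinear, lodFromDerivatives) is invented
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Texture {
    int size;                   // square texture, one channel, mip 0 only
    std::vector<float> texels;
    float fetch(int x, int y) const {
        x = std::clamp(x, 0, size - 1);
        y = std::clamp(y, 0, size - 1);
        return texels[y*size + x];
    }
};

float sampleBilinear(const Texture& t, float u, float v) {
    float x = u*t.size - 0.5f, y = v*t.size - 0.5f;
    int x0 = int(std::floor(x)), y0 = int(std::floor(y));
    float fx = x - x0, fy = y - y0;
    float a = t.fetch(x0, y0),   b = t.fetch(x0+1, y0);
    float c = t.fetch(x0, y0+1), d = t.fetch(x0+1, y0+1);
    return (a*(1-fx) + b*fx)*(1-fy) + (c*(1-fx) + d*fx)*fy;
}

// lod from the partial derivatives of (u,v) - the input a vertex shader
// without derivative support simply does not have
float lodFromDerivatives(float dudx, float dvdx, float dudy, float dvdy, int texSize) {
    float rx = std::hypot(dudx*texSize, dvdx*texSize);  // texel footprint in x
    float ry = std::hypot(dudy*texSize, dvdy*texSize);  // texel footprint in y
    return std::log2(std::max({rx, ry, 1.0f}));
}

int main() {
    Texture t{2, {0, 1, 1, 0}};
    std::printf("sample=%.2f lod=%.2f\n",
                sampleBilinear(t, 0.5f, 0.5f),              // 0.50
                lodFromDerivatives(0.5f, 0, 0, 0.5f, 2));   // 0.00
}
```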
 
overclocked said:
While we're on Xenos, forgive me for going OT, but I feel it doesn't deserve a whole thread.
First, am I right to assume that Xenos will be the first hardware that can do really fast and good vertex texturing?

i don't know what you mean by 'good' but xenos' vertex texturing is unfiltered.

Second, do we know for sure now whether the ALUs feature extra logic or mini-ALUs for pixel shading? Dave hinted at it in his article and gave a 33% increase figure, but has there been more info about this since, Dave?

allow me to answer for Dave: xenos has two sets of tex samplers, independent of each other: one for vertex and one for pixel textures. their samples do gather in a common cache at a certain late stage, though.
 
darkblu said:
i don't know what you mean by 'good' but xenos' vertex texturing is unfiltered.
Xenos can do the same texture filtering in the vertex shader as it can in the pixel shader, filtered or unfiltered.
 
3dcgi said:
Xenos can do the same texture filtering in the vertex shader as it can in the pixel shader, filtered or unfiltered.

now that you mention it, yes. and i was originally thinking of texture and vertex samplers, but as their outputs gather and are available to all shaders, that does not matter. yep, my bad.
 
darkblu said:
i don't know what you mean by 'good' but xenos' vertex texturing is unfiltered.



allow me to answer for Dave: xenos has two sets of tex samplers, independent of each other: one for vertex and one for pixel textures. their samples do gather in a common cache at a certain late stage, though.

Thanks for the fast reply. Have you tried to set up a scenario in your sim project that is "likely" for Xenos, or do you have one in mind, so you/we get a head start for when the unified parts come to the PC segment?

Btw, I must really give you kudos for the work on your simulator project - highly interesting, and I hope you continue to improve it. Just plain good job!
 
overclocked said:
Thanks for the fast reply. Have you tried to set up a scenario in your sim project that is "likely" for Xenos, or do you have one in mind, so you/we get a head start for when the unified parts come to the PC segment?

Btw, I must really give you kudos for the work on your simulator project - highly interesting, and I hope you continue to improve it. Just plain good job!

overclocked, just keep in mind the correction to my original answer (it wasn't quite correct). also remember Dave has a nice xenos article on this very site ; )

as regards my IMR project, thanks! it's not modelled after any particular architecture (although it borrows from some), and most likely will remain like that, unless somebody spawns some architecture simulation from it. also, thurp's relation to shaders is a little bit awkward - in contrast to, say, Nick's assembly-shaders-to-sse compiler, thurp does not deal with shaders as separate entities. its 'pixel shaders' (i.e. the different segment spanner routines) come embedded and get compiled as part of the pipeline's code (like in a fixed-function pipeline), and its 'vertex shading' comes in one flavor for now. plus, thurp totally skips the assembly representation and goes directly for "HLSL" - namely templatized C++, which, of course, must pass through your actual C compiler; so in the end it's a bit like writing renderman surface shaders in c - but hey, people have been known to do that ; )
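to give a flavour of the templatized approach (this is not thurp's actual code, just an illustration of the principle): the 'surface shader' is a functor type, and the span loop gets instantiated around it, so the compiler inlines and optimizes the whole thing as one routine:

```cpp
// not thurp's actual code - just the 'templatized C++ as HLSL' idea:
// the shader is baked into the span loop at compile time, like a
// fixed-function pipeline stage
#include <cstdio>

struct Fragment { float u, v; };

// a 'surface shader' is any type with a suitable operator(); here, a checker
struct Checker {
    float operator()(const Fragment& f) const {
        int cu = int(f.u * 8), cv = int(f.v * 8);
        return ((cu ^ cv) & 1) ? 1.0f : 0.0f;
    }
};

// one span-shading routine is compiled per shader type - no runtime dispatch
template <class SurfaceShader>
void shadeSpan(float* out, int n, float u0, float du, float v, SurfaceShader shade) {
    Fragment f{u0, v};
    for (int i = 0; i < n; ++i, f.u += du)
        out[i] = shade(f);
}

int main() {
    float span[8];
    shadeSpan(span, 8, 0.0f, 1.0f/8, 0.3f, Checker{});
    for (float c : span) std::printf("%.0f ", c);  // 0 1 0 1 0 1 0 1
    std::printf("\n");
}
```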

ps: lately i've been working to make use of the auto-vectorization feature of the current gcc - so eventually thurp will compile to acceptable SIMD code, and it'll be multi-platform from the get-go.
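ps2: to show what i mean by 'acceptable SIMD code' - the auto-vectorizer wants loops shaped like this (illustrative only, not actual thurp code):

```cpp
// the kind of loop gcc's auto-vectorizer handles well (-O3, or -O2 with
// -ftree-vectorize): flat arrays, restrict-qualified pointers, no branches
#include <cstddef>
#include <cstdio>

void madd(float* __restrict dst, const float* __restrict a,
          const float* __restrict b, float s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + s * b[i];  // one SIMD multiply-add per vector lane
}

int main() {
    float a[8] = {1,2,3,4,5,6,7,8}, b[8] = {8,7,6,5,4,3,2,1}, d[8];
    madd(d, a, b, 2.0f, 8);
    std::printf("%.0f\n", d[0]);  // 17
}
```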
 
I'm not an rman expert either, but I've chatted to a few people and seen some material indicating that the hardware just isn't close to being able to match the quality (forget performance). Rman is more of a general graphics system - various effects just don't translate to GPUs (I think "Deep Shadow Maps" with variable-sized texels was an example). I've also read that some of the jittered shadow mapping uses textures up to 16,384x16,384 :oops:
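To give a feel for why deep shadow maps don't translate: each texel stores a whole visibility-versus-depth function rather than a single depth. This is my own much-simplified sketch of the idea from the Lokovic & Veach paper (invented names, piecewise-linear only):

```cpp
// Hedged sketch of the "Deep Shadow Maps" idea: each texel stores a
// visibility function of depth, not one depth value. Simplified from
// Lokovic & Veach 2000; the structure and names here are my own.
#include <cstddef>
#include <cstdio>
#include <vector>

struct VisSample { float depth, transmittance; };  // transmittance in [0,1]

// One texel: samples sorted by depth, transmittance non-increasing.
using DeepTexel = std::vector<VisSample>;

// Fraction of light reaching 'depth': interpolate the visibility function.
float lookup(const DeepTexel& t, float depth) {
    if (t.empty() || depth <= t.front().depth) return 1.0f;
    if (depth >= t.back().depth) return t.back().transmittance;
    for (std::size_t i = 1; i < t.size(); ++i)
        if (depth < t[i].depth) {
            float f = (depth - t[i-1].depth) / (t[i].depth - t[i-1].depth);
            return t[i-1].transmittance +
                   f * (t[i].transmittance - t[i-1].transmittance);
        }
    return t.back().transmittance;
}

int main() {
    // semi-transparent layers, e.g. hair: visibility falls off gradually
    DeepTexel hair = {{1.0f, 1.0f}, {1.2f, 0.5f}, {2.0f, 0.1f}};
    std::printf("visibility at depth 1.1: %.2f\n", lookup(hair, 1.1f));  // 0.75
}
```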

As for the comment on tessellation - SM4's GS provides for this, but the current estimates from IHVs put usable tessellation well below the actual limits. I'm not expecting to see much heavy-weight HOS/tessellation in the first wave of D3D10 engines/hardware ;)
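To show what I mean about the GS limits, here's a toy CPU model of GS-style amplification (invented names): each invocation can only emit a bounded number of primitives, which is exactly the cap that makes heavy HOS awkward:

```cpp
// A toy CPU model of GS-style amplification: one input triangle fans
// out into 4 via midpoint subdivision, subject to a hard output cap
// analogous to the GS's maxvertexcount. All names are invented.
#include <cstdio>
#include <vector>

struct V2 { float x, y; };
struct Tri { V2 a, b, c; };

static V2 mid(const V2& p, const V2& q) { return {(p.x+q.x)/2, (p.y+q.y)/2}; }

// Each 'GS invocation' may emit at most maxTrisOut primitives - with a
// small cap, the needed amplification simply cannot happen in one pass.
void subdivide(const Tri& t, int maxTrisOut, std::vector<Tri>& out) {
    if (maxTrisOut < 4) { out.push_back(t); return; }  // can't amplify enough
    V2 ab = mid(t.a, t.b), bc = mid(t.b, t.c), ca = mid(t.c, t.a);
    out.push_back({t.a, ab, ca});
    out.push_back({ab, t.b, bc});
    out.push_back({ca, bc, t.c});
    out.push_back({ab, bc, ca});
}

int main() {
    std::vector<Tri> out;
    subdivide({{0,0}, {1,0}, {0,1}}, 4, out);
    std::printf("emitted %zu triangles\n", out.size());  // 4
}
```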

Jack
 
I totally agree with you about the performance. But it's certainly better than what you would get from general CPUs.

I don't agree about the quality issue. I would have, a few years ago. But not any more. With programmable shaders, you can do *anything* you want. The only issue, again, will be the performance hit. Sure, it might take a very large number of instructions to get it right. So what? As long as it isn't real-time, who cares?

And that is about the only concern of generic users: the fps they can get versus the number of "nifty" effects. Which isn't very relevant here, as it might take minutes to render a single frame on the GPU. But the same frame might take hours on a single CPU - that's why they use render farms.

The textures are moot as well: sure, your fps will take a major hit (as if you would care when it isn't real-time to start with), but with PCIe there are no real limits to the texture size except what the hardware can address (so you might have to split them up) and the total amount of system memory available. 64-bit hardware and drivers could take care of that.
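Splitting them up is cheap bookkeeping, by the way. A hypothetical sketch of addressing a 16,384x16,384 "virtual" texture as a grid of tiles the hardware can actually bind (the limits here are assumptions, not any real chip's):

```cpp
// Sketch of "splitting them up": address a virtual 16384x16384 texture
// as a grid of smaller tile textures. All names and limits are invented.
#include <cstdio>

const int kTileSize = 2048;       // assumed per-texture hardware limit
const int kVirtualSize = 16384;   // the oversized rman-style shadow map
const int kTilesPerSide = kVirtualSize / kTileSize;  // 8x8 grid of textures

struct TexelAddr {
    int tile;  // which of the 64 tile textures to bind/sample
    int x, y;  // coordinates within that tile
};

TexelAddr virtualToTile(int vx, int vy) {
    int tx = vx / kTileSize, ty = vy / kTileSize;
    return { ty * kTilesPerSide + tx, vx % kTileSize, vy % kTileSize };
}

int main() {
    TexelAddr a = virtualToTile(5000, 12000);
    std::printf("tile %d, local (%d, %d)\n", a.tile, a.x, a.y);  // tile 42, (904, 1760)
}
```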



Then again, we really would want it to be in the same ballpark as real-time graphics to make it feasible for general consumer use. Frames per minute would be bad. But slow is nothing if it looks stupendously gorgeous!
 
DiGuru said:
With programmable shaders, you can do *anything* you want. The only issue, again, will be the performance hit. Sure, it might take a very large number of instructions to get it right. So what? As long as it isn't real-time, who cares?
Hmm, I disagree. Shaders are exceptionally good at what they're designed for - but they are a long way from being a general-purpose CPU or GPU. There are many things you can't do.

The classic one is that you can't do a scatter operation, only gathers. Another issue is FP consistency - the D3D10 spec is pretty good on this, so things will get better, but it's not good enough yet. There isn't a programmable blender either - even in D3D10. You can also only sample 16 different textures. And it's one-in-one-out on the VS unit; sure, this gets busted with the GS in D3D10, but still, most of the grunt work is done prior to the GS, which makes it difficult to get arbitrary dynamic HOS.
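The scatter/gather point in miniature (plain C++ standing in for the shader kernels; the names are mine): a gather reads through a data-dependent index, which a pixel shader can do, while a scatter writes through one, which it cannot:

```cpp
// Gather vs scatter in miniature. A pixel-shader-style kernel may read
// from any address (gather) but can only write to its own output
// location; an arbitrary write (scatter) has no PS equivalent.
#include <cstdio>

const int N = 8;

// Gather: fine as a pixel shader - dst[i] reads wherever idx[i] points.
void gather(float* dst, const float* src, const int* idx) {
    for (int i = 0; i < N; ++i)
        dst[i] = src[idx[i]];  // dependent read: allowed
}

// Scatter: each input writes to a data-dependent location. A PS cannot
// do this; the usual SM3-era workaround is rendering points whose
// positions come from the data (a vertex-shader trick), at real cost.
void scatter(float* dst, const float* src, const int* idx) {
    for (int i = 0; i < N; ++i)
        dst[idx[i]] = src[i];  // dependent write: no PS equivalent
}

int main() {
    float src[N] = {0,1,2,3,4,5,6,7}, g[N], s[N] = {};
    int idx[N] = {7,6,5,4,3,2,1,0};
    gather(g, src, idx);
    scatter(s, src, idx);
    std::printf("gather[0]=%.0f scatter[7]=%.0f\n", g[0], s[7]);  // 7 and 0
}
```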

It's more of a performance thing, but the lack of a programmable blender pretty much eliminates the possibility of using 128-bit FP textures - there's NO blending with the FB.

Jack
 
hopefully the next-next-gen PCs and consoles - post-DX10, post-SM 4.0 - will bring in things like REYES and everything else that current shaders cannot do.
 
JHoxley said:
Hmm, I disagree. Shaders are exceptionally good at what they're designed for - but they are a long way from being a general-purpose CPU or GPU. There are many things you can't do.

The classic one is that you can't do a scatter operation, only gathers.
True, for the most part: you cannot have a pixel shader modify other pixels. But then again, with REYES you would do that before or during tessellation.

Another issue is FP consistency - the D3D10 spec is pretty good on this, so things will get better, but it's not good enough yet. There isn't a programmable blender either - even in D3D10. You can also only sample 16 different textures. And it's one-in-one-out on the VS unit; sure, this gets busted with the GS in D3D10, but still, most of the grunt work is done prior to the GS, which makes it difficult to get arbitrary dynamic HOS.
Yes, but again, most of the setup work will be done before or during tessellation, and you can expand the texture lookups through fragment passes.

And while you cannot calculate the effects in larger color spaces, you can eliminate the most glaring problems, like any kind of AA issue - that comes "automatically", through the blending stage.

It's more of a performance thing, but the lack of a programmable blender pretty much eliminates the possibility of using 128-bit FP textures - there's NO blending with the FB.

Jack
But surely you could simulate that through pixel shaders, couldn't you? Though I agree it would be pretty slow.
 
Hmm, JHoxley, most of the things you mentioned that "can't" be done can actually be achieved by ping-ponging between buffers, can they not? Of course you will take a big performance hit, but as long as we're just talking non-realtime possibilities...
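Something like this, I mean (a minimal sketch with a made-up API, not real D3D/GL calls): read the previous buffer as a texture, blend in the shader at full FP32 precision, write to the other buffer, swap:

```cpp
// Minimal sketch of ping-pong blending with an invented API: emulate
// FP32 framebuffer blending by sampling the previous render target
// while writing the next one, then swapping their roles each pass.
#include <cstdio>
#include <utility>
#include <vector>

struct Buffer { std::vector<float> rgba; };  // stand-in for a float render target

void blendPass(const Buffer& prev, Buffer& next, const Buffer& layer, float alpha) {
    // This loop is what the 'pixel shader' does: read prev as a texture,
    // blend the new layer in full FP32 precision, write to next.
    for (std::size_t i = 0; i < prev.rgba.size(); ++i)
        next.rgba[i] = prev.rgba[i] * (1.0f - alpha) + layer.rgba[i] * alpha;
}

int main() {
    const std::size_t n = 4;
    Buffer ping{std::vector<float>(n, 0.0f)};
    Buffer pong{std::vector<float>(n, 0.0f)};
    Buffer layer{std::vector<float>(n, 1.0f)};

    Buffer* src = &ping;
    Buffer* dst = &pong;
    for (int pass = 0; pass < 3; ++pass) {  // composite three layers
        blendPass(*src, *dst, layer, 0.5f);
        std::swap(src, dst);                // the 'ping-pong': swap roles
    }
    std::printf("result: %.3f\n", src->rgba[0]);  // 0.875 after three passes
}
```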

(Though this doesn't have much impact on a REYES scheme, which I find interesting but know too little about ;))
 