Vertex fetch rate (RSX and Xenos)?

nAo said:
jvd said:
How efficient are pixel shaders at vertex shading?
Very efficient.


:?:

Sorry, I don't understand how you do vertex shaders through the pixel shaders... :?: Would you or someone else mind explaining this a little bit? :)
 
Alstrong said:
nAo said:
jvd said:
How efficient are pixel shaders at vertex shading?
Very efficient.


:?:

Sorry, I don't understand how you do vertex shaders through the pixel shaders... :?: Would you or someone else mind explaining this a little bit? :)
In Xenos, you don't have "pixel shaders doing vertex shading".

You have units, a bit more general purpose than current PS and VS, that can handle both.

So I'm not sure it's correct to ask "how efficient is a PS at doing VS'ing". On current GPUs without unified shaders, that question doesn't apply, because pixel shaders only do pixel shading and vertex shaders only do vertex shading. And on Xenos, there are no pixel shaders or vertex shaders; there are "unified shaders" that can do both.

Now, if the question is "how efficient are the unified shaders in Xenos at PS'ing and VS'ing?", then that's all down to speculation until we get a lot more information and concrete results, which will come in the future.
 
Alstrong said:
Sorry, I don't understand how you do vertex shaders through the pixel shaders... :?: Would you or someone else mind explaining this a little bit? :)
Store your vertices in one or more textures and render a primitive with a 1:1 ratio between texture size and primitive size. In the pixel shader, fetch a vertex's attributes from the texture(s), shade it, and output it to one or more render targets, according to the number of output attributes a shaded vertex has to pass to a 'real' pixel shader.
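
To make the steps above concrete, here is a minimal CPU-side sketch in Python. It is only a simulation of the idea, not GPU code: the names `shade_vertex` and `render_vertex_pass` are hypothetical, the "vertex program" is assumed to be a simple scale-and-translate, and nested lists stand in for the texture and the render target.

```python
# Hypothetical CPU simulation of "vertex shading in a pixel shader".
# None of these names come from a real graphics API.

def shade_vertex(position, scale, offset):
    """Stand-in for the vertex program: scale, then translate (an assumption)."""
    return tuple(p * scale + o for p, o in zip(position, offset))


def render_vertex_pass(vertex_texture, scale, offset):
    """Run one 'pixel shader' invocation per texel of the vertex texture.

    The render target has the same dimensions as the input texture, so the
    invocation for pixel (x, y) fetches texel (x, y) and writes result (x, y).
    """
    height = len(vertex_texture)
    width = len(vertex_texture[0])
    render_target = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            vertex = vertex_texture[y][x]  # the "texture fetch"
            render_target[y][x] = shade_vertex(vertex, scale, offset)
    return render_target


# Four vertices packed into a 2x2 "texture", one position per texel.
vertex_texture = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
]

shaded = render_vertex_pass(vertex_texture, scale=2.0, offset=(1.0, 1.0, 1.0))
```

In this sketch each texel holds a single attribute (position). A vertex with more output attributes would need one render target per attribute, which is the "one or more render targets" part of the description.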
 
ah.... l-b, yeah, I was thinking in the context of "how do you get the PS to do the VS on a conventional card." :oops: Thanks


And thank you, nAo :)
 
First, using external memory and render-to-texture as the data transport between PS and VS, with the format translation that requires, is so inefficient compared to the standard packed vertex formats that splitting up your vertex programs and using PS ALUs to help VS performance, as nAo suggests, would likely be slower than just letting the longer shaders run. Second, most VS ops work on 4 components, and the PS ALUs can only perform 3-component ops. I would be interested to know if anyone has tried this in a real game; it doesn't seem like it would be worth spending the time on.
 
But, back to my original question, I'm really looking to see if anyone thinks RSX will also get a "MEMEXPORT" like capability, as that would significantly increase the usefulness of the CELL <-> RSX connection. Otherwise, it still has numerous limitations. As Shifty said, it's probably too early to tell, but I didn't know if anyone had seen or heard of any such capability.
 
Rockster said:
I stand corrected, apparently NV's pixel shader ALU's can issue a vec4 op, ATI's cannot.
ATI's pixel shaders can issue vec4 ops too. In fact, I seem to remember that each ALU could issue one vec4 + one scalar op. I might be mistaken though.
 
I thought MEMEXPORT was just a fancy name for saying that Xenos has unrestricted memory access. Doesn't RSX have access to the full 512 MB of RAM?

What's the diff.?
 
Gholbine said:
I thought MEMEXPORT was just a fancy name for saying that Xenos has unrestricted memory access. Doesn't RSX have access to the full 512 MB of RAM?

What's the diff.?
http://www.beyond3d.com/forum/viewtopic.php?t=24993&postdays=0&postorder=asc&start=40

DeanoC said:
The difference between render target writes and MEMEXPORT is that MEMEXPORT uses no coherency patterns to dictate the data output. For example, a vertex shader on Xenon can do this:

MEMEXPORT TO Address(0), Val0
MEMEXPORT TO Address(10000), Val1
MEMEXPORT TO Address(2344), Val2
MEMEXPORT TO Address(9990), Val3
And still write fragments via the pixel shader to EDRAM

To do the same thing using a conventional rasteriser would involve 5 separate triangles (one for each memory write). That's a vast difference for many GPGPU operations. Any GPU can do a MEMEXPORT-like function, but only by using lots and lots of triangles...

It basically allows full scatter/gather memory functions, the major difference between CPU and GPUs.
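
As a rough illustration of the point in the quote, the following Python sketch contrasts the two write patterns on the CPU. It is a toy model under stated assumptions, not how either GPU works internally: `export_from_one_invocation` and `export_via_rasteriser` are hypothetical names, a flat list stands in for memory, and the addresses and values are the four exports from the example above.

```python
# Toy model of scattered writes. All function names are hypothetical.

MEMORY_SIZE = 10001
EXPORTS = [(0, "Val0"), (10000, "Val1"), (2344, "Val2"), (9990, "Val3")]


def export_from_one_invocation(memory, exports):
    """MEMEXPORT-style: one shader invocation writes to arbitrary addresses."""
    for address, value in exports:
        memory[address] = value
    return 1  # shader invocations needed


def export_via_rasteriser(memory, exports):
    """Conventional style: the write address is fixed by where a primitive
    lands in the render target, so each scattered write needs a primitive."""
    primitives = 0
    for address, value in exports:
        memory[address] = value  # one tiny primitive covering this location
        primitives += 1
    return primitives


scatter_memory = [None] * MEMORY_SIZE
raster_memory = [None] * MEMORY_SIZE

invocations = export_from_one_invocation(scatter_memory, EXPORTS)
primitives = export_via_rasteriser(raster_memory, EXPORTS)
```

Both paths leave memory in the same state; the difference the sketch counts is how much work has to be submitted to get there, which grows with the number of scattered writes in the rasteriser case.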
 