Vertex fetch rate (RSX and Xenos)?

Rockster

A lot of talk has revolved around the high-speed interconnect between RSX and CELL (much faster than in previous architectures) and the claim that it will enable a new rendering paradigm, with CELL perhaps able to do some pre- or post-processing that hasn't been feasible before.

My questions: is the hardware responsible for fetching vertices part of the vertex shading units themselves, or some other decoupled unit in G70? If CELL were used to perform transformation, lighting, or both, is there a way to have the data bypass the VS side of RSX and be fed directly into the PS array, or would the VS still be used as the transport? The likely answer to both would seem to be no. In that case, CELL can't do anything to raise the theoretical maximums, but it can be used to shorten the shaders the GPU runs. As such, it can't help balance VS/PS load as it relates to polygon size, but potentially can as it relates to shader length?

According to the Beyond3D Xenos article, Xenos can fetch 16 vertices per clock. Is it the same for G70?
 
Don't believe QRoach's post. C'mon people, he was being obvious enough in his joke there. ;)
 
It's a good question. I guess my own questions on this would be:

a) does it matter for the visible vertices if you're going to be limited by triangle setup anyway?

and

b) for the invisible ones, does every invisible vertex created/used end up having to go to the pixel shaders?

and

c) which do we need more - more vertices than the gpu can handle itself or more work per vertex? I'm sure either would be handy, of course ;)
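Question (a) boils down to taking the minimum of two rates: whatever the vertex shaders deliver versus what triangle setup can accept. A minimal Python sketch of that reasoning follows; the function names and every rate in it are hypothetical inputs for illustration, not published RSX/G70 or Xenos figures.

```python
def triangles_per_clock(vs_vertices_per_clock: float,
                        vertices_per_triangle: float,
                        setup_triangles_per_clock: float) -> float:
    """Triangles per clock reaching the rasterizer.

    vertices_per_triangle is the number of *shaded* vertices each triangle costs.
    Without vertex reuse it is 3 (or 1 for strips); for a large regular mesh with
    ideal reuse it approaches 0.5.
    """
    vs_limited = vs_vertices_per_clock / vertices_per_triangle
    return min(vs_limited, setup_triangles_per_clock)


def bottleneck(vs_vertices_per_clock: float,
               vertices_per_triangle: float,
               setup_triangles_per_clock: float) -> str:
    """Name the stage that sets the pace (ties are attributed to setup)."""
    vs_limited = vs_vertices_per_clock / vertices_per_triangle
    return "vertex shading" if vs_limited < setup_triangles_per_clock else "triangle setup"
```

With a long vertex shader delivering 0.25 vertices per clock and 1 vertex per triangle, `bottleneck(0.25, 1.0, 1.0)` reports vertex shading; with a short shader delivering 2 vertices per clock, setup caps the result at 1 triangle per clock, so extra visible vertices would not help.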
 
Rockster said:
According to the Beyond3D Xenos article, Xenos can fetch 16 vertices per clock. Is it the same for G70?
Umh.. I believe you misunderstood the Xenos article: it refers to 16 vertex fetch units, that is, 16 point-filtered texture fetches per clock cycle.
Xenos can sample 16 bilinear filtered textures per clock and 16 point filtered textures per clock.
G70 can sample 24 bilinear filtered textures per clock and 8 point filtered textures per clock.
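Per-clock figures only turn into throughput once multiplied by a clock rate. A small Python sketch of that arithmetic using the per-clock numbers above; the 500 MHz clock in the usage note is a hypothetical input applied to both chips for a like-for-like comparison, not a claim about either chip's shipping clock.

```python
# Per-clock texture sample rates as quoted in the post above.
SAMPLES_PER_CLOCK = {
    "Xenos": {"bilinear": 16, "point": 16},
    "G70": {"bilinear": 24, "point": 8},
}


def samples_per_second(gpu: str, filtering: str, clock_mhz: float) -> float:
    """Texture samples per second = samples per clock x clocks per second."""
    return SAMPLES_PER_CLOCK[gpu][filtering] * clock_mhz * 1e6


if __name__ == "__main__":
    clock = 500  # hypothetical clock in MHz, used for both chips
    for gpu in SAMPLES_PER_CLOCK:
        for kind in ("bilinear", "point"):
            rate = samples_per_second(gpu, kind, clock)
            print(f"{gpu:5s} {kind:8s} {rate / 1e9:.1f} G samples/s")
```

These are texture samples, not vertices; at the same clock the point-sampled rate differs by a factor of two between the two designs, while the bilinear rate goes the other way.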
 
nAo said:
Rockster said:
According to the Beyond3D Xenos article, Xenos can fetch 16 vertices per clock. Is it the same for G70?
Umh.. I believe you misunderstood the Xenos article: it refers to 16 vertex fetch units, that is, 16 point-filtered texture fetches per clock cycle.
Xenos can sample 16 bilinear filtered textures per clock and 16 point filtered textures per clock.
G70 can sample 24 bilinear filtered textures per clock and 8 point filtered textures per clock.

So is there a limit to what the pixel shaders can take in terms of vertices? Or more specifically, does that limit coincide with the capability of the vertex shaders, or could they accommodate a greater volume of vertices if the CPU were "helping out"?
 
Titanio said:
So is there a limit to what the pixel shaders can take in terms of vertices? Or more specifically, does that limit coincide with the capability of the vertex shaders, or could they accommodate a greater volume of vertices if the CPU were "helping out"?
Pixel shaders process fragments, so they stall if there aren't any fragments to process.
A lack of fragments could be due to an imbalance between vertex and pixel shader complexity: imagine a huge vertex shader paired with a small pixel shader.
In this case a developer could split the big vertex shader into 2 parts: the first would run on CELL and the second on RSX.
So the answer is 'yes' ;)
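The effect of splitting a huge vertex shader can be sketched as a simple rate model: moving a fraction of the per-vertex work off the GPU raises the vertex rate, which raises the fragment supply reaching the pixel shaders. This is a minimal Python sketch; `ps_utilization` is a hypothetical helper, the 8 vertex / 24 pixel unit counts are G70-like, and all cycle counts are made up. It assumes CELL and the bus keep up, which is exactly what the rest of the thread debates.

```python
def ps_utilization(vs_units: int, vs_cycles_per_vertex: float, offload_fraction: float,
                   vertices_per_triangle: float, fragments_per_triangle: float,
                   ps_units: int, ps_cycles_per_fragment: float) -> float:
    """Fraction of pixel-shader capacity kept busy, given the vertex-side supply.

    offload_fraction is the share of per-vertex work moved off the GPU (e.g. to CELL).
    """
    if not 0.0 <= offload_fraction < 1.0:
        raise ValueError("offload_fraction must be in [0, 1)")
    gpu_cycles_per_vertex = vs_cycles_per_vertex * (1.0 - offload_fraction)
    vertices_per_clock = vs_units / gpu_cycles_per_vertex
    fragments_per_clock = vertices_per_clock / vertices_per_triangle * fragments_per_triangle
    ps_fragments_per_clock = ps_units / ps_cycles_per_fragment
    return min(1.0, fragments_per_clock / ps_fragments_per_clock)


if __name__ == "__main__":
    # Hypothetical workload: 200-cycle vertex shader, 4-cycle pixel shader, small triangles.
    args = dict(vs_units=8, vs_cycles_per_vertex=200, vertices_per_triangle=1,
                fragments_per_triangle=10, ps_units=24, ps_cycles_per_fragment=4)
    for f in (0.0, 0.5, 0.9):
        print(f, round(ps_utilization(offload_fraction=f, **args), 3))
```

Halving the GPU-side vertex work doubles pixel-shader utilization in this model; it never lifts the pixel shaders above their own fixed maximum, which is capped at 1.0.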
 
nAo said:
Pixel shaders process fragments, so they stall if there aren't any fragments to process.
A lack of fragments could be due to an imbalance between vertex and pixel shader complexity: imagine a huge vertex shader paired with a small pixel shader.
In this case a developer could split the big vertex shader into 2 parts: the first would run on CELL and the second on RSX.
So the answer is 'yes' ;)

Ah, I see what you're getting at, thanks.

Although this seems to be more about reducing the load on the VS in order to increase throughput, the end result is the same as what I was thinking ;)
 
Titanio, the pixel shader receives fragments, not vertices, from the setup engine. Each fragment contains the interpolants for its position, along with the shader program and attribute information for that batch. One triangle typically generates many fragments, which are then distributed to the pixel shaders. I can't see how CELL has any effect on this fixed throughput.
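To make "one triangle typically generates many fragments" concrete, here is a minimal edge-function rasterizer in Python. It is a textbook technique chosen for illustration, not a description of how the RSX or Xenos setup engines are built; the function names are hypothetical. Each covered pixel center yields one fragment carrying barycentric weights, which are what per-vertex attributes get interpolated with.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p): tells which side of edge a->b point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0, v1, v2, width, height):
    """Return (x, y, barycentrics) for every pixel whose center lies inside the triangle.

    Pixels exactly on an edge are included; real hardware applies a tie-breaking
    fill rule so that pixels on an edge shared by two triangles are shaded only once.
    """
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return []  # degenerate triangle: no fragments
    fragments = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when all three edge functions share the sign of the area
            # (or are zero, i.e. on the edge); works for either winding order.
            if all(w * area >= 0 for w in (w0, w1, w2)):
                # Normalized weights for interpolating per-vertex attributes
                fragments.append((x, y, (w0 / area, w1 / area, w2 / area)))
    return fragments
```

For example, `rasterize((0, 0), (8, 0), (0, 8), 8, 8)` returns 36 fragments from just 3 vertices, which is why the vertex side usually is not what starves the pixel shaders unless triangles get very small.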

nAo, I don't think so. The filtered and point-sampled units are equally accessible to both vertex and pixel shaders, so there is no logic in the setup you describe. I think the point-sample/vertex-fetcher/address logic is one and the same. At some point vertices must be fetched and directed to the hardware. I believe there are 8 vertex input lanes in RSX/G70 vs. 16 possible vertex input lanes in Xenos.
 
Rockster said:
Titanio, the pixel shader receives fragments, not vertices, from the setup engine. Each fragment contains the interpolants for its position, along with the shader program and attribute information for that batch. One triangle typically generates many fragments, which are then distributed to the pixel shaders. I can't see how CELL has any effect on this fixed throughput.

Cheers. I think nAo's idea is that while you can't change the fixed maximum that can go into the pixel shaders, in situations where you're bottlenecked on the vertex shaders, your pixel shaders may often be waiting on data (and stalling). If you can split your shader between the VS and the SPEs, you could relieve that bottleneck and come closer to that "fixed max" again..(?)
 
Right, that's what I said in my original post. There's no reason Xenos couldn't do the same via an L2 FIFO buffer, bandwidth permitting, and at potentially twice the rate. X360 natively supports packed vertex formats, and dot products without swizzling or wasting register space, so depending on the number of attributes per vertex, it should be able to do much the same thing, don't you think? Especially since frame-buffer bandwidth demands are quite significant, there is a high probability that developers will opt to spread their data structures across both memory pools on PS3 to maximize bandwidth, which will further strain memory access resources.
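To show what a packed vertex format buys, here is a minimal Python sketch that packs a unit normal into one 32-bit word with 10-bit signed-normalized fields. The 10:10:10:2 bit layout and the helper names are assumptions for illustration, not the exact format either console exposes.

```python
import struct

# Illustrative 10:10:10:2 layout (an assumption, not a specific console format):
# bits 0-9 = x, 10-19 = y, 20-29 = z (signed normalized), bits 30-31 = w (2-bit unsigned).


def _to_snorm10(v: float) -> int:
    v = max(-1.0, min(1.0, v))
    return round(v * 511) & 0x3FF  # 10-bit two's complement


def _from_snorm10(bits: int) -> float:
    if bits & 0x200:  # sign bit set
        bits -= 0x400
    return max(-1.0, bits / 511)  # -512 also maps to -1.0


def pack_normal(x: float, y: float, z: float, w: int = 0) -> int:
    return (_to_snorm10(x) | (_to_snorm10(y) << 10)
            | (_to_snorm10(z) << 20) | ((w & 0x3) << 30))


def unpack_normal(packed: int):
    return (_from_snorm10(packed & 0x3FF),
            _from_snorm10((packed >> 10) & 0x3FF),
            _from_snorm10((packed >> 20) & 0x3FF),
            (packed >> 30) & 0x3)


FLOAT3_BYTES = struct.calcsize("<3f")  # 12 bytes for three 32-bit floats
PACKED_BYTES = struct.calcsize("<I")   # 4 bytes for the packed word
```

A normal stored this way costs 4 bytes instead of 12, so the same vertex-fetch bandwidth can carry three times as many normals, at the cost of roughly 0.001 error per component.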
 
As long as the rate is enough to keep the pixel shaders busy, that's what matters most, no? At least from the approach nAo was taking, I think..

The same can be done on 360, of course, but if you were doing it purely to keep the pixel shaders busy, there'd be less need, since Xenos can balance that itself. If you wanted to increase the pixel load on Xenos without compromising vertex work, it might make more sense, but my thinking was more about how to keep RSX busy when the odds are stacked against it (when vertex work is the bottleneck). Xenos can keep itself busy.

Although memory access across both sides of FlexIO is one issue (though 360 also has one-way access, from the CPU over its pipe to Xenos), from a computational-resource perspective it seems very doable.
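The point that "Xenos can balance that itself" comes down to a unified shader pool absorbing whichever kind of work dominates, while a fixed vertex/pixel split is paced by its slower side. A minimal Python sketch, assuming hypothetical work amounts (in unit-clocks) and an illustrative 8 + 24 split versus a pool of 32 identical units; real Xenos and RSX units differ in count and capability, so this is not a head-to-head comparison of the two chips.

```python
def fixed_split_clocks(vs_work: float, ps_work: float, vs_units: int, ps_units: int) -> float:
    """Clocks to finish a batch when vertex and pixel work have dedicated units.
    The slower side sets the pace while the other side sits partly idle."""
    return max(vs_work / vs_units, ps_work / ps_units)


def unified_clocks(vs_work: float, ps_work: float, total_units: int) -> float:
    """Idealized unified pool: any unit can run either kind of work (perfect balancing assumed)."""
    return (vs_work + ps_work) / total_units


if __name__ == "__main__":
    # Vertex-heavy batch: the fixed split is VS-bound, the unified pool is not.
    print(fixed_split_clocks(800, 800, 8, 24), unified_clocks(800, 800, 32))
    # Batch whose mix matches the 1:3 split: both finish at the same time.
    print(fixed_split_clocks(240, 720, 8, 24), unified_clocks(240, 720, 32))
```

When the workload matches the fixed ratio, the two designs tie; the unified pool only wins when the mix drifts, which is the vertex-bound case where offloading to CELL would help RSX.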
 
Rockster said:
One triangle typically generates many fragments, that are then distributed to the pixel shaders. I can't see how CELL has any effect on this fixed throughput.
Cell might help to reach a higher throughput.

nAo, I don't think so. The filtered and point-sampled units are equally accessible to both vertex and pixel shaders, so there is no logic in the setup you describe.
Umh.. I'm not sure what you're saying here.
On Xenos, point and bilinear filtering TMUs are accessible to both vertex and pixel shaders.
G70 is a different story though..
I think the point-sample/vertex-fetcher/address logic is one and the same. At some point vertices must be fetched and directed to the hardware. I believe there are 8 vertex input lanes in RSX/G70 vs. 16 possible vertex input lanes in Xenos.
I think you're mixing up 2 completely different things, as I said before..
The Xenos article is referring to the number of textures that can be sampled per clock; this has nothing to do with the number of vertices that can be fetched per clock.
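The distinction between point-filtered and bilinear-filtered fetches can be sketched in a few lines of Python: a point sample reads one texel, while a bilinear sample reads four and blends them. This is a conceptual illustration with hypothetical helper names and clamp-to-edge addressing assumed, not a model of either chip's TMUs.

```python
import math


def _texel(tex, x, y):
    """Read one texel with clamp-to-edge addressing."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def point_sample(tex, u, v):
    """Nearest-texel fetch: a single read, no filtering."""
    h, w = len(tex), len(tex[0])
    return _texel(tex, math.floor(u * w), math.floor(v * h))


def bilinear_sample(tex, u, v):
    """Four reads around (u, v), blended by distance to texel centers."""
    h, w = len(tex), len(tex[0])
    x, y = u * w - 0.5, v * h - 0.5  # shift so integer coords land on texel centers
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = _texel(tex, x0, y0) * (1 - fx) + _texel(tex, x0 + 1, y0) * fx
    bottom = _texel(tex, x0, y0 + 1) * (1 - fx) + _texel(tex, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```

On the 2x2 texture `[[0.0, 1.0], [2.0, 3.0]]`, sampling at `(0.5, 0.5)` gives 3.0 with point filtering but 1.5 with bilinear filtering; at a texel center the two agree.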
 
Titanio: I missed the most obvious answer, sorry.
If RSX is steadily vertex-shading limited..you can also shift some shading from vertices to pixels, without using CELL at all ;)
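A classic example of moving shading from vertices to pixels is diffuse lighting: evaluate it per vertex and interpolate the result (vertex-shader work), or interpolate the normal and evaluate it per fragment (pixel-shader work). A minimal Python sketch of both along one edge, with hypothetical helper names; it illustrates the technique generally, not any specific RSX workload.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(n, l):
    """Diffuse term max(0, N.L)."""
    return max(0.0, sum(a * b for a, b in zip(n, l)))


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def per_vertex(n0, n1, light, t):
    """Lighting evaluated at the vertices (VS work), result interpolated."""
    d0, d1 = lambert(n0, light), lambert(n1, light)
    return d0 + (d1 - d0) * t


def per_pixel(n0, n1, light, t):
    """Normal interpolated and renormalized, lighting evaluated per fragment (PS work)."""
    return lambert(normalize(lerp(n0, n1, t)), light)
```

With normals tilted 45 degrees either side of a light pointing straight down the z axis, the midpoint gets about 0.707 per vertex but 1.0 per pixel: the per-pixel version costs more pixel-shader work per fragment, but frees the vertex shader and recovers detail between vertices.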
 
If RSX is steadily vertex-shading limited..you can also shift some shading from vertices to pixels, without using CELL at all

How efficient are pixel shaders at vertex shading? Also, can RSX do this dynamically, or will the developer have to code it?
 
Rockster said:
My questions: is the hardware responsible for fetching vertices part of the vertex shading units themselves, or some other decoupled unit in G70? If CELL were used to perform transformation, lighting, or both, is there a way to have the data bypass the VS side of RSX and be fed directly into the PS array, or would the VS still be used as the transport?
I would say it's too early to tell. Sony and nVidia are talking up the close relationship between RSX and Cell, and treating RSX as little more than an overclocked G70 with a FlexIO connection may be selling it short. We know there's a degree of interfacing, at least in reading/writing vertex data, so streamlining the vertex feed to the pixel shaders might be an added feature, but if so, it's something you won't necessarily see in G70.
 