Radeon 9700 and 8x1 pipes...

I need some help here. The R300 has 8 pipes with one TMU each. However, the ATI specs state that it can do 16 textures per pass.

This is always looked at as *per pipe*, right? Or is this being figured another way? There is some criticism on the net today that the 1 TMU per pipe will limit R300's multitexture power. Yet it still seems to meet the DX9 spec.

It seems to me that the 9700 must be able to loop back 7 times if processing as 4x4 pipes, or 15 times if running all 8 fully parallel. Strangely, this looks strikingly similar to the way the Flipper chip works, as it can also process 8 textures per pipe with only one TMU.
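To make the loopback arithmetic concrete, here is a rough sketch. It assumes a pipe samples as many textures per trip as it has TMUs, and that a "loopback" is every trip after the first. The function name is made up for illustration, and reading the 7-loopback case as two TMUs' worth of sampling per trip is only one interpretation of the numbers above, not anything ATI has stated.

```python
import math

def loopbacks_needed(textures: int, tmus_per_pipe: int) -> int:
    """Extra trips through the texture stage beyond the first one.

    Assumption: each trip samples `tmus_per_pipe` textures.
    """
    trips = math.ceil(textures / tmus_per_pipe)
    return trips - 1

print(loopbacks_needed(16, 1))  # 16 textures, 1 TMU per pipe
print(loopbacks_needed(16, 2))  # 16 textures, 2 textures sampled per trip
print(loopbacks_needed(8, 1))   # 8 textures, 1 TMU (the Flipper-style case)
```

Under those assumptions the three cases come out as 15, 7 and 7 loopbacks.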

Does the floating-point nature of the pixel pipeline come into play in this?

Will one of you kind programmer/hardware types please go into this a bit further?
 
Please read threads before posting repeated questions!

I doubt anyone can give definitive answers on how the pipelines are actually working, because there are a large number of possibilities to which we don't yet know the answers. I posted the following in this thread:

http://www.beyond3d.com/forum/viewtopic.php?t=1658&postdays=0&postorder=asc&start=25

However, this possibly raises more questions than it actually answers, because there's more to it than just the number of TCUs you have per pipe. For instance, the 8500's ability to address 6 textures meant that each pipeline also had 6 texture registers, giving a total of 24 texture registers. With the need for 16 registers in DX9, how does R300's pipeline handle this? If it builds the result up over 16 cycles, then each pipe would have 16 registers, giving a total of 128 registers for the chip, which I assume would be huge. Alternatively, does it use pipeline combining? That would cut down on the total number of registers used, but would mean that some or many of the actual pixel pipes are not doing anything for much of the time when multitexturing is in operation.
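The register counts above are just pipes times registers per pipe. A minimal sketch of that arithmetic follows; the 4-pipe figure for the 8500 is implied by the 24-register total, and the function name is hypothetical. It says nothing about how R300 actually allocates registers.

```python
def total_texture_registers(pipes: int, registers_per_pipe: int) -> int:
    """Total texture registers if every pipe carries its own full set."""
    return pipes * registers_per_pipe

# 8500: 6 registers per pipe, 24 in total, which implies 4 pipes.
r200_total = total_texture_registers(pipes=4, registers_per_pipe=6)

# R300, assuming each of the 8 pipes holds all 16 DX9 texture registers.
r300_total = total_texture_registers(pipes=8, registers_per_pipe=16)

print(r200_total, r300_total, r300_total / r200_total)
```

On those assumptions the per-pipe approach needs a bit over five times the register storage of the 8500, which is why the pipeline-combining alternative is worth asking about.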

Read the rest of that thread for other people's insights into the configuration.
 
While I'm not sure I would call it a TEV, I wouldn't be surprised if the two setups share some of the same design philosophies, considering they are both from ArtX.
 
My guess is it does one pixel with 16 textures over two cycles.

There'd be too much render latency using 15 loopback cycles. Or at least, that's how it'd seem...
 
Tagrineth said:
My guess is it does one pixel with 16 textures over two cycles.

There'd be too much render latency using 15 loopback cycles. Or at least, that's how it'd seem...
8 texels per clock is 8 texels per clock. Loopback doesn't necessarily imply bad performance. The key is that you don't have to go to multi-pass, which is a big plus.
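As a back-of-the-envelope illustration of that point: with a fixed texel rate, the pixel rate depends only on how many textures each pixel needs, while the number of passes depends on the per-pass texture limit. The sketch below assumes an idealised chip with no bandwidth or filtering penalties; the function names and the 6-textures-per-pass comparison part are hypothetical.

```python
import math

def pixels_per_clock(texels_per_clock: int, textures_per_pixel: int) -> float:
    """Idealised pixel rate when every texture costs one texel fetch."""
    return texels_per_clock / textures_per_pixel

def passes_required(textures_per_pixel: int, max_textures_per_pass: int) -> int:
    """Geometry passes needed when a pass can only apply so many textures."""
    return math.ceil(textures_per_pixel / max_textures_per_pass)

print(pixels_per_clock(8, 16))  # pixel rate with 16 textures at 8 texels/clock
print(passes_required(16, 16))  # loopback: everything fits in a single pass
print(passes_required(16, 6))   # hypothetical part limited to 6 textures/pass
```

Both approaches are bound by the same texel rate, but every extra pass also means resubmitting geometry and blending into the framebuffer again, which loopback avoids.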
 
OpenGL guy, can you confirm 8 pixel pipelines with 1 TMU per pipe? Just curious, as they weren't very clear at the breakout sessions (man oh man, I asked a lot of questions, but I was a bit wasted at those meetings).
 
There's no necessity that '8 pipes' means 8 completely independent render pipes, or that there is a tight binding between pixels/texels/pipes/units/pixel shader instruction throughput/etc....

When companies make statements such as this it means you get 8 pixels per clock. Or 8 texels per clock. Or maybe less. Or maybe more. It's all quite complicated, and dependent.

For example, you have a trilinear texture. Does that take two texture units, or just one? And then anisotropic - 16 samples? In that case, why isn't performance 0.5 pixel per clock, not 8? And what about antialiasing? In 4X mode you have to generate 4x as many pixels... but performance doesn't drop that much? Where does Hyper Z come in?

Without a detailed architectural description (which isn't going to happen; frankly I'm slightly surprised by the level of detail which has gone out in some of the white papers) there's no answer to these questions. Even then there might not be. A lot of the real performance tuning on these chips is by hand, and there is often no such thing as 'expected performance' - it's just a question of measuring what actual pixel/texel rates you get out....
 
Tagrineth said:
My guess is it does one pixel with 16 textures over two cycles.

There'd be too much render latency using 15 loopback cycles. Or at least, that's how it'd seem...

No, I think it'd be more efficient if each pixel pipeline was autonomous.

After all, think of it this way. If they all worked together, what would they do when you decided to use seven textures? If the pipelines worked together for multitexturing, then you'd end up with only seven in use at once...reducing the overall processing.

And 16 clock cycles isn't much latency when you have over 300 million per second.
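Here is a rough sketch of both points. It assumes that "working together" means all 8 pipes sample one texture each for the same pixel per cycle, and it takes 300 MHz as an illustrative clock, since all that is said above is "over 300 million per second". The function names are made up.

```python
import math

def ganged_utilization(textures: int, pipes: int = 8) -> float:
    """Fraction of texture slots doing useful work when the pipes
    cooperate on one pixel, one texture per pipe per cycle."""
    cycles = math.ceil(textures / pipes)
    return textures / (cycles * pipes)

def loop_latency_ns(cycles: int, clock_hz: float) -> float:
    """Time taken by `cycles` clock cycles, in nanoseconds."""
    return cycles / clock_hz * 1e9

print(ganged_utilization(7))        # seven textures leave one pipe idle
print(ganged_utilization(16))       # sixteen textures fill two full cycles
print(loop_latency_ns(16, 300e6))   # 16 loop cycles at an assumed 300 MHz
```

With seven textures the ganged arrangement only keeps 7 of 8 texture slots busy, whereas autonomous pipes would each just loop seven times. Sixteen cycles at 300 MHz is on the order of 53 ns.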

Update: It may also help texture caches if all eight could be made to work on the same texture at once (instead of each pipeline working on its own texture).
 
I think it's most likely that they are 8 pipes arranged as a 4x2 rectangle.

The only disadvantage this would have over the GF4 arrangement would be even further reduced fillrate on very small tris. This probably isn't an issue at PC resolutions and tri counts.

Of course this is speculation. Perhaps ATI allows the pipes to be dynamically allocated to pixels, but this seems complex and probably not of significant benefit in current and near-future PC products.
 
Once you start to get really small triangles, chances are you'll start hitting transform and triangle setup limits before you'll hit the fillrate limit.
 
Actually you'd be surprised.
On a 2x2 pipeline card at 640x480, I've seen real fillrate measure very close to 1/4 of best case when tris get small, and with simple vertex shaders it's still very much fill limited.
I agree, though, that on a GeForce 4, if you're doing a lot of animation or complex setup (more than 20 or so instructions) and the tris are small, then you end up limited by the shader.
However if you have 4x the vertex shader performance, you need a much more complex shader for it to become the bottleneck.

Again though at 1280x1024 with PC polygon counts, I don't really see it being an issue.
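To illustrate where the 1/4 figure comes from, here is a simplified model. It assumes the pipes always process a whole aligned block of pixels together (2x2 or 4x2), so any block a triangle touches costs the full block even if only one pixel in it is covered. Real hardware is more complicated; the function name and the pixel coordinates are made up for the example.

```python
def block_efficiency(covered_pixels: set, block_w: int = 2, block_h: int = 2) -> float:
    """Useful pixels divided by pixel slots spent, for aligned blocks."""
    # Every distinct block touched by the triangle costs a full block of slots.
    blocks = {(x // block_w, y // block_h) for x, y in covered_pixels}
    return len(covered_pixels) / (len(blocks) * block_w * block_h)

one_pixel_tri = {(5, 3)}
full_quad = {(0, 0), (1, 0), (0, 1), (1, 1)}

print(block_efficiency(one_pixel_tri))         # single pixel, 2x2 pipes
print(block_efficiency(one_pixel_tri, 4, 2))   # single pixel, 4x2 pipes
print(block_efficiency(full_quad))             # aligned 2x2 block fully covered
```

In this model a one-pixel triangle gets 1/4 of peak fillrate on a 2x2 arrangement and 1/8 on a 4x2 one, which matches the "even further reduced fillrate" concern raised earlier in the thread.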
 