"Not conventional pipes"

digitalwanderer

I keep hearing hints and references whenever the old 8x2, 16x1, or whatever configuration talk comes up about either the R420 or the NV40 about how we "shouldn't think of these pipes in conventional terms" and such.

What does that mean?

I know there's not going to be anyone to explain what the new surprises will be, but I thought it'd be an interesting avenue to speculate down, friendly like. :)
 
Perhaps as the graphics pipeline becomes more and more complicated, the theoretical limit on performance imposed by the actual number of pipelines becomes increasingly remote, as the graphics become more dependent on things like shader throughput and the number of samples taken.

However, I think it is more of a marketing reason, since it's going to suck to have to explain why your card with a single-textured fillrate of X can only manage X/4 in any actual game.
 
By my definition 8 Traditional pipes would mean it can output 8 pixels per clock in color or z only rendering. Execute 8 shader ops per clock. 1 per pipe.

If a graphics chip can execute 2 shader ops per clock per pipe it wouldn't be "traditional."
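
As a minimal sketch of that definition (the function names are made up for illustration, and "traditional" here means only what the post above says: exactly one shader op per clock per pipe):

```python
def peak_shader_ops_per_clock(pipes, ops_per_pipe):
    """Peak shader ops per clock under the simple pipes * ops-per-pipe model."""
    return pipes * ops_per_pipe


def is_traditional(ops_per_pipe):
    """'Traditional' in the sense used above: one shader op per clock per pipe."""
    return ops_per_pipe == 1


# 8 traditional pipes: 8 ops per clock, 1 per pipe.
traditional_peak = peak_shader_ops_per_clock(pipes=8, ops_per_pipe=1)

# The same 8 pipes issuing 2 ops each would no longer count as "traditional".
dual_issue_peak = peak_shader_ops_per_clock(pipes=8, ops_per_pipe=2)
```

This only captures peak numbers; it says nothing about how the units are organized on the chip.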
 
3dcgi said:
By my definition 8 Traditional pipes would mean it can output 8 pixels per clock in color or z only rendering. Execute 8 shader ops per clock. 1 per pipe.

If a graphics chip can execute 2 shader ops per clock per pipe it wouldn't be "traditional."

Doesn't the R3XX execute max 40 ops per clock? Keep in mind GPUs have to hide latency through pipelined (as in staged execution) units and the R3XX has parallelism within the pipeline as well.

I think the traditional pipeline has more to do with organization on the chip itself. For example, the NV3X architecture is supposed to have an array of floating point units rather than having floating point units tied to each pipeline.
 
rwolf said:
3dcgi said:
By my definition 8 Traditional pipes would mean it can output 8 pixels per clock in color or z only rendering. Execute 8 shader ops per clock. 1 per pipe.

If a graphics chip can execute 2 shader ops per clock per pipe it wouldn't be "traditional."

Doesn't the R3XX execute max 40 ops per clock? Keep in mind GPUs have to hide latency through pipelined (as in staged execution) units and the R3XX has parallelism within the pipeline as well.
I wasn't necessarily referring to R300. Also, if someone wanted to get picky they would say traditional graphics pipelines didn't have shader units. Classifying something as traditional is really only useful if you want to describe the peak performance in one sentence.
rwolf said:
I think the traditional pipeline has more to do with organization on the chip itself. For example, the NV3X architecture is supposed to have an array of floating point units rather than having floating point units tied to each pipeline.
I agree. You seem to be saying the same thing I did, but in a different way. Hence I indicated the shader ops would be restricted to a specific pipe. NV3x is not traditional.
 
As developers (usually those with knighthood or some such... you get the drift) tell IHVs their wishes for certain things, the IHVs attempt to comply. In doing so, design decisions are made whereby the rather generalized (and normally understood) "AxB" pipeline configuration isn't what I would term as a "priority". However, I think it is fine to follow the normally-understood term of "pipeline config", for now. As time and the industry progress, this (i.e. the accepted "normal understanding of the term") should (probably) diminish in importance.

Excuse the vagueness.
 
digital,

If you're asking about fill rates, for example, you still calculate clock speed * number of TMUs.
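
A rough sketch of that arithmetic, for anyone who wants it spelled out. The 400 MHz clock and the 8x2 layout are made-up illustration values, not figures for any real chip, and the function names are hypothetical:

```python
def texel_fillrate_mtexels(clock_mhz, pipes, tmus_per_pipe):
    """Peak texel fill rate in Mtexels/s: clock speed * total number of TMUs."""
    return clock_mhz * pipes * tmus_per_pipe


def pixel_fillrate_mpixels(clock_mhz, pipes):
    """Peak pixel fill rate in Mpixels/s: clock speed * number of pixel pipes."""
    return clock_mhz * pipes


# Hypothetical 400 MHz part described as "8x2" (8 pipes, 2 TMUs per pipe).
texels = texel_fillrate_mtexels(clock_mhz=400, pipes=8, tmus_per_pipe=2)
pixels = pixel_fillrate_mpixels(clock_mhz=400, pipes=8)
```

These are theoretical peaks only; as noted earlier in the thread, what a card manages in an actual game can be far lower.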

As for anything else the answers so far were pretty clear. Looking for a new definition? Call "pipelines" from now on SIMD units or channels if that sounds better.

I keep hearing hints and references whenever the old 8x2, 16x1....

By the time you have to think of it as really being A*B/C*D/E*F (different variables under different conditions), it's time to move away from the older terms and accept that things have become a tad more complex.

It can very well be 8*2/16*1/4*(2*2) at the same time; confused yet?
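
One way to read that line (an assumption, not something the post spells out) is that all three descriptions multiply out to the same peak number of units, so which label fits depends on what the chip is doing at the time. A small sketch, with the dictionary of "views" invented for illustration:

```python
from math import prod

# Three ways of describing the same hypothetical chip.
# Each tuple is a set of factors whose product is the peak per-clock unit count.
views = {
    "8x2": (8, 2),          # 8 pipes, 2 TMUs each
    "16x1": (16, 1),        # 16 pipes, 1 TMU each
    "4x(2x2)": (4, 2, 2),   # 4 quads, each a 2x2 block of pixels
}

peak_per_clock = {name: prod(factors) for name, factors in views.items()}

for name, peak in peak_per_clock.items():
    print(f"{name}: {peak} per clock")
```

All three come out to 16, which is why a single "AxB" label stops being very informative.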
 
3dcgi said:
If a graphics chip can execute 2 shader ops per clock per pipe it wouldn't be "traditional."
I was just told that the NV2A can do that, shattering my belief that it was basically an NV25. So does that qualify the NV2A as "not traditional?"
 
Think not of what can be achieved at the "pixel pipe" level, but what can be achieved at the quad level.

It was pointed out to me yesterday that, depending on the architecture's capabilities, there are differences between what can be done when you are working on a quad and what you might nominally expect from a single pixel pipe's capabilities taken in isolation.
 
DaveBaumann said:
Think not of what can be achieved at the "pixel pipe" level, but what can be achieved at the quad level.



It was pointed out to me yesterday that, depending on the architecture's capabilities, there are differences between what can be done when you are working on a quad and what you might nominally expect from a single pixel pipe's capabilities taken in isolation.

Can I read that as one pipe that can output 4 zixels?
 
Think more about the other end of the pipeline - specifically the number of texture samples a "pixel pipe's texture sampler" can handle, as opposed to the number of samples that may be required per quad.
 
An array of TMUs and ALUs dynamically allocated to "requesting" pipes?

Edit: Scratch that. One uber TMU per 4 pipes, reading the whole area of the texture covered by the quad, autogenerating a lower mip-map and able to output a trilinear quad in one cycle. Did I win anything?

PS. feels like first grade
 
Evildeus said:
Can someone explain what DB says to me (in simple words :devilish: ) ? Because it seems really interesting and important :!:

I'll give it a go, but I am probably wrong as well:

Maybe a quad shares more logic than one would think (e.g. maybe the 4 pixels in a quad share texture-lookup logic, so that if the texture lookups for all 4 pixels lie in the same 2x2 texel footprint (bilinear filtering) [or maybe the same texel-cache block] everything's fine, but if some texels lie in different regions additional latencies might be introduced).
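
To make that guess concrete, here is a toy sketch. It is pure speculation in code form: the 4x4-texel cache block and the helper names are invented for illustration and do not describe any real hardware.

```python
import math


def bilinear_footprint(u, v):
    """Texels touched by a bilinear lookup at texel-space coordinates (u, v).

    Texel centers are assumed to sit at integer + 0.5.
    """
    x0 = math.floor(u - 0.5)
    y0 = math.floor(v - 0.5)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]


def blocks_touched(quad_uvs, block=4):
    """Set of (hypothetical) block x block texel-cache blocks a quad's lookups hit."""
    blocks = set()
    for u, v in quad_uvs:
        for x, y in bilinear_footprint(u, v):
            blocks.add((x // block, y // block))
    return blocks


# A coherent quad: neighbouring pixels sample neighbouring texels.
coherent = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

# A scattered quad: each pixel samples a distant part of the texture.
scattered = [(1.0, 1.0), (10.0, 1.0), (1.0, 10.0), (10.0, 10.0)]

coherent_blocks = blocks_touched(coherent)
scattered_blocks = blocks_touched(scattered)
```

In this toy model the coherent quad stays inside one block while the scattered quad touches four, which is the kind of case where shared lookup logic might have to do extra work.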

That's my guess. Maybe. :?
 
Can I just be pleasantly pleased that I'm actually understanding a little of what y'all are talking about? :)

I'm almost starting to think that we should use the term "interconnector" more than "pipeline"...
 
DB does this apply to both major IHVs or one specific one?


Meaning, is this how ATi with an 8 "pipe" (420 :eek:) will be able to compete with the 16 "pipe" nV40? Being able to place more into the pipeline at the start and having the whole quad worked on sooner than the quad being assembled at a later stage in the nV40?
 
Pete said:
3dcgi said:
If a graphics chip can execute 2 shader ops per clock per pipe it wouldn't be "traditional."
I was just told that the NV2A can do that, shattering my belief that it was basically an NV25. So does that qualify the NV2A as "not traditional?"

NV20, NV2A and NV25 can all execute 2 shader ops per clock per pipe.
 
Thanks, Hyp-X. So the following statement isn't entirely correct?

Geforce 3 has one vertex shader, then 4 pipelines with a FP32 TMU + FX9 (integer) shader per pipeline.

Geforce 4 has two vertex shaders, then 4 pipelines with a FP32 TMU + FX9 shader per pipeline.

The XBOX NV2a has one vertex shader, then 4 pipelines with a FP32 TMU + FX9 + FX9 (note two shaders) per pipeline.

...

So the XBOX has around double the FX9 shader power per pipeline.
The poster meant 32-bit texture mapping unit, not FP32. I'm still under the impression that NV2A, like NV25, had two vertex shaders, but I wouldn't put it beyond the realm of possibility that NV2A had additional improvements over NV20.
 
Pete said:
The poster meant 32-bit texture mapping unit, not FP32. I'm still under the impression that NV2A, like NV25, had two vertex shaders, but I wouldn't put it beyond the realm of possibility that NV2A had additional improvements over NV20.
Pete, according to DaveBaumann and Demirug, the NV2x has an IEEE-compliant FP32 texture shader (confirm) in addition to the TMU. The texture shader is thought to have been expanded and generalized into the texture address processor/shader core seen in NV3x (confirm).
 
No, both NV20 and NV25 have two register combiners per pipe. So do NV10 and NV15, IIRC, although loopback is missing (same as in the texture shader).
 