Hello everyone,
This post is a sort of follow-up to my pipeline article, and to the insistence by many people, including Dave, that the NV3x simply HAD to have pipelines.
Well, I've been doing some additional research. It looks like the idea of moving away from pipelines and using pools of units instead has been considered for, well, quite a while.
I've stumbled upon an article by J. E. Smith, a professor at the Dept. of Elect. and Comp. Engr. at the University of Wisconsin, whose work is mostly related to computer architectures.
His personal website is: http://www.engr.wisc.edu/ece/faculty/smith_james.html
And the article I'm talking about is: http://www.ece.wisc.edu/~jes/papers/hipc.00.pdf ( Instruction Level Distributed Processing )
The paper is of course aimed more at CPU architectures, although it is quite general. Let me quote parts of the article...
( Figure 1 is on page 5 )
"A microarchitecture paradigm which deals effectively with technology and application trends is Instruction Level Distributed Processing ( ILDP ). A processor following the ILDP paradigm consists of distributed functional units, each fairly simple with a very high frequency clock cycle ( for example, Fig. 1 )"
( It is fairly obvious the NV3x does not use such techniques, or at least not to any noticeable degree, which is one of the causes of its high heat and power consumption. Other parts of the paper explain some ways to remedy this. )
"With high intra-processor communication delays, the number of instructions executed per cycle may level off or decrease when compared to today, but overall performance can be increased by running the smaller distributed processing elements at a much higher clock rate. The structure of the system and clock speeds have implications for global clock distribution. There will likely be multiple clock domains, possibly asynchronous from one another."
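To make the trade-off in that quote a bit more concrete, here's a tiny sketch of "fewer instructions per cycle, but a much higher clock". All the numbers are made up by me for illustration; they come neither from the paper nor from any NV3x measurement, and `throughput` is just a hypothetical helper name.

```python
def throughput(ipc, clock_mhz):
    """Millions of instructions per second: instructions per cycle times clock in MHz."""
    return ipc * clock_mhz

# Hypothetical figures, chosen only to illustrate the argument.
wide_slow = throughput(ipc=4.0, clock_mhz=300)    # wide design, modest clock
narrow_fast = throughput(ipc=3.0, clock_mhz=500)  # lower IPC, higher clock

print(wide_slow, narrow_fast)
```

With these assumed values the second design comes out ahead even though its IPC is lower, which is all the quote is claiming. Whether real hardware ends up on the right side of that trade-off is another matter.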
( The 21264 is an Alpha processor )
"Clustered dependence-based architectures are one important class of ILDP processors. The 21264 is a fairly recent example. In these microarchitectures, processing units are organized into clusters and dependent instructions are steered to the same cluster for processing"
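For those wondering what "steering dependent instructions to the same cluster" could look like, here's a toy sketch. To be clear, this is my own simplified heuristic, not the 21264's actual logic and not an algorithm taken from the paper: an instruction goes to the cluster that produced its first known source register, and to the least-loaded cluster if it depends on nothing in flight. The function name `steer` and the register names are hypothetical.

```python
def steer(instructions, num_clusters):
    """Assign each instruction to a cluster.

    instructions: list of (dest_register, [source_registers]) in program order.
    Returns a list of cluster indices, one per instruction.
    """
    producer_cluster = {}      # register -> cluster that last wrote it
    load = [0] * num_clusters  # instructions assigned to each cluster so far
    assignment = []
    for dest, srcs in instructions:
        # Clusters that hold values this instruction depends on.
        candidates = [producer_cluster[s] for s in srcs if s in producer_cluster]
        if candidates:
            # Follow the dependence: go where the first known source lives.
            cluster = candidates[0]
        else:
            # Nothing to follow: pick the least-loaded cluster.
            cluster = load.index(min(load))
        producer_cluster[dest] = cluster
        load[cluster] += 1
        assignment.append(cluster)
    return assignment

program = [
    ("r1", []),            # independent
    ("r2", []),            # independent
    ("r3", ["r1"]),        # depends on r1
    ("r4", ["r2"]),        # depends on r2
    ("r5", ["r3", "r4"]),  # depends on both chains
]
print(steer(program, num_clusters=2))
```

The point is that each dependence chain stays inside one cluster, so values only cross between clusters when two chains meet ( here, at r5 ). That is where the communication delays mentioned earlier would show up.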
Before anyone asks, I *did* read all of the article.
Seems familiar? Remember nVidia insisting that the VS is made of many small units? All the hype about the super high clock speeds, at least compared to traditional GPUs? The communication delays with registers ( although a lot has to be left to the imagination in that case )?
A lot of things COULD be explained if the NV3x was heavily influenced by ILDP. I'm not saying it *is*. I'm just saying it is, in my opinion, a likely explanation.
So, you might say "Yes, but since it's distributed, why aren't the fragment and vertex units shared?"
Well, to share, you need to have similar characteristics. You need the same ISA, mostly. And the NV3x does not have that: no texture lookup in the VS, no branching in the PS, ...
So, how to define the NV3x based on this? Well, it would be a pipeline, with stages of the pipeline using distribution and sharing internally, but unable to share with other parts of the pipeline. Just an idea, really. Let me stress again that it's all speculation.
I'm going to close with a quote from CMKRNL...
"This part will also contain a completely revamped unified shading model. This means that both vertex and pixel shaders will share the exact same ISA and constructs."
I don't want this to become an NV40 speculation thread though. I want this to stay about the technical characteristics of the NV3x. So should this become too focused on the NV4x, I'd suggest opening a new thread or something...
Feedback, comments, flames related to this?
Uttar
EDIT: Fixed title ( was ILPD, should have been ILDP ) and typos