Nvidia Patent: Unified Shaders / G80?

So if I understand the TMU part of this one correctly, the most complex configuration could be understood as four separate units: a LOD unit, an address calculation unit, a load/store unit and a filter unit. Could you then use each of them from the shader, e.g. for fast bilinear filtering of any four samples, not just texture samples?
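A rough sketch of that four-way split, assuming conventional mip-map LOD selection and bilinear filtering. The stage boundaries and the names `lod_unit`, `address_unit`, `load_unit` and `filter_unit` are hypothetical, not taken from the patent. What it illustrates: once the filter stage is its own unit, nothing in its inputs ties it to texture memory.

```python
import math

# Hypothetical split of a texture unit into four stages, following the reading
# of the patent above. Names and interfaces are illustrative, not Nvidia's.

def lod_unit(dudx, dvdx, dudy, dvdy):
    """Mip level from texel-space derivatives: log2 of the larger footprint axis,
    clamped at 0 (magnification)."""
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

def address_unit(u, v):
    """Four integer texel coordinates around (u, v) plus the fractional weights,
    with texel centres at +0.5."""
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    coords = [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]
    return coords, x - x0, y - y0

def load_unit(fetch, coords):
    """Fetch four values through any callable: a texture, a buffer, shader data."""
    return [fetch(c) for c in coords]

def filter_unit(s00, s10, s01, s11, fx, fy):
    """Bilinear blend of four arbitrary samples with weights fx, fy in [0, 1]."""
    top = s00 + (s10 - s00) * fx
    bottom = s01 + (s11 - s01) * fx
    return top + (bottom - top) * fy

# The filter stage used on its own, on four values that never came from a texture:
print(filter_unit(0.0, 1.0, 2.0, 3.0, 0.5, 0.5))  # 1.5
```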
 
Demirug said:
As Xenos has only one 16x SIMD pipeline, it needs only one scheduler.

Doesn't Xenos have 3 independent 16x SIMD Pipelines?

The whole design can be scaled more easily.
You can kill an MPU in the case of a defect and still have a working chip.

But that would mean for a given chip there is no guarantee of equal capability across shaders (pipes). How would you classify such a chip?
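A toy model of the scheduling point, assuming one scheduler feeding three identical 16-wide SIMD arrays (the 3 x 16 arrangement asked about for Xenos). `schedule` and `peak_lanes` are hypothetical names, not anything from the patent. It also suggests one answer to the classification question: with one unit fused off, every remaining unit still runs the same programs, the chip just has fewer lanes.

```python
from itertools import cycle

SIMD_WIDTH = 16  # lanes per array (assumed, matching the 16x figure above)

def schedule(batches, enabled_units):
    """Round-robin work batches over the units that passed test.
    A fused-off unit simply never appears in enabled_units, so it gets no work.
    Returns {unit_id: [batch, ...]}."""
    if not enabled_units:
        raise ValueError("no working units left")
    assignment = {unit: [] for unit in enabled_units}
    for batch, unit in zip(batches, cycle(enabled_units)):
        assignment[unit].append(batch)
    return assignment

def peak_lanes(enabled_units):
    """Peak lanes per clock: every unit is identical, so only the count matters."""
    return len(enabled_units) * SIMD_WIDTH

full_chip = [0, 1, 2]  # three arrays, all working
salvaged = [0, 2]      # array 1 killed because of a defect
print(peak_lanes(full_chip), peak_lanes(salvaged))  # 48 32
print(schedule(range(6), salvaged))  # {0: [0, 2, 4], 2: [1, 3, 5]}
```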
 
Mordenkainen said:
Hmm... why are all the 240 connected to one another? Is this normal?

Err... There has been a hell of a lot of speculation, but don't you think the scheme simply shows that the execution units can reuse each other's results/variables? That should be a pretty usual thing to do...

And it's pretty obvious that we have a unified shader architecture here, but what's so exceptional about it? It's a natural evolution, and it would actually be simpler to design a unified architecture than a separate one.
 
How three different people can misinterpret that sentence is surely a case study.
 
trinibwoy said:
Doesn't Xenos have 3 independent 16x SIMD Pipelines?

As Demirug seems to mean it, it's not necessarily wrong: one single 3-way pipeline, where each way consists of a 16x SIMD array.

While there is no dependency between the 3 arrays afaik, it's another way to interpret it.

http://www.beyond3d.com/articles/xenos/index.php?p=07

I can see Wavey has crossed out the term "pipeline" in the page header. It's one shader array, isn't it?
 
A further related patent:

Patent said:
Method and apparatus for multithreaded processing of data in a programmable graphics processor

Abstract

A graphics processor and method for executing a graphics program as a plurality of threads where each sample to be processed by the program is assigned to a thread. Although threads share processing resources within the programmable graphics processor, the execution of each thread can proceed independent of any other threads. For example, instructions in a second thread are scheduled for execution while execution of instructions in a first thread are stalled waiting for source data. Consequently, a first received sample (assigned to the first thread) may be processed after a second received sample (assigned to the second thread). A benefit of independently executing each thread is improved performance because a stalled thread does not prevent the execution of other threads...


Haven't read it, but Fig. 2 is clearly the unified shader from the first patent, and it's by the same people.
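The abstract quoted above boils down to: a thread stalled on source data gives up its issue slot, so a later sample can finish first. A minimal single-issue model of that, assuming fixed latencies and oldest-first priority (both assumptions; the abstract doesn't describe the scheduling policy). `run_threads` and the op encoding are hypothetical.

```python
def run_threads(programs):
    """Toy model of the abstract: one thread per sample, shared issue slot.

    programs maps thread id -> non-empty list of ops, inserted in the order the
    samples were received. ("alu",) keeps a thread busy for 1 cycle; ("load", n)
    stalls it for n cycles while it waits for source data. Each cycle, one
    instruction issues from the oldest thread that is not stalled. Returns the
    thread ids ordered by the cycle at which each thread's final result is ready."""
    pc = {t: 0 for t in programs}
    ready_at = {t: 0 for t in programs}
    finish = {}
    clock = 0
    while len(finish) < len(programs):
        for t in programs:  # dicts keep insertion order: oldest sample first
            if t in finish or ready_at[t] > clock:
                continue  # done, or stalled waiting for source data
            op = programs[t][pc[t]]
            pc[t] += 1
            ready_at[t] = clock + (op[1] if op[0] == "load" else 1)
            if pc[t] == len(programs[t]):
                finish[t] = ready_at[t]
            break  # single issue: one instruction per cycle
        clock += 1
    return sorted(finish, key=finish.get)

# Sample A arrives first but stalls on a fetch; sample B completes first.
print(run_threads({"A": [("load", 10), ("alu",)], "B": [("alu",), ("alu",)]}))  # ['B', 'A']
```

Without the stall, the same model finishes A before B, i.e. in receive order.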

EDIT: updated first post.
 
6. The graphics processor of claim 5, wherein the samples include at least one of vertices, primitives, surfaces, fragments and pixels.

Further embodiments of a method of the invention include using a function call to configure the graphics processor. Support for processing samples of at least one sample type independent of an order in which the samples are received by a multithreaded processing unit within the graphics processor is detected. The function call to configure the multithreaded processing unit within the graphics processor to enable processing of the samples independent of an order in which the samples are received is issued for the at least one sample type.

:D
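One way to read the "function call to configure" part: order-independent processing is switched on per sample type, and sample types left off still get their results back in receive order. A sketch assuming that ordering is restored with a reorder buffer; the `OUT_OF_ORDER_ENABLED` table and `retire` function are hypothetical, and the quoted text doesn't say how ordering is enforced.

```python
# Hypothetical per-sample-type switch, standing in for the "function call" in
# the quoted text; the table and its values are illustrative only.
OUT_OF_ORDER_ENABLED = {"vertex": True, "fragment": False}

def retire(sample_type, completed):
    """completed: (sequence_number, result) pairs in the order the threads finished.
    If out-of-order processing is enabled for this sample type, results pass
    straight through; otherwise a reorder buffer releases them in receive order."""
    if OUT_OF_ORDER_ENABLED.get(sample_type, False):
        return [result for _, result in completed]
    pending, next_seq, released = {}, 0, []
    for seq, result in completed:
        pending[seq] = result
        while next_seq in pending:  # release every result whose predecessors are done
            released.append(pending.pop(next_seq))
            next_seq += 1
    return released

finished = [(1, "b"), (0, "a"), (2, "c")]  # sample 1 finished before sample 0
print(retire("vertex", finished))    # ['b', 'a', 'c']
print(retire("fragment", finished))  # ['a', 'b', 'c']
```

Unknown sample types default to in-order, the conservative choice.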
 
_xxx_ said:
Hmm, is that good or bad? Was there any talk about unified shaders back then at all (besides the possible hints in the patent itself)?

I'm sure these guys start thinking about stuff long, long before we do. They would have to be working on their 2008-2010 architectures right now and who knows what those would look like.
 
trinibwoy said:
I'm sure these guys start thinking about stuff long, long before we do.

Well, that's for sure. My question was rather: was there any talk about going unified from MS's side back then at all? The planning only goes as far as the estimations/goals set.
 
Is nVidia going to pull an R9700 and show up early? Vista being delayed again (who knows, maybe a quarter or more past "Jan 07"), combined with the talk about G80 in H2, makes it very interesting. What have they been doing with NV50 all these years...
Just so I don't start a flame war by mistake: by nVidia "doing an R9700" I mean it in the sense that the R9700 launched well before DX9, and that NV50 *may* likewise launch early, while the API won't be there yet for some time.
 
What does this say about dynamic branching performance? They seem to equate a single fragment with a single sample, and since there is a thread per sample, does this mean that branching is at the per-fragment / per-vertex level? Also, by "group of fragments" below, are they referring to quads or something more akin to the batches in today's architectures?

For example, each fragment or group of fragments within a primitive can be processed independently from the other fragments or from the other groups of fragments within the primitive. Likewise, each vertex within a surface can be processed independently from the other vertices within the surface. For a set of samples being processed using the same program, the sequence of program instructions associated with each thread used to process each sample within the set will be identical. However, it is possible that, during execution, the threads processing some of the samples within a set will diverge following the execution of a conditional branch instruction. After the execution of a conditional branch instruction, the sequence of executed instructions associated with each thread processing samples within the set may differ.
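To make the batch question concrete: if threads are grouped into batches that share one instruction stream (an assumption; the quoted text only says threads may diverge after a conditional branch), a divergent batch pays for both sides of the branch. A minimal sketch with hypothetical `branch_cost` and `batch_costs` helpers:

```python
def branch_cost(lanes, then_cost, else_cost):
    """Cycles one SIMD batch spends on an if/else when all its threads share a
    single instruction stream. lanes: per-thread branch outcome (True = then).
    Each side is executed, with inactive lanes masked off, if any lane takes it."""
    cost = 0
    if any(lanes):
        cost += then_cost
    if not all(lanes):
        cost += else_cost
    return cost

def batch_costs(outcomes, batch_size, then_cost, else_cost):
    """Split per-fragment branch outcomes into batches and cost each batch."""
    return [branch_cost(outcomes[i:i + batch_size], then_cost, else_cost)
            for i in range(0, len(outcomes), batch_size)]

# 8 fragments: the first 4 take the 'then' side (10 cycles), the rest the 'else' side (20).
outcomes = [True] * 4 + [False] * 4
print(batch_costs(outcomes, 4, 10, 20))  # [10, 20]  each batch is coherent
print(batch_costs(outcomes, 8, 10, 20))  # [30]      one divergent batch pays for both
```

If each thread really were scheduled on its own, branch granularity would be a single fragment; with batches, the smaller the batch, the more likely it is to stay coherent.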
 
_xxx_ said:
Well, that's for sure. My question was rather: was there any talk about going unified from MS's side back then at all? The planning only goes as far as the estimations/goals set.

Well, I think there's no question that there were plans for going "unified", but that doesn't necessarily imply that NV was already thinking about it back then, because we all know how they (David Kirk) reacted when asked about going unified in hardware.

However, speaking of ATI in particular, developing (and possibly shipping) R400 wouldn't have made much sense back then if there hadn't been any plans for going unified. Back then, both IHVs switched from their original plans to alternative ones when MS announced it couldn't execute as planned. That's (basically) where we are today, but it shouldn't tell us anything about R600 / NV50, because while MS was in its "winter sleep", neither IHV has been asleep at all.
 
_xxx_ said:
Well, that's for sure. My question was rather: was there any talk about going unified from MS's side back then at all? The planning only goes as far as the estimations/goals set.

DX9 came out in the fall of 2002 and already had SM3 finalized at the time, right (i.e. it was included)? Wouldn't you think they'd have started the "what comes next" conversations before it hit the street? I would...
 