Nvidia Patent: Unified Shaders/ G80?

j^aws

Veteran
I thought I'd post a thread here (after being told off!) about a patent I posted in the console tech forum a while ago:

NVIDIA Patent: G70/G80/RSX?

Patent said:
Programmable graphics processor for generalized texturing

Abstract

A programmable graphics processor including an execution pipeline and a texture unit is described. The execution pipeline processes graphics data as specified by a fragment program. The fragment program may include one or more opcodes. The texture unit includes one or more sub-units which execute the opcodes to perform specific operations such as an LOD computation, generation of sample locations used to read texture map data, and address computation based on the sample locations...

...The fixed function computation units for performing texture mapping are configured in a pipeline that is dedicated to performing the texture mapping operations specified by texture map instructions. When texture map instructions are not used to process graphics data, the pipeline is idle. Likewise, when many texture map instructions are executed to perform texture mapping operations, a bottleneck may develop in the pipeline, thereby limiting performance.

Accordingly, it would be desirable to provide improved approaches to performing texture operations to better utilize one or more processing units within a graphics processor....

Programmable graphics processor for generalized texturing

It describes a unified shader unit with a coupled texture unit. The texture unit has sub-units that are multi-threaded. These specialised sub-units could be used in combination for a desired result and therefore increase utilisation. Effectively the unified shader unit is a multi-threaded processor. They also discuss removing ROPs completely so that the unified shader unit could act as a 'programmable' ROP...

Discuss!

EDIT:

Further related patent,

Method and apparatus for multithreaded processing of data in a programmable graphics processor

http://www.beyond3d.com/forum/showthread.php?p=724114#post724114
 
rs2.JPG


Here's a pic of the unit... thoughts?
 
Well, thanks for the thread, but while I was semi-kidding about the title, I really was serious about the concept.

The fact is, there isn't enuf pointing at stuff of mutual interest from the console forum here in 3dTech, and it would be useful to have a generic thread to do that with. Mebbee "Recent Console Threads of Mutual 3dTech Interest"? I dunno, something like that.

Tho if y'all want to individually thread when it happens, I'm cool with that. . .I was just thinking it might happen more if there was an official home for it.

Edit: Mebbee a sticky that only mods can add to? Somebody in console says "Hey, V, go flag those arrogant pc elitists on this one, willya?" That way it keeps the traffic/posts in console too, which is only fair.
 
Mordenkainen said:
Hmm... why are all the 240 connected between one another? Is this normal?

It's showing each shader unit can consume 'both' vertex and fragment buffers and that they are also coupled to a TMU. However, the TMU has decoupled, specialised sub-units...( not shown in pic)...

And no, it's not normal for current GPUs...
 
Mordenkainen said:
Hmm... why are all the 240 connected between one another? Is this normal?
I don't think 240 is a number of connections but a block identification number: each block is assigned a different number, and the same number is applied only to blocks with the same function. Anyway, is there any detailed description of the execution pipeline block? I believe it would give a clearer picture of the whole diagram.
 
Well, "3D tech&HW" is flexible enough naming for both console and PC tech IMHO...

On topic: could someone explain the bit about the workings of that texturing unit so that I can get it as well? :oops:

EDIT: referring to this:
The texture unit includes one or more sub-units which execute the opcodes to perform specific operations such as an LOD computation, generation of sample locations used to read texture map data, and address computation based on the sample locations...
 
Mordenkainen said:
Hmm... why are all the 240 connected between one another? Is this normal?

Yeah the '240s' aren't actually 'connected' with each other in that drawing, rather the lines you see horizontally are 'skipping' over the vertical lines stemming from the vertex and pixel input buffers.
 
geo said:
...
Tho if y'all want to individually thread when it happens, I'm cool with that. . .I was just thinking it might happen more if there was an official home for it.

I know with certain bulletin board software you can post a thread in "multiple" forums. That might be ideal, but I'm not sure if this could be set up with B3D...
 
I think it's worth posting one of the '240s' as well, just for some context. Whether G80 ends up related to this patent or not, I think the fact that this patent itself describes a unified shader architecture is fairly evident.

rs.JPG


With some select quotes from the patent:

In a typical implementation Programmable Graphics Processing Pipeline 150 performs geometry computations, rasterization, and fragment computations. Therefore Programmable Graphics Processing Pipeline 150 is programmed to operate on surface, primitive, vertex, fragment, pixel, sample or any other data. For simplicity, the remainder of this description will use the term "samples" to refer to graphics data such as surfaces, primitives, vertices, pixels, fragments, or the like.

FIG. 3 is an illustration of an exemplary embodiment of Execution Pipeline 240 containing at least one Multithreaded Processing Unit 300 in accordance with one or more aspects of the present invention. An Execution Pipeline 240 can contain a plurality of Multithreaded Processing Units 300, each Multithreaded Processing Unit 300 containing an Execution Unit 370. Each Execution unit 370 includes at least one PCU 375. PCUs 375 are configured using program instructions read by a Thread Control Unit 320 via a dedicated Read Interface 205. In an alternate embodiment Read Interface 205 is shared between two or more Multithreaded Processing Units 300. Thread Control Unit 320 gathers source data specified by the program instructions and dispatches the source data and program instructions to at least one PCU 375. PCUs 375 perform computations specified by the program instructions and outputs data to at least one destination, e.g., Pixel Output Buffer 270, Vertex Output Buffer 260 or Register File 350.
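For anyone who finds the nesting easier to follow as code, here is a rough Python sketch of the hierarchy the quote describes: an execution pipeline holding several multithreaded processing units, each handing work to a PCU and writing to a vertex output buffer, a pixel output buffer or a register file. The class names follow the patent text, but the method names, the two toy opcodes and the round-robin distribution of work are all assumptions made purely for illustration, not something the patent specifies.

```python
from dataclasses import dataclass, field


class PCU:
    """Stand-in for a programmable computation unit (roughly an ALU)."""

    # Two toy opcodes, chosen only for illustration.
    OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

    def execute(self, op, a, b):
        return self.OPS[op](a, b)


@dataclass
class MultithreadedProcessingUnit:
    pcus: list = field(default_factory=lambda: [PCU()])
    register_file: dict = field(default_factory=dict)

    def dispatch(self, kind, op, a, b, vertex_out, pixel_out):
        # Thread-control role: gather sources and hand them to a PCU.
        result = self.pcus[0].execute(op, a, b)
        # The same unit serves both sample kinds; only the destination differs.
        if kind == "vertex":
            vertex_out.append(result)
        elif kind == "pixel":
            pixel_out.append(result)
        else:
            # Anything else is treated as a register name (assumption).
            self.register_file[kind] = result
        return result


@dataclass
class ExecutionPipeline:
    mpus: list

    def run(self, work, vertex_out, pixel_out):
        # Hypothetical round-robin spread of work items across the MPUs.
        for i, (kind, op, a, b) in enumerate(work):
            mpu = self.mpus[i % len(self.mpus)]
            mpu.dispatch(kind, op, a, b, vertex_out, pixel_out)


vertex_out, pixel_out = [], []
pipeline = ExecutionPipeline([MultithreadedProcessingUnit() for _ in range(2)])
pipeline.run(
    [("vertex", "add", 1.0, 2.0), ("pixel", "mul", 0.5, 4.0), ("tmp0", "add", 3, 4)],
    vertex_out,
    pixel_out,
)
```

The point of the sketch is only that one kind of unit consumes vertex and pixel work alike, which is what makes the design "unified"; it says nothing about how real hardware schedules threads.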
 
_xxx_ said:
...
On topic: could someone explain the bit about the workings of that texturing unit so that I can get it as well? :oops:

EDIT: referring to this:

That texture unit/block contains several different specialised, fixed-function sub-units. Using a fragment program, you can utilise several combinations/permutations of these sub-units in parallel for your desired effect...
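A minimal sketch of that idea, assuming three sub-units named after the operations listed in the abstract (LOD computation, sample location generation, address computation). The opcode names `LOD`, `SAMPLE` and `ADDR`, the `state` dictionary, the row-major layout and the 4 bytes per texel are all made up for the example, and the sub-units run one after another here rather than in parallel:

```python
import math


def lod_unit(state):
    # Mip level from the larger screen-space texel footprint (clamped at level 0).
    dudx, dvdx, dudy, dvdy = state["derivs"]
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    state["lod"] = max(0.0, math.log2(rho)) if rho > 0 else 0.0


def sample_unit(state):
    # Four texel coordinates of a bilinear footprint around (u, v) in [0, 1].
    w, h = state["size"]
    u, v = state["uv"]
    x0, y0 = math.floor(u * w - 0.5), math.floor(v * h - 0.5)
    state["samples"] = [
        (min(max(x0 + dx, 0), w - 1), min(max(y0 + dy, 0), h - 1))
        for dy in (0, 1)
        for dx in (0, 1)
    ]


def address_unit(state):
    # Linear byte addresses, assuming a row-major layout and 4 bytes per texel.
    w, _ = state["size"]
    state["addresses"] = [(y * w + x) * 4 for x, y in state["samples"]]


# Hypothetical opcode table: each opcode selects one sub-unit.
SUB_UNITS = {"LOD": lod_unit, "SAMPLE": sample_unit, "ADDR": address_unit}


def run_program(opcodes, state):
    # A "fragment program" here is just an ordered list of opcodes.
    for opcode in opcodes:
        SUB_UNITS[opcode](state)
    return state


# One program uses all three sub-units; another skips the LOD step entirely.
full = run_program(
    ["LOD", "SAMPLE", "ADDR"],
    {"derivs": (4.0, 0.0, 0.0, 4.0), "size": (8, 8), "uv": (0.5, 0.5)},
)
partial = run_program(["SAMPLE", "ADDR"], {"size": (8, 8), "uv": (0.5, 0.5)})
```

Because the program picks which sub-units to use, a program that only needs address generation leaves the LOD sub-unit free for other work, which is presumably where the utilisation gain comes from.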
 
satein said:
I don't think 240 is a number of connections but a block identification number: each block is assigned a different number, and the same number is applied only to blocks with the same function.

Er... I didn't ask about "240 connections". I was specifically referring to the "240" block that is called "Execution Pipeline".

xbdestroya said:
Yeah the '240s' aren't actually 'connected' with each other in that drawing, rather the lines you see horizontally are 'skipping' over the vertical lines stemming from the vertex and pixel input buffers.

I was actually referring to the lines that come out of the "240" blocks that are joined as a single vertex output buffer "260" whereas the same blocks output individual contributions to the pixel output buffer "270".
 
Mordenkainen said:
I was actually referring to the lines that come out of the "240" blocks that are joined as a single vertex output buffer "260" whereas the same blocks output individual contributions to the pixel output buffer "270".

Gotcha, I see what you're saying now. Maybe a case of 'drawing semantics'? It is an interesting observation though.
 
The USA (unified shader architecture) is just a framework into which to drop the texture unit.

The later diagrams are the ones that should have been included - this summary diagram is essentially irrelevant. It's like those diagrams in patents that show a GPU connected to a computer monitor, a keyboard and a mouse.

Jawed
 
I agree, it's the texture units that seem to be the big thrust of this patent (as was pointed out to me a couple of weeks ago), but since this thread is kind of here in the context of supporting a unified shader theory for G80, it stands to reason that the USA drawings are the ones that have been included thus far.
 
Jaws said:
That texture unit/block contains several different specialised, fixed-function sub-units. Using a fragment program, you can utilise several combinations/permutations of these sub-units in parallel for your desired effect...

Not only that but the design provides the option of replacing each specialized sub-unit with a shader (kinda like software emulation on the GPU itself).
 
Can anybody who's chewed through the patent tell me how texture unit access is arbitrated/scheduled?

Otherwise the architecture seems pretty nice. The execution pipelines pretty much resemble regular multithreaded vector processor cores that share one special-purpose unit. One could draw parallels between this architecture and Sun's Niagara processor. The cores have a lot more threads, and instead of integer cores + FPU they are vector FPU cores + texturing, but the ideology seems strikingly similar: hide memory fetches with heavy multithreading and use lots of simple, efficient cores.
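To put rough numbers on the "hide memory fetches with heavy multithreading" point, here is a toy model, not anything taken from the patent: a core issues at most one ALU instruction per cycle, and every thread stalls for `fetch_latency` cycles after each instruction it issues. The function name, the fixed latency and the pick-first-ready-thread policy are all assumptions.

```python
def utilisation(num_threads, fetch_latency, cycles=1000):
    """Fraction of cycles in which the core issues an ALU instruction."""
    ready_at = [0] * num_threads  # cycle at which each thread can issue again
    busy = 0
    for cycle in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                # Issue one ALU op, then this thread waits on a memory fetch.
                ready_at[t] = cycle + 1 + fetch_latency
                busy += 1
                break  # only one issue slot per cycle
    return busy / cycles


single = utilisation(num_threads=1, fetch_latency=9)
half = utilisation(num_threads=5, fetch_latency=9)
hidden = utilisation(num_threads=10, fetch_latency=9)
```

In this model a single thread keeps the ALU busy one cycle in ten, five threads keep it busy half the time, and ten threads cover the nine-cycle fetch completely. Real fetch latencies are far longer and variable, which is why these designs want so many threads in flight.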
 
I read this patent when it first appeared in console and just re-read it again. One thing that stands out to me is that the threading and arbitration logic is implemented at the pipeline level and there isn't a master scheduler like in Xenos.

The other thing is the extra level of nesting of processing elements.

Pipeline (Shader) -> Multi-threaded processing unit(s) MPU(s) -> Execution Unit -> PCU(s).

A PCU looks to be just what we call an ALU today. What I don't get is why you would need multiple MPU's in a pipeline ? ILP is handled through more PCU's. TLP can be handled by a single MPU. What benefit is there to having multiple MPU's per shader as opposed to more shaders with a single MPU each?
 
Mordenkainen said:
Hmm... why are all the 240 connected between one another? Is this normal?
I think it's just showing that the outputs all go to the same place and there could have been separate lines.
 
trinibwoy said:
I read this patent when it first appeared in console and just re-read it again. One thing that stands out to me is that the threading and arbitration logic is implemented at the pipeline level and there isn't a master scheduler like in Xenos.

As Xenos has only one 16x SIMD pipeline, it needs only one scheduler.

trinibwoy said:
The other thing is the extra level of nesting of processing elements.

Pipeline (Shader) -> Multi-threaded processing unit(s) MPU(s) -> Execution Unit -> PCU(s).

A PCU looks to be just what we call an ALU today. What I don't get is why you would need multiple MPU's in a pipeline ? ILP is handled through more PCU's. TLP can be handled by a single MPU. What benefit is there to having multiple MPU's per shader as opposed to more shaders with a single MPU each?

The whole design can be scaled more easily.
You can kill an MPU in the case of a defect and still have a working chip.

I am not sure if anybody noticed this, but the references list contains two patents that are not published yet.

2003/0164823 Sep., 2003 Baldwin et al.
2004/0012597 Jan., 2004 Zatz et al.


Both names are well known as people who work for nVidia. As the second patent was filed in the same month as this one, I expect that it contains the USA part in detail.
 