G80 to have unified shader architecture??!?!

TOrangeMonkey said:
So the G80 would be non-unified, just another branch from the nv40/G70 tree and the G80 isn't WGF2.0... right?

Just because it's non-unified doesn't mean it would have to be a branch in the same vein as NV40--->G70. All we know is that it's 'different'; no one can make any claims as to how.

Well, someone like Dave may in fact have further insights into it though.
 
Oh, I think nVidia's going to produce an architecture for WGF 2.0 that isn't unified first (well, unless WGF 2.0 doesn't come out until a year after Longhorn or so...). But I expect it to be quite different from the current NV4x.
 
A couple of things that still aren't completely clear to me, so please correct me if I'm wrong:

WGF2.0 presupposes a Geometry Shader. Looking at the rumoured similarities between Xenos and R600, it's fairly safe to presume that the latter will also have a "pool" of shader units, though not necessarily one that covers GS functions as well. How high are the chances that the GS is (temporarily) a separate unit in architectures like R600?

One good reason, according to my admittedly simplistic understanding, to temporarily skip the unification of PS/VS would be a topology processor/GS being too "tightly" connected to the VS units in an alternative solution.

Now if the above makes any sense and both IHVs will, in the long run, end up with a pool of shader units that handles all three shader types (GS/VS/PS), then the point of interest would be which of the two hypothetical approaches above is more efficient overall.
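As a back-of-the-envelope illustration only (my own toy numbers, nothing either IHV has published), the efficiency question largely comes down to utilization: with a fixed partition, whichever unit type the current workload doesn't need sits idle, whereas a shared pool can in principle follow the workload mix. Something like:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    const int totalAlus = 48;                // hypothetical unified pool size
    const double fixedVs = 8, fixedPs = 40;  // hypothetical fixed VS/PS split

    for (double vertexShare = 0.05; vertexShare <= 0.51; vertexShare += 0.15) {
        double pixelShare = 1.0 - vertexShare;

        // Fixed split: throughput is capped by whichever side saturates first.
        double fixedThroughput = std::min(fixedVs / vertexShare, fixedPs / pixelShare);

        // Unified pool: ALUs follow the workload mix, so ideally none sit idle.
        double unifiedThroughput = totalAlus;

        std::printf("vertex share %2.0f%%: fixed split %5.1f, unified %5.1f (work/clock)\n",
                    vertexShare * 100.0, fixedThroughput, unifiedThroughput);
    }
    return 0;
}
```

With a pixel-heavy mix the small vertex block is the bottleneck, with a vertex-heavy mix most of the pixel ALUs starve, and the pooled case stays busy either way - ignoring scheduling overhead, which is the usual counter-argument.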

Frankly, none of the above is all that interesting to me; I'd love to know which of the IHVs is responsible for WGF2.0 ending up with no programmable tessellation unit "again", and why real on-chip adaptive tessellation keeps getting postponed year after year, when I get the feeling that developers are in fact showing interest in such functionality.
 
Is the Topology Processor still a part of WGF2.0? Maybe Ilfirin can update his DirectX Next piece with any recent changes, if he's around and has the time. I've been using it as a WGF2.0 reference piece of sorts, but I'm sure it needs a poke or two :D
 
That's what I understand the Geometry Shader to be.

It's my understanding that early DX-Next drafts also included a tessellation unit (which wasn't all that programmable?); it must have been slated as "optional" somewhere down the road and was removed entirely when the API was finalized.

A hypothetical PPP would be able to handle both topology and tessellation functions, but seeing how the real requirements in WGF2.0 turned out, I doubt the IHVs went beyond the topology/GS implementation after all. Could be wrong though (that's why I'm calling for possible corrections).
 
I guess it depends on when the IHVs started designing WGF2.0 hardware. If they've committed resources to designing hardware that can not only output tris early, pre-raster, as the tessellation hardware would, but also perform some of the functions mooted here, that hardware smells to me like it'd be separate from the unified pool of shader units.

I therefore think it's a good bet that a partial non-PPP of sorts should be in R600 as separate-from-unified-PS-and-VS silicon. The timeframe for R600's design would help us out here. Dave?

Given that the tessellation unit has been dropped, what are the chances of them pulling those transistors out of the hardware by now, too, if the timeline for R600 stretches back that far?
 
Rys said:
I guess it depends on when the IHVs started designing WGF2.0 hardware. If they've committed resources to designing hardware that can not only output tris early, pre-raster, as the tessellation hardware would, but also perform some of the functions mooted here, that hardware smells to me like it'd be separate from the unified pool of shader units.

I therefore think it's a good bet that a partial non-PPP of sorts should be in R600 as separate-from-unified-PS-and-VS silicon. The timeframe for R600's design would help us out here. Dave?

That's my guess as well, considering the data supplied so far. What I'm actually wondering about is how NVIDIA might have approached WGF2.0.

Given that the tessellation unit has been dropped, what are the chances of them pulling those transistors out of the hardware by now, too, if the timeline for R600 stretches back that far?

I'm not sure I understand your question completely; my uneducated guess would be that the WGF2.0 requirements were set in stone (in a relative sense) somewhere in 2002. Assuming Xenos is the closest thing to the original "R400", there should have been enough time for R600.

Did ATI ever actually have a full PPP in its roadmap in the past? There have been repeated rumours of a PPP in NVIDIA's past roadmaps, though. No idea if any of it is actually correct.
 
I dunno if it's fair, but I've interpreted Xenos's MEMEXPORT functionality as the basis for geometry shading.

Wasn't there a patent from NVidia on the subject of a tessellation processor quite recently?

It seems to me that both IHVs are attacking this.

But I freely admit to being way behind the curve on tessellation and other geometry operations.

Jawed
 
http://www.beyond3d.com/articles/xenos/index.php?p=10

MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependant on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine.

I see this as a piecemeal process, tessellating portions of geometry at a time, so that MEMEXPORT doesn't run away and consume all RAM, but simply works within the confines of a "buffer", say 20MB. While Xenos is geometry-shading one object, the previous object is being vertex-shaded, and the object before that is being pixel-shaded, and so on.
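Purely as an illustration of the kind of edge calculation that quote describes (on Xenos this would be a shader feeding the tessellation unit; the C++ below is just a CPU-side sketch with made-up projection maths and thresholds):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// Made-up perspective projection: assumes the point is already in view space
// with z > 0; focal length and viewport size are arbitrary.
static void projectToScreen(const Vec3& v, float& sx, float& sy) {
    const float focal = 1.0f, viewportW = 1280.0f, viewportH = 720.0f;
    sx = (v.x * focal / v.z) * 0.5f * viewportW + 0.5f * viewportW;
    sy = (v.y * focal / v.z) * 0.5f * viewportH + 0.5f * viewportH;
}

// Per-edge factor: roughly "one new segment per N screen-space pixels",
// clamped to a hypothetical tessellator limit.
static float edgeTessFactor(const Vec3& a, const Vec3& b) {
    const float pixelsPerSegment = 8.0f;   // arbitrary target edge length
    const float maxFactor = 15.0f;         // arbitrary hardware maximum

    float ax, ay, bx, by;
    projectToScreen(a, ax, ay);
    projectToScreen(b, bx, by);

    float screenLen = std::sqrt((bx - ax) * (bx - ax) + (by - ay) * (by - ay));
    return std::clamp(screenLen / pixelsPerSegment, 1.0f, maxFactor);
}

int main() {
    // A nearby edge gets a high factor, a distant one barely gets subdivided.
    std::printf("near edge factor: %.1f\n",
                edgeTessFactor({-1.0f, 0.0f, 2.0f}, {1.0f, 0.0f, 2.0f}));
    std::printf("far edge factor:  %.1f\n",
                edgeTessFactor({-1.0f, 0.0f, 50.0f}, {1.0f, 0.0f, 50.0f}));
    return 0;
}
```

The near edge maxes out the (made-up) factor limit while the far one gets almost nothing, which is the whole point of a screen-space-driven routine.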

I don't understand what you're saying Ailuros... What is it about a console that makes GPU-executed geometry-shading so improbable?

Jawed
 
TOrangeMonkey said:
DaveBaumann said:
I asked DK about unified architectures when I was given my very initial NV40 briefing, and he argued quite vehemently against them then. He has pretty much been consistently on that path since, up until fairly recently, where he has been making more conciliatory noises about it. Given that DK is working one or two architectures down, those types of thoughts are probably about the types of things he's actually working on. Given the recent noises I think it's almost certain they will go the unified route at some point, but I personally don't expect it for G80, given that the design of this thing is probably in its final stages (i.e. the high-level "architecture" choices were set down a long time ago), but possibly for the architecture after - this type of timing would also fit a lot better with the timing for WGF2.0.
So the G80 would be non-unified, just another branch from the nv40/G70 tree and the G80 isn't WGF2.0... right?

G80: probably non-unified, probably WGF2.0 / Shader Model 4.0, but it's a tossup whether G80 is another branch off NV40 as G70 was, or whether G80 is really the true next-gen NV5x.
 
Well, David Kirk said that NVidia is on a two-year cycle between major architectures, so we should expect G80 to be big.

Fingers crossed...

2006 could be a repeat of 2005 - NVidia releases a new product line at the end of spring (ish) and we spend the rest of the year arguing over what ATI's going to release some time soon, pretty please. ARGH.

Jawed
 
Jawed said:
Well, David Kirk said that NVidia is on a two-year cycle between major architectures, so we should expect G80 to be big.
Yeah, I would expect the G80 to be released in late 2006 to correspond with Longhorn.
 
Jawed said:
http://www.beyond3d.com/articles/xenos/index.php?p=10

MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependant on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine.

I see this as a piecemeal process, tessellating portions of geometry at a time, so that MEMEXPORT doesn't run away and consume all RAM, but simply works within the confines of a "buffer", say 20MB. While Xenos is geometry-shading one object, the previous object is being vertex-shaded, and the object before that is being pixel-shaded, and so on.

I don't understand what you're saying Ailuros... What is it about a console that makes GPU-executed geometry-shading so improbable?

Jawed

What I have in mind is programmable, adaptive, on-chip tessellation. I wasn't referring to geometry shading at all, and Xenos is by no means an entirely WGF2.0-compliant GPU, because it lacks a geometry shader. A well-designed programmable primitive processor would be capable of both topology and tessellation, as described in B3D's DX-Next article. The Geometry Shader in WGF2.0 covers mostly topology functions, and the once-optional tessellation unit has been scratched.
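To make the topology-versus-tessellation distinction concrete, here's a minimal C++ toy of the two kinds of operation (entirely my own sketch, not how any of this hardware works internally): the first behaves like a geometry-shader pass that can keep, drop, or add primitives; the second amplifies one primitive into many under a tessellation level.

```cpp
#include <cstdio>
#include <vector>

struct Vertex { float x, y, z; };
struct Triangle { Vertex v[3]; };

// Topology-style work (what the WGF2.0 Geometry Shader covers): consume one
// primitive, emit zero or more. The toy decision here just drops triangles
// whose winding faces away; a real GS could also emit extra primitives.
std::vector<Triangle> geometryStage(const Triangle& in) {
    std::vector<Triangle> out;
    const Vertex &a = in.v[0], &b = in.v[1], &c = in.v[2];
    float nz = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    if (nz > 0.0f)
        out.push_back(in);
    return out;                       // an empty vector means the primitive was deleted
}

// Tessellation-style work (the unit that was dropped from WGF2.0): amplify one
// primitive into many, driven by a level/factor. Naive midpoint subdivision is
// only a sketch of the idea, not real tessellator maths.
std::vector<Triangle> tessellate(const Triangle& in, int levels) {
    auto mid = [](const Vertex& a, const Vertex& b) {
        return Vertex{(a.x + b.x) / 2, (a.y + b.y) / 2, (a.z + b.z) / 2};
    };
    std::vector<Triangle> tris{in};
    for (int l = 0; l < levels; ++l) {
        std::vector<Triangle> next;
        for (const Triangle& t : tris) {
            Vertex m01 = mid(t.v[0], t.v[1]), m12 = mid(t.v[1], t.v[2]),
                   m20 = mid(t.v[2], t.v[0]);
            next.push_back({{t.v[0], m01, m20}});
            next.push_back({{m01, t.v[1], m12}});
            next.push_back({{m20, m12, t.v[2]}});
            next.push_back({{m01, m12, m20}});
        }
        tris = next;
    }
    return tris;
}

int main() {
    Triangle t{{{0, 0, 0}, {1, 0, 0}, {0, 1, 0}}};
    std::printf("geometry stage emitted %zu primitive(s)\n", geometryStage(t).size());
    std::printf("tessellation at level 3 produced %zu triangles\n", tessellate(t, 3).size());
    return 0;
}
```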

In conjunction with the earlier link I provided, I think you forgot one paragraph from Wavey's article:

Other examples for its use could be to provide image based operations such as compositing, animating particles, or even operations that can alternate between the CPU and graphics processor.

Dynamic LOD would be a further example.
 
Jawed said:
http://www.beyond3d.com/articles/xenos/index.php?p=10

MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependant on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine.

And then I got bored bolding stuff, because the whole quote is about tessellation.

Ailuros said:
What I have in mind is programmable, adaptive, on-chip tessellation.

I really haven't got the foggiest what you're trying to say. Xenos can generate an arbitrary collection of new vertices in addition to or in replacement of the input vertices.

Are you saying that simply because Xenos doesn't have a piece of hardware designed for this task that it can't do it?

I dare say it wouldn't surprise me if ATI and M$ went off into a corner and decided that regardless of WGF2.0's functionality, Xenos was going to do this.

Jawed
 
http://www.beyond3d.com/articles/directxnext/index.php?p=4#top

There are a number of interesting consequences to a unified shading model, some of which may not be immediately apparent. The most obvious addition is, of course, the ability to do texturing inside the vertex shader, and this is especially important for general-purpose displacement mapping, yet it need not be limited to that. A slightly less obvious addition is the ability to write directly to a vertex buffer from the vertex shader, allowing the caching of results for later passes. This is especially important when using higher-order surfaces and displacement mapping, allowing you to tessellate and displace the model once, store the results in a video memory vertex buffer, and simply do a lookup in all later passes.

But perhaps the most significant addition comes when you combine these two together with the virtual video memory mindset – with virtual video memory, writing to and reading from a texture becomes pretty much identical to writing to or reading from any other block of memory (ignoring filtering, that is). With this bit of insight, the General I/O Model of DirectX Next was born – you can now write any data you need to memory to be read back at any other stage of the pipeline, or even at a later pass.

Sounds like MEMEXPORT combined with a unified shader to me!
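For what it's worth, here's roughly the data flow that paragraph describes, as a toy C++ sketch; every name and "shader" below is a made-up stub rather than any real API. The point is only the shade-once / write-to-buffer / reuse-in-later-passes pattern.

```cpp
#include <cstdio>
#include <vector>

// Stand-in for a block of video memory that later pipeline stages can read.
using Buffer = std::vector<float>;

// Stub for "run a shader over a buffer and write the results to another buffer".
Buffer runShaderToBuffer(const Buffer& in, const char* shader) {
    std::printf("running '%s' over %zu floats, writing result to a buffer\n",
                shader, in.size());
    return in;                        // a real pass would transform the data
}

// Stub for a draw call that reads an existing vertex buffer.
void drawFromBuffer(const Buffer& verts, const char* pixelShader) {
    std::printf("drawing %zu floats with pixel shader '%s'\n",
                verts.size(), pixelShader);
}

int main() {
    Buffer baseMesh(3 * 1024);        // some control mesh

    // Pass 1: skin + displace + tessellate once, keep the result resident.
    Buffer skinned = runShaderToBuffer(baseMesh, "skin_and_displace");
    Buffer refined = runShaderToBuffer(skinned, "tessellate");

    // Later passes (z-prepass, shadows, main colour, ...) just look the
    // refined vertex buffer up again -- no re-tessellation per pass.
    drawFromBuffer(refined, "depth_only");
    drawFromBuffer(refined, "main_colour");
    return 0;
}
```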

Jawed
 
That description probably doesn't include MEMEXPORT, but only general reads/writes to video memory. After all, MEMEXPORT is efficient because the Xenos is the memory controller for the system. This won't be the case on a PC.
 
Are you saying that simply because Xenos doesn't have a piece of hardware designed for this task that it can't do it?

In collaboration with the CPU, definitely; and in that department, especially given the CPUs these consoles incorporate, consoles have an advantage over the PC for the moment, IMHO.

Besides that, a tessellation unit is no longer needed for WGF2.0.

I dare say it wouldn't surprise me if ATI and M$ went off into a corner and decided that regardless of WGF2.0's functionality, Xenos was going to do this.

The Xenos GPU is still not WGF2.0 compliant.

Sounds like MEMEXPORT combined with a unified shader to me!

You've just highlighted the advancements in the I/O model of WGF2.0. Can we keep tessellation and topology apart for a second?

Here's what a theoretical scheme of the pipeline looked like before the optional tessellation unit was scratched from WGF2.0, as shown in a JPR presentation in 2004:

[Image: GS1.jpg - WGF2.0 pipeline diagram from the 2004 JPR presentation, including the optional tessellation unit]


What you are referring to with the highlights above in MEMEXPORT is part of the new I/O model, and thus what is described in the diagram above as "reusable stream output" in two locations: one for the tessellation unit and one for the geometry shader.


Xenos can generate an arbitrary collection of new vertices in addition to or in replacement of the input vertices.

Does it delete any?

It's my understanding that the optional tessellation unit in WGF2.0 was removed (even as an optional unit) because it turned out less and less programmable with every successive draft, at which point it made sense to just get rid of it altogether, IMHO.
 