David Kirk of NVIDIA talks about Unified-Shader (Goto's article @ PC Watch)

Dave Baumann said:
I think the easiest route for "partial" unification is to unify the vertex shaders and geometry shaders, while still keeping the pixel shaders separate - all the geometry operation types will be unified, but still a separation between geometry and pixel ops.

Would that be what was meant by "hybrid" (Xbit used this term, I think)? Two of the three are unified and the third is still separate? I never really understood what hybrid was supposed to mean, but the term kind of makes sense when looking at it like this.
 
Yay, more PR from Mr. Kirk! Nice to see some more spin on his comments... it will be interesting to see what G80 and future NV parts look like. Looks like they are pacing themselves, which is good for them. They will do what they always do: hit check boxes and dictate design goals by only performing well in certain areas. Take SM3.0: you would have to be crazy to design a game around heavy SM3.0 features when NV has 88% of the market but is slower there. Ditto 32-bit precision with the NV30 series. They had it... and it was slow!

Richthofen said:
a rather huge edge if you ask me.

Of course NV has to brag about something. They effectively filled all the check boxes while undersupporting SM3.0 (and of course they will be quite ready to sell us new GPUs with the SM4.0 check boxes all filled in and decent SM3.0 performance in their DX10 part!). Such tactics are very business-savvy: you meet the current need, have the sales bullet points, and create a situation where the consumer returns for a new product when they don't quite get the performance they need.

So that smaller die size may be nice, but I have seen a number of benchmarks showing the 7900GTX anywhere from 4x-10x slower than the X1900XTX in heavy dynamic branching. For a consumer who keeps his graphics card for 24-36 months, that is pretty relevant. It's nice that the 7900GTX is smaller, but does it have the features, IQ, and longevity?
 
Acert93 said:
So that smaller die size may be nice, but I have seen a number of benchmarks showing the 7900GTX anywhere from 4x-10x slower than the X1900XTX in heavy dynamic branching. For a consumer who keeps his graphics card for 24-36 months, that is pretty relevant. It's nice that the 7900GTX is smaller, but does it have the features, IQ, and longevity?

No.
 
Acert93 said:
Of course NV has to brag about something. They effectively filled all the check boxes while undersupporting SM3.0 (and of course they will be quite ready to sell us new GPUs with the SM4.0 check boxes all filled in and decent SM3.0 performance in their DX10 part!). Such tactics are very business-savvy: you meet the current need, have the sales bullet points, and create a situation where the consumer returns for a new product when they don't quite get the performance they need.

PR/marketing departments will always find you something to "brag" about, whether it has any importance or not.

I believe we'll see fully fledged SM3.0 with D3D10 GPUs in general, and it would have been a shame if ATI hadn't made any SM3.0 improvements at all, considering their corresponding GPUs arrived that much later.

The sad reality, apart from the back and forth of either side's marketing campaigns, is that today's games cannot show R580's full potential. It's not "SM3.0 done right" but rather "SM3.0 done better, yet late". All IMHLO of course.
 
Acert93 said:
Yay, more PR from Mr. Kirk! Nice to see some more spin on his comments... it will be interesting to see what G80 and future NV parts look like. Looks like they are pacing themselves, which is good for them. They will do what they always do: hit check boxes and dictate design goals by only performing well in certain areas. Take SM3.0: you would have to be crazy to design a game around heavy SM3.0 features when NV has 88% of the market but is slower there. Ditto 32-bit precision with the NV30 series. They had it... and it was slow!

Hmm, that was quite different: the NV30's 16-bit was also quite slow; there wasn't much appreciable performance increase over a GF4, actually.


Acert93 said:
Of course NV has to brag about something. They effectively filled all the check boxes while undersupporting SM3.0 (and of course they will be quite ready to sell us new GPUs with the SM4.0 check boxes all filled in and decent SM3.0 performance in their DX10 part!). Such tactics are very business-savvy: you meet the current need, have the sales bullet points, and create a situation where the consumer returns for a new product when they don't quite get the performance they need.

Just as ATI had to downplay SM3.0 and dynamic flow control when they didn't have it. It's all marketing: all is fair in love and war, business is war, everything is fair.

Acert93 said:
So that smaller die size may be nice, but I have seen a number of benchmarks showing the 7900GTX anywhere from 4x-10x slower than the X1900XTX in heavy dynamic branching. For a consumer who keeps his graphics card for 24-36 months, that is pretty relevant. It's nice that the 7900GTX is smaller, but does it have the features, IQ, and longevity?

Of the high-end consumers, who are around 2% of the market, very few keep their cards for more than a year; some keep them for as little as 6 months.

A major reason the 9700 lasted as long as it did was that game developers were not able to push shaders to higher limits, because the FX series was shot to hell.
 
Acert93 said:
Of course NV has to brag about something. They effectively filled all the check boxes while undersupporting SM3.0

"Undersupporting"? :???:

Having the tech a year ahead of the competition is anything but that, IMHO, even if it was rather slow in the beginning and despite the fact that ATI has a better implementation now. That's like saying "anything that doesn't reach 200 km/h is not a car".
 
Razor1 said:
Just as ATi had to downplay sm 3.0 and dynamic flow control when they didn't have it, its all marketing, all is fair in love and war, business is war, everything is fair.
IMO that was very stupid of ATI. Surely they knew back in 2004 that it would perform much better on R5xx, when released, than on NV4x/G7x. Through their own tests they should have found that DB isn't very good on NVidia's architectures, so encouraging it among developers would have been a very good idea. They know it takes a lot of time to get things into games.

Something along the lines of "Use DB even if you don't get any performance gain, because it will help future GPUs" would have been a very useful message. R5xx was clearly designed with good DB in mind right from the beginning, and its design must have started several years back.

EDIT: For clarity.
 
Mintmaster said:
IMO that was very stupid of ATI. Surely they knew it would perform much better on R5xx than on NV4x/G7x. Through their own tests they should have found that DB isn't very good on NVidia's architectures, so encouraging it among developers would have been a very good idea.

Hard to persuade the devs to do so, since the huge majority of the DX9 market was (is?) green.
 
But at the time NVidia was also pushing DB, even if not wholeheartedly. ATI should have joined in. I'm talking about well before X1K was introduced, say a few months after R420 and NV40 were introduced.
 
Mintmaster said:
But at the time NVidia was also pushing DB, even if not wholeheartedly. ATI should have joined in. I'm talking about well before X1K was introduced, say a few months after R420 and NV40 were introduced.

Particularly given lead-times in the ISV world, that's a really good point. . .
 
Mintmaster said:
But at the time NVidia was also pushing DB, even if not wholeheartedly. ATI should have joined in. I'm talking about well before X1K was introduced, say a few months after R420 and NV40 were introduced.

Very good point; I had not looked at it that way. I still don't like implementations whose main purpose is to fill check boxes.
 
Devs aren't stupid - they put DB in a game and it runs slower. They take it back out and make a mental note to revisit the topic in 2 years' time.

http://www.ati.com/developer/gdc/D3DTutorial08_FarCryAndDX9.pdf

SC:CT is the only game using DB, and that has nothing to do with performance; it's just a way of collapsing shader complexity.
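For anyone wondering what "collapsing shader complexity" with DB looks like in practice, here's a rough GLSL sketch of my own (not SC:CT's actual shaders, and the uniform names are hypothetical): one shader branches on uniform feature flags at runtime instead of compiling a separate permutation for every feature combination. Since the conditions are uniforms, every pixel takes the same path, which is exactly why this use of DB has nothing to do with branching performance.

```glsl
// Hypothetical GLSL 1.10-era "uber-shader" sketch -- not SC:CT's code, just an
// illustration of folding several shader permutations into one by branching on
// uniforms. Because the conditions are uniforms, all pixels take the same path.
uniform sampler2D baseMap;
uniform sampler2D normalMap;
uniform bool useNormalMap;  // each flag would otherwise be a separate compiled shader
uniform bool useSpecular;
uniform vec3 lightDir;      // normalized, in the same space as the normal

varying vec2 uv;
varying vec3 viewDir;

void main()
{
    vec3 n = vec3(0.0, 0.0, 1.0);
    if (useNormalMap)
        n = normalize(texture2D(normalMap, uv).xyz * 2.0 - 1.0);

    float diff = max(dot(n, lightDir), 0.0);
    vec3 color = texture2D(baseMap, uv).rgb * diff;

    if (useSpecular) {
        vec3 h = normalize(lightDir + normalize(viewDir));
        color += vec3(pow(max(dot(n, h), 0.0), 32.0));
    }

    gl_FragColor = vec4(color, 1.0);
}
```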

I don't think it's any accident that Crytek switched from NVidia's bed to ATI's.

If R520 had 9 extra months under its belt at dev studios, perhaps there'd be more sign of DB nowadays. That was what ATI should have done (and prolly intended) and they were right to take the piss out of NVidia's DB...

Jawed
 
Maybe they're not as confident about their next gen DB performance.
BTW it seems to me that R580 is not as good as R520 at DB...
 
nAo said:
Maybe they're not as confident about their next gen DB performance.
BTW it seems to me that R580 is not as good as R520 at DB...

Well, that's been the theory; because of larger batches, right? I think even one of the ATI guys said that would probably be true, in a relative sense, somewhere around here.

But do you have something real world to point at to show just how much impact this has in a real world scenario?
 
geo said:
But do you have something real world to point at to show just how much impact this has in a real world scenario?
IIRC R520's batch size is 16 pixels and R580's batch size is 48 pixels... that's all I know.
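To make the batch point concrete, here's a hedged GLSL sketch of my own (not from any shipping game; the uniforms and constants are made up) of the kind of data-dependent branch DB is supposed to help with. The hardware evaluates the branch per SIMD batch of pixels, so with the 16- vs. 48-pixel figures above, a bigger batch is less likely to be uniform and the early-out saves work less often.

```glsl
// Sketch of a data-dependent branch (my own illustration). The hardware handles
// the branch per SIMD batch of pixels -- per the figures above, 16 pixels on
// R520 vs. 48 on R580. A batch skips the expensive path only if *every* pixel
// in it takes the cheap path, so the larger the batch, the less often this
// early-out actually saves anything.
uniform sampler2D baseMap;
uniform sampler2DShadow shadowMap;
uniform vec3 lightDir;           // normalized light direction

varying vec2 uv;
varying vec3 normal;
varying vec4 shadowCoord;

void main()
{
    vec3 albedo = texture2D(baseMap, uv).rgb;
    float ndotl = dot(normalize(normal), lightDir);

    if (ndotl <= 0.0) {
        // Cheap path: back-facing pixels get ambient only, no shadow taps.
        gl_FragColor = vec4(albedo * 0.1, 1.0);
        return;
    }

    // Expensive path: several shadow-map taps (crude percentage-closer filter).
    float shadow = 0.0;
    for (int i = 0; i < 4; i++) {
        vec4 offset = vec4(float(i - 2) * 0.001, 0.0, 0.0, 0.0);
        shadow += shadow2DProj(shadowMap, shadowCoord + offset).r;
    }
    shadow *= 0.25;

    gl_FragColor = vec4(albedo * ndotl * shadow, 1.0);
}
```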
 
Jawed said:
I don't think it's any accident that Crytek switched from NVidia's bed to ATI's.

If R520 had 9 extra months under its belt at dev studios, perhaps there'd be more sign of DB nowadays. That was what ATI should have done (and prolly intended) and they were right to take the piss out of NVidia's DB...

Jawed

It definitely wasn't an accident; there was a major reason for it ;).

True on the R520: if it had come out in time, there would have been some games with shaders that made more use of DB. But there are fallback ways to increase performance without using DB; the Humus demo shows soft shadows really aren't accelerated that much by DB. We wouldn't have seen POM on an R520 either, it's just way too shader-intensive for the R520. We would have had to wait for the R580 for that, and I'm not entirely convinced yet that a game using POM will be fully playable on an R580 other than at lower resolutions. I'm working with it right now, using GLSL, and at 640x480 I'm getting 100 fps on an inverted box ;). I'm not done with it yet, but I don't expect it to go above 100 fps at 1280x1024 on a 12-poly box (X1900XTX).
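For context on why POM chews up so much shader power, here's a generic, textbook-style GLSL sketch of the core ray-march (not the shader being worked on above; the uniform names and step count are hypothetical): every pixel takes a run of dependent height-map fetches before any ordinary lighting happens, and the early-out inside the loop is itself a dynamic branch.

```glsl
// Generic parallax occlusion mapping sketch (my own illustration). The point is
// the cost: each pixel marches the view ray through the height field, doing one
// dependent texture fetch per step, before the normal shading even starts.
uniform sampler2D heightMap;   // height in the red channel, 1.0 = surface top
uniform sampler2D baseMap;
uniform float heightScale;     // e.g. 0.04

varying vec2 uv;
varying vec3 tangentViewDir;   // view vector in tangent space

void main()
{
    const int numSteps = 16;   // 16 steps = up to 17 height-map fetches per pixel
    vec3 v = normalize(tangentViewDir);
    vec2 delta = -v.xy * heightScale / (v.z * float(numSteps));

    vec2  coord        = uv;
    float layerDepth   = 1.0 / float(numSteps);
    float currentDepth = 0.0;
    float surfaceDepth = 1.0 - texture2D(heightMap, coord).r;

    // March along the view ray until it dips below the height field.
    for (int i = 0; i < numSteps; i++) {
        if (currentDepth >= surfaceDepth)   // dynamic branch: stop once we hit
            break;
        coord        += delta;
        currentDepth += layerDepth;
        surfaceDepth  = 1.0 - texture2D(heightMap, coord).r;
    }

    gl_FragColor = vec4(texture2D(baseMap, coord).rgb, 1.0);
}
```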
 
Ailuros said:
PR/marketing departments will always find you something to "brag" about, whether it has any importance or not.

Identifying it as marketing is important IMO to get a proper read on what he is saying.

Ailuros said:
I believe we'll see fully fledged SM3.0 with D3D10 GPUs in general, and it would have been a shame if ATI hadn't made any SM3.0 improvements at all, considering their corresponding GPUs arrived that much later.

The sad reality, apart from the back and forth of either side's marketing campaigns, is that today's games cannot show R580's full potential. It's not "SM3.0 done right" but rather "SM3.0 done better, yet late". All IMHLO of course.

Done right is relative.

As an NV40 owner, my concern would be that G70/G71 make only nominal steps in improving SM3.0 performance. This is interesting (read: typical PR machine) because NV bragged about their SM3.0 capabilities in 2004; now they are taking the "we are smaller" angle--which in fact benefits them more than consumers, seeing as G70/G71 are in the same consumer cost bracket as ATI's larger chips.

NV could have invested more die space in G70/G71 for better dynamic branching and vertex texturing performance. So while ATI may be late, NV can be said to be incomplete. So what is better: Done better (usable) or hitting a check box?

For a consumer like myself, getting features that are usable is more important; check boxes are irrelevant.

Yet I understand the dynamics of this forum and that there are industry people here; for pure marketing reasons, check boxes are frequently more important than usability when it comes to sales, and sales are what matter most to industry people.

It's all perspective. But I think NV has a long enough history, shrewd as it is from a sales, market-penetration, and OEM-contract position, of going for the check-box mark and then hitting the performance targets in the following generation.

As for SM3.0 in general, it is not leaving us any time soon. Both next-gen consoles are SM3.0, and that will strongly influence what we see on the PC side for years, especially in cross-platform titles. We just saw our first SM2.0-only game (Oblivion) well over 3 years after DX9 shipped (Fall 2002). And if history is any indicator, NV's first DX10 part will provide excellent SM3.0 performance but will probably be insufficient for SM4.0-only/heavy DX10 tasks. This is conjecture, but it seems to fit Kirk's comments and past trends. There is no future-proof GPU, but SM3.0 should not be ignored unless you plan to upgrade within the next year, IMO.
 
Check boxes are not necessarily all marketing. Being out there first, or at nearly the same time but better, can have flow-through consequences for the next gen, and even the one after. We've seen both sides complain about that over the years when they got the short end of that stick: "All the devs used our competitor's part for DX[insert gen], and now we're stuck with their bugs as the functional standard instead of the real standard, stuck with their architectural paradigms having been institutionalized at ISVs, and stuck with their performance limitations for [insert techie-dweebie DX functions a, b, and c here]! Boo-hoo!"
 