ATI's decision concerning TMUs

Mintmaster said:
Die size is mostly a result of ATI's design goal to make dynamic branching fast. ATI had a very compact shader pipeline in R300 and R420, and adding NV-level PS3.0 functionality wouldn't have cost that much more. Instead, ATI completely revamped the way they did pixel shading for dynamic branching. Doing different things on small batches is much less efficient than doing the same thing in one large batch, whether you're shading pixels or running a manufacturing business.

Yes, and I'll only add that from the beginning ATi stated it had no intention of doing "nV-like 3.0," or any similar half measures.

SM3.0 brings techniques that enable genuinely new algorithms with dynamic branching and vertex texturing. NVidia is very slow at these, but hey, it doesn't matter. They made the right decision, because they've got the checkmark, and nobody's using these features - the hallmarks of SM3.0 - in games.

Yes, and so... while generating PR feature checkmarks sans useful implementations might well help an IHV sell more units, it does little to actually inspire developers to support such features in more games. In the longer run, game developers will support useful implementations as opposed to marketing checkmarks, and so I've always concluded that useful implementations drive software development forward, while empty or largely symbolic marketing checkmarks tend to sap that forward momentum.

If we look back at the record over the last several years, every time a company in active competition with nV introduced a new and useful feature--FSAA, for instance--the immediate reaction from nV could be characterized as "What for?" and "Who needs it?" I remember this as far back as 3dfx and the V5 FSAA introduction, with nVidia saying things like, "We believe that what users want in 3d games is resolution increases as opposed to FSAA," etc.

We could talk about a lot of specific things, like nV's initial and useless implementation of fp32 for marketing purposes even while it was actually running fp16, and of course nV looking at ATi's fp24 at the time and not only asking "Who needs it?" and "What for?" but also offering amusing slogans like "fp24 is mathematically unnatural." But I think I've made the point I wanted to make.

With nV, at first, everything seems to be an "either-or" proposition instead of a "both" proposition, if you know what I mean. Ironically, back when 3dfx's SLI (the original SLI) was so popular, nV was again saying "What for?" and "Who needs it?", only to co-opt the very same "SLI" acronym years later for doing much the same thing. ATi is to be commended, I think, for refraining as long as it did from chasing the "SLI" tail, and criticized, I think, for deciding to chase it at all...;)

(There is much indeed I could criticize about 3dfx's marketing at the time, but as 3dfx was absorbed by nVidia, I see no point.)

Ditto ATi's outstanding initial SM2.0 support in R300, and so on, ad infinitum. nV reacted to that by rushing a fairly useless version of SM3.0 hardware to market to try and "one-up" ATi's success with SM2.0, apparently without realizing that what helped ATi with R300 wasn't simply an empty marketing checkmark, but a solid implementation of SM2.0. Big difference.

nV's just too reactionary, imo, without much in the way of a clear vision of where it wants to go in terms of supporting the future of 3d. It's good to focus on selling, selling, and selling your products--nothing wrong with that at all. But I think it's better to focus first on creating the kinds of products people wish to buy so that you can sell, sell, and sell your products into eager markets. It's fundamentally a difference between style and substance, I think. The stylist always reacts while the company creating the substance takes the lead.

Just my lowly opinion, of course...;) But this is why I think nV did SM3.0 when it did and how it did, and is why ATi did SM3.0 when and how it did.
 
nV's just too reactionary, imo, without much in the way of a clear vision of where it wants to go in terms of supporting the future of 3d.
The fact that it can react so quickly is an impressive feat.

It has one vision, to light up every pixel on every desktop in the world. Ask Jen.

And having FP32 first was an impressive feat albeit useless in most situations.
ATI stagnated with the R4xx series... and NVIDIA had an excellent SM2.0 feature set with the GF 6 series and beyond, plus the added SM3.0 checkbox feature.
It worked for them as lots of people switched from ATI to NVIDIA.

And WaltC.. why do you use 1000 words when 10 will do?

Before you think I am an NVIDIA fan, or Mintmaster is, some people are just fans of technology... the politics and PR are just a side issue.
 
Razor1 said:
Do you really think the X1900's DB performance will hold up to shaders that require dynamic branching? These shaders are quite a bit longer and very expensive compared to the shaders we are using now ;). Even ATi's "improved dynamic branching" performance on the X1900s won't be usable by the time these types of shaders are in heavy use.

Why do you think dynamic branching shaders are necessarily "very expensive"? One of the points of dynamic branching is to make shading cheaper by skipping a lot of work.
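
To make that concrete, here's a minimal ps_3_0-style sketch (the sampler and constant names are made up for illustration, not taken from any actual demo): a single dynamic branch lets pixels facing away from the light skip a 16-tap shadow filter and the lighting math entirely, while lit pixels still pay full price.

[code]
// Hypothetical ps_3_0 sketch: skip expensive work for pixels the light can't reach.
sampler2D shadowMap;
float3 lightDir;      // normalized, pointing toward the light (tangent space)
float3 lightColor;
float  texelSize;     // 1 / shadow map resolution

float4 main(float3 normal   : TEXCOORD0,
            float2 shadowUV : TEXCOORD1,
            float  shadowZ  : TEXCOORD2,
            float3 albedo   : TEXCOORD3) : COLOR
{
    float  ndotl = dot(normalize(normal), lightDir);
    float3 color = 0.1 * albedo;                  // cheap ambient term, always paid

    if (ndotl > 0)                                // dynamic branch: unlit pixels stop here
    {
        // 16-tap shadow filter runs only for pixels facing the light.
        // tex2Dlod is used because gradient fetches aren't allowed inside
        // dynamic flow control in ps_3_0.
        float shadow = 0;
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++)
            {
                float2 uv = shadowUV + float2(x, y) * texelSize;
                shadow += tex2Dlod(shadowMap, float4(uv, 0, 0)).r < shadowZ ? 0.0 : 1.0;
            }
        color += albedo * lightColor * ndotl * (shadow / 16.0);
    }
    return float4(color, 1);
}
[/code]

On hardware with fine branching granularity the skipped pixels cost almost nothing; on hardware that branches in large batches they can still drag whole batches through the taken path, which is the efficiency point Mintmaster made above.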
 
Humus said:
Why do you think dynamic branching shaders are necessarily "very expensive"? One of the points of dynamic branching is to make shading cheaper by skipping a lot of work.

BRiT said:
Bollocks.

Let's take for example the Toy Shop demo. This demo uses DB extensively for its ambient occlusion parallax shader. Now compare the performance of that shader vs. relief bump mapping. I would say there is no need to use AOP with DB since it performs slower than relief (relief bump mapping can be modified to give a softer shadow edge; by doing so you lose a little performance but also take care of the possible artifacts that might occur). This is why I say DB should only be used when a shader truly requires it and there are no possible alternatives that will be faster. And it will come down to shaders that are long enough to require DB, not a shader that takes up 40 to 60 instructions.
 
Tahir2 said:
The fact that it can react so quickly is an impressive feat.

It has one vision, to light up every pixel on every desktop in the world. Ask Jen.

And having FP32 first was an impressive feat albeit useless in most situations.
ATI stagnated with the R4xx series... and NVIDIA had an excellent SM2.0 feature set with the GF 6 series and beyond, plus the added SM3.0 checkbox feature.
It worked for them as lots of people switched from ATI to NVIDIA.

And WaltC.. why do you use 1000 words when 10 will do?

Before you think I am an NVIDIA fan, or Mintmaster is, some people are just fans of technology... the politics and PR are just a side issue.
He has a small wee wee ;)
Btw walt posts at several other sites.. arstechnica amongst them.
He does the same thing there..
It's also amusing to see his posts on the subject of hyperthreading as well... he is in over his head over there.
Not to say I know more than him in that regard, but I don't go around talking about matters I know very little about.
/end rant solely on waltc
 
From what I have seen ATi is getting throttled by Nvidia's new offerings in a majority of games at most of the resolutions people play at. There are a few exceptions.

I don't see where triple the shaders has done ATi any good at all. It's not even a full 20% gain even in shader-heavy apps. But people will make one excuse after the next.

"Triple the shaders," clocked through the roof, huge power draw, huge heat generation. I just want to know at what point all the smart people at ATi quit and they hired all the GeForce FX designers.

We need more, shallower pipelines with adequate pixel fillrate. The world just is not shader-driven yet to the degree ATi seems to think it is. 24/48 would be the ideal design for the R580 imo. It seems like common sense to me.

Too bad it's not.
 
Hellbinder said:
From what I have seen ATi is getting throttled by Nvidia's new offerings in a majority of games at most of the resolutions people play at. There are a few exceptions.

I don't see where triple the shaders has done ATi any good at all. It's not even a full 20% gain even in shader-heavy apps. But people will make one excuse after the next. "Triple the shaders," clocked through the roof, huge power draw, huge heat generation.

I just want to know at what point all the smart people at ATi quit and they hired all the GeForce FX designers.

We need more, shallower pipelines with adequate pixel fillrate. The world just is not shader-driven yet to the degree ATi seems to think it is. 24/48 would be the ideal design for the R580 imo. It seems like common sense to me.

Too bad it's not.
http://firingsquad.com/hardware/2560_1600_gaming_preview/images/fear.gif
...
 
You have to look at the flip side. Would ATI have concentrated so hard on delivering a good SM3.0 implementation and so soon if Nvidia hadn't made it an important feature for marketing? Sometimes, someone has to take the plunge in order to prove that there is market demand for something.

SLI for example. ATI totally underestimated it. And I think the anti-SLI whiners have missed what is perhaps an important and serendipitous fallout from NVidia pushing SLI -- the potential future of using GPUs for physics processing (and maybe Aureal-like physics-based audio processing). Without SLI, the only way we'd get physics processors is via a solution like Ageia, which I think will fail, for many reasons.

But with SLI, one has the following options:

1) SLI motherboard, 1 GPU. Don't use the second slot. Later, when physics support becomes mainstream, buy a second GPU and voila.

2) SLI motherboard, 2 GPUs. While you're waiting for physics support to show up in games, enjoy enhanced gfx/AA performance in older games. Then, when physics support shows up, use the second GPU to enhance the game effects.

Because both Nvidia and ATI now support SLI, there is a much better chance of GPU-based physics acceleration becoming a reality, both from a developer perspective and from a consumer marketing perspective.

Now, neither NVidia nor ATI had any idea this was going to happen; it's a serendipitous development, like many of the externalities produced by competition in a capitalist system. But the fact is, SLI is being driven from an esoteric single-vendor niche into a multivendor commonality, and this is a positive development for everyone with respect to future possibilities around the usage of PCI-E motherboards with two x8 or x16 slots.


BTW, didn't WaltC promise to never ever talk about NV again?
 
ATI had the chance to completely thwart NVidia's efforts to make SM3.0 an important feature. All they had to do was include FP blending. I'm positive that this was a heavily requested feature by developers, and it's not a hard thing to do in hardware. I think ATI was banking on the installed R3xx userbase being enough to discourage devs from making a path that only runs on NV4x, but they grossly miscalculated. By leaving FP blending out, HDR became associated with NVidia's SM3.0 cards, and you had to get an NVidia card to get all the goodies in graphics.

From the point of view of the consumer, it doesn't matter whether PS3.0/VS3.0 or FP blending is what makes HDR and currently used effects possible, because all cards with one have the other. For discussion of the "what-ifs" here at B3D, though, the distinction is important.
 
Hellbinder said:
I don't see where triple the shaders has done ATi any good at all. It's not even a full 20% gain even in shader-heavy apps. But people will make one excuse after the next.
Why are you people so obsessed with the number of pipes? That is completely irrelevant. All that matters is the transistor increase. R580 gets more performance per transistor than R520 in most current games. And 20% is completely lowballing it. When R520 came out it was judged to be barely faster than G70. G71 comes out with a 51% clock speed increase over G70, and R580 outperforms it. How the hell do you come to the conclusion that R580 is only 20% faster than R520?

We need more, shallower pipelines with adequate pixel fillrate. The world just is not shader-driven yet to the degree ATi seems to think it is. 24/48 would be the ideal design for the R580 imo. It seems like common sense to me.
16/16 is 320M transistors. 24/24 would be over 420M. 24/48 would bring you over 460M. It is not common sense.

The 3:1 ratio is a very good decision. They are outperforming NVidia with a heavy texturing deficit. Any lower ratio would give you lower performance for the same silicon budget.

The bad decision by ATI is letting everything else get so huge for a performance trait that won't show up in games for a long time. Everybody, please just drop the dead-end texturing argument.
 
Razor1 said:
Let's take for example the Toy Shop demo. This demo uses DB extensively for its ambient occlusion parallax shader. Now compare the performance of that shader vs. relief bump mapping. I would say there is no need to use AOP with DB since it performs slower than relief (relief bump mapping can be modified to give a softer shadow edge; by doing so you lose a little performance but also take care of the possible artifacts that might occur).
Do you have any idea what you're talking about?

Parallax occlusion mapping gets rid of the warping and flattening effect that ordinary parallax mapping has (I assume you're not talking about RTM, as that's very impractical for games). This has nothing to do with shadows. The only other reasonable way to get rid of the distortion and flattening of parallax mapping is with distance functions, and those can be drastically sped up with dynamic branching also, especially when you want to get rid of artifacts from insufficient samples.

These shaders are quite short, as they loop over a small segment of code a variable number of times. Without DB, you have to fix the iterations to your worst case. The distance function technique has two instructions in the loop. The paper did 16 iterations, and even that's not enough for less ideal bump maps. Dynamic branching lets you stop whenever you want. You can put the limit at 128 iterations to avoid artifacts, and you might average only 10.
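
As a rough illustration of that loop structure (hypothetical names, not the actual code from the paper or any SDK sample), a ps_3_0 height-field march with a dynamic early exit looks something like this:

[code]
// Sketch of a height-field ray march with a dynamic early exit.
sampler2D heightMap;   // height stored in .r
sampler2D colorMap;

float4 main(float2 uv : TEXCOORD0, float3 eyeToSurfTS : TEXCOORD1) : COLOR
{
    float3 dir   = normalize(eyeToSurfTS);    // tangent-space ray into the surface (assume dir.z < 0)
    float3 p     = float3(uv, 1.0);           // start above the height field
    float3 delta = dir / 128.0;

    for (int i = 0; i < 128; i++)             // cap high enough to avoid undersampling artifacts...
    {
        // tex2Dlod: explicit-LOD fetch, required inside dynamic flow control in ps_3_0
        if (p.z <= tex2Dlod(heightMap, float4(p.xy, 0, 0)).r)
            break;                            // ...but stop at the first intersection;
        p += delta;                           //    on typical content only a handful of steps run
    }
    return tex2Dlod(colorMap, float4(p.xy, 0, 0));  // shade with the displaced coordinate
}
[/code]

Without dynamic branching, the same shader has to run all 128 iterations for every pixel (or settle for a much lower, artifact-prone cap), which is exactly the trade-off described above.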

The practical benefits are visible right now, and these techniques have playable framerates. In the POM paper, they reduced the poly count of a model from 1.5M to 1100, and this increased the framerate from 30fps to 230fps. The problem is that implementing these techniques in games takes time. ATI should have waited until R600 to do all this DB stuff, because they have XB360 to promote implementation of these effects.

You are just making a BS claim about DB with no experience or evidence to back it up. Its usefulness is completely independent of shader length.
 
Multitexturing fillrates will matter less and less as time goes by. I'd expect coming NV designs to entirely de-couple texture OPs from arithmetic OPs, and there I don't think we'll see MT fillrates scale as they have up to now.

Jawed,

That was an excellent post, even though you hate my guts ;)

DemoCoder,

Don't ask silly questions :D
 
Mintmaster said:
ATI had the chance to completely thwart NVidia's efforts to make SM3.0 an important feature. All they had to do was include FP blending. I'm positive that this was a heavily requested feature by developers, and it's not a hard thing to do in hardware. I think ATI was banking on the installed R3xx userbase being enough to discourage devs from making a path that only runs on NV4x, but they grossly miscalculated. By leaving FP blending out, HDR became associated with NVidia's SM3.0 cards, and you had to get an NVidia card to get all the goodies in graphics.

From the point of view of the consumer, it doesn't matter whether PS3.0/VS3.0 or FP blending is what makes HDR and currently used effects possible, because all cards with one have the other. For discussion of the "what-ifs" here at B3D, though, the distinction is important.


That sounds too much like a bundle of strategic moves from ATI. I'm more naive than that and prefer to think of specific design decisions and priorities. I'm positive that at NV some engineers would have preferred to see a lot of things implemented differently too, especially for G70, but as always the usual "not enough time/not enough transistors" bell started ringing.

As I said I expect to see "SM3.0 done right" with D3D10 GPUs.
 
DemoCoder said:
You have to look at the flip side. Would ATI have concentrated so hard on delivering a good SM3.0 implementation and so soon if Nvidia hadn't made it an important feature for marketing? Sometimes, someone has to take the plunge in order to prove that there is market demand for something.

I recall an original "R400" for the PC being canned, and R4xx supposedly costing a lot of man-hours in development time. How sure are you that those ideas weren't quite a bit older than you seem to suggest, especially considering the timeframe in which Xenos was introduced?

Yes, I know it's a console design and yes, I know it's a USC; but from my understanding there must be quite a lot of similarities to that old R400, for one, and secondly Xenos seems to be a baseline for a multitude of future architectures, including the PDA/mobile market.
 
Mintmaster said:
Do you have any idea what you're talking about?

Parallax occlusion mapping gets rid of the warping and flattening effect that ordinary parallax mapping has (I assume you're not talking about RTM, as that's very impractical for games). This has nothing to do with shadows. The only other reasonable way to get rid of the distortion and flattening of parallax mapping is with distance functions, and those can be drastically sped up with dynamic branching also, especially when you want to get rid of artifacts from insufficient samples.

These shaders are quite short, as they loop over a small segment of code a variable number of times. Without DB, you have to fix the iterations to your worst case. The distance function technique has two instructions in the loop. The paper did 16 iterations, and even that's not enough for less ideal bump maps. Dynamic branching lets you stop whenever you want. You can put the limit at 128 iterations to avoid artifacts, and you might average only 10.

The practical benefits are visible right now, and these techniques have playable framerates. In the POM paper, they reduced the poly count of a model from 1.5M to 1100, and this increased the framerate from 30fps to 230fps. The problem is that implementing these techniques in games takes time. ATI should have waited until R600 to do all this DB stuff, because they have XB360 to promote implementation of these effects.

You are just making a BS claim about DB with no experience or evidence to back it up. Its usefulness is completely independent of shader length.


Of course, one could think up alternative ways to implement some of the algorithms that we have used in this demo. For example, you could use relief mapping instead of parallax occlusion mapping. The relief mapping technique performs well on both ATI and NVIDIA hardware because it doesn't utilize dynamic branching and also makes heavy use of dependent texture reads. However, in my quality comparison tests of these two techniques, the relief mapping technique displayed visual artifacts on our current dataset.

http://www.beyond3d.com/forum/showpost.php?p=593755&postcount=78

This is something we have done. The artifacts are the shadow artifacts of the bump map, where they become pixelated around the edge of the shadow. I can't get into any more detail about what and how we are reducing the artifacts, but it's possible to reduce these artifacts without much loss in speed, and you definitely don't need to take distance into account for it either. You can also use horizon mapping as an alternative as well, though that tends to have a memory impact.

Dave is right about the soft shadows though.
 
Razor1 said:
Let's take for example the Toy Shop demo. This demo uses DB extensively for its ambient occlusion parallax shader. Now compare the performance of that shader vs. relief bump mapping. I would say there is no need to use AOP with DB since it performs slower than relief (relief bump mapping can be modified to give a softer shadow edge; by doing so you lose a little performance but also take care of the possible artifacts that might occur). This is why I say DB should only be used when a shader truly requires it and there are no possible alternatives that will be faster. And it will come down to shaders that are long enough to require DB, not a shader that takes up 40 to 60 instructions.

You mean Parallax Occlusion Mapping? Either way, you're comparing different techniques, so that's not very useful. But I could easily rip out the dynamic branching from, for instance, the ParallaxMapping sample in the ATI SDK. For the distance-function technique, performance dropped from 83fps to 28fps on my X1800XL AIW.

For dynamic branching to be faster you don't really need to have a long shader. The POM or distance-function loops are quite short. A typical shader would be pretty short, while the equivalent non-DB shader could be long if you want to achieve the same quality. Even in just simple lighting shaders you'll get a nice speedup by doing an early out on the attenuation. It's not about the length of the shader, but the ratio between the amount of work you save and what you still need to compute.
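
For example, a bare-bones per-pixel point-light shader with that attenuation early-out might look roughly like this (a ps_3_0-style sketch with made-up names, not actual SDK or demo code):

[code]
// Hypothetical sketch of an attenuation early-out for a point light.
sampler2D normalMap;
float3 lightPosTS;     // light position in tangent space
float3 lightColor;
float  invRadiusSq;    // 1 / (light radius squared)

float4 main(float2 uv     : TEXCOORD0,
            float3 posTS  : TEXCOORD1,
            float3 viewTS : TEXCOORD2) : COLOR
{
    float3 toLight = lightPosTS - posTS;
    float  atten   = saturate(1.0 - dot(toLight, toLight) * invRadiusSq);

    float3 color = 0;
    if (atten > 0)     // dynamic branch: pixels outside the light's radius skip everything below
    {
        // tex2Dlod instead of tex2D: gradient fetches aren't allowed inside
        // dynamic flow control in ps_3_0.
        float3 n = normalize(tex2Dlod(normalMap, float4(uv, 0, 0)).xyz * 2 - 1);
        float3 l = normalize(toLight);
        float3 h = normalize(l + normalize(viewTS));
        float  diff = saturate(dot(n, l));
        float  spec = pow(saturate(dot(n, h)), 32);
        color = (diff + spec) * lightColor * atten;
    }
    return float4(color, 1);
}
[/code]

The branch body is only a couple of dozen instructions, but in a scene where most pixels fall outside most lights' radii, the fraction of work saved by that one branch is large even though the shader itself is short.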
 
On a positive note for ATi, the X1800GTO looks a really good card; that's the sort of price range I was hoping the 7900GT was going to be at, but no, NVIDIA are just counting money/margins. Nice overclock on that thing too.

On a negative note for ATi, they reduced the price of the X1600XT, then they reduced it again... and again, and it still seems overpriced compared to the 7600GT once you take the 7600GT's large performance lead into account.

Club3d 7600GT = £137.01
Club3D X1600XT = £126.00

If 3x the shading power is supposed to future-proof your card, it does not seem to be doing a very good job in current high-end games like FEAR at present, never mind Q1 2007.

Mind you, the X1600XT is in stock (maybe unsurprisingly) whereas the 7600GT is not at the place I got the comparison from.
 
Hellbinder said:
From what I have seen ATi is getting throttled by Nvidia's new offerings in a majority of games at most of the resolutions people play at. There are a few exceptions.

I don't see where triple the shaders has done ATi any good at all. It's not even a full 20% gain even in shader-heavy apps. But people will make one excuse after the next.

"Triple the shaders," clocked through the roof, huge power draw, huge heat generation. I just want to know at what point all the smart people at ATi quit and they hired all the GeForce FX designers.

We need more, shallower pipelines with adequate pixel fillrate. The world just is not shader-driven yet to the degree ATi seems to think it is. 24/48 would be the ideal design for the R580 imo. It seems like common sense to me.

Too bad it's not.

To begin with, it's not "triple the shaders" but triple the ALUs. An app can be "shader heavy" without being ALU-limited, or while being only partly so. So yes, you're not going to see 3x the performance today, but you can expect something like 2x in games in a year or two when shader ALU workload has grown. Of course, if you upgrade every six months, you can see it as a 20-30% gain. If you plan to keep your card a year or two, you can expect bigger benefits in the future.
 