Ati's Technology Marketing Manager @ TechReport

BRiT · May 31, 2004

nelg said:
Has the jury returned its verdict on Nvidia?s SM3.0 performance? Has anyone reached a conclusion that it is viable to use on the Nv40?

No one has reached any verdict on the NV40 SM3 performance for several reasons:
1) Hardly anyone has an NV40-based video card.
2) Nvidia drivers that support features lack performance.
3) Nvidia drivers that have performance lack features.
4) Lack of any genuine SM3 based software.
5) Handful of SM3 demos (PVR) do not run well on NV40 due to driver discontinuity (can't have features and performance).

KimB · May 31, 2004

Mintmaster said:
I'm pretty sure he's talking about an "all else being equal" sort of statement, and with NV40, it isn't. It took 60M transistors for ATI go from 8 pipes to 16 pipes. If ATI went to 220M transistors, it's possible they could have made R420 a 24-pipe beast. Then there's power, though I don't know how much power a non-low-k, 24-pipe R420 would consume. If it's less than NV40, then you could probably increase the voltage and clock in an "all else being equal" situation.

24-pipelines would be a 50% increase. I don't think so. Regardless, if they had added still more pipelines, they wouldn't have been able to clock it as high, so I doubt the performance would have increased by much at all.

Mintmaster · Jun 1, 2004

I think we've learned from R300 that more transistors doesn't necessarily mean lower clock speed, especially if it's just a parallel addition. You need more power, yes, but R420 doesn't consume gobs of power either. In fact, they may be able to increase the clock if they had decided on NV40's design parameters.

Anyway, 50% is enough to make or break a technique. In a slow paced game, 30 fps vs. 20 fps makes a lot of difference. In a fast FPS game, 60 fps is much better than 40. And if you want to compare X800 XT to 6800U (as you did in your previous post), look at the 3-point light phong lighting pixel shader(scroll down). 223 Mpix/s vs. 140. Then if you add 50% to 223, well, it's quite a difference. Yes, this is an extreme example, but it's an example nonetheless. You have to draw the limit somewhere.

Look, I'm not saying ATI's strategy this round is commendable or preferable. I've said several times before that I would buy NV40 if my money was on the line, and expressed my frustration with R420's lack of new features too. I'm just saying Nalasco's argument holds water, and it's not just crap. All else being equal, supporting SM3.0 means a slower chip. Just like the Geforce skipped EMBM, and the Geforce4MX skipped pixel shaders, because if they did on the same transistor budget, they would perform slower.

GraphixViolence · Jun 1, 2004

Chalnoth said:
*sigh* So much crap spewed in that interview, I almost don't know where to begin.

So you decided to begin by spewing some of your own?

When we think we can produce a product with adequate performance that allows you to actually take advantage of some of these new features, then that's when we'll add the features into that product. But if you try to introduce them too early, you're basically adding additional cost to the product that's not providing a benefit to the end user, and that's just something you want to avoid.

Click to expand...

With an attitude like this, we'd have much slower advancement of gaming technology. It's not until low-end hardware is saturated with a certain set of features that those features can be used to their fullest.

So you're saying that, for example, the GeForce FX 5200 is driving the adoption of SM2.0 in software even though it can't run most SM2.0 shaders are usable frame rates? I'd be more inclined to say it's had the opposite effect.

One thing to consider when you're looking at these things is, you know, it might not sound like a huge difference between 24 bits or 32 bits, but what you're basically talking about is 33% more hardware that's required. You need 33% more transistors. Your data paths have to be 33% wider. Your registers have to be 33% wider.

Click to expand...

This is all blatantly false. FP32 math units aren't 33% bigger than FP24 math units, and the parts that you would change wouldn't take up all of the core.

Well, the area required for many functions (like multipliers, registers, and caches) tends to increase linearly with number of bits. And area required for routing certainly increases as the datapath width increases (although I don't know what the typical relationship is). So I don't understand why you find this so hard to believe. It doesn't matter how much of the chip is taken up by the shader units - he's just saying that part of the chip would have been 33% larger, and that space could have been used for additional shader units.

So if you were instead to devote those extra transistors to increasing performance, now you're able to run those 150-instruction shader programs at a much higher speed, so that a lot of techniques that were previously not feasible to run in real time now become feasible.

Click to expand...

There's absolutely nothing that the X800 can do that the GeForce 6800 cannot do. The difference in performance is quite small, and there's simply no algorithm in the world that you could run on the X800 that would be that much faster. So, based on this logic, nVidia definitely took the better route, as they added both performance and features, allowing for more developer freedom.

You've got no evidence to back up this statement. The benchmarks we've seen so far indicate that the X800 outperforms the 6800 in a majority of the applications tested, particularly shader-heavy ones, and often by much greater margins than could be explained simply by the clock speed differences.

[about branching]So performance will potentially be somewhat lower, because now you have to add in the extra pass, but certainly there's that you couldn't doâ€”that you could do only with hardware that supported it.

Click to expand...

That's the understatement of the century. With the wrong kind of loop, performance could be absolutely abysmal without actual dynamic branching support.

And it could also be absolutely abysmal even WITH hardware dynamic branching support. So what's your point?

TR: Won't Shader Model 3 provide an easier-to-use programming model?

Nalasco: Well, not necessarily.

Click to expand...

Yeah, right!!!

Come on! SM3 adds flexibility. This makes programming easier. Without that flexibility, you have to use hacks and workarounds. That's never easier than straight programming. This is just a silly argument, as he's basically arguing that since you can do more advanced things with SM3, it's going to be harder to develop for.

No, the point is that when you want to do the exact same thing on SM2 and SM3, it's either going to be identical or easier with SM3.

No, you ignored the rest of his statement. He was saying that it's much more difficult to make efficient hardware and compilers for SM3 and SM2, so your supposedly simpler SM3 code may end up running inexplicably slow, and it will be difficult to debug, or you may end up using hacks and workarounds. SM2 avoids the difficult areas and delivers more predictable performance. The PS2.0a support Nvidia has in the FX series is a perfect example of how more flexibility doesn't necessarily translate into easier programming.

Nobody in their right mind is going to apply a blur filter to a game.

There's a great quote. Apparently you haven't played many DX9 games lately. Blur filters are one of the most commonly used post-process effects!

KimB · Jun 1, 2004

Mintmaster said:
I think we've learned from R300 that more transistors doesn't necessarily mean lower clock speed, especially if it's just a parallel addition. You need more power, yes, but R420 doesn't consume gobs of power either. In fact, they may be able to increase the clock if they had decided on NV40's design parameters.

You're living in a dream world. Regardless, what's really important is the die size. The NV40 is, apparently, only about 10% larger than the R420. I don't see any way to fit 50% more pipelines in. And, the fact remains that they didn't. They went for a lower transistor budget and higher clock speeds instead.

Anyway, 50% is enough to make or break a technique.

No, it's not. This is why we have adjustable resolution.

And besides, this 50% crap is stupid. Come on. No matter which way you slice it, ATI could not have made the R420 into a 24-pipeline design with the features it has and still have it come in with as many transistors as the NV40. It would have quite a few more.

And if you want to compare X800 XT to 6800U (as you did in your previous post), look at the 3-point light phong lighting pixel shader(scroll down). 223 Mpix/s vs. 140. Then if you add 50% to 223, well, it's quite a difference. Yes, this is an extreme example, but it's an example nonetheless. You have to draw the limit somewhere.

Try comparing the partial precision numbers, at least. That's what is actually useful for games.

KimB · Jun 1, 2004

Oh, and I'd like to make one more comment:

One thing about counting transistors is that every time you see a transistor count on a chip, it's almost certainly a rough estimate. The reason is that there's no simple and straightforward way to just go and count all the transistors on a large ASIC.

This is the stupidest thing I've ever heard. If it were impossible to actually count the number of transistors in a design, the thing could not be fabricated. Come on. Granted, we all know that the number gets rounded up or down, but the idea that they can't be counted is just ludicrous.

Dave Baumann · Jun 1, 2004

You're living in a dream world. Regardless, what's really important is the die size. The NV40 is, apparently, only about 10% larger than the R420.

Thats a 10% surface area differnce. Do they both use the same number of layers?

Joe DeFuria · Jun 1, 2004

Chalnoth said:
You're living in a dream world.

It's you who's living in an nVidia dream world.

I don't see any way to fit 50% more pipelines in.

Why not?

Maybe they could, and maybe they couldn't. They could certinaly fit at least another quad in, you think? That's the point.

And, the fact remains that they didn't. They went for a lower transistor budget and higher clock speeds instead.

What dream world are you living in? They went for a lower transistor budget (smaller die size), yes. This says nothing about going for a "higher clock speed." ATI stuck to a power consumption budget, which, combined with transistor count is what ultimately would dictate their clock rate target.

What it comes down to, is nVidia decided to not give much a crap about power supply, and it shows.

And besides, this 50% crap is stupid. Come on. No matter which way you slice it, ATI could not have made the R420 into a 24-pipeline design with the features it has and still have it come in with as many transistors as the NV40. It would have quite a few more.

I don't see how. ATI doubled their pipelines (R300 to R420) and only increased their transistor budget by a factor of what, 40%?

Try comparing the partial precision numbers, at least. That's what is actually useful for games.

In other words, all that extra transistor needed for FP32 on the NV40 is a waste for games and gamers. You can't have it both ways, Chal.

WaltC · Jun 1, 2004

Chalnoth said:
Oh, and I'd like to make one more comment:

This is the stupidest thing I've ever heard. If it were impossible to actually count the number of transistors in a design, the thing could not be fabricated. Come on. Granted, we all know that the number gets rounded up or down, but the idea that they can't be counted is just ludicrous.

The next dumbest thing is to actually count the number of transistors, make a marketing issue out of it, and pretend that it tells us something meaningful about performance and functionality...

IMO, the "total number of transistors" in a chip *may* tell us something about likely yields (may)--but that's about it...

Geo · Jun 1, 2004

Sources: http://www.beyond3d.com/misc/chipcomp/ and http://www.beyond3d.com/reviews/ati/r420_x800/index.php?p=6

R300: 107m transistors, 8 pipelines, 4 vertex shaders.

R420: 160m transistors, 16 pipelines, 6 vertex shaders, and whatever little odds and sods they stuffed in there, like 3dc and those HDs

Therefore, transistor budget for 8 pipelines, 2 vertex shaders, and odds & sods = 53m transistors.

So come again on what they couldn't have done with an extra 60m?

What I don't get is the implications of the earlier ATI statement that they were going to cut margins this round by accepting lower yields. . and they still took 60m transistors in the shorts compared to NV. What does this mean? I suspect ATI is wondering what it means too, and will be watching closely just how many NV40 16 pipe cards actually become available.

Either that or they are using their profits from Office to subsidize the fact that they can't possibly be making any money on. . .oh wait, wrong company.

KimB · Jun 1, 2004

nVidia actually considers all of the transistors in a chip when they report a transistor count. ATI only counts logic transistors, so the R420 and NV40 are quite a bit closer in actual transistor count than that analysis suggests.

AlphaWolf · Jun 1, 2004

Chalnoth said:
nVidia actually considers all of the transistors in a chip when they report a transistor count. ATI only counts logic transistors, so the R420 and NV40 are quite a bit closer in actual transistor count than that analysis suggests.

So you've counted them have you? You have evidence to back this up?

jimmyjames123 · Jun 1, 2004

I thought that ATI claimed that they count transistors in a different way than NV, counting only the transistors associated with logic.

With respect to "proof", it works both ways. We don't have hard "proof"regarding how close or how far ATI's transistor count is to NV's transistor count. However, assuming that ATI only counts logic transistors, it is safe to assume that the actual transistor counts between ATI and NV are much closer than it seems from looking solely at product specs.

The whole inconsistency between companies in counting transistors is a bit silly, because there are both pluses and minuses to underestimating and overestimating transistor count, and in the end it is consumers and analysts who are confused. Transistor count at the moment is only a useful measure when comparing cards within the same company.

AlphaWolf · Jun 1, 2004

jimmyjames123 said:
I thought that ATI claimed that they count transistors in a different way than NV, counting only the transistors associated with logic.

With respect to "proof", it works both ways. We don't have hard "proof"regarding how close or how far ATI's transistor count is to NV's transistor count. However, assuming that ATI only counts logic transistors, it is safe to assume that the actual transistor counts between ATI and NV are much closer than it seems from looking solely at product specs.

The whole inconsistency between companies in counting transistors is a bit silly, because there are both pluses and minuses to underestimating and overestimating transistor count, and in the end it is consumers and analysts who are confused. Transistor count at the moment is only a useful measure when comparing cards within the same company.

What I recall is that one guy from ATi when pressed about the difference in teh transistor counts of r3xx and nv3x suggested that perhaps nvidia counted transistors differently. No one to my knowledge has actually investigated if there is a disparity in the counting methods.

Nvidia has claimed r420 has 160 million transistors as has Ati, nvidia claims that nv40 has 220 million transistors and I haven't seen Ati dispute this fact either. The only people I know saying that r420 has more than 160 million transistors are you and Chalnoth. So why don't you run out and buy one and start counting, because as far as I am concerned the onus of proof lies with you (maybe Chalnoth will help you, you can each count the transistors on one of the cores).

jimmyjames123 · Jun 1, 2004

LOL, I don't think you know what you are talking about. There are plenty of people who believe that ATI counts transistors in a different manner than NV, including DaveB I think. Prove us wrong

AlphaWolf · Jun 1, 2004

jimmyjames123 said:
LOL, I don't think you know what you are talking about. There are plenty of people who believe that ATI counts transistors in a different manner than NV, including DaveB I think. Prove us wrong

ah the arguement of a man who hasn't a fact to stand on. I presented my evidence and you come back with this crap?

/dismiss

jimmyjames123 · Jun 1, 2004

You presented evidence? LOL, and what "evidence" exactly is that?

Geo · Jun 1, 2004

jimmyjames123 said:
LOL, I don't think you know what you are talking about. There are plenty of people who believe that ATI counts transistors in a different manner than NV, including DaveB I think. Prove us wrong

DaveBaumann said:
You're living in a dream world. Regardless, what's really important is the die size. The NV40 is, apparently, only about 10% larger than the R420.

Click to expand...

Thats a 10% surface area differnce. Do they both use the same number of layers?

That would not appear to be a quote from a man who has made up his mind on the issue.

Pete · Jun 1, 2004

Nobody in their right mind is going to apply a blur filter to a game.

Quincunx.

Heathen · Jun 1, 2004

So why don't you run out and buy one and start counting

160,000,427... Lost count a few times but I managed it in the end.

Ati's Technology Marketing Manager @ TechReport

BRiT

(>• •)>⌐■-■ (⌐■-■)

KimB

Mintmaster

GraphixViolence

KimB

KimB

Dave Baumann

Gamerscore Wh...

Joe DeFuria

WaltC

Geo

Mostly Harmless

KimB

AlphaWolf

Specious Misanthrope

jimmyjames123

AlphaWolf

Specious Misanthrope

jimmyjames123

AlphaWolf

Specious Misanthrope

jimmyjames123

Geo

Mostly Harmless

Pete

Moderate Nuisance

Heathen

Similar threads