Nvidia GT300 core: Speculation

Though you could argue that being so huge meant it couldn't be clocked high enough.

I'd say that it's usual for almost any new architecture. RV770 couldn't have reached RV790 peaks upon its launch either.

For what it's worth I suspect GDDR5 and 40nm will enable NVidia to make GT300 fairly spectacular.
Which would suggest a fairly spectacular increase in raw bandwidth. IMHO I'd expect raw bandwidth to scale by the smallest factor of the chip's many other aspects.
 
No, it means their production ramp was brought forward. We saw the same again with the paper launch of GTX275.
GTX275 was made to counter the 4890 and wasn't final until the 4890 specs became finalized.
GTX260 is a salvage part, for which you need to have something to salvage, and for that you need to have full production running for some time. And you can't have production running and not sell what is produced. GT200 was ready to go into production from March '08 AFAIK, but the production cycle was started only when AMD decided on the RV770 release time frame.
 
The performance per fixed-function unit of GT200 is appalling. 80 TMUs and 32 ROPs can barely beat RV770's halved configuration for both these things. NVidia's math is also considerably slower per transistor, particularly double-precision.

Moore's law is about using your transistors smartly as well as stamping out more of them. NVidia put so much effort into CUDA they forgot the graphics.

For me the real question here is:
Did this pay off? Will nVidia's current products be significantly better in OpenCL than ATi's?
If so, that would mean they could be on the right track for the future... if not, then GT300 had better be a different architecture, one which balances graphics, GPGPU and transistor count in a more favourable way.

As an aside, it's pretty ironic that AMD is trying to 'squeeze' nVidia by getting better performance per transistor, and turning it into a price war... while Intel is doing the same thing to AMD in the CPU department.
 
GTX275 was made to counter the 4890 and wasn't final until the 4890 specs became finalized.
GTX260 is a salvage part, for which you need to have something to salvage, and for that you need to have full production running for some time. And you can't have production running and not sell what is produced.
Yeah, sure, TSMC only produced 1 wafer a week for NVidia, back then.

It's a salvage part because most dies (don't like calling em dice) on the wafer aren't capable of being GTX280. If the majority of dies were capable of being GTX280 then NVidia badly miscalculated.

If NVidia was stock-piling GT200s for the launch of GTX280 then most of that pile would have been GTX260. What's the ratio for them in consumer's hands? 3:1?

Everything points to a rush job.

GT200 was ready to go into production from March '08 AFAIK, but the production cycle was started only when AMD decided on the RV770 release time frame.
NVidia back then, in its own mind, wasn't competing with AMD, because AMD had already become "irrelevant". NVidia launched 9800GX2 back when you say they could have launched GT200:

http://www.techreport.com/articles.x/14355

which indicates GT200 was not ready.

So, maybe you mean "go into production" as the start of final wafer manufacture? If a wafer takes a minimum of 2 months to pass through TSMC and supposedly 3 months for normal production, then starting production in March merely enables them to have launched when they did.

Jawed
 
It's a salvage part because most dies (don't like calling em dice) on the wafer aren't capable of being GTX280. If the majority of dies were capable of being GTX280 then NVidia badly miscalculated.

If NVidia was stock-piling GT200s for the launch of GTX280 then most of that pile would have been GTX260. What's the ratio for them in consumer's hands? 3:1?

Everything points to a rush job.
So explain to me then how this "rush job" went such that the majority of "salvage" dies weren't ready while the minority were? In your own words: NV had more GT200s for GTX 260 at the same time as it had GT200s for GTX 280. Why wasn't GTX 260 available alongside GTX 280 then?
I'm saying that GT200 production levels at that time weren't ready for a high-volume part, because they started GT200 production only when they were sure that RV770 was coming. And that has nothing to do with when the GT200 silicon was ready to go into production. What don't you understand?

NVidia back then, in its own mind, wasn't competing with AMD, because AMD had already become "irrelevant".
Well, AMD had been largely irrelevant since R580, and during R520 before that, so can you really blame NV for overlooking the RV770 wonder? They became arrogant, sure, but it's not like they had no reasons for that.

NVidia launched 9800GX2 back when you say they could have launched GT200:
9800GX2 was launched as an answer to the 3870X2, and as far as I know GT200 wasn't ready by the time they launched 9800GX2 (a production ramp takes some time, so even if GT200 was ready to go into production in March it doesn't mean that it was ready to go to market in March). But I have to say that I and some of my colleagues always considered the 9800GX2 launch to be a mistake.

So, maybe you mean "go into production" as the start of final wafer manufacture? If a wafer takes a minimum of 2 months to pass through TSMC and supposedly 3 months for normal production, then starting production in March merely enables them to have launched when they did.
You may launch a product a month or so after the production start, although this won't be normal. The point here is that NV was waiting during Spring '08 for RV770. GT200 was ready to go into production. So this 6-month figure for GT300 from tapeout to market is actually very close to the timing of GT200. December-June is 6 months after all, even if NV may have launched GT200-based products earlier than it did.
 
For me the real question here is:
Did this pay off? Will nVidia's current products be significantly better in OpenCL than ATi's?
I imagine this'll depend on who writes the OpenCL - from what I've seen so far of CUDA/Brook+ programming one has to tweak the algorithm extensively for the granularities of the architecture (if the memory system is at all dominant as a factor in performance).

A simple example is SGEMM. On NVidia you can only get decent performance if you use shared memory. On ATI you get better performance (in absolute terms), and if you use LDS then you get worse performance - ATI caches just work well enough that putting data into LDS slows things down. That's an extreme example, I'm sure - but still, it's entertaining.
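
(For anyone wondering what "using shared memory" looks like in practice, here is a minimal sketch of a tiled SGEMM kernel in OpenCL C, staging tiles of A and B in __local memory, which is the OpenCL name for CUDA's shared memory. The kernel name, the TILE size, and the assumptions that N divides evenly by TILE and that the work-group is TILE x TILE are mine, purely for illustration.)

#define TILE 16

// Hypothetical sketch: C = A * B for row-major N x N matrices.
// Launch with a 2D NDRange of (N, N) and a local size of (TILE, TILE).
__kernel void sgemm_tiled(const int N,
                          __global const float *A,
                          __global const float *B,
                          __global float *C)
{
    __local float Asub[TILE][TILE];   // tile of A staged in local/shared memory
    __local float Bsub[TILE][TILE];   // tile of B staged in local/shared memory

    const int col = get_global_id(0); // output column this work-item computes
    const int row = get_global_id(1); // output row this work-item computes
    const int lx  = get_local_id(0);
    const int ly  = get_local_id(1);

    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        // Each work-item loads one element of the A tile and one of the B tile.
        Asub[ly][lx] = A[row * N + (t * TILE + lx)];
        Bsub[ly][lx] = B[(t * TILE + ly) * N + col];
        barrier(CLK_LOCAL_MEM_FENCE);     // wait until the whole tile is loaded

        // Multiply the two tiles out of on-chip memory instead of global memory.
        for (int k = 0; k < TILE; ++k)
            acc += Asub[ly][k] * Bsub[k][lx];
        barrier(CLK_LOCAL_MEM_FENCE);     // don't overwrite tiles still in use
    }
    C[row * N + col] = acc;
}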

On the other hand, there's folding@home, which runs 2-3x better on NVidia because of shared memory. But this factor tails off as the molecule increases in size.

In the experimentation I've been doing with Brook+ (which compiles to IL) and the resulting "experimentation" with compilation from IL into machine code, I see lots of immaturity and brokenness. For example machine code sometimes doesn't even pack scalar registers (it usually does) into vectors - vector registers are the only allocation possible, though they can be freely accessed as scalars. This wastes register allocation on a massive scale :oops:

So it seems to me that AMD's compilers are going to look immature. EDIT: I suspect we can see evidence for this in games like Far Cry 2 where performance takes months to get to where it should be.

Jawed
 
So explain to me then how this "rush job" went such that the majority of "salvage" dies weren't ready while the minority were? In your own words: NV had more GT200s for GTX 260 at the same time as it had GT200s for GTX 280. Why wasn't GTX 260 available alongside GTX 280 then?
It's very simple, they rushed GTX280 by air and let GTX260 arrive by ship. This is the hallmark of a rushed launch.

I'm saying that GT200 production levels at that time weren't ready for a high-volume part, because they started GT200 production only when they were sure that RV770 was coming. And that has nothing to do with when the GT200 silicon was ready to go into production. What don't you understand?
What I don't understand is how long this "ready to go into production" period lasted. All you've convinced me of, so far, is that GT200 was ready for production at best 1 or 2 weeks early, because GTX260 arrived between 1 and 2 weeks after GTX280 and shipping time is of the order of 2 or 3 weeks.

Well, AMD had been largely irrelevant since R580, and during R520 before that, so can you really blame NV for overlooking the RV770 wonder? They became arrogant, sure, but it's not like they had no reasons for that.
Who would expect anything less of NVidia?

9800GX2 was launched as an answer to the 3870X2, and as far as I know GT200 wasn't ready by the time they launched 9800GX2 (a production ramp takes some time, so even if GT200 was ready to go into production in March it doesn't mean that it was ready to go to market in March). But I have to say that I and some of my colleagues always considered the 9800GX2 launch to be a mistake.
Well, that was a mistake founded on the hames NVidia made of 65nm. And the shenanigans where G92 was launched to fight RV670 when in fact G94 would have done the job (but G92 seems to have been late so ...). And by launching G92 when they did they royally fucked their load-out with all that "it's an 8800, no it's not that GTS, no it's a 9800, no it's really a +" bullcrap.

You may launch a product a month or so after the production start, although this won't be normal.
That only works if you have a stock of wafers already, e.g. speculatively made. Since wafers take so long to process that stock needs to cover the gap between the speculative/pre-production batch and full production, or you'll have an embarrassing hole in availability.

Though the irony is that demand for GT200 was so low after launch that taking this approach would have worked out for NVidia.

But the fact that GTX260 was not on sale at the same time means GT200 was not in any meaningful way "waiting for RV770".

So this 6-month figure for GT300 from tapeout to market is actually very close to the timing of GT200. December-June is 6 months after all, even if NV may have launched GT200-based products earlier than it did.
Except that you've been trying to assert that GT200 could have been, oh, I dunno, 4 months from tapeout to launch? You refuse to put a hard date on this, so I dunno.

As I say, all you've convinced me of is that GT200 might have been a week or 2 in hand and that's a generous interpretation.

Jawed
 
It's very simple, they rushed GTX280 by air and let GTX260 arrive by ship. This is the hallmark of a rushed launch.
Again: the rushed GTX launch has nothing to do with when GT200 was ready to go into production.
GTX275 uses GT200b, which has been selling since the end of last year, and it's still a 'rushed launch' by your terms.
The simple thing to understand here is that both GTX280/260 and GTX275 were answers to new AMD cards. And in both cases NV used chips which had been ready for use for some time by the time those launches happened.
Is it so hard to understand this?

What I don't understand is how long this "ready to go into production" period lasted.
It's a state, not a process.

Who would expect anything less of NVidia?
Let's begin the usual general NV bashing? OK - GT200 was bad!

Well, that was a mistake founded on the hames NVidia made of 65nm. And the shenanigans where G92 was launched to fight RV670 when in fact G94 would have done the job (but G92 seems to have been late so ...). And by launching G92 when they did they royally fucked their load-out with all that "it's an 8800, no it's not that GTS, no it's a 9800, no it's really a +" bullcrap.
Actually, G92 was available sooner than they thought it would be.
G94x2 would have done the trick in countering the 3870X2, I agree.
Marketing names are marketing names - shock!

That only works if you have a stock of wafers already, e.g. speculatively made. Since wafers take so long to process that stock needs to cover the gap between the speculative/pre-production batch and full production, or you'll have an embarrassing hole in availability.
Well, they probably had some wafers of GT200 from the experimental production. Maybe they used them for hard-launching the low-volume GTX280 part, but there weren't enough bad GT200s to launch GTX260 off them.

But the fact that GTX260 was not on sale at the same time means GT200 was not in any meaningful way "waiting for RV770".
I don't see how that means anything of the sort.
If anything, the whole "simultaneous" GT200/RV770 launch thing screams that someone was waiting to see what the other one would put on the market. And from what I know (and logic should tell you), it wasn't AMD.

Except that you've been trying to assert that GT200 could have been, oh, I dunno, 4 months from tapeout to launch? You refuse to put a hard date on this, so I dunno.
Technically they could've launched it in April or May, from what I heard.

As I say, all you've convinced me of is that GT200 might have been a week or 2 in hand and that's a generous interpretation.
It's not really my intention to convince you of anything...
 
I imagine this'll depend on who writes the OpenCL - from what I've seen so far of CUDA/Brook+ programming one has to tweak the algorithm extensively for the granularities of the architecture (if the memory system is at all dominant as a factor in performance).

Thing is, you can't run Cuda on ATi GPUs and you can't run Brook+ code on nVidia GPUs.
So you have no way to make a direct comparison whatsoever. OpenCL will work on both.

Besides, the point behind OpenCL should be: write once, run anywhere.
Of course you'll always be able to tweak for a certain architecture, but that only goes so far.
With CPUs it's common to have multiple paths for various architectures. This can and probably will be done with OpenCL as well, if required... But even if you have tweaked paths for CPU A and CPU B, there will only be one that is the fastest.
There are even plenty of examples of CPUs that were slower DESPITE having tweaked code. Tweaking code is no guarantee for making a CPU deliver best-in-class performance.

So the argument 'who wrote the code' probably won't hold in practice (except for developers with a hidden agenda). Decent programmers will either write blended code that avoids performance pitfalls on both architectures, or they will supply you with two different versions, tweaked for either architecture.
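
(To make the "two versions" option concrete, here is a rough host-side sketch in C, assuming a cl_program has already been built containing two kernel variants; the vendor-string check, the variant names "sgemm_lds" and "sgemm_cached" and the helper function are invented for illustration, and error handling is omitted.)

#include <CL/cl.h>
#include <string.h>

// Hypothetical sketch: pick an architecture-specific kernel variant at runtime.
cl_kernel pick_sgemm_kernel(cl_program program, cl_device_id device)
{
    char vendor[256];
    clGetDeviceInfo(device, CL_DEVICE_VENDOR, sizeof(vendor), vendor, NULL);

    // One program, two tuned paths: a local-memory-tiled variant where shared
    // memory is the fast path, and a cache-reliant variant where it isn't.
    const char *name = strstr(vendor, "NVIDIA") ? "sgemm_lds" : "sgemm_cached";

    cl_int err;
    return clCreateKernel(program, name, &err);
}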

Given those circumstances (which I deem to be both fair and realistic), I wonder if nVidia will be able to show an advantage in most tasks (I'm sure there will always be exceptions... even the mighty Core i7 can't win *every* benchmark out there). Because that's what they focused on in the past few years, and that's largely the reason why their chips are so much bigger than ATi's.
That's where the payoff needs to be.

A simple example is SGEMM. On NVidia you can only get decent performance if you use shared memory. On ATI you get better performance (in absolute terms), and if you use LDS then you get worse performance - ATI caches just work well enough that putting data into LDS slows things down. That's an extreme example, I'm sure - but still, it's entertaining.

Well, if ATi has performance problems with shared memory, that could be an issue... Shared memory is a standard feature of OpenCL. There could be a bit of payoff for nVidia there.

On the other hand, there's folding@home, which runs 2-3x better on NVidia because of shared memory. But this factor tails off as the molecule increases in size.

Yea, funny how something like that can completely turn the performance-per-mm2 argument upside down.
That's basically what I'm talking about here... Sure, in terms of graphics, nVidia seems to have dies that are 'too large'... But what if they are 2-3x faster than ATi's in OpenCL applications?
Then suddenly the die size is completely justified, and ATi will look like underpowered, inefficient, outdated junk.
Not that I expect this to happen, but still... GPGPU may make nVidia's architecture look better than it does today.

In the experimentation I've been doing with Brook+ (which compiles to IL) and the resulting "experimentation" with compilation from IL into machine code, I see lots of immaturity and brokenness. For example machine code sometimes doesn't even pack scalar registers (it usually does) into vectors - vector registers are the only allocation possible, though they can be freely accessed as scalars. This wastes register allocation on a massive scale :oops:

So it seems to me that AMD's compilers are going to look immature. EDIT: I suspect we can see evidence for this in games like Far Cry 2 where performance takes months to get to where it should be.

That would be another payoff then... nVidia having invested quite some resources in the Cuda compilers over the past years.
 
Yeah, as much as Nvidia "ignored" graphics this time around, you could say the reverse is true for AMD and GPGPU. By that same metric Intel is also ignoring graphics with Larrabee. But the two will eventually converge as graphics workloads incorporate more general compute algorithms. We aren't there yet though, and as AMD demonstrated with RV770, there's still a lot of benefit to specialization.
 
Again: the rushed GTX launch has nothing to do with when GT200 was ready to go into production.
GTX275 uses GT200b, which has been selling since the end of last year, and it's still a 'rushed launch' by your terms.
The simple thing to understand here is that both GTX280/260 and GTX275 were answers to new AMD cards. And in both cases NV used chips which had been ready for use for some time by the time those launches happened.
Is it so hard to understand this?
Where's the evidence, or even a decent rumour, that GT200 was able to launch at any time after March?

It's a state, not a process.
If that's true, how come you reckon it could have been launched in April or May? Make up your mind.

Let's begin the usual general NV bashing? OK - GT200 was bad!
Pathetic, yeah, NVidia's R600.

Actually, G92 was available sooner than they thought it would be.
G94x2 would have done the trick in countering the 3870X2, I agree.
Marketing names are marketing names - shock!
The last 18 months have been a fantastic testament to the downfall of NVidia's marketing. It's just lucky that G92 was so good and cheap that NVidia's inept marketing didn't spoil the party. G92 gave them their best ever quarter.

Well, they probably had some wafers of GT200 from the experimental production. Maybe they used them for hard-launching the low-volume GTX280 part, but there weren't enough bad GT200s to launch GTX260 off them.
Or, maybe they thought that between 9800GTX at $300 and 9800GX2 at ~$450 and GTX280 at $650 they didn't need GTX260 :LOL:

I don't see how that means anything of the sort.
If anything, the whole "simultaneous" GT200/RV770 launch thing screams that someone was waiting to see what the other one would put on the market. And from what I know (and logic should tell you), it wasn't AMD.
OK, so "screams" is the fact we're debating, is it?

Technically they could've launched it in April or May, from what I heard.
Anyone else heard this?

It's not really my intention to convince you of anything...
Your own insistence that the codename "GT200" was complete nonsense, and we were all idiots for using that name:

http://forum.beyond3d.com/showthread.php?p=1146273#post1146273

was especially unconvincing :p

I think in the end you confuse something that NVidia has working internally with something that's ready for production. When something tapes out there should be chips available to play with well before 6 months are up. But that doesn't mean that 6 months isn't the usual interval between tape out and shelf, or that the plan is to be able to launch a chip 3 months after tapeout.

We know that an enthusiast-class chip scheduled for no later than Q4 2007 was cancelled and GT200 took its place. It's reasonable to assume, in light of that, that NVidia launched GT200 as soon as it could. NVidia was losing revenue with a large population of 8800GTX owners who had nothing they wanted to buy ;)

If you could offer a convincing theory and maybe find some other nuggets then good. e.g. "GT200 was able to launch so rapidly after the abandonment of G100 (as I like to think of it) because GT200 was nothing more than G100 with double-precision ALUs bolted on. So, being a relatively minor design change meant that GT200 was able to go from concept to shelf extremely rapidly." I'd have some sympathy with that.

But currently all we have is you asserting that NVidia can go from tapeout to shelf in 4 months. Fine, let's see them do it... The inside info I have is they need to be far faster than that if tapeout is really happening in June, but what's my inside info worth, eh?

And, well, a rumour of tapeout really isn't worth much. Damn, R600. Damn, R520.

Jawed
 
Thing is, you can't run Cuda on ATi GPUs and you can't run Brook+ code on nVidia GPUs.
So you have no way to make a direct comparison whatsoever. OpenCL will work on both.
I didn't make a direct comparison :???:

Besides, the point behind OpenCL should be: write once, run anywhere.
Of course you'll always be able to tweak for a certain architecture, but that only goes so far.
Famous last words.

With CPUs it's common to have multiple paths for various architectures. This can and probably will be done with OpenCL as well, if required...
That OpenCL Quick Reference Card and the Jumpstart Guide that I linked in the other thread are very informative. OpenCL expressly provides options/pragmas for platform targeting.
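
(A rough sketch of the build-time side of that, in plain OpenCL host C: the same source is compiled per device with different -D options and the kernel switches paths with #ifdef. The USE_LOCAL_MEM macro and the helper function are invented for illustration, and error handling is omitted.)

#include <CL/cl.h>

// Hypothetical sketch: one source string, device-specific preprocessor defines.
static const char *kernel_src =
    "__kernel void scale(__global float *x, float a) {\n"
    "    size_t i = get_global_id(0);\n"
    "#ifdef USE_LOCAL_MEM\n"
    "    /* ...a variant that stages data in __local memory would go here... */\n"
    "#endif\n"
    "    x[i] *= a;\n"
    "}\n";

cl_program build_for(cl_context ctx, cl_device_id dev, int use_local_mem)
{
    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
    // Per-device build options select the code path at compile time.
    clBuildProgram(prog, 1, &dev,
                   use_local_mem ? "-D USE_LOCAL_MEM" : "",
                   NULL, NULL);
    return prog;
}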

So the argument 'who wrote the code' probably won't hold in practice (except for developers with a hidden agenda). Decent programmers will either write blended code that avoids performance pitfalls on both architectures, or they will supply you with two different versions, tweaked for either architecture.
Developers don't need a hidden agenda, merely cash/free hardware and code that's been re-written for them.

Given those circumstances (which I deem to be both fair and realistic), I wonder if nVidia will be able to show an advantage in most tasks (I'm sure there will always be exceptions... even the mighty Core i7 can't win *every* benchmark out there). Because that's what they focused on in the past few years, and that's largely the reason why their chips are so much bigger than ATi's.
RV770 is functionally equivalent to any of NVidia's CUDA capabilities as far as I can tell. Perhaps you'd like to indicate why they're bigger?

I can think of two things: they've built a hideously complicated instruction scheduler that can sometimes run some code faster and they've built a double-precision ALU with bloat.

That's where the payoff needs to be.
I've got my fingers-crossed that NVidia will do something really smart like dynamic warp formation, which will suddenly make their lard-arsed scheduler look really cool.

Well, if ATi has performance problems with shared memory, that could be an issue... Shared memory is a standard feature of OpenCL. There could be a bit of payoff for nVidia there.
I think you need to re-read what I wrote. SGEMM is faster on ATI (faster than any NVidia GPU) despite not using shared memory, because their caching (and register files) work so efficiently. NVidia's reliance upon shared memory, in this case, hinders performance. So, don't go generalising - I was merely illustrating the breadth of the "optimisation problem" that OpenCL presents - that an optimisation for a cache-efficient architecture can be contrary to an architecture that has crap caches.

You can't even discern the quality of ATI's shared memory in this case because you don't know what its absolute performance is.

Yea, funny how something like that can completely turn the performance-per-mm2 argument upside down.
No it doesn't. The ATI client hasn't been written to use shared memory on RV770. So it means absolutely nothing.

GPGPU may make nVidia's architecture look better than it does today.
Well feel free to buy shares in NVidia.

That would be another payoff then... nVidia having invested quite some resources in the Cuda compilers over the past years.
As long as it's not undermined by significant changes. I suspect not, for what it's worth.

Jawed
 
Where's the evidence, or even a decent rumour, that GT200 was able to launch at any time after March?

How about the date on the chip?

The last 18 months have been a fantastic testament to the downfall of NVidia's marketing. It's just lucky that G92 was so good and cheap that NVidia's inept marketing didn't spoil the party. G92 gave them their best ever quarter.

It has nothing to do with marketing; if you have competitive products you end up in an equilibrium. If AMD hadn't gone into a price war we wouldn't have seen such drastic changes in revenues, outside of the losses from the bad economy.


We know that an enthusiast-class chip scheduled for no later than Q4 2007 was cancelled and GT200 took its place. It's reasonable to assume, in light of that, that NVidia launched GT200 as soon as it could. NVidia was losing revenue with a large population of 8800GTX owners who had nothing they wanted to buy ;)

If you could offer a convincing theory and maybe find some other nuggets then good. e.g. "GT200 was able to launch so rapidly after the abandonment of G100 (as I like to think of it) because GT200 was nothing more than G100 with double-precision ALUs bolted on. So, being a relatively minor design change meant that GT200 was able to go from concept to shelf extremely rapidly." I'd have some sympathy with that.


What if the G100 is the GT200? NV's naming convention has been changing almost every gen now.

But currently all we have is you asserting that NVidia can go from tapeout to shelf in 4 months. Fine, let's see them do it... The inside info I have is they need to be far faster than that if tapeout is really happening in June, but what's my inside info worth, eh?

4 months is enough, but which tapeout, A1 or A2?

And, well, a rumour of tapeout really isn't worth much. Damn, R600. Damn, R520.

Rumours are rumours, but the R600 was ready when? The date on the chip doesn't lie. When was it released?
 
RV770 is functionally equivalent to any of NVidia's CUDA capabilities as far as I can tell. Perhaps you'd like to indicate why they're bigger?

Functionally perhaps, but we have no solid evidence of RV770's prowess in compute applications, whereas there's a relative wealth of those for CUDA. So unless RV770 can put up... well, you know the rest.

I can think of two things: they've built a hideously complicated instruction scheduler that can sometimes run some code faster and they've built a double-precision ALU with bloat.

I've got my fingers-crossed that NVidia will do something really smart like dynamic warp formation, which will suddenly make their lard-arsed scheduler look really cool.
Not sure why you hate on this approach so much. It's obviously more flexible and has fewer corner cases than AMD's pre-determined clause scheduling, which maps much better to traditional graphics workloads. Fine-grained scheduling requires more hardware, yes, but it's probably the way of the future. And as I said above, where are all the compute apps that demonstrate the viability of AMD's approach?
 
I didn't make a direct comparison :???:

I didn't say you did, just that it's very hard to get a good idea of how GPGPU performance compares if you can only compare Cuda to Brook+, with both using a different programming model and different compilers etc. OpenCL will even the playing field in many respects, just as OpenGL/D3D did for regular graphics tasks.

RV770 is functionally equivalent to any of NVidia's CUDA capabilities as far as I can tell. Perhaps you'd like to indicate why they're bigger?

Well, first the obvious:
RV770 is functionally equivalent to G80, a chip that is 2 years older!

Then the not-so-obvious:
Functionally equivalent means nothing.
Phenom II is 'functionally equivalent' to Core i7. They are both built on a 45 nm process, and both are more or less the same size. Yet Core i7 is loads faster.
It's all in the implementation, which is far superior in Core i7.

In nVidia's case, the chip is larger because of the implementation they chose... We are now about to find out if this implementation is going to pay off or not, in GPGPU tasks.

I've got my fingers-crossed that NVidia will do something really smart like dynamic warp formation, which will suddenly make their lard-arsed scheduler look really cool.

I think you need to re-read what I wrote. SGEMM is faster on ATI (faster than any NVidia GPU) despite not using shared memory, because their caching (and register files) work so efficiently. NVidia's reliance upon shared memory, in this case, hinders performance. So, don't go generalising - I was merely illustrating the breadth of the "optimisation problem" that OpenCL presents - that an optimisation for a cache-efficient architecture can be contrary to an architecture that has crap caches.

You said ATi was slower than nVidia when shared memory is involved.
Now I am willing to go as far as stating that shared memory will be a very important tool in many GPGPU algorithms (unlike graphics).
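
(A classic example of that, sketched in OpenCL C and assuming a power-of-two work-group size: a work-group sum reduction where all the partial-sum traffic stays in __local memory instead of bouncing through global memory. The kernel and argument names are made up for illustration; the __local buffer is sized to the work-group by the host via clSetKernelArg.)

// Hypothetical sketch: per-work-group sum reduction through local/shared memory.
__kernel void reduce_sum(__global const float *in,
                         __global float *partial,   // one result per work-group
                         __local float *scratch)    // one float per work-item
{
    const size_t gid = get_global_id(0);
    const size_t lid = get_local_id(0);

    scratch[lid] = in[gid];                // each work-item stages one value
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction within the work-group; all traffic stays on-chip.
    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}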

You can't even discern the quality of ATI's shared memory in this case because you don't know what its absolute performance is.

Yea, it's a shame ATi doesn't have any actual software out there.

No it doesn't. The ATI client hasn't been written to use shared memory on RV770. So it means absolutely nothing.

No, it does mean something:
ATi didn't have shared memory until recently. nVidia had it for years. nVidia is now reaping the benefits.
And it proves my point that there are GPGPU algorithms that can benefit significantly from fast shared memory.
Question remains: if the ATi client is rewritten to use shared memory, will it once again become competitive?
I wouldn't be surprised if they can't quite close the gap.

Well feel free to buy shares in NVidia.

Oh please, as if your bias wasn't showing enough already.
This remark was completely uncalled for, and makes you look like a silly frustrated fanboy.

As long as it's not undermined by significant changes. I suspect not, for what it's worth.

Nope, unlike ATi's Stream.
 
The performance per fixed-function unit of GT200 is appalling. 80 TMUs and 32 ROPs can barely beat RV770's halved configuration for both these things. NVidia's math is also considerably slower per transistor, particularly double-precision.
While I won't argue with you about the sticker speed of theoretical DP throughput (though I'd love to see some real-world numbers on real applications), TEX especially is a different beast, since Catalyst AI saves their TUs quite some (mega)bits of fetching and filtering. Without the filtering stuff, some texturing benchmarks show the expected numbers with respect to frequency and unit count on both architectures.
 
What does Catalyst AI (not) do?

My guess is they secretly change textures to 16-bit or compressed formats and such, after 'analysis' has determined that the visual difference is negligible.
So that means when you THINK you're benchmarking 32-bit uncompressed texture operations, you're actually doing less.

At least, that's a trick that's been done for years by various vendors.
 
So it seems to me that AMD's compilers are going to look immature. EDIT: I suspect we can see evidence for this in games like Far Cry 2 where performance takes months to get to where it should be.

Jawed

I have yet to see a case where a driver update carried more general improvements than hand-tuning for specific game content, i.e. improving more general shader perf (and that's true for both AMD's and Nvidia's drivers).
 