David Kirk finally admits weak DX9 vs DX8 performance - GeFX

Does anyone here think (hence excluding those that know and won't answer anyway ;)) that ATi will stick with FP24 for R4xx to kill NV4x in speed? It sounds logical to me, unless they couldn't find any use other than FP32 for the die space 0.13 buys! :eek:
 
MfA said:
You guys make it sound like Microsoft settled on 24 bits before any line of HDL was written, I have a hard time believing that ... want to mention some timeframes when ATI "and" Microsoft settled on 24 bit?
ATI had to settle on 24-bit FP about two years before the release of the R300. Microsoft may or may not have had any part in that decision.
 
nyt said:
Does anyone here think (hence excluding those that know and won't answer anyway ;)) that ATi will stick with FP24 for R4xx to kill NV4x in speed? It sounds logical to me, unless they couldn't find any use other than FP32 for the die space 0.13 buys! :eek:

I'm almost sure that R420 will be FP24 and that ATI won't adopt FP32 until their next new core. R420 vs. NV40 will be interesting, I think :) not like what we are seeing nowadays: I'm tired of all this R3x0 vs. NV3x stuff :( the improvements over the original R300 are so small.
 
Thing is, I can't help but think it would be the smartest move they could make technology-wise. It would be a PR nightmare though.
 
Chalnoth said:
MfA said:
You guys make it sound like Microsoft settled on 24 bits before any line of HDL was written, I have a hard time believing that ... want to mention some timeframes when ATI "and" Microsoft settled on 24 bit?
ATI had to settle on 24-bit FP about two years before the release of the R300. Microsoft may or may not have had any part in that decision.

Nope. Sireric revealed in another thread (1 or 2 months ago) that:

a) MS did not make the final decision on DX9 precisions until "about a year before the launch of NV30", i.e. Q1 02, only two quarters ahead of the launch of R300.

b) ATI would have been able to accommodate a decision in favor of FP32 without any hit to R300's release schedule or performance (but with the obvious die size cost).
 
Dave H said:
b) ATI would have been able to accommodate a decision in favor of FP32 without any hit to R300's release schedule or performance (but with the obvious die size cost).
I find that very hard to believe. ATI would have had to be designing both cores simultaneously, and even then there would be significant complications. In particular, ATI was already pushing the limits of the .15 micron process with the R300 core. Any reasonable increase in die size would have been a major setback.
 
But most of the R300 core does support 128-bit color, eh? So then it is just the shader units which would need to be modified to handle higher precision.
 
kyleb said:
But most of the R300 core does support 128-bit color, eh? So then it is just the shader units which would need to be modified to handle higher precision.
No. The vertex units obviously have to support 32-bit FP, but other than that, only the FP texture input and output support 32-bit.

The main problem with switching to FP32 would be that, with the pixel shader units increased in size by some amount, everything would have to be shifted around.
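
For anyone wondering what the precision gap actually amounts to: ATI's FP24 is usually described as 1 sign / 7 exponent / 16 mantissa bits, against FP32's 1 / 8 / 23. Here's a rough Python sketch of the difference near 1.0, just crude significand truncation with exponent range and rounding details ignored:

import math

def quantize(x, mantissa_bits):
    # Keep only `mantissa_bits` bits of the significand after the leading 1.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)
    return math.floor(m * scale) / scale * 2.0 ** e

x = 1.0 + 1e-6
print(quantize(x, 16))   # FP24-ish: prints 1.0, the 1e-6 falls below the ~1.5e-5 step near 1.0
print(quantize(x, 23))   # FP32-ish: prints ~1.00000095, the step near 1.0 is ~1.2e-7

So FP24 steps in increments of roughly 1.5e-5 around 1.0, where FP32 steps by about 1.2e-7.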
 
Pete said:
Oh, come on now, Dave. Game pics rendered using ATi's ASCII shader don't count as part of the article text. :p

<laughs> Bastard! You made me choke on my Dr. Pepper thinking about that! :LOL: :LOL:
 
Chalnoth said:
kyleb said:
But most of the R300 core does support 128-bit color, eh? So then it is just the shader units which would need to be modified to handle higher precision.
No. The vertex units obviously have to support 32-bit FP, but other than that, only the FP texture input and output support 32-bit.
Right, everything but the fragment shader registers and functional units.

The main problem with switching to FP32 would be that, with the pixel shader units increased in size by some amount, everything would have to be shifted around.
Placement and routing is unlikely to be a particularly lengthy process for modern GPU development; in fact, I'd be surprised if it wasn't almost entirely automated. It's an interesting problem in bleeding-edge CPU design, where large pieces of logic have custom-designed circuits that play tricky games to eke out the last bit of clockability. But not in GPUs, which are built (as I understand it) almost entirely with standard cells, run at a significantly lower clock speed (almost an order of magnitude lower), and have a tremendous number of clock stages, all of which significantly reduces the difficulties due to clock skew and makes large-scale changes in layout much easier to incorporate without affecting the rest of the design.

So: yes, everything would have to be shifted around; but no, that wouldn't necessarily delay the part at all, even at so late a time frame.

In particular, ATI was already pushing the limits of the .15 micron process with the R300 core. Any reasonable increase in die size would have been a major setback.
There are no such "limits" to "push". Rather, the problem is just that, as die size increases, yield per wafer decreases super-linearly, because each die takes up a larger fraction of the wafer, plus each die is more likely to contain a fatal defect due to its larger size. That is, a higher transistor count for R300 would have meant a slightly-higher-than-linear increase in cost per good die. Whoop-dee-doo. This would eat into ATI's margins a bit, certainly, but it in no way would have made R300 impossible as a .15u part.
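
To put some entirely made-up numbers on it, here's a quick Python sketch using a simple exponential defect model; the wafer size, defect density and wafer cost are assumptions, not real R300/TSMC figures:

import math

wafer_area_mm2 = math.pi * (200 / 2) ** 2   # 200 mm wafer, edge losses ignored
defects_per_mm2 = 0.001                     # assumed defect density
wafer_cost = 5000.0                         # assumed cost per processed wafer, in $

for die_area in (150, 200, 250, 300):       # die sizes in mm^2
    dies_per_wafer = int(wafer_area_mm2 // die_area)
    die_yield = math.exp(-defects_per_mm2 * die_area)   # chance a die has no fatal defect
    good_dies = dies_per_wafer * die_yield
    print(f"{die_area} mm^2: {good_dies:5.1f} good dies/wafer, "
          f"${wafer_cost / good_dies:.2f} per good die")

Cost per good die rises somewhat faster than die area does, which is the super-linear part, but it's a margins problem, not a wall.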

I find that very hard to believe.
Well I was surprised to hear it too, but considering sireric was part of the team that actually designed the damn thing he presumably knows what he's talking about, wouldn't you think?
 
Dave H said:
Right, everything but the fragment shader registers and functional units.
No.
It would be better to say that to move to FP32 in the fragment shader, all that would need to change are the PS math units and the registers. That doesn't mean the rest of the chip is FP32.

And that further doesn't mean that it wouldn't be a significant challenge to change from FP24 in one portion of the chip to FP32.

Placement and routing is unlikely to be a particularly lengthy process for modern GPU development; in fact, I'd be surprised if it wasn't almost entirely automated.
Of course it's mostly automated. But we've also heard that the R300 core was "hand tweaked," and you have no idea how much processing power it actually takes to figure out the optimal path. A huge number of variables need to be taken into consideration. With more than a hundred million transistors, the R300 is no simple electronic circuit.

We also know that it has typically taken at least six months for a minor redesign of a chip to ship to the public. In the case of ATI, they've never shipped a "refresh" part with anywhere close to the architectural changes being described here.

I contend that in modern processors, placement and routing is the primary obstacle that must be overcome by engineers. It's not quite as simple as placing multiple "pieces" together, as you still have to deal with things like signal noise and synchronization. While I am not involved in the design of modern processors, I have taken enough physics to understand that, particularly at the scales modern chips are built at, there are significant problems to be overcome that are not easily solved for such large systems.
 
Chalnoth said:
While I am not involved in the design of modern processors, I have taken enough physics to understand...

Stick to the dark matter, C. 8)

Dave H said:
considering sireric was part of the team that actually designed the damn thing he presumably knows what he's talking about
 
SvP said:
Dave H said:
considering sireric was part of the team that actually designed the damn thing he presumably knows what he's talking about
And Sireric wasn't the one that just stated that the R300 could have been switched to FP32 with no delay. That was Dave H.

I'm saying that that sounds highly dubious, and would require that the development team be simultaneously working on two different designs. Every time they would want to test a chip with the fab, they would need to send two designs instead of one.

I just don't believe that's the case, beyond the obvious possibility that a move to FP32 would cause additional heat, and therefore clock speed, problems.

And so I'm saying that I want to see the exact quote. I think Dave H misread Sireric, or Sireric himself was mistaken about how easy a time his design team would have had.
 
Ok, I misremembered a bit: sireric actually said the decision was made final 1.5 years before NV30 launched, or roughly 1 year before R300 launched. Still not anything like the two years Chalnoth (and I, before sireric set me straight) posited. (Of course I was off by a factor of two as well saying it was 6 months and not a year.)

And he did specifically say that the R300 design had not committed to FP24 at that point.

sireric said:
Dave H said:
Or, since you might not be able to speak to Nvidia's design process: was R3x0 already an FP24 design at the point MS made the decision? If they'd gone another way--requiring FP32 as the default precision, say--do you think it would have caused a significant hit to R3x0's release schedule? Or if they'd done something like included a fully fledged int datatype, would it have been worth ATI's while to redesign to incorporate it?

No, it wasn't. I don't think FP32 would have made things much harder, but it would have cost us more, from a die cost perspective.

Read the whole post, it deals with all these issues quite specifically. In fact, read the whole thread, it was a good one.
 
By the way, he said "at least 1.5 years before NV30," so I expect it was more than that.

Regardless, one year before launch is much more believable, but I would still expect there would have been a bit of a delay if they had gone FP32 at that time. In particular, I still think that the R300 itself was already stretching the .15 micron process, so any increase in die size would have made it more challenging to produce the processor in terms of yields and clockspeeds.

And remember that he said he doesn't think going FP32 would have made things much harder. He didn't say it wouldn't have hurt their release schedule at all.
 
Chalnoth said:
In particular, I still think that the R300 itself was already stretching the .15 micron process, so any increase in die size would have made it more challenging to produce the processor in terms of yields and clockspeeds.

No reason why it should have impacted clockspeeds, assuming extra clock stages were added to keep the slightly longer FP32 execution unit path out of the critical path. As I said, it would impact yields modestly (particularly in terms of good dies per wafer), but it's not as if there's some die size "limit" to the .15u process above which yields suddenly fall off a cliff.
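
A toy calculation of the re-pipelining point; the datapath delays below are invented, only the 9700 Pro's 325 MHz clock is a real number:

import math

cycle_ns = 1e3 / 325                 # ~3.08 ns per cycle at 325 MHz
fp24_path_ns = 8.5                   # assumed total logic delay of an FP24 shader datapath
fp32_path_ns = 10.0                  # assumed (longer) delay of the FP32 version

stages_fp24 = math.ceil(fp24_path_ns / cycle_ns)   # -> 3 pipeline stages
stages_fp32 = math.ceil(fp32_path_ns / cycle_ns)   # -> 4 pipeline stages

print(f"FP24: {stages_fp24} stages, FP32: {stages_fp32} stages, clock stays at 325 MHz")
# The longer path costs an extra stage (more latency, more flip-flops),
# not a lower clock, as long as you're free to re-pipeline it.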
 
Dave H said:
No reason why it should have impacted clockspeeds,
1. Adding more stages would add even more transistors.

2. More transistors in use = more heat.

Obviously it wouldn't have been an insurmountable problem to have the same clock speeds if the move was made to FP32, but it's not a trivial one.
 
Dave H said:
Chalnoth said:
In particular, I still think that the R300 itself was already stretching the .15 micron process, so any increase in die size would have made it more challenging to produce the processor in terms of yields and clockspeeds.

No reason why it should have impacted clockspeeds, assuming extra clock stages were added to keep the slightly longer FP32 execution unit path out of the critical path. As I said, it would impact yields modestly (particularly in terms of good dies per wafer), but it's not as if there's some die size "limit" to the .15u process above which yields suddenly fall off a cliff.
But that is one of the ways yields go down. It's just probability...

For example, let's assume that for a particular process/fab/machine there is a (pessimistic) 99.9% chance that a particular square millimetre of silicon is error-free. The chance that a small 50mm^2 chip works is then 0.999^50 ≈ 95%. A chip that is just 50mm^2 bigger is only likely to work in about 90% of cases, and a 150mm^2 chip will only work about 86% of the time. Another 50mm^2 and you're down to about 82%.

When you combine the facts that (a) as the chips increase in area you are getting fewer off each wafer and (b) that there's a diminishing chance that each will work, you get a rapid rise in the effective cost per working chip.
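
Here are the same sums in Python, for anyone who wants to play with the per-mm^2 number:

p_mm2 = 0.999                        # assumed chance a given mm^2 is defect-free
for area in (50, 100, 150, 200):     # die area in mm^2
    print(f"{area} mm^2 -> {p_mm2 ** area:.0%} chance the whole die works")
# 50 -> 95%, 100 -> 90%, 150 -> 86%, 200 -> 82%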
 