David Kirk finally admits weak DX9 vs DX8 performance - GeFX

Zeross said:
nyt said:
Does anyone here think (hence excluding those who know and won't answer anyway ;)) that ATi will stick with FP24 for R4xx to kill NV4x in speed? It sounds logical to me, unless they couldn't find any use other than FP32 for the die space 0.13 buys! :eek:

I'm almost sure that R420 will be FP24 and that ATI won't adopt FP32 until their next new core. R420 vs NV40 will be interesting, I think :) not like what we are seeing nowadays: I'm tired of all this R3x0 vs NV3x stuff :( the improvements over the original R300 are so small.

I'm thinking that R420 will use FP32 because the pixel shader and vertex shader are going to be unified, right? There's only going to be one shader core, with full texture access, and it will work on some vertices, buffer them, then rasterize them and switch to pixel shading, and so forth. Since both ATI and NVidia have always used FP32 for vertices, I don't see ATI reducing this precision, especially since precision issues are more easily noticed as a displacement error than as a colour error.

This will definitely make efficiency much higher, as most of the time only one of the vertex or pixel shaders is working near its peak. When drawing large triangles, the VS is waiting. When drawing small ones, the PS is waiting. I think 16 Vec4 FP32 general shaders as opposed to 4 Vec4+scalar FP32 and 8 Vec4 FP24 shaders is not going to be that much higher in silicon cost considering it is a next generation part, and we should get twice the peak performance in pixel shading, 4x in vertex shading, plus fully featured vertex texturing.
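
To make the load-balancing point concrete, here's a toy sketch (the numbers are purely illustrative, not anything from ATI or NVIDIA) comparing a fixed 4 VS + 8 PS split against a unified pool of the same 12 units, where the bottleneck stage limits the split design:

```python
# Toy model of the load-balancing argument above. Unit counts and workload
# ratios are illustrative; both designs get 12 units total so the comparison
# isolates the balancing benefit rather than extra hardware.
def split_throughput(vs_units, ps_units, vs_work, ps_work):
    """Fixed split: the more heavily loaded stage limits the pipeline."""
    return min(vs_units / vs_work, ps_units / ps_work)

def unified_throughput(units, vs_work, ps_work):
    """Unified pool: every unit works on whichever stage has work left."""
    return units / (vs_work + ps_work)

for vs_work, ps_work in [(1, 10), (5, 5), (10, 1)]:  # big tris ... tiny tris
    print(f"VS:PS work {vs_work:2d}:{ps_work:2d}  "
          f"split {split_throughput(4, 8, vs_work, ps_work):.2f}  "
          f"unified {unified_throughput(12, vs_work, ps_work):.2f}")
```

The unified pool matches or beats the split in every case, and the gap is biggest when the workload is lopsided, which is exactly the "VS waiting / PS waiting" situation described above.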

I'm definitely looking forward to these next gen parts.
 
Mintmaster said:
I think 16 Vec4 FP32 general shaders as opposed to 4 Vec4+scalar FP32 and 8 Vec4 FP24 shaders is not going to be that much higher in silicon cost considering it is a next generation part.

The area of multiplier arrays scales with width^2. Going from FP24 to FP32 increases the mantissa by 50% and hence the area of the multipliers by 125%. That, plus adding another 4 shaders, seems unrealistic to me. More likely 12 unified shaders (if unified at all; DX10 seems a long way off).
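
As a quick sanity check of that arithmetic, here's a back-of-envelope sketch (the mantissa widths are assumptions: FP24 is usually taken to have a 16-bit mantissa, FP32 has 24 significant bits counting the implied leading 1):

```python
# Back-of-envelope check of the multiplier-area scaling claim above;
# assumes array-multiplier area grows roughly with mantissa_width^2.
def relative_multiplier_area(mantissa_bits, baseline_bits):
    return (mantissa_bits / baseline_bits) ** 2

fp24_mantissa = 16  # assumed 16-bit mantissa (e.g. an s16e7 format)
fp32_mantissa = 24  # IEEE-754 single: 23 stored bits + implied leading 1

growth = relative_multiplier_area(fp32_mantissa, fp24_mantissa)
print(f"FP24 -> FP32 multiplier area: x{growth:.2f}")  # x2.25, i.e. +125%
```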

Cheers
Gubbi
 
I hope you're right, Mintmaster, I hope you're right.

As for the extra die space required, Gubbi, remember that we don't know exactly what percentage of the total die space the multipliers require.
 
Mintmaster, you seem to be confusing R400 and R420. Common confusion of course.

Just wait till the NV50 comes, THEN we'll see serious FP32 power coupled with astonishing flexibility and finesse.
Oh wait, wasn't that the NV30? :devilish: ;)


Uttar
 
Uttar said:
Just wait till the NV50 comes, THEN we'll see serious FP32 power coupled with astonishing flexibility and finesse.
Oh wait, wasn't that the NV30? :devilish: ;)
:LOL:

You have no idea how much I'm looking forward to your article! ;)
 
digitalwanderer said:
Uttar said:
Just wait till the NV50 comes, THEN we'll see serious FP32 power coupled with astonishing flexibility and finesse.
Oh wait, wasn't that the NV30? :devilish: ;)
:LOL:

You have no idea how much I'm looking forward to your article! ;)

Well, frankly, there's not a single mention of the NV50 in it. I do explain the causes of the NV30 debacle in more detail though, AFAIK.

And, sorry, ULE has been delayed again. No kidding.
The reason? The person editing it seems so busy in RL that he didn't even reply to the PM that included the URL to download the final, non-edited version *grins*

Which is surprisingly close to the URL I gave you when just kidding around - it was ule4.doc.
But sorry, it's gone now ;)


Uttar
 
I can proofread it for you.

In fact, if you post it here... we can all proofread it for you :)
 
Yeah, yeah, MAYBE it's vapourware, but then again, MAYBE I'll reveal the REAL secret of Quack in it :devilish:

Oh wait, we know that already, damn :p

Hmm... No, but seriously...
Is it MY fault if the guy who's editing is VERY busy in RL?
*thinks a bit*

Oh wait, it IS my fault... *sigh* ( and don't try to figure out how that's possible, even a psychic couldn't guess it )

Ollo: Nah, it's really Uttar's Late Editorial :devilish: :D ;) :p


---

Okay, to compensate for me being evil and teasing you all, here's a small part of it:

Wired magazine illustrated some key elements of NVIDIA’s corporate culture amusingly well in their 10.07 issue, archived here: http://www.wired.com/wired/archive/10.07/Nvidia.html?pg=2

Huang is Nvidia's amiable patriarch, doling out equal doses of reassuring hugs and tough love. He roams the halls of company headquarters, chatting and laughing with workers, remembering the names of their spouses and asking after their children. But he has little tolerance for screwups. In one legendary meeting, he's said to have ripped into a project team for its tendency to repeat mistakes. "Do you suck?" he asked the stunned employees. "Because if you suck, just get up and say you suck." The message: If you need help, ask for it.

The first sentence cannot be overemphasized: NVIDIA employees do seem to respect good ole Jen Hsun a lot. But from my understanding, the suckage part frightens employees more than anything: what better way to operate, then, than to hide your mistakes?

The emperor has no clothes, but since he shoots the messenger, nobody’s gonna tell him he needs to change his wardrobe! They just lie, and give him the message he wants to hear: “Yes, Jen Hsun, the next generation will be even better.” I talked of misinformation earlier in the editorial – well, it’s an appropriate description again, I’d even say better than ever. This time, though, it’s internal misinformation, and employees get so fed up that they have to tell someone who understands the hypocrisy. That’s certainly one of the biggest sources of the leaks on sites like NFI.

That part was already edited a while ago FYI.



Uttar

P.S.: And I swear you won't get any more of it before release :p
 
Uttar said:
Hmm... No, but seriously...
Is it MY fault if the guy who's editing is VERY busy in RL?
*thinks a bit*

Hey, I'm good with editing AND not busy right now--feel free to toss it over and I'll get right to work. :D The earlier it gets done, the more money you can extort from the likes of Dig to get it posted as quickly as possible. ;)
 
cthellis42 said:
Uttar said:
Hmm... No, but seriously...
Is it MY fault if the guy who's editing is VERY busy in RL?
*thinks a bit*

Hey, I'm good with editing AND not busy right now--feel free to toss it over and I'll get right to work. :D The earlier it gets done, the more money you can extort from the likes of Dig to get it posted as quickly as possible. ;)

LOL, yeah, Dig, how much do you give me to post it faster? :D

More seriously though...
The person in question generally posts on internet forums nearly daily. The last posts of his that I can find are dated October 24th.

I received a read receipt for the messages I sent on the 25th, but no response. I did not receive a read receipt for the message I sent on the 27th.

I do not believe, however, that I have any reason to worry, considering it took him over a week to correct Parts 5 & 6 - he claims he doesn't get to the computer daily anymore due to RL, and the reasons I'm aware of for that are perfectly reasonable.

Although if a certain person responds and accepts, who knows, he might be moving to Canada - if so, that could delay the whole thing an awful lot more *grins* - although I admit it's unlikely.


And feel free to think it's my fault ;) :p


Uttar
 
Simple question: is there even a hint anywhere of a good number of highly experienced employees leaving during the NV25 era? I can't seem to find anything about it so far.

It probably isn't in anybody's interest to elaborate on a matter like that, but according to what I know, and IMHO, it played a very significant role in recent releases.
 
Re: David Kirk finally admits weak DX9 vs DX8 performance -

g__day said:
http://www.guru3d.com/article/article/91/2

"....
Mr. Kirk showed a slide illustrating that with the GeForce FX architecture, its DX9 components have roughly half the processing power as its DX8 components (16Gflps as opposed to 32Gflops, respectively).[/b] I know I'm simplifying, but he did point it out, very carefully. "

On editors' day I asked David Kirk about this slide and he told me that the GFLOPS figures are a mistake. He wanted to indicate that the shader units can operate at 16 instr/clock and that the new fp units in NV35/36/38 operate at 32 instr/clock. From a second source I got that this is simply derived from 4 components/vector x 4 units and 4 components/vector x 8 of the newer units, respectively. Another slide compared 48 instr/clock in total against 32 instr/clock for the R300 (4 components/vector x 8 pipes), so Nvidia wins.

ATI's own counting is (1 tex op + 1 vec3 op + 1 scalar op) x 8 pipes, which gives 24 instr/clock (or 40 instr/clock with a MAD counted as two operations), which sounds more reasonable than counting per-component operations.
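
For what it's worth, here is a small sketch that just reproduces the arithmetic behind the two counting conventions described above (the labels are mine, the MAD-as-two reading is my interpretation of the 40 figure, and none of this is an official IHV spec):

```python
# NVIDIA-style counting: per-component operations per clock.
nv3x_shader_units  = 4 * 4       # 4 components/vector x 4 units        = 16
nv35_new_fp_units  = 4 * 8       # 4 components/vector x 8 newer units  = 32
nv35_total         = nv3x_shader_units + nv35_new_fp_units  # 48 "instr/clock"

r300_per_component = 4 * 8       # 4 components/vector x 8 pipes        = 32

# ATI-style counting: whole operations per pipe per clock.
r300_ops           = (1 + 1 + 1) * 8  # (tex + vec3 + scalar) x 8 pipes = 24
r300_ops_mad_as_2  = (1 + 2 + 2) * 8  # vec3 and scalar MADs as two ops = 40

print(nv35_total, r300_per_component, r300_ops, r300_ops_mad_as_2)
```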

This shows that even printed, technical-looking information coming straight from the leading engineers can sometimes be misleading.

Manfred Bertuch, c't magazin für computertechnik
 
Ailuros said:
Simple question: is there even a hint anywhere of a good number of highly experienced employees leaving during the NV25 era? I can't seem to find anything about it so far.

It probably isn't in anybody's interest to elaborate on a matter like that, but according to what I know, and IMHO, it played a very significant role in recent releases.

Hmm, nope, sorry.
Although I wouldn't necessarily know, since my former primary source was hired by NV slightly after the NV20, IIRC.

Probably not the best person to have noticed that...


Uttar

P.S.: I'll check to see if I can get any info about that though.
 
Uttar said:
Mintmaster, you seem to be confusing R400 and R420. Common confusion of course.

So you're saying that R420 won't have this unified architecture? Do you know if they're going to add texture units to the vertex shaders or give them access to the pixel shader texture units (or some other possibility I haven't thought of yet)?

I thought R420 was basically the shader heart of R400 transplanted into R300's good balance of everything else, or something like that, but I really have no basis for this assumption. I knew a fair amount about R400 from working at ATI, but that was over a year ago, so lots would have changed.

It's a shame, though, if they don't get the shaders unified in hardware for the next generation. I guess R420 isn't as big a step forward as I would like it to be, but on the bright side my R300 will avoid obsolescence a bit longer :)
 
Mintmaster: I wish I knew, eh. The best person to know this type of stuff right now would be Fudo IMO, although I don't know if he's gotten that technical with the R420 yet...

Something I heard, with kinda low reliability though, is that Loki/R420 is R400's VS finesse + R300's PS brute force. Of course, it's not that simple, but the idea is that the VS would have a lot more in common with the R400 than with the R300, and the other way around for the PS.

I do expect them to have the VS (ab)using the PS texture lookup units though :)


Uttar
 
Simon F said:
But that is one of the ways yields go down. It's just probability...

For example, let's assume that for a particular process/fab/machine there is a (pessimistic) 99.9% chance that a particular square millimetre of silicon is error free. The chance that a small 50mm^2 chip works is then (0.999^50) = 95%. A chip that is just 50mm^2 bigger is only likely to work in 90% of cases and a 150mm^2 chip will only work 86% of the time. Another 50mm^2 and you're down to about 82%.

When you combine the facts that (a) as chips increase in area you get fewer off each wafer, and (b) there's a diminishing chance that each will work, you get a rapid rise in the effective cost per working chip.

I understand that cost per working die increases superlinearly with die size. The point is that this dynamic exists over the entire die-size curve (although it is overshadowed by countervailing per-die costs, like packaging and testing, at the small die size end). There is no die size at which using .15u suddenly becomes infeasible; it just continues to get more expensive as die size increases. So yes, if R300 had used FP32 for the pixel pipeline that would have cut into ATI's margins somewhat; but no, it is in no way infeasible or even significantly more difficult to make such a chip.
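
As a concrete illustration of that "more expensive, not suddenly infeasible" point, here's a toy sketch reusing the quoted example's deliberately pessimistic 99.9% per-mm^2 figure; the cost measure is just silicon area divided by yield, ignoring packaging, testing, and wafer edge effects:

```python
# Toy die-cost model: each mm^2 is defect-free with probability 0.999,
# independently, so yield falls smoothly with area rather than off a cliff.
P_GOOD_MM2 = 0.999

def die_yield(area_mm2: float) -> float:
    """Probability that an entire die of the given area is defect-free."""
    return P_GOOD_MM2 ** area_mm2

def relative_cost_per_good_die(area_mm2: float) -> float:
    """Cost per working die, up to a constant: silicon consumed / yield."""
    return area_mm2 / die_yield(area_mm2)

for area in (50, 100, 150, 200, 300):
    print(f"{area:3d} mm^2: yield {die_yield(area):5.1%}, "
          f"relative cost {relative_cost_per_good_die(area):6.1f}")
```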

While we're on the subject, as the vast majority of GPU transistors are taken up by multiple copies of a particular functional unit operating in parallel, GPUs would be amenable to the trick of including redundant copies of functional units as a way to hedge against bad chips due to random silicon defects. In doing so, you pay an extra penalty in die size--reducing the potential number of dies per wafer--but can reduce the probability of bad chips due to random defects in the silicon down arbitrarily close to zero.
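
And here's a hypothetical sketch of that redundant-unit hedge (unit sizes and counts are made up for illustration, not modelled on any real GPU): if a die is dominated by n copies of one functional unit and only k of them need to be defect-free, the yield on that block is a binomial tail probability.

```python
from math import comb

P_GOOD_MM2 = 0.999  # same per-mm^2 defect-free probability as above

def unit_yield(unit_area_mm2: float) -> float:
    return P_GOOD_MM2 ** unit_area_mm2

def k_of_n_yield(k: int, n: int, unit_area_mm2: float) -> float:
    """Probability that at least k of n identical units are defect-free."""
    p = unit_yield(unit_area_mm2)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: 8 pipes of 20 mm^2 each, no spare vs. one spare built in.
print(f"need 8 of 8: {k_of_n_yield(8, 8, 20):.1%}")  # ~85%
print(f"need 8 of 9: {k_of_n_yield(8, 9, 20):.1%}")  # ~99%, for 20 mm^2 extra
```

The spare costs ~20 mm^2 of area up front but pulls the yield on that block back up, which is exactly the trade-off being described.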

AFAIK graphics IHVs don't make significant use of this technique (except in the very crude instance of the 9500 NP). Of course this is because it imposes extra design and testing burdens, plus the die size penalty I already mentioned. But that's precisely the point: if GPU die sizes were so big that random defects were catastrophically lowering yields, it would be worth it for the IHVs to make use of these techniques. The fact that it's not yet worth it goes to show how GPUs--even large die size chips like R300--already achieve decent yields.

Of course I'm not saying that die size isn't critically important to costs, or that ATI isn't significantly better off from a cost standpoint getting away with FP24 instead of being required to use FP32. But to suggest that an FP32 R300 wouldn't have been viable on .15 is going way too far IMO.
 