NVIDIA Fermi: Architecture discussion

I looked at that article. And I'm no Charlie fan. But did he actually say that? I didn't see it. It'd be pretty absurd.

Nope, I didn't. I am simply saying that TSMC is making way too many mistakes that shouldn't be made. That means either they are being less than honest about the reasons for it, or there is something else going on in the background.

I talked to a bunch of semi guys, and all said they had never heard of that type of error, or that it was an early bring-up error, not a mid-process error. All were rather astounded that it didn't get caught immediately; metrology should have nailed it at the first check, two or three in at the worst. No one had an explanation for how this could have happened.

-Charlie
 
Besides that, I'm not willing to believe that TSMC is the convenient scapegoat for all possible problems either. If I had to point fingers, I'd rather point three of them, at TSMC, NVIDIA, and AMD, for varying degrees of mistakes that could lead to such low supplies.

The obvious mess right now does not sound to me like it's 100% TSMC's responsibility. Obviously I can't say or even estimate where each party could have gone wrong, but I also can't believe that easily that TSMC is the only one to blame here.

I would blame ATI and NV, except for the minor point that ATI yields were fine, then they tanked for no particular reason. I am not sure about NV yields; I will ask next time I get the chance. It isn't a design problem at that point. :) Do you have a good explanation as to how yields going along fine, then suddenly tanking, could be an ATI or NV issue?

-Charlie
 
It probably is, for the workstation/GPGPU market. From the gaming market's point of view, I think it's more of a "who cares" situation.

No, from the gaming point of view, it is a lot of transistors that are not driving frame rates up. It is a huge loss from an areal efficiency point of view.

If you NEED it, you need it. If you don't, it is a millstone. I would assume that more than 99.9% of users don't NEED it.

-Charlie
 
Of course, but I only remember AMD promoting their HD 48xx series with the "over 1 TeraFLOP of power" thingy, as if it mattered. Obviously for Regular Joe, who doesn't even know what that is, it may have had some appeal :)

It was Michael Hara who started it before G92 came out; he already mentioned that GT200 would be close to 1 TFLOP/s (May 2007). Later they promoted that their Tesla board would get between 1 and 1.33 TFLOP/s. They just got one-upped (remember the moon landing picture). This was all posted by Arun.

http://www.howtofixcomputers.com/fo...rformance-1-1-33-tflop-155136.html#post633554

And before that, in 2005, NV was pushing FLOPs hard for RSX. Sony and MS were both caught up in the game, and there were endless threads concerning FLOPs and utilization. FLOPs were also marketed with NV2A. (Of course triangles, and before that MIPS, were also common fodder.) The FLOPs game has gone round and round in GPU circles for a while, and NV isn't some innocent in the FLOPs issue. They have routinely used them as a metric to measure their chips against the competition. The question as always is how relevant that PR metric is.
 
The question as always is how relevant that PR metric is.

I'm sure relevance goes hand in hand with your performance in that metric relative to your competitor at the time. If you're up, promote it; if you're not, downplay it. Nvidia seem to be experts in this type of PR see-saw.
 
Given respective generational architectural progressions, though, relative FLOPs can be read and extrapolated.
2x the FLOPs for Cypress compared to RV790 turned into something like a 1.4x performance increase in games. So even in this case, extrapolating anything from RV790 FLOPs would probably give you wrong results. And GF100 has far less in common with GT200 than Cypress has with RV790, which means that extrapolating from FLOPs here is even more pointless.
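Putting rough numbers on that (the peak figures commonly quoted for the HD 4890 and HD 5870, plus the ~1.4x game speedup assumed above), a quick back-of-the-envelope sketch:

```python
# Rough sketch: how far theoretical FLOPs over-predict game performance
# across one generation. Peak numbers are the commonly quoted ones; the
# 1.4x game speedup is the figure assumed in the post above.
rv790_tflops = 1.36    # HD 4890 peak single-precision TFLOPS
cypress_tflops = 2.72  # HD 5870 peak single-precision TFLOPS
observed_speedup = 1.4

flops_ratio = cypress_tflops / rv790_tflops   # 2.0x on paper
efficiency = observed_speedup / flops_ratio   # fraction actually realized

print(f"FLOPs ratio:       {flops_ratio:.2f}x")
print(f"Observed speedup:  {observed_speedup:.2f}x")
print(f"Realized per FLOP: {efficiency:.0%}")  # ~70%
```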
 
2x the FLOPs for Cypress compared to RV790 turned into something like a 1.4x performance increase in games. So even in this case, extrapolating anything from RV790 FLOPs would probably give you wrong results. And GF100 has far less in common with GT200 than Cypress has with RV790, which means that extrapolating from FLOPs here is even more pointless.

Wow, you have a game that ONLY requires FLOPS on the GPU and nothing else? What level are you on?

-Charlie
 
No, from the gaming point of view, it is a lot of transistors that are not driving frame rates up. It is a huge loss from an areal efficiency point of view.

If you NEED it, you need it. If you don't, it is a millstone. I would assume that more than 99.9% of users don't NEED it.

-Charlie

Must be like that tessellation unit, around since the R600 days. It wasn't used by anyone, but you promoted it like it was the best thing ever! They must've NEEDed it :)
 
Is tessellation a big area consumer? The Xbox 360 GPU has tessellation; the Xenos design and the way tessellation was implemented made it relatively cheap, IIRC. Dave?
 
Must be like that tessellation unit, around since the R600 days. It wasn't used by anyone, but you promoted it like it was the best thing ever! They must've NEEDed it :)


The tessellator is an absolutely tiny transistor budget compared to the kind of stuff Nvidia are claiming for Fermi, and at least the tessellator gets used for DX11. That massive Nvidia transistor investment in things like double precision and ECC doesn't benefit the gaming community. If you're only interested in HPC, they are great; if you're interested in gaming, they are a big fat albatross around your neck.

Unless of course Nvidia is going to spend time and money designing a whole new chip that's effectively a cut-down Fermi design with massive fundamental changes.
 
The tessellator is an absolutely tiny transistor budget compared to the kind of stuff Nvidia are claiming for Fermi, and at least the tessellator gets used for DX11. That massive Nvidia transistor investment in things like double precision and ECC doesn't benefit the gaming community. If you're only interested in HPC, they are great; if you're interested in gaming, they are a big fat albatross around your neck.

Unless of course Nvidia is going to spend time and money designing a whole new chip that's effectively a cut-down Fermi design with massive fundamental changes.

Not really. The tessellator unit did squat for the HD 2000 and HD 3000 lines. I'm not sure they even kept it in the HD 4000 line, which is what would make sense. Only now is it being used (or about to be).

As for Fermi, Bill Dally already said as much. ECC is not a feature for the GeForce line, and he even mentioned that they could reduce DP FLOPS capacity for the GeForces as well. Tesla chips and GeForce chips based on Fermi are different.
 
As for Fermi, Bill Dally already said as much. ECC is not a feature for the GeForce line, and he even mentioned that they could reduce DP FLOPS capacity for the GeForces as well. Tesla chips and GeForce chips based on Fermi are different.
So when is nvidia planning on taping out this other chip?

-FUDie
 
Not really. The tessellator unit did squat for the HD 2000 and HD 3000 lines. I'm not sure they even kept it in the HD 4000 line, which is what would make sense. Only now is it being used (or about to be).
It is there in the HD 4000 series; in fact, it was improved upon.
 
So when is nvidia planning on taping out this other chip?

-FUDie

That's certainly something only NVIDIA can answer, but the tape-out for the GeForce chip should be the one we've been hearing about, the one that's slated for Q1 2010.
The "bigger" version, the one for Tesla with all the features designed for the HPC market, is slated for May 2010.
 
Not really. The tessellator unit did squat for the HD 2000 and HD 3000 lines. I'm not sure they even kept it in the HD 4000 line, which is what would make sense. Only now is it being used (or about to be).

But the tessellator is a very small and discrete part of the AMD chips. The "non-useful to gamers" parts of Fermi are much larger and much more fundamental to its design. The millstone that Fermi carries for gamers is much heavier, and much harder to work around in its gamer parts (not that we've seen any indication of such parts so far).
 
That massive Nvidia transistor investment in things like double precision
I have my doubts it's massive. The ALU design appears to be fp32 + int32. Two of these, i.e. 2x fp32 + 2x int32, appear to satisfy the bulk of the DP implementation (i.e. they will compute one MAD). If NVidia's decided to keep the full-range subnormal support, then a wodge of extra bits is required for the adder, but adder bits come pretty cheap, comparatively speaking.
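To give a flavour of the arithmetic behind that pairing, here is a minimal software analogue: a "double-single" multiply that recovers the exact fp32 product error using only fp32 operations (Dekker's algorithm, four partial products). This only illustrates the principle that paired single-precision operations can reach double-width results; the actual Fermi datapath is a hardware design, and everything below (numpy, the function names) is my own sketch:

```python
import numpy as np

F = np.float32  # force all arithmetic into single precision

def split(a):
    """Dekker's split: break a float32 into high/low halves whose
    pairwise products are exactly representable. 4097 = 2**12 + 1."""
    c = F(4097) * a
    hi = c - (c - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    """Product of two float32 values with the rounding error recovered,
    using only float32 operations (four partial products, much like
    combining two narrow multipliers)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err  # p + err is the exact product

a, b = F(1.1), F(3.3)
p, err = two_prod(a, b)
print(float(p))                # single-precision result
print(float(p) + float(err))   # much closer to the exact product
```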

and ECC doesn't benefit the gaming community.
NVidia's sacrificing bandwidth/capacity in order to implement off-die ECC.

On-die memories presumably have ECC always-on. That's a memory cell overhead very much in the same ballpark as redundancy. Redundancy for memory cells is pretty bloody cheap, as memory is super-dense, and the addressing overhead is nearly non-existent, too.

So the real cost of ECC is the logic to compute the checksums, signal problems and perform corrections. I don't remember any specification of its strength and I don't know how much these things cost.
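For a sense of what that logic involves, here is a toy SECDED (single-error-correct, double-error-detect) sketch over one byte. Real controllers work on wider codewords, typically 8 check bits per 64 data bits, about 12.5% overhead; the layout below is a textbook Hamming-plus-parity construction, not anything NVidia has documented:

```python
from functools import reduce
from operator import xor

DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two slots hold data

def encode(byte):
    """Hamming(12,8) plus an overall parity bit: a 13-bit SECDED codeword."""
    w = [0] * 13                       # w[1..12] Hamming word, w[0] overall parity
    for i, slot in enumerate(DATA_POS):
        w[slot] = (byte >> i) & 1
    for p in (1, 2, 4, 8):             # each check bit covers slots with bit p set
        w[p] = reduce(xor, (w[i] for i in range(1, 13) if i & p and i != p), 0)
    w[0] = reduce(xor, w[1:], 0)       # extra parity enables double-error detection
    return w

def decode(w):
    """Return (byte, status): corrects 1-bit errors, detects 2-bit errors."""
    syn = reduce(xor, (i for i in range(1, 13) if w[i]), 0)
    parity = reduce(xor, w, 0)
    if syn and parity:                 # single-bit error: correctable
        w[syn] ^= 1
        status = "corrected"
    elif syn:                          # two bits flipped: detect, don't correct
        return None, "uncorrectable"
    else:
        status = "ok" if not parity else "parity-bit error"
    byte = sum(w[slot] << i for i, slot in enumerate(DATA_POS))
    return byte, status

w = encode(0xA5)
w[6] ^= 1                              # inject a single-bit fault
print(decode(w))                       # -> (165, 'corrected')
```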

Unless of course Nvidia is going to spend time and money designing a whole new chip that's effectively a cut-down Fermi design with massive fundamental changes.
No need in my opinion.

NVidia can save the cost-cutting techniques for the smaller chips, just like AMD has restricted DP to RV870.

Jawed
 
Can someone fix the handling of pasted URLs? It's broken, as this shows:

Transparent error correction code memory system and method

For the real link go here:

http://v3.espacenet.com/publication...=B1&FT=D&date=20061003&DB=EPODOC&locale=en_EP

The present invention provides a flexible and efficient memory configuration that is capable of economically addressing both resource consumption and ECC concerns. A memory system facilitates transparent ECC operations without dedicated ECC connections. A first dynamic random access memory structure stores data, wherein the data connections to the memory system are limited to the width of the first dynamic random access memory structure. A second dynamic random access memory structure is dedicated to storing error correction code information, wherein the error correction code information is accessed via the data connections. In one exemplary implementation, the first memory structure (the data) and the second memory structure (the ECC) are included in the same memory bank. In an alternate implementation, the first memory structure and the second memory structure are included in different memory banks and are accessed in parallel.
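A toy model of the scheme the abstract describes, check bits kept in a carved-out region of the same DRAM and fetched over the same data pins, so ECC costs capacity and an extra access rather than dedicated connections (the names and the 1:8 ratio here are illustrative assumptions, not the patent's):

```python
# Toy model of "transparent" ECC in ordinary DRAM: reserve a region of
# the same memory for check bits instead of adding dedicated ECC pins.
# The 8-data-bytes-per-ECC-byte ratio matches a typical SECDED layout,
# but the mapping itself is illustrative, not the patent's.

TOTAL_BYTES = 1 << 30            # 1 GiB of physical DRAM
ECC_RATIO   = 8                  # 1 ECC byte protects 8 data bytes
DATA_BYTES  = TOTAL_BYTES * ECC_RATIO // (ECC_RATIO + 1)
ECC_BASE    = DATA_BYTES         # check bits live above the data region

def ecc_addr(data_addr):
    """Where the check byte for a given data byte lives; every protected
    access implies a second, smaller access over the same data pins."""
    assert data_addr < DATA_BYTES
    return ECC_BASE + data_addr // ECC_RATIO

print(f"usable data: {DATA_BYTES / TOTAL_BYTES:.1%} of DRAM")  # ~88.9%
print(hex(ecc_addr(0x1234)))
```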
 