NVIDIA Fermi: Architecture discussion

That Fermi will fail in the GPU market is simple physics and clear-eyed reality; fanboyism and bias are sideshows to that central fact.

In the GPU market, even if Nvidia experienced a miraculously smooth development/fabrication cycle with Fermi, it still starts with an albatross around its neck compared to the 5800 series: a larger physical die, superfluous (for gaming) functions, and a 384-bit interface. It will therefore inherently cost more per chip and per board, and will have lower power efficiency in GPU applications. That's built into the chip architecture.
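
To put the die-size point in perspective, here is a rough dies-per-wafer sketch (Python, using the standard approximation; the die areas are rumoured/approximate figures, so treat the output as illustrative only):

    # Rough gross dies-per-wafer estimate on a 300 mm wafer.
    # Die areas below are approximate/rumoured, purely for illustration.
    import math

    def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
        r = wafer_diameter_mm / 2
        return int(math.pi * r * r / die_area_mm2
                   - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

    print(dies_per_wafer(334))  # Cypress-class die: ~175 gross candidates
    print(dies_per_wafer(530))  # Fermi-class die:   ~104 gross candidates

And that's before yield, which also falls as die area grows, so the per-chip cost gap is worse than the raw candidate counts suggest.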

So even in a best-case scenario Nvidia faced an uphill battle in a cost/performance contest with Evergreen.

But Fermi's progress is far from a best case scenario ...

No leaked benchmarks ~= even in an optimized set-up, the hard numbers aren't faring well against the 5870 ... nothing to crow about, for sure, or they would be crowing.

“We expect [Fermi] to be the fastest GPU in every single segment. The performance numbers that we have [obtained] internally just [confirm] that. So, we are happy about this and are just finalizing everything on the driver side” ~= they NEED, as fully as possible, to harness the GPGPU functions for gaming/benchmarks to make up for reduced SP counts and clocks. Under no circumstances can they release a card that is at or close to par with the 5870; that would be a PR nightmare and disastrous in nearly every way possible.

So AMD starts with a big advantage, the 6 month release differential increases that advantage, the reduced Fermi capabilities increases that advantage still further ... Fermi will NEVER be cost/performance competitive with the 5000 series, perhaps not even close to it.

Add in the coming AMD respins, the coming 6000 series, Bulldozer (by all reports ahead of schedule), and Fusion, which will start by incorporating the 5000 architecture and move to each new GPU architecture the year following. AMD also says the 6000 series will be a brand new architecture (and at 28 nm), which is not likely to be good news for Nvidia on the GPGPU front, and that fight will unfold WHILE AMD rules the cash-rich GPU space from top to bottom. Nvidia is looking at a very grim future scenario, one in which missteps and mistakes will be very costly.

And that's not even bringing Intel into the picture.
 
From a corporate perspective, competition is better defined in terms of fps/area. No idea how it is shaping up in that regard.
Isn't it more in terms of total board cost/(performance+features) vs the same from the other guys? You would ask whether the wider bus and the added board complexity are worth it relative to the cost and performance gained vs what the other guys can do with 256-bit.
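
For what it's worth, both metrics are trivial to compute once you have the inputs; a minimal sketch (Python), with every number below being a placeholder rather than real Cypress/Fermi data:

    # Two ways of scoring a part, per the discussion above.
    def fps_per_area(avg_fps, die_area_mm2):
        # The "corporate" view: performance extracted per mm^2 of silicon.
        return avg_fps / die_area_mm2

    def fps_per_board_dollar(avg_fps, board_cost_usd):
        # The buyer's view: performance per dollar of total board cost.
        return avg_fps / board_cost_usd

    # Hypothetical 256-bit vs 384-bit boards (made-up numbers):
    print(fps_per_area(60, 334), fps_per_area(66, 530))
    print(fps_per_board_dollar(60, 300), fps_per_board_dollar(66, 450))

The wider bus only pays off if the extra fps outweighs the extra silicon and board cost in whichever of those ratios you care about.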
 
Yep, and he makes a lot of sense.

That's why there's a lot of speculation (despite Nvidia's denials) that they are going primarily for HPC with mainstream markets now a secondary thing. They can't compete with ATI in the mainstream space (unless they are willing to effectively subsidise every chip), so they are sidestepping that battle and trying to carve out their own market in the form of GPU-based HPC. The alternative is to follow AMD's "smaller is more" design philosophy, or come up with something spectacularly ahead of the times. It looks like current processes or their designers are not up to the latter.

Nvidia have pretty much done the same thing with their chipset business. There were lots of denials there, and now it all seems to be concentrated on Ion and Tegra, with the desktop chipset business effectively abandoned by Nvidia in the face of crippling competition from AMD and Intel. Nvidia can no longer charge a premium now that they no longer provide extra features and performance, and Intel and AMD have eaten that business.

Nvidia has had to snake out into a different field, and it looks to me that they may be doing the same with Fermi and what seems to be a heavy focus on GPGPU/HPC markets. That focus carries extra cost for the mainstream market while giving that market no benefit in return.
 
There must be some black magic inside Fermi GPU to beat HD5970 in games. No hardware DX11 pipeline, only dedicated shaders to emulate it via drivers. The expected reduction in core count from 512 to 448 (-12.5%) won't speed up this process, and the best batches of silicon going into Tesla units are not good for the GF family either. The whole GPU is optimized for HPC, and a big chunk of the transistor budget is spent on cache, DP FP units and other blocks that are dedicated to general-purpose computation and to supporting high-level programming languages like C++. With this in mind I can say that the transistor count that actually means something to FPS in games is greater in the Cypress architecture than in Fermi's.
So do you still think that a single Fermi GPU will beat HD5970 in Dirt2 or BF BC2?
 
Add in the coming AMD respins, the coming 6000 series, Bulldozer (by all reports ahead of schedule), and Fusion, which will start by incorporating the 5000 architecture and move to each new GPU architecture the year following. AMD also says the 6000 series will be a brand new architecture (and at 28 nm), which is not likely to be good news for Nvidia on the GPGPU front, and that fight will unfold WHILE AMD rules the cash-rich GPU space from top to bottom. Nvidia is looking at a very grim future scenario, one in which missteps and mistakes will be very costly.

IMHO (layman perspective):

The 6000 series, Bulldozer, or even Larrabee are not the biggest threat to Fermi in GPGPU; Fusion is.

Fusion is said to have 480 ALUs and ~1 TFLOPS SP performance. Therefore the DP speed should be around 200 GFLOPS. On top of that, Fusion has access to the complete main RAM and 4 K10.5 cores.

If AMD made no big mistakes and included support for ECC and more than one HT3 interface, it should be simple to plug 2 - 4 Fusions onto a server board and voilà, HPC performance en masse. Bandwidth could be a problem though.
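
For reference, the arithmetic behind those numbers as a small sketch (Python); the ALU count is the rumour above, while the clock and the 1/5 DP rate are my assumptions based on how the current Radeon parts work:

    # Back-of-the-envelope FLOPS estimate for the rumoured Fusion GPU part.
    alus = 480                    # rumoured ALU count
    flops_per_alu_per_clock = 2   # multiply-add counted as 2 FLOPs (assumed)
    clock_hz = 1.04e9             # clock implied by the ~1 TFLOPS SP claim

    sp_flops = alus * flops_per_alu_per_clock * clock_hz
    dp_flops = sp_flops / 5       # assumed Radeon-style DP rate of 1/5 SP

    print("SP: %.2f TFLOPS" % (sp_flops / 1e12))  # ~1.0 TFLOPS
    print("DP: %.0f GFLOPS" % (dp_flops / 1e9))   # ~200 GFLOPS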
 
If AMD made no big mistakes and included support for ECC and more than one HT3 interface, it should be simple to plug 2 - 4 Fusions onto a server board and voilà, HPC performance en masse. Bandwidth could be a problem though.

As far as I know all their Opteron models support ECC; it would be weird if they suddenly didn't.
The main difference of course being that Fermi's ECC is calculated on the GDDR chip while the Opterons use an ECC chip on the DIMM.
 
That's why there's a lot of speculation (despite Nvidia's denials) that they are going primarily for HPC with mainstream markets now a secondary thing.

That's a nice theory except for the inconvenient fact that HPC cannot sustain Nvidia. I'm not seeing any evidence of Nvidia leaving the mainstream market. It's funny really to see people interpret expansion into new markets as a sign of abandoning existing ones. Last I checked Nvidia was still making more money off graphics cards than anyone else.
 
The main difference of course being that Fermi's ECC is calculated on the GDDR chip while the Opterons use an ECC chip on the DIMM.
ECC is calculated on the chip in all cases. There is no such thing as an "ECC chip on the DIMM". ECC DIMMs store 72 instead of 64 bits for every line. That's all.

The problem is that there is no way to build a 72-bit GDDR5 interface, so NVIDIA takes some of the 64 bits to do ECC. And that's why you have less total memory available when ECC is enabled.
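
Roughly, the capacity cost of that looks like this; a minimal sketch (Python) assuming about one eighth of the memory gets set aside for check bits, which is my assumption, not a disclosed Fermi figure:

    # Why ECC-on means less usable memory on Fermi: with no 72-bit side-band,
    # the check bits have to live in ordinary GDDR5 alongside the data.
    total_mib = 3072            # hypothetical 3 GiB board
    reserved_fraction = 0.125   # assumed 1/8 set aside for check bits
    usable_mib = total_mib * (1 - reserved_fraction)
    print("%.0f MiB usable with ECC enabled" % usable_mib)  # 2688 MiB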
 
The main difference of course being that Fermi's ECC is calculated on the GDDR chip while the Opterons use an ECC chip on the DIMM.

Yep, and that's IMHO one of the weirdest things. AFAIK ECC is normally done with 9-bit DRAM with the ninth bit used for error correction. Fermi uses normal GDDR5 with an 8-bit "organisation" and access (multiples of 8). Therefore, with 1 bit reserved for the error correction, Fermi can, as far as I understand it (remember: layman perspective), only access data in 7-bit increments. That seems strange and inefficient.
 
There must be some black magic inside Fermi GPU to beat HD5970 in games. No hardware DX11 pipeline, only dedicated shaders to emulate it via drivers.
There's dedicated hardware for DX11 in many places. There's still a setup engine and ROPs and TMUs. The tessellation part is the one that is believed to reside in a partially or mostly emulated state.
 
AFAIK ECC is normally done with 9-bit DRAM with the ninth bit used for error correction.
No. You couldn't even do error correction with 1 extra bit, only error detection for 1-bit errors. ECC provides 1-bit error correction and 2-bit error detection.

Therefore, with 1 bit reserved for the error correction, Fermi can, as far as I understand it (remember: layman perspective), only access data in 7-bit increments. That seems strange and inefficient.
No, it's not done that way.
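
For reference, the check-bit arithmetic that makes 1 extra bit per 8 insufficient; a small sketch (Python) of the standard Hamming SECDED counting argument, nothing Fermi-specific:

    # Check bits needed for single-error-correct / double-error-detect (SECDED):
    # Hamming needs r bits with 2**r >= data_bits + r + 1, plus one extra
    # overall parity bit to make double-bit errors detectable.
    def secded_check_bits(data_bits):
        r = 0
        while 2 ** r < data_bits + r + 1:
            r += 1
        return r + 1  # +1 overall parity bit for double-error detection

    print(secded_check_bits(64))  # 8 -> hence 64 + 8 = 72-bit ECC DIMMs
    print(secded_check_bits(8))   # 5 -> why one bit per byte can't correct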
 
That's a nice theory except for the inconvenient fact that HPC cannot sustain Nvidia.

Nvidia thinks they can grow the market and get into a new area at the ground floor. It's all "cloud computing" now. It's where they see the big profit margins. I dunno, maybe they are deluding themselves. Maybe they have nowhere else to go. Maybe they see themselves caught between the rock and hard place of AMD and Intel, and need to be creative about their future.

I'm not seeing any evidence of Nvidia leaving the mainstream market. It's funny really to see people interpret expansion into new markets as a sign of abandoning existing ones.

You mean like the rumours at the beginning of the year of Nvidia leaving the chipset business? How about their inability to supply their high end products, or their EOLing of products with no replacement but the MIA Fermi? Their major AIB manufacturers ceasing trading? There's a lot of stuff going on, and there's no doubt in my mind that Fermi as a mainstream product has been compromised for it to be Fermi as an HPC product. Those compromises are part of what has given Nvidia the crippling delays, poor yields and process problems.


Last I checked Nvidia was still making more money off graphics cards than anyone else.

Are they? There have been a lot of rumours about the cost of their very large chips and complex boards for quite some time, and supply seems to be constrained at the high end. IIRC, they lost a lot of market share in the last quarter, and it will get worse once a full quarter of AMD's DX11 cards (coinciding with Christmas and the Win 7 launch) shows up in the figures. And that's despite AMD's constrained supply, which will clear and make the situation worse for Nvidia.
 
Novum, if you are going to tell him he is wrong you might as well tell him how it does work ... ECC modules for PCs are 72 bits wide, with 8 bits of ECC code for every 64 bits of data, implemented by simply using more memory chips in parallel.

As for Fermi ... my assumption is that they simply interleave the ECC codes with the data. IMO if it wasn't for legacy and inertia this should be the way to implement it in PCs as well.
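
To illustrate what interleaving the codes with the data could look like, here's a toy address-map sketch (Python); the 8-data-words-plus-1-check-word grouping and the layout are my assumptions for illustration, not NVIDIA's actual scheme:

    # Toy layout: every ninth 64-bit slot holds the check bits for the
    # preceding eight data slots (purely illustrative, not Fermi's real map).
    GROUP = 9  # 8 data words + 1 word of check bits

    def physical_slot(logical_word):
        """Map a logical data word index to a physical slot, skipping check slots."""
        group, offset = divmod(logical_word, GROUP - 1)
        return group * GROUP + offset

    print(physical_slot(7), physical_slot(8))  # 7 9 -> slot 8 holds check bits

Under this particular assumed layout the usable capacity is 8/9 of the physical memory, which is why ECC-on shows up as a smaller reported memory size.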
 
ECC is calculated on the chip in all cases. There is no such thing as an "ECC chip on the DIMM". ECC DIMMs store 72 instead of 64 bits for every line. That's all.

I think I'm pretty right. It's called an EOS (ECC On SIMM) chip. You'll find that any A-brand (not Dell ;) ) x86/x64 box has EOS chips on its DIMMs.

Check this DIMM and notice the small IC in between the two rows of Memory chips. That's the ECC module.
 
Nvidia thinks they can grow the market and get into a new area at the ground floor.

Yes, that's what expanding into new markets means. What does that have to do with leaving existing ones? Have you heard anything from Nvidia that indicates they think HPC can replace graphics revenues by Q1 2010?

You mean like the rumours at the beginning of the year of Nvidia leaving the chipset business?

Is there an analogy there somewhere? Unless AMD or Intel can revoke Nvidia's license to use the PCIe bus, I'm not sure there is. Besides, Nvidia is a graphics company, not a chipset company.

How about their inability to supply their high end products, or their EOLing of products with no replacement but the MIA Fermi? Their major AIB manufacturers ceasing trading? There's a lot of stuff going on

Yeah, that would all be relevant if a few months of distress dictated the fate of billion-dollar companies. Fortunately, it doesn't.

Are they?

Yes, read their financials. I agree with Bob, it seems "numbers" and "facts" have gone out of style :LOL:
 
I think I'm pretty right. It's called an EOS (ECC On SIMM) chip. You'll find that any A-brand (not Dell ;) ) x86/x64 box has EOS chips on its DIMMs.

Check this DIMM and notice the small IC in between the two rows of Memory chips. That's the ECC module.
I'm pretty sure that's only to catch transmission errors, not random bit flips in memory. Otherwise, you'd find that your 1 GB DIMM only stored 8/9ths of a GB. ECC DIMMs have 9 RAM chips on them.
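
The chip-count arithmetic behind that, as a small sketch (Python); the 128 MB-per-chip figure is just an example capacity:

    # On an ECC DIMM the check bits live in a ninth memory chip, so the
    # advertised capacity (data only) is unchanged; nothing is "lost".
    chip_mb = 128                 # example capacity per DRAM chip
    data_chips, ecc_chips = 8, 1  # 64 data bits + 8 check bits per line
    print("advertised:", data_chips * chip_mb, "MB")                       # 1024 MB
    print("physically fitted:", (data_chips + ecc_chips) * chip_mb, "MB")  # 1152 MB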
 
I think I'm pretty right.
Yet, you are clearly not. I'm 100% positive, because I know quite a bit about coding theory.

Check this DIMM and notice the small IC in between the two rows of Memory chips. That's the ECC module.
That's a registered DIMM. Entirely different story than ECC.

ECC RAM is often registered because it is primarily used in servers, where registered RAM allows more DIMMs per channel.
 