NVIDIA Fermi: Architecture discussion

It wouldn't make sense for Nvidia to stick with A2/A3 and ship a neutered chip when a silicon respin would fix a lot of their problems and increase yields...

A silicon respin could certainly improve upon many performance and efficiency metrics. However, a bigger question, at least for me, is whether there is a reasonable probability of this happening, assuming Fermi 2/Fermi's shrink is due this winter.
 
TSMC and GlobalFoundries have stated that 28nm isn't going to be ready until H2 2011. That means Nvidia needs to get more mileage out of their current designs.
 
Unless they were ditching GF100 entirely, as ATI did with R520/R600, and instead a new modified chip is coming out for the fall.
GF100B (or whatever they'll call it in the end) is what's coming in the fall. There may be a more or less new (still Fermi-based at its core) 40G GF100B replacement down the road, but its fate will depend on a lot of factors, and I won't be surprised if they wait for 28HP for their next top-end GPU.
 
Must have missed that 28nm statement. Do you have a recent link?

Also, it means that, at best, we can expect a hybrid part from AMD this year.
 
I can't find the link, but it was somewhere in the TSMC Fab 15 article. The CEO said 40nm is their concern right now and 28nm is delayed until later in 2011, probably H2. GlobalFoundries has a similar outlook, as 32nm for AMD and ARM is their primary concern.
 
As far as I'm aware, this is GlobalFoundries' latest public roadmap:



And I haven't heard of any changes to the 28nm schedule since then.
 
GF104, GF100 Core Architecture Comparison
http://news.mydrivers.com/Img/20100730/02501268.jpg
GF104 SM architecture (partly speculative)
http://news.mydrivers.com/Img/20100730/02503995.jpg
GF100 SM architecture
http://news.mydrivers.com/Img/20100730/02504021.jpg
Diagram of the evolution of NVIDIA graphics cores in recent years
http://news.mydrivers.com/Img/20100730/02521912.jpg
G80, GT200, GF100, and GF104 compared: cores, memory, and multithreading
http://news.mydrivers.com/Img/20100730/02521937.jpg
 
Did Nvidia beef up GF104's texture units? I was just browsing Damien's English review and it seems FP16 and RGB9E5 are now full speed, as opposed to half speed on GF100.

texturing.png
 
Of course, thanks. Saw it on my second read through :) Wonder why they bothered.
My first guess would be that it was something that was intended for the GF100 all along, but there was a bug in the hardware implementation that forced them to implement these modes with reduced performance.

As for why they would have wanted to go this route in the first place, well, that would make sense if they feel that these modes will become more and more common as time goes forward, and if the added hardware cost was minimal.
 
Maybe the full-speed fp16 was just a later addition which didn't make it for GF100.
That said, it would imho make more sense for GF100 than GF104, since GF100 has a lower tex:ALU ratio (and also more memory bandwidth per texel). Unless you think it doesn't matter for GF100, since it looks more geared toward non-gaming uses anyway...
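(A minimal back-of-the-envelope sketch of that ratio argument, in Python. The TMU counts and clocks are the ones quoted later in this thread; the ALU counts and memory bandwidths are my assumptions for the shipping GTX480 and GTX460 1GB configurations, so treat the exact figures as illustrative.)

```python
# Rough tex:ALU and bandwidth-per-texel comparison for GF100 vs GF104.
# TMU counts/clocks as quoted in this thread; ALU counts and memory
# bandwidths are assumed shipping-config values (GTX480 / GTX460 1GB).
chips = {
    "GF100 (GTX480)": {"tmus": 60, "clock_ghz": 0.700, "alus": 480, "bw_gbs": 177.4},
    "GF104 (GTX460)": {"tmus": 56, "clock_ghz": 0.675, "alus": 336, "bw_gbs": 115.2},
}

for name, c in chips.items():
    peak_gtex = c["tmus"] * c["clock_ghz"]        # peak bilinear fill rate, GTexel/s
    tex_alu = c["tmus"] / c["alus"]               # TMUs per ALU
    bw_per_tex = c["bw_gbs"] / peak_gtex          # bytes of DRAM bandwidth per texel
    print(f"{name}: {peak_gtex:.1f} GTexel/s peak, "
          f"tex:ALU = {tex_alu:.3f}, {bw_per_tex:.1f} B/texel")
```

On those assumptions GF100 does come out with the lower tex:ALU ratio and more bandwidth per texel, which is the point being made above.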
 
Interesting that the fp formats have seen performance increases going from GF100 to GF104, but the int formats have seen decreases. Also, there appears to be a hard cap at 33.3 GTexels/s for three of the formats. Any thoughts as to what might be causing this? Is it a lack of cache or cache bandwidth? Some other architectural limitation? I don't think it's a lack of VRAM or VRAM bandwidth, since GF104 outperforms GT200b in two of the three formats.
 
That cap comes from the theoretical max throughput of the 56 TMUs: 56 * 0.675 GHz = 37.8 GTexel/s. Obviously the efficiency (88%) is slightly lower than on AMD GPUs (~98% or so) for these simple tasks.
 
I think the more interesting comparison is GTX470/480: 60 TMUs * 0.7 GHz = 42 GTexels/s, and it achieves 41.4 GTexels/s (for int8 only, though), i.e. 99%. So for some odd reason GF104 extracts less of the peak potential of its TMUs.
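(As a sanity check, here is the same efficiency arithmetic written out as a small Python sketch, using only the TMU counts, clocks, and measured rates quoted in the last few posts.)

```python
# Measured vs. theoretical bilinear texel rates quoted above.
cases = [
    # (name, TMUs, clock in GHz, measured GTexel/s)
    ("GF104",      56, 0.675, 33.3),
    ("GTX470/480", 60, 0.700, 41.4),
]

for name, tmus, clock_ghz, measured in cases:
    peak = tmus * clock_ghz
    print(f"{name}: peak {peak:.1f} GTexel/s, measured {measured} "
          f"-> {100 * measured / peak:.0f}% efficiency")
```

This reproduces the ~88% and ~99% figures discussed above.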
 
I'm seeing (almost) the same here: 33.8 GTex/s is the maximum I can get out of a stock GF104 with bilinear filtering. With trilinear it's a more expected 18.9 GTex/s. Together with the point-sampling result of, again, 33.8 GTex/s, I'm guessing it's maybe interpolation- or address-bound.

An HD5830 is literally miles away at 43.6 and 22.4 GTex/s.
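(Applying the same arithmetic to the HD5830 figure above, assuming its 56 texture units run at a 0.8 GHz stock clock, which is an assumption on my part, gives an efficiency roughly in line with the ~98% quoted earlier for AMD.)

```python
# HD5830: assumed 56 TMUs at 0.8 GHz stock; 43.6 GTex/s is the bilinear result quoted above.
tmus, clock_ghz, measured = 56, 0.800, 43.6
peak = tmus * clock_ghz                # 44.8 GTex/s theoretical bilinear peak (on this assumption)
print(f"HD5830: peak {peak:.1f} GTex/s, measured {measured} "
      f"-> {100 * measured / peak:.0f}% efficiency")
```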
 