NVIDIA Fermi: Architecture discussion

rpg.314 · Jul 25, 2010

NathansFortune said:
It wouldn't make sense for Nvidia to stick with A2/3 and get a neutered chip when a silicon respin would fix a lot of their problems and increase yields...

A silicon respin could certainly improve upon many performance and efficiency metrics. However, a bigger question, at least for me, is whether there is a reasonable probability of this happening, assuming Fermi 2/Fermi's shrink is due this winter.

Alexko · Jul 25, 2010

There can't be any shrink this winter, since TSMC's 40nm process is the smallest available.

NathansFortune · Jul 25, 2010

rpg.314 said:
A silicon respin could certainly improve upon many performance and efficiency metrics. However, a bigger question, at least for me, is whether there is a reasonable probability of this happening, assuming Fermi 2/Fermi's shrink is due this winter.

TSMC and GlobalFoundries have stated 28nm isn't going to be ready until H2 2011. That means Nvidia need to get more mileage out of their current designs.

DegustatoR · Jul 25, 2010

Silent_Buddha said:
Unless they were ditching GF100 entirely as ATI did with R520/R600. And instead there's a new modified chip coming out for the fall.

GF100B (or whatever they'll call it in the end) is what's coming in the Fall. There may be a more or less new (still Fermi-based at its core) 40G GF100B replacement down the road, but its fate will depend on a lot of factors, and I won't be surprised if they wait for 28HP for their next top-end GPU.

rpg.314 · Jul 25, 2010

NathansFortune said:
TSMC and GlobalFoundries have stated 28nm isn't going to be ready until H2 2011. That means Nvidia need to get more mileage out of their current designs.

Must have missed it. Do you have a recent link?

Also, it means at best, we can expect a hybrid part this year from AMD.

NathansFortune · Jul 25, 2010

rpg.314 said:
Must have missed it. Do you have a recent link?

Also, it means at best, we can expect a hybrid part this year from AMD.

I can't find the link, but it was from the TSMC Fab 15 article somewhere. The CEO said 40nm was their concern right now and 28nm is delayed until later in 2011, probably H2. GlobalFoundries have a similar outlook, as 32nm for AMD and ARM is their primary concern.

Blazkowicz · Jul 25, 2010

How interesting, as the People's Republic of China's domestic CPU industry (Loongson processors) stated they aim for 32nm at the end of 2011.

Alexko · Jul 25, 2010

NathansFortune said:
I can't find the link, but it was from the TSMC Fab 15 article somewhere. The CEO said 40nm was their concern right now and 28nm is delayed until later in 2011, probably H2. GlobalFoundries have a similar outlook, as 32nm for AMD and ARM is their primary concern.

As far as I'm aware, this is GlobalFoundries' latest public roadmap:

And I haven't heard of any changes to the 28nm schedule since then.

aaronspink · Jul 26, 2010

Blazkowicz said:
How interesting, as the People's Republic of China's domestic CPU industry (Loongson processors) stated they aim for 32nm at the end of 2011.

They've stated a lot of things over time, and they contract out the fabrication to non-Chinese companies.

Man from Atlantis · Jul 31, 2010

GF104, GF100 Core Architecture Comparison
http://news.mydrivers.com/Img/20100730/02501268.jpg
GF104 SM architecture (partly speculative)
http://news.mydrivers.com/Img/20100730/02503995.jpg
GF100 SM architecture
http://news.mydrivers.com/Img/20100730/02504021.jpg
Evolution diagram of NVIDIA graphics cores in recent years
http://news.mydrivers.com/Img/20100730/02521912.jpg
G80, GT200, GF100, GF104: comparison of core memory and multithreading
http://news.mydrivers.com/Img/20100730/02521937.jpg

trinibwoy · Aug 2, 2010

Did Nvidia beef up GF104's texture units? Was just browsing Damien's English review and it seems FP16 and RGB9E5 are now full speed as opposed to half speed on GF100.

Alexko · Aug 2, 2010

Damien's reviews usually deserve a bit more than a quick browsing…

Moreover, the texturing units have been improved to filter FP16 textures (as well as FP11, FP10 and RGB9E5) at full speed.

http://www.behardware.com/articles/795-2/report-nvidia-geforce-gtx-460.html

trinibwoy · Aug 3, 2010

Of course, thanks. Saw it on my second read through

Wonder why they bothered.

KimB · Aug 3, 2010

trinibwoy said:
Of course, thanks. Saw it on my second read through. Wonder why they bothered.

My first guess would be that it was something that was intended for the GF100 all along, but there was a bug in the hardware implementation that forced them to implement these modes with reduced performance.

As for why they would have wanted to go this route in the first place, well, that would make sense if they feel that these modes will become more and more common as time goes forward, and if the added hardware cost was minimal.

mczak · Aug 3, 2010

Maybe the full-speed fp16 was just a later addition which didn't make it for GF100.
That said, it would imho make more sense for GF100 than GF104, since GF100 has a lower tex:alu ratio (and also higher memory bandwidth per tex). Unless you think it doesn't matter for GF100 since it looks more useful for non-gaming uses anyway.

ShaidarHaran · Aug 3, 2010

Interesting that the fp formats have seen performance increases from GF100->GF104, but the int formats have seen performance decreases. Also, there appears to be a hard cap @ 33.3 GTexels/s for 3 of the formats. Any thoughts as to what might be causing this? Is it a lack of cache or cache bandwidth? Some other architectural limitation? I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.

TKK · Aug 3, 2010

ShaidarHaran said:
I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.

Also, if that were the case, there should be a difference between the two GTX 460 variants, and there isn't.

Gipsel · Aug 3, 2010

ShaidarHaran said:
Also, there appears to be a hard cap @ 33.3 GTexels/s for 3 of the formats. Any thoughts as to what might be causing this? Is it a lack of cache or cache bandwidth? Some other architectural limitation? I don't think it's a lack of VRAM or VRAM bandwidth since GF104 out-performs GT200b in 2 of the 3 formats.

It's the theoretical max throughput of the 56 TMUs * 0.675 GHz = 37.8 GTexel/s. Obviously the efficiency (88%) is slightly lower than on AMD GPUs (~98% or so) for these simple tasks.

mczak · Aug 3, 2010

Gipsel said:
It's the theoretical max throughput of the 56 TMUs * 0.675 GHz = 37.8 GTexel/s. Obviously the efficiency (88%) is slightly lower than on AMD GPUs (~98% or so) for these simple tasks.

I think the more interesting comparison is GTX470/480: 60 TMUs * 0.7 GHz = 42 GTexels/s, and it is achieving 41.4 GTexels/s (for int8 only though), which is 99%. So for some odd reason GF104 can achieve less of the peak potential of the TMUs.
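
For anyone who wants to redo the arithmetic from the last two posts, here is a minimal Python sketch. It assumes one bilinear texel per TMU per clock, which is the model both posts use; the function names and the `cards` table are made up for illustration, and the measured values are the ones quoted in this thread.

```python
def peak_gtexels(tmus: int, clock_ghz: float) -> float:
    """Theoretical texel rate, assuming one texel per TMU per clock."""
    return tmus * clock_ghz


def efficiency(measured: float, peak: float) -> float:
    """Fraction of the theoretical peak that was actually measured."""
    return measured / peak


# (TMU count, core clock in GHz, measured GTexels/s), all taken from the thread.
cards = {
    "GF104 (GTX 460)": (56, 0.675, 33.3),
    "GF100 (int8)": (60, 0.700, 41.4),
}

for name, (tmus, clock, measured) in cards.items():
    peak = peak_gtexels(tmus, clock)
    print(f"{name}: peak {peak:.1f} GTexels/s, measured {measured:.1f}, "
          f"efficiency {efficiency(measured, peak):.1%}")
```

This reproduces the roughly 88% figure for GF104 and the roughly 99% figure for GF100 mentioned above.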

CarstenS · Aug 3, 2010

I'm showing (almost) the same here. 33.8 GTex/s is the maximum I can get out of a stock GF104 with bilinear filtering. With trilinear it's a more expected 18.9 GTex/s. Together with the point sampling result of, again, 33.8 GTex/s, I'm guessing it's maybe interpolation or address bound.

An HD5830 is literally miles away at 43.6 and 22.4 GTex/s.
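
A rough way to see why the trilinear number looks "expected" while the bilinear and point-sampling numbers do not: the sketch below compares the GF104 measurements above against a simple filter-bound model. The cycle costs (one TMU cycle for point and bilinear, two for trilinear) are an assumption made for illustration, not something stated in the thread, and the names are hypothetical.

```python
TMUS = 56          # GF104 TMU count, as quoted earlier in the thread
CLOCK_GHZ = 0.675  # GTX 460 core clock, as quoted earlier in the thread

# Measured GF104 rates in GTexels/s, from the post above.
measured = {"point": 33.8, "bilinear": 33.8, "trilinear": 18.9}

# ASSUMPTION: TMU cycles needed per filtered texel in each mode.
cycles_per_texel = {"point": 1, "bilinear": 1, "trilinear": 2}


def expected_rate(mode: str) -> float:
    """Filter-bound texel rate for a mode under the assumed cycle cost."""
    return TMUS * CLOCK_GHZ / cycles_per_texel[mode]


def fraction_of_expected(mode: str) -> float:
    """Measured rate divided by the filter-bound rate."""
    return measured[mode] / expected_rate(mode)


for mode in measured:
    print(f"{mode}: expected {expected_rate(mode):.1f}, "
          f"measured {measured[mode]:.1f}, "
          f"ratio {fraction_of_expected(mode):.1%}")
```

Under that assumption, trilinear lands on the filter-bound rate while point and bilinear both stop at the same lower figure, which would be consistent with a limit somewhere other than the filtering itself.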
