NVIDIA GF100 & Friends speculation

Broadly speaking, int32 mul will be needed for load/store operations, while spfp for alu ops. So spfp:int should be higher than 1:1.
I won't disagree. ATI's 1:4/1:5, but the nature of the architecture (i.e. 5-issue) makes it tend towards being comparatively lower. By contrast a low ratio on NVidia is more costly for throughput, because a larger proportion of the chip's ALUs' capability is "blocked" while computing an int32 MUL (if emulated).
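To illustrate where the int32 MULs actually come from, here's a made-up CUDA kernel (nothing NVidia-specific is being claimed, it's just the usual 2D-indexing pattern):

Code:
// Hypothetical kernel: the integer multiplies only exist to build the address,
// while the "real" work is a single SP float multiply.
__global__ void scale_rows(float *out, const float *in, int width, float k)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // int mul + add (addressing)
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // int mul + add (addressing)
    int idx = y * width + x;                         // int32 MUL to flatten the 2D index
    out[idx] = in[idx] * k;                          // SP FP multiply - the actual ALU work
}

In something this trivial the addressing multiplies actually outnumber the FP work, but anything with real math per element amortises that handful of address calculations over many FP ops, which is the argument for spfp:int being well above 1:1.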

The tricky question is working out what's a reasonable throughput.

To fit 3 GPCs in GF104 and be < 300mm², the GPCs would have to be ~20% smaller. Anyone got any ideas for how much per-GPC space would be saved deleting DP and int32 MUL?
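To put rough numbers on that 20% (the ~66mm² per GPC is only my eyeball from the die shot, an assumption rather than a measurement):

\[
0.2 \times 66\ \mathrm{mm^2} \approx 13\ \mathrm{mm^2}\ \text{per GPC}, \qquad 3 \times 13\ \mathrm{mm^2} \approx 40\ \mathrm{mm^2}\ \text{across the chip}
\]

So DP plus the int32 MUL path would have to be worth something like 13mm² of each GPC for the sub-300mm² story to hold up.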

Jawed
 
That is completely ridiculous. This would, if it were true, have nVidia designing not one, but two DX11 architectures, instead of just designing one DX11 architecture and scaling it down.

I was referring to the specs; I didn't say it's GT212 with DX11 tacked on, but they seem related when just looking at the basic specs. GF104 is larger than Cypress.
 
I was referring to the specs; I didn't say it's GT212 with DX11 tacked on, but they seem related when just looking at the basic specs. GF104 is larger than Cypress.

Define basic specs. For me basic specs are like:
  • Number of Shaders
  • Clocks
  • Memory bus
  • TMUs
  • ROPs
From this, can you define whether an architecture is closer to GT200 or GF100?
A straightforward question, if you really know anything: are there GPCs, or not, on GF104? On what grounds do you then base your claim that they are related?
 
That's the same as if I said:

George W. Bush seems to be an intelligent guy

Hmm... So you are acknowledging that you don't know anything then... other than rather basic specs. And based on that you are speculating that it seems related to GT200. Really??? That's quite a weird logic you have there...
 
That is completely ridiculous. This would, if it were true, have nVidia designing not one, but two DX11 architectures, instead of just designing one DX11 architecture and scaling it down.

Chill out man. He's just saying gf104 doesn't have caches as we know them in gf100. There's more to an architecture than caches and his statement was quite general in nature.
 
The tricky question is working out what's a reasonable throughput.
I would look at where it is used. And on the current evidence, it doesn't seem to be used outside addressing.

AFAIK, even in CPU code, int mul is rarely used for compute as such.
 
To say that neliz doesn't know anything is a joke in itself...

My interpretation of his comments is that, DP is cut, caches are simplified/minimized and that the base specs are very similar to what GT212 was shaping up to be. Sorta like how RV770 and Juniper have similar specs but are different "architectures."
I don't quite understand why his comments are so confusing to some... Maybe because I have been reading his posts for the last couple of years?
 
To say that neliz doesn't know anything is a joke in itself...

I don't know. Maybe I should ask my GTX480, which I bought for under 500€:
He had no information about the ramping of GF100...
Or wait, we should ask Mr. Huang what he thinks about the ramping process:
"With our new Fermi-class GPUs in full production, NVIDIA's key profit drivers are fully engaged," said Jen-Hsun Huang, NVIDIA CEO and president. "We shipped a few hundred thousand Fermi processors into strong consumer demand.
http://phx.corporate-ir.net/phoenix.zhtml?c=116466&p=irol-newsArticle&ID=1426701&highlight=

And the CFO:
Much of this increase was attributable to revenue from GeForce GTX 470/480 and GeForce GT320M which ramped very successfully late in the quarter.

The increase was largely due to higher-than-expected yields of 40nm products. In particular, good yields of GTX470 and GTX 480 meant we had insufficient back end capacity to assemble, test and ship these products before the end of the quarter.
http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9NDYxNTV8Q2hpbGRJRD0tMXxUeXBlPTM=&t=1
 
Something doesn't add up here. 3 GPCs, 32 ROPs and a 256-bit bus, scaled from the 529mm² GF100, works out to 333mm² for GF104. 2 GPCs would make it 267mm².

Anyone want to have a go at making a better estimate based on the GF100 die shot?

There appears to be some recanting regarding die size going on in the original thread, in cfcnc's post #82, when asked by napoleon to confirm the < 300mm² die size:
邪恶之数不够鸟。
("The evil number isn't big enough.")
i.e. the original post has not counted the die area as big enough.

There is also some doubt about the amount of castration the chip will initially have, in post #86:
卖油条的大伯貌似只有第二行还靠谱点
("It seems only the second line from the fried-dough-stick-selling uncle is still somewhat reliable.")
i.e. the full version of the chip is 384 SPs/256-bit; the initial castration may not end up being so severe.

Finally, a new, cheaper product replacing an older, more expensive product at the same performance level. Given limited capacity, there is no case for waiting at all. You simply raise the price of the new product initially and lower the old product's price until they look roughly equal to the customer, preserving average margins. As the mix shifts to the new product, you try as hard as you can to preserve the new, progressively higher margins. This is a bit like buying more of a stock in your portfolio after it has fallen in price, to reduce your overall average buy price of the holding.

The only sensible explanation I can see for what appears to be happening is that they cannot yet make sufficient quantities of the full version of the chip.
 
GF104 is larger than Cypress.

~375mm² according to rumors for quite some time now. I wouldn't be surprised, though, if transistor density ends up a tad more generous on 104 compared to GF100.

Define basic specs. For me basic specs are like:
  • Number of Shaders
  • Clocks
  • Memory bus
  • TMUs
  • ROPs
From this, can you define whether an architecture is closer to GT200 or GF100?
A straightforward question, if you really know anything: are there GPCs, or not, on GF104? On what grounds do you then base your claim that they are related?

Could be completely wrong (as many times before...) but I'd imagine 104 to look like this:


  • [8*(2*16)]
  • =/>700/1400/850+
  • 4*64bits
  • 8 TMUs / SM
  • 8 ROPs/partition


Hypothetically, counting in pixels, it might be capable of (8*2) 16 pixels/clock of rasterizing and 2 tris/clock theoretical max. If true, it would have 16 additional ROPs for AA efficiency, and since a mainstream part isn't really meant for ultra-high resolutions it might do relatively well in that regard wherever bandwidth isn't a bottleneck.
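Spelling out the arithmetic behind those guesses (all of it hangs on the assumed config above, including two raster engines at 8 pixels/clock each):

\[
8 \times (2 \times 16) = 256\ \text{SPs}, \quad 8 \times 8 = 64\ \text{TMUs}, \quad 4 \times 8 = 32\ \text{ROPs}, \quad 4 \times 64\ \text{bit} = 256\ \text{bit}, \quad 2 \times 8 = 16\ \text{pixels/clock}
\]

i.e. twice as many ROPs as pixels rasterized per clock, which is where the 16 "additional" ROPs for AA come from.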

Past rumors had the GF100 top dog with a 3DMark Vantage Performance landing zone of 22k (at the targeted 725/1450) and a 104 at 750/1500 at ~15k.

All wrapped up, the only thing the canned GT212 might have had in common with GF104 is the number of TMUs per SM. The SP count is different (as is the number of SPs per SIMD), the frequencies are different since only on GF10x does the majority of the core run at 1/2 hot clock, geometry throughput differs vastly, the ROP count per partition differs, etc. etc.
 
There appears to be some recanting regarding die size going on in the original thread, in cfcnc's post #82, when asked by napoleon to confirm the < 300mm² die size:

i.e. the original post has not counted the die area as big enough.

There is also some doubt about the amount of castration the chip will initially have, in post #86:

i.e. the full version of the chip is 384 SPs/256-bit; the initial castration may not end up being so severe.
I suppose it's a bit like the 480-550mm² arguments :p Oh well.

Anyway, while I'm dubious that cutting out DP (and int32 MUL) could save 20% of a GPC's area, I still can't rule it out.

The only sensible explanation I can see for what appears to be happening is that they cannot yet make sufficient quantities of the full version of the chip.
No doubt NVidia will say that the chip is yielding perfectly, like GF100 is :rolleyes:

Jawed
 
That is completely ridiculous. This would, if it were true, have nVidia designing not one, but two DX11 architectures, instead of just designing one DX11 architecture and scaling it down.
That makes perfect sense!
You build one architecture from the ground up, address some critical points in the pipeline, add a ton of features and hope you can get to market in time, while at the same time you're doing a bolt-on refresh of your older arch, just in case something goes wrong with the new one. Then, after being late by half a year or so with your new arch, you also try to sell the back-up plan to people - as a mainstream variant.

No - wait a second. Something doesn't add up here...

edit:
What I could imagine, taking the allegedly cut-down compute focus of GF104 as a given, is a simpler cache. No ECC on it, for example, and no configurable L1 cache, just 32k of shared memory. You could also go back and put an octo-TMU onto each SM and, forfeiting the L/S units, use the TMUs for getting all data into the chip. But that would hurt atomics big time, wouldn't it?
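For reference, the configurable split I'm talking about giving up is the one CUDA exposes per kernel on GF100 via cudaFuncSetCacheConfig; a hypothetical 104 with a fixed 32k of shared memory would presumably just ignore the preference. A minimal sketch of what "configurable" means here (dummy_kernel is a placeholder, not any real code):

Code:
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *data)
{
    // ... some kernel that cares about the shared memory / L1 balance ...
}

int main()
{
    // On GF100 the 64 KB per-SM array can be split 48/16 either way between
    // shared memory and L1; this call just states a per-kernel preference.
    cudaFuncSetCacheConfig(dummy_kernel, cudaFuncCachePreferShared); // 48 KB shared / 16 KB L1
    // or: cudaFuncSetCacheConfig(dummy_kernel, cudaFuncCachePreferL1); // 48 KB L1 / 16 KB shared

    dummy_kernel<<<1, 32>>>(nullptr);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}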


It is available, but if the rumors and analyst speculations are correct, there are fewer chips out there in total now than ATI had Cypresses in launch week; the demand just isn't as high.
There's a slight difference in launch strategies. AMD had been building up inventory for weeks if not months and only launched Cypress that "early" because it had to be on the showcase tour with Microsoft's Windows 7, whereas Nvidia had both arms and legs stretched just to launch in somewhat-Q1 at all.

In the conf-call, they said something about a couple of hundred thousand chips (so maybe 200k). When did AMD announce their 2M-mark? Sometime around Dec 2009?
 