NVIDIA GF100 & Friends speculation

Aiming rather low aren't we?
GTS450/GF106 at high clocks should make short work of this.
Well, I'm sorta hoping so. We shall see. Last time I looked at the power consumption figures, though, they were rather disappointing.

ATI already did it with the 5750/70, it's double slot just for the fact that Eyefinity screams ports; new cards are technically single slotted but bracketed for two. If perf/W gets closer to Evergreen for GF10X then you might see your card there.

But then again unless you didn't jump on a 4850 (which is essentially that sans DX11) then this seems marvelous.
Well, I've had enough issues with ATI on my Linux laptop that I'd rather not go ATI again, at least for a little while.
 
Well, "cache structure" is a lot different from "cache"… I don't think anyone really expected GF104 to have no cache at all.

I disagree with that. If you look at NVIDIA's slides, you will see the GTX460 outperforming the Radeon 5830, something which doesn't happen in Gigabyte's slides.
The catch? Different configuration. The resolution is lower on NVIDIA's slide than on Gigabyte's, and there is no AA on NVIDIA's.
That tells me the problem in Gigabyte's is running out of memory or bottlenecking on the 192-bit memory bus.

Probably a bit of both. I guess it does run out of memory from time to time, and when it doesn't, its low geometric power (relative to GF100) doesn't make up for the loss of framerate when it's out of memory.
 
So now Nvidia can have its dual GPU thing?

I am guessing if the GTX460 1GB is better than or equal to the GTX465, then a full GF104 with 384 SPs and 64 TMUs can be equal to or slightly better than the GTX 470. Nvidia could call it GTX 475 and release its dual GPU card using two of them, which could be equal to or slightly better than the HD5970!
 
I dunno, a full GF104 beating the GTX470 sounds like a stretch. These leaks show the GTX460 768MB to be consistently slower than the GTX465, despite a theoretical shader power advantage. It does have a bit less memory bandwidth and fewer ROPs though. So let's say that maybe the 1GB version is indeed as fast as a GTX465, but really the GTX470 looks totally out of reach. A full GF104 would only add another 48 ALUs (I assume clocks can't be significantly increased due to power consumption), whereas the GTX470 adds (over the GTX465) 96 cores, and more ROPs and memory bandwidth.
That said, a dual GF104 card would still be quite nice, though it's not going to beat the HD5970.
I wonder though what a full single GF104 card would be called? There are no gaps left in the numbering (well, if you assume NVIDIA wants to stick to numbers divisible by 5). Maybe just replace the GTX465 with it and call it the same? Somehow I find it hard to believe they can't release GF104 in full form.
 
Maybe full GF104 will be the GTX 465 384 a la GTX 260 216 or how there were both G80 and G92 based 8800 GTS?

I'm thinking the true successor to the GTX 470 will remain GF100 based, probably a GTX 475 480SP part with a 320-bit memory bus possibly with slightly higher clocks assuming yields improve or they go to a new stepping. This'll no doubt complement a full GF100 512SP, 384-bit memory bus GTX 485 Fermi.
 
Well, it seems like nVIDIA did manage to increase memory clocks by 100MHz on the GTX460. That's 900MHz (3600MHz effective) compared to 800MHz (3200MHz effective), although the total bandwidth is about 20GB/s less due to the 192-bit bus.
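For anyone who wants to check the bandwidth arithmetic, here is a rough sketch. It assumes the 800MHz card being compared is the GTX465 on a 256-bit bus (the post doesn't state the bus width, so that is a guess), and the function name is just made up for illustration. Under those assumptions the gap comes out nearer 16GB/s, the same ballpark as the "about 20GB/s" above.

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = effective_mhz * 1e6
    return bytes_per_transfer * transfers_per_second / 1e9


# GTX460 768MB: 192-bit bus at 3600MHz effective (figures from the post).
gtx460 = bandwidth_gb_s(192, 3600)

# Comparison card: 3200MHz effective is from the post,
# the 256-bit bus width is an ASSUMPTION.
gtx465 = bandwidth_gb_s(256, 3200)

deficit = gtx465 - gtx460

print(f"GTX460 768MB: {gtx460:.1f} GB/s")
print(f"GTX465 (assumed 256-bit): {gtx465:.1f} GB/s")
print(f"Deficit: {deficit:.1f} GB/s")
```

So the higher memory clock recovers part of what the narrower bus loses, but not all of it.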

The 1GB version is also clocked higher at 725MHz core and 1450MHz shader frequency. I think a full fledged GF104 could be able to take on a GF100 to some extent. For instance the full fledged GF104 will have 64TMUs compared to 56 found on GTX470. Also, the lack of ALUs can be compensated by increase in the core/shader frequency. Note that the core clock should also affect texturing performance.

So something like a card that has 384 cores clocked at 750/1500MHz, 64 TMUs and 256bit with GDDR5 memory clocked at maybe ~1000MHz. Not sure about the number of ROPs though. The TDP figure would surely be lower with the GF104 too (based on the fact that the increase in clocks and memory bus/size for the GTX460 results in a 10W increase in TDP).
 
Does anyone know the arrangement of the GF104 core? Is it 336 or 352 or 384 CCs? GF100 is 512 CCs (16 groups of 32 shaders); similarly I'm guessing GF104 is 384 CCs (8 groups of 48 shaders). The 460 is labeled 336 CCs & 7 'groups'. Any chance GF104 is 12 groups of 32 shaders?
 
The GTX460 has 7 SMs out of 8 total operational. Fully spec'ed GF104 = [8*(3*16)], 8 TMUs/SM, 2 GPCs (2 raster units/2 tri setups), 4*64bit bus with 8 ROPs/partition.
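To make the totals explicit, here is a small sketch that multiplies out the layout above. The per-SM and per-partition numbers are the ones from this post; the class name is hypothetical, and the three-partition setup for the 768MB card is an assumption inferred from its 192-bit bus (192 / 64 = 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gf104Config:
    sms: int = 8                  # SMs enabled
    cores_per_sm: int = 48        # 3 blocks of 16
    tmus_per_sm: int = 8
    mem_partitions: int = 4       # 64-bit each
    bits_per_partition: int = 64
    rops_per_partition: int = 8

    @property
    def cores(self) -> int:
        return self.sms * self.cores_per_sm

    @property
    def tmus(self) -> int:
        return self.sms * self.tmus_per_sm

    @property
    def bus_width(self) -> int:
        return self.mem_partitions * self.bits_per_partition

    @property
    def rops(self) -> int:
        return self.mem_partitions * self.rops_per_partition


full = Gf104Config()
gtx460_1gb = Gf104Config(sms=7)
# ASSUMPTION: the 768MB card drops one memory partition to get a 192-bit bus.
gtx460_768mb = Gf104Config(sms=7, mem_partitions=3)

for name, cfg in [("full GF104", full),
                  ("GTX460 1GB", gtx460_1gb),
                  ("GTX460 768MB", gtx460_768mb)]:
    print(f"{name}: {cfg.cores} cores, {cfg.tmus} TMUs, "
          f"{cfg.bus_width}-bit, {cfg.rops} ROPs")
```

Note that with 7 SMs the GTX460 ends up with 56 TMUs, the same count quoted earlier for the GTX470.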
 
So NVIDIA made the SMs bigger... That just makes the salvaging issue worse, doesn't it?

I mean 48/384 = 12.5%, they have to disable at least 12.5% of the [strike]chip[/strike] shaders if there's one unrecoverable defect… which is exactly what is going on with the GTX 460, and the lower bin (GTS 450?) would have 288 SPs at most.
 
I mean 48/384 = 12.5%, they have to disable at least 12.5% of the chip if there's one unrecoverable defect…
The chip is more than just GPCs, so it's less than that. And NVidia is salvaging SMs and ROPs independently.

I think more fuss should be made over redundancy in HD5850. Why are cores being lost when RV770's salvage model seemed fine? For a junk part like HD5830, yeah, who cares.

Cypress's ROP salvage model is pretty brutal.

Obviously, 40nm woes are a factor here - hitting both IHVs' salvage models badly.
 
So NVIDIA made the SMs bigger... That just makes the salvaging issue worse, doesn't it?

I mean 48/384 = 12.5%, they have to disable at least 12.5% of the chip if there's one unrecoverable defect… which is exactly what is going on with the GTX 460, and the lower bin (GTS 450?) would have 288 SPs at most.
I don't think that's too much of a problem. Cypress, Juniper also have 10% granularity for disabling (shader) parts, whereas it's in fact 20% for Redwood. Also, 288SPs is still more than what most people assumed a full GF104 would have in the first place earlier... I would only think this is a problem if you really have to disable multiple SMs due to bad yields, then of course a finer granularity might well be desired, but we don't know yet if GF104 follows in GF100 steps there (I'd sure hope not!).

Well, it seems like nVIDIA did manage to increase memory clocks by 100MHz on the GTX460. That's 900MHz (3600MHz effective) compared to 800MHz (3200MHz effective), although the total bandwidth is about 20GB/s less due to the 192-bit bus.

The 1GB version is also clocked higher at 725MHz core and 1450MHz shader frequency. I think a full fledged GF104 could be able to take on a GF100 to some extent. For instance the full fledged GF104 will have 64TMUs compared to 56 found on GTX470. Also, the lack of ALUs can be compensated by increase in the core/shader frequency. Note that the core clock should also affect texturing performance.

So something like a card that has 384 cores clocked at 750/1500MHz, 64 TMUs and 256bit with GDDR5 memory clocked at maybe ~1000MHz. Not sure about the number of ROPs though. The TDP figure would surely be lower with the GF104 too (based on the fact that the increase in clocks and memory bus/size for the GTX460 results in a 10W increase in TDP).
A GF104 with these specs would be quite close in theory to the GTX470. I would consider that a best case scenario though, since I suspect the voltage has to be increased significantly for these clocks, which would mean TDP goes out of control again. And seeing these GTX460 vs GTX465 numbers, it looks like GF104 is just somewhat "slower per theoretical ALU throughput" to me, though I wouldn't know why. I guess it has fewer SFUs (per normal ALU), but I'm not convinced that's important here. Maybe the cache changes really do make a performance difference in practice.
 
The chip is more than just GPCs, so it's less than that. And NVidia is salvaging SMs and ROPs independently.

I think more fuss should be made over redundancy in HD5850. Why are cores being lost when RV770's salvage model seemed fine? For a junk part like HD5830, yeah, who cares.

Cypress's ROP salvage model is pretty brutal.

Obviously, 40nm woes are a factor here - hitting both IHVs' salvage models badly.

Yes, I misspoke, that should be 12.5% of the shaders, which is still pretty high. As for the HD 5850, I would indeed blame 40nm woes, but perhaps there's also a bit of intentional market segmentation going on. After all, the 4850 and 4870 had different kinds of memory (GDDR3 and 5) for differentiation, while both 5850 and 5870 (have to) use GDDR5.

I don't think that's too much of a problem. Cypress, Juniper also have 10% granularity for disabling (shader) parts, whereas it's in fact 20% for Redwood. Also, 288SPs is still more than what most people assumed a full GF104 would have in the first place earlier... I would only think this is a problem if you really have to disable multiple SMs due to bad yields, then of course a finer granularity might well be desired, but we don't know yet if GF104 follows in GF100 steps there (I'd sure hope not!).

If I'm not mistaken, Cypress has 20 80-wide SIMDs, so 80/1600 = 5%, not 10. Since Juniper is a half-Cypress, it's 10% indeed, but then again Juniper is a much smaller chip than Cypress and GF104, so yields are probably not much of an issue to begin with. The same goes for Redwood.
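The granularity figures being argued over can be put side by side with a quick sketch. The unit counts for GF104 (8 SMs) and Cypress (20 SIMDs) are from the posts; Juniper at 10 SIMDs follows from it being a half-Cypress, and Redwood at 5 SIMDs is an assumption worked backwards from the 20% figure quoted earlier. The function names are made up for illustration.

```python
def salvage_step_percent(shader_units: int) -> float:
    """Share of the shader array lost when one SM/SIMD is disabled, in percent."""
    return 100.0 / shader_units


def shaders_left(total_shaders: int, shader_units: int, disabled: int) -> int:
    """Shaders still active after disabling a number of equal-sized units."""
    return total_shaders - disabled * (total_shaders // shader_units)


# Number of independently disableable shader units per chip.
# Redwood's count is an ASSUMPTION derived from the 20% figure in the thread.
units = {"GF104": 8, "Cypress": 20, "Juniper": 10, "Redwood": 5}

steps = {chip: salvage_step_percent(n) for chip, n in units.items()}
for chip, step in steps.items():
    print(f"{chip}: {step:.1f}% of shaders per disabled unit")

# GF104 bins: one SM off gives the GTX460, two off gives the speculated lower bin.
gf104_one_off = shaders_left(384, 8, 1)
gf104_two_off = shaders_left(384, 8, 2)
print(f"GF104 with 1 SM disabled: {gf104_one_off} SPs")
print(f"GF104 with 2 SMs disabled: {gf104_two_off} SPs")
```

This only covers the shader array; as noted earlier in the thread, the chip is more than its shaders, so the share of total die area lost per defect is smaller than these percentages.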

As for GF104 following in GF100's footsteps, well… for now there's no indication of it being released with the full chip enabled, which doesn't exactly bode well.
 