NVIDIA GF100 & Friends speculation

I expect the whole distributed architecture to remain, of course.
Sure. It wasn't my intention to imply you were thinking along the same lines as I was. :)


Another thing:
How sure are we that GF100 really is 16+16 in terms of scheduling, rather than (8+8)+(8+8)? The latter would be a much more conservative approach and wouldn't sound so overly innovative, would it? By getting rid of the MUL, Nvidia could even have lowered the complexity of the individual schedulers a bit.
 
The problem with exposing the MUL is that there's no issue port or decode logic for that now.
Of course there is. GF100 can issue 2×16 instructions to any combination of units (ALU/ALU, ALU/SFU, ALU/LS, SFU/LS). If Nvidia decided to remove one of the blocks of 16 ALUs, as suggested by Damien, you would only lose the ALU/ALU combination (as there would be only a single ALU block left). But if the MUL functionality in the SFUs is still there (it most likely is, to at least some extent, as one needs multipliers for the special functions either way), it could work quite well. As adders are really cheap, Nvidia could even add some to the SFUs.

Nvidia stated that GF100 has a clear load/execute structure. That means any operand coming from the cache/local memory has to be loaded into a register by an instruction issued through the L/S units either way (GT200's ALUs could access local memory directly). That load could then happen in parallel with ALU instructions, so ALU utilization for code with memory or SFU instructions would be higher (in GF100, one ALU block sits idle whenever an L/S or SFU instruction is issued). So in fact, the throughput per ALU would be higher if one of the blocks per SM were removed. One would have enough issue ports and register bandwidth to ensure the MULs of the SFUs could be used 100% of the time. So in my opinion, it isn't the worst idea one could have.
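To make the utilisation argument concrete, here is a toy dual-issue model. The unit counts, issue rules and the instruction mix are assumptions for illustration only, not anything Nvidia has published:

```python
# Toy model of dual-issue scheduling in a GF100-style SM: two schedulers,
# unit types ALU / SFU / LS, and either two ALU blocks (GF100-like) or one
# (the proposed cut-down SM). Purely illustrative assumptions throughout.
import random

def simulate(n_alu_blocks, mix, cycles=100_000, seed=1):
    random.seed(seed)
    pending = []
    issued_alu = 0
    busy_alu_cycles = 0
    for _ in range(cycles):
        # keep a 2-instruction window to feed the two schedulers
        while len(pending) < 2:
            pending.append(random.choices(list(mix), weights=list(mix.values()))[0])
        free = {"ALU": n_alu_blocks, "SFU": 1, "LS": 1}
        issued = 0
        i = 0
        while i < len(pending) and issued < 2:
            unit = pending[i]
            if free[unit] > 0:          # issue if a matching unit block is free
                free[unit] -= 1
                issued += 1
                if unit == "ALU":
                    issued_alu += 1
                pending.pop(i)
            else:                        # otherwise the instruction stalls a cycle
                i += 1
        busy_alu_cycles += n_alu_blocks - free["ALU"]
    return issued_alu / cycles, busy_alu_cycles / (cycles * n_alu_blocks)

mix = {"ALU": 0.6, "LS": 0.2, "SFU": 0.2}   # arbitrary example instruction mix
for blocks in (2, 1):
    ipc, util = simulate(blocks, mix)
    print(f"{blocks} ALU block(s): ALU instructions/clk = {ipc:.2f}, "
          f"utilisation per ALU block = {util:.0%}")
```

With this kind of mix the single-ALU-block configuration shows clearly higher utilisation per ALU block, which is the point being argued above.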
 
I'm surprised no one has caught Charlie's new piece yet.

What are Nvidia's GF104, GF106 and GF108?

http://www.semiaccurate.com/2010/06/21/what-are-nvidias-gf104-gf106-and-gf108/

He does ask a valid question in the middle of all that rant, though.

Another good example of how unchanged the GF104 is can be seen by the shape of the GF104 at EXPreview. If you take three of the four roughly square clusters of 128 shaders from the GF100 and arrange them in a shape that wastes the least area, what do you get? A rectangle, long and thin, like this. That more than anything shows the cursory nature of the cut and paste job done by Nvidia, don't expect any real changes.

If nVIDIA were really changing the SP organization, wouldn't it have been a square chip after all? People are still talking about it having 16 SMs, and wouldn't 16 SMs make for a square chip? :idea:

At the same time he has a pretty lame, sort-of-hitting-at-all-points-so-I-can-always-be-right excuse for the 336 CUDA cores screenshot :LOL:

The 336 shader number is likely an artifact of the stats program not being fully aware of the new chip yet, but Nvidia might have added the ability to disable half a cluster.

However, at the same time he seems to be confirming the reports that it's close to GTX470 level when overclocked, which I don't consider a bad thing at all, but as usual he does :p

If Nvidia raises the clocks enough to make them competitive, the chips not only blow out the power budget, and drop yields, but they also obsolete the GF100 based 470.

On a more hilarious note, Charlie went from a "FERMI is not scalable" perspective to "nVIDIA just made a cut and paste job" with a modular design, including the possibility of a 384-core chip beating a 448-core one if overclocked :rolleyes:

So much for a non-scalable architecture. All in all, a confusing article which, instead of informing, just adds more mud to the rumour mill. After reading it, I get the feeling he clearly doesn't have as much info on GF104 as he had on GF100. At least he doesn't seem so sure of himself.
 
At the same time he has a pretty lame, sort-of-hitting-at-all-points-so-I-can-always-be-right excuse for the 336 CUDA cores screenshot :LOL:

I think he quite clearly claims that core counts aren't set yet:

Charlie said:
Nvidia was tweaking clocks and shader counts three weeks before 'shipping' in late March, and we hear much of the same hand wringing is going on now.
 
GTX460:

gtx4601.jpg
 
Supposedly 8" long, and I can see only six memory chips, so it looks like the 192-bit version. From the components at the back of the chip, it indeed does look rectangular. The cooler also looks beefy (I see two heatpipes), so the TDP is probably highish.

Edit: There are two six-pin PCIe connectors, which suggests a TDP close to or over 150W.
 
Supposedly 8" long, and I can see only six memory chips, so it looks like the 192-bit version. From the components at the back of the chip, it indeed does look rectangular. The cooler also looks beefy (I see two heatpipes), so the TDP is probably highish.

Edit: There are two six-pin PCIe connectors, which suggests a TDP close to or over 150W.

Like the GTX470/480 before it, it too will draw more power than is suggested.
180W for the GTX460 (256CC/192-bit) in those special circumstances; the GT(X/S)450 should be 192CC and 150W.
 
Are you sure those are legit? I've never heard of any board manufacturer painting over the solder points (the top two PCIe power points).

Also, with 2x 6-pin PCIe, up to 225 watts is possible, as the card can also draw 75W from the PCIe slot.
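For reference, the arithmetic behind that 225W ceiling (these are the spec maximums per source, not measured draw):

```python
# PCIe power budget: spec limits per power source, summed.
slot_w    = 75                      # W from the PCIe x16 slot
six_pin_w = 75                      # W per 6-pin connector
print(slot_w + 2 * six_pin_w)       # 225 W total available to the board
```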

Regards,
SB
 
sorry for more off-topic..
Moved.

Power/thermals cannot be the reason for disabling some of the cores:

More cores at a lower clock speed (giving the same performance) will consume less power and create less heat than a smaller number of cores at a higher clock speed.

(Power is linear in core count, but dynamic power scales roughly with f^3 once voltage is adjusted along with frequency.)
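A quick back-of-the-envelope sketch of that scaling (the linear voltage-frequency relation is a crude assumption, and the clocks are only illustrative):

```python
# Dynamic power ~ cores * V^2 * f, with voltage assumed to scale roughly
# linearly with frequency -- both simplifications for illustration only.
def dyn_power(cores, freq_ghz):
    voltage = freq_ghz               # crude linear V-f proxy (arbitrary units)
    return cores * voltage**2 * freq_ghz

cut_cores, cut_freq = 480, 1.401                 # GTX480-like: fewer cores, higher clock
full_cores = 512
full_freq = cut_cores * cut_freq / full_cores    # same cores*freq throughput
print(full_freq)                                 # ~1.31 GHz needed from the full chip
print(dyn_power(full_cores, full_freq) / dyn_power(cut_cores, cut_freq))
# ~0.88 -> the full chip at the lower clock burns roughly 12% less dynamic
#          power for the same throughput in this simplified model
```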
The stuff that's disabled in GTX480 should be turned off entirely, fused. That's my understanding.

Regardless, the issue is basically that to get all of the chip functioning requires higher voltages - some cores just need more voltage to work, at all. x86 has the concept of per-core power planes with independent voltage control. But in GTX480 core voltages are common across the entire chip as there's no concept of separate power planes per core. So an increase in voltage for the entire chip, to fix-up one core, takes thermals/power into a very uncomfortable place. Remember that NVidia's TDP figures for consumer cards are a fabrication, much like GTX285/280 and HD4800 series.

So NVidia found a way to minimise the loss in GTX480 performance by tuning voltages, core count and frequencies. With performance heavily dominated by ROP count, that seems like the right thing. Still don't know if there's truth in the idea that the original specification had 128 TMUs - nor how much difference that would make if it was true.

Now, if NVidia is not fusing off some cores in GTX480, merely making it a BIOS switch, then that appears to be a marketing effort in order to persuade buyers that there's value to be unlocked. Similar to AMD's Black edition processors with their unlocked multipliers.

If someone is prepared to hand-test and tune each card, then clearly it's possible to squeeze the maximum out of the core. Maybe there's a decent chance NVidia will launch a "full spec" card, say at the end of summer in time for the back to school buying season?

Jawed
 
GF100 High Power Consumption: An effect of the distributed architecture?

I was thinking the other day about the chip distribution in GF100 and its possible relation to GF100's rather inefficient perf/watt, and I came to the following conclusions. I would like some opinions on it, as it's something I haven't seen anywhere yet. Charlie just says that the chip is power hungry, which we all know it is, but without an explanation; other than "a bad 40nm process", that's just a La Palice truth (a truism).

So, to kick it off, what if the chip is power hungry mainly because of its distribution? My argument follows:

In a non-distributed architecture, without GPCs and without TMUs attached to the SMs, there would be just a single TMU structure, right? So all the "energy costs" associated with the TMUs would be concentrated in one place. Sort of like fixed + variable costs in economics, if you know what I mean. For a given TMU there would be fixed costs, but as all the TMUs are in the same place, those costs get really diluted and become negligible.

Now picture GF100, with the TMUs distributed across the GPCs and into the SMs. Here you have the fixed "energy costs" multiplied by the number of TMU blocks. In this case the fixed costs are not diluted, because of the low number of TMUs in each block, which exacerbates the total energy cost.
To put it more simply: where before you had 1x the fixed cost, now you have 16x a fixed cost. Of course each of the 16 fixed energy costs is lower than the single fixed cost before, but probably not 1/16 of it. In the end, power consumption would be higher.
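To put rough numbers on that intuition, here's a toy sketch (every figure is invented; only the shape of the argument matters):

```python
# Toy fixed-plus-variable cost model of the TMU argument above.
def tmu_power(blocks, tmus_per_block, fixed_per_block, per_tmu):
    return blocks * fixed_per_block + blocks * tmus_per_block * per_tmu

monolithic  = tmu_power(blocks=1,  tmus_per_block=64, fixed_per_block=8.0, per_tmu=1.0)
distributed = tmu_power(blocks=16, tmus_per_block=4,  fixed_per_block=2.0, per_tmu=1.0)
print(monolithic, distributed)   # 72.0 vs 96.0 -- the per-block fixed cost is
                                 # smaller than the monolithic one, but not 1/16
                                 # of it, so the distributed total comes out higher
```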

What do you think? Does it make sense? I think this could be grounds for the "GF100 architecture is broken" stance.
 