NVIDIA GF100 & Friends speculation

If all these leaks are true, and I bet they are, the GTX 470/480 kinda sucks. It's not an R600 or an NV30 (and anyone claiming as of now that it is, is biased as hell), but it's still rather disappointing.

Here's hoping the mainstream derivative (the one I can actually afford... >_>) doesn't suck.
 
I think the sandbagging theory may hold some water, as ATI wasn't even doing the regular AI mipmap optimizations on Evergreen, IIRC.
AMD states that its driver policy with new chips is stability first, performance later, so what looks like sandbagging is theoretically nothing more than the normal driver lifecycle for a new chip. New games presumably suffer the same fate of poor launch performance (particularly for the CrossFire-afflicted). Though benchmarking shenanigans for "important" games (like flickering textures, which seems to be a recurrent theme) are obviously a factor.

I get a sense that the 10.3a drivers are sufficiently unstable, for instance, that a few review sites will reject them for upcoming comparisons with GF100.

Jawed
 
I guess engineers would have it a LOT easier if creating an entire family of GPUs from one top dog were truly as simple a task as copy/paste. No IHV has a magic wand in that regard.

AMD was clever enough to develop Cypress and Juniper in parallel, but as far as I understand things, that's a design decision you either make from the beginning or at a very early development stage.

NVIDIA obviously hasn't had any parallel development going for the other GF10x family members this time, and it seems they have to get the top dog out the door first before they can start on the smaller derivatives.

Into all that I would also factor possible TSMC supply constraints; according to TSMC's own claims, supply won't really take off until the third quarter of this year.

Exactly. AMD decided back in the RV670 days to go for the sweet-spot strategy and it seems to be paying off now, and this gen looks like it's gonna hurt NV. I suspect we won't stop seeing NV's big-die strategy, but they might start developing the mid-range in parallel, or at least decrease the lag time for the next gen. Otherwise they're gonna give AMD free rein in the mid-range market every gen (and in this gen, because of Fermi's delays, even the top-end market).
 
You do know that since geometry processing is distributed all over the chip, each PolyMorph engine has its own private communication channel to the others? Do you think that is very easy to do? I'm not saying it's extremely hard, just that it's not any easier than in earlier architectures, and probably even harder.

Then maybe you should compare previous architectures with this one, because you clearly haven't...

Erinyes said:
Again, if you did read my post clearly, I said Fermi is probably harder to scale down compared to earlier chips; I never said it isn't scalable :rolleyes:

I certainly won't start a pointless discussion over semantics and whether you really meant what you implied...

Erinyes said:
Again I repeat the same question I asked: if it was so easy, why haven't we seen any Fermi derivatives till now? I gave you the example of G80, where we saw derivatives in six months. We're now about 5 months past the time GF100 should have come out (I'm hypothesising Nov '09). If it were easier, we should have seen a derivative by now. Period.

No, because you are forgetting (willingly or not) that that's not how NVIDIA has historically worked. Ailuros and I even discussed this not long ago, about the possibility of a strategy change where, instead of focusing on the high-end first, NVIDIA would also work on the mid-range in parallel. Being an armchair engineer/company manager and saying stuff like "it's not like this, period" isn't really an explanation for why things do or don't happen, you know?

Erinyes said:
Also, if you think that in chip design it's all very easy and just copy/pasting one thing to another, then why don't we see all chips of an architecture out at once?

Do you have any idea of the R&D costs involved in doing multiple chips at once? I certainly don't know exact numbers, but I'm pretty sure it's not cheap. Plus, as I mentioned before, that isn't/wasn't NVIDIA's strategy up to this point. Maybe they will change that with Fermi 2.

Erinyes said:
Heh, except for the last line, your post had hardly anything to do with GF100 derivatives. And again, why do you keep harping on the fact that Fermi is highly scalable?

Because, based on the architecture specs, it is?

Erinyes said:
If there were derivatives, they wouldn't have DP anyway. Even in ATI's case, only their high-end chip had DP enabled. GT200 didn't bring anything new to the table (I know there were minor architectural differences), which is why they didn't need any new derivatives and could make do with G9x.

So you seem to be agreeing with me that there was no need for mid-range and low-end parts based on GT200, when G9x was filling that role without problems, and that saved a lot of money in R&D...

Anyway, this is getting off-topic. If you want to discuss more of this, PM me or let's take it to another thread. Let's get back to GF100 speculation, which hopefully will end soon anyway :)
 
Oh, neat. The limitations suggest that they aren't doing what I suggested, so do you think they lengthened the ALU pipeline? I can't imagine that Cypress can do A*B*C with 8 cycles of latency.
Actually, Evergreens can do two dependent muls with 8 cycles of latency.
The four xyzw ALUs are organized in two pairs (x/z and y/w), which can do dependent operations (two dependent pairs per instruction, to be exact). If the second operation is a simple add, the first operation can be *any* 32-bit floating-point op. If the second operation is a mul, the first one also has to be a mul (at least according to the ISA docs), so A*B*C is definitely possible within the 8-cycle physical latency.
The operations
a = b*c*d;
e = f*g*h;
can be executed in a single VLIW instruction:
Code:
w slot: MUL      f*g
z slot: MUL      b*c
y slot: MUL_prev h   (result: f*g*h)
x slot: MUL_prev d   (result: b*c*d)
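
To make the pairing concrete, here's a small C model of that bundle. This is just a sketch; vec4 and exec_bundle are names I made up, not anything from the ISA docs. The z and w lanes produce their products first, then x and y pick those results up via the _prev path.
Code:
/* Hypothetical C model of the paired-lane MUL_prev forwarding above;
 * vec4 and exec_bundle are illustrative names, not ISA constructs. */
#include <stdio.h>

typedef struct { float x, y, z, w; } vec4;

/* One VLIW bundle: z and w run plain MULs, x and y run MUL_prev and
 * consume the result just produced by their partner lane (x<-z, y<-w). */
static vec4 exec_bundle(float b, float c, float d,
                        float f, float g, float h)
{
    vec4 pv;          /* the "previous vector" result of this bundle */
    pv.z = b * c;     /* z slot: MUL      b*c                        */
    pv.w = f * g;     /* w slot: MUL      f*g                        */
    pv.x = pv.z * d;  /* x slot: MUL_prev d -> b*c*d                 */
    pv.y = pv.w * h;  /* y slot: MUL_prev h -> f*g*h                 */
    return pv;
}

int main(void)
{
    vec4 r = exec_bundle(2.0f, 3.0f, 4.0f, 1.0f, 5.0f, 6.0f);
    printf("a = %g, e = %g\n", r.x, r.y); /* prints a = 24, e = 30 */
    return 0;
}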

Additionally, the final results of the two ALU pairs (both executing dependent operations) can be added together. For this addition, though, the two intermediate results are not normalized beforehand, so it is not a feature one can use in all instances. But this possibility is used for the DOT4 instruction, which also delivers its result after 8 cycles:
Code:
  x      z        y      w
A*B + (C*D)  +  E*F + (G*H)
The middle "+" symbolizes the addition without normalization, which is probably done to shave off some latency and fit the whole DOT4 operation into the available 8 cycles. The final result is only available in the PV.x register, so it is not written to the register file in this case. Some other combinations are also possible.
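
As a rough C model of that dataflow (again, the name dot4_model is mine; standard float arithmetic normalizes every intermediate result, so the hardware's non-normalized middle add can't be reproduced here, only the lane pairing):
Code:
#include <stdio.h>

/* Rough, hypothetical C model of the DOT4 lane pairing described above. */
static float dot4_model(float a, float b, float c, float d,
                        float e, float f, float g, float h)
{
    float pair_xz = a * b + (c * d); /* x slot folds in z's product C*D */
    float pair_yw = e * f + (g * h); /* y slot folds in w's product G*H */
    return pair_xz + pair_yw;        /* middle add; result lands in PV.x */
}

int main(void)
{
    /* (1,2,3,4) . (5,6,7,8) = 5 + 12 + 21 + 32 = 70 */
    printf("%g\n", dot4_model(1, 5, 2, 6, 3, 7, 4, 8));
    return 0;
}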
 
Exactly. AMD decided back in the RV670 days to go for the sweet-spot strategy and it seems to be paying off now, and this gen looks like it's gonna hurt NV. I suspect we won't stop seeing NV's big-die strategy, but they might start developing the mid-range in parallel, or at least decrease the lag time for the next gen. Otherwise they're gonna give AMD free rein in the mid-range market every gen (and in this gen, because of Fermi's delays, even the top-end market).

If they don't go for parallel development, it won't change anything seriously, IMO. It should be feasible; if not, I'd like to hear why it shouldn't be.
 
I'm not sure which pairs of lanes can participate in _PREV.
The _prev instructions can be used only in the x and y lanes. The x lane uses the result from the z ALU, and the y lane the one from the w ALU.

So one basically has paired ALUs: x and z form one pair, y and w the other.

Edit:
I just noticed that this wouldn't fit the description of the interpolation instructions. Guess I have to try to trick the compiler into generating those instructions and see what comes out. With all those small errors in the ISA docs, I don't really trust what's written there. It wouldn't make much sense, in my view, to create yet another datapath for the interpolation instructions when one that fits is basically already there. A swizzle of the xyzw designations in the interpolation description would align that functionality quite well with the _prev instructions.
 
http://www.fudzilla.com/content/view/18243/1/

GTS 450 = 256 SPs, 64 TMUs, 32 ROPs, 256 bit memory interface
GTS 430 = 192 SPs, 48 TMUs, 24 ROPs, 192 bit memory interface

So probably Neliz was right all along and GF100 was to have 128 TMUs, seeing as GF104 has the same number, and even more than the GTX 470...
Now I wonder how that would affect its performance. If GF100 is around 25% faster with 64 TMUs, it could be a real beast with 128 :eek:
 
So probably Neliz was right all along and GF100 was to have 128 TMUs, seeing as GF104 has the same number, and even more than the GTX 470...
Now I wonder how that would affect its performance. If GF100 is around 25% faster with 64 TMUs, it could be a real beast with 128 :eek:

IIRC, on the 480SP GTX 480 models the TMUs are supposed to be cut to 60 too?
 
GTX480 benchmark:
[image: GTX 480 benchmark chart]
 
GTX480 benchmark:

Seems to me that the higher power consumption is reflected in performance. So it might be at more or less the same perf/watt as Cypress. And some of those numbers are really close to the HD 5970 :oops: Also, it seems to be pulling 230W, a bit less than the 250W TDP...
 
This is the first time I've heard something like this regarding 10.3a; care to elaborate? Official 10.3 drivers were also released yesterday.
I've just looked again at the Rage3D thread where I saw complaints relating to BFBC2, and it seems they're to do with how the game was "resumed", so that doesn't seem like a real problem now. I saw flickering in the HardOCP Eyefinity 6 preview during Metro 2033 gameplay, but now I'm not sure which driver Kyle was using.

Jawed
 
I've just looked again at the Rage3D thread where I saw complaints relating to BFBC2, and it seems they're to do with how the game was "resumed", so that doesn't seem like a real problem now. I saw flickering in the HardOCP Eyefinity 6 preview during Metro 2033 gameplay, but now I'm not sure which driver Kyle was using.

Jawed

Kyle said that issue was present on both nVIDIA and ATI cards, though.
 