NVIDIA GF100 & Friends speculation

Nvidia's schedule for the GF100 based cards (Google English Translated)

Benchmarks issued on the 29th? :(
If I'm understanding the translation correctly, it appears nVidia is sending out some last-minute BIOS updates to reviewers. My guess would be that there was a significant bug that wasn't detected until reviewing began, and so the delay in benchmark releases was necessary to get reasonable results.

This would also mean that almost all of the current benchmarks are likely wrong.
 
And where did you get that idea ?
Did you even look at the architecture specs ? Did you see the GPC structure ?

Fermi is highly scalable and also, there have been many rumors about derivatives (GF104 and GF108 specifically). It seems you either missed or avoided them...

Erm, yeah, I've seen the architecture specs and the GPC structure. Did you miss the part where all the SMs have to communicate with each other? Or is that very easy to do?

For crying out loud, who is saying Fermi isn't scalable? Why is it highly scalable? Any architecture is inherently scalable.

Ok so "GF104 taped out, shipping in Q2" and "GF108 taped out, shipping in Q3" are the many rumours?

And what does that have to do with GF100 ? Some G80 derivatives came before ATI even launched the HD 2900 XT and that's exactly what ATI did this time as well. They launched their derivatives before NVIDIA even launched any DX11 product.
As for GT200, they didn't need derivatives, because 1) the design wasn't really that scalable and 2) G92 was happily covering the mid and low range, so spending R&D money on a GT200-based chip that would just rival an existing one would be... well... a waste. As for Fermi, if the latest rumors are true, GF104 will be out 3 months after the release of GF100.

I'm just pointing out the general trend wrt derivatives, which has everything to do with GF100's derivatives. If you had actually read that part of my post, you would have seen I was talking about the derivatives and not GF100.

1) Seriously? Then please tell us, why wasn't GT200 really that scalable?

2) True, that part I agree with. G92 and G94, along with their 55nm shrinks, were doing very well and really didn't need replacement.
 
My guess would be that there was a significant bug that wasn't detected until reviewing began, and so the delay in benchmark releases was necessary to get reasonable results.

According to SKYMTL (HardwareCanucks) and Hilbert (Guru3D) there will be reviews posted after PAX.
 
According to SKYMTL (HardwareCanucks) and Hilbert (Guru3D) there will be reviews posted after PAX.

Sampsa also confirmed that Muropaketti's review will be public as soon as the NDA ends, which is pretty much exactly 12 hours from now
 
According to SKYMTL (HardwareCanucks) and Hilbert (Guru3D) there will be reviews posted after PAX.
Tomshardware also seems to corroborate this:
http://www.tomshardware.com/news/fermi-gf100-geforce-gtx-480-gtx-470,9985.html
There's less than 24 hours until all will be officially known on Fermi. Until then, here's a trip looking back on all the leaks we've come across during our travels.

Would definitely be more fun to get benchmarks earlier, even if waiting until Monday would, in the end, lead to more careful analyses.
 
Erm, yeah, I've seen the architecture specs and the GPC structure. Did you miss the part where all the SMs have to communicate with each other? Or is that very easy to do?

What ? I'll be waiting to see you explain that part where all the SMs have to communicate with each other and how that hinders the process of disabling SMs in GPCs, to scale down the design...

Erinyes said:
For crying out loud, who is saying Fermi isn't scalable? Why is it highly scalable? Any architecture is inherently scalable.

Uh...you did ?

Erinyes said:
And anyway as others have stated, with Fermi's design, it is more likely to be harder to design derivatives compared to the old gen.

It's not harder, because it is in fact very scalable. GPCs are basically copy-pasted from each other and can be disabled, just like SMs within GPCs can be disabled too.
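To put rough numbers on the "disable SMs or whole GPCs" argument, here is a minimal sketch. It assumes the publicly stated GF100 layout of 4 GPCs, 4 SMs per GPC and 32 CUDA cores per SM; the function name and the two-GPC configuration are purely hypothetical and are not a claim about what any actual derivative looks like.

```python
def enabled_cores(gpcs=4, sms_per_gpc=4, cores_per_sm=32, disabled_sms=0):
    """Count the CUDA cores left after fusing off some SMs.

    Defaults are the assumed full GF100 layout (4 GPCs x 4 SMs x 32 cores).
    """
    total_sms = gpcs * sms_per_gpc
    if not 0 <= disabled_sms <= total_sms:
        raise ValueError("disabled_sms must be between 0 and the SM count")
    return (total_sms - disabled_sms) * cores_per_sm


full_chip = enabled_cores()                  # every SM enabled
one_sm_off = enabled_cores(disabled_sms=1)   # salvage part, one SM fused off
two_sms_off = enabled_cores(disabled_sms=2)  # salvage part, two SMs fused off
half_chip = enabled_cores(gpcs=2)            # hypothetical 2-GPC derivative

print(full_chip, one_sm_off, two_sms_off, half_chip)
```

This only counts shader cores. It says nothing about how much layout, validation or interconnect work a smaller die needs, which is what the disagreement here is really about.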

Erinyes said:
Ok so "GF104 taped out, shipping in Q2" and "GF108 taped out, shipping in Q3" are the many rumours?

And how do you define "many rumors" then ? 24/7 news of the evolution of the chip ? :rolleyes:

Then I'm sorry to say that no chip related rumors were ever "many rumors"...

Erinyes said:
I'm just pointing out the general trend wrt derivatives, which has everything to do with GF100's derivatives. If you had actually read that part of my post, you would have seen I was talking about the derivatives and not GF100.

And so was I. If GF100, which is the father of all the Fermi based chips in the future, is highly scalable, derivatives will have no problems in appearing.

Erinyes said:
1) Seriously? Then please tell us, why wasn't GT200 really that scalable?

The addition of the DP units (which most agree was rushed for market reasons) made it just too big. The only option would be to disable those completely in derivatives, but the chip wouldn't be competitive enough with G92 to justify the cost of R&D of a new chip, which leads me to 2).

Erinyes said:
2) True, that part I agree with. G92 and G94, along with their 55nm shrinks, were doing very well and really didn't need replacement.

There you go.
 
Would definitely be more fun to get benchmarks earlier, even if waiting until Monday would, in the end, lead to more careful analyses.

I think folks have had the card for a good while now. I'm gonna lean on hardware.fr for the synthetic testing and analysis and there are a few sites that consistently deliver good benchmark reviews with attention to detail. I hope people cover the claimed transparency AA support in CSAA. It would be huge for me if that works with minimal perf hit.
 
I think folks have had the card for a good while now. I'm gonna lean on hardware.fr for the synthetic testing and analysis and there are a few sites that consistently deliver good benchmark reviews with attention to detail. I hope people cover the claimed transparency AA support in CSAA. It would be huge for me if that works with minimal perf hit.

Heh, Rys has been awfully quiet for a while now. I'm guessing he has had his GTX 480 for some time as well :)
 
I think folks have had the card for a good while now. I'm gonna lean on hardware.fr for the synthetic testing and analysis and there are a few sites that consistently deliver good benchmark reviews with attention to detail. I hope people cover the claimed transparency AA support in CSAA. It would be huge for me if that works with minimal perf hit.

Speaking of which, Marc says there should be 200 GTX 470s and 200 GTX 480s at launch in France. Yes, you read that right, that's four hundred total.

The population in France is about 65 million, just for reference. Perhaps that means we can expect (310/65) × 400 = about 1900 cards in the US...
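For anyone who wants to redo the back-of-the-envelope scaling, here is a minimal sketch. It assumes launch allocation is simply proportional to population, which is only the guess made above, and the function name is made up for illustration.

```python
def scale_by_population(cards_ref, pop_ref_millions, pop_target_millions):
    """Scale a card count from a reference country to a target country,
    assuming allocation is proportional to population (an assumption)."""
    return cards_ref * pop_target_millions / pop_ref_millions


france_cards = 200 + 200  # 200 GTX 470s + 200 GTX 480s, per the post above
us_estimate = scale_by_population(france_cards, 65, 310)

print(round(us_estimate))  # prints 1908, i.e. "about 1900"
```

Real allocations follow market size and distributor deals rather than head count, so treat this as an order-of-magnitude figure only.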
 
Cypress is very fast at global atomics, but *much* faster at local atomics. I am surprised you don't get any benefit from using the LDS on the 5850

Sorry, I didn't make it clear. The second experiment works on 16x larger data set, so actually using local atomics is 16x faster :)

Since Cypress has larger local memory, I think it may benefit from more splits (16 banks maybe?).
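The "16x faster" figure is a throughput comparison, not a raw timing comparison. A minimal sketch of the normalization, assuming the two experiments took roughly the same wall time (implied by the post, but not stated outright):

```python
def relative_throughput(data_scale, time_scale):
    """Throughput of run B relative to run A, given how much more data
    B processed (data_scale) and how much longer it took (time_scale)."""
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return data_scale / time_scale


# Local-atomics run: 16x the data in (assumed) about the same time.
speedup = relative_throughput(data_scale=16, time_scale=1.0)
print(speedup)  # prints 16.0
```

If the second run actually took longer, the speedup shrinks in proportion: a `time_scale` of 2.0 would give 8x rather than 16x.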
 
What ? I'll be waiting to see you explain that part where all the SMs have to communicate with each other and how that hinders the process of disabling SMs in GPCs, to scale down the design...

You do know that, since the geometry is distributed all over the chip, each PolyMorph engine has its own private communication channel to the others? Do you think that is very easy to do? I'm not saying it's extremely hard, just that it's not any easier to do than in earlier architectures, and probably even harder.

Uh...you did ?

Again, if you had read my post clearly, I said Fermi is probably harder to scale down compared to earlier chips. I never said it isn't scalable :rolleyes:

It's not harder, because it is in fact very scalable. GPCs are basically copy-pasted from each other and can be disabled, just like SMs within GPCs can be disabled too.

Again, I repeat the same question I asked: if it was so easy, why haven't we seen any Fermi derivatives till now? I gave you the example of G80, where we saw derivatives within six months. We're now about 5 months, I guess, past the time GF100 should have come out (I'm hypothesising Nov 09). If it was easier, we should have seen a derivative out by now. Period.

Also, if you think that chip design is all very easy and just a matter of copy-pasting one thing to another, then why don't we see all chips of an architecture out at once?

And so was I. If GF100, which is the father of all the Fermi based chips in the future, is highly scalable, derivatives will have no problems in appearing.

Heh, except for the last line, your post had hardly anything to do with GF100 derivatives. And again, why do you keep harping on about the fact that Fermi is highly scalable?

The addition of the DP units (which most agree was rushed for market reasons) made it just too big. The only option would be to disable those completely in derivatives, but the chip wouldn't be competitive enough with G92 to justify the cost of R&D of a new chip, which leads me to 2).

If there were derivatives, they wouldn't have DP anyway. Even in the case of ATI, only their high-end chip had DP enabled. GT200 didn't bring anything new to the table (I know there were minor architectural differences), which is why they didn't need any new derivatives and could make do with G9x.
 
I guess engineers would have it a LOT easier if creating an entire family of GPUs from one top dog truly were as simple a task as copy/paste. No IHV has a magic wand in that regard.

AMD was clever enough to develop Cypress and Juniper in parallel but as far as I understand things that's a design decision you either make from the beginning or in a very early development stage.

NVIDIA obviously hasn't had any parallel development for other GF10x family members, and it seems like they have to get the top dog out the door first before they can start on the smaller derivatives.

On top of all that, I would also factor in possible TSMC supply constraints; according to TSMC's own claims, supply will only really take off in the third quarter of this year.
 
Well at least they'll be able to say that they sold out of cards at release and there's demand ;)
 
Oh, neat. The limitations suggest that they aren't doing what I suggested, so do you think they lengthened the ALU pipeline? I can't imagine that Cypress can do A*B*C with 8 cycles latency.
A MUL is supposed to be do-able in 3 cycles.

I now suspect that MULADD is done in a few cycles with breathing room for the DOT and sequential stuff. The little ASCII art diagram I did here:

http://forum.beyond3d.com/showthread.php?p=1398243#post1398243

is subtly wrong, i.e. I think it should show MULADD as a single operation (it implements either MUL, ADD or MULADD, basically). The final set of ADDs in each lane are the dot-product adders, and so would be retained. They're basically non-IEEE.

I suppose lanes are staggered in their timing - I'm not sure which pairs of lanes can participate in _PREV. I wasn't aware of _PREV when I made that diagram, only INTERP_*, so was thinking everything was based on the DOT adder (which I also assume is the basis of double-precision).

I haven't really experimented to find out if the compiler is using these things, yet.

Nothing is a major bottleneck right now, but it's still there, particularly during framerate minimums. I also think it might be a bigger problem in the future. I'd say faster backface and frustum culling would be a great first step if not setup.
Hopefully someone will do some in-depth testing here... I still think the absolute count of triangles (a miserly <2M) in the Heaven tessellation horror story indicates a serious misdirection. There's been no analysis of the difference with shadowing on or off, for example...

Jawed
 
Well at least they'll be able to say that they sold out of cards at release and there's demand ;)

Which would be the case whether it's 400 or 4000. Only difference is that the former is a far more nutritious nugget for the FUD monster.
 
I heard the same numbers for other countries, but in a 2:1 ratio for 470 versus 480. Basically 50 cards per brand per country, for those kinds of markets.

Which seems more consistent with GF100's crappy yields. I'm not sure why it is (or should be) different for France...
 