NVIDIA GF100 & Friends speculation

Now where have I heard this line about pleasant surprises before? Another odd similarity: it came from Greece back then too. :p

With the hardware specifications and rumored performance of the GTX470, the likewise rumored $299 price makes more sense than ever. In any case, the Greeks attacked Troy because "four" wasn't accessible at the time. Now let's see if you can find the answer to that riddle that easily :p
 
My guess is that GTX470 will have similar performance to 5850 but worse power and heat and GTX480 will be similar to 5870 with the same power/heat problems. It won't be until Nv can get proper performance silicon out with higher yields and better heat characteristics that this will be worth owning.

So your guess is the worst possible one for the GTX 400 series?
Given the difference between the GTX 285 and the new HD 5850 and HD 5870, you are implying that Fermi-based GeForces won't be much faster than the GTX 200 series... at best, a 30-35% increase over a GTX 285...
The GTX 470 being equal to the HD 5850 would be a total disaster, because it would barely be 10% faster than the GTX 285. You are also implying that the major architectural changes have absolutely no advantage, which is also funny...

It seems to me that many are taking that $299 price rumor for the GTX 470 too seriously. I'm personally betting on $399, with the GTX 470 being a bit faster than the HD 5870.

NathansFortune said:
The key for Nv is to not let the pain of Fermi go to waste. They should have been designing the successor for at least a year by now, and it should be ready to launch within a year at most (or less, preferably). Fermi looks like it will scale upwards quite well, so the next gen for Nv could be interesting. I will be sitting this gen out entirely; the 5870 is all too difficult to get my hands on at a decentish price, and I fear the same fate for the GTX480. I mean, I'm still running my venerable 8800GTX and it plays SupCom pretty well!

A year is too much. We will be hearing about the next iteration of Fermi much sooner than that. October/November 2010 is my guess.
 
It seems to me that many are taking that $299 price rumor for the GTX 470 too seriously. I'm personally betting on $399, with the GTX 470 being a bit faster than the HD 5870.

Agreed. Making the GTX470 (and therefore the 480) faster than the 5870 must have been their absolute minimum goal over the past 6 months. This way they can price it slightly higher than the 5870 and claim performance leadership. I don't think they will start a price war with AMD, because that seems like a war they can never win.
 
You seem to misunderstand me.

I'm trying to say that, short term, Nvidia have too many problems with heat and power to get any real performance gains out of the new architecture. Once they respin and get the power ratings down (or clock speeds up), the new architecture will show its strength. That, sadly, is still 3-6 months away. So I think a GTX470b will be very good, but by then one would hope that Nv is working on 32/28nm for a GTX475/485.

I think Fermi has a lot of mileage in it, which is why I said up to a year for a successor. I just can't see Nvidia having the ability to execute; they have shown a complete lack of it over the last two generations, and nothing serious has changed internally to suggest otherwise.

The best thing we can hope for from Nv is a 28nm GTX485 with 512 working SPs before the year is out. I think that chip will be worth owning...
 
I wouldn't believe anything about Fermi until Anand or some other unbiased person has it benchmarked and has publishable pricing info.

Anything that we hear now about Fermi either being the best thing since sliced bread or, indeed the end of mankind as we know it is largely going to be fluff. My guess is that GTX470 will have similar performance to 5850 but worse power and heat and GTX480 will be similar to 5870 with the same power/heat problems. It won't be until Nv can get proper performance silicon out with higher yields and better heat characteristics that this will be worth owning.

The key for Nv is to not let the pain of Fermi go to waste. They should have been designing the successor for at least a year by now, and it should be ready to launch within a year at most (or less, preferably). Fermi looks like it will scale upwards quite well, so the next gen for Nv could be interesting. I will be sitting this gen out entirely; the 5870 is all too difficult to get my hands on at a decentish price, and I fear the same fate for the GTX480. I mean, I'm still running my venerable 8800GTX and it plays SupCom pretty well!

Anand unbiased ... :runaway:
 
Shouldn't it be 27bx27b multipliers?

I don't follow. The source operands are limited to 24 bits of mantissa precision, so why would one make the multiplier array wider? If this is related to rounding, doesn't that just involve examining the 24 + X MSBs of the 48 bit result (where X is some number that I can't be bothered to figure out)?
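For illustration, here's a minimal sketch of that rounding idea in plain C (my own code, not anything from NVIDIA or ATI): multiply two normalized 24-bit significands, then round the 48-bit product back to 24 bits with round-to-nearest-even. In effect you examine the top 24 bits plus a guard bit, with everything below OR'd into a sticky bit, so X works out to 1 plus a sticky reduction. Sign/exponent handling and subnormals are omitted.

```
#include <stdint.h>

/* Sketch: round the 48-bit product of two 24-bit significands
   (implicit leading 1 included, so both are in [2^23, 2^24))
   back to 24 bits, round-to-nearest-even. */
uint32_t mul_round_mantissa(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;          /* product lands in [2^46, 2^48) */
    int shift  = (p >> 47) ? 24 : 23;      /* normalize to 24 result bits */
    uint64_t result = p >> shift;          /* the 24 MSBs */
    uint64_t guard  = (p >> (shift - 1)) & 1;
    uint64_t sticky = (p & (((uint64_t)1 << (shift - 1)) - 1)) != 0;
    if (guard && (sticky || (result & 1))) /* ties go to even */
        result += 1;                       /* may carry to 2^24; a real unit renormalizes */
    return (uint32_t)result;
}
```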
 
AFAICT, only fetch and decode are shared between the two cores. Each core had its own 128-bit FPU in Barcelona; the two have just been put together in Bulldozer.

As far as I understand AMD's Vol. 6 manual and the discussion on various sites, it appears that if both the "left" and the "right" thread of a module use FP instructions, the FPU behaves as 2x 128-bit wide SIMD FPUs, each taking two passes over the YMM registers (like the 64-bit FPU on K7 took two passes over the XMM registers). If only one side of the module uses FP in the current time slice, it can occupy the entire FPU and do a full 256-bit SIMD operation on a YMM register.

How flexible that load-balancing really is remains to be seen, but with some effort I can imagine they could load-balance at instruction granularity.

Intel suggested a really strange mode-switch between 128-bit and 256-bit SIMD instructions, similar to the EMMS switch between the "regular" FPU and the SIMD FPU. Basically they did that to avoid storing the entire YMM register state on task switches, or something like that. I can't really follow the reasoning, but hey, they're supposed to be the smart guys, so let them be smart.

I can't find any mention of AMD following that lead, but such an instruction basically allows compiler/OS-emitted explicit FPU split/merge (besides all the other nastiness, which doesn't make it worth it IMHO).
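If that refers to the AVX/SSE transition state (my reading of the above, not something spelled out in the post), the published mechanism is the VZEROUPPER instruction, exposed as a compiler intrinsic. A minimal host-side sketch, with a made-up function name:

```
#include <immintrin.h>

/* After a block of 256-bit AVX work, VZEROUPPER marks the upper halves of
   the YMM registers as clean, so later 128-bit SSE code doesn't pay the
   state-transition penalty (and there is less register state worth saving). */
void scale8(float *dst, const float *src, float k)
{
    __m256 kk = _mm256_set1_ps(k);               /* broadcast k to 8 lanes */
    __m256 v  = _mm256_loadu_ps(src);            /* 256-bit load */
    _mm256_storeu_ps(dst, _mm256_mul_ps(v, kk)); /* 256-bit multiply + store */
    _mm256_zeroupper();                          /* the mode-switch in question */
}
```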
 
Time to crush the half-rate 32-bit integer nonsense. According to the CUDA 3.0 Programming Guide (Section 5.4.1, Table 5.1), 32-bit integer multiplication has the same throughput as floating-point multiplication on compute capability 2.0 devices, i.e. Fermi. The half-rate stuff is dead wrong.
32 bit integer multiply only produces the low 32 bits. If you want the high 32 bits:

__mulhi(x,y) computes the product of the integer parameters x and y and delivers the 32 most significant bits of the 64-bit result.


So 32-bit integer mul is half rate if you want all 64 bits. The __mul24 function can only produce the low 32 bits, so it isn't a real 24-bit multiplier.
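As a device-code sketch of why the full 64-bit product runs at half rate, here it is assembled from two hardware multiplies (the helper name is made up):

```
/* Hypothetical helper: the full 64-bit product of two signed 32-bit ints
   takes two multiply instructions, hence half the plain 32-bit mul rate. */
__device__ long long full_mul64(int x, int y)
{
    int hi          = __mulhi(x, y);          /* upper 32 bits of the product */
    unsigned int lo = (unsigned int)(x * y);  /* lower 32 bits of the product */
    return ((long long)hi << 32) | lo;        /* lo zero-extends when OR'd in */
}
```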

As for 24-bit integer multiplication, the reason it's slower is that there's probably no dedicated instruction for it on 2.0 hardware (after all, if you have full-speed 32-bit, there's no real reason to waste opcode space on 24-bit...), meaning it has to be done in software, using bit masking to ensure correctness. This adds at least one additional instruction, making 24-bit mul slower on 2.0 hardware.
Both operands have to be masked to 24 bits, so it should be 3 instructions, or only 2 if squaring.
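My guess at what that emulation looks like (not dumped from the compiler); note that the signed variant needs sign extension rather than a plain mask, since the sign sits in bit 23:

```
/* Sketch: emulating mul24 on hardware with only a 32-bit multiplier.
   Two masks plus one multiply = 3 instructions (2 if squaring). */
__device__ unsigned int umul24_emulated(unsigned int x, unsigned int y)
{
    return (x & 0xFFFFFFu) * (y & 0xFFFFFFu); /* low 32 bits of the 48-bit product */
}

/* Signed variant: sign-extend each 24-bit operand instead of masking
   (shift up 8, arithmetic shift back down 8). */
__device__ int mul24_emulated(int x, int y)
{
    return ((x << 8) >> 8) * ((y << 8) >> 8);
}
```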

Jawed
 
Agreed. Making the GTX470 (and therefore the 480) faster than the 5870 must have been their absolute minimum goal over the past 6 months. This way they can price it slightly higher than the 5870 and claim performance leadership. I don't think they will start a price war with AMD, because that seems like a war they can never win.

And what will AMD do?
And nVidia won the last war. ;)
 
Perhaps NV learnt something from G200. It makes no sense to launch at a high price just to be forced down by the competition days after launch.

Maybe the card simply sucks.
 
I don't follow. The source operands are limited to 24 bits of mantissa precision, so why would one make the multiplier array wider? If this is related to rounding, doesn't that just involve examining the 24 + X MSBs of the 48 bit result (where X is some number that I can't be bothered to figure out)?
I attempted to describe ATI's double-precision implementation here:

http://forum.beyond3d.com/showthread.php?p=1142400#post1142400

Any improvement on that is welcome.

Jawed
 
Rumor from Chinese sites:
GTX470 3Dmark Vantage Performance=167xx, Extreme=73xx
http://www.enet.com.cn/article/2010/0221/A20100221612409.shtml

Google translate:
"I Dream of language here predict the performance under the GTX470 and any similarity is purely coincidental, I am not responsible for, and refused to answer any questions. "
Bing translate:
"Due to the writer here dream language performance next GTX470 and is subject to the same thing happened, a coincidence, I do not accept any responsibilities for, and refused to answer any questions. "

And the title is "Performance Prediction".

So pure guesswork, or bad translation?
 
If true, although it's a synthetic benchmark, that puts the GTX 470 more than 10% slower than the HD 5870. I guess that's the first real performance indication.

Edit: I guess that puts it about neck and neck with the 5850 though, assuming they can price it the same.

The same thing happened back when the 2900XT launched: in 3DMark06 it was faster than the 8800GTS by quite a bit, but in games it was equal to or slower than the 8800GTS.
 
Well, appears really weird and unnecessarily complex and counterproductive, from a naive math point of view. :)
mul24 in G80/GT200 is done on the multifunction interpolator ALU, I believe. Since that ALU is doing other fancy stuff anyway, the added cost of the int24 mul was a non-issue.

The other issue with signed 24-bit mul is, of course, the sign, which I forgot. The masking I referred to earlier isn't enough when feeding a 32-bit multiplier, as the sign bit is in the wrong position.

I think when ATI was on their shrink-die-logic-at-all-costs route with R700 (I think), they removed a huge amount of redundancy in the logic; I would guess a great deal of it was just slightly generalized so one unit could take over the work of several others.
The die photo shows something that looks very much like a 17th, redundant x,y,z,w,t ALU set. I think the gains there probably had a lot to do with the cell library being used. Additionally, there may be a small benefit from having the TU dedicated, unlike in R6xx GPUs - I'm guessing this reduces some intra-SIMD complexity in routing data into and out of the distributed register files (which sit close to the ALUs, as register files in ATI GPUs are dedicated to each x,y,z,w,t ALU set).

A similar re-use approach is visible in Bulldozer; the ability to split the FPU in two is really great. I also saw somewhere a schematic of the FMA block where the single FMA unit could be split into two, an independent MUL and an ADD, with very little logic overhead.
http://developer.amd.com/gpu/ATIStr...een-Family_ISA_Instructions_and_Microcode.pdf

Take a look at the INTERP_XY instruction :D

Though it's not that I don't understand sacrifice, consensus and finding the middle ground in the context of GPUs ... ah, and getting rid of exactly those past decisions which accumulated year over year ...
I expect NVidia decided that the pain for developers of dropping the high-speed mul24 was warranted. A one-time thing, and they've been warned since 2007.

I understand this means FP "exceptions" raise only flags and you can query those at your leisure, correct?
I don't think there are flags - the programmer has to test the result of the instruction that might have generated the exception.
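In CUDA device code that amounts to something like this minimal sketch (the helper and the error flag are made up for illustration):

```
#include <math.h>

/* No trap and no status flag to poll: you inspect the produced value.
   Here a NaN/Inf result is recorded in a flag we maintain ourselves. */
__device__ float checked_div(float a, float b, int *error_flag)
{
    float r = a / b;
    if (isnan(r) || isinf(r))
        *error_flag = 1;   /* the "exception", detected after the fact */
    return r;
}
```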

Jawed
 
Google translate:
"I Dream of language here predict the performance under the GTX470 and any similarity is purely coincidental, I am not responsible for, and refused to answer any questions. "
Bing translate:
"Due to the writer here dream language performance next GTX470 and is subject to the same thing happened, a coincidence, I do not accept any responsibilities for, and refused to answer any questions. "

And the title is "Performance Prediction".

So pure guesswork, or bad translation?

The translation is correct, but enet is a reliable Chinese site, so it must be based on anonymous information, not pure guesswork.
 
Again, many thanks for your very interesting insight into what is going on behind the scenes. I hope you will let us know quickly once the tape-out has happened, whether the chip was dead or not, and how many re-spins they need. It is always nice to get first-hand unbiased information.

I said in this thread about 2 weeks ago that there WILL be another tapeout, with a B1 stepping, and that it was being sent out to TSMC in March to start the silicon process.
 
Nothing has taped out, nor has any prep been made to do so, so tapeouts are unlikely to be imminent. That puts things at 6 months or so out minimum.

-Charlie


LOL, only in the world of disinformation, or of someone who just likes to make things up, does that make any sense...

Do you really expect anyone to believe that "no prep has been made" for other Fermi based parts, other than the high-end ?

As an update, I am hearing, totally unconfirmed so far, that there may be one other tapeout either done or pending. I am far from 100% on this one though.

-Charlie

Funny how it only took one post and a day for things to change 180 degrees. And it was Silus... who would have thought?
 