NVIDIA Fermi: Architecture discussion

And yeah, I'm saying that Fermi will run a MADD as an FMA in graphics mode unless the programmer specifies otherwise (which implies programmer control) because they want the old computational accuracy.

I was wondering what kind of visual artifacts/issues this could translate into, if any... :?:
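For anyone wondering about the actual numeric difference, here is a minimal C sketch (hand-picked values, nothing shader-specific) showing how a fused multiply-add, which rounds once, can differ from a multiply followed by an add, which rounds twice:

```c
#include <math.h>
#include <stdio.h>

/* Hand-picked single-precision values that expose the extra rounding step.
   Build without FP contraction so the MADD stays two operations,
   e.g.: gcc -std=c99 -O2 -ffp-contract=off fma_demo.c -lm */
int main(void)
{
    float a = 1.0f + 0x1.0p-12f;      /* 1 + 2^-12 */
    float b = a;
    float c = -(1.0f + 0x1.0p-11f);   /* the negated, rounded product of a*b */

    float p     = a * b;              /* MADD path: product rounded to float first */
    float madd  = p + c;              /* second rounding -> exactly 0 */
    float fused = fmaf(a, b, c);      /* single rounding -> keeps the 2^-24 residue */

    printf("MADD: %g\nFMA : %g\n", madd, fused); /* 0 vs ~5.96e-08 */
    return 0;
}
```

Whether a last-bit-scale difference like that ever shows up on screen presumably depends on shaders that rely on exact cancellation; in most cases the FMA result is simply the more accurate one.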

Clock- and perf-wise I'm bullish, absolutely (and I don't put much stock in the SC'09 data). I could speculate why, but why feed that monster? If I'm wrong then I'll wear an "I love Groo!" t-shirt for a week :LOL:

:cool: :LOL:

The HD 5970 is actually a bit faster than an SLI of GTX 285s...

I think that on average, at 1920x1200 an HD5970 is like 70% faster than a GTX285... Obviously using an uber-rez or AA8x will put GT200 to shame and make the HD5970 pull even further ahead of 2x GTX285. So what I'm saying is that I see it as unlikely that a single Fermi will outperform an HD5970 by 20%-30% at 1920x1200 (with AA4x). :)
 
I think that on average, at 1920x1200 an HD5970 is like 70% faster than a GTX285... Obviously using an uber-rez or AA8x will put GT200 to shame and make the HD5970 pull even further ahead of 2x GTX285. So what I'm saying is that I see it as unlikely that a single Fermi will outperform an HD5970 by 20%-30% at 1920x1200 (with AA4x). :)

I don't follow that logic. Why is it that a given situation (8xAA) that favors HD5970 can't favor GF100 to the same extent if that specific shortcoming has been addressed? I'm not making a case for it beating HD5970 but it's pretty clear that there will be specific situations where GF100 outpaces GTX 285 by an above average margin for exactly those same reasons :)

http://www.anandtech.com/video/showdoc.aspx?i=3679&p=8

Does fairly well in Anand's set of benchmarks.

Yep, that's why selection matters. But even Anand's numbers show 285-SLI within a few frames of the 5970 more often than not. They also used older Nvidia drivers than TR did. I guess the bottom line is that if GF100 approaches 285-SLI it will be competing with Hemlock, not Cypress. And well Gemini or whatever it's called will be home free if it ever makes it to market.
 
I don't follow that logic. Why is it that a given situation (8xAA) that favors HD5970 can't favor GF100 to the same extent if that specific shortcoming has been addressed? I'm not making a case for it beating HD5970 but it's pretty clear that there will be specific situations where GF100 outpaces GTX 285 by an above average margin for exactly those same reasons :)

In fact I'm saying that AA8x gives R800 an edge against GT200, not against GF100 ('cause we still don't know anything about it). :smile:
What I'm saying is that we have already seen that G80/GT200 performance doesn't scale linearly with the shader count (especially if you're not increasing everything by the same amount). GF100 is not equal to 2x GT200: it has more than 2x the shaders, but not 2x the bandwidth or TMUs.
That's why I'm a bit skeptical, don't know if my point makes sense or not... ;)
 
What I'm saying is that we have already seen that G80/GT200 performance doesn't scale linearly with the shader count (especially if you're not increasing everything by the same amount). GF100 is not equal to 2x GT200: it has more than 2x the shaders, but not 2x the bandwidth or TMUs.
That's why I'm a bit skeptical, don't know if my point makes sense or not... ;)
Exactly the same is the case with Cypress: much of it is 2x what RV770 is, but you won't see 2x the performance across the board because not everything (including memory, bus interfaces, CPU, etc., etc.) is 2x. RV770 was 2x the performance of RV670 in more cases because it actually had 2.5x the engine in many places.

There are many points that can cause a bottleneck and you'll be limited by any one of those at any time.
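To put a number on the "limited by whichever resource scales least" point, here is a crude serialized-bottleneck sketch in C. All the fractions and scaling factors are placeholders, not real GT200/GF100 figures; the point is only that a chip with 2x+ the shaders but much less than 2x the bandwidth and texturing lands well short of 2x overall.

```c
#include <stdio.h>

/* Crude serialized-bottleneck model: a frame is split into phases, each limited
   by one resource, and each phase shrinks by that resource's scaling factor.
   Every number below is a placeholder for illustration, not a real spec. */
int main(void)
{
    /* fraction of frame time limited by each resource on the old chip */
    double frac_shader = 0.50, frac_bandwidth = 0.30, frac_texture = 0.20;

    /* hypothetical scaling factors of the new chip over the old one */
    double scale_shader = 2.2, scale_bandwidth = 1.4, scale_texture = 1.3;

    double new_time = frac_shader    / scale_shader
                    + frac_bandwidth / scale_bandwidth
                    + frac_texture   / scale_texture;

    printf("Estimated speedup: %.2fx\n", 1.0 / new_time); /* ~1.68x, not 2.2x */
    return 0;
}
```

It's just Amdahl-style arithmetic, but it shows why doubling one resource doesn't double frame rates when the others scale by less.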
 
Yep, that's why selection matters. But even Anand's numbers show 285-SLI within a few frames of the 5970 more often than not. They also used older Nvidia drivers than TR did. I guess the bottom line is that if GF100 approaches 285-SLI it will be competing with Hemlock, not Cypress. And well Gemini or whatever it's called will be home free if it ever makes it to market.
Well, a few frames down in the 30FPS range is quite a lot. In 3 of those cases the perf delta is more like 20%. Others are getting into diminishing returns through CPU limitation.
 
Yep, that's why selection matters. But even Anand's numbers show 285-SLI within a few frames of the 5970 more often than not. They also used older Nvidia drivers than TR did. I guess the bottom line is that if GF100 approaches 285-SLI it will be competing with Hemlock, not Cypress. And well Gemini or whatever it's called will be home free if it ever makes it to market.
That's a pretty big if, isn't it? And that's not some corner case; that needs to be the average case, which makes that if bigger. Honestly, if it was 2x GTX285 and had launched this September, it would've easily displaced G80 in my book as Nvidia's best GPU. But if, would've, could've (CUDA?) are the words I can't find comfort in right now...
 
Ahh, well, maybe you'd think differently if you'd seen the waves it made in the industry at the time. There was a lot of industry stuff going on that thankfully hasn't been repeated since.

Does this mean I should start raising the spectre of ATI's old cards that flopped? The pathetic drivers of the past (talking 1999 era) and so on? Somehow it seems pointless, but if there is no statute of limitations and this stuff should be dredged up every time, just think about discussions in 20 years. It will take a hundred pages of history to get to whatever meaningless point is being raised. :p
 
That's a nice article at TechReport from Rys, but I'm not buying into the "high-end G92" clocks, at least not for the shader clock. The announced specs for the compute card (that is, slightly higher power consumption but a slightly lower shader clock than the predecessor Tesla card) do not indicate that they'll be able to get a significantly higher shader clock than GT200-based cards (unless they'd do something silly like require two 8-pin power connectors). At a 1700MHz shader clock it would indeed be quite a monster chip, and I'm confident even at "only" a 1500MHz shader clock it should still beat an RV870 in its fastest configuration in 3D applications. Of course, being the monster chip it is, it really must beat RV870 by a considerable margin, otherwise it's an epic fail.
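For what it's worth, the crude dynamic-power arithmetic behind that reading looks something like the sketch below (the classic P ~ C * V^2 * f relation; every ratio is a placeholder, not a measured or announced figure):

```c
#include <stdio.h>

/* Dynamic power scales roughly as P ~ C * V^2 * f.  Holding the power budget
   near the predecessor's while effective switching capacitance grows (more
   transistors, partly offset by the 40nm shrink) leaves little clock headroom.
   All ratios below are placeholders for illustration only. */
int main(void)
{
    double cap_ratio   = 1.15; /* hypothetical effective-capacitance growth      */
    double volt_ratio  = 1.0;  /* assume an unchanged supply voltage             */
    double power_ratio = 1.1;  /* "slightly higher" board power; 10% placeholder */

    /* clock ratio f_new/f_old that keeps the power ratio at power_ratio */
    double freq_ratio = power_ratio / (cap_ratio * volt_ratio * volt_ratio);
    printf("Allowed clock ratio: %.2fx\n", freq_ratio); /* ~0.96x, i.e. no big jump */
    return 0;
}
```

With placeholder numbers like these you land at roughly the same clock as the predecessor rather than a significantly higher one, which is all the announced Tesla specs really hint at.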
 
That's why I'm a bit skeptical, don't know if my point makes sense or not... ;)

I'm skeptical too. That's why I raised the question about GF100's design reducing bandwidth requirements, since the nominal bandwidth increase isn't going to be sufficient. But the point I was making is that beyond specs you have to factor in the potential efficiencies gained from addressing GT200's specific shortcomings, and also the improvements brought by DX10.1/DX11 that AMD's parts currently enjoy.

Well, a few frames down in the 30FPS range is quite a lot. In 3 of those cases the perf delta is more like 20%. Others are getting into diminishing returns through CPU limitation.

Well I've never been a fan of comparing one unplayable result to another. Between Anand and TR's articles it's clear that HD5970 isn't sitting comfortably at the moment.
 
Well I've never been a fan of comparing one unplayable result to another. Between Anand and TR's articles it's clear that HD5970 isn't sitting comfortably at the moment.
It's the fastest graphics card money can buy, and it'll be comfortable by the time Fermi launches, at least with respect to GTX285 SLI, as I can't claim to have any idea of Fermi's gaming performance like Rys does. :p
 
Well I've never been a fan of comparing one unplayable result to another. Between Anand and TR's articles it's clear that HD5970 isn't sitting comfortably at the moment.
Are you only looking at the one benchmark? Everything else is in the 50-170FPS range!
 
Are you only looking at the one benchmark? Everything else is in the 50-170FPS range!


You always have to look at percentages and take the erroneous benches out. It's 30-40% faster, and theoretically anything more is not possible, so the rest is just out the window. Is that fair (anything outside the theoretical limits means something else is going on, a bottleneck or a bug)? Oh sorry, I was thinking about the GTX 295; if it was the SLI 285s the percentage is less.
 
That's a nice article at TechReport from Rys, but I'm not buying into the "high-end G92" clocks, at least not for the shader clock. The announced specs for the compute card (that is, slightly higher power consumption but a slightly lower shader clock than the predecessor Tesla card) do not indicate that they'll be able to get a significantly higher shader clock than GT200-based cards (unless they'd do something silly like require two 8-pin power connectors). At a 1700MHz shader clock it would indeed be quite a monster chip, and I'm confident even at "only" a 1500MHz shader clock it should still beat an RV870 in its fastest configuration in 3D applications. Of course, being the monster chip it is, it really must beat RV870 by a considerable margin, otherwise it's an epic fail.

I was sceptical at first too, when Nvidia announced the FLOPS range and thus the likely shader clocks of Tesla-Fermi, which aren't spectacular, to put it positively. :)

But after thinking it over a bit, I suspect that Tesla will not be clocked nearly as high as they could be. Why? Many reasons.

First, Tesla is now targeted more at supercomputers and clusters than even its first generation was (desk-side SCs). In that space, you usually do not scale with clock speed but with the number of cores or devices.

Second, the above makes for a very good yield-recovery scheme. At first, you could sell clock-wise underperforming chips (with respect to the desktop processors) in that market; later you can ramp up the clock speed but disable an SM or two in return, as long as you keep your GFLOPS the same (a rough sketch of that arithmetic follows after the third point). I don't think that the SC market is as spec-avid as enthusiast gamers, who "want a 512 bit bus", for example, rather than a given amount of bandwidth.

Third, it's more critical for this environment to ensure long-term stability and, even more importantly, to build a reputation. Nvidia, as a newcomer in this market, needs to convince people that their processors are as good an alternative as the others - you do that not only by boasting TFLOPS numbers, but also by ensuring stable operation. So you cannot push your cards to the utmost limits. At least I wouldn't do it.
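As a rough illustration of the yield-recovery arithmetic from the second point: peak GFLOPS depend only on the product of SM count and hot clock (assuming the whitepaper's 32 cores per SM and FMA counted as two flops per core per clock), so a cut-down part at a higher clock can advertise exactly the same number as a full part at a lower one. The SM/clock combinations below are hypothetical, picked purely to make the numbers match.

```c
#include <stdio.h>

/* Peak single-precision GFLOPS = SMs * cores_per_SM * 2 (FMA) * hot clock in GHz.
   32 cores per SM is from the Fermi whitepaper; the SM/clock combinations are
   hypothetical, chosen only to show how GFLOPS can be held constant. */
static double peak_sp_gflops(int sms, double hot_clock_ghz)
{
    const int cores_per_sm = 32;
    return sms * cores_per_sm * 2.0 * hot_clock_ghz;
}

int main(void)
{
    printf("16 SMs @ 1.26 GHz: %.0f GFLOPS\n", peak_sp_gflops(16, 1.26)); /* ~1290 */
    printf("14 SMs @ 1.44 GHz: %.0f GFLOPS\n", peak_sp_gflops(14, 1.44)); /* ~1290 */
    return 0;
}
```

Same headline number, two very different bins - which is exactly the kind of recovery scheme described above.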
 
Does this mean I should start raising the spectre of ATI's old cards that flopped? The pathetic drivers of the past (talking 1999 era) and so on? Somehow it seems pointless, but if there is no statute of limitations and this stuff should be dredged up every time, just think about discussions in 20 years. It will take a hundred pages of history to get to whatever meaningless point is being raised. :p

Depends on whether history is being repeated or not. Remember, this tangent was raised after someone called this "another R600", whereas NV30 was more spectacular, more relevant and, as far as I can tell, follows a closer pattern.

History is what it is, the facts are what happened. If Nvidia commits the same "crimes against the buying public" as they did the first time around, I don't think they should get a free pass on what they did before just because it was a few years back. The pattern of their behaviour as a company tells us something about them, and about what they are willing to do in the future.

It might not be of interest to you, and that's okay, but it might be of interest to others. It's the historical perspective as well as the physical timetables that had me calling bullshit on Nvidia's claim of "Fermi by Christmas" when Jensen was waving around the "finished Fermi" at his spoiler event in October.
 
Carsten,

If the Tesla track record so far had been as you describe above, I wouldn't have many second thoughts about it. I saw a maximum ALU frequency of 1.44GHz for 1070 configs stated on their site, and a 1.476GHz hot clock for the GTX285. Unless you convince me that the factors you describe for such configs didn't apply in the past but do today, I'm afraid any doubts are justified.

Last but not least, a hot clock of 1700MHz isn't entirely impossible, but anyone who has doubts about it isn't unjustified either. Every high-complexity chip since G80 has had far more mediocre hot clocks, and although there's a chance they made significant changes to the ALUs to allow higher frequencies this time, there's not just one "but" to answer but several.

I'm not saying that it is or isn't; merely that any doubts aren't unjustified.
 
Carsten,

If the Tesla track record so far had been as you describe above, I wouldn't have many second thoughts about it. I saw a maximum ALU frequency of 1.44GHz for 1070 configs stated on their site, and a 1.476GHz hot clock for the GTX285. Unless you convince me that the factors you describe for such configs didn't apply in the past but do today, I'm afraid any doubts are justified.

As I wrote already, I think this has to do with the stronger focus of Tesla cards on large-scale installations like clusters and/or supercomputers (depending on whether you'd like to separate those two), whereas the first generation was (IMO more, but surely not entirely) targeted at workstations and smaller installations. Plus, the DP performance of the first generation, which always has to double as an incentive for new technology uses, was not so superior to traditional CPUs.

Because of these two factors (no big x-factor over CPUs, and smaller-scale systems), you'd want to make the individual products as fast as possible.

Last but not least, a hot clock of 1700MHz isn't entirely impossible, but anyone who has doubts about it isn't unjustified either. Every high-complexity chip since G80 has had far more mediocre hot clocks, and although there's a chance they made significant changes to the ALUs to allow higher frequencies this time, there's not just one "but" to answer but several.
I am not committed to specific clock targets - realistically, I also doubt that we'll see anything above 1.4-1.5 GHz in the first installment. But what I really do not think, in this case, is that Teslas will continue to be clocked as close to the top models as they were in the past - which would imply a hot clock of barely 1 GHz for GeForce-Fermi.

I'm not saying that it is or isn't; merely that any doubts aren't unjustified.
True, I am also not saying that my view is partly or entirely correct - it's just that: my view (and those have been wrong in the past, too). :)
 
Agreed that clocks are not so important in very large systems, but trust me, if they could they would clock it higher...

He has a few valid points though; I personally tend to be a bit more conservative since the data so far isn't very convincing.
 