NVIDIA Fermi: Architecture discussion

Are you seriously thinking this could compete against Cypress? 64 low speed TMU and 32 low speed ROP, along with slow rasterizer and at most 1/3rd of Cypress raw SP throughput is clearly not enough, even if we consider Cypress "bad" scaling is due to the GPU arch itself.

Yes I do believe it would be competitive with Cypress's salvage part (HD 5850), given the architectural differences and improvements over GT200, which could make this "half GF100" on par or a bit faster than the GTX 285, which competes with the HD 5850 right now, on most occasions.

Still, a while back, I revised my speculation of this chip's specs and added the same number of TMUs and ROPs as the GTX 285 (80 TMUs, 32 ROPs), which would make it a 320 SPs part. It should have absolutely no problems beating the HD 5850 across the board.
 
Thanks Mr.Dave for the remark, but I guess that was the whole point of Anand's article: to show that some games have native support for wider FOVs, while the majority don't. I didn't say that there is no way to support this. It can be done with driver optimizations of course, but that's AMD's job, and according to Anand that still needs more work.
No, it is absolutely not a driver optimization path; this is entirely in the domain of the application. AMD's job is to evangelise the feature to developers so that they expose the FOV in their apps, and that's exactly what we are doing.
 
Yes I do believe it would be competitive with Cypress's salvage part (HD 5850), given the architectural differences and improvements over GT200, which could make this "half GF100" on par or a bit faster than the GTX 285, which competes with the HD 5850 right now, on most occasions.

Still, a while back, I revised my speculation of this chip's specs and added the same number of TMUs and ROPs as the GTX 285 (80 TMUs, 32 ROPs), which would make it a 320 SPs part. It should have absolutely no problems beating the HD 5850 across the board.

According to the latest ComputerBase article this GTX350 would need to be at least 20% faster than a GTX285 whilst being €50 cheaper to "compete" with HD5850 (ignoring the fact that the HD5850 sold for €180 at launch whereas it's €230 now). Now I can see it compete on performance, but not on price anytime soon. This benchmark set even includes the embarrassing B:AA numbers where the AMD cards are limited to 21fps, otherwise the performance gap would be close to 25% overall.
 
According to the latest ComputerBase article this GTX350 would need to be at least 20% faster than a GTX285 whilst being €50 cheaper to "compete" with HD5850 (ignoring the fact that the HD5850 sold for €180 at launch whereas it's €230 now). Now I can see it compete on performance, but not on price anytime soon. This benchmark set even includes the embarrassing B:AA numbers where the AMD cards are limited to 21fps, otherwise the performance gap would be close to 25% overall.

No, they don't add the B:AA number to the performance rating.
A GF104 would have 256SP, 64TMUs, 192bit - 256bit with GDDR5, 24ROPs-32 ROPs. With a high base clock (750MHz) it would have nearly the same fillrate.
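
For anyone who wants to check the "nearly the same fillrate" claim, here is a rough sketch of the arithmetic. Peak texel rate is just TMUs times core clock. The GF104 figures are the speculated ones from the post above; the GTX 285 core clock of 648 MHz is an assumption brought in from outside this thread, and the function name is only for illustration.

```python
# Back-of-the-envelope peak fillrate: number of units x core clock.
# GF104 figures are the speculated ones from the post above.
# The GTX 285 core clock (648 MHz) is an outside assumption, not from this thread.

def peak_rate_g_per_s(units: int, core_mhz: float) -> float:
    """Peak rate in G(texels or pixels)/s for `units` running at `core_mhz`."""
    return units * core_mhz / 1000.0

gf104_texel = peak_rate_g_per_s(64, 750)    # speculated GF104: 64 TMUs @ 750 MHz
gtx285_texel = peak_rate_g_per_s(80, 648)   # GTX 285: 80 TMUs @ 648 MHz (assumed clock)
texel_ratio = gf104_texel / gtx285_texel

print(f"GF104 (speculated): {gf104_texel:.1f} GTexels/s")
print(f"GTX 285:            {gtx285_texel:.1f} GTexels/s")
print(f"ratio:              {texel_ratio:.2f}")
```

Under those assumptions the speculated part lands at roughly 93% of the GTX 285's peak texel rate. This is peak theoretical throughput only and says nothing about how either chip behaves in real games.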
 
Talking of derivatives... FWIW, I'm currently naively expecting NV's line-up to look something like this:

GF100: 480SP, 384-bit GDDR5 (no SKU with 512SP, some down to 416/448) [Q2]
GF102: 320SP, 256-bit GDDR5 [Q3]
GF104: 160SP, 128-bit GDDR5 [Q2]
GF106: 64 SP, 128-bit DDR3 [Q3]
GF108: 32 SP, 64-bit DDR3 [Q4]

Given how wrong I've been about every single of my family predictions in the last trillion years, I heavily suggest everyone to ignore this or even make a clear attempt to change their own predictions if they seem similar to mine :p

Hmm, lowend chip Q4? I guess that means a couple more renames would be needed for GT220 - err I meant GT315 :)
I find it interesting that your second fastest chip is beefed up compared to other speculation. Would make sense if it means it's competitive with rv870, though with that release date it's going to compete against a different part...
 
Well do we know that Cypress couldn't support a wider bus on its current die size? The general question still stands though as to whether throwing more of X unit at the problem actually results in tangible gains. Too bad it's not possible to disable individual SIMDs or ROPs like back in the day to do some sort of analysis on which units are superfluous.

Without seeing a die shot I couldn't say. I'm guessing from RV770 and assuming no major changes in I/O density (there were some claims that TSMC had improved this by some amount).

If the micrograph for RV770 is at least a decently close guide for the physical I/O perimeter, a little less than 3/4 of a ~260 mm2 chip's perimeter was devoted to GDDR5.
Cypress is still pretty square, so a modest increase in perimeter may not give much extra room if the pads are pretty close to what they've been previously.

The extra output for multimonitor support would take up some room too. While this doesn't help performance, it is a better product differentiator than some tens of percent additional bus width.

Fermi's wider bus width is a possible data point. If so, its rumored die size would be another.
 
I would say that more people will have the chance to experience ATi's deficiencies even more. As people pointed out, the software for Eyefinity just isn't there: it only supports some games, and in the majority of games the picture gets stretched too far, enough to make all 3D objects look flat and ugly. Compare that to the situation when Nvidia released their 3D Vision and you will see the difference: Nvidia had a list of compatible games with a rating system, and they supported a large number of old and new games.

Actually, Nvidia's 3D Vision has been available (and maintained) for more than ten years.
I first tried it on a GeForce2 MX with anaglyphs.
 
Hmm, lowend chip Q4? I guess that means a couple more renames would be needed for GT220 - err I meant GT315 :)
A 32SP part seems unreasonable to me, as Fermi uses only about 15 sq. mm for each partition... I think it's way too low for a ~60 sq. mm part, and that's assuming the custom logic/L2 downscale well and, as they still need to fill 16 way SIMD, it's harder than with previous 8 way SIMD designs.
 
A micrograph would be interesting in that it would inform on why the package pinout is the way it is.
If we find X% of the perimeter is for memory, we could see how much more perimeter would be needed to have wider bus widths.
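
A rough way to play with the "how much more perimeter" question, as a toy model only. It assumes a perfectly square die, that memory pad length scales linearly with bus width, and that everything else on the edge stays the same length. The ~260 mm2 area and the "a little less than 3/4" fraction come from the RV770 estimate earlier in the thread (rounded to 0.75 here); RV770's 256-bit bus is brought in from outside the thread, and the function names are hypothetical.

```python
import math

def square_die_perimeter_mm(area_mm2: float) -> float:
    """Perimeter of a square die with the given area (assumes a square die)."""
    return 4.0 * math.sqrt(area_mm2)

def min_area_for_bus(base_area_mm2: float, base_bus_bits: int,
                     mem_fraction: float, target_bus_bits: int) -> float:
    """Smallest square die whose edge fits a wider memory bus.

    Toy assumptions: memory pad length grows linearly with bus width,
    and the non-memory part of the perimeter stays unchanged.
    """
    base_perimeter = square_die_perimeter_mm(base_area_mm2)
    mem_edge = base_perimeter * mem_fraction          # edge used by memory I/O
    other_edge = base_perimeter - mem_edge            # edge used by everything else
    target_mem_edge = mem_edge * target_bus_bits / base_bus_bits
    target_perimeter = target_mem_edge + other_edge
    return (target_perimeter / 4.0) ** 2

# RV770-like baseline: ~260 mm2, 256-bit, ~3/4 of the perimeter for GDDR5.
area_384 = min_area_for_bus(260.0, 256, 0.75, 384)
print(f"pad-limited area for 384-bit: {area_384:.0f} mm2")
```

With these inputs a 384-bit bus comes out pad-limited at roughly 490 mm2. Any improvement in I/O density, a non-square die, or a different memory fraction would move that number a lot, so a real micrograph would still be the better guide.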
 
Is that true? Fermi has two separate pipelines, Integer 32 and FP 32. When doing DP, both pipelines are used: FP32 (8+24) + Int32 = FP64 (8+56), which results in half the throughput of both pipelines, hence the 2:1 SP:DP ratio. If so, how is saving die area possible?

Guys, I need confirmation on whether this is true or not!

AFAIK today, Fermi has one SP and one DP (each 16 wide) block in each SM but only datapaths for two thirds of their combined consumers. So it can feed either (SP to SP and SP to DP) or (DP to DP and nothing to SP).
 
Talking of derivatives... FWIW, I'm currently naively expecting NV's line-up to look something like this:

GF100: 480SP, 384-bit GDDR5 (no SKU with 512SP, some down to 416/448) [Q2]
GF102: 320SP, 256-bit GDDR5 [Q3]
GF104: 160SP, 128-bit GDDR5 [Q2]
GF106: 64 SP, 128-bit DDR3 [Q3]
GF108: 32 SP, 64-bit DDR3 [Q4]

Given how wrong I've been about every single of my family predictions in the last trillion years, I heavily suggest everyone to ignore this or even make a clear attempt to change their own predictions if they seem similar to mine :p
Arun, I think Nvidia should hire you to do their product planning. Seriously. I have seen very apt guesstimates from you (even though they don't come true), and Nvidia would greatly benefit, given the mess of a naming scheme, overlapping products, etc.
 
AFAIK today, Fermi has one SP and one DP (each 16 wide) block in each SM but only datapaths for two thirds of their combined consumers. So it can feed either (SP to SP and SP to DP) or (DP to DP and nothing to SP).
I doubt that a bit, as some beefed up operand collector could feed SP and DP subblocks simultaneously, even if the needed register bandwidth cannot be sustained in all cases. Real code doesn't consist exclusively of FMAs with 3 source operands.

From the implementation efficiency I would favor two 16 ALU subblocks, which can be chained together with some additional circuitry to enable 16 DP results. Basically similar to what ATI does with the 4 ALUs in a VLIW, just that the nv units have beefier ALUs (most important the multiplier) to start with and one can get away by coupling only two of them.
 
The extra output for multimonitor support would take up some room too. While this doesn't help performance, it is a better product differentiator than some tens of percent additional bus width.
That's not really the case. From a pin output perspective, previous chips already supported a similar number of digital and analogue output lanes. The primary difference is the number of display pipelines, which is internal to the chip.
 
Fermeye for the late try.

NVIDIA has had their own "Eyefinity" for years, with their Quadro line. It should be pretty straightforward to have it available for GeForces as well.

Not that straightforward when it accompanies a new architecture and added on top of all the other problems being addressed. If, however, Fermi is not a 5870 beater, that late march~april time frame may well include a particular effort to include an eyefinity solution in an attempt to nullify at least one of the 5870 advantages. It is at least doable, in contrast to the heat/power/noise/cost disadvantages which are inherent in the chip design. Of course that would also necessitate some hardware modifications as there was no indication such support was originally planned for the card.

I reckon the saying 'up to their asses in alligators' would be a gross understatement in describing the situation across the entire Fermi hardware and software development teams.
 
Not that straightforward when it accompanies a new architecture and added on top of all the other problems being addressed. If, however, Fermi is not a 5870 beater, that late march~april time frame may well include a particular effort to include an eyefinity solution in an attempt to nullify at least one of the 5870 advantages. It is at least doable, in contrast to the heat/power/noise/cost disadvantages which are inherent in the chip design. Of course that would also necessitate some hardware modifications as there was no indication such support was originally planned for the card.

Noise is part of chip design now? You are really reaching for straws now, aren't you?

And what are the "other" advantages the HD 5870 has over a Fermi based GeForce? You have no data to back that up, because there is no data to compare yet. And even if you mean power and maybe heat, well, let's look at RV770 vs GT200, where GT200 consumed less power @ idle. Let's wait and see before reaching such conclusions, shall we?
Unless of course, you are trying to get the "lessthanaccurate.com" domain and that sort of conclusion is a pre-requisite :)

And no, being a new architecture doesn't hinder the effort of adding the Quadro line feature to GeForce cards. One thing has nothing to do with the other.

spigzone said:
I reckon the saying 'up to their asses in alligators' would be a gross understatement in describing the situation across the entire Fermi hardware and software development teams.

Despite your continuous efforts, everyone is still waiting for real numbers to reach such conclusions :)
 
@spigz - I really think you're grossly overestimating the draw of Eyefinity. I'm not seeing this unbridled lust that you describe or a rush to purchase triple monitor setups. Like Dave mentioned a few posts above, developers have to get on board and support it natively before it can even be considered a mainstream solution (and even then people have to be willing to shell out for 3 monitors). What I can guarantee you is that stuff like Eyefinity, 3D vision etc will always take a backseat to good old performance leadership.
 
I would say that more people will have the chance to experience ATi's deficiencies even more. As people pointed out, the software for Eyefinity just isn't there: it only supports some games, and in the majority of games the picture gets stretched too far, enough to make all 3D objects look flat and ugly. Compare that to the situation when Nvidia released their 3D Vision and you will see the difference: Nvidia had a list of compatible games with a rating system, and they supported a large number of old and new games.

And I wouldn't say that the HD5870 can play every game out there maxed. It can't play Crysis or STALKER Clear Sky, or even Arma II; it has the same performance as the GTX 295, which is still deficient in those areas.

Of course the software isn't there yet. But in the larger picture it's where it's headed that's important. Moving forward to mid-June for example ... AMD will have had five driver release cycles to perfect its game and Eyefinity support across its entire already released 5000 series. Eyefinity should be operating smoothly and have rapidly growing game support by then. Going forward in time, as new games are released based on the 5000 series architecture and DX11 implementation (as that's the only DX11 hardware the developers had to work with for the last eight months ... and counting), the AMD driver situation will only improve, as more and more games will have built-in optimization for AMD's 5000 series cards, including Eyefinity.

I said almost every game at 24" and below. At that, turn down the AA and AF and even those cards are covered.
 
I have a question for Rys. In an interview with an Nvidia product manager, the question came up of whether Fermi will have dedicated hardware for tessellation, and they confirmed that. That dates back to Oct 07.

http://forums.nvidia.com/index.php?showtopic=149550

But when Rys wrote his piece at techreport, he mentioned that he expects Fermi to feature a software tessellator. I sense he didn't trust Nvidia's statement. Does he still believe that, or what?

In other words, will Fermi have a hardware tessellator or not?
 