NVIDIA GF100 & Friends speculation

*ARGL*

Of course and YES!!!
But # of VRM phases and power routing for a dual-GPU solution should be a bit more difficult than for a single-GPU board.
 
And the generation after Fermi might do the very same to AMD's generation at the time...do you notice a trend by any chance? Why would I as a consumer care what each IHV's roadmap looks like?
Evergreen is a fully compliant DX11 GPU family and yes, there are very good chances that future DX11 architectures from AMD will be far more efficient. But then again it's not coming that soon either...

Apart from funky marketing material, show me where I can buy a GF100 so I can judge for myself how it compares to X or Y. Until I see a number of independent tests/comparisons on final available products I won't dare to jump to any conclusions.

Yes, it is an endless cycle, just this time around ATI seems to be in the leading position. They have a product in the market now, but have not revealed their "real" DX11 design.

And no, I won't care either directly as a consumer if IHV A has higher manufacturing costs than IHV B. What I'll rate personally is the price/performance ratio in strict combination with image quality, and if the balance of all those factors is worth it I might even close an eye in terms of power consumption, assuming it's not over Hemlock's power envelope. As for final MSRPs, since I haven't seen anything yet related to GF100, you obviously know something definite we don't know yet.

I too do not care if the vendor makes money, but I care a lot about power consumption and other features. The real launch will tell us if Fermi is indeed a failure or not.
 
Hmm. Well I was cautiously optimistic when the initial Fermi information came out a couple of days ago, but with a bit more thought, I am less so. Really all we've had is a couple of canned benchmarks and PR material. Without all the other information such as independent gaming benchmarks, clocks, power, noise, heat, price, etc, there isn't really enough information to form an opinion on the product as a whole. Fermi could be ten times faster than the competition, but if it sounds like a jet engine and costs ten times the price, it won't be a viable mainstream product, even for the gaming high end. I agree wholeheartedly with Rhys - we still don't know enough, and what we do know is just from Nvidia marketing with their expected bias and spin.

Fermi is still several months away, and the ATI refresh will be straight on top of it, no doubt with price cuts on their current cards. This will change the context of Fermi as a product by the time it finally arrives in the market. When Fermi finally arrives in that new competitive landscape, and we actually know what it is beyond the current constrained PR spin, we can actually decide if it's any good, and more or less desirable than competitor offerings.

I suspect that even if Fermi wins the battle at the high end, ATI will win the war with their better yields, smaller dies, more latitude for price cuts, power/heat envelopes, and a full top to bottom DX11 range. I think Nvidia may not make much money if it has to cut prices, and may not sell many units if they cost so much more than an ATI product that offers nearly all the same gaming performance for significantly less money.

Also, more interesting will be what happens at the end of the year, with a Fermi re-spin to give us a full 512 SP product at better clockspeeds, but it may well be facing the R9xx - possibly on a 32nm Global Foundries process? I think the next 12 months will be very interesting, with frantic competition and some very interesting new products at great prices for gamers.
Well, I wouldn't go so far as to say that we can't conclude anything about GF100. There are a couple of immediate benefits to image quality (with coverage AA and otherwise improved anti-aliasing), and there's obviously a strong focus on geometry and PhysX performance with this part. As far as I can tell, anisotropic filtering is still an unknown, but we can expect it to be no worse than nVidia's current parts, which are quite good.

Then I'd also like to point out that being late on this part isn't necessarily as bad for performance as we might naively think. If, say, portion X of the chip is what holds the whole thing back from final production for a few more months, then those working on parts A, B, and C have more time to further refine their designs as well. The real loss, I think, is in the lost time that nVidia had to sell their product. So if Anand is correct, and the primary reason for the delay is getting the out-of-order execution of the geometry units correct, then we should expect the part we eventually get to be faster than what nVidia would have put out had there been no delay. As of right now, I think we should have a right to expect the GF100 to be competitive with ATI's refresh parts (at least at the high end). If it is only competitive with ATI's current parts, then nVidia will be in serious trouble.

Whether or not it will be competitive with ATI's refresh DX11 parts, of course, is still completely up in the air. ATI does have the advantage that they've actually produced a working part and have had time to much more extensively examine where they can improve their architecture's performance. It will be interesting in any case.

As an aside, though, I'd just like to throw out there that from what I've read about the architecture so far, it seems that nVidia has taken a complete about-face on the implementation of new features: it looks like they've gone all-out to make sure that tessellation will be extremely useful in the GF100. But we'll have to see how that pans out once the part hits the streets.
 
Given that Fermi is hardly able to beat an evolutionary ATI chip, I have little confidence in the future of that line. Especially if you consider the production problems and the TDP, which means one Fermi GPU is more expensive than 2 RV870s. More expensive for NV and more expensive for the user, while lacking Eyefinity.
Are you saying you don't expect Fermi to even be competitive, i.e. less than ~30% over Cypress?

Yep, an entirely reasonable way to look at things.

It's worth noting, though, that major DX version inflections have created serious problems in time to market: DX9 gave NVidia grief, D3D10 did the same to ATI, and now D3D11...
So you blame DX11 for NV's delay? Have the performance gains of the past (again time-wise) in architectures that didn't increment DX# been smaller than otherwise, or has it only/mostly been a timing problem?

Fermi is still several months away, and the ATI refresh will be straight on top of it, no doubt with price cuts on their current cards. This will change the context of Fermi as a product by the time it finally arrives in the market. When Fermi finally arrives in that new competitive landscape, and we actually know what it is beyond the current constrained PR spin, we can actually decide if it's any good, and more or less desirable than competitor offerings.
Looking at the previous generation, unless I'm mistaken, ATI only gained ~10% or so with their refresh. The time between them was around 10 months. Do you expect them to deliver better results faster this time around?
 
And that's not really the case.
I would tend to expect that power regulation on two chips would actually be a bit easier, as if you have two chips rendering different things, the variations in power draw between the two of them are likely to be relatively uncorrelated, leading to an overall smoothing out of the total power that needs to be supplied. The only thing that would obviously make it more difficult, in my mind, is the larger total power consumption.
 
Are you saying you don't expect Fermi to even be competitive, i.e. less than ~30% over Cypress?

I would hope Fermi is competitive and noticeably faster than Cypress. We're talking about a new architecture, quite a large die size and increased cost for Nv, apparently massive power requirements, and what looks to be over $500 consumer price. Otherwise we'd be looking at their version of R600. It would be stupid if Fermi wasn't the new single GPU leader.
 
And that's not really the case.
If you say so, then I'll rest my case in this regard.

I would tend to expect that power regulation on two chips would actually be a bit easier, as if you have two chips rendering different things, the variations in power draw between the two of them are likely to be relatively uncorrelated, leading to an overall smoothing out of the total power that needs to be supplied. The only thing that would obviously make it more difficult, in my mind, is the larger total power consumption.

And I would have assumed that each GPU gets fed independently by its own circuitry - which I guessed would have made sense for 2D, GUI and video from a power consumption perspective. But as it seems, Dave has a different opinion there, and since he's the one sitting closer to the IHV… Hope he isn't biased wrt dual-GPUs though.
 
Looking at the previous generation, unless I'm mistaken, ATI only gained ~10% or so with their refresh. The time between them was around 10 months. Do you expect them to deliver better results faster this time around?

And with Cypress they only achieved 50% more performance over RV790.
 
The difference aligns with turning off 2 SMs, either both in the same GPC or 1 in two different GPCs. But then again, we all know how *in*accurate some of nvidia's descriptions and diagrams have been in the past...

For all we know the raster and setup units are right next to each other and output to a LL queue that the SMs read from, and the grouping they are showing doesn't really exist. Hopefully people still remember them trying to pass off x8 SIMD as 8 separate cores, right?

Eyeballing the Fermi die shot, if I divide the cores into groups of 4, there are areas of silicon aligned at the midline of each group in that messy center area, and they look pretty much identical at each of those 4 midlines.

That might be where the quadrant-based raster hardware is.
 
And I would have assumed that each GPU gets fed independently by its own circuitry - which I guessed would have made sense for 2D, GUI and video from a power consumption perspective. But as it seems, Dave has a different opinion there, and since he's the one sitting closer to the IHV… Hope he isn't biased wrt dual-GPUs though.
Well, to me this is all down to a bit of electrical engineering. When you design a dual-GPU board, you have a choice to make: should I build one single, large power regulation node and distribute that to all components requiring power? Or should I divide the power distribution between the different components?

Bear in mind, by the way, that there's a lot more on the board that requires power than just the GPUs. The memory, in particular, needs quite a bit of power. So even your single-GPU board is distributing a lot of power among a number of very different chips.

Now, there are two major drivers here when distributing power:
1. You want to keep the voltage nearly constant, no matter the power draw.
2. You want to keep the cost down as much as possible.

Both of these points, it seems to me, are likely to be better served by a single large power regulation setup rather than by dividing the system up: any divided system would require at least some unnecessarily duplicated circuitry. And when you just have the one node, you have the added benefit of capitalizing on the fact that the different chips are likely to be at least somewhat uncorrelated in their power draw: if you're drawing, say, an extra 10 Watts on chip 1, but chip 2 requires 5 fewer Watts for a couple of seconds, then the power regulation circuitry only has to deal with a 5-Watt jump in total, instead of dealing with both a 10-Watt jump and a 5-Watt jump separately.

Of course, all of this is very naive. I don't know in detail what the relative costs and benefits would be of the different choices. But from my rudimentary knowledge of electronics, it seems like having only one set of power regulation circuitry would almost invariably be preferable.
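
To put a rough number on that smoothing-from-uncorrelated-loads argument, here's a minimal back-of-the-envelope sketch. All the figures (a nominal 150 W per GPU with ~15 W of fluctuation) are made up purely for illustration; the point is just that the summed rail swings by less, relative to its total, than either chip does on its own.

```python
import random
import statistics

# Toy model: two GPUs, each drawing a nominal 150 W with independent
# fluctuations of ~15 W (std dev). All numbers are invented for illustration.
random.seed(0)
samples = 10_000
gpu1 = [random.gauss(150, 15) for _ in range(samples)]
gpu2 = [random.gauss(150, 15) for _ in range(samples)]

# Per-GPU regulation: each regulator sees the full swing of its own chip.
per_gpu_swing = statistics.stdev(gpu1)

# Shared regulation: one regulator sees the sum, whose fluctuations
# partially cancel because the two chips are (assumed) uncorrelated.
combined = [a + b for a, b in zip(gpu1, gpu2)]
combined_swing = statistics.stdev(combined)

print(f"per-GPU swing:  ~{per_gpu_swing:5.1f} W on a 150 W load ({per_gpu_swing / 150:.1%})")
print(f"combined swing: ~{combined_swing:5.1f} W on a 300 W load ({combined_swing / 300:.1%})")
# Expect the relative swing of the combined rail to be ~1/sqrt(2) of the
# per-GPU one, since variances of uncorrelated loads add.
```

A real VRM design obviously has to care about worst-case transients and per-rail voltage requirements, not just average variability, so take this strictly as an illustration of the statistical argument.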
 
Yes, it is an endless cycle, just this time around ATI seems to be in the leading position. They have a product in the market now, but have not revealed their "real" DX11 design.

Ah, so all the praise for AMD being the first to market with DX11 now turns into "the real DX11 design is coming", after leaks suggest that Fermi is better in DX11 features than Cypress... Now that is funny :)
 
Ah, so all the praise for AMD being the first to market with DX11 now turns into "the real DX11 design is coming", after leaks suggest that Fermi is better in DX11 features than Cypress... Now that is funny :)
It's also very strange, given that ATI has traditionally not made very significant improvements on the parts they've released between major architectural changes.
 
As far as I can tell, anisotropic filtering is still an unknown, but we can expect it to be no worse than nVidia's current parts, which are quite good.
Are you sure of that? I think the hardware indeed very likely is no worse than previous parts. But things like brilinear filtering are adjustable by the driver. I don't doubt the texture units have improved in efficiency, but still you're looking at 4 tmus per 32 SPs, instead of 8 tmus per 16 SPs (g9x) or 8 tmus per 24 SPs (g2xx). Compared to g92, that's only 1/4 the number of tmus per SP (ok a bit more since the clock has improved a bit relative to shader clock), even considering efficiency that will be a lot less texturing throughput per flop. Hence the incentive to cheat a bit is probably much larger, should texturing turn out to be a bit limiting in some apps... Not saying it has to be, but I wouldn't be totally surprised if it suddenly showed similar artifacts from undersampling as do AMD's parts...
 
Are you sure of that? I think the hardware indeed very likely is no worse than previous parts. But things like brilinear filtering are adjustable by the driver. I don't doubt the texture units have improved in efficiency, but still you're looking at 4 tmus per 32 SPs, instead of 8 tmus per 16 SPs (g9x) or 8 tmus per 24 SPs (g2xx). Compared to g92, that's only 1/4 the number of tmus per SP (ok a bit more since the clock has improved a bit relative to shader clock), even considering efficiency that will be a lot less texturing throughput per flop. Hence the incentive to cheat a bit is probably much larger, should texturing turn out to be a bit limiting in some apps... Not saying it has to be, but I wouldn't be totally surprised if it suddenly showed similar artifacts from undersampling as do AMD's parts...
Well, it will be interesting to see, but I strongly suspect that as games become more shader-heavy, they are requiring many more flops per texture access than they did previously. So the only games that will really need a large number of texture units are old games that will run insanely fast anyway.
 
Well, it will be interesting to see, but I strongly suspect that as games become more shader-heavy, they are requiring many more flops per texture access than they did previously. So the only games that will really need a large number of texture units are old games that will run insanely fast anyway.
That makes sense, though I wonder how the situation really is with the games currently used by reviewers.
The texturing rate isn't that low; in fact it is very comparable to what AMD has - AMD has 4 tmus for 80 alus, nvidia now 4 for 64 (clock-corrected). Factor in that AMD doesn't get 100% utilization of their alus and it is very, very similar.
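
For what it's worth, here's a quick sketch of the ratios being tossed around, using the per-block TMU/ALU counts quoted in this thread and assuming NVIDIA's shader (hot) clock is roughly 2x the texture clock while AMD runs ALUs and TMUs at the same clock. These are the quoted figures, not measurements.

```python
# Texture-to-ALU ratio comparison, "clock-corrected" by expressing ALU
# throughput in texture-clock cycles. Counts are the ones quoted above;
# the 2x shader-to-texture clock ratio for NVIDIA is an assumption.
parts = {
    # name: (TMUs per block, ALUs/SPs per block, shader clock / texture clock)
    "G92   (8 TMU / 16 SP)":    (8, 16, 2.0),
    "GT200 (8 TMU / 24 SP)":    (8, 24, 2.0),
    "GF100 (4 TMU / 32 SP)":    (4, 32, 2.0),
    "Cypress (4 TMU / 80 ALU)": (4, 80, 1.0),
}

for name, (tmus, alus, clk_ratio) in parts.items():
    effective_alus = alus * clk_ratio
    print(f"{name:26s} {tmus / effective_alus:.3f} TMUs per clock-corrected ALU lane")
```

That works out to roughly 0.25 for G92, ~0.17 for GT200, 0.0625 for GF100 and 0.05 for Cypress - i.e. GF100 at about a quarter of G92's ratio, and indeed in the same ballpark as Cypress once you allow for AMD's lower ALU utilization.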
 
and there's obviously a strong focus on geometry

I remain to be convinced based purely on a small part of a single benchmark supplied by Nvidia PR. I'll wait to see it running in a game and compared to competing hardware. After all, wasn't it Nvidia telling us just a few months back how DX11 tessellation wasn't that important?
 
I remain to be convinced based purely on a small part of a single benchmark supplied by Nvidia PR. I'll wait to see it running in a game and compared to competing hardware. After all, wasn't it Nvidia telling us just a few months back how DX11 tessellation wasn't that important?

No, this was xbitlabs and a lot of AMD fanboys.
 