NVIDIA GF100 & Friends speculation

That would make sense, though I wonder what the situation really is with the games currently used by reviewers.
The texturing rate isn't that low; in fact it is very comparable to what AMD has. AMD has 4 TMUs per 80 ALUs, Nvidia now has 4 per 64 (clock-corrected). Factor in that AMD doesn't get 100% utilization of its ALUs and the ratios are very, very similar.
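As a rough illustration of that ratio argument, here is a minimal sketch (Python, purely for the arithmetic); the ~75% utilization figure assumed for Cypress's VLIW ALUs is a placeholder of mine, not a measured number:

```python
# Back-of-the-envelope version of the ratio comparison above. The ~75% VLIW
# utilization figure for Cypress is an illustrative assumption, not a spec.

def alu_per_tex(alus, tmus, utilization=1.0):
    """Effective ALU:TEX ratio for one shader cluster."""
    return alus * utilization / tmus

cypress = alu_per_tex(80, 4, utilization=0.75)  # 4 TMUs per 80-ALU SIMD
gf100   = alu_per_tex(64, 4)                    # 4 TMUs per 64 clock-corrected ALUs

print(f"Cypress ~{cypress:.0f}:1 vs GF100 ~{gf100:.0f}:1")  # ~15:1 vs ~16:1
```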

Maybe, but the bar for Fermi isn't to be similar to Cypress. Nvidia claims Fermi's texturing units can achieve 40-70% higher throughput than GT200's, depending on the application. Assuming that's true, there isn't much cause for concern. It's the downmarket parts that might have a bit more trouble, as the discrepancy there would be larger.

That doesn't make sense to me, though... you'd think AMD fanboys would have been touting tessellation months ago and downplaying it now in the face of PolyMorph. Not that I give a monkey's ass either way.

Oh, no doubt that's exactly what's happening. I believe the phenomenon he's referring to was the widespread reporting that Fermi wouldn't have anything for graphics, that it would do tessellation (slowly) in software, etc. Can't blame them too much though; Nvidia did downplay the importance of DX11 a bit.
 
I remain to be convinced based purely on a small part of a single benchmark supplied by Nvidia PR. I'll wait to see it running in a game and compared to competing hardware. After all, wasn't it Nvidia telling us just a few months back how DX11 tessellation wasn't that important?


Geometry performance was a major focus of this new architecture. As much as you want to wait for "independent" views on that, they showed enough demos, synthetics, etc. at CES and at editor's days to really show the potential of the PolyMorph engine. The whole point of starting this off now was to show everyone they haven't forgotten about gaming, about a month away from seeing these cards in action.
 
Maybe, but the bar for Fermi isn't to be similar to Cypress. Nvidia claims Fermi's texturing units can achieve 40-70% higher throughput than GT200's, depending on the application. Assuming that's true, there isn't much cause for concern. It's the downmarket parts that might have a bit more trouble, as the discrepancy there would be larger.

The possibility seems high that filtering runs at hot clock, while addressing is at half hot clock. So at 1.4GHz we would be looking at 44.8 GTex/s for bilinear, trilinear, and 2x bi-AF (G80 style). The HD 5870 would be 68/34/34 in comparison.

A half-GF100 with ~1.5GHz would be close to GTX 280/285 and HD 5850.
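For what it's worth, a quick sketch of where those figures would come from under that assumption; the GF100 unit count and clock split are the speculation above, not confirmed specs:

```python
# Sketch of where the 44.8 and 68/34/34 GTex/s figures come from, assuming the
# speculated GF100 configuration (64 TMUs, addressing at half hot clock,
# filtering at hot clock). None of the GF100 numbers are confirmed specs.

def filtered_texel_rate(tmus, addr_clock_ghz, filter_clock_ghz, bilerps_per_texel):
    """Filtered texels per second (GTex/s), limited by addressing or filtering."""
    addr_limit = tmus * addr_clock_ghz                          # one address per TMU per addressing clock
    filter_limit = tmus * filter_clock_ghz / bilerps_per_texel  # one bilerp per TMU per filtering clock
    return min(addr_limit, filter_limit)

modes = [("bilinear", 1), ("trilinear", 2), ("2x bi-AF", 2)]
for name, bilerps in modes:
    gf100 = filtered_texel_rate(64, 0.7, 1.4, bilerps)     # 44.8 in all three cases
    hd5870 = filtered_texel_rate(80, 0.85, 0.85, bilerps)  # 68 / 34 / 34
    print(f"{name}: GF100 ~{gf100:.1f} GTex/s, HD 5870 ~{hd5870:.1f} GTex/s")
```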
 
Geometry performance was a major focus of this new architecture. As much as you want to wait for "independent" views on that, they showed enough demos, synthetics, etc. at CES and at editor's days to really show the potential of the PolyMorph engine. The whole point of starting this off now was to show everyone they haven't forgotten about gaming, about a month away from seeing these cards in action.

I've come to be sceptical when it comes to a company's PR department showing the "potential" of a new architecture. I'd rather see what a gamer sees in an actual game.
 
March? HAHAHA, that's what they WANT you to think. In Feb, it will be April, in March it will be May, in April - June, in May - July, in June - Aug, in July - Sep, in Aug - Oct, in Sep - Nov, and then they will finally release it in time for Xmas and their master plan will be fulfilled.

See, it's all really a bar bet that Jen-Hsun made: that the power of Nvidia was so strong that they didn't even need to release a product to prevent ATI's success, just endless reveals.

No, it will be out in March. Quantities, on the other hand, will be a little sub-optimal, leading to curious availability unless you happen to be a member of the press that has toed the line faithfully recently.

-Charlie
 
I've come to be sceptical when it comes to a company's PR department showing the "potential" of a new architecture. I'd rather see what a gamer sees in an actual game.


They showed side-by-side benchmarks to the press. Either they lied about those benchmarks, which would be a pretty bad thing to do and which I doubt, or they did what they stated, and I'd put my weight on the latter since they actually showed them. They could be demos or short showings, but they did show what they wanted to show. It isn't the end-all be-all yet, but we will see that shortly.
 
Yep, they wanted to highlight their advantage in tessellation heavy scenes. Dastardly, ain't it? :)

No, but it is quite telling. When a company is confident of their product, they don't have to cherry-pick best cases for 'sneak previews'. This tells me that the numbers we are 'seeing' won't be anything close to the real disparity between the cards.

To give you an example of this, when ATI was demoing 'hybrid' in late 2007 (December?), they brought us into a room, I think it was myself, PCPerspective, Hot Hardware, and a few others, and said, "Here are demo systems, have fun." and let us play around, measure what we wanted, and generally try to break things.

About a month later, in a post-CES tech day, NV demo'd their 'hybrid'. It was shown to press with a few demos, no one was allowed near the boxes, and no exact settings were disclosed. No questions were answered, and that was it.

Step forward a year, and you have to go to the power management settings on a MacBook, change power management levels, then reboot, for 'hybrid' to work. Heh.

NV is putting their best foot forward, and it is better than Cypress on a few well-chosen parts of benches, not the full benchmark, but not faster than Hemlock. They can't beat AMD's currently shipping products with something that is going to ship in, at best, very low quantities. The 'mainstream' parts which they are telling the AIBs they are going to get are well below that. Take 60% up on a best-case test, -12.5% for fused-off shaders, -12.5% for downclocking (1600 -> 1400MHz), and you are at roughly 20% up for your best case.
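Written out, that compounding looks like this (a sketch using the figures claimed above; the result lands in the same ballpark as the ~20% quoted):

```python
# The compounding described above, written out. All three inputs are the
# claims from the post (best-case uplift, fused-off shaders, downclock),
# not measurements.

best_case_uplift = 1.60         # "60% up on a best case test" vs Cypress
shader_fuse_off  = 448 / 512    # 512 -> 448 SPs, i.e. -12.5%
downclock        = 1400 / 1600  # 1600 -> 1400 MHz, i.e. -12.5%

net = best_case_uplift * shader_fuse_off * downclock
print(f"Net best-case advantage over Cypress: ~{(net - 1) * 100:.0f}%")  # ~22%
```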

Where will the less favorable things lie? Remember, this is against Cypress, not Hemlock. It is going to be close, if not an outright win for a stock Cypress against GF100, and any OC'd/5890/refresh will probably have an easy time with GF100. Hemlock is in another class.

There are too many knobs ATI can turn on power and die size. Nvidia is pretty much out of room on both, and barely competitive there.

This is the problem with putting your best case forward early, especially if you can't match it in the real world. NV has 'issues' now, they have to deliver, and I don't think they can make enough parts to do so.

-Charlie
 
I haven't seen anyone call out Xbit; new low indeed.

They claimed it.

Nvidia: DirectX 11 Will Not Catalyze Sales of Graphics Cards.
DirectX 11 - Not Important
Nvidia believes that special-purpose software that relies on GPGPU technologies will drive people to upgrade their graphics processing units (GPUs), not advanced visual effects in future video games or increased raw performance of DirectX 11-compliant graphics processors.

But nVidia only said this:
“DirectX 11 by itself is not going to be the defining reason to buy a new GPU. It will be one of the reasons. This is why Microsoft is in work with the industry to allow more freedom and more creativity in how you build content, which is always good, and the new features in DirectX 11 are going to allow people to do that. But that no longer is the only reason, we believe, consumers would want to invest in a GPU,” said Mike Hara, vice president of investor relations at Nvidia, at Deutsche Bank Securities Technology Conference on Wednesday.
http://www.xbitlabs.com/news/video/...ill_Not_Catalyze_Sales_of_Graphics_Cards.html
 
That doesn't make sense to me, though... you'd think AMD fanboys would have been touting tessellation months ago and downplaying it now in the face of PolyMorph. Not that I give a monkey's ass either way.

Heh, and that's exactly what's happening :)
DX11 and tessellation were "amazing" a couple of months ago. Now that leaks suggest Fermi may be quite a bit faster at it than Cypress, it's a gimmick and "ATI has yet to release the real DX11 design".

Anyway, Fudzilla is saying that mainstream parts based on Fermi are not delayed and should be around for a June release:

http://www.fudzilla.com/content/view/17290/1/

And also that the dual-GPU Fermi-based GeForce will be released in April:

http://www.fudzilla.com/content/view/17291/1/

I'm extremely interested in the mainstream parts if, as was discussed in the past, NVIDIA can simply disable GPCs and end up with chips with:

512 SPs, 384 SPs, 256 SPs and 128 SPs.

It would be extremely interesting to see a 128 SP part in the mid-to-low-end market. Could this part be roughly 1/4 the size of the GF100 die (maybe a bit bigger, like ~150 mm2)?

On second thought, the 384 SP chip may not be very practical, since it would probably be a considerably large die as well (maybe ~400 mm2).
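To make that guess explicit, here is a minimal sketch of the scaling; the full-die size and the fixed overhead below are assumptions for illustration only, but they reproduce the rough ~150 mm2 and ~400 mm2 figures:

```python
# Naive die-size guess behind the ~150 mm2 and ~400 mm2 numbers above: scale
# the die linearly with SP count, plus a small fixed overhead. Both constants
# below are assumptions for illustration, not known GF100 figures.

FULL_DIE_MM2 = 530   # assumed GF100 die size
FIXED_MM2    = 20    # assumed area that does not shrink with SP count

def die_estimate(sps, full_sps=512):
    """Linear scaling of die area with SP count, plus fixed overhead."""
    return FIXED_MM2 + (FULL_DIE_MM2 - FIXED_MM2) * sps / full_sps

for sps in (512, 384, 256, 128):
    print(f"{sps:>3} SPs -> ~{die_estimate(sps):.0f} mm^2")  # ~530, ~400, ~275, ~150
```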

Obviously there may be parts with disabled "CUDA cores" in each GPC, especially in the low end (I'm thinking 64 SPs and 32 SPs).
 
I remain to be convinced based purely on a small part of a single benchmark supplied by Nvidia PR.
I'm not talking about the benchmarks. I'm talking about the architecture. The move to multiple parallel geometry execution units is a huge change, and is according to nVidia the entire reason the product was delayed. Of course, if all of that hard work didn't pay off for nVidia, it would really suck for them. But it is nevertheless entirely clear that geometry performance is the primary thing this video card is designed to have over its predecessors.

After all, wasn't it Nvidia telling us just a few months back how DX11 tessellation wasn't that important?
That was when they didn't have a product that did a good job of it. Now that they're touting it as extremely important, it seems they are very confident that this is where GF100 truly shines in real performance.
 
At least Charlie is laying it all on the line. We're plenty close to the launch, so a few things will have to come true:

1. Cypress and GF100 will trade benchmarks.
2. Hemlock will wipe the floor with GF100.
3. Quantities will be very limited for the flagship model.
4. The mainstream models will be quite a step down from the flagship model.
 
No, but it is quite telling. When a company is confident of their product, they don't have to cherry-pick best cases for 'sneak previews'. This tells me that the numbers we are 'seeing' won't be anything close to the real disparity between the cards.

Nvidia used the term "up to" in a lot of their comments about performance which explicitly indicates they're referring to maximums and not averages. If your point is that we shouldn't expect the numbers Nvidia is showing to represent the average case then...ummm...duh?

I agree with Sontin, it is getting a bit silly now. If you have specific opinions on the technical shortcomings of Fermi I'm sure we'd all love to hear them but we could do without the pointless fluff.

Nvidia said:
DirectX 11 by itself is not going to be the defining reason to buy a new GPU. It will be one of the reasons.

Thanks, so I guess that's the harmless quote that sparked the premise that Nvidia doesn't care about DX11? It's ironic really since they've shown that they care quite a lot and then some.
 
Maybe, but the bar for Fermi isn't to be similar to Cypress. Nvidia claims Fermi's texturing units can achieve 40-70% higher throughput than GT200's, depending on the application.
Yes, but keep in mind G92/GT200 never achieved close to their theoretical rate (for whatever reason), not even in 3DMark's texture fill rate tests. So 40-70% probably just means they now indeed reach their theoretical rate (that would account for maybe 30%), plus larger L1 caches make them even more efficient. Maybe all that makes them more efficient than RV870's TMUs, but I doubt it's much. Hence the actual ALU:TEX ratio is still similar for Cypress and Fermi.
AnarchX said:
The possibility seems high that filtering runs at hot clock, while addressing is at half hot clock.
I thought that was already debunked?
So at 1.4GHz we would be looking at 44.8 GTex/s for bilinear, trilinear, and 2x bi-AF (G80 style). The HD 5870 would be 68/34/34 in comparison.
Yeah, IF filtering ran at hot clock, that would indeed probably make any filtering cheats rather unnecessary.
Silus said:
I'm extremely interested in the mainstream parts if, as was discussed in the past, NVIDIA can simply disable GPCs and end up with chips with:

512 SPs, 384 SPs, 256 SPs and 128 SPs.

It would be extremely interesting to see a 128 SP part in the mid-to-low-end market. Could this part be roughly 1/4 the size of the GF100 die (maybe a bit bigger, like ~150 mm2)?
I think a bit more, if that's 128-bit / 16 ROPs. Either way, that gets quite close to Juniper's die size, and it doesn't look very competitive to me. More like a serious Redwood competitor (though for that it probably wouldn't need the 16 ROPs).
On second thought, the 384 SP chip may not be very practical, since it would probably be a considerably large die as well (maybe ~400 mm2).
And 384 SPs is also the "natural" salvage part of a 512 SP part (either disable a full GPC or one SM within each GPC). Still wondering how the 448 SP part deals with the asymmetries...
 
Regarding these raw silicon costs, it doesn't make sense to me that they would simply divide the wafer cost by the number of dies. Would they not realistically expect a higher proportion of the wafer cost to be ascribed to the higher-bin parts, to keep their margins relatively stable throughout the range of bins?

In addition to this, with Hemlock (sorry Dave, I did think about writing R800), wouldn't they also expect the highest-bin Cypress parts to be ascribed the highest proportion of the per-wafer costs?

I know that at this point Frankenstein has a greater chance of a fully functional brain than a Fermi board, but even so, the higher-bin chips are actually worth more, so I don't understand why they aren't priced as such in an analysis.

The short story is that TSMC almost always sells wafers, and with some variation due to processing of those wafers (metal layer count for example), has fairly well known rates. Many companies tell me that 40nm wafers cost ~$5000 each.

If you get 2500 chips out of each wafer at 99% yield, you end up with 2475 chips for $5000. If you get 100 at 10% yield, you get 10 chips for $5000. Either way, you pay $5000, then depending on a lot of things, pay more for dicing, testing, binning/sorting, and packaging. TSMC can do some of this, the companies ordering the wafers can do it, or you can contract it out.
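In code form, the arithmetic from that paragraph (the $5000 wafer figure is the one quoted above; the candidate counts and yields are the same worked example):

```python
# Sketch of the per-die wafer cost arithmetic above, using the ~$5000/wafer
# figure quoted. The candidate counts and yields are the post's own examples.

WAFER_COST = 5000.0

def cost_per_good_die(die_candidates, yield_fraction):
    """Wafer cost spread across the good dies only."""
    good_dies = round(die_candidates * yield_fraction)
    return WAFER_COST / good_dies if good_dies else float("inf")

print(f"${cost_per_good_die(2500, 0.99):.2f} per chip")  # 2475 good chips -> ~$2.02 each
print(f"${cost_per_good_die(100, 0.10):.2f} per chip")   # 10 good chips -> $500.00 each
```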

I have been to a few companies that have low-volume dice, test, and packaging facilities in house to save time and cost, mostly time. The short answer is that it depends on a lot of things, and on how much you are willing to do yourself.

From there, the method of accounting is up to the individual company. Do you sell the bad ones as keychains at the company gift shop for $5000/#of die candidates, then write the inventory off at the end of a good quarter? Do you assign each good die a cost of $5000/#good chips? Do you split that by bins, and is that value arbitrary, based on what you can sell them for, or based on what the end user pays?

The answer is most likely all of the above, and even more likely involves sacrifices of many goats at midnight under the full moon in that back room of the accounting department. You can't say for sure unless people tell you what THEY are doing for that particular chip.
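Just to make one of those options concrete, here is a sketch of the "split by what you can sell them for" bookkeeping; the bin names, counts, and prices are made up purely for illustration:

```python
# One of the allocation schemes listed above, sketched out: split the wafer
# cost across bins in proportion to what each bin sells for. The bin names,
# counts, and prices here are invented purely for illustration.

WAFER_COST = 5000.0
bins = {
    "full-spec SKU": (30, 400.0),   # (good dies per wafer, selling price)
    "salvage SKU":   (60, 250.0),
    "keychain":      (10, 5.0),
}

total_value = sum(count * price for count, price in bins.values())
for name, (count, price) in bins.items():
    share = count * price / total_value        # this bin's share of the wafer's sellable value
    allocated_cost_per_die = WAFER_COST * share / count
    print(f"{name}: ~${allocated_cost_per_die:.2f} allocated per die")
```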

On top of that, is internal costing relevant? I think the more important question is what NV sells the parts for, and that has zero to do with cost.

-Charlie
 
Yes, but keep in mind G92/GT200 never achieved close to their theoretical rate (for whatever reason), not even in 3DMark's texture fill rate tests.
These 3DMark tests with pure bilinear filtering were bound by interpolation performance, and GT200 saw an increase here through the additional 8 SPs per TPC.
Just see the scaling of GT200 with higher quality filtering:
http://techreport.com/r.x/radeon-hd-5870/rm3d-filtering.gif
http://techreport.com/articles.x/17618/6

I thought that was already debunked?
Yeah, IF filtering ran at hot clock, that would indeed probably make any filtering cheats rather unnecessary.
Nvidia said that the TMUs run at 1/2 hot clock but did not give more details.

On the other hand, we have 256 L/S units @ hot clock, app-state buckets with gains of up to 70%, and the statement that filtering quality will not be reduced.

Also, 22.4 GTex/s tri/2x bi-AF seems like an unlikely increase over G80's 18.4 GTex/s, after NV has said several times that texturing is still important and that the 2013-15 goal is around 1 TTex/s.
 
Yes, but keep in mind G92/GT200 never achieved close to their theoretical rate (for whatever reason), not even in 3DMark's texture fill rate tests. So 40-70% probably just means they now indeed reach their theoretical rate (that would account for maybe 30%), plus larger L1 caches make them even more efficient.
Fermi L1 texture cache is 12KB, just like GT200's.

Fermi has two L1s, one for textures and another that's dual-function L1/shared memory.

Jawed
 
Do you know that GF100 will be sold in professional and HPC markets where the margins are sky high and AMD has next to 0% market share, despite having great consumer products?

GF100 will make a lot of money for NV, as the R&D for the mainstream consumer market has been paid off already, and *profits* in the Quadro and Tesla markets are worth a LOT.

Question for you there Mr Beancounter. If NV can sell the GF100 parts into the professional market at large profits, and they eat a lot on each GF100/consumer variant, why make the consumer variants? If you are making products that you KNOW are going to be under water for their entire life, and will cost time, effort, advertising, driver dev costs, and other things on top of that, would it not be saner to just say "Fermi is a GPGPU card. We will have consumer cards out this fall, thank you, no questions."?

You can play the same games with any of their other parts: do you have to subsidize GF100s with Fermi compute card margins, or can you subsidize G240s with that money? Where do you draw the line? Cash is quite fungible, you know, and accounting table entries are much more so.

The answer to your question is in the G200b line. NV stopped making them last fall, and gets very uncomfortable when asked why production STILL isn't happening. The claims of 'teh huuuge demand' for Q4 were fine and dandy, but given an ~10 week lead time for silicon, and that they knew by September, there was ample time to get wafers out for Christmas had they wanted to.

When that one became blindingly obvious to anyone even casually looking at the situation, the next quarter's CC had them saying, "we have wafer shortages", and that is true FOR 40NM ONLY. NV is really good at twisting words and splitting hairs. If you call TSMC, they will tell you that there are no shortages for 55nm wafer starts; you can run as many as you like with a phone call. There have not been any shortages for the time period NV is claiming either; their excuses are simply not true.

So why are there only trickles of G200-based parts, and why did the price go up? Why no official EOLs? Because NV doesn't want to look bad to the financial weenies. They are desperately afraid that people will realize they have had no competitive parts since the Evergreen launch, and can't make the old ones at a profit. So, instead, they trickle out parts, 10 here, 10 there (literally, according to some AIBs I talked to at CES), and end up with a 'still shipping' product. They haven't started any G200b wafers that I am aware of since Q3/2009. Sucks to be one of their 'partners', eh?

And aren't these heavily subsidized by current Tesla parts?

That brings us back to Fermi. NV has demonstrated that they will not make parts if they are under water, and that is a lone smart management move amidst a sea of astoundingly bad decisions. How many GF100s do you think they will make until yields go WAY up? Think ATI can price them underwater the whole way? Think they will?

Also, with TSMC claiming 40nm wafer shortages until mid-year, do you think they will make a GF100 wafer for, say, 20 or 30 money losing halo chips, or 450 GTS260Ms that sell for positive margins?

GF100 is screwed on economics, and nothing will fix that. If yields go up for NV due to TSMC's measures, they go up for ATI too. No win there. Short of a respin, it is a lost product.

Note to the fanbois: 72FPS in Far Cry 2 on the accounting charts, regardless of how many exclamation points follow it or how many awards it gets from third rate review sites, is not something that sways the SEC during an audit. Really.

-Charlie
 
Not only that, if what Fudzilla are reporting is true (mainstream GF100 parts in June), then Nvidia are clearly a lot further along with their fab process than Charlie likes to make out. Harvested GF100s will only be useful for GTX 360s, nothing below that.
 
The videos leaked; someone found them on PCPerspective's website :LOL: while they were preparing their article.

No one in the press community has the cards in hand yet, though they will soon, and the benchmarks they have shown so far are actually not their "best advantages" in game situations.

I agree, but 'best advantages on synthetic benches without independent confirmation' is not a better way of messaging things.

-Charlie
 