NVIDIA GF100 & Friends speculation

Lot of wishful thinking there; even trini admits they've got a scaling problem with Fermi derivatives.


I think it's too early to say there is a scaling problem. In my experience with tessellation, mid-range and lower-end cards won't have the overall horsepower to do major amounts of tessellation anyway. The problem will probably be solved with adaptive tessellation plus some type of LOD, so on less powerful graphics cards the tessellation amounts get dropped, and Fermi derivatives might end up well balanced. Again, that's just speculation, but it's too early to really give a definitive answer.
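A minimal sketch of what that adaptive approach could look like on the application side, assuming a distance-based LOD term and a per-tier cap; the function, table and numbers below are purely illustrative, not from any real engine:

```python
# Hypothetical sketch: distance-based LOD for the tessellation factor,
# clamped by a per-tier cap so lower-end cards never see the high factors.
# Table contents, function name and constants are illustrative only.

TESS_CAP = {"low": 4.0, "mid": 16.0, "high": 64.0}  # assumed per-tier caps

def tess_factor(distance, gpu_tier, base_factor=64.0, falloff=0.02):
    """Far-away patches get fewer subdivisions; the GPU tier caps the maximum."""
    lod_factor = base_factor / (1.0 + falloff * distance)
    return max(1.0, min(lod_factor, TESS_CAP[gpu_tier]))

print(tess_factor(50.0, "high"))  # ~32.0 on a high-end card
print(tess_factor(50.0, "low"))   # clamped to 4.0 on a low-end card
```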

That's something Charlie takes for granted; he doesn't think like a developer. Why he comes to conclusions on such matters without being informed or looking into it kills me, but just ask Charlie, just ask!
 
You mean like they implemented on AvP? They did show that off on the Hornet, you know; it is there, and it works. That said, you have two options for tessellation:

1) Make it scalable and good enough on the high end chips, and have it scale down, and have the interconnects likely limit clock scaling.
2) Spend less net die area for a single unit that is consistent across your product line, and have the tessellation scale just like 1), but have tons of headroom, with less silicon used.

Now, if I were a dev, I would prefer to know that any device I use would have enough tessellation capability to do the job, and only have to worry about poly counts in general. The ATI way takes tessellation out of the loop as something to think about; it is always 'good enough'. With NV's method, it _MAY_ be insufficient, but that is unlikely on the higher-end parts.

If you wanted to think like a dev, I would be thinking, "I have had Junipers in house since last May, but NV is promising me GF100s in early March. Do I wait on testing and optimizing DX11 features, or do I just get a 10-month or so head start...."

-Charlie
 
They could always alter the number of SMs per GPC.

Correct. I'm not at all convinced that things on silicon are laid out the way they are in their diagrams. It would make sense to have the geometry independent of the shaders and connect them via either streaming buffers or cache/memory, which allows much more flexibility.
 
Lol, what do you mean by "even trini admits"? Didn't realize I was on trial :LOL:
:p I think it's quite clear where your loyalties lie, but you are also one of the intelligent people. ;) Besides, the real reason is that I had a discussion with you the other day where you admitted a half Fermi would be hard pressed in some areas.
 
You mean like they implemented on AvP? They did show that off on the Hornet, you know; it is there, and it works. That said, you have two options for tessellation:

1) Make it scalable and good enough on the high end chips, and have it scale down, and have the interconnects likely limit clock scaling.
2) Spend less net die area for a single unit that is consistent across your product line, and have the tessellation scale just like 1), but have tons of headroom, with less silicon used.

Now, if I were a dev, I would prefer to know that any device I use would have enough tessellation capability to do the job, and only have to worry about poly counts in general. The ATI way takes tessellation out of the loop as something to think about; it is always 'good enough'. With NV's method, it _MAY_ be insufficient, but that is unlikely on the higher-end parts.

If you wanted to think like a dev, I would be thinking, "I have had Junipers in house since last May, but NV is promising me GF100s in early March. Do I wait on testing and optimizing DX11 features, or do I just get a 10-month or so head start...."

-Charlie

You are drawing assumptions based on half knowledge. Do you know the tessellation performance hit on AMD products, and how it scales across the product line, based on shader performance, resolution and the features of a given game or game engine? AvP uses only an adaptive tessellation algorithm as far as I know; I'm talking about a developer-initiated limit on the tessellation iterations added on top of the adaptive algorithm. If you want the GPU to analyze the situation from a clock-scaling perspective, I can foresee problems where the GPU has to forecast possible situations depending on the scene, which probably won't work too well.
 
You mean like they implemented on AvP? They did show that off on the Hornet, you know; it is there, and it works. That said, you have two options for tessellation:

1) Make it scalable and good enough on the high end chips, and have it scale down, and have the interconnects likely limit clock scaling.
2) Spend less net die area for a single unit that is consistent across your product line, and have the tessellation scale just like 1), but have tons of headroom, with less silicon used.

Now, if I were a dev, I would prefer to know that any device I use would have enough tessellation capability to do the job, and only have to worry about poly counts in general. The ATI way takes tessellation out of the loop as something to think about; it is always 'good enough'. With NV's method, it _MAY_ be insufficient, but that is unlikely on the higher-end parts.

If you wanted to think like a dev, I would be thinking, "I have had Junipers in house since last May, but NV is promising me GF100s in early March. Do I wait on testing and optimizing DX11 features, or do I just get a 10-month or so head start...."

-Charlie

Good point.

Developers thus far haven't even seen Nvidia DX11 hardware, so most games up until the end of the year will almost certainly have been made using the 'ATI model' of tessellation. We therefore have to consider how that scales to Nvidia hardware, and the benchmarks for the next few months, especially the DX11 ones, will likely favour ATI.

The question I have to you or anyone else is this:

If developers are using Evergreen as the basis for their work on DX11 tessellation, then the important first question is not how the whole system compares to Fermi as it is, but how well the 128/256 SP parts stand up to workloads designed for Evergreen's level of tessellation. I don't believe the questions should be raised about Fermi's DX11 implementation itself, but about how that implementation scales down to their half- and especially quarter-sized parts!

They sell far more mid- and lower-end cards than they ever will Fermi, so it would be good to have some answers on how the majority of their DX11-featureset cards will perform with basic DX11 features.
 
I just thought of something; someone confirm whether this is correct: with NV's design of 8 pixels per triangle per GPC per clock, doesn't that mean they can't really disable a whole GPC block, because if they did they wouldn't be able to rasterize a full, normal 32-pixel triangle per clock? If there are currently games that are supposedly setup-limited, wouldn't a lesser chip really start to choke?
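Rough arithmetic behind that question, using the published GF100 layout of four GPCs with one 8-pixel/clock rasterizer each; everything else here is just multiplication:

```python
# GF100's published layout: 4 GPCs, each with a rasterizer handling
# 8 pixels per clock, for 32 pixels/clock in total. Disabling whole GPCs
# would cut peak rasterization proportionally, which is the concern above.

PIXELS_PER_GPC_PER_CLK = 8

for gpcs in (4, 3, 2, 1):
    print(f"{gpcs} GPC(s): {gpcs * PIXELS_PER_GPC_PER_CLK} pixels/clock rasterized")
```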
 
Why would the lesser card start choking? It's going to be doing less work in every other area too. Unless we have real data on it, there is no way to say whether it would choke or whether it scales with the rest of the workload.
 
:p I think it's quite clear where your loyalties lie, but you are also one of the intelligent people. ;) Besides, the real reason is that I had a discussion with you the other day where you admitted a half Fermi would be hard pressed in some areas.

Way more entertaining that way eh? ;)

Based on Nvidia's numbers, Fermi's texture units are 60-100% more effective than GT200's. They claim a 70% gain in Crysis, so assuming that's at a 1400 MHz hot clock, that works out to roughly 2x the throughput per unit per clock:

1.7 / ((64 * 700) / (80 * 650)) ~ 1.97x

If that's anywhere near the truth then things might not be as dire for Fermi variants as it seems on the surface.
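Working through that estimate, assuming GF100's texture units run at half of a 1400 MHz hot clock and comparing against a GTX 285-class GT200 (80 units at 650 MHz); the clocks are the poster's assumptions, not confirmed specs:

```python
# The claimed ~70% Crysis gain over GT200, divided by the raw texel-rate
# ratio of the two designs, gives the implied per-unit, per-clock improvement.
# 700 MHz assumes GF100's texture units run at half of a 1400 MHz hot clock.

gf100_units, gf100_clk = 64, 700e6   # assumed
gt200_units, gt200_clk = 80, 650e6   # GTX 285-class

raw_ratio = (gf100_units * gf100_clk) / (gt200_units * gt200_clk)  # ~0.86
per_unit_per_clock = 1.7 / raw_ratio                               # ~1.97

print(f"raw texel-rate ratio: {raw_ratio:.2f}")
print(f"implied per-unit, per-clock gain: {per_unit_per_clock:.2f}x")
```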
 
Now, if I were a dev, I would prefer to know that any device I use would have enough tessellation capability to do the job, and only have to worry about poly counts in general. The ATI way takes tessellation out of the loop as something to think about; it is always 'good enough'. With NV's method, it _MAY_ be insufficient, but that is unlikely on the higher-end parts.

It still doesn't change the fact that developers writing games for the PC market have to contend with multiple IHVs, cards without tessellation, and cards with tessellation but different shading power and different clocks. The idea that putting a low-performing tessellation part in the mix simplifies things for developers is pure fantasy. AMD's tessellation design does not bring "console programming" simplicity for devs.

On top of that, you still don't seem to understand how tessellation works. Even if AMD keeps the same tessellation unit on the super-low-end cards as on the high-end cards, the amount of ALU power will vary greatly, which means that your throughput after domain shading can still vary greatly.

Simply put, this little bit of spin doesn't fly.
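To illustrate that point with made-up numbers, a back-of-the-envelope sketch of how domain-shader throughput scales with ALU rate even when the fixed-function tessellator is identical; shader counts and clocks are roughly Cypress- vs Redwood-class, and the per-vertex cost is invented:

```python
# Same fixed-function tessellator, very different ability to absorb its
# output: domain-shader throughput scales with ALU rate. Shader counts and
# clocks are roughly Cypress vs Redwood class; 60 ops/vertex is invented.

def domain_verts_per_sec(alus, clk_hz, ops_per_vertex=60):
    return alus * clk_hz / ops_per_vertex

high_end = domain_verts_per_sec(1600, 850e6)  # Cypress-class
low_end = domain_verts_per_sec(400, 775e6)    # Redwood-class

print(f"high end: {high_end / 1e9:.1f} Gverts/s")  # ~22.7
print(f"low end:  {low_end / 1e9:.1f} Gverts/s")   # ~5.2
# Roughly a 4x gap even before the rest of the pipeline is considered.
```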
 
It still doesn't change the fact that developers writing games for the PC market have to contend with multiple IHVs, cards without tessellation, and cards with tessellation but different shading power and different clocks. The idea that putting a low-performing tessellation part in the mix simplifies things for developers is pure fantasy. AMD's tessellation design does not bring "console programming" simplicity for devs.

On top of that, you still don't seem to understand how tessellation works. Even if AMD keeps the same tessellation unit on the super-low-end cards as on the high-end cards, the amount of ALU power will vary greatly, which means that your throughput after domain shading can still vary greatly.

Simply put, this little bit of spin doesn't fly.

I haven't seen proof that ATI's tessellation design is a low-performing one compared to Nvidia's, nor do we know the exact in-game performance of Nvidia's design. Perhaps we should wait and see how both cards perform in actual games.
 
Well, if we were to put aside for a moment the prevailing assumption that Nvidia's engineers are morons, I suspect they won't produce Fermi variants that are terribly unbalanced.
I wouldn't consider a 128 SP Fermi-derived part with only 8 ROPs (and an 8 pixels per clock rasterization rate) terribly unbalanced. I really don't see how it could keep up with Juniper even with twice the ROPs and rasterization rate; the GTS 250 doesn't, and at best it keeps up with the Juniper salvage part. Even if the hot clock is as high as on the GTS 250, it still has a significant disadvantage in texturing rate compared to the GTS 250 (granted, if it really can filter at hot clock it shouldn't be too bad). Also look at where the 96 SP GT2xx part sits compared to Evergreen... So IMHO a 128 SP Fermi-derived part with 8 ROPs might be enough to beat all Redwood-based parts if things go well (and it really should, as it would still be a much larger chip) at a similar die size to GT215, which would be a much-needed improvement. Give it 16 ROPs and it still won't be able to touch Juniper, unless Nvidia can reach a hot clock of 2.5 GHz or something like that :).
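For what it's worth, a rough fill/texture-rate comparison under heavy assumptions: a hypothetical 128 SP, 4 SM Fermi derivative with 8 ROPs and 16 texture units at a guessed 700 MHz core clock, against Juniper as shipped in the HD 5770. None of the NVIDIA figures below are announced specs:

```python
# Heavy assumptions: a hypothetical 128 SP Fermi derivative (4 SMs -> 16
# texture units, 8 ROPs) at a guessed 700 MHz core clock, vs Juniper as the
# HD 5770 (16 ROPs, 40 TMUs, 850 MHz). NVIDIA unit counts and clocks here
# are guesses, not announced specs.

def rates(name, rops, tmus, core_hz):
    print(f"{name}: {rops * core_hz / 1e9:.1f} Gpix/s fill, "
          f"{tmus * core_hz / 1e9:.1f} Gtex/s")

rates("hypothetical 128 SP Fermi part", 8, 16, 700e6)  # ~5.6 / ~11.2
rates("Juniper (HD 5770)", 16, 40, 850e6)              # ~13.6 / ~34.0
# Even doubling the ROPs, or filtering at hot clock, leaves a sizeable gap.
```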
 
It still doesn't change the fact that developers writing games for the PC market have to contend with multiple IHVs, cards without tessellation, and cards with tessellation but different shading power and different clocks. The idea that putting a low-performing tessellation part in the mix simplifies things for developers is pure fantasy. AMD's tessellation design does not bring "console programming" simplicity for devs.

On top of that, you still don't seem to understand how tessellation works. Even if AMD keeps the same tessellation unit on the super-low-end cards as on the high-end cards, the amount of ALU power will vary greatly, which means that your throughput after domain shading can still vary greatly.

Simply put, this little bit of spin doesn't fly.

Wouldn't it make sense to parametrize the level of tessellation based on the card? If yes, then where is the problem?
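One way that parametrization could look in practice, treating the maximum tessellation level as just another quality setting with a per-GPU default; the table contents and names below are a hypothetical sketch:

```python
# Hypothetical sketch: treat the maximum tessellation level like any other
# quality setting, with per-GPU defaults. Table contents and names are
# illustrative, not from any shipping engine.

DEFAULT_TESS_LEVEL = {
    "cypress": 32,
    "juniper": 16,
    "redwood": 8,
    "cedar": 2,
}

def pick_tess_level(gpu_family, user_override=None):
    """A user setting wins; otherwise fall back to a conservative default."""
    if user_override is not None:
        return user_override
    return DEFAULT_TESS_LEVEL.get(gpu_family, 1)  # unknown GPU: effectively off

print(pick_tess_level("juniper"))                  # 16
print(pick_tess_level("cedar", user_override=8))   # 8
```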
 
I have a fair idea of TSMC contract costs. Then again, if you are in a situation where there are massive shortages for ~2 years on 40nm wafer starts, you probably aren't going to give many discounts. ATI likely has the volume now on 40nm, so I really doubt that NV gets appreciably cheaper wafers.
Ok, you keep feeling that way. :)

You are right about the PCIe bridge, so add $15 or so, maybe 20.
Wow, I'd have thought they'd be cheaper than that.

I am aware of the cost of a vapor chamber, I know the two companies that make the OEM parts for ATI, and know many of the people involved at the cooler companies. ATI has almost the same wattage to cool as NV, and has a larger area to cool it over, both die and card. If you want to add $5 for ATI there, feel free. The official cost for the 4870X2 reference cooler was $15, and I was told at the time that some companies could do an internally designed vapor chamber for half that. The cost differential there is very low. Don't confuse consumer costs with OEM/ODM costs.
I won't. But the 4870 X2's cooler was quite a different beast from the 5870 X2's... err, the 5970's. It had two separate blocks instead of the single one on the 5970, and only one of them was equipped with a small vapor chamber.
You also have no idea what the production GF100 consumer card will use for a cooler, do you?
No, but then again: Do you? It obviously shouldn't be like the Tesla-card's.

Board cost is mainly layer count. The power and signal pins to a single GF100 chip are likely much harder to route than those to a single Cypress. A 384-bit memory bus vs 256-bit is a major factor, as is power per GPU. The PCIe lane count will be equal either way. You have a slightly longer board for ATI, and a much more complex board for Nvidia. Given the two, the GF100 board is probably more expensive. Anyone want to count the layers on a Cypress vs a Hemlock board? Same? I don't have a Hemlock here to do it with, but I would wager that Hemlock is a notably simpler board than GF100.

Anyone want to count the board layers on 3870 vs 3870x2, 4870 vs 4870x2 and 5870 vs 5970? Since they are all released products, can Dave tell us? Oh heavenly voice of ATI knowledge, answer our nattering technical minutia.....

FWIW, the board layer count from my CES pics seems to indicate a 14-layer board, but it is hard to tell if that is camera artifacting or a real count. It is a 14 MP SLR, and I can clearly read the resistor numbers, so it is likely not artifacting, but it could be.
-Charlie
I read a lot of "probably" and "if" - plus you keep referring to a single Cypress where Hemlock has two. Most of the wiring might go to different places, thus not requiring additional layers, but then some might not. Nvidia used a lot of 14-layer boards in the past for Teslas (and, I presume, for pre-manufactured GeForce cards at launch), so if that's your major concern, then it seems they can sell 500+ mm² chips on 14-layer boards and still make a profit.
 
The point is that NV, AMD, ATI, and Intel routinely have press demos where they compare their product to the competition. This is common practice, and if you have a winning part, you allow the press to run wild, bring their own software, and generally do whatever the hell they want.

If you don't have a winning product, you control the comparisons, limit what can be done, and don't let people wander outside the guidelines if you let them do anything at all. Again, this is pretty common practice.

I have been at briefings where both were done (not the same briefing, obviously), and they invariably turn out to presage the performance of the product. You also can learn a lot from how the numbers are presented, and from the way in which questions are answered, if at all.

Nvidia is doing one of these things.

-Charlie
I don't know the usual practice in the US, but in Europe it's normal to have a launch where you get brainw… briefed on the product and then get to take a sample back to your labs, or get one delivered at some point in the future. It's absolutely uncommon to be told what to test and what not.

But I haven't been to an architecture briefing where I can run all my own benchmarks on a PC provided there.


--
You mean like they implemented on AvP?
Great example. With high tessellation, you drop from 500-ish FPS in wireframe to just under 90. Without wireframe (i.e. with shading, shadows and so on in place), you go from 125-ish to... 86-ish. Talk about limitations.

But then, that was at the HD 5800 launch; maybe drivers and/or AvP builds have improved on this.
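Reading those FPS figures as frame times makes the point clearer; a quick sketch using the approximate numbers quoted above:

```python
# Converting the quoted AvP figures to frame times: the tessellation cost
# that dominates in wireframe mostly overlaps with shading in the full render.
# FPS values are the approximate ones quoted above.

def ms(fps):
    return 1000.0 / fps

tess_cost_wireframe = ms(90) - ms(500)   # ~9.1 ms of extra geometry work
tess_cost_shaded = ms(86) - ms(125)      # ~3.6 ms visible once shading is on

print(f"wireframe: +{tess_cost_wireframe:.1f} ms/frame from tessellation")
print(f"shaded:    +{tess_cost_shaded:.1f} ms/frame from tessellation")
# If geometry and shading were purely serial you'd expect ~9 ms in both cases.
```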

Now, if I were a dev, I would prefer to know that any device I use would have enough tessellation capability to do the job, and only have to worry about poly counts in general. The ATI way takes tessellation out of the loop as something to think about; it is always 'good enough'. With NV's method, it _MAY_ be insufficient, but that is unlikely on the higher-end parts.
You seem to keep forgetting about, or ignoring, the implications tessellated geometry will have on the shaders. After all, hull and domain shaders don't come for free. Ideally, you want to spend exactly enough transistors on fixed-function tessellation and setup not to be limited by the shader power behind it. Conversely, you don't need a gazillion tessellated triangles on a double-digit-shader part. From where I am sitting, Nvidia's approach seems more sensible with respect to the balance between geometry and shading - except for the cost in die space, that is. But they don't disclose how much of that "extra" is geometry and how much is raster.

If you wanted to think like a dev, I would be thinking, "I have had Junipers in house since last May, but NV is promising me GF100s in early March. Do I wait on testing and optimizing DX11 features, or do I just get a 10-month or so head start...."

-Charlie
Key devs, from AMD's perspective, had that luxury, but how many are there? I know of only half a dozen or so. The others seem to have had to wait a little longer - but nevertheless, even if they had to buy the cards in stores, they have a 6-month head start.
 
Count me in the minority because I for one am excited for GF100 to be productized and released. Gotta update my Folding farm :D

One question re: the higher CSAA mode: does this reduce specular and texture aliasing as well, or only aliasing of alpha textures?
 
I don't think NVidia will scale back geometry for downmarket parts to the ridiculous extent that Silus is suggesting. Though I wouldn't bet against the barrel-scraping GF108 or whatever the hell the crappiest part is called being super shit in this respect - performance is not an option.

And I agree too. I was just speculating on the option of disabling GPCs for other chips, and its effects, given the architectural improvements that we know of. That may or may not be NVIDIA's path for this, because they may be able to just leave some bits and pieces of other GPCs enabled in the chip, instead of disabling them completely.
 
There's a gulf between disabling and deleting. You speculated specifically on a 1 GPC chip to compete with Juniper.

Jawed

That's true, and it was already debunked by more knowledgeable forum members, including yourself. No need to throw pitchforks at me :)
 