NVIDIA Fermi: Architecture discussion

To clarify, when you mention a 40% hit in Unigine when applying tessellation, are you implying that tessellation alone incurs a 40% performance hit, or that applying tessellation plus displacement maps (and the new geometry casting and receiving shadows) has a 40% hit? There is a difference, as you can use tessellation for purposes other than displacement mapping.

It's the "whole package" in case of Unigine, not tesselation alone
 
I doubt nVidia has fully dedicated fixed-function hardware for tessellation. More likely they have a little bit of dedicated hardware and the majority of the tessellation math is done in the general-purpose shader cores. Radeon DX11 chips also moved some math from the texture filtering units to the general-purpose shader cores. There is no point in building dedicated fixed-function hardware for every new chip feature anymore.
Agreed. Keep in mind that the "heavy lifting" of tessellation is done in the programmable hull and domain shaders, not in the new fixed-function piece, which basically just parcels up a parameter space based on some simple rules. In fact, I wouldn't be surprised if the biggest performance hit related to tessellation comes from the additional triangle setup rather than from the tessellation process itself.
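To illustrate the "parcels up a parameter space" part, here is a rough Python sketch of what a fixed-function tessellator stage conceptually emits for a triangle patch with a uniform integer tessellation factor. This is a simplification for illustration only (real hardware also handles fractional and per-edge factors, watertightness rules, and so on); the point is that this stage only generates domain coordinates and connectivity, while evaluating the surface and any displacement at those coordinates is left to the programmable domain shader.

def tessellate_tri_domain(n):
    """Barycentric (u, v, w) sample points for a triangle patch domain."""
    pts = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            u, v = i / n, j / n
            pts.append((u, v, 1.0 - u - v))
    return pts

# n = 8 already yields 45 domain points forming 8 * 8 = 64 triangles per
# input patch, which is where the setup and rasterization pressure
# discussed below comes from.
points = tessellate_tri_domain(8)
print(len(points), "domain points")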
 
I agree with what Andrew just said.
I wouldn't be too worried about the TS stage; what can really kill you are the HS/DS stages, memory bandwidth (are primitives generated by tessellation streamed out to GDDR?), rasterization rate (small primitives -> low hierarchical rasterization efficiency), primitive setup rate, and decreased pixel shader efficiency (shade 4, use 1 special offer :) ).

Marco
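On nAo's "shade 4, use 1" point: GPUs shade pixels in 2x2 quads so that screen-space derivatives can be computed, so a micro-triangle produced by heavy tessellation that covers a single pixel still pays for four pixel-shader invocations. A back-of-envelope Python sketch, with numbers made up purely for illustration:

def quad_shading_efficiency(covered_pixels, quads_touched):
    # Useful pixels divided by pixels actually shaded (4 per touched 2x2 quad).
    return covered_pixels / (4 * quads_touched)

print(quad_shading_efficiency(1, 1))     # micro-triangle: 0.25, i.e. 75% of the shading wasted
print(quad_shading_efficiency(100, 30))  # larger triangle: ~0.83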
 
I don't think the newly generated geometry is sent out to memory; that would be kind of counterproductive for the purpose of tessellation. But I also agree with Andrew and nAo.
 
It's not like we haven't seen hardware before that streams data out/in when primitive amplification is involved. Not saying that this is the case now, but it's certainly not a remote possibility.
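For a rough sense of what streaming amplified geometry out to GDDR could cost, here is a back-of-envelope estimate; every figure below is an assumption picked for illustration, not anything either IHV has disclosed.

bytes_per_vertex = 32      # assumed: packed position + normal + UV
verts_per_frame = 20e6     # assumed: heavily tessellated scene
fps = 60

# Write the amplified geometry out, then read it back in: two trips over the bus.
traffic_gb_s = verts_per_frame * fps * bytes_per_vertex * 2 / 1e9
print(f"{traffic_gb_s:.1f} GB/s of extra traffic")   # ~76.8 GB/s

Against a memory bus in the 100-180 GB/s range, that is significant but not necessarily a showstopper, which fits the "not a remote possibility" reading.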
 
Been so waiting to post this. ;P

In my recent "Ask Nvidia a question" thread. We were allowed to discuss Fermi's DX11 features.

Q: How is NVIDIA approaching the tessellation requirements for DX11 as none of the previous and current generation cards have any hardware specific to this technology?

Jason Paul, Product Manager, GeForce: Fermi has dedicated hardware for tessellation (sorry, Rys). We'll share more details when we introduce Fermi's graphics architecture shortly.

Sorry if you got singled out Rys. ;)

Interesting if true; I was also told it would not have one. Call me overly pessimistic, but after dealing with their PR for years and getting all of zero honest, non-hair-splitting responses, I would have to see what they mean by "dedicated hardware".

I fully expect them to nuance it like, "When we are using the tessellator, we dedicate 20 shaders to it, thus we have dedicated hardware for tessellation".

They can't even admit that the Fermi card was a sham; do you expect them to suddenly start being honest about other unpalatable things? Oh, you would. Never mind.

-Charlie
 
I was simply assuming that since you weren't certain whether Fermi taped out in W42, you probably couldn't truly know the amount of activity (versus a simple 'it has or hasn't taped out') around the derivatives. But of course, what you know is not what your sources (or their own sources) know, and in hindsight it seems perfectly reasonable that you'd know about one but not the other.

You're right; I certainly don't know, although I obviously agree they couldn't be minor ones if it took 7 weeks. It does seem rather extreme to claim that a completely new architecture should only take 2 weeks to respin, including bringup; but 7 weeks is still clearly more than you'd expect. I'll also admit I assumed that you didn't know either; but I can certainly believe that if you say so.

The short answer is that I didn't have time to check. :) I was in Taipei that week, so I couldn't just call everyone I wanted to. Then again, it is easier to reach the contacts in Taiwan if I can just hop on the MRT. :)

Normally, you have boards ready, drivers ready and the rest, and you plug the GPU in/solder it down and run your tests to see if everything comes up correctly. This shouldn't take long. If you have bugs, the tests themselves should narrow down where the problem is in the silicon. If not, you didn't prep right. Finding, fixing, and verifying the fix should be fairly quick, since it is likely a pretty specific change.

From there, it is just making masks for a layer or two. When you get A2 back, you know what you changed, and it should be even quicker to verify those fixes, and probably re-run the general tests to check that nothing else broke. Each new step after the first is usually much quicker.

This is assuming it is a logic bug. What if it is something like the GDDR5 controller not having enough timing slack, or picking up noise? What if the chip uses too much power, or power distribution is 'odd'? Things like that are not clean and easily simulatable, especially if the pre-A1 tape-out simulations didn't catch them. The technical term for those types of errors is "a mess". :)

That's perfectly reasonable; although as silent_guy indicated, they'd probably park the risk wafers before the first metal layer since the time difference is so small. And if the silicon layer is affected, then I'd expect the 'respin' to be named B1, not A3.

Of course, I have to admit that I suspect NVIDIA has started to name spins for marketing reasons. Specifically, Tegra APX 2600 is an 'A3' even though it includes significant silicon layer changes compared to the APX 2500. I assume they did that so their customers would be less scared of switching most existing design-ins to that revision. So it's possible that they'd name A3 what should logically be a B1, and then you'd be even more right.

Yeah, I wouldn't put it past them to have marketing influence spin names. I also agree about the A2/A3 vs B1 question. This one might be a B1 for all I know; I haven't seen anything hard to say that it isn't a base-layer change. My guys only referred to it as A2, but that was a while ago; since then, it has basically been talked about as the new stepping/spin/whatnot.

Oh, I agree with the numbers. What I took exception to is that you said 'assuming Nvidia parked a few wafers' in that paragraph, whereas later you refer to risk wafers as if they were a certainty and the only question is whether they will need to be scrapped. I do realize that you know they are used for both things; I simply found the article's phrasing a bit misleading.

I was using both scenarios. Last I heard definitively, they were going quite far down the stack. I don't know where they parked them, or where the A2 changes are yet, so I can't say for sure. One scenario is optimistic, the other pessimistic.

If they are still feeding Fudo December dates for 'launch', then it is likely that the risk wafers are still good. Then again, if someone high up has a stock sale pending, I wouldn't put it past them to fib a little in their 'leaks'.

Well, nothing forces them to produce all of them before getting the hot lots back, and remember there is also both a direct and an indirect (brand reputation/investor confidence/...) loss from being even later to market. So producing $10-15M worth, for example, may be a good compromise. However, remember this: there's no reason for them to order so many in advance if they didn't plan to use them fast. I know yields went back *down* at TSMC, but surely they were (naively ;)) expecting the reverse. Of course, *if* they are themselves very uncertain whether A2 is good to go, they wouldn't want to waste the full $50M, and I'd be crazy to deny that.

"This Puppy is Fermi". Yup, they are aware of investor actions. Right now, I do agree that they will do anything they can to move the date up, but things seem to be going the wrong way.

Mostly (but not exclusively) bring-up and early chips for the driver guys, obviously. Which is also why I'm confused that a respin should take only 2 weeks once you've got silicon back (we do agree 7 is not a good sign at all).

See above. That said, on the later steppings, it may not be needed, especially if they are sure a fix is good, and there isn't huge pressure on them.

I'm not aware of any demo that made extensive use of OGL/DX, but what do you mean by 'simulated graphics'? Do you mean CUDA-generated stuff, or are you claiming they were unable to demo anything (even CUDA) on real silicon? I do have good reason to believe they weren't lying when they claimed N-Body ran on real silicon in real time (live), but then again I'm not sure we disagree on that.

I mean that the demo was unlikely to have been output by a Fermi card; more likely it was computed on one and output through something else, if it was even live video. I am not sure that they ever claimed it was live, and given how they are desperately clinging to nuance in their public statements, well, go over the video with a very fine-toothed comb.

That said, given my dealings with them, I just assume they lie when they talk to me, and I haven't been proven wrong yet. Do you have a transcript for the video?

-Charlie

P.S. Wherever did Razor go? I miss his sig line. :(
 
I mean that the demo was unlikely to have been output by a Fermi card; more likely it was computed on one and output through something else, if it was even live video.
It was a live demo on real hardware. Obviously I can't prove that to you because I wasn't in the room at the time, but I'm more right than you are :D
 
Interesting if true; I was also told it would not have one. Call me overly pessimistic, but after dealing with their PR for years and getting all of zero honest, non-hair-splitting responses, I would have to see what they mean by "dedicated hardware".

I fully expect them to nuance it like, "When we are using the tessellator, we dedicate 20 shaders to it, thus we have dedicated hardware for tessellation".

They can't even admit that the Fermi card was a sham; do you expect them to suddenly start being honest about other unpalatable things? Oh, you would. Never mind.

-Charlie

Yes, because you have always published honest-to-goodness articles about them. Why in the world should they ever tell you anything even remotely close to the fricken truth? You do enough making up stories about them all on your own, and with AMD's help. Money well spent by AMD, I do believe.
 
Originally Posted by Ailuros
No, an IHV-specific tech demo/benchmark should not be any defining point, but real-time usage in future games.

How is Unigine's benchmark IHV specific in any way? :???:

That's what I would like to know as well. If Unigine's demo is considered "IHV specific" simply because only one IHV's hardware is currently capable of running DX11, then by the same measure, every GPU-PhysX-enabled game/benchmark (Batman: AA, 3DMark, etc.) should be considered "IHV specific" and "should not be any defining point"... after all, any such implementations and their performance are specific to ONE IHV.
 
Each IHV misses a step every so often, and I wonder whether this is one to add alongside NV30 and, from ATI, the HD 2xxx series?

ATI must be frustrated about the lack of GPUs coming from TSMC, but on the other hand there is no pricing pressure; in fact, there is negative pricing pressure.

G80 is looking like the zenith in recent times for our green friends.
 
Well, Charlie's been more or less right on timelines so far, so that would be very unfortunate. I want to upgrade, and the 5870 just isn't a big enough jump. May 2010 would really suck. Guess I could always downgrade my monitor :D
Yeah, wouldn't May 2010 make this the most belated launch in the history of the graphics market? It's never been more than six or seven months, from what I recall.
 
That's what I would like to know as well. If Unigine's demo is considered "IHV specific" simply because only one IHV's hardware is currently capable of running DX11, then by the same measure, every GPU-PhysX-enabled game/benchmark (Batman: AA, 3DMark, etc.) should be considered "IHV specific" and "should not be any defining point"... after all, any such implementations and their performance are specific to ONE IHV.

The real point was that real games in the future should show each side's efficiency.

By your reasoning, NVIDIA only needs a DX11 demo similar to Unigine's to have one benchmark answering another benchmark, and the happy hair-pulling over which is more accurate at evaluating exactly what can start from there.

If you personally make your buying decisions based on one or two sterile benchmarks, be my guest; I, on the other hand, prefer to read as many reviews as possible, and the more real games measured in them the merrier. And yes, of course, there Unigine's demo would be the only thing that "should" matter, and it's completely irrelevant that there are barriers between plagiarism and research.
 