NVIDIA Fermi: Architecture discussion

That's a tough guess. Sometimes it may be smarter to use a software solution. For example, if GF100 really won't have a h/w tessellator, we'll see soon enough whether that holds up for DX11-type tessellation.

Maybe they think that for the next year nobody will really push tessellation, or that if someone does they can "ask" for it to be pulled, like DX10.1 in Assassin's Creed. By the time it becomes widespread they'll have a new revision with a hardware tessellator, or one so powerful and versatile that tessellation will be a small cost next to the other effects/poly/raytracing etc.
 
It's really a question of whether they can map DX11 tessellation to their SMs well enough. I'm thinking they may have chosen s/w tessellation because they are certain a s/w solution is preferable in the long run, in the same way that unified PS/VS/GS are preferable to separate pipelines right now. Take Cell, for example. AFAIK it's pretty good at tessellation. Does it have a h/w tessellator? Will it get one in the future? Will LRB have a h/w tessellator? Right now it looks like AMD may end up being the only one on the market with a h/w tessellator in their chips. But who knows, maybe AMD's right and then everyone will be forced to implement a separate h/w tessellator at some point.
We need some benchmarks =)
 
I agree. In the past DirectX has set targets for what PC hardware should be capable of. Now that the IHVs are pushing the envelope, DirectX will become more of a hindrance than a help. But it will still be very important as a lowest common denominator for all hardware. That's why Nvidia has to go it alone: they can't sit by and wait for Microsoft. It's no different from ATi and tessellation. The only difference is that Nvidia has the will and capability to drive things beyond DirectX.
I question that heavily; until engine providers are ready, DirectX 11 will be important no matter what Nvidia's claims and desires are.
Epic, Crytek and likely others are working to provide tools and engines for what they expect to be next-generation console systems. I would expect DirectX to stay relevant for a while yet.
 
Nvidia GF100: 128 TMUs and 48 ROPs

http://www.hardware-infos.com/news.php?news=3228

If I summarize, we have the following facts:

- 40 nm
- 3.0 billion transistors
- 512 SPs
- 128 TMUs
- 48 ROPs
- 384 Bit GDDR5
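
Just for reference, the bandwidth implied by a 384-bit GDDR5 bus is simply bus width times effective data rate; the data rate below is purely a placeholder (no memory clocks have been announced), so this is a sketch, not a prediction:

# Bandwidth (GB/s) = bus width in bytes * effective data rate (GT/s).
bus_width_bits = 384
data_rate_gtps = 4.0   # hypothetical GDDR5 effective data rate, not an announced spec
bandwidth_gbps = bus_width_bits / 8 * data_rate_gtps
print(f"{bandwidth_gbps:.0f} GB/s")   # 192 GB/s at this placeholder data rate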
In broad terms, in order for GTX285 to be just about faster than HD4890 (10-20%), it required 2x HD4890's TUs (80 v 40) and 2x HD4890's RBEs (32 v 16).

Now that HD5870 has 80 TUs and 32 RBEs ...

Of course that takes no account of the per-unit efficiency of these things. There's no reason to assume NVidia hasn't revamped them - if there are fixed-function TMUs and ROPs at all.
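
A quick back-of-the-envelope sanity check on those unit counts (the GF100 figures are the rumoured 128 TMU / 48 ROP spec, so treat them as placeholders; clocks and per-unit efficiency are ignored, as noted above):

# Unit counts quoted in this thread; GF100 numbers are rumoured, not confirmed.
parts = {
    "HD4890": {"tmu": 40, "rop": 16},
    "GTX285": {"tmu": 80, "rop": 32},
    "HD5870": {"tmu": 80, "rop": 32},
    "GF100 (rumoured)": {"tmu": 128, "rop": 48},
}
base = parts["HD5870"]
for name, p in parts.items():
    print(f"{name}: {p['tmu'] / base['tmu']:.2f}x TMUs, {p['rop'] / base['rop']:.2f}x ROPs vs HD5870")

By that crude measure the rumoured chip is only 1.6x/1.5x an HD5870 in TMUs/ROPs, against the 2x over HD4890 that GTX285 needed for its 10-20% lead.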

Jawed
 
I don't think we'll see any breaking of compatibility coming, but rather a decline of DX as the driving force in the graphics world (gaming included).

I just don't ever see us going back to the early days when devs had to program to a different API for every 3D card on the market, and customers had to check whether a game supported their graphics card or they just got software rendering. I don't see anything else unseating DX as the common, incumbent API for PC gaming or general graphics/3D on the ubiquitous Windows platform.
 
Why not? 16 SMs with 512 SPs is bigger than 1 tessellator.

Those 512 SPs will also be occupied with other pressing tasks. That's the whole point of fixed function hardware - to do something cheaply instead of using expensive general hardware.

It still kind of makes me pause when people turn up their noses at 1.6x scaling in a concurrent processing environment.

I don't see why. We are seeing orders of magnitude speedups in compute applications, but we should be elated with sub-linear scaling in graphics? They're supposed to be equal citizens, right? I'm not worried about CPU limitations in the least; 4MP resolutions will have that effect :)

I question that heavily

Question what? The rest of the post seems to agree with what I said.

This is the sort of BS that Nvidia pulls all the time, and it's why a lot of people don't like them as a company.

Yep. Though it's not relevant in the least, it still leaves a bad taste in your mouth.

I just don't ever see us going back to the early days when devs had to program to a different API for every different 3D card on the market

Nobody is proposing that. Things will still be standardized, just at a much lower level. Eventually all we would need is something akin to CS (compute shaders) that allows developers to target the hardware. There'll be standardization of texture formats, compression and filtering, but all of the higher-level constraints on the rendering pipeline imposed by DirectX will go away. Middleware providers like id and Epic will step in to fill that gap, just like they do today.
 
It would certainly look more impressive if you guys had increased the setup rate.

I'm still trying to understand why we should be impressed by 2x the raster rate. Hasn't that always been increasing? Why is it a highlight now? Dave is being very opaque about the whole thing.
 
There is no "sorta" about it. There is 2x the raster rate there.

What that entails exactly, and why triangle rates don't appear to have doubled in certain tests, has been hashed over in the R8xx thread for pages with no satisfactory conclusion. Perhaps the tri rate can scale independently of the rasterizer count. Perhaps there's a reason why 32 pixels per clock equals 2 rasterizers in Cypress, but just one in G80 (edit: sorry, GT200). Hashing it out in this thread probably wouldn't change the outcome.

With respect to a comparison to the Nvidia architecture and why it won't scale by a factor of 2, it is a question of whether setup is still 1 triangle per clock in a Fermi chip.
For setup-limited parts of the workload, doubling everything else would not double performance.
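
A quick Amdahl-style illustration (the setup-limited fractions below are made-up numbers, purely to show the shape of the curve):

# If a fraction of the frame is setup-bound and setup stays at 1 tri/clock,
# doubling everything else caps the overall gain well below 2x.
def overall_speedup(setup_fraction, other_speedup=2.0):
    return 1.0 / (setup_fraction + (1.0 - setup_fraction) / other_speedup)

for f in (0.0, 0.1, 0.2, 0.3):
    print(f"{f:.0%} setup-limited -> {overall_speedup(f):.2f}x overall")
# 0% -> 2.00x, 10% -> 1.82x, 20% -> 1.67x, 30% -> 1.54x

Even a modest setup-limited fraction is enough to drag a doubled chip down towards the 1.6x scaling figure being argued about above.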
 
Another point is that IIRC for graphics loads GT200 unit utilization is already quite high (90% or more), so improved efficiency in Fermi leads to a performance gain for graphics applications, but this would be limited IMHO.
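
Taking that 90% figure at face value (it's a ballpark from memory, not a measurement), the headroom from efficiency alone is easy to bound:

# If graphics loads already keep units ~90% busy, perfect utilization alone
# buys at most ~1.11x; the rest has to come from more or faster units.
current_utilization = 0.90   # the rough figure quoted above, assumed here
print(f"Upper bound from efficiency alone: {1.0 / current_utilization:.2f}x")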
 
I don't see why. We are seeing orders of magnitude speedups in compute applications, but we should be elated with sub-linear scaling in graphics? They're supposed to be equal citizens, right? I'm not worried about CPU limitations in the least; 4MP resolutions will have that effect :)
What are the points of reference?
Orders of magnitude of improvement are readily possible over cases where previous chips were terrible.

Graphics, I would contend, would be something Nvidia was already very good at.
As was noted in other articles, a lot of the efficiencies gained are not efficiencies that graphics loads presently care much about.
The write-back data path from the L1s is something graphics cards don't have and yet have done very well without.

Many of Fermi's improvements focus on the compute side, which helps little in bandwidth/setup/ROP/TEX/CPU/driver-limited parts of the graphics workload.
 
Perhaps they were already so good that they decided to multiply units and improve efficiency in the compute parts? I mean, 90%, if true, is rather exceptional for any IC.
 
David, many thanks for the great article. Do you know what's at the very center of the die?

(attached image: 07.jpg)

No, I don't. In the past, I think the thread scheduler, setup engine and rasterizer were in the center of the GPU.

David
 
Graphics, I would contend, would be something Nvidia was already very good at.

But that's not a good enough excuse for neglecting known bottlenecks. Given the dramatic changes on the compute side I don't think it's unreasonable to ask for a little love for graphics.
 
I'm still trying to understand why we should be impressed by 2x the raster rate. Hasn't that always been increasing? Why is it a highlight now? Dave is being very opaque about the whole thing.
I don't think we're supposed to be impressed by it. I think it's just something a couple of reviewers mentioned, and then a lot of people here at B3D started making a big deal out of it.
It would certainly look more impressive if you guys had increased the setup rate.
I was already disappointed when neither AMD nor NVidia did anything about setup rate in 2008, but at this point I just can't understand it. Do you know what's so hard about doing this in terms of ordering and dependencies?

Maybe I'm making a mountain out of a molehill, as there are very few games that have low framerates due to high poly counts. But when you look at the benchmark wars these two companies are engaged in, you'd think they'd be jumping all over an opportunity for a 10-20% improvement.

One thing I love about high poly counts is that they make for a very easy way to do selective supersampling. Sure, wasting a hardware quad on a triangle covering a couple of samples seems ludicrous, but it's probably better than putting the burden on devs to rewrite shaders, and definitely better than supersampling the whole scene.
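
Rough numbers on that quad waste, assuming 2x2 quad-based shading and ignoring any helper-pixel reuse across shared edges (a deliberately crude sketch):

# Every touched 2x2 quad shades 4 samples, even if the triangle only covers a couple.
def shading_cost_ratio(covered_samples, quads_touched):
    return quads_touched * 4 / covered_samples

print(shading_cost_ratio(covered_samples=2, quads_touched=1))    # 2.0x shading work
print(shading_cost_ratio(covered_samples=3, quads_touched=2))    # ~2.7x shading work
print(shading_cost_ratio(covered_samples=16, quads_touched=4))   # 1.0x, quad-aligned case

Wasteful per tiny triangle, but still far cheaper than supersampling the whole scene.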
 