NVIDIA Fermi: Architecture discussion

That's a tough guess. Sometimes it may be smarter to use a software solution. For example, if GF100 really won't have a h/w tessellator, we'll see soon enough whether that holds up for DX11-type tessellation.

Maybe they think that for the next year nobody will really push tessellation, or that if someone does they can "ask" for it to be pulled, like DX10.1 in Assassin's Creed. By the time it becomes widespread they'll have a new revision with a hardware tessellator, or one so powerful and versatile that tessellation will be a small cost next to the other effects/poly/raytracing etc.
 
It's really a question of whether they can map DX11 tessellation to their SMs well enough. I'm thinking they may have chosen s/w tessellation because they are certain a s/w solution is preferable in the long run, in the same way that unified PS/VS/GS are preferable to separate pipelines right now. Take Cell, for example. AFAIK it's pretty good at tessellation. Does it have a h/w tessellator? Will it get one in the future? Will LRB have a h/w tessellator? Right now it looks like AMD may end up being the only one on the market with a h/w tessellator in their chips. But who knows, maybe AMD's right and then everyone will be forced to implement a separate h/w tessellator at some point.
We need some benchmarks =)
 
I agree. In the past DirectX has set targets for what PC hardware should be capable of. Now that the IHVs are pushing the envelope, DirectX will become more of a hindrance than a help. But it will still be very important as a lowest common denominator for all hardware. That's why Nvidia has to go it alone: they can't sit by and wait for Microsoft. It's no different from ATi and tessellation. The only difference is that Nvidia has the will and capability to drive things beyond DirectX.
I question that heavily; until engine providers are ready, DirectX 11 will be important no matter what Nvidia's claims and desires are.
Epic, Crytek and likely others are working to provide tools and engines for what they expect to be next-generation console systems. I would expect DirectX to stay relevant for a while yet.
 
Nvidia GF100: 128 TMUs and 48 ROPs

http://www.hardware-infos.com/news.php?news=3228

If I summarize, we have the following facts:

- 40 nm
- 3.0 billion transistors
- 512 SPs
- 128 TMUs
- 48 ROPs
- 384 Bit GDDR5
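
Just for reference, the bandwidth implied by a 384-bit GDDR5 bus is simply bus width times effective data rate; the data rate below is purely a placeholder (no memory clocks have been announced), so this is a sketch, not a prediction:

# Bandwidth (GB/s) = bus width in bytes * effective data rate (GT/s).
bus_width_bits = 384
data_rate_gtps = 4.0   # hypothetical GDDR5 effective data rate, not an announced spec
bandwidth_gbps = bus_width_bits / 8 * data_rate_gtps
print(f"{bandwidth_gbps:.0f} GB/s")   # 192 GB/s at this placeholder data rate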
In broad terms, in order for GTX285 to be just about faster than HD4890 (10-20%), it required 2x HD4890's TUs (80 v 40) and 2x HD4890's RBEs (32 v 16).

Now that HD5870 has 80 TUs and 32 RBEs ...

Of course that takes no account of the per-unit efficiency of these things. There's no reason to assume NVidia hasn't revamped them - if there are fixed-function TMUs and ROPs at all.
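
A quick back-of-the-envelope sanity check on those unit counts (the GF100 figures are the rumoured 128 TMU / 48 ROP spec, so treat them as placeholders; clocks and per-unit efficiency are ignored, as noted above):

# Unit counts quoted in this thread; GF100 numbers are rumoured, not confirmed.
parts = {
    "HD4890": {"tmu": 40, "rop": 16},
    "GTX285": {"tmu": 80, "rop": 32},
    "HD5870": {"tmu": 80, "rop": 32},
    "GF100 (rumoured)": {"tmu": 128, "rop": 48},
}
base = parts["HD5870"]
for name, p in parts.items():
    print(f"{name}: {p['tmu'] / base['tmu']:.2f}x TMUs, {p['rop'] / base['rop']:.2f}x ROPs vs HD5870")

By that crude measure the rumoured chip is only 1.6x/1.5x an HD5870 in TMUs/ROPs, against the 2x over HD4890 that GTX285 needed for its 10-20% lead.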

Jawed
 
I don't think we'll see any breaking of compatibility coming, but rather a decline of DX as the driving force in the graphics world (gaming included).

I just don't ever see us going back to the early days when devs had to program to a different API for every 3D card on the market, and customers had to check whether a game supported their graphics card or they just got software rendering. I don't see anything else unseating DX as the common, incumbent API for PC gaming or general graphics/3D on the ubiquitous Windows platform.
 
Why not? 16 SMs with 512 SPs is bigger than 1 tessellator.

Those 512 SPs will also be occupied with other pressing tasks. That's the whole point of fixed function hardware - to do something cheaply instead of using expensive general hardware.

It still kind of makes me pause when people turn up their noses at 1.6x scaling in a concurrent processing environment.

I don't see why. We are seeing orders of magnitude speedups in compute applications, but we should be elated with sub-linear scaling in graphics? They're supposed to be equal citizens, right? I'm not worried about CPU limitations in the least; 4MP resolutions will have that effect :)

I question that heavily

Question what? The rest of the post seems to agree with what I said.

This is the sort of BS that Nvidia pulls all the time, and it's why a lot of people don't like them as a company.

Yep. Though it's not relevant in the least, it still leaves a bad taste in your mouth.

I just don't ever see us going back to the early days when devs had to program to a different API for every different 3D card on the market

Nobody is proposing that. Things will still be standardized, just at a much lower level. Eventually all we would need is something akin to CS (compute shaders) that allows developers to target the hardware. There'll be standardization of texture formats, compression and filtering, but all of the higher-level constraints on the rendering pipeline imposed by DirectX will go away. Middleware providers like id and Epic will step in to fill that gap, just like they do today.
 
It would certainly look more impressive if you guys had increased the setup rate.

I'm still trying to understand why we should be impressed by 2x the raster rate. Hasn't that always been increasing? Why is it a highlight now? Dave is being very opaque about the whole thing.
 
There is no "sorta" about it. There is 2x the raster rate there.

What that entails exactly, and why triangle rates don't appear to have doubled in certain tests, has been hashed over in the R8xx thread for pages with no satisfactory conclusion. Perhaps the tri rate can scale independently of the rasterizer count. Perhaps there's a reason why 32 pixels per clock equals 2 rasterizers in Cypress, but just one in G80 (edit: sorry, GT200). Hashing it out in this thread probably wouldn't change the outcome.

With respect to a comparison to the Nvidia architecture and why it won't scale by a factor of 2, it is a question of whether setup is still 1 triangle per clock in a Fermi chip.
For setup-limited parts of the workload, doubling everything else would not double performance.
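
A quick Amdahl-style illustration (the setup-limited fractions below are made-up numbers, purely to show the shape of the curve):

# If a fraction of the frame is setup-bound and setup stays at 1 tri/clock,
# doubling everything else caps the overall gain well below 2x.
def overall_speedup(setup_fraction, other_speedup=2.0):
    return 1.0 / (setup_fraction + (1.0 - setup_fraction) / other_speedup)

for f in (0.0, 0.1, 0.2, 0.3):
    print(f"{f:.0%} setup-limited -> {overall_speedup(f):.2f}x overall")
# 0% -> 2.00x, 10% -> 1.82x, 20% -> 1.67x, 30% -> 1.54x

Even a modest setup-limited fraction is enough to drag a doubled chip down towards the 1.6x scaling figure being argued about above.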
 
Another point is that IIRC for graphics loads GT200 unit utilization is already quite high (90% or more), so improved efficiency in Fermi leads to a performance gain for graphics applications, but this would be limited IMHO.
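
Taking that 90% figure at face value (it's a ballpark from memory, not a measurement), the headroom from efficiency alone is easy to bound:

# If graphics loads already keep units ~90% busy, perfect utilization alone
# buys at most ~1.11x; the rest has to come from more or faster units.
current_utilization = 0.90   # the rough figure quoted above, assumed here
print(f"Upper bound from efficiency alone: {1.0 / current_utilization:.2f}x")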
 
I don't see why. We are seeing orders of magnitude speedups in compute applications, but we should be elated with sub-linear scaling in graphics? They're supposed to be equal citizens, right? I'm not worried about CPU limitations in the least; 4MP resolutions will have that effect :)
What are the points of reference?
Orders of magnitude of improvement are readily possible over cases where previous chips were terrible.

Graphics, I would contend, would be something Nvidia was already very good at.
As was noted in other articles, a lot of the efficiencies gained are not efficiencies that graphics loads presently care much about.
The write-back data path from the L1s is something graphics cards don't have and yet have done very well without.

Many of Fermi's improvements focus on the compute side, which helps little in bandwidth/setup/ROP/TEX/CPU/driver-limited parts of the graphics workload.
 
Perhaps they were already so good that they decided to multiply units and improve efficiency in the compute parts? I mean, 90%, if true, is rather exceptional for any IC.
 
David, many thanks for the great article. Do you know what's at the very center of the die?

(attached image: 07.jpg)

No, I don't. In the past, I think the thread scheduler, setup engine and rasterizer were in the center of the GPU.

David
 
Graphics, I would contend, would be something Nvidia was already very good at.

But that's not a good enough excuse for neglecting known bottlenecks. Given the dramatic changes on the compute side I don't think it's unreasonable to ask for a little love for graphics.
 
I'm still trying to understand why we should be impressed by 2x the raster rate. Hasn't that always been increasing? Why is it a highlight now? Dave is being very opaque about the whole thing.
I don't think we're supposed to be impressed by it. I think it's just something a couple of reviewers mentioned, and then a lot of people here at B3D started making a big deal out of it.
It would certainly look more impressive if you guys had increased the setup rate.
I was already disappointed when neither AMD nor NVidia did anything about setup rate in 2008, but at this point I just can't understand it. Do you know what's so hard about doing this in terms of ordering and dependencies?

Maybe I'm making a mountain out of a molehill, as there are very few games that have low framerates due to high poly counts. But when you look at the benchmark wars these two companies are engaged in, you'd think they'd be jumping all over an opportunity for a 10-20% improvement.

One thing I love about high poly counts is that they make for a very easy way to do selective supersampling. Sure, wasting a hardware quad on a triangle covering a couple of samples seems ludicrous, but it's probably better than putting the burden on devs to rewrite shaders, and definitely better than supersampling the whole scene.
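
Rough numbers on that quad waste, assuming 2x2 quad-based shading and ignoring any helper-pixel reuse across shared edges (a deliberately crude sketch):

# Every touched 2x2 quad shades 4 samples, even if the triangle only covers a couple.
def shading_cost_ratio(covered_samples, quads_touched):
    return quads_touched * 4 / covered_samples

print(shading_cost_ratio(covered_samples=2, quads_touched=1))    # 2.0x shading work
print(shading_cost_ratio(covered_samples=3, quads_touched=2))    # ~2.7x shading work
print(shading_cost_ratio(covered_samples=16, quads_touched=4))   # 1.0x, quad-aligned case

Wasteful per tiny triangle, but still far cheaper than supersampling the whole scene.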
 