NVIDIA Fermi: Architecture discussion

nAo · Oct 1, 2009

MfA said:
L2 is coherent with itself, there is a single L2 for each memory bus ... there can only ever be one copy.

That's not a cache coherency scheme, that is simply caching.

I've just finished reading RWT's preview. I previously thought that they got fully coherent L1 caches per SM. Wishful thinking...
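
MfA's "only ever one copy" point can be made concrete with a toy model. The sketch below assumes six L2 slices (one per 64-bit channel of a 384-bit bus), 128-byte lines, and plain line-granular interleaving. The real address mapping is not described in this thread, so `home_slice`, `write`, `read`, `copies`, and both constants are hypothetical.

```python
# Toy model: every address maps to exactly one L2 slice, so L2 can never
# hold two copies of the same line. All names and constants are illustrative.
NUM_SLICES = 6      # assumption: one slice per 64-bit memory channel
LINE_BYTES = 128    # assumption: cache line size in bytes


def home_slice(addr: int) -> int:
    """Return the single slice allowed to cache the line containing addr."""
    return (addr // LINE_BYTES) % NUM_SLICES


def write(l2: list, addr: int, value) -> None:
    """Store value for addr's line in its home slice only."""
    l2[home_slice(addr)][addr // LINE_BYTES] = value


def read(l2: list, addr: int):
    """Return the cached value for addr's line, or None on a miss."""
    return l2[home_slice(addr)].get(addr // LINE_BYTES)


def copies(l2: list, addr: int) -> int:
    """Count how many slices currently hold addr's line."""
    line = addr // LINE_BYTES
    return sum(line in cache_slice for cache_slice in l2)


# One dict per slice, keyed by line number.
l2 = [dict() for _ in range(NUM_SLICES)]
write(l2, 0x1000, "written by SM 0")
print(read(l2, 0x1000), copies(l2, 0x1000))
```

Any SM reading `0x1000` goes to the same slice, which is why this arrangement needs no coherency protocol inside L2. It does nothing for stale data in per-SM L1 caches, which is the case nAo is pointing at.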

trinibwoy · Oct 1, 2009

Rys said:
More precision in an FMA at the MUL stage.

Can higher precision cause rendering errors?
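
The numerical difference itself is easy to show. A fused multiply-add rounds once, after the add, while a separate multiply and add rounds twice. The sketch below does not call a hardware FMA; it emulates the fused result with exact rational arithmetic, and the inputs are chosen only to make the difference visible.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """Round after the multiply, then round again after the add."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# a * b is exactly 1 - 2**-60, which rounds to 1.0 in double precision.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", mul_add_unfused(a, b, c))
print("fused:  ", mul_add_fused(a, b, c))
```

With these inputs the unfused version returns 0.0 and the fused version returns -2**-60. The two results differ only in rounding; whether that is ever visible in rendering depends on what the shader does with the value.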

MfA · Oct 1, 2009

nAo said:
I've just finished reading RWT's preview. I previously thought that they got fully coherent L1 caches per SM. Wishful thinking...

What would it really get you anyway? Fine-grained communication should be done with messages; coarse-grained communication can be done through L2 (latency is not an issue for coarse-grained communication, and L2 has plenty of bandwidth). Complete coherency is a waste of transistors.

babcat · Oct 1, 2009

Has NVIDIA posted any information on GRAPHICS related improvements?

digitalwanderer · Oct 1, 2009

No.

Arty · Oct 1, 2009

In layman's terms (MT = million transistors):

R600 - 320SP - 720 million - 2.25 MT per SP (Added D3D10)
RV670 - 320SP - 666 million - 2.08 MT per SP (Added D3D10.1 & lowered memory bus width)
RV770 - 800SP - 956 million - 1.19 MT per SP
Cypress - 1600SP - 2.15 billion - 1.34 MT per SP (Added D3D11)

G80 - 128SP - 686 million - 5.36 MT per SP (Added D3D10)
G200 - 240SP - 1.4 billion - 5.83 MT per SP (Added CUDA)
Fermi - 512SP - 3 billion - 5.85 MT per SP (Added more CUDA + D3D11 & lowered memory bus width)

Since we don't have any performance numbers, purely from this perspective it doesn't look like Fermi is Nvidia's RV770, but their jump still looks more efficient.
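
The ratios in the list are just transistor count divided by SP count. A minimal sketch that recomputes them from the figures as quoted in the post (the counts are the poster's and are not verified here; `mt_per_sp` is an illustrative name). Rounding to two decimals can differ from the post in the last digit, since some of its figures look truncated.

```python
# (chip, stream processors, transistors in millions), as quoted in the post.
CHIPS = [
    ("R600", 320, 720),
    ("RV670", 320, 666),
    ("RV770", 800, 956),
    ("Cypress", 1600, 2150),
    ("G80", 128, 686),
    ("G200", 240, 1400),
    ("Fermi", 512, 3000),
]


def mt_per_sp(sps: int, transistors_m: float) -> float:
    """Millions of transistors per stream processor."""
    return transistors_m / sps


for name, sps, transistors_m in CHIPS:
    print(f"{name:8s} {sps:5d} SP  {mt_per_sp(sps, transistors_m):.2f} MT per SP")
```

This counts every transistor on the die against the SPs, so memory controllers, caches, and fixed-function blocks are all folded into the ratio.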

babcat · Oct 1, 2009

digitalwanderer said:
No.

That is really disappointing. I can't wait until they share more information about what features it has that will benefit graphics.

INKster · Oct 1, 2009

babcat said:
Has NVIDIA posted any information on GRAPHICS related improvements?

DX11 is an improvement, no ?

And this is a "graphics" card, it has a DVI port (unlike their Tesla GPGPU counterparts).

Since this was never meant to be a product "launch", not even a "paper-launch", talking about that three months before the actual unveiling would sort of kill the "buzz" in the media.
I'm surprised even an official die photo got in there, since not even AMD treated us to one of the "Cypress"/RV870/HD5870 core.

babcat · Oct 1, 2009

INKster said:
DX11 is an improvement, no ?

And this is a "graphics" card, it has a DVI port.

Since this was never meant to be a product "launch", not even a "paper-launch", talking about that three months before the actual unveiling would sort of kill the "buzz" in the media.
I'm surprised even an official die photo got in there, since not even AMD treated us to one of the "Cypress"/HD5870 core.

DX11 is an improvement, but one that everyone expected.

BTW, I'm not suggesting that they reveal frame rates or even images/videos from games. I'm just wondering if the hardware has been modified or any hardware has been added that will benefit graphics. For example, there is the rumor that it does not have a hardware tessellation unit. That is something that could impact graphics.

DemoCoder · Oct 1, 2009

At this point, I think fixed-function graphics features are essentially maxed out. Yes, there are marginal improvements to be made to texture filtering or antialiasing, but it's high cost for small gain. The future of GPU graphics improvements as things have become more general is essentially just upping performance and eliminating pathological use cases that might choke performance.

They might improve AF or AA slightly, but those just eliminate annoying artifacts from fewer and fewer edge cases; they won't improve lighting.

We've kind of reached a 'boring' stage in GPU evolution where introduction of new fixed-function blocks may recede. I'm not sure what the long term survival will be for fixed-function tessellation for example (despite Charlie's aggressive shilling for it). Geometry amplification is a wide-open field that encompasses compression and procedural synthesis.

From a developer standpoint, this card looks sweet, from exceptions, to vtable-dispatch support, to debugging in a real IDE stepping through C++ on the GPU, but it comes at a cost, which is density. NVidia may be following SGI's eventual decline, by losing the consumer/workstation market based on cost and targeting HPC, which is a niche.

DemoCoder · Oct 1, 2009

babcat said:
For example, there is the rumor that it does not have a hardware tessellation unit. That is something that could impact graphics.

Whether tessellation is done in hardware or software won't affect graphics quality, only performance, and that's only if you're bottlenecked by tessellation and if your desired tessellation fits the hardware.

babcat · Oct 1, 2009

DemoCoder said:
At this point, I think fixed-function graphics features are essentially maxed out. Yes, there are marginal improvements to be made to texture filtering or antialiasing, but it's high cost for small gain. The future of GPU graphics improvements as things have become more general is essentially just upping performance and eliminating pathological use cases that might choke performance.

From a developer standpoint, this card looks sweet, from exceptions, to vtable-dispatch support, to debugging in a real IDE stepping through C++ on the GPU, but it comes at a cost, which is density. NVidia may be following SGI's eventual decline, by losing the consumer/workstation market based on cost and targeting HPC, which is a niche.

How could exceptions, vtable dispatch support, debugging in a real IDE, and so forth impact graphics?

Also, I read somewhere that ray tracing could be improved in this GPU. Do you see anything that could indicate this?

INKster · Oct 1, 2009

babcat said:
For example, there is the rumor that it does not have a hardware tessellation unit. That is something that could impact graphics.

Pure software emulation is certainly not on their minds if it significantly impacts actual 3D performance, but it is conceivable that it's one area where there'll be compromises as to the level of hardware implementation, much like the ones Nvidia made with G80, when the performance of Geometry Shaders (part of the DX10.x spec) took a back seat to the rest.
This time, with this much programmability on-chip (assuming for a moment the scenario described above), the quality of the OS driver and compiler is paramount, and could change the tide for a particular game pretty dramatically.

Davros · Oct 1, 2009

So, to those who understand: is this new chip
Yay, Meh or Wtf?

MarkoIt · Oct 1, 2009

Davros said:
So, to those who understand: is this new chip
Yay, Meh or Wtf?

I don't understand, but I think it's

for gpgpu computing.. and

for gaming (in terms of performance/mm^2 against Cypress).
But we will see..
For sure, it's a nice step forward from GT200's architecture... but where is Nvidia stepping?

DemoCoder · Oct 1, 2009

This seems to be a new FUD meme, that this card won't run games fast. With the exception of special hardware for ECC and double precision, practically everything else will help games, if you accept the messaging that AMD/ATI was selling since the R600, which was that ALU:TEX ratios are going up and shader power is therefore a big factor in gaming.

What may not be as important is the stuff for C++, since you can always write non-OO code and avoid indirect dispatch (with a cost). That's extra transistors for development convenience.
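
The trade-off can be sketched structurally. Python is used here only for illustration and says nothing about GPU cost: the first version relies on indirect dispatch through the object's class (the job a vtable does in C++), the second stores a type tag and branches on it explicitly. All names are hypothetical.

```python
import math


# Object-oriented version: the call site does not know which area() runs;
# it is looked up through the object's class at run time.
class Circle:
    def __init__(self, r: float):
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2


class Rect:
    def __init__(self, w: float, h: float):
        self.w = w
        self.h = h

    def area(self) -> float:
        return self.w * self.h


def total_area_oo(shapes) -> float:
    return sum(s.area() for s in shapes)


# Non-OO version: plain tuples carrying a tag, and one explicit branch per tag.
CIRCLE, RECT = 0, 1


def area_tagged(shape: tuple) -> float:
    tag = shape[0]
    if tag == CIRCLE:
        return math.pi * shape[1] ** 2
    if tag == RECT:
        return shape[1] * shape[2]
    raise ValueError(f"unknown tag {tag}")


def total_area_tagged(shapes) -> float:
    return sum(area_tagged(s) for s in shapes)


print(total_area_oo([Circle(1.0), Rect(2.0, 3.0)]))
print(total_area_tagged([(CIRCLE, 1.0), (RECT, 2.0, 3.0)]))
```

Both versions compute the same total. One cost of the tagged form is maintenance: every new shape means editing the branch in `area_tagged` rather than adding a class.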

The lack of information about TMUs/ROPs doesn't mean they suddenly gimped the card. They wouldn't have included a 384-bit memory bus if they were going to gimp on texturing or fillrate. I would wait and see. Then again, AMD had a massive 512-bit bus on the R520, but totally skimped out on TMUs.

ninelven · Oct 1, 2009

Yeah, based on the information available, I'd estimate it will be at least ~45% faster than HD5870, which seems reasonable for a 40% greater die size.

FUDie · Oct 1, 2009

DemoCoder said:
Then again, AMD had a massive 512-bit bus on the R520, but totally skimped out on TMUs.

R520 had a 256-bit bus. R600 is the chip you're looking for.

-FUDie

seahawk · Oct 1, 2009

That was tech day, not aimed at the gaming consumer. For those with an interest in what they showed it was an awesome day, for the others ... well wait till the gamer launch comes.

DemoCoder · Oct 1, 2009

FUDie said:
R520 had a 256-bit bus. R600 is the chip you're looking for.

-FUDie

Yeah, my memory is starting to get faulty over these years.
