NVIDIA Fermi: Architecture discussion

It's funny how things have come full circle. Nvidia used to be the bane of network operators and systems guys back when their products enabled multiplayer Doom over the network; now they're trying to be the toast of those same systems guys with their HPC reorientation.

With the advent of more laptops, games being made for consoles, and now graphics cards going places other than 3D, I'm rather worried that the 3D world I love is turning slightly grey and wrinkly :(

We're getting a dog soon; it'll be the first time I've left the house since getting my Riva TNT2! :D The writing is on the wall ....
 
What would it really get you anyway? Fine-grained communication should be done with messages, and coarse-grained communication can be done through the L2 (latency is not an issue for coarse-grained communication, and it has plenty of bandwidth).
Coherent caches are best in cases where you don't necessarily know your data and communication patterns in detail up front, but you often know general statistics about them. If you're forced to statically determine your memory hierarchy (i.e. where coherency, contention, etc. may lie), you're forced to code to the worst case, which is often bad. Caches give you the ability to handle the bad cases gracefully when they are rare, which is really valuable in a lot of algorithms; histogram generation, with some coherency in the input, is the poster child.
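To make the histogram example concrete, here's a minimal CUDA sketch of the usual approach on this kind of hardware (my own illustration with made-up names, not anything from NVidia's material): per-block sub-histograms in shared memory, merged into a global histogram with atomics at the end. When the input is coherent, the atomics pile onto a few hot bins cheaply; when it's nasty and scattered, it's the cache hierarchy that has to soak up the read-modify-write traffic gracefully, which is exactly the point.

Code:
// Hypothetical sketch (made-up names): per-block sub-histograms in shared
// memory, merged into the global histogram with atomics at the end.
#define NUM_BINS 256  // one bin per possible byte value

__global__ void histogram256(const unsigned char *data, int n,
                             unsigned int *globalHist)
{
    __shared__ unsigned int localHist[NUM_BINS];

    // Clear this block's private histogram.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        localHist[i] = 0;
    __syncthreads();

    // Accumulate into shared memory; coherent input means most of these
    // atomics land on a handful of hot bins.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localHist[data[i]], 1u);
    __syncthreads();

    // Merge the per-block result into the global histogram.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        atomicAdd(&globalHist[i], localHist[i]);
}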

While you can argue about value versus cost, caches clearly have value in a variety of algorithms, and not just for "lazy" programmers.
 
I just read Anand's piece. The fast double precision is phenomenal, and overall their ecosystem seems to be way ahead of ATI's. But the things that made my jaw drop were the C++ support, the memory addressing and the Visual Studio plugin: they're building a C++ compiler into a driver, providing a "sane" way of debugging, and having it all run on a 3-billion-transistor chip. That's definitely a great leap forward in ease of development!

Embedded compilers are of course nothing new, but your typical embedded compiler doesn't support try-catch either.
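For a flavour of what "C++ in the driver" has to swallow, here's a tiny hypothetical sketch of my own (made-up names): a templated type with operator overloading compiled straight into device code. I'm deliberately not assuming device-side try/catch here; the point is just that the front end is real C++, templates and all, rather than C with extensions.

Code:
// Hypothetical sketch of C++ in device code (made-up names): a templated
// struct with operator overloading, instantiated per type by the
// device-side compiler. No device-side try/catch is assumed here.
template <typename T>
struct Vec3 {
    T x, y, z;
    __device__ Vec3 operator+(const Vec3 &o) const {
        Vec3 r;
        r.x = x + o.x; r.y = y + o.y; r.z = z + o.z;
        return r;
    }
    __device__ T dot(const Vec3 &o) const {
        return x * o.x + y * o.y + z * o.z;
    }
};

template <typename T>
__global__ void dotOfSums(const Vec3<T> *a, const Vec3<T> *b, T *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (a[i] + b[i]).dot(b[i]);
}

A launch such as dotOfSums<float><<<blocks, threads>>>(a, b, out, n) instantiates all of it for float; change the type to double and the same source targets the DP units instead.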
 
Also, I read somewhere that ray tracing could be improved in this GPU. Do you see anything that could indicate this?
I see (at least) great improvements in ease of development, plus more flexible and more capable hardware.
So, in the context of the question: yes, ray tracing could be improved on this GPU :).
 
The reception has been very lukewarm at best. Anand is even hyping Larrabee as the "true" revolution. :LOL:
My gut reaction is that NVidia's built a slower version of Larrabee in about the same die size with no x86 and added ECC. It might be a bit smaller.

The split L1/shared memory is sensible. Trying to go forward without a cache per core would have been nuts. And plenty of compute kernels have no use for shared memory, so they've lessened the proportion of memory sitting there doing nothing.

I'm impressed. But I think NVidia is right on the edge of boom/bust with this. In theory NVidia will be ditching ARM from Tegra in a few years - this will do everything.

Oh and this would make a nice PS4 processor. Well, version 2.0 of it, anyway - as I don't think this chip is powerful enough.

Good question. The non-SM % of the die area looks much smaller compared to GT200.
The ALUs have ballooned, so even if there are TMUs, they'd look small in comparison.

But the key thing is that GF100 has way too much INT capability for graphics. In my view the only way to defray this is to make it an INT unit for compute and part of DP, as well as having it do texturing. It's pretty smart, I'd say. A lot of compute is address computation, which is yet another use for INT (the ALU).

I'm also dubious that GF100 has ROPs, for what it's worth. Again, those INT units would be idling a lot of the time if there were ROPs. Instead, the ALUs will rely upon L1<->L2 for the memory side of ROP operations (including fast atomics for basic Z operations), while the math is purely ALU/FPU based.
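To illustrate the split I mean between the "memory side" and the "math side" (a purely hypothetical sketch of my own, not anything NVidia has described): an SRC-over-DST blend on an RGBA8 target done entirely in shader code, with the destination read and write going through L1/L2. I'm hand-waving away ordering between overlapping fragments, which is exactly where the fast atomics or some dedicated sequencing would have to earn their keep.

Code:
// Hypothetical "ROP in software" sketch: SRC-over-DST alpha blend on an
// RGBA8 render target. The integer math runs on the shader ALUs; the
// destination read/modify/write goes through L1/L2. Fragment ordering
// between overlapping pixels is deliberately ignored here.
__global__ void blendOver(unsigned int *renderTarget, int width,
                          const unsigned int *srcColor,  // packed RGBA8, alpha in bits 24-31
                          const int *srcX, const int *srcY,
                          int numFragments)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= numFragments) return;

    unsigned int src = srcColor[f];
    unsigned int *dstPtr = &renderTarget[srcY[f] * width + srcX[f]];
    unsigned int dst = *dstPtr;               // memory side: read through L1/L2

    unsigned int alpha = (src >> 24) & 0xffu; // math side: plain INT ALU work
    unsigned int out = 0;
    for (int c = 0; c < 32; c += 8) {
        unsigned int s = (src >> c) & 0xffu;
        unsigned int d = (dst >> c) & 0xffu;
        unsigned int blended = (s * alpha + d * (255u - alpha)) / 255u;
        out |= (blended & 0xffu) << c;
    }
    *dstPtr = out;                            // memory side: write back
}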

So, if I'm right about the lack of TMUs and ROPs, the real question is: what's the throughput of the ALUs for traditional int8 texturing and blending? And how much of that overlaps with the pull-model interpolation in D3D11? They're all variations of interpolation.
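And by "variations of interpolation" I mean something like this (again just my own sketch, made-up names): bilinear filtering of an 8-bit texture is nothing but lerps of lerps, which is the same flavour of ALU work as pull-model attribute interpolation, only with different weights.

Code:
// Hypothetical sketch: bilinear filtering of an 8-bit single-channel
// "texture" in plain ALU code - two horizontal lerps, then one vertical.
__device__ float lerp1(float a, float b, float t)
{
    return a + t * (b - a);
}

__device__ float bilinearFetch(const unsigned char *tex, int texW, int texH,
                               float u, float v)  // u, v in texel coordinates
{
    int x0 = max((int)floorf(u), 0);
    int y0 = max((int)floorf(v), 0);
    int x1 = min(x0 + 1, texW - 1);
    int y1 = min(y0 + 1, texH - 1);
    float fu = u - floorf(u);
    float fv = v - floorf(v);

    float t00 = tex[y0 * texW + x0], t10 = tex[y0 * texW + x1];
    float t01 = tex[y1 * texW + x0], t11 = tex[y1 * texW + x1];

    return lerp1(lerp1(t00, t10, fu), lerp1(t01, t11, fu), fv);
}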

So, Intel ditched the rasteriser and kept the TUs, while NVidia ditched the TUs and kept the rasteriser?... (actually, is there still a rasteriser :p ?)

And who can write a decent driver/compiler? It's going to be an interesting race to see which of them shrugs off D3D rendering bugs and performance woes first.

Jawed
 
Oh and this would make a nice PS4 processor. Well, version 2.0 of it, anyway - as I don't think this chip is powerful enough.

Actually, the way they're going seems to be away from consoles, doesn't it? Less and less of this stuff is useful for consoles as it moves away from graphics towards HPC.

What console maker wants a chip this big just to maybe get graphics performance equivalent to a 5870?

Then again, I keep in mind that Nvidia has stressed in interviews that they're very interested in next-gen consoles, so I don't know. Maybe they'd do a custom graphics-oriented chip for a console, though that would seem to be a lot more R&D.

Out of all the console contenders, MS is definitely sitting prettiest with ATI at the moment, imo.
 
chavvdarrr said:
I was under the impression that AMD double precision is 2/5 of peak fp32, yet Anand claims it's 1/5. Is he right?
Yeah, it's 1/5 (really 1/4, as the fifth ALU simply isn't used), but anyway that's ~544 GFLOPS' worth.
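Rough numbers behind that, assuming the commonly quoted HD 5870 configuration of 1600 SP lanes (320 VLIW-5 units) at 850 MHz, with each unit doing one DP FMA per clock on four of its five lanes:

\[
\begin{aligned}
\text{SP peak:} &\quad 1600 \times 2 \times 0.85\ \text{GHz} = 2720\ \text{GFLOPS} \\
\text{DP peak:} &\quad 320 \times 2 \times 0.85\ \text{GHz} = 544\ \text{GFLOPS} = \tfrac{1}{5} \times 2720
\end{aligned}
\]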
 
Maybe they'd do a custom graphics-oriented chip for a console, though that would seem to be a lot more R&D.

Maybe Fermi isn't worse than even GT200...
OK, they advanced many non-game-related aspects, but who said they don't consider it to be competitive against ATI?
 
DemoCoder said:
NVidia may be following SGI's eventual decline, by losing the consumer/workstation market based on cost and targeting HPC, which is a niche.
I'm impressed. But I think NVidia is right on the edge of boom/bust with this.

NVidia PR claims that, compared to x86 HPC, NVidia is able to provide the same number of TFLOPS with 20-30 times (not percent) fewer machines, kilowatts of power and overall upfront cost.
And their GF100 can also produce ~100 FPS in games. That performance will suffer in comparison to next-gen consoles, but when are those coming out?

I think the stars are quite well aligned for NVidia - the PC gaming market is not VERY compute-bound currently, but general HPC is about to warm up to new approaches. If their driver can eat pure C++, then Larrabee having x86 support in hardware shouldn't scare them. On top of all that, GF100 derivatives are still mass-produced chips, so the laws of "economies of scale" still apply in their favour.

A bold direction and good progress, if NVidia manages to pull it off.
One of the finest pieces of PR work, if not ;).
 
Is it just me, or is nVidia running a really huge risk here? I can't imagine this having a better price/performance ratio than the HD 5xx0 line, so it really looks like they're trying to transition away from mainstream consumer graphics, or else they just couldn't adjust their plans fast enough to have anything else in this time frame.

But that puts them in a position where they're forced to cede even more of the consumer market to AMD, and Intel and Larrabee seem like they could directly compete with this sort of concept. It's like they're running toward the giant behemoth that is Intel while being nipped at the heels by an AMD with a far stronger bite than expected.

I can't believe that's an enviable position to be in at all.
 
NVidia PR claims that, compared to x86 HPC, NVidia is able to provide the same number of TFLOPS with 20-30 times (not percent) fewer machines, kilowatts of power and overall upfront cost.
Until Larrabee arrives. GF100's killer blow might be the ECC.

And their GF100 can also produce ~100 FPS in games. That performance will suffer in comparison to next-gen consoles, but when are those coming out?
It'll be interesting to see if this is competitive with whatever ATI card is available when it launches. Games are becoming ALU-bound, and on single-precision floating point this is way, way behind; on INT it's way ahead. On D3D compute it comes down to whether the memory system is dramatically better than ATI's. ATI's memory system is still old-school bits and pieces here and there, so it's a question of whether they add up at all. Bear in mind that D3D compute is a subset of CUDA 3.0, so not all CUDA 3.0 techniques are available through it, and so they can't all benefit GF100.
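Back-of-the-envelope on the SP gap, purely as a working assumption since clocks haven't been announced: take the rumoured 512 "CUDA cores" at roughly a 1.5 GHz hot clock, counting an FMA as 2 FLOPs:

\[
\begin{aligned}
\text{GF100 (assumed):} &\quad 512 \times 2 \times 1.5\ \text{GHz} \approx 1536\ \text{GFLOPS} \\
\text{HD 5870:} &\quad 1600 \times 2 \times 0.85\ \text{GHz} = 2720\ \text{GFLOPS}
\end{aligned}
\]

Paper peak, of course; utilisation of ATI's VLIW is another story.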

I think the stars are quite well aligned for NVidia - the PC gaming market is not VERY compute-bound currently, but general HPC is about to warm up to new approaches. If their driver can eat pure C++, then Larrabee having x86 support in hardware shouldn't scare them.
Agreed, to a degree. There's still a gulf between your current weather-forecasting supercomputer running on x86 now and building something Fermi-based that just "drops in".

On top of all that, GF100 derivatives are still mass-produced chips, so the laws of "economies of scale" still apply in their favour.
Only if they're fast enough that people want to game on them.

Jawed
 
New console chips are a few years off I'd say.

Jawed

Probably, yes, but I can't imagine that the console manufacturers didn't start negotiations a long time ago. They said no to Intel quite some time ago for a reason.
 
:LOL:

Damn, must make a plea to the planners and engineers to put a "cookie monster" in our ASICs! :D
 