NVIDIA Fermi: Architecture discussion

Intel demonstrated something LRB-like a long time ago. That's all smoke and mirrors and nothing more. Only real launches and sales matter. NV is clearly betting on a Nov-Dec launch right now. I (and you) have no idea if they'll be able to do it.

You made a statement, now stop sidestepping the request for a link.

Leon: When does it come out?
DegustatoR: November-December clearly.
Leon: Link for official announcement?

If you are going to be dogmatic and fill the pages of the thread with such claims, be prepared to put up or shut up!
 
You cannot just rip out a bunch of logic from the FPUs and achieve anything. You'd really need to redesign the FPUs from scratch and re-layout the execution resources to save power and area.
I thought it would be somewhat simpler because the FPUs appear to be running along the edges of each core, leaving the other hardware in the center.

It's not like Lego blocks where you can magically disconnect them, or even like a multi-core where you just lop off one core.
A large part of the core is agnostic to the capability of the FPUs, and Nvidia has a history of maintaining both DP and SP pipeline designs.
I did not mean to imply one can wave a wand over the DP pipeline and it magically becomes SP, just that this is but one component of the core and that the rest is very much unaware of FP capability.
Anything that isn't an ALU, decoder, or scheduler won't care (and in a naive implementation even the latter two barely need to care), and it looks like a fair amount of the rest of the core is kept isolated from the parts that do.


To do DP you need more bits for your operands (mantissa, exponent, etc.) and you have to store and play with them somewhere. That will be very close to the logic that does single precision (which is just a smaller mantissa and exponent).
Nvidia has undertaken the expense of designing both a double precision and single precision floating point pipeline before.
As you've said, one is an incremental evolution over the other.

Taking what amounts to a decrement for consumer hardware could have been planned for and budgeted into the development effort, and there exists the potential for a significant amount of reuse.
Since Nvidia has gone and applied full IEEE compliance to both SP and DP, there doesn't appear to be anything disjoint enough to make a core with just the FPUs replaced impractical.
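For reference, the only structural difference between the two formats is field width: binary32 is 1 sign + 8 exponent + 23 mantissa bits, binary64 is 1 + 11 + 52. A minimal C sketch of just the IEEE 754 layouts (nothing NVIDIA-specific) to make that concrete:

Code:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32: 1 sign + 8 exponent + 23 mantissa bits.
 * IEEE 754 binary64: 1 sign + 11 exponent + 52 mantissa bits.
 * The DP datapath is the same structure with wider fields (plus the
 * shifters/rounders to match). */
int main(void)
{
    float  f = -1.5f;
    double d = -1.5;
    uint32_t fb;
    uint64_t db;
    memcpy(&fb, &f, sizeof fb);
    memcpy(&db, &d, sizeof db);

    printf("SP: sign=%u exp=%3u (8 bits)  mant=0x%06x (23 bits)\n",
           (unsigned)(fb >> 31), (unsigned)((fb >> 23) & 0xFFu),
           (unsigned)(fb & 0x7FFFFFu));
    printf("DP: sign=%u exp=%4u (11 bits) mant=0x%013llx (52 bits)\n",
           (unsigned)(db >> 63), (unsigned)((db >> 52) & 0x7FFu),
           (unsigned long long)(db & 0xFFFFFFFFFFFFFull));
    return 0;
}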

Frankly at that point you need to significantly redesign almost the whole thing, the scheduler would be somewhat different, the dispatch as well, etc. etc.
Changing this seems handy, but I'd be curious what would make modifying the scheduler or dispatch either prohibitive or strictly necessary.
The current DP-capable issue hardware is fully capable of not running DP instructions, and if the SP-only hardware is not modified to change any of the SP instruction latencies, what would the scheduler notice?
That's not to say a front end that dispensed with the extra instructions entirely wouldn't probably be smaller.
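Purely as a toy illustration of that point (hypothetical opcode classes and latency numbers, not a claim about how NVIDIA's scoreboarding actually works): if issue logic keys off a per-class latency table, deleting the DP rows leaves the SP lookup path untouched.

Code:
#include <stdio.h>

/* Toy issue model with made-up opcode classes and latencies - just to show
 * that the SP entries and the lookup are structurally identical whether or
 * not the DP rows exist (compile with or without -DHAS_DP). */
enum op_class {
    OP_SP_MAD,
    OP_SP_MUL,
    OP_INT_ADD,
#ifdef HAS_DP
    OP_DP_FMA,          /* the only row an SP-only core would drop */
#endif
    OP_CLASS_COUNT
};

static const int issue_latency[OP_CLASS_COUNT] = {
    [OP_SP_MAD]  = 4,
    [OP_SP_MUL]  = 4,
    [OP_INT_ADD] = 4,
#ifdef HAS_DP
    [OP_DP_FMA]  = 8,
#endif
};

static int cycles_until_result(enum op_class op)
{
    return issue_latency[op];   /* identical code path either way */
}

int main(void)
{
    printf("SP MAD result ready after %d cycles\n",
           cycles_until_result(OP_SP_MAD));
    return 0;
}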

Stepping the hardware down a notch, within the same constraints the heftier and more complex multi-precision DP units already have to meet, would seem to impose a lesser burden.

You really cannot easily remove DP, any more than you could easily remove x87 from a CPU.
That's pulling in concerns outside of the engineering difficulty.
An x86 host processor that suddenly blows up on code that ran on older chips is worse than useless.

What a slave chip does behind a driver layer that obscures all or part of the internals is much less constrained (or lets hope this is the case for Larrabee).

It's pretty deeply integrated in there, and you'd need to redo your layout anyway to take advantage.

The layout of the core would possibly change, or at least some of it would need to change.
Given that this is a fraction of one component of a chip that would likely need global layout changes anyway for a reduced mass-market variant, is that necessarily prohibitive or unthinkable?
 
Are the Evergreen ALUs more efficient than RV7xx ALUs, then? Since HD4890, which is clocked higher than HD5770, is getting only 60FPS in perlin noise (source: bit-tech 5870 review)

He's saying that there are more than 800 SPs, plain and simple. ;)

Anyway, this is an argument for the R8xx discussion, not this one, so I think it's better to get back on topic.

EDIT: I misunderstood that. However, texture rates are also much higher on Juniper...
 
Are the Evergreen ALUs more efficient than RV7xx ALUs, then? Since HD4890, which is clocked higher than HD5770, is getting only 60FPS in perlin noise (source: bit-tech 5870 review)
5770 is 850MHz core.

5870: 158FPS
5770: 79FPS

Still believing the 14-SIMD tales?
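For what it's worth, the back-of-the-envelope arithmetic behind that (public clocks and lane counts, FPS figures as quoted in-thread; the assumption that the test scales linearly with SIMD count at equal clock is mine):

Code:
#include <stdio.h>

/* Peak MAD rate = lanes * 2 flops * clock (GHz) -> GFLOPS.
 * Clocks and lane counts are the public specs; FPS figures are the ones
 * quoted above; "FPS scales with SIMD count at equal clock" is an
 * assumption, not a measurement. */
int main(void)
{
    struct { const char *name; int lanes; double ghz; double fps; } gpu[] = {
        { "HD4890 (RV790)",   800,  0.850,  60.0 },
        { "HD5770 (Juniper)", 800,  0.850,  79.0 },
        { "HD5870 (Cypress)", 1600, 0.850, 158.0 },
    };
    for (int i = 0; i < 3; i++)
        printf("%-18s %4.0f GFLOPS peak, %5.1f FPS\n",
               gpu[i].name, gpu[i].lanes * 2 * gpu[i].ghz, gpu[i].fps);

    /* 158/79 = 2.0 exactly: consistent with 20 vs 10 SIMDs at 850MHz,
     * not with 14 SIMDs (which would mean ~1120 lanes, ~1.9 TFLOPS). */
    printf("5870/5770 FPS ratio = %.2f\n", 158.0 / 79.0);
    return 0;
}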
 
Keep the Fermi thread to Fermi, stop polluting it with Juniper/Cypress speculation (that is also....:| inducing).
 
They're doing fine and are waving hello....LOL

Seriously now I don't think anything regarding possible release dates is known yet.

How do you know they are doing fine? There are no tape-out rumours, not even any rumours about possible specs. Maybe they have not even taped out yet?
 
I gave up on Deg after he refused to take Nvidia's word that Fermi's late.

At this point in time, unless Nvidia can get enough quantities out before Black Friday, they will bleed market share.
 
Like RV770?


L2->L1 is untouched, and GDS->anything could be untouched (though GDS is 4x larger - and it may have additional features aimed specifically at compute), but everything else looks like it's scaled.

L2->L1 not scaling looks like a severe problem - and theoretically indicates that the next architecture has to be quite different in this respect. I'm interested to see how well prunedtree's matrix multiplication works on HD5870, to see if L2->L1 is causing a problem.
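A rough sketch of why a tuned matmul is a good probe for that link (generic cache-blocking arithmetic with hypothetical sizes, nothing specific to prunedtree's kernel): blocking buys roughly a tile-edge factor of reuse inside L1/LDS, and whatever traffic remains is exactly what L2->L1 has to carry, so a link that didn't grow with the ALU count shows up directly in achieved throughput.

Code:
#include <stdio.h>

/* Rough traffic model for C = A*B on NxN matrices, hypothetical sizes.
 * Naive: every multiply re-fetches its A and B element from beyond L1.
 * Blocked: each TxT tile is fetched once per tile-pass, so traffic from
 * the next level drops by ~T - and what remains is what L2->L1 carries. */
int main(void)
{
    const double N = 1024.0;   /* matrix dimension (hypothetical)      */
    const double T = 32.0;     /* tile edge that fits in L1/LDS (hyp.) */

    double naive_loads   = 2.0 * N * N * N;       /* A and B per multiply */
    double blocked_loads = 2.0 * N * N * N / T;   /* ~T-fold reuse        */

    printf("naive  : %.3g element loads from beyond L1\n", naive_loads);
    printf("blocked: %.3g element loads from beyond L1 (%.0fx less)\n",
           blocked_loads, naive_loads / blocked_loads);
    return 0;
}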


Nor how useful the prior architecture was for D3D11 - e.g. VS-HS-TS-DS-GS-PS may not work so well on it.

Still have to wait to see how well it works on ATI.

Jawed
No, not like RV770 but like the step from RV770 to RV870 which seems to me kind of a logical evolution where they've added DX11 (with all the necessary entanglements) and doubled the number of execution units but left the basic architecture unchanged. This is where Fermi differs - at least in my opinion.

But change has not always been for the better. What ended in a disaster in 2003 turned out pretty amazing in 2006. Now it's 2009 - time for a change or history repeating itself, go figure. :)
 
Are the Evergreen ALUs more efficient than RV7xx ALUs, then? Since HD4890, which is clocked higher than HD5770, is getting only 60FPS in perlin noise (source: bit-tech 5870 review)

Didn't Mighty Dave, who saved all and everything himself, say that Vantage's Perlin Noise would profit from shader-based interpolation, which is why HD 5870 achieved a >2.2x increase in that particular test?
 
Given the economy and the existing installed base, I'm skeptical any large swings in market share are even going to take place in the next 6-12 months. The real problem for NVidia is lost revenues and earnings. Looking at their fundamentals, it seems like they have enough cash to survive a few quarters of bleeding. ($1.7 billion cash, only $25M debt)

NVidia has the benefit that they built up a lot of developer mindshare with the G80 and CUDA, mindshare that will only slowly be eroded by OpenCL/DX11 and new offerings. The bulk of developers still have to target earlier class chips.

What I'm saying is, NVidia opened up an opportunity for AMD, and the new open standards will gradually displace CUDA, but 6 months is too short a time for anything dramatic to happen, unlike, say, the huge opportunity Intel yielded to AMD with the P4.

I think AMD will score many design wins in the OEM space, especially mid-range machines and notebooks, and that will continue for a while until NVidia gets a low-end part out. In the meantime, NVidia will be hoping that large margins in the high-end "big chip" area, the workstation market, and HPC pad out losses in the low-end and chipset markets. I think NVidia's new suite of developer tools will do much to continue attracting developers and to offer extra support.

In general, this round, AMD benefits, but the doom-and-gloom is clearly misplaced. In hindsight, I think Fermi, if performance works out, was the right bet. Take the pain pill early before LRB arrives. They didn't have much choice in the matter. It's just unfortunate that it had to occur during an economic downturn that drives consumers to shop for big bargains.
 
No, not like RV770 but like the step from RV770 to RV870 which seems to me kind of a logical evolution where they've added DX11 (with all the necessary entanglements) and doubled the number of execution units but left the basic architecture unchanged. This is where Fermi differs - at least in my opinion.
No doubt Fermi differs, but NVidia hasn't merely built Fermi for D3D11, whereas at best AMD seems to have tweaked a few things beyond DirectCompute and OpenCL 1.0 but we'll prolly never see them described properly, let alone used.

More interesting is whether for ~2B transistors on 40nm R800 is good enough at D3D11. We could be looking at an R580 type part, the last of its era before a new architecture arrives. R580 seemed alright, but the next iteration showed major increases in per-unit/per-clock capabilities etc.

Actually you get into a fiddlesome question of what defines an architecture.

At least Fermi looks categorically like a new architecture. Even if the TMUs and ROPs haven't changed, the rest has changed enough...

Jawed
 
There could be a renaming going on this time (e.g. GTX260 -> GTS330?)

The GTX/GTS/GT scheme is gone, if I remember correctly and nVidia hasn't changed their naming again.

Anything with a G200b can go GeForce 3xx, but can they do that without losing loads and loads of money, given that they have to price it against Juniper?
 
The GTX/GTS/GT scheme is gone, if I remember correctly and nVidia hasn't changed their naming again.

Anything with a G200b can go GeForce 3xx, but can they do that without losing loads and loads of money, given that they have to price it against Juniper?

IIRC, with the latest news on the renaming scheme, some of the GT21x parts will get 3xx names
edit:
NVIDIA_DEV.0CAF.01 = "NVIDIA N11P-GS1" = "NVIDIA GeForce GT 335M" = GT216
That was used as an example somewhere
 
IIRC, with the latest news on the renaming scheme, some of the GT21x parts will get 3xx names
edit:
NVIDIA_DEV.0CAF.01 = "NVIDIA N11P-GS1" = "NVIDIA GeForce GT 335M" = GT216
That was used as an example somewhere

Nothing truly new, as that is relegated to mobile parts... something both sides have done in the past.
 