NVIDIA Fermi: Architecture discussion

Sontin · Jan 10, 2010

A.L.M. said:
I don't know if it's possible to disable only the ROPs (to which the memory bus is linked). I think that to cut down a chip you have to disable entire blocks (ALUs, TMUs, ROPs).
Thus, F A K E.

It's been possible since G80. There is no link between the ROPs/memory interface and the clusters/SMs.
And they did the same with the GTX 275 and GTX 295.

A.L.M. · Jan 10, 2010

Sontin said:
It's been possible since G80. There is no link between the ROPs/memory interface and the clusters/SMs.
And they did the same with the GTX 275 and GTX 295.

My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual-core GPU design? Come on, nobody believes that anymore...
Fermi should be doing texture filtering through the ALUs (but I'd defer to Rys on this), so I'm not that sure Fermi has "128TFU"...

DavidGraham · Jan 10, 2010

A.L.M. said:
My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual-core GPU design? Come on, nobody believes that anymore...
Fermi should be doing texture filtering through the ALUs (but I'd defer to Rys on this), so I'm not that sure Fermi has "128TFU"...

Wrong. Fermi has independent texture mapping units, probably 128. You can read Rys's piece at The Tech Report.

Sontin · Jan 10, 2010

A.L.M. said:
My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual-core GPU design? Come on, nobody believes that anymore...
Fermi should be doing texture filtering through the ALUs (but I'd defer to Rys on this), so I'm not that sure Fermi has "128TFU"...

He's adding the specifications of both GPUs together into one figure. AMD did the same with the 5970.

A.L.M. · Jan 10, 2010

DavidGraham said:
Wrong. Fermi has independent texture mapping units, probably 128. You can read Rys's piece at The Tech Report.

TMUs contain TAUs (texture address units) and TFUs (texture filtering units). There should certainly be 128 TAUs, but I'm not sure there are 128 TFUs.

Sontin said:
He's adding the specifications of both GPUs together into one figure. AMD did the same with the 5970.

Nothing to say about the "shared L2" and dual core gpu?

DavidGraham · Jan 10, 2010

A.L.M. said:
TMUs contain TAUs (texture address units) and TFUs (texture filtering units). There should certainly be 128 TAUs, but I'm not sure there are 128 TFUs.

Yes, but since GT200 the number of TFUs has been equal to the number of TAUs; that's the reason for GT200's tremendous brute-force texturing power.

Sontin · Jan 10, 2010

A.L.M. said:
Nothing to say about the "shared L2" and dual core gpu?

He is adding up the specifications of both GPUs. nVidia held a press event for GF100, so it's possible the guy got the information directly from nVidia.

jimmyjames123 · Jan 10, 2010

http://www.overclock.net/nvidia/641156-guru-legit-nvidia-specs-benchmarks.html

Mod edit: That sums up the post nicely, thanks. Copy-and-pasting someone else's content is a bit of a faux pas, and when it's lengthy, noisy content, it's even more improper. Thanks.

PSU-failure · Jan 10, 2010

Sontin said:
He is adding up the specifications of both GPUs. nVidia held a press event for GF100, so it's possible the guy got the information directly from nVidia.

Except for the fact that there are obvious inconsistencies, like the shared L2 and the dual-core GPU?

Sure, it could conceivably be "dual core" by dedicating a third of the memory bus to inter-GPU communication, assuming the address bus is R/W, but the L2 wouldn't be shared that way, and that wouldn't square with the other "specs".

OlegSH · Jan 10, 2010

A.L.M. said:
Nothing to say about the "shared L2" and dual core gpu?

Maybe you should look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC (memory controller) to create a fast interconnect. There are also some interesting methods for tile-like interleaving of render targets, using cache to help hide the added latency. There are some newer patents about this link too, but I'm too lazy to search for them :smile:

MfA · Jan 10, 2010

OlegSH said:
Maybe you should look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC (memory controller) to create a fast interconnect. There are also some interesting methods for tile-like interleaving of render targets, using cache to help hide the added latency. There are some newer patents about this link too, but I'm too lazy to search for them :smile:

Ugh, it's nice that NVIDIA is thinking of doing it ... but it really really doesn't deserve a patent.

PSU-failure · Jan 10, 2010

OlegSH said:
Maybe you should look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC (memory controller) to create a fast interconnect. There are also some interesting methods for tile-like interleaving of render targets, using cache to help hide the added latency. There are some newer patents about this link too, but I'm too lazy to search for them :smile:

The theory is ok, but I hope you're not assuming they would use just 1 MC to communicate between GPUs...

With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Even with 1/3 of the bus dedicated to inter-GPU comm, it would still be quite bad, and it would give a composite 512-bit bus, which is not in line with the "specs" given.
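To make the bus arithmetic above concrete, here is a minimal sketch. It assumes each GPU has a 384-bit memory interface (GF100's announced width; the post itself doesn't state it) and that exactly a third of that interface is handed over to the inter-GPU link. The helper `split_bus` is hypothetical and only does the bit-width bookkeeping; it says nothing about latency or achievable bandwidth.

```python
from fractions import Fraction


def split_bus(per_gpu_bits: int, link_fraction: Fraction, gpus: int = 2) -> tuple[int, int, int]:
    """Hypothetical helper: split each GPU's memory bus between local DRAM
    and an inter-GPU link. Returns (local bits per GPU, link bits per GPU,
    composite local bits summed across all GPUs)."""
    link_bits = per_gpu_bits * link_fraction
    # The link has to take a whole number of bus bits.
    if link_bits.denominator != 1:
        raise ValueError("link fraction must divide the bus width evenly")
    link_bits = int(link_bits)
    local_bits = per_gpu_bits - link_bits
    return local_bits, link_bits, local_bits * gpus


# Assumed 384-bit bus per GPU, a third of it given to the link, two GPUs.
local, link, composite = split_bus(384, Fraction(1, 3))
print(local, link, composite)
```

Under those assumptions each GPU keeps 256 bits for local memory and spends 128 bits on the link, so the two local halves add up to the composite 512-bit figure mentioned above.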

Sontin · Jan 10, 2010

Really, what's the problem? He is counting the two L2 caches together. Yeah, it's not right, but AMD did the same with Hemlock: http://forum.beyond3d.com/showpost.php?p=1359262&postcount=4698

Ninjaprime · Jan 10, 2010

CES is over in a few hours and we know nothing really new. I guess that Rahja dude was full of shit?

OlegSH · Jan 10, 2010

PSU-failure said:
The theory is ok, but I hope you're not assuming they would use just 1 MC to communicate between GPUs...

With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Even with 1/3 of the bus dedicated to inter-GPU comm, it would still be quite bad, and it would give a composite 512-bit bus, which is not in line with the "specs" given.

One MC could possibly be enough for some types of SLI rendering. In the described method only render-target data is interleaved; geometry etc. is, as usual, duplicated in each GPU's local memory. But I'm not sure it applies to current hardware, because the patent refers to caches of 3Mb or more to hide latency.

trinibwoy · Jan 10, 2010

PSU-failure said:
With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Not that it isn't pure fantasy but why would inter-GPU bandwidth need to be equal when the load on that path would be far lower than GPU<->Mem? I'm not sure what purpose it would serve anyway, didn't both AMD and Nvidia claim that their current proprietary links have sufficient bandwidth for their purposes?

jimmyjames123 · Jan 10, 2010

Ninjaprime said:
CES is over in a few hours and we know nothing really new. I guess that Rahja dude was full of shit?

"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass on new info about GF100. Chris said that we will get this information "very soon" (so presumably within the next few days, or at least sometime this month, based on his wording).

Ninjaprime · Jan 10, 2010

jimmyjames123 said:
"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass on new info about GF100. Chris said that we will get this information "very soon" (so presumably within the next few days, or at least sometime this month, based on his wording).

Waiting for Tuesday then; hopefully not another letdown like the "wait for CES!" one.

hatter · Jan 10, 2010

Anand on Fermi

AnandTech finally says something on Fermi @ CES

The demo also used NVIDIA's stereoscopic 3D technology - 3D Vision. We're hearing that the rumors of a March release are accurate, but despite the delay Fermi is supposed to be very competitive (at least 20% faster than 5870?). The GeForce GTX 265 and 275 will stick around for the first half of the year as Fermi isn't expected to reach such low price/high volume at the start of its life.

I think "GTX 265" is a typo for GTX 285.

http://www.anandtech.com/tradeshows/showdoc.aspx?i=3719&p=3

Sontin · Jan 10, 2010

hatter said:
AnandTech finally says something on Fermi @ CES

I think "GTX 265" is a typo for GTX 285.

http://www.anandtech.com/tradeshows/showdoc.aspx?i=3719&p=3

And AnandTech said in October that GT200 is EOL.
