NVIDIA Fermi: Architecture discussion

I don't know if it's possible to disable only the ROPs (to which the bus is linked). I think that to cut down a chip you have to disable entire blocks (ALUs, TMUs, ROPs).
Thus, F A K E. :LOL:

It's been possible since the G80: there is no link between the ROPs/memory interface and the clusters/SMs.
And they did the same with the GTX 275 and GTX 295.
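
As a back-of-envelope illustration (a sketch using only the well-known retail GT200 configs, nothing from the leaked list): bus width tracks the enabled ROP/MC partitions, independently of how many shader clusters are live.

Code:
# Sketch of the public GT200 lineup: ROP/MC partitions are disabled
# independently of the shader clusters (TPCs).
PARTITION_BUS_BITS = 64   # each ROP partition pairs with a 64-bit MC
SPS_PER_TPC = 24          # 3 SMs x 8 SPs per TPC on GT200

configs = {
    "GTX 285":     {"tpcs": 10, "partitions": 8},
    "GTX 275":     {"tpcs": 10, "partitions": 7},  # all TPCs, one MC cut
    "GTX 260-216": {"tpcs": 9,  "partitions": 7},
}

for name, c in configs.items():
    sps = c["tpcs"] * SPS_PER_TPC
    bus = c["partitions"] * PARTITION_BUS_BITS
    print(f"{name}: {sps} SPs, {bus}-bit bus")
# GTX 285: 240 SPs, 512-bit bus
# GTX 275: 240 SPs, 448-bit bus  <- full shader count, one partition off
# GTX 260-216: 216 SPs, 448-bit bus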
 
My fault. Anyway, there are other specs wrong in that list:

Shared L2 cache? No way, it's on-chip.
A dual-core GPU design? Come on, no one still believes that...
Fermi should be doing texture filtering through the ALUs (though I'd ask Rys about this), so I'm not so sure Fermi has "128 TFUs"...
 
Wrong, Fermi has independent texture mapping units, probably 128 of them; you can read Rys's piece at TechReport.
 
He's adding all the specifications together into one. AMD did the same with the 5970.
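
For example (a sketch using the public HD 5870 figures, not the numbers from that list), summing the per-GPU specs of a dual-GPU board gives exactly this kind of "combined" spec sheet:

Code:
# Sketch: quoting a dual-GPU board's specs as one big GPU.
# HD 5870 (Cypress) per-GPU figures; the HD 5970 carries two of them.
per_gpu = {"SPs": 1600, "TMUs": 80, "ROPs": 32, "bus_bits": 256}

combined = {k: 2 * v for k, v in per_gpu.items()}
print(combined)
# {'SPs': 3200, 'TMUs': 160, 'ROPs': 64, 'bus_bits': 512}
# Note the "512-bit bus" is really two independent 256-bit buses.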
 
He is counting all the specifications together. nVidia did a press event for GF100, so it's possible that the guy got the information directly from nVidia.
Except for the fact that there are obvious inconsistencies, like the shared L2 and the dual-core GPU?

Yeah, it could well be "dual core" by dedicating a third of the memory bus to inter-GPU communication, assuming the address bus is R/W, but the L2 wouldn't be shared that way, and that wouldn't fit the other "specs".
 
Maybe you should look at this: http://www.freepatentsonline.com/7616206.pdf There is a description of an efficient private bus that uses a free MC to create a fast connection. There are also some interesting methods of tile-like interleaving of render targets, using the cache to help hide the added latency. There are some more new patents about this link, but I'm too lazy to search for them :smile:
Ugh, it's nice that NVIDIA is thinking of doing it ... but it really really doesn't deserve a patent.
 
The theory is OK, but I hope you're not assuming they would use just one MC to communicate between the GPUs...

With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply quite high latency.

Even with a third of the bus dedicated to inter-GPU communication it would still be quite bad, and that would give a composite 512-bit bus, which is not in line with the "specs" given.
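
To put rough numbers on it (a sketch only; the 384-bit per-GPU bus and the 4 Gbps/pin GDDR5 are my assumptions, nothing confirmed):

Code:
# Sketch: dedicating 1/3 of each GPU's bus to an inter-GPU link.
# All figures hypothetical: 384-bit bus per GPU, GDDR5 at 4 Gbps/pin.
BUS_BITS = 384
GBPS_PER_PIN = 4.0

local_bits = BUS_BITS * 2 // 3   # 256 bits left for local memory
link_bits = BUS_BITS // 3        # 128 bits for the inter-GPU link

local_bw = local_bits * GBPS_PER_PIN / 8   # GB/s to local memory
link_bw = link_bits * GBPS_PER_PIN / 8     # GB/s between the GPUs

print(f"local: {local_bw} GB/s, inter-GPU: {link_bw} GB/s")
print(f"composite bus: 2 x {local_bits} = {2 * local_bits}-bit")
# The NUMA-like scheme wants link_bw == local_bw for remote accesses
# to be "free"; here the link gets only half the local bandwidth, and
# the composite bus comes out 512-bit, not matching the leaked "specs".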
 
One MC could possibly be enough for some types of SLI rendering. In the described method only the render-target data is interleaved; geometry etc. is duplicated in each GPU's local memory as usual. But I'm not sure it applies nowadays, because the patent refers to caches of 3 MB or more to hide the latency.
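
A minimal sketch of that interleaving idea as I read the patent (the checkerboard split and the 16-pixel tile size are my own made-up illustration, not from the patent text):

Code:
# Sketch: checkerboard interleaving of a render target across two GPUs.
TILE = 16  # pixels per tile edge (hypothetical)

def owning_gpu(x: int, y: int) -> int:
    """Which GPU's local memory holds the tile containing pixel (x, y)."""
    tx, ty = x // TILE, y // TILE
    return (tx + ty) % 2  # checkerboard: neighbouring tiles alternate

# A GPU shading a pixel checks ownership; a miss goes over the private
# inter-GPU link, with the large cache hiding the added latency.
print(owning_gpu(100, 40))   # 0 -> local to GPU 0
print(owning_gpu(116, 40))   # 1 -> remote for GPU 0, fetched via the link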
 
With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Not that it isn't pure fantasy, but why would inter-GPU bandwidth need to be equal when the load on that path would be far lower than GPU<->memory? I'm not sure what purpose it would serve anyway; didn't both AMD and Nvidia claim that their current proprietary links have sufficient bandwidth for their purposes?
 
CES is over in a few hours and we know nothing really new. I guess that Rahja dude was full of shit?

"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass new info on GF100. Chris said that we will get this information "very soon" (so presumably sometime within the next few days, or at least soon this month, based on his wording).
 
"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass new info on GF100. Chris said that we will get this information "very soon" (so presumably sometime within the next few days, or at least soon this month, based on his wording).

Waiting for Tuesday then; hopefully not another letdown like the "wait for CES!" one.
 
Anand on Fermi

AnandTech finally says something on Fermi @ CES

The demo also used NVIDIA's stereoscopic 3D technology - 3D Vision. We're hearing that the rumors of a March release are accurate, but despite the delay Fermi is supposed to be very competitive (at least 20% faster than 5870?). The GeForce GTX 265 and 275 will stick around for the first half of the year as Fermi isn't expected to reach such low price/high volume at the start of its life.

I think GTX 265 is a typo for GTX 285.

http://www.anandtech.com/tradeshows/showdoc.aspx?i=3719&p=3
 