10-Jan-2010, 11:03   #3301
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Quote:
Originally Posted by A.L.M. View Post
Don't know if it's possible to disable only the ROPs (to which the bus is linked). I think that in order to castrate a chip you should disable entire blocks (ALUs, TMUs, ROPs).
Thus, F A K E.
It's been possible since G80. There is no link between the ROPs/memory interface and the clusters/SMs.
And they did the same with the GTX 275 and GTX 295.

Last edited by Sontin; 10-Jan-2010 at 11:44.
10-Jan-2010, 11:15   #3302
A.L.M.
Member
 
Join Date: Jun 2008
Location: Looking for a place to call home
Posts: 144

Quote:
Originally Posted by Sontin View Post
It's been possible since G80. There is no link between the ROPs/memory interface and the clusters/SMs.
And they did the same with the GTX 275 and GTX 295.
My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual core design gpu? Come on, no one still believes that...
Fermi should be doing texture filtering through the ALUs (but I'd ask Rys to confirm this), so I'm not that sure that Fermi has "128TFU"...
10-Jan-2010, 11:23   #3303
DavidGraham
Member
 
Join Date: Dec 2009
Posts: 581

Quote:
Originally Posted by A.L.M. View Post
My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual core design gpu? Come on, no one still believes that...
Fermi should be doing texture filtering through the ALUs (but I'd ask Rys to confirm this), so I'm not that sure that Fermi has "128TFU"...
Wrong, Fermi has independent texture mapping units, probably 128; you can read Rys's piece at The Tech Report.
10-Jan-2010, 11:24   #3304
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Quote:
Originally Posted by A.L.M. View Post
My fault. Anyway, there are other specs wrong in that list:

L2 cache shared? No way, it's on chip.
Dual core design gpu? Come on, no one still believes that...
Fermi should be doing texture filtering through the ALUs (but I'd ask Rys to confirm this), so I'm not that sure that Fermi has "128TFU"...
He is adding the specs of both GPUs together. AMD did the same with the 5970.
10-Jan-2010, 11:34   #3305
A.L.M.
Member
 
Join Date: Jun 2008
Location: Looking for a place to call home
Posts: 144

Quote:
Originally Posted by DavidGraham View Post
Wrong, Fermi has independent texture mapping units, probably 128; you can read Rys's piece at The Tech Report.
TMUs consist of TAUs and TFUs. There should certainly be 128 TAUs, but I'm not sure there are 128 TFUs.

Quote:
Originally Posted by Sontin View Post
He is adding the specs of both GPUs together. AMD did the same with the 5970.
Nothing to say about the "shared L2" and dual core gpu?
10-Jan-2010, 11:42   #3306
DavidGraham
Member
 
Join Date: Dec 2009
Posts: 581

Quote:
Originally Posted by A.L.M. View Post
TMUs consist of TAUs and TFUs. There should certainly be 128 TAUs, but I'm not sure there are 128 TFUs.
Yes, they do, but since GT200 the number of TFUs has been equal to the number of TAUs; that is the reason for GT200's tremendous brute-force texturing.
10-Jan-2010, 11:46   #3307
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Quote:
Originally Posted by A.L.M. View Post
Nothing to say about the "shared L2" and dual core gpu?
He is counting all specifications together. nVidia did a press event for GF100, so it's possible that the guy got the information directly from nVidia.
10-Jan-2010, 11:57   #3308
jimmyjames123
Member
 
Join Date: Apr 2004
Posts: 810

http://www.overclock.net/nvidia/6411...enchmarks.html

Mod edit: That sums up the post nicely, thanks. Copy&pasting someone else's content is a bit of a faux pas, and when it's lengthy noisy content, it's even more improper. Thanks.

Last edited by jimmyjames123; 10-Jan-2010 at 13:02.
10-Jan-2010, 12:01   #3309
PSU-failure
Member
 
Join Date: May 2007
Posts: 249

Quote:
Originally Posted by Sontin View Post
He is counting all specifications together. nVidia did a press event for GF100, so it's possible that the guy got the information directly from nVidia.
Except that there are obvious inconsistencies, like the shared L2 and the dual-core GPU?

Yeah, it could arguably be "dual core" by dedicating 1/3 of the memory bus to inter-GPU communication, assuming the address bus is R/W, but L2 wouldn't be shared that way, and that wouldn't fit the other "specs".
10-Jan-2010, 12:04   #3310
OlegSH
Member
 
Join Date: Jan 2010
Posts: 117

Quote:
Originally Posted by A.L.M. View Post
Nothing to say about the "shared L2" and dual core gpu?
Maybe you need to look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC to create a fast connection. There are also some interesting methods for tile-like interleaving of render targets, using caches to help hide the added latency. There are some more new patents about this link, but I'm too lazy to search for them.
10-Jan-2010, 12:11   #3311
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,219

Quote:
Originally Posted by OlegSH View Post
Maybe you need to look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC to create a fast connection. There are also some interesting methods for tile-like interleaving of render targets, using caches to help hide the added latency. There are some more new patents about this link, but I'm too lazy to search for them.
Ugh, it's nice that NVIDIA is thinking of doing it ... but it really really doesn't deserve a patent.
10-Jan-2010, 12:17   #3312
PSU-failure
Member
 
Join Date: May 2007
Posts: 249

Quote:
Originally Posted by OlegSH View Post
Maybe you need to look at this: http://www.freepatentsonline.com/7616206.pdf It describes an efficient private bus that uses a free MC to create a fast connection. There are also some interesting methods for tile-like interleaving of render targets, using caches to help hide the added latency. There are some more new patents about this link, but I'm too lazy to search for them.
The theory is ok, but I hope you're not assuming they would use just 1 MC to communicate between GPUs...

With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Even with 1/3 of the bus dedicated to inter-GPU comm, it would still be quite bad and that would give a composite 512bit bus, which is not in line with the "specs" given.
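The bus-width arithmetic behind the "composite 512bit bus" remark can be sketched as below. This assumes each GF100 has a 384-bit interface built from six 64-bit memory controllers (not stated in the thread) and that the inter-GPU link is carved out of each GPU's own bus; `split_bus` is a hypothetical helper for illustration only.

```python
# Sketch: local memory bus left per GPU when some of each GPU's memory
# controllers are repurposed as an inter-GPU link.
# Assumption: 384-bit GF100 bus = six 64-bit memory controllers (MCs).
MC_WIDTH_BITS = 64
MCS_PER_GPU = 6


def split_bus(link_mcs: int, gpus: int = 2) -> dict:
    """Hypothetical helper: widths in bits when `link_mcs` MCs per GPU
    are dedicated to the inter-GPU link."""
    if not 0 <= link_mcs <= MCS_PER_GPU:
        raise ValueError("link_mcs must be between 0 and 6")
    total = MC_WIDTH_BITS * MCS_PER_GPU      # full bus per GPU
    link = MC_WIDTH_BITS * link_mcs          # bits given to the link
    local = total - link                     # bits still wired to local DRAM
    return {
        "total_per_gpu": total,
        "link_per_gpu": link,
        "local_per_gpu": local,
        "composite_local": local * gpus,     # what a spec sheet would "add up"
    }


if __name__ == "__main__":
    for mcs in (1, 2):
        r = split_bus(mcs)
        print(f"{r['link_per_gpu']}-bit link per GPU -> "
              f"{r['local_per_gpu']}-bit local each, "
              f"{r['composite_local']}-bit composite")
```

Under these assumptions, giving up 1/3 of the bus (two controllers, 128 bits) leaves 256 bits per GPU and the 512-bit composite PSU-failure describes, while a single 64-bit link controller would leave 320 bits per GPU, i.e. the "640bit (2x320bit)" figure in the rumoured GTX 395 list.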
10-Jan-2010, 12:20   #3313
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Really, what's the problem? He is counting the two L2 caches together. Yeah, it's not right, but AMD did the same with Hemlock: http://forum.beyond3d.com/showpost.p...postcount=4698
10-Jan-2010, 12:25   #3314
Ninjaprime
Member
 
Join Date: Jun 2008
Posts: 335

CES is over in a few hours and we know nothing really new. I guess that Rahja dude was full of shit?
10-Jan-2010, 12:29   #3315
OlegSH
Member
 
Join Date: Jan 2010
Posts: 117

Quote:
Originally Posted by PSU-failure View Post
The theory is ok, but I hope you're not assuming they would use just 1 MC to communicate between GPUs...

With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.

Even with 1/3 of the bus dedicated to inter-GPU comm, it would still be quite bad and that would give a composite 512bit bus, which is not in line with the "specs" given.
One MC could possibly be enough for some types of SLI rendering. In the described method only render target (RT) information is interleaved; geometry etc. is duplicated in each GPU's local memory as usual. But I'm not sure it applies to today's hardware, because the patent refers to caches of 3Mb or more to hide latency.
10-Jan-2010, 12:41   #3316
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809

Quote:
Originally Posted by PSU-failure View Post
With this "NUMA-like" approach, inter-GPU bandwidth must be equal to local memory bandwidth to achieve optimal efficiency, and that would still imply a quite high latency.
Not that it isn't pure fantasy but why would inter-GPU bandwidth need to be equal when the load on that path would be far lower than GPU<->Mem? I'm not sure what purpose it would serve anyway, didn't both AMD and Nvidia claim that their current proprietary links have sufficient bandwidth for their purposes?
__________________
What the deuce!?
10-Jan-2010, 12:45   #3317
jimmyjames123
Member
 
Join Date: Apr 2004
Posts: 810

Quote:
Originally Posted by Ninjaprime View Post
CES is over in a few hours and we know nothing really new. I guess that Rahja dude was full of shit?
"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass on new info about GF100. Chris said that we will get this information "very soon" (so presumably sometime within the next few days, or at least sometime this month, based on his wording).
10-Jan-2010, 12:52   #3318
Ninjaprime
Member
 
Join Date: Jun 2008
Posts: 335

Quote:
Originally Posted by jimmyjames123 View Post
"Rahja" said that more info would be available after the "12th". According to Chris Ray, an NDA is expiring on that day, or at least changing to some extent, so that they will be allowed to pass on new info about GF100. Chris said that we will get this information "very soon" (so presumably sometime within the next few days, or at least sometime this month, based on his wording).
Waiting for Tuesday then, hopefully not another letdown like the "wait for CES!" one.
10-Jan-2010, 12:55   #3319
hatter
Junior Member
 
Join Date: Dec 2009
Posts: 31
Anand on Fermi

AnandTech finally says something on Fermi @ CES

Quote:
The demo also used NVIDIA's stereoscopic 3D technology - 3D Vision. We're hearing that the rumors of a March release are accurate, but despite the delay Fermi is supposed to be very competitive (at least 20% faster than 5870?). The GeForce GTX 265 and 275 will stick around for the first half of the year as Fermi isn't expected to reach such low price/high volume at the start of its life.
I think GTX265 is a typo for GTX285

http://www.anandtech.com/tradeshows/...spx?i=3719&p=3
10-Jan-2010, 13:07   #3320
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Quote:
Originally Posted by hatter View Post
AnandTech finally says something on Fermi @ CES

I think GTX265 is a typo for GTX285

http://www.anandtech.com/tradeshows/...spx?i=3719&p=3
And AnandTech said in October that GT200 is EOL.
10-Jan-2010, 13:26   #3321
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,070

Quote:
Originally Posted by jimmyjames123 View Post
Take this with a grain of salt, but did anyone notice the Anonymous post on BSON with GF104 specs:

NVIDIA GeForce GTX 395 Specs :

- Codename "GF104".
- Dual Core GPU Design (Two GF100 "Fermi" Cores).
- 6.4 Billion Transistors In Total (TSMC 40nm Process).
- 32 Streaming Multiprocessors (SM).
- Each SM has 2x16-wide groups of Scalar ALUs (IEEE754-2008; FP32 and FP64 FMA).
- The 32 SMs Have 1536KB Shared L2 Cache.
- 1024 Stream Processors (1-way Scalar ALUs) at 1350MHz.
- 1024 ALUs In Total.
- 1024 FP32 FMA Ops/Clock.
- 512 FP64 FMA Ops/Clock.
- Single Precision (SP; FP32) FMA Rate : 2.76 Tflops.
- Double Precision (DP; FP64) FMA Rate : 1.38 Tflops.
- 256 Texture Address Units (TA).
- 256 Texture Filtering Units (TF).
- INT8 Bilinear Texel Rate : 153.6 Gtexels/s
- FP16 Bilinear Texel Rate : 76.8 Gtexels/s
- 80 Raster Operation Units (ROPs).
- ROP Rate : 48 Gpixels/s
- 600MHz Core.
- 640bit (2x320bit) Memory Subsystem.
- 4200 MHz Memory Clock.
- 336 GB/s Memory Bandwidth.
- 2560MB (1280MB Effective) GDDR5 Memory.
- New Cooling Design.
- High Power Consumption.

GTX395 is 60% faster than GTX380
GTX395 is 70% faster than HD5970
Release Date : May 2010, Price : 499-549 USD.
Pros:
Two-way coherent caches using 64 bits of the memory bus look like a BIG improvement. Otherwise, it seems rather sensible, if clocked a bit lower than expected. If some SMs had been fused off, it could have been quite believable.

Cons:
A downclocked, castrated, cherrypicked GF100 hits 225W. So power is a big question here.
Can't shake off the feeling that this thing was derived/made up by applying the 285->295 formula.
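Several of the quoted figures are simple products of unit counts and clocks, which fits the suspicion that the list was generated from a formula. A quick consistency check, assuming peak rate = units × ops per clock × clock, an FMA counted as 2 flops, FP16 bilinear at half the INT8 rate, and the "4200 MHz" memory clock read as an effective 4.2 Gbps per pin (all assumptions, not stated in the list):

```python
# Recompute the derived rates in the rumoured "GTX 395" list from the
# unit counts and clocks it gives. Assumptions: FMA = 2 flops,
# FP16 bilinear = half the INT8 rate, "4200 MHz" memory = 4.2 Gbps/pin.
SHADER_GHZ = 1.35        # "1024 Stream Processors ... at 1350MHz"
CORE_GHZ = 0.600         # "600MHz Core"
MEM_GBPS_PER_PIN = 4.2   # "4200 MHz Memory Clock" as effective data rate


def derived_rates() -> dict:
    fp32_fma_per_clk = 1024   # "1024 FP32 FMA Ops/Clock"
    fp64_fma_per_clk = 512    # "512 FP64 FMA Ops/Clock"
    tf_units = 256            # "256 Texture Filtering Units"
    rops = 80                 # "80 Raster Operation Units"
    bus_bits = 640            # "640bit (2x320bit)"
    return {
        "fp32_tflops": fp32_fma_per_clk * 2 * SHADER_GHZ / 1000,
        "fp64_tflops": fp64_fma_per_clk * 2 * SHADER_GHZ / 1000,
        "int8_gtexels": tf_units * CORE_GHZ,
        "fp16_gtexels": tf_units * CORE_GHZ / 2,
        "rop_gpixels": rops * CORE_GHZ,
        "bandwidth_gbs": bus_bits / 8 * MEM_GBPS_PER_PIN,  # bytes/clk * rate
    }


if __name__ == "__main__":
    for name, value in derived_rates().items():
        print(f"{name:>14}: {value:.2f}")
```

Every derived figure (2.76 and 1.38 Tflops, 153.6 and 76.8 Gtexels/s, 48 Gpixels/s, 336 GB/s) falls out of the list's own inputs, so it is at least internally consistent; that says nothing about whether the underlying unit counts are real.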
__________________
The views presented here are my own and not my employer's.
Quote:
Originally Posted by Alexko View Post
So in a nutshell, model [BLANK] will have [BLANK], up to [BLANK], and even [BLANK] for a power consumption of just [BLANK]. Impressive.
10-Jan-2010, 13:30   #3322
no-X
Senior Member
 
Join Date: May 2005
Posts: 2,038

Quote:
Originally Posted by Sontin View Post
And AnandTech said in October that GT200 is EOL.
Yes, it's more or less a catalogue product now. In my country, GTX 285s are often more expensive and harder to find than the HD 5870, and the same goes for the GTX 275 vs. the HD 5850. Only a few overpriced ~1.8-2GB models are available. GT200 is EOL and availability is really poor, so it's logical to expect that the GPU is no longer being produced. nVidia has its reasons to keep it in price lists; maybe it's better to offer a virtual competitor than nothing.

I'd expect this situation to last until the launch of Fermi mainstream parts.
__________________
Sorry for my English. But I hope it's better than your Czech
10-Jan-2010, 13:34   #3323
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,219

Quote:
Originally Posted by trinibwoy View Post
Not that it isn't pure fantasy but why would inter-GPU bandwidth need to be equal when the load on that path would be far lower than GPU<->Mem? I'm not sure what purpose it would serve anyway, didn't both AMD and Nvidia claim that their current proprietary links have sufficient bandwidth for their purposes?
It has sufficient bandwidth for AFR.

The problem with sidebuses and non-AFR parallel rendering is that load balancing is rather difficult ... the naive approach is simple round-robin, but then all framebuffer writes are split 50/50 local/remote ... which is going to take a whole lot of bandwidth.
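One reading of the naive round-robin case can be shown with a toy model: triangles are dealt alternately to two GPUs while framebuffer ownership is interleaved in tiles, so about half of each GPU's pixel writes land in the other GPU's memory. Tile size, screen size, triangle footprint (modelled as a small square) and the checkerboard ownership rule are arbitrary assumptions for illustration.

```python
import random

TILE = 64            # assumed tile edge in pixels
W, H = 1920, 1080    # assumed render target size


def owner(x: int, y: int) -> int:
    """GPU (0 or 1) owning the framebuffer tile that contains pixel (x, y)."""
    return ((x // TILE) + (y // TILE)) % 2


def remote_write_fraction(num_tris: int = 2000, size: int = 16,
                          seed: int = 0) -> float:
    """Fraction of pixel writes that hit the non-local GPU when triangles
    (modelled as size x size squares at random positions) are assigned
    round-robin, independent of where they land on screen."""
    rng = random.Random(seed)
    local = remote = 0
    for i in range(num_tris):
        gpu = i % 2                          # naive round-robin work split
        x0 = rng.randrange(0, W - size)
        y0 = rng.randrange(0, H - size)
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                if owner(x, y) == gpu:
                    local += 1
                else:
                    remote += 1
    return remote / (local + remote)


if __name__ == "__main__":
    print(f"remote write fraction: {remote_write_fraction():.3f}")
```

Because the work split ignores screen position, the expected remote fraction is 0.5 whatever the tile size, which is why MfA argues the link would need bandwidth comparable to local memory unless work is assigned by tile ownership.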

Personally I would do things like this ...

- Vertex processing is divided round robin (vertex buffers are fully replicated)
- All write buffers are roughly tiled (say 64x64 or more) and checkerboard divided between the GPUs
- All transformed vertices get tiled and then written to buffers in the memory of the relevant GPUs for tessellation or rasterization (icky, but the writes would be done with special types of non-temporal loads/stores ... if the vertices get consumed before being evicted from L2 they never have to be written to external memory)
- All read buffers (including former write buffers) are replicated on demand on a tile-by-tile basis, which is to say that if a tile of a buffer that is not stored locally is accessed, that tile gets replicated

(My thinking on as-needed replication is that it will be as efficient as or more efficient than NUMA with caching; for instance, with dynamic textures reused across multiple frames it is clearly superior, and it is certainly more efficient than doing full buffer replication all the time, since that introduces too much latency between rendering steps.)
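The last bullet and the note on as-needed replication can be sketched as a per-GPU tile view: reading a tile owned by the other GPU copies it once, and later reads are served locally, so a dynamic texture reused over several frames crosses the link only once. The class and method names, the checkerboard ownership rule and the 64x64 tile size are illustrative assumptions, not anything NVIDIA has described, and invalidation is simplified to dropping a replica when the owner rewrites that tile.

```python
from dataclasses import dataclass, field

TILE = 64  # assumed tile edge in pixels ("say 64x64 or more")


def tile_owner(tx: int, ty: int, num_gpus: int = 2) -> int:
    """Checkerboard division of buffer tiles between GPUs."""
    return (tx + ty) % num_gpus


@dataclass
class GpuTileView:
    """Hypothetical per-GPU view of one buffer with on-demand tile replication."""
    gpu: int
    replicas: set = field(default_factory=set)
    tiles_transferred: int = 0

    def read(self, x: int, y: int) -> str:
        t = (x // TILE, y // TILE)
        if tile_owner(*t) == self.gpu:
            return "local"               # tile lives in this GPU's memory
        if t in self.replicas:
            return "replica"             # copied earlier, served locally
        self.replicas.add(t)             # copy the whole tile over the link once
        self.tiles_transferred += 1
        return "fetched"

    def invalidate(self, tx: int, ty: int) -> None:
        """Drop a stale replica after the owning GPU rewrites that tile."""
        self.replicas.discard((tx, ty))


if __name__ == "__main__":
    view = GpuTileView(gpu=0)
    for frame in range(3):               # same 256x64 strip sampled each frame
        for x in range(0, 256, 8):
            view.read(x, 0)
    print(view.tiles_transferred)        # 2: only the two remote tiles, once
```

Compared with NUMA-style remote reads, this trades one bulk tile copy for many small remote accesses; whether that wins depends on how much each tile is reused, which is exactly the point MfA makes about dynamic textures.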

How much bandwidth would that consume? Hell if I know; I would have to implement it in a simulator and run traces (lossless compression could probably cut the data for the tiled triangle writes by 66%, but because you are working with floating-point numbers it's not cheap).

Last edited by MfA; 10-Jan-2010 at 13:44.
10-Jan-2010, 13:42   #3324
PSU-failure
Member
 
Join Date: May 2007
Posts: 249

Quote:
Originally Posted by Sontin View Post
Really, what's the problem? He is counting the two L2 caches together. Yeah, it's not right, but AMD did the same with Hemlock: http://forum.beyond3d.com/showpost.p...postcount=4698
No, they added the GPUs' processing units together, which is 100% valid; they added the memory bandwidth together, which is quite valid too (not 100%, but bandwidth really does double when there's no duplicate access); but they didn't add the "L2 cache" together in the pics at your link.

These specs are one of the worst fakes ever; except perhaps for G80's hybrid water/air cooling, I don't recall anything worse.
10-Jan-2010, 13:45   #3325
Sontin
Naughty Boy!
 
Join Date: Dec 2009
Posts: 399

Quote:
Originally Posted by PSU-failure View Post
No, they added the GPUs' processing units together, which is 100% valid; they added the memory bandwidth together, which is quite valid too (not 100%, but bandwidth really does double when there's no duplicate access); but they didn't add the "L2 cache" together in the pics at your link.

These specs are one of the worst fakes ever, except perhaps G80's hybrid water/air cooling.
They added the memory together - which is 100% not valid.