Old 19-Nov-2006, 13:33   #551
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,877
Default

G71 isn't Vec4... It's Vec2+Vec2/Vec3+Scalar. There still are efficiency gains with a purely scalar architecture, and they can still be up to 2x in the absolute corner cases, but generally speaking they're much smaller.
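
A toy model of that corner case, for anyone who wants to play with the numbers. The assumptions are illustrative only and are not a description of how G71 really schedules: every instruction touches 1 to 4 components, all instructions are independent, and a "split" ALU can co-issue two instructions per clock as long as they fit into four lanes (2+2 or 3+1). Four scalar ALUs are compared against one 4-lane vector ALU.

```python
# Toy lane-utilisation model. All function names are hypothetical helpers,
# and the pairing rule is a simplification, not real hardware behaviour.

def scalar_cycles(widths):
    """A scalar ALU spends one cycle per component, so no lane is ever idle."""
    return sum(widths)

def vec4_cycles(widths):
    """A plain Vec4 ALU issues one instruction per cycle, whatever its width."""
    return len(widths)

def split_cycles(widths):
    """Vec4 ALU that can co-issue two instructions per cycle (2+2 or 3+1).

    Greedy pairing: the widest remaining instruction is paired with the
    narrowest one whenever the two together fit into four lanes.
    """
    pending = sorted(widths)
    cycles = 0
    while pending:
        widest = pending.pop()              # take the widest instruction
        if pending and widest + pending[0] <= 4:
            pending.pop(0)                  # co-issue the narrowest with it
        cycles += 1
    return cycles

def speedup_of_scalar(widths, vector_cycles):
    """Speed-up of four scalar ALUs over one 4-lane vector ALU."""
    return vector_cycles(widths) / (scalar_cycles(widths) / 4)

all_scalar = [1] * 8   # worst case for a vector ALU: nothing but scalar ops
print(speedup_of_scalar(all_scalar, vec4_cycles))   # plain Vec4
print(speedup_of_scalar(all_scalar, split_cycles))  # Vec4 with 2+2 / 3+1 split
```

With a stream of purely scalar instructions, this model gives the scalar design a 4x advantage over a plain Vec4 ALU, but only 2x over an ALU that can split, which is where the "up to 2x" above comes from. With full vec4 instructions the advantage disappears entirely.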

I do not believe NVIDIA's implementation of their scalar units is more expensive than G71's implementation of Vec2+Vec2/Vec3+Scalar (at least not by more than, say, 10%) - it's just smarter, imo.


Uttar
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles)
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Arun is offline  
Old 19-Nov-2006, 19:26   #552
silent_guy
Senior Member
 
Join Date: Mar 2006
Posts: 1,696
Default

Quote:
Originally Posted by no-X View Post
I'd like to know, which configuration has higher transistor count. 128 scalar-processors or 64 vec4 processors?
Irrespective of whether they are scalar or not, GPUs have simple execution units, in the sense that they don't have single-thread out-of-order execution, register renaming, branch prediction and the other fancy stuff that you can find in contemporary CPUs. They also don't support exceptions, interrupts, etc. So after stripping off all this, you're left with the ALUs, the register files (which are much larger for a GPU than for a CPU) and probably still quite a lot of control logic, but a lot of it won't be directly scalar/non-scalar related.
My guess would be that the key factor in determining the relative perf/mm2 efficiency of the processor is the complexity of the register file design: single-ported, double-ported or triple-ported? The area increases more or less linearly with the number of ports. If a vec3/4 unit can produce 3 or 4 MULs per clock cycle, the data has to come from and go to somewhere.
Maybe Jawed has a better idea about this

Quote:
I still think that the old-style vec3+scalar architecture is more effective (at least) for today's games (performance per square mm). G80's shader core should be 2x more effective than G71, it's 2x bigger than G71 and its ALUs are clocked 2x higher than G71's, but the performance isn't 2x2x2 = 8x better... it's still about twice as fast as G71...
I'm sure that a bit of creative thinking will allow you to find some holes in your calculation. (Hint: count the number of MADDs and try to find out if G80 has functionality that has been added or has been improved compared to G70)
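
For the MADD-counting part of the hint, a rough sketch. The unit counts and clocks below are assumptions that are not established in this post: G71 is taken as 24 pixel pipes with two vec4 MADD ALUs each at 650 MHz (vertex shaders ignored), and G80 as 128 scalar units at 1.35 GHz. Check them against the vendor specs before relying on the result; `madd_rate_g` is just a hypothetical helper name.

```python
def madd_rate_g(units, lanes_per_unit, clock_ghz):
    """Peak scalar MADDs per second, in billions."""
    return units * lanes_per_unit * clock_ghz

# ASSUMED configurations (see the note above): pixel-shader ALUs only for G71.
g71_pixel = madd_rate_g(units=24 * 2, lanes_per_unit=4, clock_ghz=0.65)
g80 = madd_rate_g(units=128, lanes_per_unit=1, clock_ghz=1.35)

ratio = g80 / g71_pixel
print(f"G71 pixel ALUs: {g71_pixel:.1f} G MADD/s")
print(f"G80:            {g80:.1f} G MADD/s")
print(f"ratio:          {ratio:.2f}x")
```

Under those assumptions the raw MADD ratio comes out at roughly 1.4x rather than 2x2 = 4x, so an observed ~2x in games would already be better than the peak arithmetic alone suggests.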
silent_guy is offline  
Old 19-Nov-2006, 23:08   #553
Shtal
Senior Member
 
Join Date: Jun 2005
Posts: 1,320
Default

Right now AMD is getting ready with its 65nm shrink, which will soon be widely available. Does anybody think that, since AMD acquired ATI, development of a 65nm shrink for the ATI R6xx series will speed up too? Or would it make no difference compared with ATI never having been bought by AMD?

Has anybody ever thought about this question?
__________________
What is the meaning of life? - Why I'm here, I know my past, because I return to the past but I'm going forward to see my future, to find the truth, meaning of the existence and purpose.
Shtal is offline  
Old 19-Nov-2006, 23:12   #554
Razor1
Senior Member
 
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Default

AMD doesn't have the capacity to do both right now, or in the near to mid future. That's the only thing really stopping AMD from making ATi chips in their fabs; that, and it will take time to shift over to the AMD libraries. I would think that in possibly 2 or 3 years we might see a transition, once the NY fab opens up. But for the time being, nada.
Razor1 is offline  
Old 19-Nov-2006, 23:13   #555
Kaotik
yes, i'm drunk
 
Join Date: Apr 2003
Posts: 4,818
Send a message via ICQ to Kaotik
Default

Quote:
Originally Posted by Shtal View Post
Right now AMD is getting ready with its 65nm shrink, which will soon be widely available. Does anybody think that, since AMD acquired ATI, development of a 65nm shrink for the ATI R6xx series will speed up too? Or would it make no difference compared with ATI never having been bought by AMD?

Has anybody ever thought about this question?
It doesn't really make any difference, R6xx chips are manufactured by TSMC, not AMD
__________________
I'm nothing but a shattered soul...
Been ravaged by the chaotic beauty...
Ruined by the unreal temptations...
I was betrayed by my own beliefs...
Kaotik is online now  
Old 19-Nov-2006, 23:15   #556
Shtal
Senior Member
 
Join Date: Jun 2005
Posts: 1,320
Default

Quote:
Originally Posted by Razor1 View Post
AMD doesn't have the capacity to do both right now, or in the near to mid future. That's the only thing really stopping AMD from making ATi chips in their fabs; that, and it will take time to shift over to the AMD libraries. I would think that in possibly 2 or 3 years we might see a transition, once the NY fab opens up. But for the time being, nada.
So in other words, it feels like it's still two different companies, only with an AMD logo now instead of an ATI logo.
__________________
What is the meaning of life? - Why I'm here, I know my past, because I return to the past but I'm going forward to see my future, to find the truth, meaning of the existence and purpose.
Shtal is offline  
Old 19-Nov-2006, 23:16   #557
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,863
Send a message via Skype™ to Jawed
Default

This patent:

Simulating Multiported Memories Using Lower Port Count Memories

needs some study for the register file complexity question. As far as I can tell the primary issue with G80's register file is that on each clock cycle the access pattern can be different from the prior clock.

Additionally, it now seems that there are three kinds of units writing to the register file:
  1. primary MAD pipeline - reading 16, 32 or 48 scalars per clock and writing 16 scalars per clock
  2. SF unit - reading and writing four scalars per clock
  3. TEX pipeline - writing 16 scalars (4x vec4) per clock (much slower clock)
and I haven't even considered the flow of data required in reading constants (from the constant buffer - a process that appears to be more like texel-fetching, but with the data normally expected to reside in L1 cache, I guess).
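
To put rough numbers on the list above, here is a small sketch that just adds up the per-clock figures. The only inputs are the figures quoted in the list; `tex_clock_ratio` is a hypothetical parameter, since the post only says the TEX clock is "much slower" and gives no value, and constant reads are left out, as in the post.

```python
LANES = 16  # scalars handled per clock by the primary MAD pipeline (from the list)

def mad_reads(operands_per_instruction):
    """Register reads per clock for the MAD pipeline with 1, 2 or 3 source operands."""
    return LANES * operands_per_instruction

def worst_case_traffic(tex_clock_ratio):
    """Peak register-file reads and writes per ALU clock for the three units.

    tex_clock_ratio is an ASSUMED value: TEX clock as a fraction of the ALU clock.
    """
    reads = mad_reads(3) + 4                    # MAD with 3 operands + SF unit
    writes = LANES + 4 + 16 * tex_clock_ratio   # MAD + SF + TEX
    return reads, writes

print([mad_reads(n) for n in (1, 2, 3)])  # the 16, 32 or 48 scalars in item 1
print(worst_case_traffic(0.5))            # 0.5 is a placeholder ratio, not a known figure
```

The point is only that the register file has to sustain a few dozen scalar reads and a couple of dozen scalar writes on every ALU clock, with a mix that can change from one clock to the next.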

---

The "co-issuable" MUL in G80 appears to be Int24, only. It seems G80 actually offers ~345GFLOPs and NVidia, for some reason, never quotes GFLOPs in documents (TIA: if someone can find a GFLOPs figure for G80 in an NVidia document - it used to mystify me why NVidia's been so thoroughly quiet on this).

Obviously if you compare R580's GFLOPs, they come from the co-issue of MAD+ADD. In MADs, alone, G80 is ~33% faster, whilst in ADDs R580 is ~equal. Of course, that's ignoring R580's VS GFLOPs.

Obviously the whole thing is skewed in G80's favour because of unified shading and the scalar pipeline's inherently greater utilisation in code that doesn't occupy all four channels of vec4 ALUs.

Jawed

Last edited by Jawed; 20-Nov-2006 at 02:29. Reason: TMU write rate to register file was wrong
Jawed is offline  
Old 19-Nov-2006, 23:29   #558
Razor1
Senior Member
 
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Default

Quote:
Originally Posted by Shtal View Post
So in other words, it feels like it's still two different companies, only with an AMD logo now instead of an ATI logo.
Not two different companies, more like "if it ain't broken, don't fix it"!

Well, it really depends. AMD could choose to make ATi GPUs in their fabs, but if they are getting more money from AMD chips, it wouldn't be smart for them to shift already low capacity over to something that is overall less profitable.
Razor1 is offline  
Old 19-Nov-2006, 23:38   #559
Shtal
Senior Member
 
Join Date: Jun 2005
Posts: 1,320
Default

Quote:
Originally Posted by Razor1 View Post
Not two different companies, more like "if it ain't broken, don't fix it"!

Well, it really depends. AMD could choose to make ATi GPUs in their fabs, but if they are getting more money from AMD chips, it wouldn't be smart for them to shift already low capacity over to something that is overall less profitable.
I remember the days when Nvidia acquired 3dfx; later they used 3dfx engineers to make the NV30 (FX) chips. I wasn't sure whether that was the cause of its failure or not. But the original question was whether two heads are better than one.

Just to let you know, I'm 100% with you on your answer!
__________________
What is the meaning of life? - Why I'm here, I know my past, because I return to the past but I'm going forward to see my future, to find the truth, meaning of the existence and purpose.

Last edited by Shtal; 20-Nov-2006 at 00:10. Reason: add
Shtal is offline  
Old 20-Nov-2006, 02:54   #560
Geeforcer
Harmlessly Evil
 
Join Date: Feb 2002
Posts: 2,027
Default

As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.
__________________
"Complexity is easy; simplicity is difficult."
Geeforcer is offline  
Old 20-Nov-2006, 03:05   #561
Kaotik
yes, i'm drunk
 
Join Date: Apr 2003
Posts: 4,818
Send a message via ICQ to Kaotik
Default

Quote:
Originally Posted by Geeforcer View Post
As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.
Why not? It's all about how much you beef it up
__________________
I'm nothing but a shattered soul...
Been ravaged by the chaotic beauty...
Ruined by the unreal temptations...
I was betrayed by my own beliefs...
Kaotik is online now  
Old 20-Nov-2006, 03:13   #562
Geeforcer
Harmlessly Evil
 
Join Date: Feb 2002
Posts: 2,027
Default

I should have been more specific - I had the "64 Xenos-type ALUs" rumor in mind. IMO, for the 700M transistors rumor to be true, either R600 ALU >> Xenos ALU or R600 ALU # >> 64.
__________________
"Complexity is easy; simplicity is difficult."
Geeforcer is offline  
Old 20-Nov-2006, 04:36   #563
SugarCoat
Senior Member
 
Join Date: Jul 2005
Location: State of Illusionism
Posts: 2,091
Default

Quote:
Originally Posted by Geeforcer View Post
As has already been pointed out, "R600 = PC-ized beefed-up Xenos" does not mesh with the 700+ million transistors rumor.


If you had tried to tell me 4 months ago that the 8800GTX was going to be powered by a 700M transistor chip, I would have told you your head might be screwed on backwards and stuck in your ass. Everyone should be humbled and accepting after that.

Calling the R600 an improved Xenos really doesn't do it justice at all. It would have to be quite large regardless, just to pack in support for a 512-bit bus, let alone what added features might do to the transistor count.

Quote:
I had "64 Xenos-type ALUs" rumor in mind.
Well, erase it from your mind; they're significantly beefed up.
__________________
Everything's Eventual

Oedipus On The Orpheum Circuit!

Last edited by SugarCoat; 20-Nov-2006 at 04:38.
SugarCoat is offline  
Old 20-Nov-2006, 04:50   #564
Geeforcer
Harmlessly Evil
 
Join Date: Feb 2002
Posts: 2,027
Default

All of that is exactly my point: either A) R600 will feature some pretty dramatic changes to its 64 ALUs, OR B) it will have more than 64 of them, OR C) it will not be 700+ million transistors. My money is on A.
__________________
"Complexity is easy; simplicity is difficult."
Geeforcer is offline  
Old 20-Nov-2006, 09:59   #565
rwolf
Rock Star
 
Join Date: Oct 2002
Location: Canada
Posts: 961
Default

Quote:
Originally Posted by Geeforcer View Post
All of that is exactly my point: either A) R600 will feature some pretty dramatic changes to its 64 ALUs, OR B) it will have more than 64 of them, OR C) it will not be 700+ million transistors. My money is on A.
We know that to be true based on the patents: accumulator, min and max in the SIMD unit.
rwolf is offline  
Old 20-Nov-2006, 10:08   #566
rwolf
Rock Star
 
Join Date: Oct 2002
Location: Canada
Posts: 961
Default

Quote:
Originally Posted by geo View Post
We're assuming R600 is SIMD because of Xenos? They've clearly said that they're leveraging Xenos, but one would have to believe they are bringing some v2 wrinkles as well.
It is a good assumption, given that some of their latest patents show vec4 + scalar.
rwolf is offline  
Old 20-Nov-2006, 13:27   #567
PeterAce
Member
 
Join Date: Sep 2003
Location: UK, Bedfordshire
Posts: 450
Default

AMD (ATi) adopted a cautious strategy with R520: because they were making a big change from the previous R4xx generation, they decided that SM 3.0 plus the new ultra-threaded design was enough 'risk', and they did not 'go for the moon' by adding lots of extra PS ALUs. Once the design was trusted, they went for the 'many more ALUs' R580.

As more noises have been made recently about 64 ALUs (Vec4 MADD + Scalar ADD/SF) in R600, I'm wondering if they are doing the same thing this time round with the first high-end R600. Maybe the 'next gen refresh' of R600 (labelled as 65nm on the current roadmap) will be the 'going for the moon' version (many more ALUs + new 10.1 requirements) and will be more like my previous speculation of 96 ALUs:

http://www.beyond3d.com/forum/showpo...4&postcount=27
__________________
PeterAce "Lost in quantisation"

Last edited by PeterAce; 20-Nov-2006 at 13:32.
PeterAce is offline  
Old 20-Nov-2006, 13:56   #568
Razor1
Senior Member
 
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Default

Quote:
Originally Posted by Jawed View Post

The "co-issuable" MUL in G80 appears to be Int24, only. It seems G80 actually offers ~345GFLOPs and NVidia, for some reason, never quotes GFLOPs in documents (TIA: if someone can find a GFLOPs figure for G80 in an NVidia document - it used to mystify me why NVidia's been so thoroughly quiet on this).

Obviously if you compare R580's GFLOPs, they come from the co-issue of MAD+ADD. In MADs, alone, G80 is ~33% faster, whilst in ADDs R580 is ~equal. Of course, that's ignoring R580's VS GFLOPs.

Obviously the whole thing is skewed in G80's favour because of unified shading and the scalar pipeline's inherently greater utilisation in code that doesn't occupy all four channels of vec4 ALUs.

Jawed

It's in the GF8800 tech brief:

http://www.nvidia.com/object/IO_37100.html

Each stream processor on a GeForce 8800 GTX operates at 1.35 GHz and supports the dual issue of a scalar MAD and a scalar MUL operation, for a total of roughly 520 gigaflops of raw shader horsepower. But raw gigaflops do not tell the whole performance story. Instruction issue is 100 percent efficient with scalar shader units, and the mixed scalar and vector shader program code will perform much better compared to vector-based GPU hardware shader units that have instruction issue limitations (such as 3+1 and 2+2).
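
As a quick sanity check on that figure, here is a back-of-the-envelope sketch. It assumes 128 stream processors (the count mentioned earlier in the thread, not in the passage quoted above) and counts a MAD as 2 flops and a MUL as 1; `peak_gflops` is just a hypothetical helper name.

```python
def peak_gflops(stream_processors, clock_ghz, flops_per_clock):
    """Peak throughput in GFLOPs: units x clock (GHz) x flops issued per clock."""
    return stream_processors * clock_ghz * flops_per_clock

MAD_FLOPS = 2  # multiply + add
MUL_FLOPS = 1

# 128 stream processors is an ASSUMPTION taken from earlier in the thread.
mad_only = peak_gflops(128, 1.35, MAD_FLOPS)
mad_plus_mul = peak_gflops(128, 1.35, MAD_FLOPS + MUL_FLOPS)

print(f"MAD only:  {mad_only:.1f} GFLOPs")
print(f"MAD + MUL: {mad_plus_mul:.1f} GFLOPs")
```

That gives about 345.6 GFLOPs for the MAD alone, which lines up with the ~345 figure in the quoted post, and about 518.4 GFLOPs with the co-issued MUL, which is the "roughly 520" in the brief.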

There is a good chance ATi's R600 might have more gflops, so they probably aren't going to market the gflop side too much right now.
Razor1 is offline  
Old 20-Nov-2006, 14:14   #569
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,863
Send a message via Skype™ to Jawed
Default

Ta, it was under my nose. I guess they better hurry up and get it working then.

Jawed
Jawed is offline  
Old 20-Nov-2006, 15:09   #570
Razor1
Senior Member
 
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Default

Nah, it was hard to find; I actually looked through that doc yesterday too and missed it, lol.

But that would be one hell of a backfire if nV starts promoting gflops and then gets the short end of the stick!
Razor1 is offline  
Old 20-Nov-2006, 15:24   #571
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default

Flop numbers have never been a focus of PC GPU marketing before....why would they become so now?
__________________
What the deuce!?
trinibwoy is offline  
Old 20-Nov-2006, 15:40   #572
Kaotik
yes, i'm drunk
 
Join Date: Apr 2003
Posts: 4,818
Send a message via ICQ to Kaotik
Default

Quote:
Originally Posted by trinibwoy View Post
Flop numbers have never been a focus of PC GPU marketing before....why would they become so now?
I guess GPGPU solutions might be a reason for GFLOP-advertising?
__________________
I'm nothing but a shattered soul...
Been ravaged by the chaotic beauty...
Ruined by the unreal temptations...
I was betrayed by my own beliefs...
Kaotik is online now  
Old 20-Nov-2006, 16:00   #573
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,809
Default

I guess, but wouldn't that be a completely different realm of marketing? I'd be surprised to see gflop quotes on a retail box, for example.
__________________
What the deuce!?
trinibwoy is offline  
Old 20-Nov-2006, 17:58   #574
no-X
Senior Member
 
Join Date: May 2005
Posts: 2,038
Default

Quote:
Originally Posted by Shtal View Post
I remember the days when Nvidia acquired 3dfx; later they used 3dfx engineers to make the NV30 (FX) chips. I wasn't sure whether that was the cause of its failure or not. But the original question was whether two heads are better than one.
nVidia used ex-3Dfx engineers to make NV4x, too http://www.beyond3d.com/previews/nvi.../index.php?p=9

//edited
__________________
Sorry for my English. But I hope it's better than your Czech

Last edited by no-X; 20-Nov-2006 at 22:08. Reason: 3dfx ->>> ex-3Dfx :-)
no-X is offline  
Old 20-Nov-2006, 22:05   #575
_xxx_
Naughty Boy!
 
Join Date: Aug 2004
Location: Stuttgart, Germany
Posts: 5,008
Default

I wouldn't say "they used 3dfx engineers", it's nV's employees after all
__________________
I have thought some of nature's journeymen had made men, and not made them well, they imitated humanity so abominably.
_xxx_ is offline  

 
