Old 11-Sep-2006, 14:23   #1
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default The New and Improved "G80 Rumours Thread" *DailyTech specs at #802*

The original thread here having become unwieldy. . . .

The B3D Forum Conventional Wisdom Watch (Minority Reports noted):

D3D10, 500M+ transistors, release sometime between September and end of the CY, probably 80nm (tho I've seen a minority report for 90nm), possibly GDDR3 with > 256-bit bus rather than GDDR4, HDR+AA. More new goodness on the AA side too, details unclear. Non-unified ps/vs. Power-hungry beastie, almost certainly with improved cooling vs G71.

Taking requests to add to this list. Do we have a unit count we are willing to point at as the Conventional Wisdom at this point? Xbit reported 48ps. . .willing to go with that for the moment?

Depending on how lazy I and my brethren are, we might try to keep this OP updated with particularly interesting new tidbits as they come in downstream, as an experiment to see how it works. Should be noted with "Update:"

Please note that this post is just meant to reflect the speculation included herein (and the previous thread, of course), rather than an official position of B3D, Inc!

Some relevant linkage along the way from the previous thread, for which only the authors are responsible for the accuracy thereof (i.e. don't bitch to me!):

http://www.xbitlabs.com/news/video/d...220100915.html
http://www.cooltechzone.com/Special_..._200604092276/
http://www.beyond3d.com/forum/showthread.php?t=30014
http://www.dailytech.com/article.aspx?newsid=2785
http://www.theinquirer.net/default.aspx?article=32385
http://www.beyond3d.com/forum/showth...737#post775737
http://www.theinquirer.net/default.aspx?article=32768
http://www.theinquirer.net/default.aspx?article=32856
http://www.digitimes.com/NewsShow/Ma...pages=A1&seq=2
http://gpu-fun.spaces.live.com/Perso...9&_c=links:119
http://www.beyond3d.com/forum/showpo...&postcount=361
http://www.theinquirer.net/default.aspx?article=33260
http://www.extremetech.com/article2/...1987258,00.asp
http://translate.google.com/translat...&hl=en&ie=UTF8
http://www.beyond3d.com/forum/showpo...&postcount=485
http://www.beyond3d.com/forum/showpo...&postcount=493
http://www.forbes.com/2006/08/18/nvi...rtner=yahootix
http://www.beyond3d.com/forum/showpo...&postcount=688

Update 9/12/2006: CW seems to be looking at a 600-700MHz core.

http://www.theinquirer.net/default.aspx?article=34319

48ps confirmed here; follow the link and download the "graphics track": http://www.beyond3d.com/forum/showthread.php?t=33605

Update 9/18/2006: VR-Zone takes their shot at immortality as either prophets or buffoons: http://www.vr-zone.com/?i=4007

Update 9/29/2006: Some interesting pics here, including 12 memory chips, indicating the likelihood of a 384-bit memory bus and 768MB framebuffer: http://www.beyond3d.com/forum/showpo...&postcount=620
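
For anyone who wants the arithmetic behind that inference spelled out, here is a minimal sketch in Python. It assumes each chip is a GDDR3 part with a 32-bit interface and 512Mbit (64MB) capacity; those per-chip figures are my assumption, not something the pics confirm, and `memory_config` is just an illustrative helper name.

```python
# Assumptions (not confirmed by the linked pics): every memory chip has a
# 32-bit interface and holds 512 Mbit, i.e. 64 MB.
BITS_PER_CHIP = 32
MBYTES_PER_CHIP = 64


def memory_config(num_chips):
    """Return (bus width in bits, framebuffer size in MB) for a chip count."""
    return num_chips * BITS_PER_CHIP, num_chips * MBYTES_PER_CHIP


bus_bits, size_mb = memory_config(12)
print(bus_bits, size_mb)  # 384 768
```

Eight chips under the same assumptions gives the familiar 256-bit / 512MB layout.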

Update 10/05/2006: DailyTech mostly confirms VR-Zone's specs: http://www.beyond3d.com/forum/showpo...&postcount=802


*Added the "u" to rumours for our Britannic overlord.
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Old 11-Sep-2006, 14:36   #2
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Default

G80 is new and improved? Wow I didn't even see the old one.
Old 11-Sep-2006, 14:48   #3
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810
Default

Complicated thing, that English language, eh?

I have to say though - you know you're at a quality establishment when the rumours thread is so well structured
__________________
What the deuce!?
Old 11-Sep-2006, 15:01   #4
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Quote:
Originally Posted by Nick View Post
G80 is new and improved? Wow I didn't even see the old one.
There!
Old 11-Sep-2006, 15:06   #5
Tim Murray
chaos dunk
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
Default

I still think GDDR3 plus a 512-bit bus is way more likely than GDDR4 on a 256-bit bus. (Take what they did during the NV30 era and reverse it!)

Okay, I guess I should elaborate. Given that ATI is already using GDDR4 and at the moment, only Samsung is producing GDDR4 in quantity, it would be foolish to assume that supplies would be plentiful enough to risk the G80's performance on its availability. With R580+ showing a nice jump in performance due to increased bandwidth, I think we can assume pretty easily that the next-generation chips, with geometry shaders and a ridiculous amount of fillrate compared to G71/R580, need as much bandwidth as possible. So... hooray 512-bit bus.
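
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. The per-pin data rates are illustrative assumptions only (neither is a rumoured G80 spec), and `peak_bandwidth_gb_s` is a made-up helper name.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bandwidth in GB/s: bus width in bytes times the per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


# Illustrative data rates only: 1.6 Gbps/pin for GDDR3, 2.0 Gbps/pin for GDDR4.
gddr3_512 = peak_bandwidth_gb_s(512, 1.6)
gddr4_256 = peak_bandwidth_gb_s(256, 2.0)
print(gddr3_512, gddr4_256)  # 102.4 64.0
```

Under those assumed clocks, the wider bus wins comfortably even with the slower memory, which is the whole point of the argument.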
Old 11-Sep-2006, 15:10   #6
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Quote:
Originally Posted by The Baron View Post
I still think GDDR3 plus a 512-bit bus is way more likely than GDDR4 on a 256-bit bus. (Take what they did during the NV30 era and reverse it!)
Did you read the last two pages of the previous thread, sleepy-head?
Old 11-Sep-2006, 15:15   #7
Tim Murray
chaos dunk
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
Default

Quote:
Originally Posted by geo View Post
Did you read the last two pages of the previous thread, sleepy-head?
You know I don't do that. But okay, not a lot of GDDR4 floating around, hooray, I was right.
Old 11-Sep-2006, 15:32   #8
Farid
Artist formely known as Vysez
 
Join Date: Mar 2004
Location: Paris, France
Posts: 3,899
Icon Idea According to some extremely talkative little bird

The bus width of G80 is interesting, as much as the number of RAM chips present on the board...
__________________
- Power corrupts and absolute power is kinda neat.
- If at first you don't succeed, put it out for beta test.
--Internets
Old 11-Sep-2006, 15:33   #9
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,309
Default

This is not a rumour but since NVIDIA has a couple of patents about this I suspect G80 might use its PS units to perform blending operations between incoming fragments and the frame buffer.
Basically, even though D3D10 does not expose this AFAIK, PS units would be able to issue a special instruction which fetches into some registers the colors of all the subsamples potentially covered by a fragment, and then blend them in the pixel shader.
A 'smart' driver would be able to dynamically patch a shader every time we change blending modes.
I'm not saying that's easy to implement in hw (there are obviously some serious coherency/processing order issues to solve first) but it would be nice to have completely programmable blending modes at some point in the future.
It is also quite straightforward to expect more and more fixed function units to be slowly phagocytized by programmable units as we have more of them and more complex/more powerful/more accurate ALUs
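
To make the idea concrete, here is a toy software model of blending done in the shader. It is not a description of any real hardware or of what the patents claim: the names `blend_in_shader`, `alpha_blend` and `additive_blend` are hypothetical, and the "pixel load" is modelled as simply being handed the subsample colours.

```python
def blend_in_shader(src, dst_samples, coverage, blend_fn):
    """Blend one incoming fragment against every covered subsample.

    src         -- (r, g, b, a) colour produced by the pixel shader
    dst_samples -- list of (r, g, b, a) subsample colours read from the
                   frame buffer (stands in for the special load instruction)
    coverage    -- list of booleans, True where the fragment covers a subsample
    blend_fn    -- any function (src, dst) -> colour; the programmable part
    """
    return [blend_fn(src, dst) if covered else dst
            for dst, covered in zip(dst_samples, coverage)]


def alpha_blend(src, dst):
    """Classic src-alpha / one-minus-src-alpha blend on all four channels."""
    a = src[3]
    return tuple(a * s + (1.0 - a) * d for s, d in zip(src, dst))


def additive_blend(src, dst):
    """Additive blend, clamped to 1.0 per channel."""
    return tuple(min(1.0, s + d) for s, d in zip(src, dst))


red_half = (1.0, 0.0, 0.0, 0.5)
blue = (0.0, 0.0, 1.0, 1.0)
result = blend_in_shader(red_half, [blue, blue], [True, False], alpha_blend)
print(result)  # [(0.5, 0.0, 0.5, 0.75), (0.0, 0.0, 1.0, 1.0)]
```

Swapping `alpha_blend` for `additive_blend` (or anything else) is the software analogue of the driver patching the shader when the blend mode changes.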

Marco
__________________
[twitter]
More samples, we need more samples! [Dean Calver]
First they ignore you, then they laugh at you, then they fight you, then you win. [Mahatma Gandhi]
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way
Old 11-Sep-2006, 16:15   #10
Sunrise
Member
 
Join Date: Aug 2002
Posts: 306
Default

I can't get one of Jen-Hsun Huang's little sneak peeks out of my head. Some months ago, he said something like: "With our next-generation graphics architecture, we want to further increase programming flexibility", and I'm still wondering what exactly he had in mind when he specifically mentioned "flexibility" while speaking a little about their future plans. Along with Jen-Hsun's saying that "they want to innovate where it makes sense, instead of innovating like crazy" (like they did with NV30), I keep asking myself what exactly would make sense here and in the future, WRT their first incarnation of a part that has to have enough potential to last at least another 2-3 years.

We've already seen some patents, but I'm still at a point where I can't really see what he may have meant by that. Maybe I'm reading a little too much into it, but if there are any ideas, don't hesitate to post them here.
Old 11-Sep-2006, 16:24   #11
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810
Default

Quote:
Originally Posted by nAo View Post
This is not a rumour but since NVIDIA has a couple of patents about this I suspect G80 might use its PS units to perform blending operations between incoming fragments and the frame buffer.
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.

Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??

Last edited by trinibwoy; 11-Sep-2006 at 16:27.
Old 11-Sep-2006, 16:37   #12
Sunrise
Member
 
Join Date: Aug 2002
Posts: 306
Default

Quote:
Originally Posted by Vysez View Post
...as much as the number of RAM chips present on the board...
One of the questions that comes to mind is, how exactly will it work? Looking at current PCB designs there is no place at all for 2 more RAM chips on one side (well, physically there is, but you would need to increase the PCB either in length or put them at the back) because you have to keep in mind that there is a limit as to how close you can put them against each other (because of termination, etc.) and when you place them further away this could lead to some potential problems. There is a reason why 8 chips per side is the maximum right now. You'd need a fair amount of intelligent pathing when only 2 modules are placed further away.
Old 11-Sep-2006, 17:50   #13
Razor1
Senior Member
 
Join Date: Jul 2004
Location: NY, NY
Posts: 2,680
Default

Quote:
Originally Posted by trinibwoy View Post
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.

Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??

Hmm interesting. I would think the same, a more programmable AA engine.

Not sure, but if the PS went through the TMUs, wouldn't that lock the TMUs? I think they would need their own connections to the memory controller.
Old 11-Sep-2006, 19:04   #14
_xxx_
Naughty Boy!
 
Join Date: Aug 2004
Location: Stuttgart, Germany
Posts: 5,008
Default

128 bits wide?
__________________
I have thought some of nature's journeymen had made men, and not made them well, they imitated humanity so abominably.
Old 11-Sep-2006, 19:45   #15
Brimstone
B3D Shockwave Rider
 
Join Date: Feb 2002
Posts: 1,813
Default

My guess.

G80 is two g70 improved cores with geometry shaders added to the architecture. Improved A.A. and HDR support along with other tweaks.


The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.
__________________
When God plays an online shooter he plays Shadowrun. He buys resurrection first round and selects Dwarf.

www.shadowrunshow.com
Old 11-Sep-2006, 19:47   #16
zsouthboy
Member
 
Join Date: Aug 2003
Location: Derry, NH
Posts: 563
Default

Quote:
Originally Posted by Brimstone View Post
My guess.

G80 is two g70 improved cores with geometry shaders added to the architecture. Improved A.A. and HDR support along with other tweaks.


The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.

G80 has been in development much too long to be as simple as two G70s slapped together.
Old 11-Sep-2006, 19:50   #17
zsouthboy
Member
 
Join Date: Aug 2003
Location: Derry, NH
Posts: 563
Default

Quote:
Originally Posted by Sunrise View Post
One of the questions that comes to mind is, how exactly will it work? Looking at current PCB designs there is no place at all for 2 more RAM chips on one side (well, physically there is, but you would need to increase the PCB either in length or put them at the back) because you have to keep in mind that there is a limit as to how close you can put them against each other (because of termination, etc.) and when you place them further away this could lead to some potential problems. There is a reason why 8 chips per side is the maximum right now. You'd need a fair amount of intelligent pathing when only 2 modules are placed further away.
I know it's not true, but I'll put it out there:
two PCB design?
like the 7950, only one board is all RAM?
obviously expense is a huge issue with that, etc.
Old 11-Sep-2006, 20:04   #18
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810
Default

Quote:
Originally Posted by Brimstone View Post
G80 is two g70 improved cores with geometry shaders added to the architecture. Improved A.A. and HDR support along with other tweaks. The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.
I hope your guess isn't based on anything to do with consoles, RSX or PS3 - or is it the GX2 that's leading you down that path?
Old 11-Sep-2006, 20:09   #19
trinibwoy
Meh
 
Join Date: Mar 2004
Location: New York
Posts: 9,810
Default

Quote:
Originally Posted by _xxx_ View Post
128 bits wide?
You could certainly make a case for dedicated framebuffer space/bandwidth on a high-end card given today's resolutions and HDR/AA requirements. You wouldn't have the crossbar complexity that sireric described earlier and it may even simplify accesses for the other clients like the TMU's. That's assuming that you can keep that dedicated bus saturated enough to justify its existence.
Old 11-Sep-2006, 20:38   #20
Pete
Moderate Nuisance
 
Join Date: Feb 2002
Posts: 4,664
Default

Nice summary, geo. Kind of horrible to think we can deflate 29 pages into close to 29 words.

Should we also add 600+MHz and maybe even accept a 384bit bus, given trumphsiao's chirping in the penultimate page of the previous thread? He's been right before, IIRC. The "4:1 concept architecture" is the most interesting part. Are we talking 48 PS "processors" : 16 ROPs in G80 (assuming it still has discrete ROPs)? Are we talking 64 PS ALUs : 16 ROPs in R600, assuming an extra PS ALU per "pipe" (though I'd expect this at a very high core clock)?

(Or does G80 stick with 24 pixel shader "pipes/processors"--two DX9 PS ALUs each--but add two extra DX10 PS ALUs each? Nah, too NV30ish, if it's even possible.)

I've also heard 16 VS/GS processors, too, though I forget where (possibly in one of the OP's links).

48 pixel shader "processors" at 600+MHz sounds power and transistor hungry to me, weakly corroborating other rumors and perhaps hinting at 96 PS ALUs. It also makes Brimstone's "two G70s" theory not incredibly far-fetched, also considering 16 VS/GS shaders. That's twice a G70 in G70 terms, but obviously NV's been modifying the heck out of everything, so obviously it's not that simple.

What does the rumored new AA engine signify, updating the ROPs or folding them into the PSs?

Finally, nAo's talking about this, right?

I've been out of the loop awhile, thus the more-than-usual silly questions.

Last edited by Pete; 11-Sep-2006 at 20:51.
Old 11-Sep-2006, 20:40   #21
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,679
Default

I still don't think it makes much sense to have a dedicated bus. Yes, it is simpler, but GPU's have had unified buses for many years now. I doubt they'd take a step backwards like this.

After all, don't forget that it's not just the memory bandwidth that is being dedicated, but also the memory space. All individual areas of memory space are highly-variable in today's GPU designs.
Old 11-Sep-2006, 22:58   #22
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,867
Default

Quote:
Originally Posted by trinibwoy View Post
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.

Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??
It's my interpretation of patents, etc. that NVidia wants to merge TMU and ROP functionality into one programmable "unit".

Whether that unit is a decoupled pipeline that runs alongside the ALU pipeline, or is integrated as macros into the ALU pipeline, who knows... I expect the former initially.

So the end result is one point of access to memory.

---

There's an interesting, minor corollary with streamout in my view:

Streamout writes data to memory that then needs to be read back (sometime soon!) for rendering to continue. Streamout is a geometry (vertex) specific technique.

A lot of pixel shading techniques would benefit from writing a pixel value and then (sometime soon!) reading it for rendering to continue.

As it happens, in both cases "sometime soon!" is blocked - the dev is forced to flush things out and the whole thing is fairly clunky. It makes the parallelism of the GPU much easier to implement, but programmers apparently have been screaming they want "immediate read after write" for donkey's years.

So, in my view, both streamout and ROP-output make natural targets for "more timely" writing/reading.

Apart from what we might see in G80 (prolly only exposed in OGL 3.0? or as an NVidia extension in OGL?) I'm doubtful that this "fully programmable ROP" (and streamout?) will come any time soon, i.e. to DX.

I'm still unclear on the mechanics of read-after-write in a pixel shader. How restrictive would it end up being, and would those restrictions nullify most of the benefit devs have been dreaming about?
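
As a toy illustration of what one such restriction might look like (my assumption for the sake of argument, not anything NVidia or an API has described): suppose a shader may only read back its own pixel position. Then the hazard reduces to never having two fragments for the same position in flight at once. `batch_fragments` below is a hypothetical helper that greedily closes a batch whenever a position repeats, so each batch can run in parallel and still see the previous write.

```python
def batch_fragments(positions, batch_size):
    """Split fragments into in-order batches with no repeated pixel position.

    positions  -- list of (x, y) pixel positions in submission order
    batch_size -- maximum number of fragments processed in parallel (>= 1)

    A new batch is started when the current one is full or when the next
    fragment targets a position already in flight, so a read in a later
    batch always sees the write from an earlier one.
    """
    batches, current, in_flight = [], [], set()
    for pos in positions:
        if len(current) == batch_size or pos in in_flight:
            batches.append(current)
            current, in_flight = [], set()
        current.append(pos)
        in_flight.add(pos)
    if current:
        batches.append(current)
    return batches


frags = [(0, 0), (1, 0), (0, 0), (2, 0)]
print(batch_fragments(frags, batch_size=4))
# [[(0, 0), (1, 0)], [(0, 0), (2, 0)]]
```

The cost is visible in the example: four fragments that would have fit in one batch now take two, so heavy overdraw on the same pixels would serialise badly.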

Jawed
Old 11-Sep-2006, 23:28   #23
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,309
Default

Maybe we shouldn't think about 12 memory chips around one GPU.. what about 6 mem chips x 2 GPUs? ok ok..I shut up
The original patents I was referring to are these ones:

Pixel load instruction for a programmable graphics processor

Position conflict detection and avoidance in a programmable graphics processor

Position conflict detection and avoidance in a programmable graphics processor using tile coverage data

BTW.. while I was checking those patents I found a new interesting one (LOL): what's the difference between a constant value held in a texture or in a constant register in the end? Well, the latter must reside closer to your 'heart', so here we go:

Shader cache using a coherency protocol
Old 11-Sep-2006, 23:34   #24
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 2,832
Default

It's clear that in a DX10 GPU there is little or no room left for fixed-function parts, so the ROPs must either become fully programmable or have their functions fall back to the fragment pipes, with all legacy blending/sampling ops emulated at the driver/API level (as happened with T'n'L).
I honestly bet on the second option, as it will save some complexity (in favour of extra VS/PS units) and will bring the memory interface "closer" to the fragment core, if it now has to deal with the burden of framebuffer ops in sampling/blending etc. The other thing is support for virtual addressing in the GPU: will there be an extra (mini) AGU for each fragment pipe/quad, or will this function also be consumed by the new "multipurpose" ALUs?
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 11-Sep-2006, 23:48   #25
nAo
Nutella Nutellae
 
Join Date: Feb 2002
Location: San Francisco
Posts: 4,309
Default

Quote:
Originally Posted by fellix View Post
...or will this function also be consumed by the new "multipurpose" ALUs?
I second this option, imho nvidia will continue to use PS ALUs as AGUs for texturing ops and even for general purpose read/write memory ops (makes even more sense now that they are going to support integer ops as well, sharing the same computational units with their floating point counterparts)
At the same time I believe they will decouple TMUs from PS units since now they have to massively use them to serve multiple clients (VS/GS/PS).
I also wonder if they are going to have a single big L2 (texture) cache which will serve all texturing requests from all possible clients or whether they will have multiple dedicated L2s.
It wouldn't be nice to have your pixel shader slow down because a mad vertex shader is thrashing your whole texture cache, lol
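
A toy simulation of that thrashing worry, for what it's worth. This is only a sketch under loud assumptions: a plain LRU policy, made-up sizes, a pixel shader that loops over a tiny working set, and a vertex shader that streams through addresses it never reuses. Real texture caches are nothing this simple, and `LRUCache` / `ps_hit_rate` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that only tracks which lines are present."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit; on a miss, insert the line and evict the LRU one."""
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def ps_hit_rate(ps_cache, vs_cache, rounds=1000, ps_working_set=8, vs_per_ps=4):
    """Interleave PS and VS fetches and return the fraction of PS fetches that hit.

    Pass the same cache object twice to model one shared L2, or two
    separate objects to model dedicated L2s.
    """
    hits = 0
    vs_address = 0
    for i in range(rounds):
        if ps_cache.access(("ps", i % ps_working_set)):
            hits += 1
        for _ in range(vs_per_ps):
            vs_cache.access(("vs", vs_address))  # streaming, never reused
            vs_address += 1
    return hits / rounds


shared = LRUCache(16)
shared_rate = ps_hit_rate(shared, shared)
split_rate = ps_hit_rate(LRUCache(8), LRUCache(8))
print(shared_rate, split_rate)  # 0.0 0.992
```

Same total capacity in both cases (16 lines), but in the shared case the streaming client evicts the pixel shader's lines before they are reused. Whether real workloads look anything like this is another matter.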

Marco