The New and Improved "G80 Rumours Thread" *DailyTech specs at #802*
The original thread here (http://www.beyond3d.com/forum/showthread.php?t=28551) having become unwieldy. . . .
The B3D Forum Conventional Wisdom Watch (Minority Reports noted):
D3D10, 500M+ transistors, release sometime between September and end of the CY, probably 80nm (tho I've seen a minority report for 90nm), possibly GDDR3 with > 256-bit bus rather than GDDR4, HDR+AA. More new goodness on the AA side too, details unclear. Non-unified ps/vs. Power-hungry beastie, almost certainly with improved cooling vs G71.
Taking requests to add to this list. Do we have a unit count we are willing to point at as the Conventional Wisdom at this point? Xbit reported 48ps. . .willing to go with that for the moment?
Depending on how lazy I and my brethren are, we might try to keep this OP updated with particularly interesting new tidbits as they come in downstream, as an experiment to see how it works. Should be noted with "Update:"
Please note that this post is just meant to reflect the speculation included herein (and the previous thread, of course), rather than an official position of B3D, Inc! :lol:
Some relevant linkage along the way from the previous thread, for which only the authors are responsible for the accuracy thereof (i.e. don't bitch to me!):
http://www.xbitlabs.com/news/video/display/20060220100915.html
http://www.cooltechzone.com/Special_Reports/Insider_Series/NVIDIA_G80_Delayed_200604092276/
http://www.beyond3d.com/forum/showthread.php?t=30014
http://www.dailytech.com/article.aspx?newsid=2785
http://www.theinquirer.net/default.aspx?article=32385
http://www.beyond3d.com/forum/showthread.php?p=775737#post775737
http://www.theinquirer.net/default.aspx?article=32768
http://www.theinquirer.net/default.aspx?article=32856
http://www.digitimes.com/NewsShow/MailHome.asp?datePublish=2006/6/30&pages=A1&seq=2
http://gpu-fun.spaces.live.com/PersonalSpace.aspx?_c01_ListID=cns!FB5EEACE075E1350!119&_c=links:119
http://www.beyond3d.com/forum/showpost.php?p=788469&postcount=361
http://www.theinquirer.net/default.aspx?article=33260
http://www.extremetech.com/article2/0,1697,1987258,00.asp
http://translate.google.com/translate?u=http%3A%2F%2Fpc.watch.impress.co.jp%2Fdocs%2F2006%2F0727%2Fkaigai291.htm&langpair=ja%7Cen&hl=en&ie=UTF8
http://www.beyond3d.com/forum/showpost.php?p=804990&postcount=485
http://www.beyond3d.com/forum/showpost.php?p=805099&postcount=493
http://www.forbes.com/2006/08/18/nvidia-0818markets10.html?partner=yahootix
http://www.beyond3d.com/forum/showpost.php?p=828033&postcount=688
Update 9/12/2006: CW seems to be looking at a 600-700MHz core clock.
http://www.theinquirer.net/default.aspx?article=34319
48ps confirmed here; follow the link and download the "graphics track": http://www.beyond3d.com/forum/showthread.php?t=33605
Update 9/18/2006: VR-Zone takes their shot at immortality as either prophets or buffoons: http://www.vr-zone.com/?i=4007
Update 9/29/2006: Some interesting pics here, including 12 memory chips, indicating the likelihood of a 384-bit memory bus and a 768MB framebuffer: http://www.beyond3d.com/forum/showpost.php?p=841400&postcount=620
Update 10/05/2006: DailyTech mostly confirms VR-Zone's specs: http://www.beyond3d.com/forum/showpost.php?p=845779&postcount=802
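The 9/29 inference from 12 memory chips to a 384-bit bus and 768MB is simple multiplication. A minimal sketch, assuming 32-bit GDDR3 devices of 512 Mbit (64 MB) each; those per-chip figures are typical parts of the time, not something the pics confirm:

```python
# Infer bus width and framebuffer size from a count of memory chips.
# Assumptions (not from the thread): each chip has a 32-bit data
# interface and 512 Mbit (64 MB) of capacity.

CHIP_IO_BITS = 32       # assumed data pins per chip
CHIP_CAPACITY_MB = 64   # assumed 512 Mbit density


def board_config(num_chips: int) -> tuple:
    """Return (bus width in bits, framebuffer size in MB)."""
    return num_chips * CHIP_IO_BITS, num_chips * CHIP_CAPACITY_MB


for chips in (8, 12, 16):
    bus, mem = board_config(chips)
    print(f"{chips} chips -> {bus}-bit bus, {mem} MB")
```

Under the same assumptions, 8 chips is today's 256-bit/512MB layout and 16 would be the 512-bit case.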
*Added the "u" to rumours for our Britannic overlord. :smile:
G80 is new and improved? Wow I didn't even see the old one. ;)
trinibwoy
11-Sep-2006, 14:48
Complicated thing, that English language eh? :lol:
I have to say though - you know you're at a quality establishment when the rumours thread is so well structured :D
G80 is new and improved? Wow I didn't even see the old one. ;)
There! :razz:
Tim Murray
11-Sep-2006, 15:06
I still think GDDR3 plus a 512-bit bus is way more likely than GDDR4 on a 256-bit bus. (Take what they did during the NV30 era and reverse it!)
Okay, I guess I should elaborate. Given that ATI is already using GDDR4 and at the moment, only Samsung is producing GDDR4 in quantity, it would be foolish to assume that supplies would be plentiful enough to risk the G80's performance on its availability. With R580+ showing a nice jump in performance due to increased bandwidth, I think we can assume pretty easily that the next-generation chips, with geometry shaders and a ridiculous amount of fillrate compared to G71/R580, need as much bandwidth as possible. So... hooray 512-bit bus.
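To put numbers on the GDDR3-vs-GDDR4 trade-off: peak bandwidth is bus width in bytes times the effective data rate. The data rates below are assumptions (1600 MT/s GDDR3 as on a 7900 GTX, 2000 MT/s GDDR4 as on an X1950 XTX), not leaked G80 figures:

```python
def peak_bandwidth_gbs(bus_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): bytes per transfer x MT/s."""
    return bus_bits / 8 * data_rate_mtps / 1000


# Assumed effective data rates in megatransfers per second.
GDDR3_MTPS = 1600
GDDR4_MTPS = 2000

options = {
    "256-bit GDDR4": (256, GDDR4_MTPS),
    "384-bit GDDR3": (384, GDDR3_MTPS),
    "512-bit GDDR3": (512, GDDR3_MTPS),
}
for name, (bits, rate) in options.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

Even at the lower assumed data rate, either wider GDDR3 bus beats 256-bit GDDR4 on paper, which is the crux of the argument above.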
I still think GDDR3 plus a 512-bit bus is way more likely than GDDR4 on a 256-bit bus. (Take what they did during the NV30 era and reverse it!)
Did you read the last two pages of the previous thread, sleepy-head?
Tim Murray
11-Sep-2006, 15:15
Did you read the last two pages of the previous thread, sleepy-head?
You know I don't do that. But okay, not a lot of GDDR4 floating around, hooray, I was right.
The bus width of G80 is interesting, as much as the number of RAM chips present on the board...
This is not a rumour but since NVIDIA has a couple of patents about this I suspect G80 might use its PS units to perform blending operations between incoming fragments and the frame buffer.
Basically, even though D3D10 does not expose this AFAIK, PS units would be able to issue a special instruction that fetches into registers all the subsample colors potentially covered by a fragment, and then blend them in the pixel shader.
A 'smart' driver would be able to dynamically patch a shader every time we change blending modes.
I'm not saying that's easy to implement in hw (there are obviously some serious coherency/processing order issues to solve first :) ) but it would be nice to have completely programmable blending modes at some point in the future :)
It is also quite straightforward to expect more and more fixed function units to be slowly phagocytized by programmable units as we get more of them and more complex/more powerful/more accurate ALUs.
Marco
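The idea above can be sketched in a few lines. This is a toy model, not NVIDIA's design: the framebuffer is a Python dict keyed by (x, y, sample), and `shade_and_blend`, `src_over` and `per_channel_max` are hypothetical names. It ignores the ordering/coherency problem mentioned above; two fragments touching the same pixel would still have to be serialized.

```python
from typing import Callable, Dict, List, Tuple

Color = Tuple[float, float, float, float]       # RGBA in [0, 1]
BlendFn = Callable[[Color, Color], Color]
Framebuffer = Dict[Tuple[int, int, int], Color]  # (x, y, sample) -> color


def src_over(src: Color, dst: Color) -> Color:
    """Alpha blending, applied to all four channels for simplicity."""
    a = src[3]
    return tuple(s * a + d * (1.0 - a) for s, d in zip(src, dst))


def per_channel_max(src: Color, dst: Color) -> Color:
    """A blend a fixed-function ROP might not offer; trivial in shader code."""
    return tuple(max(s, d) for s, d in zip(src, dst))


def shade_and_blend(fb: Framebuffer, x: int, y: int, coverage: List[bool],
                    src: Color, blend: BlendFn) -> None:
    """Hypothetical shader-side blend: load each covered subsample,
    combine it with the fragment color, and write it back."""
    for sample, covered in enumerate(coverage):
        if covered:
            dst = fb[(x, y, sample)]             # the 'pixel load' step
            fb[(x, y, sample)] = blend(src, dst)


# One pixel with 4 subsamples, cleared to opaque black.
fb: Framebuffer = {(0, 0, s): (0.0, 0.0, 0.0, 1.0) for s in range(4)}
# A half-transparent red fragment covering samples 0 and 1 only.
shade_and_blend(fb, 0, 0, [True, True, False, False],
                (1.0, 0.0, 0.0, 0.5), src_over)
```

Swapping `src_over` for any other function is the whole point: the "blend mode" becomes ordinary shader code that a driver could patch in.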
Sunrise
11-Sep-2006, 16:15
I can't get one of Jen-Hsun Huang's little sneak peeks out of my head. Some months ago he said something like: "With our next-generation graphics architecture, we want to further increase programming flexibility", and I'm still wondering what exactly he had in mind when he specifically mentioned "flexibility" while talking a little about their future plans. Along with Jen-Hsun's remark that "they want to innovate where it makes sense, instead of innovating like crazy" (like they did with NV30), I keep asking myself what exactly would make sense, now and in the future, for their first incarnation of a part that needs enough potential to stay relevant for at least another 2-3 years.
We've already seen some patents, but I still can't really see what he may have meant by that. Maybe I'm reading a little too much into it, but if you have any ideas, don't hesitate to post them here.
trinibwoy
11-Sep-2006, 16:24
This is not a rumour but since NVIDIA has a couple of patents about this I suspect G80 might use its PS units to perform blending operations between incoming fragments and the frame buffer.
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.
Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??
Sunrise
11-Sep-2006, 16:37
...as much as the number of RAM chips present on the board...
One of the questions that comes to mind is: how exactly will it work? Looking at current PCB designs, there is no room at all for 2 more RAM chips on one side (well, physically there is, but you would need to make the PCB longer or put them on the back), because you have to keep in mind that there is a limit to how close you can place them to each other (because of termination, etc.), and placing them further away could lead to some potential problems. There is a reason why 8 chips per side is the maximum right now. You'd need a fair amount of intelligent routing when only 2 modules are placed further away.
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.
Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??
Hmm interesting. I would think the same, a more programmable AA engine.
Not sure, but if the PS went through the TMUs, wouldn't that lock up the TMUs? I think they would need their own connections to the memory controller.
Brimstone
11-Sep-2006, 19:45
My guess.
G80 is two improved G70 cores with geometry shaders added to the architecture. Improved AA and HDR support along with other tweaks.
The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.
zsouthboy
11-Sep-2006, 19:47
My guess.
G80 is two improved G70 cores with geometry shaders added to the architecture. Improved AA and HDR support along with other tweaks.
The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.
G80 has been in development much too long to be as simple as two G70s slapped together.
zsouthboy
11-Sep-2006, 19:50
One of the questions that comes to mind is: how exactly will it work? Looking at current PCB designs, there is no room at all for 2 more RAM chips on one side (well, physically there is, but you would need to make the PCB longer or put them on the back), because you have to keep in mind that there is a limit to how close you can place them to each other (because of termination, etc.), and placing them further away could lead to some potential problems. There is a reason why 8 chips per side is the maximum right now. You'd need a fair amount of intelligent routing when only 2 modules are placed further away.
I know it's not true, but I'll put it out there:
two PCB design?
like the 7950, only one board is all RAM?
obviously expense is a huge issue with that, etc.
trinibwoy
11-Sep-2006, 20:04
G80 is two improved G70 cores with geometry shaders added to the architecture. Improved AA and HDR support along with other tweaks. The Sony PS3 RSX is comprised of just one of these cores. Two cores would require too much power and produce too much heat in a console form factor.
I hope your guess isn't based on anything to do with consoles, RSX or PS3 - or is it the GX2 that's leading you down that path?
trinibwoy
11-Sep-2006, 20:09
128 bits wide?
You could certainly make a case for dedicated framebuffer space/bandwidth on a high-end card given today's resolutions and HDR/AA requirements. You wouldn't have the crossbar complexity that sireric described earlier, and it may even simplify accesses for the other clients like the TMUs. That's assuming that you can keep that dedicated bus saturated enough to justify its existence.
Nice summary, geo. Kind of horrible to think we can deflate 29 pages into close to 29 words. :lol:
Should we also add 600+MHz and maybe even accept a 384bit bus, given trumphsiao's chirping in the penultimate page of the previous thread? He's been right before, IIRC. The "4:1 concept architecture" is the most interesting part. Are we talking 48 PS "processors" : 16 ROPs in G80 (assuming it still has discrete ROPs)? Are we talking 64 PS ALUs : 16 ROPs in R600, assuming an extra PS ALU per "pipe" (though I'd expect this at a very high core clock)?
(Or does G80 stick with 24 pixel shader "pipes/processors"--two DX9 PS ALUs each--but add two extra DX10 PS ALUs each? Nah, too NV30ish, if it's even possible.)
I've also heard 16 VS/GS processors, too, though I forget where (possibly in one of the OP's links).
48 pixel shader "processors" at 600+MHz sounds power and transistor hungry to me, weakly corroborating other rumors and perhaps hinting at 96 PS ALUs. It also makes Brimstone's "two G70s" theory not incredibly far-fetched, also considering 16 VS/GS shaders. That's twice a G70 in G70 terms, but NV's obviously been modifying the heck out of everything, so it's not that simple.
What does the rumored new AA engine signify, updating the ROPs or folding them into the PSs?
Finally, nAo's talking about this (http://www.beyond3d.com/articles/directxnext/index.php?p=6), right?
I've been out of the loop awhile, thus the more-than-usual silly questions.
Chalnoth
11-Sep-2006, 20:40
I still don't think it makes much sense to have a dedicated bus. Yes, it is simpler, but GPUs have had unified buses for many years now. I doubt they'd take a step backwards like this.
After all, don't forget that it's not just the memory bandwidth that is being dedicated, but also the memory space. The individual areas of memory space are highly variable in size in today's GPU designs.
Well that would be nice. One less client (ROP) to worry about when configuring the MC and you could probably do whatever floats your boat when it comes to AA.
Actually in a situation like this would the PS need its own link to the memory controller or will it go through the TMUs (whatever those might look like) ??
It's my interpretation of patents, etc. that NVidia wants to merge TMU and ROP functionality into one programmable "unit".
Whether that unit is a decoupled pipeline that runs alongside the ALU pipeline, or is integrated as macros into the ALU pipeline, who knows... I expect the former initially.
So the end result is one point of access to memory.
---
There's an interesting, minor, corollary with streamout in my view:
Streamout writes data to memory that then needs to be read back (sometime soon!) for rendering to continue. Streamout is a geometry (vertex) specific technique.
A lot of pixel shading techniques would benefit from writing a pixel value and then (sometime soon!) reading it for rendering to continue.
As it happens, in both cases "sometime soon!" is blocked - the dev is forced to flush things out and the whole thing is fairly clunky. It makes the parallelism of the GPU much easier to implement, but programmers apparently have been screaming they want "immediate read after write" for donkey's years.
So, in my view, both streamout and ROP-output make natural targets for "more timely" writing/reading.
Apart from what we might see in G80 (prolly only exposed in OGL 3.0? or as an NVidia extension in OGL?) I'm doubtful that this "fully programmable ROP" (and streamout?) will come any time soon, i.e. to DX.
I'm still unclear on the mechanics of read-after-write in a pixel shader. How restrictive would it end up?, and would those restrictions nullify most of the benefit devs have been dreaming about?
Jawed
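On "how restrictive would it end up": one hypothetical way to allow read-after-write on pixels is to track which pixel positions have writes in flight and stall any fragment that hits one. The toy below (function and variable names are hypothetical; in-order issue and a fixed 10-cycle latency are assumptions) shows the cost: non-overlapping fragments pipeline freely, while fragments piling onto one pixel serialize at the full shader latency.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def issue_schedule(positions: List[Pos], latency: int) -> List[int]:
    """Toy in-order fragment issue with read-after-write protection.

    One fragment issues per cycle. If an earlier fragment at the same pixel
    still has its write in flight, the new one stalls until that write
    retires. Returns the cycle at which each fragment issues."""
    retire_at: Dict[Pos, int] = {}  # pixel -> cycle its pending write lands
    cycle = 0
    issued: List[int] = []
    for pos in positions:
        pending = retire_at.get(pos, 0)
        if pending > cycle:
            cycle = pending         # stall on the same-pixel hazard
        issued.append(cycle)
        retire_at[pos] = cycle + latency
        cycle += 1
    return issued


LATENCY = 10  # assumed shader latency in cycles
print(issue_schedule([(0, 0), (1, 0), (2, 0), (3, 0)], LATENCY))  # [0, 1, 2, 3]
print(issue_schedule([(0, 0)] * 4, LATENCY))                      # [0, 10, 20, 30]
```

Real hardware could let non-conflicting fragments slip past a stalled one; this in-order model deliberately does not, to keep the hazard visible.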
Maybe we shouldn't think about 12 memory chips around one GPU.. what about 6 mem chips x 2 GPUs? :idea: ok ok..I'll shut up :wink:
The original patents I was referring to are these ones:
Pixel load instruction for a programmable graphics processor (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US7091979&F=0)
Position conflict detection and avoidance in a programmable graphics processor (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US7053904&F=0)
Position conflict detection and avoidance in a programmable graphics processor using tile coverage data (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US7053893&F=0)
BTW..while I was checking those patents I found a new interesting one (LOL): in the end, what's the difference between a constant value held in a texture and one held in a constant register? Well, the latter must reside closer to your 'heart', so here we go:
Shader cache using a coherency protocol (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US7103720&F=0)
It's clear that in a DX10 GPU there is little or no room left for fixed-function parts, so the ROPs must either go for full programmability or their functions will fall back to the fragment pipes, and thus all legacy blending/sampling ops must be emulated at the driver/API level (as was the case for T'n'L).
I'd honestly bet on the second option, as it would save some complexity (in favour of extra VS/PS units) and would bring the memory interface closer to the fragment core, which would now have to deal with the burden of framebuffer ops in sampling/blending etc. The other thing is support for virtual addressing in the GPU: will there be an extra (mini)AGU for each fragment pipe/quad, or will this function also be absorbed by the new "multipurpose" ALUs?
...or will this function also be absorbed by the new "multipurpose" ALUs?
I second this option, imho nvidia will continue to use PS ALUs as AGUs for texturing ops and even for general purpose read/write memory ops (makes even more sense now that they are going to support integer ops as well, sharing the same computational units with their floating point counterparts)
At the same time I believe they will decouple TMUs from PS units since now they have to massively use them to serve multiple clients (VS/GS/PS).
I also wonder if they are going to have a single big L2 (texture) cache which will serve all texturing requests from all possible clients, or whether they will have multiple dedicated L2s.
It wouldn't be nice to have your pixel shader slow down because a mad vertex shader is thrashing all your texture cache, lol :)
Marco
That's what associativity is for...
Jawed
Nice summary, geo. Kind of horrible to think we can deflate 29 pages into close to 29 words. :lol:
Should we also add 600+MHz and maybe even accept a 384bit bus, given trumphsiao's chirping in the penultimate page of the previous thread? He's been right before, IIRC. The "4:1 concept architecture" is the most interesting part. Are we talking 48 PS "processors" : 16 ROPs in G80 (assuming it still has discrete ROPs)? Are we talking 64 PS ALUs : 16 ROPs in R600, assuming an extra PS ALU per "pipe" (though I'd expect this at a very high core clock)?
(Or does G80 stick with 24 pixel shader "pipes/processors"--two DX9 PS ALUs each--but add two extra DX10 PS ALUs each? Nah, too NV30ish, if it's even possible.)
I've also heard 16 VS/GS processors, too, though I forget where (possibly in one of the OP's links).
48 pixel shader "processors" at 600+MHz sounds power and transistor hungry to me, weakly corroborating other rumors and perhaps hinting at 96 PS ALUs. It also makes Brimstone's "two G70s" theory not incredibly far-fetched, also considering 16 VS/GS shaders. That's twice a G70 in G70 terms, but NV's obviously been modifying the heck out of everything, so it's not that simple.
What does the rumored new AA engine signify, updating the ROPs or folding them into the PSs?
Finally, nAo's talking about this (http://www.beyond3d.com/articles/directxnext/index.php?p=6), right?
I've been out of the loop awhile, thus the more-than-usual silly questions.
Do I hear a second for any of these? :smile: Trying to keep it consensus-y.
Chose the wording of > 256-bit carefully, to allow for either 384 or 512. :wink: One second-hand reported "spotting" does not spring make. Or something like that. Show me a piccie, and I'm in. :)
JF_Aidan_Pryde
12-Sep-2006, 00:51
What clock rate do you guys figure?
INKster
12-Sep-2006, 01:01
No more than 550MHz, that's my take.
That's what associativity is for...
Jawed
no degree of associativity will automagically solve texture cache(s) thrashing
No more than 550MHz, that's my take.
My guess is 500MHz or even below at 90 nm..
no degree of associativity will automagically solve texture cache(s) thrashing
Depends on how you parameterise the associativity and then frobnicate the tiling.
Judging by the incredibly efficient texturing in R580 (which also runs texturing out of order from multiple batches, just like a unified shader), you're going to need to come up with some evidence for your assertion.
Jawed
trumphsiao
12-Sep-2006, 01:39
My guess is 500MHz or even below at 90 nm..
700MHz for sure.
People who want to upgrade from G71/R580 to G80/R600 should bear in mind that the advanced architecture will give a smaller performance gain in games that are less ALU-bound.
trumphsiao
12-Sep-2006, 01:42
I believe either G80 or R600 are 4:1 concept architecture.
Depends on how you parameterise the associativity and then frobnicate the tiling.
You can improve things, you can't fix them.
If one thread decides to go berserk and to randomly fetch all the memory in your system as quick as it can there's NOTHING you can do to stop it.
Judging by the incredibly efficient texturing in R580 (which also runs texturing out of order from multiple batches, just like a unified shader), you're going to need to come up with some evidence for your assertion.
You missed my point: last time I checked, R580's very efficient TMUs were serving one single shader (as they don't really support vertex texturing at all), which is expected (not in every case) to exhibit some degree of coherency (fetching the same textures, using similar patterns and so on..)
I'm talking about 2 completely different entities that must compete for the same resource in 2 completely different ways.
Even on SM3.0 shaders, if you fill a texture with random values and use dependent texture reads to perturb texture coordinates, you can easily thrash your texture cache despite any degree of associativity. It's just going to be much easier to do this with SM4 shaders and the D3D10 rendering pipeline.
p.s. I don't even want to think about trying to randomly index texture arrays, lol :)
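A quick way to see this point: a toy set-associative LRU cache of fixed 8 KB capacity (all sizes are assumptions, not any real GPU's) fed a coherent texel stream versus dependent reads through a texture of random values. Raising associativity does nothing for the scattered stream, because the problem is the working set, not set conflicts:

```python
import random
from collections import OrderedDict
from typing import List


class SetAssocCache:
    """Set-associative cache with LRU replacement inside each set."""

    def __init__(self, num_sets: int, ways: int, line_bytes: int):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.accesses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_bytes
        lru = self.sets[line % self.num_sets]
        self.accesses += 1
        if line in lru:
            lru.move_to_end(line)        # now most recently used
            self.hits += 1
        else:
            if len(lru) >= self.ways:
                lru.popitem(last=False)  # evict least recently used
            lru[line] = True


def hit_rate(addresses: List[int], ways: int,
             capacity_bytes: int = 8192, line_bytes: int = 64) -> float:
    """Hit rate for a cache of fixed capacity at a given associativity."""
    cache = SetAssocCache(capacity_bytes // (line_bytes * ways), ways, line_bytes)
    for a in addresses:
        cache.access(a)
    return cache.hits / cache.accesses


TEXEL_BYTES = 4
TEXTURE_BYTES = 1024 * 1024 * TEXEL_BYTES  # one 1024x1024 RGBA8 texture
N = 100_000

# Coherent: neighbouring texels fetched in order.
coherent = [i * TEXEL_BYTES for i in range(N)]
# Dependent reads through a texture full of random values: scattered texels.
rng = random.Random(0)
scattered = [rng.randrange(0, TEXTURE_BYTES, TEXEL_BYTES) for _ in range(N)]

for ways in (1, 4, 16):
    print(f"{ways:2d}-way: coherent {hit_rate(coherent, ways):.4f}, "
          f"scattered {hit_rate(scattered, ways):.4f}")
```

The coherent stream hits 15 of every 16 fetches (0.9375) at every associativity, while the scattered one stays far below 1%. The counterpoint raised in the replies, that scheduling can limit how much of the machine the misbehaving code occupies, is not modelled here.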
700MHz for sure.
People who want to upgrade from G71/R580 to G80/R600 should bear in mind that the advanced architecture will give a smaller performance gain in games that are less ALU-bound.
Yeah..but they're supposed to have more pipelines and more memory bandwidth, that should make up for the lack of MHz
I believe either G80 or R600 are 4:1 concept architecture.
Can you elaborate on this? :)
If G80 (according to rumours) has 16 VS/GS and 32 PS with 32 TMUs, and one ALU per VS pipe and 2 ALUs per PS pipe, we end up with a 2.5 ratio.
To approach 4:1 we should assume something like 3 ALUs per PS pipe or lower the number of TMUs to 24 or so, but I don't like the latter at all :)
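The ratio arithmetic above, spelled out. The unit counts are the rumours just quoted; counting one ALU per VS/GS pipe and treating all ALUs as equivalent is the same simplification used in the post:

```python
def alu_tmu_ratio(vs_units: int, alus_per_vs: int,
                  ps_units: int, alus_per_ps: int, tmus: int) -> float:
    """Total shader ALUs divided by texture units."""
    return (vs_units * alus_per_vs + ps_units * alus_per_ps) / tmus


# (VS/GS units, ALUs per VS, PS units, ALUs per PS, TMUs): rumoured/hypothetical
configs = {
    "rumoured: 2 ALUs per PS, 32 TMUs": (16, 1, 32, 2, 32),
    "3 ALUs per PS, 32 TMUs": (16, 1, 32, 3, 32),
    "2 ALUs per PS, 24 TMUs": (16, 1, 32, 2, 24),
}
for name, cfg in configs.items():
    print(f"{name}: {alu_tmu_ratio(*cfg):.2f}:1")
```

That gives 80/32 = 2.5, 112/32 = 3.5 and 80/24 ≈ 3.33, so either tweak moves toward 4:1 without quite reaching it.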
INKster
12-Sep-2006, 02:29
Yeah..but they're supposed to have more pipelines and more memory bandwidth, that should make up for the lack of MHz
Can you elaborate on this? :)
If G80 (according to rumours) has 16 VS/GS and 32 PS with 32 TMUs, and one ALU per VS pipe and 2 ALUs per PS pipe, we end up with a 2.5 ratio.
To approach 4:1 we should assume something like 3 ALUs per PS pipe or lower the number of TMUs to 24 or so, but I don't like the latter at all :)
Unless each of those TMUs is completely different in its abilities from NV40/NV45/G70/G71... ;)
Unless each of those TMUs is completely different in its abilities from NV40/NV45/G70/G71... ;)
What do you mean? are you hinting to slower texturing rates per TMU? (like 2 clock cycles per bilinear sample???)
trumphsiao
12-Sep-2006, 02:35
Yeah..but they're supposed to have more pipelines and more memory bandwidth, that should make up for the lack of MHz
Can you elaborate on this? :)
If G80 (according to rumours) has 16 VS/GS and 32 PS with 32 TMUs, and one ALU per VS pipe and 2 ALUs per PS pipe, we end up with a 2.5 ratio.
To approach 4:1 we should assume something like 3 ALUs per PS pipe or lower the number of TMUs to 24 or so, but I don't like the latter at all :)
From what I heard
(Rumors)
G80: 32x2 ALUs : 16 TMUs
Sample benchmarks indicate that G80 could be slower than or equal to the previous generation in numerous older games.
This round, better architecture and good clock scaling are equally important.
R600 will be quite a bit faster than G80 on the next 3DMark.
R600
1. Architecture is better than G80's, but clock scaling is still stalled right now.
G80
1. Good architecture, but only right for the current timeframe; lacking an architectural advantage, it needs better clock scaling (it has to reach 1GHz in case R600 is clocked faster).
trumphsiao
12-Sep-2006, 02:39
What do you mean? are you hinting to slower texturing rates per TMU? (like 2 clock cycles per bilinear sample???)
Nvidia is a slothful giant.:wink:
From what I heard
(Rumors)
G80:32X2 ALUs: 16TMUs.
WHAT??! So they would have designed a huge 500+ MTransistor part with only 16 TMUs on a 90 nm process, expecting to make up for the lack of texturing performance by clocking it at insane speeds.. no, I don't believe that.
G80
1. Good architecture, but only right for the current timeframe; lacking an architectural advantage, it needs better clock scaling (it has to reach 1GHz in case R600 is clocked faster).
LOL, dunno where you picked up these rumours, but I don't believe this stuff can be trusted.
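The objection can be put in numbers with peak texel rate = TMUs × clock × samples per clock. Assumed here: G71 as shipped on the 7900 GTX (24 TMUs at 650 MHz), one bilinear sample per TMU per clock, and the 700 MHz figure quoted earlier in the thread; the 2-samples-per-clock case is a hypothetical wider TMU.

```python
def texel_rate_gtexels(tmus: int, core_mhz: float,
                       samples_per_clock: float = 1.0) -> float:
    """Peak bilinear samples per second, in billions."""
    return tmus * core_mhz * samples_per_clock / 1000


def clock_to_match(target_gtexels: float, tmus: int,
                   samples_per_clock: float = 1.0) -> float:
    """Core clock in MHz needed to reach a target peak texel rate."""
    return target_gtexels * 1000 / (tmus * samples_per_clock)


g71 = texel_rate_gtexels(24, 650)     # assumed 7900 GTX: 24 TMUs at 650 MHz
rumour = texel_rate_gtexels(16, 700)  # 16 TMUs at the rumoured 700 MHz
print(f"G71 peak: {g71:.1f} GTexels/s, 16 TMUs @ 700 MHz: {rumour:.1f}")
print(f"clock for 16 TMUs to match G71: {clock_to_match(g71, 16):.0f} MHz")
print(f"same, at 2 samples/clock: {clock_to_match(g71, 16, 2):.0f} MHz")
```

At one sample per clock, a 16-TMU part would need about 975 MHz just to match G71's peak texel rate, which is close to the 1GHz figure in the rumour; a TMU that filters two samples per clock halves that.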
You can improve things, you can't fix them.
If one thread decides to go berserk and to randomly fetch all the memory in your system as quick as it can there's NOTHING you can do to stop it.
But ATI's architectures have a thread manager which talks to the cache and memory systems.
On top of that, each of R580's four PS pipes has an entirely private cache - so even if one thread is thrashing on one pipeline, the remaining pipes' texture caches are entirely free. They would also need a barmy thread of their own to suffer direct cache thrashing.
None of this is to deny the possibility of thrashing, per se. But you're trying to suggest that the designers didn't think of this...
You missed my point: last time I checked, R580's very efficient TMUs were serving one single shader, which is expected (not in every case) to exhibit some degree of coherency (fetching the same textures, using similar patterns and so on..)
Not that I'm aware of. Texturing is as freely schedulable in R580 as math. It might be the same single shader on all four pipelines, but the program counter for the texturing ops will be all over the place across the entire set of batches in flight. There are up to 128 threads (batches) per pipe and each thread's texturing can proceed independently of the others'. Coherent accesses obviously multiply in benefit if they're sequenced together. Dependent accesses are obviously best spread out in time because they'll generate worst-case latency all the time (as well as cache and memory thrashing).
I'm talking about 2 completely different entities that must compete for the same resource in 2 completely different ways.
Ah well that's the point isn't it? You have one shader program making coherent, cache-friendly accesses and another shader (or another part of the same shader, say the final dependent texturing op) going barmy. The thread manager in collaboration with the cache and memory manager controls the intensity of the thrashing. The barmy stuff doesn't need to grab full control of a pipeline in a truly out-of-order GPU (unless all the code is barmy).
And, the barmy code will not only be thrashing cache, but it will also be thrashing memory. So that's a double-incentive for the thread manager to throttle the nasty stuff. ATI's GPUs are no longer statically threaded. They don't just stand there and go "Oh no!"
Even on SM3.0 shaders, if you fill a texture with random values and use dependent texture reads to perturb texture coordinates, you can easily thrash your texture cache despite any degree of associativity. It's just going to be much easier to do this with SM4 shaders and the D3D10 rendering pipeline.
Obviously if all the code you're running is cache-thrashing, then yes, you'll have cache thrashing. But it's intrinsic in the design of a unified GPU that you manage the threads' priorities and scheduling... I dare say that's the special sauce that ATI's particularly proud (and secretive) about.
p.s. I don't even want to think about trying to randomly index texture arrays, lol :)
I'm fascinated to see how this new stuff in D3D10 works out - e.g. how the logical overlap between texture arrays and constant buffers hangs together. Dimensions that practically go off into infinity, writes to memory from pretty much everywhere, potentially large inter-stage buffers (exacerbated in a unified GPU) - I can't wait...
Jawed
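The private-cache argument and the shared-L2 worry can both be seen in a toy model: one well-behaved client cycling through a working set that fits in its cache, and one "barmy" client fetching random lines three times as often. Total capacity is the same in both cases; all sizes and rates are assumptions, the cache is fully associative LRU for simplicity, and there is no thread manager throttling the barmy client:

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines: OrderedDict = OrderedDict()

    def access(self, line) -> bool:
        """Touch a line; return True on a hit."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[line] = True
        return False


def friendly_hit_rate(shared: bool, steps: int = 20_000,
                      lines_per_cache: int = 128, working_set: int = 120,
                      barmy_per_step: int = 3) -> float:
    """Hit rate of a client looping over `working_set` lines while a 'barmy'
    client fetches random lines `barmy_per_step` times per step.
    Shared: one cache of 2 x lines_per_cache. Private: one cache each."""
    rng = random.Random(1)
    if shared:
        good = barmy = LRUCache(2 * lines_per_cache)
    else:
        good, barmy = LRUCache(lines_per_cache), LRUCache(lines_per_cache)
    hits = 0
    for i in range(steps):
        hits += good.access(("good", i % working_set))
        for _ in range(barmy_per_step):
            barmy.access(("barmy", rng.randrange(1_000_000)))
    return hits / steps


print(f"private caches: {friendly_hit_rate(shared=False):.3f}")
print(f"shared cache:   {friendly_hit_rate(shared=True):.3f}")
```

With private caches the friendly client keeps nearly every line; sharing the same total capacity with an unthrottled random client evicts its whole working set under LRU. That is the case a scheduler or a partitioning policy would have to prevent.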
From what I heard
(Rumors)
G80: 32x2 ALUs : 16 TMUs
Sample benchmarks indicate that G80 could be slower than or equal to the previous generation in numerous older games.
This round, better architecture and good clock scaling are equally important.
R600 will be quite a bit faster than G80 on the next 3DMark.
R600
1. Architecture is better than G80's, but clock scaling is still stalled right now.
G80
1. Good architecture, but only right for the current timeframe; lacking an architectural advantage, it needs better clock scaling (it has to reach 1GHz in case R600 is clocked faster).
nV won't be going lower than 24 TMUs; it's been seen that 24 TMUs are necessary in some cases (they might go with 16 if Jawed is correct about the hybrid TMU/ALU structure, but again, newer games will increase TMU usage as well, so it won't be beneficial to cut them back). A 500-million-transistor chip I can see hitting 600MHz, but that's it, and it will have double the power requirements of a G71, maybe more if it does hit 600. So I don't see it going past 600. Increasing clocks really won't be very important if the new GPUs are more efficient. The G80 has to be faster than the GF7950 in current games anyway, no way around that; no one would buy it if it wasn't. Otherwise it would be like going from a GF4 to a GF FX, a useless upgrade.
well, 16 "TMU"s could maybe make sense if a TMU is a shared resource: i.e. a group of 1 VS/GS shader and 2 PS shaders would share one memory access point/TMU. Presumably such a TMU would be able to output more than a single bilinear sample per clock...
On top of that, each of R580's four PS pipes has an entirely private cache - so even if one thread is thrashing on one pipeline, the remaining pipes' texture caches are entirely free. They would also need a barmy thread of their own to suffer direct cache thrashing.
Are the 'private' caches just L1 caches which fetch data from a bigger and slower L2 cache? if the answer is affirmative so you're proving my point and thanks, case closed :)
None of this is to deny the possibility of thrashing, per se. But you're trying to suggest that the designers didn't think of this...
I never suggested anything like that, I was only wondering what kind of cache design they're going to use.
It's not as if I know better; they know what to do 1000x better than me.
Ah well that's the point isn't it? You have one shader program making coherent, cache-friendly accesses and another shader (or another part of the same shader, say the final dependent texturing op) going barmy. The thread manager in collaboration with the cache and memory manager controls the intensity of the thrashing. The barmy stuff doesn't need to grab full control of a pipeline in a truly out-of-order GPU (unless all the code is barmy).
You forgot the case in which the mad thread is generating primitives for your non-mad thread, so throttling the first one stalls the second one as well, but here we're really going OT; open a new thread if you want to discuss this.
ATI's GPUs are no longer statically threaded. They don't just stand there and go "Oh no!"
LOL, I was not even talking about ATI's GPUs
well, 16 "TMU"s could maybe make sense if a TMU is a shared resource: i.e. a group of 1 VS/GS shader and 2 PS shaders would share one memory access point/TMU. Presumably such a TMU would be able to output more than a single bilinear sample per clock...
I believe TMUs on G80 will be a shared resource, but in a scenario like that the ALU:TMU ratio as we have computed it until now would not make any sense.
Bigus Dickus
12-Sep-2006, 03:33
It is also quite straightforward to expect more and more fixed function units to be slowly phagocytized...
Ugh. I'm four weeks into medical school, and that wasn't a word I was hoping to see here when taking a "mind break." Thanks.
:)
You forgot the case in which the mad thread is generating primitives for your non-mad thread, so throttling the first one stalls the second one as well, but here we're really going OT; open a new thread if you want to discuss this.
OK, well this misconception about a unified architecture clearly indicates we're talking at cross-purposes anyway.
Jawed
OK, well this misconception about a unified architecture clearly indicates we're talking at cross-purposes anyway.
Is unified shading slowly becoming a religion? Didn't want to hurt your feelings ;) (BTW, I think I was one of the first people talking about it on this forum, several years ago in a pre-shader era.)
Hey, you're the one suggesting that GPU designers don't consider texture thrashing when making a whizz-bangy out of order or unified GPU.
Although if you're suggesting that NVidia has stumbled on this point so far, then erm, maybe that's why they're so reluctant. Right?
:???:
Jawed
Jawed, you're completely off track; I was not criticising out-of-order computation or unified shading. BTW, every tech has some shortcomings or flaws, yep, unified shading too :)
silent_guy
12-Sep-2006, 05:07
nV won't be going lower than 24 TMUs; it's been seen that 24 TMUs are necessary in some cases (they might go with 16 if Jawed is correct about the hybrid TMU/ALU structure, but newer games will increase TMU usage as well, so it won't be beneficial to cut them back).
Forgive my ignorance on this, but I never quite understood why the number of TMUs is so critically important for decoupled architectures.
Isn't it true that in that case the number of TMUs doesn't really matter all that much and that the rate at which texture fetches can be performed is much more important? I mean, as soon as your fetching bandwidth exceeds your memory bandwidth, an additional TMU won't make a big difference, will it? What's so special about more TMUs other than intrinsically adding more associativity to your L1 caches (assuming that each TMU has one)?
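A minimal sketch of the bandwidth point above, assuming a toy min(TMU rate, memory rate) model: each TMU processes one texel per clock, only cache misses cost DRAM traffic, and every number used (600 MHz, 4 bytes per texel, 50 GB/s, 50% hit rate) is hypothetical rather than a G80 spec.

```python
def delivered_texel_rate(tmus, clock_mhz, bytes_per_texel, mem_bw_gbs, cache_hit_rate):
    """Texels per second actually delivered under a toy min(compute, bandwidth) model.

    Hypothetical model: each TMU processes one texel per clock, and only
    cache misses generate DRAM traffic of bytes_per_texel each.
    """
    tmu_rate = tmus * clock_mhz * 1e6          # texels/s the TMUs could process
    miss_fraction = 1.0 - cache_hit_rate
    if miss_fraction <= 0.0:
        return tmu_rate                        # everything hits: purely TMU-bound
    # Average DRAM bytes per texel = bytes_per_texel * miss_fraction
    bw_rate = mem_bw_gbs * 1e9 / (bytes_per_texel * miss_fraction)
    return min(tmu_rate, bw_rate)


if __name__ == "__main__":
    for tmus in (16, 24, 32, 48, 64):
        rate = delivered_texel_rate(tmus, 600, 4, 50, 0.5)
        print(f"{tmus:2d} TMUs -> {rate / 1e9:.1f} Gtexels/s")
```

With these made-up numbers, going from 48 to 64 TMUs adds nothing, because the memory side caps delivery at 25 Gtexels/s; a better cache hit rate moves that ceiling, which is why cache behaviour can matter as much as the raw TMU count.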
A 500M-transistor chip I can see hitting 600 MHz, but that's it, and it will have double the power requirements of a G71, maybe more if it does hit 600. So I don't see it going past 600.
In a race of 'mine is bigger and better', I think we'll have to live with insane power consumption anyway, no matter the chosen architecture...
But for a specific target performance, it's generally better to increase clock speeds than to increase area, since the former is better for both performance per Watt and for performance per $.
So given enough time, I'd design for speed rather than for area.
Increasing clocks really won't be very important if the new GPUs are more efficient.
Depends on how you define 'more efficient'.
In a race of 'mine is bigger and better', I think we'll have to live with insane power consumption anyway, no matter the chosen architecture...
The whispers on the wind keep getting uglier and uglier on this point, and not for just one of the major IHVs. . .
Ailuros
12-Sep-2006, 05:38
But for a specific target performance, it's generally better to increase clock speeds than to increase area, since the former is better for both performance per Watt and for performance per $.
Not definitely and not in all cases. High frequencies in the high end segment usually come with high risks and lower margins. G71 compared to G70 had a smaller transistor count, a lot higher frequency, but was also produced on a smaller manufacturing process.
Where exactly does that put the G71 in terms of performance per Watt and dollar compared to G70?
There are no absolutes, since GPUs are highly complicated beasts and there are way too many factors involved to come to such oversimplified conclusions. Especially since the above conclusion hasn't proven to be more efficient in either the GPU or the CPU markets in the past.
Back on topic: I'd say that speculating from the "end" of the pipeline being ROPs (or a possible lack thereof...albeit it sounds too early for that), might be helpful, but that's just me.
Ailuros
12-Sep-2006, 05:45
The whispers on the wind keep getting uglier and uglier on this point, and not for just one of the major IHVs. . .
Imagine what would happen if either G80 or R6x0 didn't end up being X percent faster than its predecessor, at least in a healthy number of scenarios.
You can't have it all; you can't have D3D10's quite heavy requirements, higher programmability and X% higher performance in a package right now with "modest" power consumption.
trumphsiao
12-Sep-2006, 05:49
The whispers on the wind keep getting uglier and uglier on this point, and not for just one of the major IHVs. . .
but to some extent die size is a matter of design density.
silent_guy
12-Sep-2006, 06:04
Not definitely and not in all cases. High frequencies in the high end segment usually come with high risks and lower margins.
G71 is in the exponential part of the cost/area curve. Even more so for G70. Yield loss due to speed is much more controlled unless you start pushing speeds that were far outside the initial spec. That's not an uncommon thing to do: I've seen chips commercialized at speeds 40% over their initial design spec at the slow corner, but this is usually only done later in the lifetime of the chip, as process controllability improves and variances go down (they always do).
G71 compared to G70 had a smaller transistor count, a lot higher frequency, but was also produced on a smaller manufacturing process.
Where exactly does that put the G71 in terms of performance per Watt and dollar compared to G70?
With 90nm entering the mature period of its existence and 110nm more or less in maintenance mode, I wouldn't be surprised if G71 is not only better in terms of !/$, but also in terms of pure $, compared to G70.
There are no absolutes, since GPUs are highly complicated beasts and there are way too many factors involved to come to such oversimplified conclusions. Especially since the above conclusion hasn't proven to be more efficient in either the GPU or the CPU markets in the past.
Agreed, it's an oversimplified conclusion.
The speed-is-better argument took somewhat of a hit with the P4, but the circumstances there were different in that, starting from already high speeds, they had to take it to extremes. In GPU land, there's definitely more margin for improvement... Anyway, I can live with 600 MHz. :wink:
dizietsma
12-Sep-2006, 07:26
On the memory side I highly doubt 32x12 being seen. Much more likely to be 8x32 of the fastest GDDR4 chips they can get. For the gpu I am plumping for a simple 32ps/12vs device possibly on 80nm if the 7600's show good promise.
Given those simple hardware choices they can spend the R&D on the architecture tweaks.
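The chip-count arithmetic behind "8x32" versus "32x12" can be sketched as follows, assuming each GDDR3/GDDR4 device contributes its own 32-bit slice of the bus; the 1600 MT/s data rate is a hypothetical example, not a rumoured G80 memory clock.

```python
def bus_width_bits(num_chips, bits_per_chip=32):
    """Total memory bus width if every chip gets its own 32-bit slice."""
    return num_chips * bits_per_chip


def peak_bandwidth_gbs(bus_bits, data_rate_mtps):
    """Peak bandwidth in GB/s: bytes per transfer times million transfers per second."""
    return bus_bits / 8 * data_rate_mtps / 1000


if __name__ == "__main__":
    for chips in (8, 12):
        width = bus_width_bits(chips)
        print(f"{chips} chips -> {width}-bit bus, "
              f"{peak_bandwidth_gbs(width, 1600):.1f} GB/s at 1600 MT/s")
```

Under that assumption, twelve chips is what a 384-bit bus implies, and at equal data rates it buys 50% more peak bandwidth than eight chips on a 256-bit bus.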
On the memory side I highly doubt 32x12 being seen. Much more likely to be 8x32 of the fastest GDDR4 chips they can get. For the gpu I am plumping for a simple 32ps/12vs device possibly on 80nm if the 7600's show good promise.
Given those simple hardware choices they can spend the R&D on the architecture tweaks.
Your timescales seem completely off, if you're talking about the launch version of G80. Process is now fixed and the R&D for the chip was finished a long time ago. It's just about to enter MP for chrissakes, the chip is utterly done.
LeStoffer
12-Sep-2006, 08:59
700MHz for sure.
People who want to upgrade from G71/R580 to G80/R600 should bear in mind that the advanced architecture will give less of a performance boost in games that are less ALU-bound.
I highly doubt anything over 650 MHz, even on the top-end part. The 7900 GTX runs at 650 MHz and the G80 chip is huge in transistor count.
Remember that projected core speeds (the early rumours we get) often fall short when the chip is finally launched. So yes, they might target a 700+ chip, but reality will probably give them between 600 and 650.
Ailuros
12-Sep-2006, 11:05
Agreed, it's an oversimplified conclusion.
The speed-is-better argument took somewhat of a hit with the P4, but the circumstances there were different in that, starting from already high speeds, they had to take it to extremes. In GPU land, there's definitely more margin for improvement... Anyway, I can live with 600 MHz. :wink:
I wish it would be Pentium4 vs Athlon only. How about FX5800 vs. 9700PRO, X800XT PE vs. 6800Ultra, 7800GTX vs. X1800XT etc as further examples instead? While I fully agree that companies often drive specific designs over the initially planned frequency in order to compete more efficiently, I don't see a very high initially planned core frequency as being ideal or better than low frequencies/large die areas. Can I have something that is balanced in both aspects? ;)
Forgive my ignorance on this, but I never quite understood why the number of TMUs is so critically important for decoupled architectures.
Isn't it true that in that case the number of TMUs doesn't really matter all that much and that the rate at which texture fetches can be performed is much more important? I mean, as soon as your fetching bandwidth exceeds your memory bandwidth, an additional TMU won't make a big difference, will it? What's so special about more TMUs other than intrinsically adding more associativity to your L1 caches (assuming that each TMU has one)?
In a race of 'mine is bigger and better', I think we'll have to live with insane power consumption anyway, no matter the chosen architecture...
But for a specific target performance, it's generally better to increase clock speeds than to increase area, since the former is better for both performance per Watt and for performance per $.
So given enough time, I'd design for speed rather than for area.
Depends on how you define 'more efficient'.
Let's say 16 TMUs ain't going to be enough for games like Crysis, which now use ambient occlusion maps plus all the texture maps from before. So future games will put more burden on TMUs; not as much as the increase on pixel shaders, but still, it's there.
Highly unlikely we will be seeing single-card solutions needing 300 watts of power to run. I can see an entire base system using 300 watts, but not just the graphics card. I think we will see G80 at 500 MHz to keep the power envelope a little higher than the GX2's.
Efficiency per clock
It will draw more power. Even if the core doesn't, the higher clocked (and more) RAM will.
ChrisRay
12-Sep-2006, 12:53
Memory to PCB? Or are you expecting to see 1 gigabyte graphic cards in the next gen? The 7950 GX2 is an odd card to compare due to its unusual makeup compared to most single core cards.
INKster
12-Sep-2006, 13:01
It will draw more power. Even if the core doesn't, the higher clocked (and more) RAM will.
Not if they use GDDR4 (1.5v) with its lower voltage compared to standard GDDR3 (1.8v).
Current rumours point to GDDR3 though. And 12 memory chips (?).
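A rough sketch of why the lower GDDR4 voltage matters, assuming the textbook CMOS dynamic-power relation P ∝ C·V²·f with capacitance held constant. Real memory power also includes I/O termination and static components, so treat this as an idealized estimate, not a measured figure.

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio P_new / P_old under P ~ C * V**2 * f, with C assumed unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    same_clock = relative_dynamic_power(1.5, 1.8)
    print(f"GDDR4 @1.5 V vs GDDR3 @1.8 V, same clock: {same_clock:.3f}x")
    # Hypothetical: the GDDR4 parts run 25% faster than the GDDR3 parts.
    faster = relative_dynamic_power(1.5, 1.8, f_new=1.25)
    print(f"...and clocked 25% higher: {faster:.3f}x")
```

Under these assumptions the voltage drop alone cuts dynamic power by roughly 30% per chip, so the point can hold even with somewhat faster memory, although more chips (e.g. 12 instead of 8) would scale the total back up.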
I believe that the G80 will be:
10 Vertex Shaders 4.0
32 Pixel Shaders 4.0
A number of Geometry Shaders (I can't guess how many)
Ultrathreading like R5x0 series.
HDR+SSAA
Around 700MHz.
Ailuros
12-Sep-2006, 13:11
Memory to PCB? Or are you expecting to see 1 gigabyte graphic cards in the next gen? The 7950 GX2 is an odd card to compare due to its unusual makeup compared to most single core cards.
http://users.otenet.gr/~ailuros/estimate.jpg
There should be a minor peak until H2 2007 I guess. Anyway albeit a very rough estimate, it hasn't so far proven wrong (it probably helps to consider how power hungry a NV40@475MHz back then was).
Ailuros
12-Sep-2006, 13:13
HDR+SSAA
Current GeForces are already capable of combining Supersampling with float HDR; you just wouldn't want to see the resulting slideshow ;)
You're very pessimistic with the clocks and power for 2014, methinks.
Chalnoth
12-Sep-2006, 13:28
You're very pessimistic with the clocks and power for 2014, methinks.
If I remember correctly, that's actually an nVidia slide from a couple of years ago. I could be wrong, though. But regardless, bear in mind that physics places some pretty stringent limits upon clockspeeds. Just look at how Intel moved from the P4 to the Core 2 processors (clock speeds lowered by ~25%).
Well then nV is pessimistic, whatever. Just look at the rate the clocks climbed over the last few years and extrapolate to 2014. I mean, we may as well be using quantum computers or nanotubes until then :razz:
Either the clocks will climb further, or we'll see them all go multicore. I see no factual difference between the rise in clocks and the core-multiplication as far as the "oomph" goes in the long run. Though that's just me.
Well, don't hate me please :wink:, the INQ is whistling again today:
G80 late because of a respin [External PSU and November tech day] (http://www.theinquirer.net/default.aspx?article=34319)
They say that G80 went through another respin, so it might come out late and might need an external PSU (same for R600 too). If that is true, it's going to be a sad move :roll:.
Well, don't hate me please :wink:, the INQ is whistling again today:
G80 late because of a respin [External PSU and November tech day] (http://www.theinquirer.net/default.aspx?article=34319)
They say that G80 went through another respin, so it might come out late and might need an external PSU (same for R600 too). If that is true, it's going to be a sad move :roll:.
See? External PSUs here, "factory watercooled" . . .err. . .floated. . .somewhere else*. . . brutal. Yet there are other rumors (Dailytech, I think) that "next next gen" will start ramping power back into a more reasonable place. Which presumably points at Vista/DX10 as the culprit, which Ail is suggesting above.
65nm refreshes probably help quite a lot too, I'd guess.
Tho with NV, at least, part of what needs figuring out is what, if any, of this is "8950GX2" (no, I have no idea if that's a real code name, but you know what I mean) related on the most extreme end of the power rumors. Is there even a 8950GX2 in the cards pre 65nm refreshes?
*generic next-gen comments here
trinibwoy
12-Sep-2006, 14:32
I like this part best :grin:
The chip itself is another story, and from the initial info we have, it is going to be a weird beastie. So weird in fact that it deserves a bit of digging before we say for sure.
So we have big, slow, hot, late and now weird to add to the list of G80 characteristics!
If nothing else, "weird" stokes up the idea of the strange memory configuration...
Jawed
Although I'm usually all for integration, the external PSU would be welcomed in this particular case. Less heat in the box, no need to upgrade the PC PSU etc.
If nothing else, "weird" stokes up the idea of the strange memory configuration...
Jawed
Yeah, that does sniff like the aroma of 384-bit wafted past Charlie's nose too, and he's not quite willing to say it out loud yet. :lol: I bet he'd have gone with 512-bit as a rumor if he'd heard it, but I could see 384-bit as a "umm, second source please!" kind of reaction. :smile:
trinibwoy
12-Sep-2006, 14:55
I've been struggling to get rid of all the wires and cords behind my machine - don't want another one especially if it comes with a brick :(
I've been struggling to get rid of all the wires and cords behind my machine - don't want another one especially if it comes with a brick :(
Yeah, but that's still better than a new PSU for the PC, 20° higher temp in your case, additional loud fans and all that.
trinibwoy
12-Sep-2006, 14:59
From what I heard
(Rumors)
G80: 32x2 ALUs : 16 TMUs
Sample benchmarks indicate that G80 could be slower than or equal to the previous generation in numerous older games.
This round, better architecture and good clock scaling are equally important.
R600 will be quite a bit faster than G80 in the next 3DMark.
R600
1. Architecture is better than G80's, but speed scaling is currently stalled.
G80
1. Good architecture for the right time only, but needs better speed scaling to make up for its lack of advances (it has to reach 1GHz in case of a faster-clocked R600).
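For comparison with the ALU:TMU ratio discussed earlier in the thread, here is a quick sketch of the arithmetic, assuming G71 is counted as 24 pixel pipelines with two ALUs each and 24 TMUs, and taking the rumoured "32x2 ALUs : 16 TMUs" at face value (it is only a rumour).

```python
from fractions import Fraction


def alu_tmu_ratio(pipes, alus_per_pipe, tmus):
    """ALU:TMU ratio as an exact fraction (ALUs per TMU)."""
    return Fraction(pipes * alus_per_pipe, tmus)


if __name__ == "__main__":
    g71 = alu_tmu_ratio(24, 2, 24)        # 48 ALUs, 24 TMUs
    rumour = alu_tmu_ratio(32, 2, 16)     # 64 ALUs, 16 TMUs (rumour only)
    print(f"G71: {g71}:1, rumoured G80: {rumour}:1")
```

If the rumour were right, that would double the ALU load per TMU relative to G71, in line with the 4:1 figure some posters mention; as noted above, though, the ratio stops being meaningful if TMUs become a shared resource.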
I'm tempted to take this post very seriously since this cat is usually on the ball with early rumours. But I have a couple questions.
When you say R600 will be quite faster in 3DMark is that based on empirical evidence or just what you know of R600/G80 designs? If it's the latter then you surely must know more about those designs than you're letting us in on here? ;)
Could you expand on "R600 architecture is better than G80" please? :) Why is it better?
trinibwoy
12-Sep-2006, 15:01
Yeah, but that's still better than a new PSU for the PC, 20° higher temp in your case, additional loud fans and all that.
Well the external PSU is only gonna help with the first bit and I don't mind upgrading to a beefier PSU. You're still gonna have to deal with heat/cooling issues even with the external brick.
I'm tempted to take this post very seriously since this cat is usually on the ball with early rumours.
Well, there was the unfortunate "R600 is 65nm" thing. :smile: So not always.
trumphsiao
12-Sep-2006, 15:20
Well, there was the unfortunate "R600 is 65nm" thing. :smile: So not always.
Well, I extracted some data from an ATI roadmap. Actually, ATI did have 65nm/80nm R600 samples, same as Nvidia with either Chartered-made or TSMC-made G80 samples.
trinibwoy
12-Sep-2006, 15:48
Well, there was the unfortunate "R600 is 65nm" thing. :smile: So not always.
Well I didn't say always :) And I can't think of a more accurate (public) rumour source in recent times. I'm sure there are many whispers behind closed doors :D
INKster
12-Sep-2006, 15:49
I strongly doubt it, trumphsiao.
Nvidia does not trust Chartered Semi with anything more than a few low-end NV44s (GeForce 7100 GS), and all of a sudden they would just hand them a brand-new, top-of-the-line and complex GPU design such as G80?
Sorry, that doesn't compute.
Besides, ATI's marketing has had a way of "distorting things" in their recent X1950 XTX vs 7950 GX2 materials (probably taking cues from Nvidia and Apple :D).
I believe the true reason for such a delay is to hide as much as possible the *true* G80 config. from ATI, so that they have further difficulty adjusting the final R600 specs to match later on.
I believe the true reason for such a delay is to hide as much as possible the *true* G80 config. from ATI, so that they have further difficulty adjusting the final R600 specs to match later on.
NV was said to believe they had a 6 week advantage on ATI, and this would just take them into the same range (if they are right about that). It could be just as simple as they are happy with the way sales are going right now, and don't see any reason to turn over the boat in the top-end (since no one I've seen suggested that they are anywhere close to mid/low G8x parts yet) right now.
Wrong thread and all, but these days I'm leaning Dec-Jan for R600, btw.
What about a PSU in a 5.25 bay? It's been done before, and would probably be better than an external brick.
trumphsiao
12-Sep-2006, 16:38
NV was said to believe they had a 6 week advantage on ATI, and this would just take them into the same range (if they are right about that). It could be just as simple as they are happy with the way sales are going right now, and don't see any reason to turn over the boat in the top-end (since no one I've seen suggested that they are anywhere close to mid/low G8x parts yet) right now.
Wrong thread and all, but these days I'm leaning Dec-Jan for R600, btw.
Rumor:
November is the latest month for Vista/Office 2007 or G80 availability.
Mariner
12-Sep-2006, 16:38
What about a PSU in a 5.25 bay? It's been done before, and would probably be better than an external brick.
Haven't one or two companies pre-announced this type of device already? I'm pretty sure I saw a report of such devices earlier this year. Or am I just going nuts?
Haven't one or two companies pre-announced this type of device already? I'm pretty sure I saw a report of such devices earlier this year. Or am I just going nuts?
Yeah, at least two I've heard of in the last couple months.
INKster
12-Sep-2006, 16:49
Haven't one or two companies pre-announced this type of device already? I'm pretty sure I saw a report of such devices earlier this year. Or am I just going nuts?
No, you are not: :D
http://www.thermaltake.com/product/Power/PurePower/w0130/w0130.asp
and
http://www.thermaltake.com/product/Power/PurePower/w0099/w0099.asp
nutball
12-Sep-2006, 17:07
I wonder what implications this has for the parts below the top tier. Is this power hunger only applicable to the GTXXTXXTXXX++ZOMG!!1! level parts, or does it extend all the way down the range? If it does then I for one will be waiting for G85/R650.
I wonder what implications this has for the parts below the top tier. Is this power hunger only applicable to the GTXXTXXTXXX++ZOMG!!1! level parts, or does it extend all the way down the range? If it does then I for one will be waiting for G85/R650.
It'll probably be only for the first or second high-end parts (i.e. the X1800XT and X1900XT of the next generation). The mainstream/budget parts will simply have less action going on obviously, and later parts and tweaking of the G80/R600 core should also improve the power demand for the just-a-bit-under-high-end parts that're released later in the gen.
LeStoffer
12-Sep-2006, 17:49
We hear there was another spin in August, not huge, but a spin anyway, and first Silicon is due back any day now. If all goes well, that means another 3-4 months before boards are done, and you are looking at late November.
I highly doubt that this is correct. Nothing should have prevented nVidia from working on boards (and drivers BTW) with their first spin of the G80 however buggy it might have been, so the additional 3-4 months from getting the new spin back to mass production doesn't compute. The extra spin should not mean more than maybe 5-6 weeks of delay. :???:
Sunrise
12-Sep-2006, 17:55
When you say R600 will be quite faster in 3DMark is that based on empirical evidence or just what you know of R600/G80 designs? If it's the latter then you surely must know more about those designs than you're letting us in on here? ;)
Could you expand on "R600 architecture is better than G80" please? :) Why is it better?
:lol:
Seriously trinibwoy, he can't answer these questions. Normally I tend to be open to such things, but he keeps contradicting himself way too much. So the more he posts, the more I'm questioning it. No offence intended, of course; I'm just not a believer in things that are extremely far-fetched. We already knew it's 4:1 / 1:2:4 (talking to different people, getting different answers) of some kind, not to mention that (both) are on 80nm/90nm, extremely power hungry (given the extreme amount of logic, so you don't need evidence for it) and it will of course be very fast, otherwise it wouldn't make much sense for NV to build such a beast in the first place. 65nm G80? Err, what?! 1GHz core clock? You gotta be kidding me.
Rumor:
November is the latest month for Vista/Office 2007 or G80 availability.
You're reading way too much INQ. :wink:
NV has something planned in November all right, but if they really did a respin in August and silicon is still not back from the fab and they would need 3-4 months from now, there is no way they could ramp up this fast, let alone sell it in November.
What INQ completely forgot is that the PCB was ready long ago (3-4 months is too much, 1-2 months is more like it), but without silicon you won't be able to ship in volume, which should be obvious.
Anyone believing the G80 is not at least 500M transistors and a fair bit higher performance than G71 needs a good hit on his head for even daring to spread such FUD :p
Uttar
P.S.: I'm on vacation right now, back on friday. Actually wrote a nice little G80.txt with what I expect the chip to be. No time left to copy it here on this cybercafe, so I guess that in conclusion, each of you should send me 2 euros every time I go on vacation to pay for the trip... errr, sorry, I mean for the extra cybercafe fees to post this shit ;)
Just a reminder on the G80 release date bets:
geo : Nov 1-15
Uttar: October
Razor: October
:cool2:
Actually wrote a nice little G80.txt with what I expect the chip to be. :yep2:
I highly doubt that this is correct. Nothing should have prevented nVidia from working on boards (and drivers BTW) with their first spin of the G80 however buggy it might have been, so the additional 3-4 months from getting the new spin back to mass production doesn't compute. The extra spin should not mean more than maybe 5-6 weeks of delay. :???:
Also, if Nvidia is confident enough that their new spin will be volume-ready, they may actually initiate volume production several weeks before the "hot-samples" get back to the lab. It's all about risk management - if something is wrong then they have to toss or bin those chips but in the event that everything is okay they just saved themselves several weeks.
nutball
12-Sep-2006, 20:38
It'll probably be only for the first or second high-end parts (i.e. the X1800XT and X1900XT of the next generation). The mainstream/budget parts will simply have less action going on obviously, and later parts and tweaking of the G80/R600 core should also improve the power demand for the just-a-bit-under-high-end parts that're released later in the gen.
Right, but if this means that the 7900GT equivalent is 100W+ against the current <~70W then this is a problem as far as I'm concerned. I don't mind the e-penz0r brigade having to deal with stupid power requirements (they love that sort of thing because it means bigger numbers on their power supplies), but for the rest of the planet it's a bit silly if performance per watt drops in the next generation.
Also, if Nvidia is confident enough that their new spin will be volume-ready, they may actually initiate volume production several weeks before the "hot-samples" get back to the lab. It's all about risk management - if something is wrong then they have to toss or bin those chips but in the event that everything is okay they just saved themselves several weeks.
Yeah. Some time ago I heard an interesting corollary of this point. We've all noticed from time to time that some high-end releases have good avail at launch, then a lull where it is hard to find, then good avail again. Typically, the community has pointed at "large initial demand" for this phenomenon and the IHVs of course are not exactly incentivized to dispute that. . . but what you are pointing at above is often the real cause, or at least as large a contributor, as initial demand. What's getting sold at first is that initial sampling run, and how much confidence (which is typically going to be less the more serious the change from the last chip was) there is will have an impact on how big that is. . .then a bit of a sag as the "real" ramp gets going.
http://www.hkepc.com/bbs/itnews.php?tid=667118&starttime=0&endtime=0
Guess Chartered and UMC aren't going to be making G80; even if they do, it will be a very small portion.
UMC hasn't made high end nV chips yet have they?
http://www.theinquirer.net/default.aspx?article=34319
Respin rumors....
trinibwoy
13-Sep-2006, 00:41
You're a bit late there bud ;)
Brimstone
13-Sep-2006, 00:59
What clock rate do you guys figure?
600 MHz for the dual core G80.
500 MHz for the single core PS3 RSX.
600 MHz for the dual core G80.
500 MHz for the single core PS3 RSX.
Dude, you'll never need to buy another beer in a Sony bar for as long as you live if you hit with that one. :lol:
But I can't advise you to be saving up your thirst. :razz:
silhouette
13-Sep-2006, 01:26
http://www.beyond3d.com/forum/showthread.php?t=33605
Download the package related to graphics, open the "under the hood - revving up shader performance.ppt" document, and check out the comments on slide 3. ;)
Chalnoth
13-Sep-2006, 01:56
Holy hell, 500MB for presentations!?
Good thing I'm at work, this puppy's downloading at about 5MB/sec :)
Edit:
And the meat is:
The NVIDIA D3D10 part, the G80, will have 48 pixel processors and unknown # of vertex processors. Not unified.
Bombshell?
Intel GMA 3000 has unified shader core, including dynamic load balancing
The NVIDIA D3D10 part, the G80, will have 48 pixel processors and unknown # of vertex processors. Not unified.
Jawed
Skrying
13-Sep-2006, 02:13
Oh so very interesting.
And with that, the idea that G80 may be slower than the 7950GX2 in older games should be dead for good. :yep2:
OP updated.
Now. 48ps is 48x2, as would be the current counting? Or 48x1, and TMU is handled elsewhere? :grin:
OP updated.
Now. 48ps is 48x2, as would be the current counting? Or 48x1, and TMU is handled elsewhere? :grin:
Well they said pixel processors, so maybe just 48x1
OP updated.
Now. 48ps is 48x2, as would be the current counting? Or 48x1, and TMU is handled elsewhere? :grin:
It's my bedtime...
Did anyone work out how many transistors the ALUs of G71 consume? Or the entire pipe?
Scaling just the ALUs up by x2 gives us how many transistors? What effect does the predicted decoupling of the TMUs have? Oh and howsabout the integer functionality that needs to be added? Is it even worth bothering trying to estimate whether 48 pipes could fit into 500M transistors? Seems way too complicated.
The next slide has a misunderstanding, mixing up pixels for quads in the comments (in my opinion), and there are a lot of general typos throughout, so the deck looks fairly rushed. 48 pipes could easily be misconstrued, unfortunately...
Jawed
Didn't nVidia say that G71 has 48 pixel processors too when they were actually talking about the ALU count?
Skrying
13-Sep-2006, 02:53
A G71 with added functionality for Dx10 and the stuff needed to be added for a few new AA types?
A G71 with added functionality for Dx10 and the stuff needed to be added for a few new AA types?
I'm not down with that yet. Okay, Jen-Hsun likes to brag about his R&D budget, but he spent a sh*t-load of cash on G80, and I'm still thinking it is more ambitious than that. I had a sig that said that at one time, and I'm sticking with it.
Okay, based on absolutely nothing source-wise at all, I'm going to guess those 48 PS are not x2, and that texturing/filtering has been completely redesigned.
Skrying
13-Sep-2006, 03:03
I'm not down with that yet. Okay, Jen-Hsun likes to brag about his R&D budget, but he spent a sh*t-load of cash on G80, and I'm still thinking it is more ambitious than that. I had a sig that said that at one time, and I'm sticking with it.
I personally hope that's not it either. Though I don't see it having a lot more pixel processors than G71 already does. I think Nvidia has a lot of features to implement that will be very costly in the transistor department.
It could explain why G80 could be slower than their current high end part in older games...
G71 + redesigned memory controller (possibly 384-bit) + faster dynamic branching/better physics performance + FP16 HDR with AA support + better AA and AF (non-angle-dependent) + D3D10 support... how many transistors would all that add to a G71, which has around 278M? Would it push it into the range of 500M?
INKster
13-Sep-2006, 03:20
It could explain why G80 could be slower than their current high end part in older games...
G71 + redesigned memory controller (possibly 384-bit) + faster dynamic branching/better physics performance + FP16 HDR with AA support + better AA and AF (non-angle-dependent) + D3D10 support... how many transistors would all that add to a G71, which has around 278M? Would it push it into the range of 500M?
There's no reason to assume that they didn't take anything out either. ;)
Plenty of legacy still in there that probably won't be needed anyway in the near future.
Well, if the PureVideo engine is around 75 million transistors, each G71 pipeline will be around 8.3 million transistors (including vertex shaders, of course). Going by that, 32 pipelines fit into that nicely: 265 million (includes vertex and geometry shaders) + PureVideo + D3D10 + modifications to their memory controller, and other modifications to ROPs, TMUs, etc.
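The back-of-the-envelope transistor budget above can be reproduced like this, taking the poster's own figures (about 8.3M per G71 pipeline, about 75M for PureVideo) as given and the thread's 500M+ figure as the total; none of these are confirmed numbers. As a cross-check, (278 - 75) / 24 works out to roughly 8.5M per pipeline, so the 8.3M estimate is in the right ballpark.

```python
def estimated_transistors_m(per_pipe_m, pipes, fixed_m):
    """Total in millions: scaled pipelines plus fixed-function blocks."""
    return per_pipe_m * pipes + fixed_m


if __name__ == "__main__":
    per_pipe_from_g71 = (278 - 75) / 24          # ~8.46M, cross-check of the 8.3M figure
    base = estimated_transistors_m(8.3, 32, 75)  # 32 pipes + PureVideo
    print(f"G71-derived per-pipe cost: {per_pipe_from_g71:.2f}M")
    print(f"32 pipes + PureVideo: {base:.1f}M")
    print(f"Left for D3D10, memory controller, ROP/TMU changes: {500 - base:.1f}M")
```

So under these assumptions roughly 160M transistors would remain for the new D3D10 functionality and the other changes listed, before counting any legacy logic that might be removed.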
There's no reason to assume that they didn't take anything out either. ;)
Plenty of legacy still in there that probably won't be needed anyway in the near future.
Ahhhh, that NV1 core is out finally then? :razz:
INKster
13-Sep-2006, 04:04
Ahhhh, that NV1 core is out finally then? :razz:
Oh, no, not thaaaat. ;)
Jen-Hsun's prerogative, NV1 is off-limits. :lol:
I hear he still collects them and all. The dream never dies...
Chalnoth
13-Sep-2006, 04:46
I'm not down with that yet. Okay, Jen-Hsun likes to brag about his R&D budget, but he spent a sh*t-load of cash on G80, and I'm still thinking it is more ambitious than that. I had a sig that said that at one time, and I'm sticking with it.
Okay, based on absolutely nothing source-wise at all, I'm going to guess those 48 PS are not x2, and that texturing/filtering has been completely redesigned.
Well, it may not be full x2, but I'd be willing to bet that it will at least be ALU + mini-ALU. Consider, after all, that this puppy will have something like twice the transistors of the G70, and I don't see why it should have only the same ALU power.
Skrying
13-Sep-2006, 04:53
Well, it may not be full x2, but I'd be willing to bet that it will at least be ALU + mini-ALU. Consider, after all, that this puppy will have something like twice the transistors of the G70, and I don't see why it should have only the same ALU power.
Yeah, but it's not like the features and functionality they need to add are going to be small; in fact, they're going to take a huge number of transistors.
It's certainly not going to have double, at least not IMO. Maybe 1.25x~1.5x as much, but no more.
Ailuros
13-Sep-2006, 05:16
If I remember correctly, that's actually an nVidia slide from a couple of years ago. I could be wrong, though. But regardless, bear in mind that physics places some pretty stringent limits upon clockspeeds. Just look at how Intel moved from the P4 to the Core 2 processors (clock speeds lowered by ~25%).
The slide is from NVIDIA (reminder: it is merely an estimate) and it's about two years old (well you can see where it starts out). Now look at the transistor bar and estimate 2006.
Well, it may not be full x2, but I'd be willing to bet that it will at least be ALU + mini-ALU. Consider, after all, that this puppy will have something like twice the transistors of the G70, and I don't see why it should have only the same ALU power.
Well, I said 48 *and* a new TMU/filtering engine. So that 48 would be a full 48, without giving up whatever percentage they give up today of their current 48 for TMU addressing, and all of the same capability, which apparently is not quite true today.
But that's a serious WAG. Point is, no I'm not expecting the same ALU power for ps processing.
Looking at NV's history, I do not believe that they are going to give up a serious DX9 boost to go with whatever they deliver (functionally complete, I'd think) on the DX10 side.
Ailuros
13-Sep-2006, 05:37
Well, it may not be full x2, but I'd be willing to bet that it will at least be ALU + mini-ALU. Consider, after all, that this puppy will have something like twice the transistors of the G70, and I don't see why it should have only the same ALU power.
Of course this is just pure speculation, but has anyone bothered yet to calculate different scenarios with hypothetical "48 ALUs" in mind? Does anyone know yet what each ALU is really capable of? It can easily come down, again, to how someone counts ALUs, and these are of course just rumours.
But let's play with numbers: assume it will have 48 ALUs capable of 16 FLOPs each; at a 600MHz frequency that gives a theoretical maximum of 460+ GFLOPs from the PS ALUs alone.
A 7950GX2 has a theoretical maximum of 384 GFLOPs, from which I'd cut immediately 10-15% off for texture OPs.
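Ailuros's numbers are just units × FLOPs per clock × clock. A minimal sketch, assuming the hypothetical 48 ALUs at 16 FLOPs each and 600 MHz from the post, and the quoted 384 GFLOPs for the 7950 GX2 with a 10-15% cut for texture ops:

```python
def peak_gflops(units: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak = units x FLOPs per unit per clock x clock, in GFLOPs."""
    return units * flops_per_clock * clock_mhz / 1000.0

# Hypothetical G80 PS configuration from the post: 48 ALUs, 16 FLOPs each, 600 MHz.
g80_ps = peak_gflops(48, 16, 600)

# 7950 GX2 figure quoted in the post, minus 10-15% for texture-address work.
gx2_peak = 384.0
gx2_low = gx2_peak * (1 - 0.15)
gx2_high = gx2_peak * (1 - 0.10)

print(f"hypothetical G80 PS peak: {g80_ps:.1f} GFLOPs")
print(f"7950 GX2 after texture-op cut: {gx2_low:.1f}-{gx2_high:.1f} GFLOPs")
```

As the post warns, the comparison hinges on the assumed FLOPs per ALU: halve that to 8 and the hypothetical chip drops below the trimmed GX2 figure.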
None of the above has to make sense, of course; all I'm trying to say is that without knowing what each unit is capable of, it's easy to get on the wrong track. One thing I'd like to note is that IMHO neither of today's ultra-high-end GPUs seems capable (due to various bottlenecks) of delivering its high theoretical, on-paper floating-point power.
When fundamental architectural changes occur (probably even more so for R6x0 being a USC), chances are high that arithmetic efficiency is a lot higher too. That should have been the goal IMO anyway years ago for both IHVs, when they defined the concepts for D3D10 GPUs.
Ailuros
13-Sep-2006, 05:44
Looking at NV's history, I do not believe that they are going to give up a serious DX9 boost to go with whatever they deliver (functionally complete, I'd think) on the DX10 side.
Given high chip complexity, power consumption and whatever else, I'd be willing to bet that it'll turn out to be a rather general trend. IHVs will later have good reason to convince us that +1 and +2 are of course way more efficient solutions for D3D10.
Did any of you see any signs of a "great GS" that I have missed so far? The messages addressed to developers sound rather like "handle with care" to me, from all sides. One might argue for "slightly better" or "slightly worse", but more often than not some things don't cut it for developers if they're merely "better than useless". They'll experiment, and use the whole enchilada full-scale when they get something "useful" from both sides.
Given high chip complexity, power consumption and whatever else, I'd be willing to bet that it'll turn out to be a rather general trend. IHVs will later have good reason to convince us that +1 and +2 are of course way more efficient solutions for D3D10.
Y'know, I don't know I agree with that. I really don't. There is a part of me that thinks there is a serious chance of a significant split on the performance side between IHVs, with NV having a decided edge in DX9 and ATI having a decided edge in DX10. Maybe not gross "ZOMG!" kind of "decided", but noticeable across a wide swath on both sides.
Why? That's harder, and mostly inferred. In part, because we heard long, long ago from the horse's mouth that ATI had done all the heavy lifting for R600 with Xenos and R5xx, and yet where is R600? Where is even a reliable schedule for R600? I'm left thinking that it's not an unreasonable thing to wonder if they might have (enough qualifiers there? :smile: ) made a decision that they don't want it on the scene until it has Vista and DX10 to go with it, so people can really appreciate its competitive posturing vs G80. If they thought it would show well against G80 on DX9 alone, that wouldn't be a factor. Yet NV seemed quite willing and eager to get G80 out the door significantly before Vista if the "bring up" Ghods had smiled.
And no, nobody has even hinted that analysis at me. Like I said --inferred.
Did any of you see any signs of a "great GS" that I have missed so far? The messages addressed to developers sound rather like "handle with care" to me, from all sides. One might argue for "slightly better" or "slightly worse", but more often than not some things don't cut it for developers if they're merely "better than useless". They'll experiment, and use the whole enchilada full-scale when they get something "useful" from both sides.
Well, I think we had a thread on that when you were absent for awhile. It appears there is some serious api design limitation to what GS can do in the first place, no matter how much resource you're throwing at it. Of course, this is *me* saying this, so you may confidently assume a gross oversimplification. :smile:
Cripes! I'm still rereading and researching a day later so as not to overly annoy/amuse those of you who actually know what you're talking about. Ah, the hell with it....
Do I hear a second for any of these? :smile: Trying to keep it consensus-y.
Chose the wording of > 256-bit carefully, to allow for either 384 or 512. :wink: One second-hand reported "spotting" does not spring make. Or something like that. Show me a piccie, and I'm in. :)
You're right, of course. And I don't believe it, but apparently that 16 GS/VS rumo(u)r came to me from The Inq via Xbit (http://www.xbitlabs.com/news/video/display/20060703090833.html). Maybe I remembered it from the previous thread, but maybe Uttar shot it down in toying with nAo (http://www.beyond3d.com/forum/showthread.php?p=789473#post789473). (Given the confirmation of 48 PPs, he may have been shooting down just the pixel part.) It doesn't sound that far-fetched, though, considering R600 should have a poop-ton more VS/GS power available thanks to its USA, so NV may want to beef up that aspect. Then again, there's always the dynamic of an IHV forcing its weakness onto the market to basically waste the other's strength.
Apropo to the latest 48 PP rumor/confirmation, does this (http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1=%2220060149803%22.PGNR.&OS=DN/20060149803&RS=DN/20060149803) ("Multipurpose functional unit with multiply-add and format conversion pipeline") point to 48x1 pixel processors (props to nAo and Demi (http://www.beyond3d.com/forum/showthread.php?p=787876#post787876))? Does it at least shoot down the theory that they'd need separate ALUs for DX10/INT functionality?
Xenos has 16 point + 16 bilinear TMUs. Is it safe to assume R600 will continue this distinction? If so, and if G80 has discrete VS/GS and PS units, would it need decoupled, specialized TMUs a la Xenos? Or does VS/GS work lend itself to point-sampling and PS work to bilinear sampling, keeping "coupled" TMUs around another generation? Basically, would both the VS/GS units benefit from bilinear and the PS make use of point TMUs?
If we're considering coupled TMUs (and as long as I'm throwing all that at the wall), I suppose it's also worth considering whether 24 (24x2 pixel "processors") or 48 (48x1) TMUs are reasonable at this point in time (not including anything the VS/GS processors may have).
Ailuros
13-Sep-2006, 06:26
Y'know, I don't know I agree with that. I really don't. There is a part of me that thinks there is a serious chance of a significant split on the performance side between IHVs, with NV having a decided edge in DX9 and ATI having a decided edge in DX10. Maybe not gross "ZOMG!" kind of "decided", but noticeable across a wide swath on both sides.
Why? That's harder, and mostly inferred. In part, because we heard long, long ago from the horse's mouth that ATI had done all the heavy lifting for R600 with Xenos and R5xx, and yet where is R600? Where is even a reliable schedule for R600? I'm left thinking that it's not an unreasonable thing to wonder if they might have (enough qualifiers there? :smile: ) made a decision that they don't want it on the scene until it has Vista and DX10 to go with it, so people can really appreciate its competitive posturing vs G80. If they thought it would show well against G80 on DX9 alone, that wouldn't be a factor. Yet NV seemed quite willing and eager to get G80 out the door significantly before Vista if the "bring up" Ghods had smiled.
And no, nobody has even hinted that analysis at me. Like I said --inferred.
For both IHVs' sake, both GPUs will have to be huge D3D9 powerhouses, since apparently they cannot get enough floating-point power for upcoming game engines like UE3. The slight difference being that I wouldn't call any upcoming game a D3D10 game, no matter what the box states or what API any game might require to even start (well, that's obviously a totally different chapter).
Both IHVs knew when D3D10 was set into stone what is coming and both obviously made their forecasts and according design decisions. Frankly I don't expect any ultimate D3D10 parts for the first round and before either/or get fully analyzed it's damn hard with the few sparse speculations to decide yet which part is more efficient for what.
Well, I think we had a thread on that when you were absent for awhile. It appears there is some serious api design limitation to what GS can do in the first place, no matter how much resource you're throwing at it. Of course, this is *me* saying this, so you may confidently assume a gross oversimplification. :smile:
There you might want to ask someone who was present during the various D3D10 drafts of the past how many IHVs were screaming, and how loudly, that the requirements might end up too high. I'm probably the only one who recalls a tessellation unit proposed in early DirectX Next drafts, which turned into something fixed-function and optional, and then disappeared altogether by the end of the process. APIs do not get defined by Microsoft alone; it's rather an endless Microsoft + IHV counter-negotiation until they reach some common ground.
Thus IMHO the API limitations didn't come out of the blue; they came from the very same IHVs, which knew they could reach transistor budget X and nothing beyond it.
***edit: chicken-egg dilemmas are weird I know ;)
Well, I would never suggest that either G80 or R600 will not be "DX9 powerhouses" compared to their predecessors. I'm wondering how they do against each other, and if that might be yin/yang DX9 vs DX10.
Here's an odd but related thought. Is there a possibility that WDDM2.0, with its overhead improvements, could provide a competitive advantage in DX9 on Vista to ATI's USC? Could ATI bursting-at-64ps (rumored) be better able to leverage those overhead improvements than NV (rumored) at a constant 48ps? Or with the VS, similarly, if probably with a different (but still smaller than ATI can burst to) number for NV? The overhead improvements are not limited to "D3D10 mode", right? They are inherent in the driver model for Vista? Anything there potentially, or geo wandering in the wilderness again? :wink:
Tim Murray
13-Sep-2006, 07:06
600 MHz for the dual-core G80.
500 MHz for the single-core PS3 RSX.
If you're implying that RSX will be G80... I am going to explode. I will go ahead and change my nick to Tsar Bomba and everything. (Please just say it was a joke. I'll sleep better.)
Also, I think there was some consensus that G80 is not going to be as anemic in the VS department as it might first seem compared to R600's unified architecture.
dizietsma
13-Sep-2006, 10:29
Your timescales seem completely off, if you're talking about the launch version of G80. Process is now fixed and the R&D for the chip was finished a long time ago. It's just about to enter MP for chrissakes, the chip is utterly done.
My understanding was that 80nm was in production at TSMC from April, and as it is just a shrink it supports most 90nm libraries and IP anyway, so it would not take much time to convert over? That's assuming November, as most people think, for G80, and soon for 7600.
Why? That's harder, and mostly inferred. In part, because we heard long, long ago from the horse's mouth that ATI had done all the heavy lifting for R600 with Xenos and R5xx, and yet where is R600? Where is even a reliable schedule for R600? I'm left thinking that it's not an unreasonable thing to wonder if they might have (enough qualifiers there? :smile: ) made a decision that they don't want it on the scene until it has Vista and DX10 to go with it, so people can really appreciate its competitive posturing vs G80. If they thought it would show well against G80 on DX9 alone, that wouldn't be a factor. Yet NV seemed quite willing and eager to get G80 out the door significantly before Vista if the "bring up" Ghods had smiled.
I dare say R520/R580 are all the evidence you need that ATI is going for D3D10 performance not DX9 performance.
Well, I think we had a thread on that when you were absent for awhile. It appears there is some serious api design limitation to what GS can do in the first place, no matter how much resource you're throwing at it. Of course, this is *me* saying this, so you may confidently assume a gross oversimplification. :smile:
I'd say it's the converse: the API is limited because the hardware to implement the ideal GS functionality turns out to be a huge, complicated beast (best left till 65nm?...), so the API has been curtailed in order not to stretch the IHVs too far (remember: no caps). I ain't afraid to say I perceive one IHV as seriously trailing on this (look at Xenos...), which increased the pressure to curtail the API.
Jawed
Xenos has 16 point + 16 bilinear TMUs. Is it safe to assume R600 will continue this distinction?
I think so.
If so, and if G80 has discrete VS/GS and PS units, would it need decoupled, specialized TMUs a la Xenos? Or does VS/GS work lend itself to point-sampling and PS work to bilinear sampling, keeping "coupled" TMUs around another generation? Basically, would both the VS/GS units benefit from bilinear and the PS make use of point TMUs?
In old discussions about vertex shading there seemed to be a consensus that filtered textures would be of very limited use in vertex shading. Well, that's how I remember it. But patent evidence seems to point towards shared TMUs amongst VS/GS/PS...
If we're considering coupled TMUs (and as long as I'm throwing all that at the wall), I suppose it's also worth considering whether 24 (24x2 pixel "processors") or 48 (48x1) TMUs are reasonable at this point in time (not including anything the VS/GS processors may have).
The patent about a special function ALU that is also an interpolator (using those big look-up tables that are dual-purpose) hints that at least one ALU will retain its duties in texture address calculation.
The way I see it, the second ALU (i.e., like G7x) will gain integer functionality. There's no point in putting integer functionality into every ALU in a GPU (if you can avoid it) because integer functionality will prolly have fairly limited use. At least in the early days of D3D10.
That's just off-the-cuff thinking though, I've only just decided this.
Jawed
Mariner
13-Sep-2006, 11:31
If you're implying that RSX will be G80... I am going to explode. I will go ahead and change my nick to Tsar Bomba and everything. (Please just say it was a joke. I'll sleep better.)
Brimstone has stated his belief in the G80/RSX linkage in various threads in the Console forums which is a pretty good indication that he's not joking! He's either got the scoop of the century or (almost certainly) is just utterly wrong! :smile:
What's getting sold at first is that initial sampling run, and how much confidence (which is typically going to be less the more serious the change from the last chip was) there is will have an impact on how big that is. . .then a bit of a sag as the "real" ramp gets going.
As long as the pinning stays the same after the respin, I see no problems there whatsoever.
Here's an odd but related thought. Is there a possibility that WDDM2.0, with its overhead improvements, could provide a competitive advantage in DX9 on Vista to ATI's USC? Could ATI bursting-at-64ps (rumored) be better able to leverage those overhead improvements than NV (rumored) at a constant 48ps? Or with the VS, similarly, if probably with a different (but still smaller than ATI can burst to) number for NV? The overhead improvements are not limited to "D3D10 mode", right?
Because of Xenos's support for 8 concurrent render states I think it's highly likely that later WDDM functionality for concurrent render states will be in R600. It could be only partial support, since WDDM functionality seems to be more general than what's in Xenos.
I dunno enough about concurrent render states to know if a GPU can safely overlap render states while doing DX9 work (which would further decrease the "call overhead" that DX9 suffers with).
Jawed
Brimstone has stated his belief in the G80/RSX linkage in various threads in the Console forums which is a pretty good indication that he's not joking! He's either got the scoop of the century or (almost certainly) is just utterly wrong! :smile:
:lol:
As long as the pinning stays the same after the respin, I see no problems there whatsoever.
Neither do I. I was just pointing at one explanation for why high-end cards sometimes have good initial availability at launch, and then dry up for a bit.
PeterAce
13-Sep-2006, 13:00
Architectural efficiency improvements, and the leaning toward higher floating-point pixel shader performance that the IHVs are aiming for with these new-gen D3D10 GPUs, will surely also help out in DX9.
What's good for the goose is good for the gander? (wow... that last part was geo-like!).
trinibwoy
13-Sep-2006, 14:34
It could explain why G80 could be slower than their current high end part in older games...
G71 + redesigned memory controller (possibly 384-bit) + faster dynamic branching/better physics performance + FP16 HDR with AA support + better AA and AF (non angle dependent) + D3D10 support... how many transistors would all that add to a G71, which has around 278M? Would it push it into the range of 500M?
I find this hilarious. You're basically saying G80 = G71 + change-everything-in-G71 + more features :lol: You can't improve everything about a chip and then say the result is old-chip + stuff. This whole unified shader thing is really reshaping everyone's opinions of what a new architecture should look like.
trinibwoy
13-Sep-2006, 15:04
There is a part of me that thinks there is a serious chance of a significant split on the performance side between IHVs, with NV having a decided edge in DX9 and ATI having a decided edge in DX10.
I'm not sure I follow this. What does "decided edge in DX10" really refer to? Synthetic geometry shader tests and the like? It seems that whoever is faster in the early DX10 titles will also be faster in the latest DX9 titles. The demands aren't going to change much - it's still going to be about high shader and memory performance.
Unified shading hardware isn't something that magically improves DX10 performance simply because the API is unified. If it works, it will work just as well for DX9.
I think geo is implying that nV will have some disadvantages in DX10 because of the architecture, which will favor the DX9 stuff more and the opposite for ATI.
My guess is (I feel the flames rising already...) that G80 will stomp all over R600 in everything but a few synthetic tests. Based solely on my gut feeling and no facts at all.
trinibwoy
13-Sep-2006, 15:39
I think geo is implying that nV will have some disadvantages in DX10 because of the architecture, which will favor the DX9 stuff more and the opposite for ATI.
I got that bit. But I guess what I'm asking is what is it about DX9 that could possibly facilitate that advantage. I can't think of any (dis)advantage that a unified design may have in DX10 that won't also translate to an (dis)advantage in DX9. Really looking for some insight here .....
Just off the top of my head, I guess a "real" unified DX10 architecture will work with DX9 content through some sort of wrapper. That in itself would slow it down significantly, if that's the case. And writing new drivers from scratch has its pains as well.
EDIT: as well as benefits like getting rid of the legacy stuff, so who knows?
trinibwoy
13-Sep-2006, 16:00
Hmmm, I don't know. Wouldn't any wrapper approach be at the driver level and affect any DX10 hardware - unified or not? It's the driver that has to translate API calls into hardware instructions, so I don't see why extra overhead would only apply to a unified architecture. It's not like Nvidia gets DX9 for free with G80.
But since I don't know anything about this stuff I might be completely off :)
I got that bit. But I guess what I'm asking is what is it about DX9 that could possibly facilitate that advantage. I can't think of any (dis)advantage that a unified design may have in DX10 that won't also translate to an (dis)advantage in DX9. Really looking for some insight here .....
Wasn't there a quote from someone at ATI that mentioned that the Xenos was faster than an equivalent (# of shaders) non-unified design?
edit. found it. (http://www.xbitlabs.com/news/video/display/20060525104243.html)
Mr. Huddy said that the Xbox 360 game console, which sports the ATI-developed Xenos graphics core with a unified shader architecture and 48 shader processors, loses 20% to 25% performance in pixel-shader-limited games when its graphics chip is configured as non-unified, i.e., 16 processors work strictly on vertex shaders, whereas 32 are assigned to pixel shaders.
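One way to see where a 20-25% figure could come from is a toy bottleneck model. This is an illustration only, not ATI's methodology: it assumes identical shader units, ideal zero-overhead scheduling for the unified pool, and the fixed 16 VS / 32 PS split from the quote. The split configuration is paced by whichever pool finishes last; the unified pool simply divides the total work by 48.

```python
def split_throughput(f_vertex: float, v_units: int, p_units: int) -> float:
    """Relative throughput of a fixed VS/PS split for one unit of total work.
    f_vertex is the (assumed) share of shader work that is vertex work;
    the slower of the two pools sets the pace."""
    time = max(f_vertex / v_units, (1.0 - f_vertex) / p_units)
    return 1.0 / time

def unified_throughput(units: int) -> float:
    """Idealised unified pool: every unit takes whatever work is queued."""
    return float(units)

for f in (1 / 9, 1 / 6):
    loss = 1 - split_throughput(f, 16, 32) / unified_throughput(48)
    print(f"vertex share {f:.3f}: 16V+32P loses {loss:.0%} vs 48 unified")
```

Under those assumptions a 25% loss corresponds to vertex work being about 1/9 of the total, and a 20% loss to about 1/6, both comfortably in pixel-limited territory.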
I'm not sure I follow this. What does "decided edge in DX10" really refer to?
No different from G71 being better at backward-looking TMU-intensive pixel shaders and R580 being better at forward-looking ALU-intensive shaders (as well as dynamic branching).
The "advantages" that R580 has for SM3 code are still barely relevant in current games. Whilst R580 does generally show better performance than G71 (at least in the most popularly benchmarked games/resolutions with high-AA/AF), there are no laurels for R580's theoretical advantages. No sign of any, either.
Synthetic geometry shader tests and the like?
Yeah, R600 is prolly on a hiding to nothing if it ends up touting the highest (or significantly higher) SM4 performance, because we won't have any games to enjoy that with, or they'll be limited options, e.g. like z-feathered smoke in CoD2. SM4 prolly won't be a compelling option for gamers till R700 hits...
It seems that whoever is faster in the early DX10 titles will also be faster in the latest DX9 titles. The demands aren't going to change much - it's still going to be about high shader and memory performance.
That didn't happen with the SM1.x-to-SM2 transition.
Unified shading hardware isn't something that magically improves DX10 performance simply because the APi is unified. If it works it will work just as well for DX9.
But DX9 games tend to play to G71's strengths, which exclude intensive ALU code and dynamic branching - two features of SM3 that are barely capitalised-upon. So how's SM4, which is more of the same (and ignoring D3D10's architectural wonders that won't generally be in games for another 18 months), going to magically transform games whose SM2/3 performance is mainly bandwidth-bound?
G80 is prolly going to be better at texture-intensive stuff than R600. The signs are that R600 will have no more texturing capability than R580. Whereas it's considered unlikely that G80 will have merely 24 filtering TMUs. If R600 measures better at ALU-intensive shaders, is that going to make any real difference to SM2/3 games? R580 hasn't benefitted dramatically.
The other side of the coin with texturing, for example, is how effectively do the competing architectures use available bandwidth? If texture filtering quality is mandated by M$ (reduced angle-dependency, minimal brilinear artefacts, etc.) are we likely to see a more-level playing field? Does ATI's experience with the ring-bus, out-of-order threading etc. mean that second-generation texturing in R600 will have a significant advantage per byte of bandwidth?
The other main factor in DX9 performance/configuration differentiation between G80/R600 is the ROPs. ATI is obviously ahead now, but NVidia is making noises about spending way more transistors (or at the very least going an extra generation forwards in terms of features) in the next generation, while ATI will prolly be consolidating based upon what's in R5xx.
It might end up that with games commonly TMU-/bandwidth-/ROP-bound, efficient texturing or ROP will be all that sets the two apart. Nothing at all to do with D3D10 or unification. If one architecture spends significantly more transistors on texturing or ROP to the detriment of D3D10 performance, while the other architecture reverses those priorities, aiming at performant D3D10 features, then it seems there's a strong case that DX9 performance will be no indicator of D3D10 performance.
Still, it'll be fun to talk about these architectures and being in PCs we'll have plenty of ways to play with them and compare, unlike the mostly locked-away Xenos.
Jawed
L'Inq not really the wiser:
http://www.theinquirer.net/default.aspx?article=34359
In any case, NV is heavily downplaying the DX10 performance, instead shouting about DX9 to anyone who will listen.
Jawed
LeStoffer
13-Sep-2006, 16:55
The first thing is I was a little off on the dates, the tech day now seems more likely in late October with a launch in early November.
Indeed.
First is the arrangement of the chip: physically we are hearing that it is 2 * 2 cm, or about a 400mm² die. Ouch. One of the reasons it is so big is the whole dual-core rumor that has been floating around.
Oh no, not again! :wink:
trinibwoy
13-Sep-2006, 16:58
Wasn't there a quote from someone at ATI that mentioned that the Xenos was faster than an equivalent (# of shaders) non-unified design?
Yes, but that comparison is using the same API - which is exactly what I'm saying: if unified is faster than discrete on DX10, it should also be faster than discrete on DX9.
trinibwoy
13-Sep-2006, 17:03
http://www.theinquirer.net/default.aspx?article=34359
I love it! 48 DX9 and 48 DX10 shaders? Keep your heads down my friends - the FUD is flying once again. Let the good times roll ! :lol:
Yes, but that comparison is using the same API - which is exactly what I'm saying: if unified is faster than discrete on DX10, it should also be faster than discrete on DX9.
Unified should be faster: if the same amount of calculation can theoretically be done by either a discrete or a unified design, the discrete design will only match the unified one in ideal situations (as long as the overhead of the unified scheduler is low).
trinibwoy
13-Sep-2006, 17:28
I'm getting a bit lost in the DX10 comparisons. When people say DX10 performance are they referring to new functionality or "more widespread use of DX9 functionality that was ignored by developers last generation" ?
No different from G71 being better at backward-looking TMU-intensive pixel shaders and R580 being better at forward-looking ALU-intensive shaders (as well as dynamic branching).
Well in that context DX9 and DX10 aren't any different. Developers can choose to write TMU intensive shaders in DX10 as well. So I still don't see how a design can be inherently faster in one API while slower in the next.
That didn't happen with the SM1.x-to-SM2 transition.
We all know it's not that simple. SM2 asked very different questions of the core shader pipeline than SM1 did and Nvidia failed to answer - DX10 and DX9 are not as different in that regard. So that's not going to happen unless G80 is NV30 part deux.
G80 is prolly going to be better at texture-intensive stuff than R600. The signs are that R600 will have no more texturing capability than R580. Whereas it's considered unlikely that G80 will have merely 24 filtering TMUs. If R600 measures better at ALU-intensive shaders, is that going to make any real difference to SM2/3 games? R580 hasn't benefitted dramatically.
Honestly, that makes no sense. Texture workload is obviously going to increase in the future. ATi is not going to design an inherently bottlenecked architecture that blazes through arithmetic and waits idly by for TMUs to do their thing.
The other main factor in DX9 performance/configuration differentiation between G80/R600 is the ROPs. ATI is obviously ahead now, but NVidia is making noises about spending way more transistors (or at the very least going an extra generation forwards in terms of features) in the next generation, while ATI will prolly be consolidating based upon what's in R5xx.
What do ROPs have to do with DX9/DX10 differentiation? I'm asking a serious question here.
It might end up that with games commonly TMU-/bandwidth-/ROP-bound, efficient texturing or ROP will be all that sets the two apart. Nothing at all to do with D3D10 or unification. If one architecture spends significantly more transistors on texturing or ROP to the detriment of D3D10 performance, while the other architecture reverses those priorities, aiming at performant D3D10 features, then it seems there's a strong case that DX9 performance will be no indicator of D3D10 performance.
It's stuff like this that keeps me continually confused about what people mean by D3D10 performance. It comes across as if texturing and ROP/AA performance are not also DX10 considerations !! :???: These aren't legacy operations that are going to the wayside next generation - they are going to be ever present and just as important as before.
The signs are that R600 will have no more texturing capability than R580.
The presumed 16 additional point-sampling TMUs would surely add something as well, no?
Die size, 2 cm x 2 cm: sounds right, it would be around 400 mm², double the die size of the G71
DX9 performance is very fast: expected
12 memory chips: rumor bin, was stated here
1/3 of its die size is for DX10 functionality: around 120-166 million transistors if it is a 500-million-transistor chip, sounds about right (gotta account for PureVideo)
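The rumoured numbers can be sanity-checked with a couple of lines of arithmetic. A minimal Python sketch; the 2 cm x 2 cm die and the 500-million-transistor total are the rumour's figures, not confirmed specs, and treating transistor density as uniform across the die is a simplifying assumption:

```python
# Back-of-envelope check of the rumoured G80 figures.
# Inputs are the rumour's numbers, not confirmed specifications.

def die_area_mm2(width_cm: float, height_cm: float) -> float:
    """Convert a die's edge lengths in centimetres to area in mm^2."""
    return (width_cm * 10.0) * (height_cm * 10.0)


def dx10_transistor_share(total_transistors: float, fraction: float) -> float:
    """Transistors implied by spending `fraction` of the die on D3D10 logic,
    assuming (simplistically) uniform transistor density across the die."""
    return total_transistors * fraction


area = die_area_mm2(2.0, 2.0)
share = dx10_transistor_share(500e6, 1.0 / 3.0)
print(f"die area: {area:.0f} mm^2")                    # die area: 400 mm^2
print(f"D3D10 share: {share / 1e6:.0f}M transistors")  # D3D10 share: 167M transistors
```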
The rest of it is all pure speculation, probably because of the "1/3 of die size is for DX10 operations" claim. Fuad must have drawn a few lines in his mind where DX9 and DX10 can't overlap. Of course DX10 performance would be poor if the G80 were designed with a distinct DX10 portion and a DX9 portion, but it would be stupidity beyond belief if nV tried such a thing again, as they did with the FX.
Enough Inq bashing, if that's alright; these threads get big enough very quickly without noise to poop on the signal. Deleting pointless posts soon......
[wrt 16 bilinear + 16 point samplers earlier in the thread] I think so.
I'm not convinced that's the case for R6.
DegustatoR
13-Sep-2006, 19:45
You're starting to look desperate guys :)
chavvdarrr
13-Sep-2006, 20:07
Unified should be faster if the same amount of calculation can theoretically be done by either a discrete or a unified design; only in ideal situations will a discrete design match a unified one (as long as the scheduler overhead for unified is low).
But what about transistor count?
What eats more transistors: 48 unified pipes or 16Vertex+48Pixel ones ?
Which will be faster on typical workloads this year and next: 48U or 16V+48P?
The rest of it is all pure speculation, probably because of the "1/3 of die size is for DX10 operations" claim. Fuad must have drawn a few lines in his mind where DX9 and DX10 can't overlap. Of course DX10 performance would be poor if the G80 were designed with a distinct DX10 portion and a DX9 portion, but it would be stupidity beyond belief if nV tried such a thing again, as they did with the FX.
Traditionally, NV has always had advanced features, but they ran with piss-poor performance: 32-bit color but a mega performance hit on TNT cards, 32-bit pixel shader precision on FX, branching on last-gen cards, etc. So following history, it's a safe bet that the G80 will be very, very fast at current tech (i.e. DX9) but probably slow at next gen (DX10).
I'm getting a bit lost in the DX10 comparisons. When people say DX10 performance are they referring to new functionality or "more widespread use of DX9 functionality that was ignored by developers last generation" ?
Generally I mean (and Geo means, I guess) specific features of D3D10. The new stuff in D3D10 is a seriously big deal. Arguably a lot of this can be simulated in an SM3 GPU (e.g. using R2VB) but the implementation details in D3D10 make it work much more smoothly. There are also new data formats that provide a big improvement in bandwidth usage for a given quality level of textures.
Well in that context DX9 and DX10 aren't any different. Developers can choose to write TMU intensive shaders in DX10 as well. So I still don't see how a design can be inherently faster in one API while slower in the next.
A new API provides scope for alternative methods to implement visual effects. So the actual code is different, depending on whether you look at the DX9 code or the D3D10 code. So the DX9 algorithm may well be TMU-intensive, while the D3D10 version is ALU-intensive.
If you're comparing two different GPU designs, they could have different trade-offs between support for TMU-intensive and ALU-intensive code. Clearly, we don't know, this is all hypothesis about the mechanism for performance disparities across the API gap.
We all know it's not that simple. SM2 asked very different questions of the core shader pipeline than SM1 did and Nvidia failed to answer - DX10 and DX9 are not as different in that regard. So that's not going to happen unless G80 is NV30 part deux.
But NVidia chose different trade-offs for TEX and ALU than ATI, which is merely an example of how GPUs that are functionally similar can perform differently, across the API gap (i.e. SM1.x to SM2).
We're not talking absolutes here ("NV30 is so useless at SM2 forget it"), merely that SM1.x performance didn't indicate SM2 performance.
And, there are genuinely hard/costly to implement features in D3D10. Post-GS cache has been the subject of surprise round here recently - it just wants to gobble-up transistors and asks awkward questions about parallelism...
Honestly, that makes no sense. Texture workload is obviously going to increase in the future. ATI is not going to design an inherently bottlenecked architecture that blazes through arithmetic and then waits idly for the TMUs to do their thing.
Actually, what I was hinting at was that ATI's texturing architecture (out of order, cache design, ring-bus, programmable memory controller) might be capable of responding to large amounts of extra bandwidth, without needing to be sized-up. I don't know this is the case. The GDDR4 attached to X1950XT doesn't seem to back me up on this - I dunno, could be a driver issue. Have to wait and see :cry:
What do ROPs have to do with DX9/DX10 differentiation? I'm asking a serious question here.
Take a heavily ROP-dependent algorithm: if one GPU design is DX9++ and the other is DX9/D3D10-balanced then the latter may not dedicate as much transistor budget to eke out the last iota of ROP performance - hence a difference in performance for nominally the same visuals.
It's just a curve: x-axis is transistors spent, y-axis is efficiency per byte of bandwidth per clock per ROP. GPU A might cut-off at 90% up the curve, while GPU B cuts-off at 99%... If you draw a curve for streamout you might find that GPU A is at 90% while GPU B is at 80%. It's all about transistors spent (or die yield) versus R&D-effort.
And, transistor cost for ROPs or streamout doesn't just lie within the ROP and streamout portions of the die. Those functions are heavily dependent upon the avoidance of bottlenecks elsewhere. Which obviously also costs transistors.
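The imaginary curve above can be made concrete with a toy model. A rough Python sketch, assuming (purely for illustration) that efficiency saturates as e(t) = 1 - exp(-t/T), where the scale T is a made-up constant rather than anything measured on real silicon:

```python
import math

# Hypothetical diminishing-returns model: efficiency approaches 100%
# exponentially as transistors are spent. `scale` is an arbitrary
# constant, not a measured property of any real GPU.

def transistors_for_efficiency(target: float, scale: float = 1.0) -> float:
    """Transistors (in units of `scale`) needed to reach `target` efficiency
    under e(t) = 1 - exp(-t / scale)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    return -scale * math.log(1.0 - target)


for target in (0.80, 0.90, 0.99):
    print(f"{target:.0%}: {transistors_for_efficiency(target):.2f} units")
# 80%: 1.61 units
# 90%: 2.30 units
# 99%: 4.61 units
```

Under this assumption, going from 90% to 99% costs as many transistors again as reaching 90% in the first place, whereas lifting a function from 80% to 90% costs only about 0.69 units. If ROP and streamout happened to share the same curve, a 90/90% split would total about 4.61 units against roughly 6.21 for 99/80%; real curves will differ per function.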
It's stuff like this that keeps me continually confused about what people mean by D3D10 performance. It comes across as if texturing and ROP/AA performance are not also DX10 considerations!! :???: These aren't legacy operations that are going to fall by the wayside next generation - they are going to be ever present and just as important as before.
Going forwards there are two key concepts I can think of that define new, viable, algorithms in D3D10:
take graphics work away from the CPU and make it run entirely on the GPU - the GPU can now write to local memory and re-circulate data in so many new ways as to go far beyond what the CPU could achieve (constrained by FLOPs, CPU<->GPU bandwidth, API-overheads)
provide enough programmability, resource usage models and brute force that complex, ALU-intensive shaders can be used to reduce dependencies upon texturing, thus making more effective use of available bandwidth and texturing throughput - e.g. by using more efficient texture formats (that require post-processing in the form of extra ALU instructions) or dynamic branching to minimise texture fetches
Guess which gets easier to implement with D3D10:
http://www.cupidity.f9.co.uk/b3d56.jpg
http://www.cupidity.f9.co.uk/b3d57.jpg
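Two-channel normal maps are a familiar case of a texture format that needs extra ALU work: D3D10's BC5 format stores only X and Y, and the shader rebuilds Z from the unit-length constraint, trading fetched bytes for a few ALU instructions. A minimal Python sketch of that reconstruction; it leaves out the block compression itself and assumes tangent-space normals with Z >= 0:

```python
import math

# Store only X and Y of a unit-length tangent-space normal (as two-channel
# formats such as BC5 do) and rebuild Z per pixel. Assumes Z >= 0, which
# holds for tangent-space normals pointing out of the surface.

def encode_xy(normal):
    """Keep two of the three components; the third is implied by unit length."""
    x, y, _ = normal
    return (x, y)


def reconstruct_z(x, y):
    """The ALU work that replaces fetching a third channel. The clamp guards
    against slightly negative values caused by quantisation or rounding."""
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))


x, y = encode_xy((0.6, 0.0, 0.8))
print(x, y, reconstruct_z(x, y))
```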
Jawed
The presumed 16 additional point-sampling TMUs would surely add something as well, no?
You'd think so. In fact I do. I made a thread about this a few months ago, and the conclusions were very muted.
http://www.beyond3d.com/forum/showthread.php?t=30276
So the effect could be anything from "barely noticeable" to "ZOMG! now we've started using it we can't go back!"
The likelihood is that point-sampling TMUs will be heavily loaded by the vertex pipes, so it's hard to know how much is left over.
Where's Jack?...
Jawed
pjbliverpool
13-Sep-2006, 22:15
Wasn't there a quote from someone at ATI that mentioned that the Xenos was faster than an equivalent (# of shaders) non-unified design?
edit. found it. (http://www.xbitlabs.com/news/video/display/20060525104243.html)
"Mr. Huddy said that Xbox 360 game console, which sports developed by ATI Xenos graphics core with unified shader architecture and 48 shader processors, loses 20% to 25% performance in pixel-shader limited games, when its graphics chip is configured as non-unified, e.g.,16 processors work strictly on vertex shaders, whereas 32 are assigned for pixel shaders. "
Only if you assume that the shaders are identical, and that the separate shaders of a discrete architecture are not more efficient than those of a unified architecture thanks to their higher level of specialisation.
And what if you can afford to add more shaders in a separate design because of the saved die space (if any)? Is unified still better? I.e. if 48ps + 8vs take up the same number of transistors as 48us + the scheduler, which is better then?
I don't think it's as clear-cut as the quote suggests.
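The question can be explored with a toy throughput model. A Python sketch, assuming every shader unit retires one unit of work per cycle and ignoring scheduler cost, thread switching and any efficiency difference between specialised and unified ALUs; the workloads are invented for illustration:

```python
# Toy model: each shader unit retires one unit of (hypothetical) work per
# cycle. Each function returns the time to finish a frame's vertex + pixel work.

def discrete_time(vs_work, ps_work, vs_units, ps_units):
    """Fixed split: each pool only runs its own work, so the slower pool
    sets the frame time."""
    return max(vs_work / vs_units, ps_work / ps_units)


def unified_time(vs_work, ps_work, units):
    """Ideal unified pool: any unit runs any work, so only the total matters
    (scheduler overhead is ignored)."""
    return (vs_work + ps_work) / units


# Invented workloads: (vertex work, pixel work) per frame.
WORKLOADS = {"pixel-heavy": (8.0, 40.0), "vertex-heavy": (24.0, 24.0)}

for name, (vs, ps) in WORKLOADS.items():
    print(name)
    print(f"  unified 48           : {unified_time(vs, ps, 48):.2f}")
    print(f"  locked 16 VS / 32 PS : {discrete_time(vs, ps, 16, 32):.2f}")
    print(f"  discrete 8 VS / 48 PS: {discrete_time(vs, ps, 8, 48):.2f}")
```

This prints 1.00 / 1.25 / 1.00 for the pixel-heavy frame and 1.00 / 1.50 / 3.00 for the vertex-heavy one. The locked 16/32 split delivers 20% less throughput than the unified pool on the pixel-heavy frame, and the discrete 8 + 48 design matches the unified chip there but falls far behind once vertex load rises. Which wins depends on the workload mix and on what the unified scheduler really costs.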
Traditionally, NV has always had advanced features, but they ran with piss-poor performance: 32-bit color but a mega performance hit on TNT cards, 32-bit pixel shader precision on FX, branching on last-gen cards, etc. So following history, it's a safe bet that the G80 will be very, very fast at current tech (i.e. DX9) but probably slow at next gen (DX10).
Hmm, there isn't much difference between DX9 and DX10 when it comes to pixel shaders: just a new syntax, plus geometry shaders. Yes, it isn't going to be an ultra-fast DX10 chip, but it probably won't be a slow one either, and it really doesn't need to be super fast. Keep in mind DX10 has less overhead than DX9, so piss-poor geometry shader speed is the only way the G80 could fail in DX10.
But what about transistor count?
What eats more transistors: 48 unified pipes or 16Vertex+48Pixel ones ?
Which will be faster on typical workloads this year and next: 48U or 16V+48P?
That we will see with the G80 and R600 ;). I think the transistor count will end up close, with the edge to non-unified. But the G80 seems very ambitious and quite different from a traditional design, so it might not really give us a hint at all. Sounds to me like a unified pool for TMUs + ROPs, a unified pool of geometry shaders + vertex shaders, and detached pixel pipelines.
I'm not convinced that's the case for R6.
I'd love to see an argument for why.
Jawed
trinibwoy
13-Sep-2006, 23:19
Generally I mean (and Geo means, I guess) specific features of D3D10. The new stuff in D3D10 is a seriously big deal. Arguably a lot of this can be simulated in an SM3 GPU (e.g. using R2VB) but the implementation details in D3D10 make it work much more smoothly. There are also new data formats that provide a big improvement in bandwidth usage for a given quality level of textures.
A new API provides scope for alternative methods to implement visual effects. So the actual code is different, depending on whether you look at the DX9 code or the D3D10 code. So the DX9 algorithm may well be TMU-intensive, while the D3D10 version is ALU-intensive.
If you're comparing two different GPU designs, they could have different trade-offs between support for TMU-intensive and ALU-intensive code. Clearly, we don't know, this is all hypothesis about the mechanism for performance disparities across the API gap......
Gotcha, thanks for the clarifications.
Actually, what I was hinting at was that ATI's texturing architecture (out of order, cache design, ring-bus, programmable memory controller) might be capable of responding to large amounts of extra bandwidth, without needing to be sized-up. I don't know this is the case. The GDDR4 attached to X1950XT doesn't seem to back me up on this - I dunno, could be a driver issue. Have to wait and see :cry:
Oh, I read "similar capabilities" as similar performance, not a similar number of units. I completely agree - if R600 retains only 16 texture units it will have to achieve higher throughput through higher efficiency/bandwidth.
Take a heavily ROP-dependent algorithm: if one GPU design is DX9++ and the other is DX9/D3D10-balanced then the latter may not dedicate as much transistor budget to eke out the last iota of ROP performance - hence a difference in performance for nominally the same visuals.
It's just a curve: x-axis is transistors spent, y-axis is efficiency per byte of bandwidth per clock per ROP. GPU A might cut-off at 90% up the curve, while GPU B cuts-off at 99%... If you draw a curve for streamout you might find that GPU A is at 90% while GPU B is at 80%. It's all about transistors spent (or die yield) versus R&D-effort.
And, transistor cost for ROPs or streamout doesn't just lie within the ROP and streamout portions of the die. Those functions are heavily dependent upon the avoidance of bottlenecks elsewhere. Which obviously also costs transistors.
I see what you're getting at. I was looking at things under the unified-vs-discrete umbrella for expected performance in DX9 vs DX10 at a macro level. But if we look at it in terms of transistor budget allocated to making new features zippy then I see your point. Guess I need to redefine what I consider to be "DX10 performance" in order to keep up with you guys.
Having said that, DX10 performance is going to be defined a lot more by pixel/vertex/TMU/ROP/AA performance than geometry shading / streamout etc so how do you decide what defines a "fast" DX10 GPU? ATI better be blazing fast in the former categories before they decide to devote too much budget to the new whiz-bang stuff.
Having said that, DX10 performance is going to be defined a lot more by pixel/vertex/TMU/ROP/AA performance than geometry shading / streamout etc so how do you decide what defines a "fast" DX10 GPU? ATI better be blazing fast in the former categories before they decide to devote too much budget to the new whiz-bang stuff.
As I described earlier with those imaginary curves, is it better to go 90/90% or 99/80%?
In other words, as we were once so fond of asking, do we really need to be able to run Quake 3 at >300fps at 1600x1200?
To be fair, it's in the nature of SM3 that it's far harder to generate the extreme-FPS Q3 case. SM3 is still shiny and new.
I'm curious to see what will happen when games straddle this API gap. Will games end up being:
SM2/SM4 - SM2 games such as FEAR and CoD2 show that visual quality can go an awfully long way within the constraints of SM2. Devs then go to town on the SM4 code, because SM2 code provides well-defined limits
SM3/SM4 - devs aim to make them practically indistinguishable, whilst cutting off a lot of gamers with SM2 hardware
some mixture of these?
Jawed
As I described earlier with those imaginary curves, is it better to go 90/90% or 99/80%?
In other words, as we were once so fond of asking, do we really need to be able to run Quake 3 at >300fps at 1600x1200?
To be fair, it's in the nature of SM3 that it's far harder to generate the extreme-FPS Q3 case. SM3 is still shiny and new.
I'm curious to see what will happen when games straddle this API gap. Will games end up being:
SM2/SM4 - SM2 games such as FEAR and CoD2 show that visual quality can go an awfully long way within the constraints of SM2. Devs then go to town on the SM4 code, because SM2 code provides well-defined limits
SM3/SM4 - devs aim to make them practically indistinguishable, whilst cutting off a lot of gamers with SM2 hardware
some mixture of these?
Jawed
Well, there really is no way to make SM4 games look and feel the same as SM3 games; things like tessellation and displacement just won't be possible with SM3. Granted, I don't think this new hardware will be able to do these things at any reasonable speed in game situations, but that's just my opinion.
Well, there really is no way to make SM4 games look and feel the same as SM3 games; things like tessellation and displacement just won't be possible with SM3. Granted, I don't think this new hardware will be able to do these things at any reasonable speed in game situations, but that's just my opinion.
The implication was that SM4 version of code goes faster (just like SM3 Far Cry goes faster in some places than the SM2 version) or that extreme effects, such as "super voluminous smoke" take the place of crappy billboard smoke.
Question of degrees...
Jawed
I got that bit. But I guess what I'm asking is what is it about DX9 that could possibly facilitate that advantage. I can't think of any (dis)advantage that a unified design may have in DX10 that won't also translate to an (dis)advantage in DX9. Really looking for some insight here .....
I'm trying to figure that out myself. :smile:
Just brainstorming here, but wouldn't a unified architecture have better potential geometry shader performance?
ERK
EDIT: I guess that doesn't really address the point--the reason it may be better (load balancing and lots of geometry power) is just the reason it would also be good for DX9. :(
Just brainstorming here, but wouldn't a unified architecture have better potential geometry shader performance?
During geometry-only passes a unified architecture should be utterly incredible - subject to bandwidth/fetch though.
But code that uses dynamic branching could really suffer in comparison with a discrete architecture. In a discrete architecture it's possible to have each invocation (each primitive) run in its own "thread". In a unified GPU, the SIMD architecture of the pixel shader pipes gets applied to the GS, and so 16 (or 64, whatever) primitives all find themselves lumped together in one thread.
That means an IF THEN ELSE statement in the code will run both clauses for ALL primitives, even if 15 primitives want to execute THEN and 1 primitive wants to execute ELSE. So dynamic branching is handicapped somewhat...
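The cost of that divergence can be sketched as a lane-efficiency ratio. A small Python model, assuming every lane in a batch is occupied for each path any lane in it takes, and that both paths have a fixed hypothetical cost; batch width is a free parameter, with width 1 standing in for each primitive running in its own thread:

```python
def simd_branch_efficiency(takes_then, then_cost, else_cost, width):
    """Fraction of issued lane-cycles doing useful work for an IF/THEN/ELSE.

    takes_then: one bool per primitive (True = wants THEN, False = wants ELSE).
    Primitives are grouped into batches of `width` lanes; a batch executes each
    path that at least one of its lanes needs, with every lane occupied for it.
    """
    useful = 0
    issued = 0
    for start in range(0, len(takes_then), width):
        batch = takes_then[start:start + width]
        # Work each lane actually needs.
        useful += sum(then_cost if t else else_cost for t in batch)
        # Cycles the whole batch spends: every path taken by any lane.
        batch_cycles = 0
        if any(batch):
            batch_cycles += then_cost
        if not all(batch):
            batch_cycles += else_cost
        issued += batch_cycles * len(batch)
    return useful / issued


prims = [True] * 15 + [False]  # 15 want THEN, 1 wants ELSE
for width in (1, 4, 16):
    print(width, simd_branch_efficiency(prims, then_cost=10, else_cost=10, width=width))
```

With equal path costs, width 1 keeps every lane busy, width 4 drops to 80% and width 16 to 50%.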
Jawed
The implication was that SM4 version of code goes faster (just like SM3 Far Cry goes faster in some places than the SM2 version) or that extreme effects, such as "super voluminous smoke" take the place of crappy billboard smoke.
Question of degrees...
Jawed
True, but I think that speed increase has a lot more to do with the current Windows overhead, so it should carry across to DX9L to some degree. But this is uncharted territory for me, so if someone else can chime in here it would be great ;)!
Brimstone
14-Sep-2006, 02:57
Brimstone has stated his belief in the G80/RSX linkage in various threads in the Console forums which is a pretty good indication that he's not joking! He's either got the scoop of the century or (almost certainly) is just utterly wrong! :smile:
I'm also guessing that PS3 RSX will have an XDR bus and nVidia might use XDR on their upcoming GPUs as well.
Rambus has clearly stated they're targeting both ATI and nVidia with Rambus technology. XDR makes a lot of sense for bandwidth-hungry GPUs.
From a Rambus analyst meeting 6/01/2006
Harold Hughes: Could you restate the second question regarding licensing going forward?
Person 1: Yes ...
Harold Hughes: With regard to types of companies or with regard to our strategy?
Person 1: Types of companies and the strategy, timeline that we should expect, you know, companies ...
Harold Hughes: Well, timelines, I wouldn't want to give you anything specific. After all, you need to analyze these things, but the process is important. Previously, we were able to engage with a relatively small number. Under Sharon's structure, we're able to engage with a significant number, literally in the dozens, quite frankly. And much of what we do is to use the 3 processes that we discussed. The litigation process, which John described very ably, is moving well. That is obviously focused primarily at DRAM companies and your guess is is -- and you can listen to John and stuff when he expects something to happen there, but we also work with that interaction of existing DRAM companies and their ability to make higher performance products incorporating our technology so as to talk -- so as to entice, if you will, controller companies to see the advantages of those functions, as they themselves have to compete. And, as we win those, we try to go further up the chain of platforms. We've established, I think, a very good position in games. We're moving forward in digital television. Obviously, one of the rungs on the ladder is PC graphics. There there are obviously two suppliers, two incumbents [ATI Graphics (ATYT) and Nvidia (NVDA)], and multiple suppliers, and ultimately up to that. Now, when will that happen? I couldn't possibly commit. Do we engage with them? Certainly. Sharon?
...
Person 4: Harold, you alluded to engagement with the graphics memory companies, and is that just regarding systems licenses or have they been put on notice for producing -- for GDDR itself?
Harold Hughes: Setting the legal issues aside -- John can talk about that -- obviously, there's a reason for us to want to embed the features that we've created for the PlayStation 3 into a PC-based graphic part that then works with a graphics controller produced by the incumbents. So, that would be the primary business reason for talking to 'em.
http://rambus.org/cc/2006-06-01_Transcript.txt
Sony signed a second contract with nVidia. From what I understand, the revenue from this contract is greater than from the first contract between Sony and nVidia (this needs to be double-checked). My guess is that this is another GPU for the PS3 platform.
Sony needs to drive volume of XDR. Why use GDDR, when you could use XDR for the GPU and double the volume of XDR you consume? Elpida is going to be the main source of XDR, so Sony will have to utilize the PS3 as the major driver of XDR volume.
If you're implying that RSX will be G80... I am going to explode. I will go ahead and change my nick to Tsar Bomba and everything. (Please just say it was a joke. I'll sleep better.)
Also, I think there was some consensus that G80 is not going to be anemic in the VS department as they first seem compared to R600's unified architecture.
I actually thought the same quite a long time ago. It just seemed like a nice idea given the way the PS3 kept getting pushed back. If you are going to be late, it is nice to at least have the technical edge. I have no real reason at all to think that, and yes, a massive change such as that should have leaked by now etc.. etc... it just seemed to fit nicely.
trumphsiao
14-Sep-2006, 04:06
................
trumphsiao
14-Sep-2006, 04:10
Die size, 2 cm x 2 cm: sounds right, it would be around 400 mm², double the die size of the G71
DX9 performance is very fast: expected
12 memory chips: rumor bin, was stated here
1/3 of its die size is for DX10 functionality: around 120-166 million transistors if it is a 500-million-transistor chip, sounds about right (gotta account for PureVideo)
The rest of it is all pure speculation, probably because of the "1/3 of die size is for DX10 operations" claim. Fuad must have drawn a few lines in his mind where DX9 and DX10 can't overlap. Of course DX10 performance would be poor if the G80 were designed with a distinct DX10 portion and a DX9 portion, but it would be stupidity beyond belief if nV tried such a thing again, as they did with the FX.
Sorry, the final die size of G80 is smaller than that of R580. Flectronics just received the order for the first batch of limited G80 boards, and you will see G80 products in volume by Dec at the latest.
Rangers
14-Sep-2006, 04:41
"Mr. Huddy said that Xbox 360 game console, which sports developed by ATI Xenos graphics core with unified shader architecture and 48 shader processors, loses 20% to 25% performance in pixel-shader limited games, when its graphics chip is configured as non-unified, e.g.,16 processors work strictly on vertex shaders, whereas 32 are assigned for pixel shaders. "
You guys are completely misinterpreting this quote. The shaders in the 360 can be dynamically allocated. Huddy is saying that if you "lock down" Xenos to a 16 vertex / 32 pixel ratio, then when you are pixel shader limited, performance goes up when you "unlock" the ratio, because it just dynamically allocates more ALUs to pixel shading when that is the bottleneck. Which is kind of a big duh. In the former, locked case, it is limited to 32 pixel shaders and 16 vertex shaders and that is that. In the latter case, if the bottleneck is pixel shading, it may allocate (via time slicing, etc.) say 8 ALUs to vertex and 40 to pixel, thereby relieving the pixel shader bottleneck.
It says nothing about how it would perform versus a non-unified architecture.
Also on Nvidia, if ATI ever wakes up and makes actual competitive performance size chips, I think Nvidia's chasing of 40% profit margins is going to hurt them, because they seem to be making "relatively" small chips time after time. If these rumors of G80 being smaller than R580 are true of course.
Skrying
14-Sep-2006, 04:45
So is G80 on 90nm or 80nm? How much of a difference would it cause on die size?
Ailuros
14-Sep-2006, 05:31
The implication was that SM4 version of code goes faster (just like SM3 Far Cry goes faster in some places than the SM2 version) or that extreme effects, such as "super voluminous smoke" take the place of crappy billboard smoke.
Question of degrees...
Jawed
If Far Cry had been optimized differently from day one, the additional SM2.0_b and SM3.0 paths wouldn't have brought any performance increase worth mentioning.
http://www.mitrax.de/?cont=artikel&aid=24&page=10
All I'm trying to say is that FC isn't necessarily the best example for such cases.
So is G80 on 90nm or 80nm? How much of a difference would it cause on die size?
80nm is only 17% smaller, so I don't see the G80 being smaller than the R580+ even if it were on 80nm.
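For reference, the ideal optical-shrink arithmetic: linear dimensions scale by 80/90, so area scales by the square. A minimal Python sketch; real half-node shrinks typically fall short of this ideal because not every structure (I/O pads, analog circuitry) shrinks, so practical savings come in lower:

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Ideal area ratio for an optical shrink: linear dimensions scale by
    new/old, so area scales by the square of that ratio."""
    return (new_nm / old_nm) ** 2


scale = ideal_area_scale(90, 80)
print(f"area ratio {scale:.3f}, i.e. {1 - scale:.0%} smaller")
# area ratio 0.790, i.e. 21% smaller
```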
Chalnoth
14-Sep-2006, 11:50
80nm is only 17% smaller, so I don't see the G80 being smaller than the R580+ even if it were on 80nm.
The G71 is roughly half the die size of the R580, so there's no reason for G80 to be much bigger than R580, if at all.
LeStoffer
14-Sep-2006, 11:52
Guys and girls: Regarding Crysis and performance on DX10 vs DX9 please keep this in mind:
Game Informer: There’s been some confusion on someone who said that Crysis runs better on Windows XP and DX9 than Vista and DX10. Can you help clarify that?
Crytek’s CEO and President Cevat Yerli: What I said was DX9 on Vista runs better than DX9 on XP. So if I have a gaming PC, and I’m comparing DX9 on XP versus DX9 on Vista, Vista runs better. The operating system is optimized for graphics drivers and optimized for gaming. It’s optimized for more controller functions – the game simply runs more smoothly. Just seeing Far Cry - it runs better. That’s a hell of an achievement because for me that’s reason enough to change operating systems.
http://www.gameinformer.com/News/Story/200608/N06.0830.2058.31148.htm
The G71 is roughly half the die size of the R580, so there's no reason for G80 to be much bigger than R580, if at all.
Possibly; it would be interesting to see a 500-million-transistor chip at the same size as a 380-million one. God, it will be getting pretty hot :sad:
Guys and girls: Regarding Crysis and performance on DX10 vs DX9 please keep this in mind:
http://www.gameinformer.com/News/Story/200608/N06.0830.2058.31148.htm
Nothing can make me install Vista. Nothing. If that means no more gaming at some point, then it means no more gaming for me.
EDIT: which means, I'll buy a console and not stop gaming completely. My next "PC" will be a Mac.
But code that uses dynamic branching could really suffer in comparison with a discrete architecture. In a discrete architecture it's possible to have each invocation (each primitive) run in its own "thread". In a unified GPU, the SIMD architecture of the pixel shader pipes gets applied to the GS, and so 16 (or 64, whatever) primitives all find themselves lumped together in one thread.
That means an IF THEN ELSE statement in the code will run both clauses for ALL primitives, even if 15 primitives want to execute THEN and 1 primitive wants to execute ELSE. So dynamic branching is handicapped somewhat...
The "width" of the threads is an implementation detail, it's not dictated by whether an architecture is unified or not. Consider a CPU a unified shader architecture, and just about everything is possible, whether it is the granularity being 16, 64 or 1024, variable, or even changing on-the-fly while branches are taken. Just like R5xx vertex shaders are SIMD while G7x vertex shaders are MIMD.
trumphsiao
14-Sep-2006, 13:04
Possibly; it would be interesting to see a 500-million-transistor chip at the same size as a 380-million one. God, it will be getting pretty hot :sad:
Nvidia just placed the order with flectronics. Everything is settled. No one knows exactly what the G80 architecture will be except some Nvidia employees, but look out for the limited G80 boards coming in late Dec. The PCB board number is P355. :razz:
trinibwoy
14-Sep-2006, 14:28
Limited boards in late Dec huh. Guess we'll need a newer and even more improved rumours thread by then :smile:
trumphsiao
14-Sep-2006, 14:41
Limited boards in late Dec huh. Guess we'll need a newer and even more improved rumours thread by then :smile:
Someone should go to flectronics and sneakily steal one by Nov. The quantity you can buy around the world this year will be less than 1/10 of the PS3 numbers provided in America.
Dave Baumann
14-Sep-2006, 15:30
Wasn't there a quote from someone at ATI that mentioned that the Xenos was faster than an equivalent (# of shaders) non-unified design?
edit. found it. (http://www.xbitlabs.com/news/video/display/20060525104243.html)
"Mr. Huddy said that Xbox 360 game console, which sports developed by ATI Xenos graphics core with unified shader architecture and 48 shader processors, loses 20% to 25% performance in pixel-shader limited games, when its graphics chip is configured as non-unified, e.g.,16 processors work strictly on vertex shaders, whereas 32 are assigned for pixel shaders."
Interestingly the performance differences you see between unified and non-unified when reducing the number of quads for both PS and VS is even greater than the testing scenario here.
Interestingly the performance differences you see between unified and non-unified when reducing the number of quads for both PS and VS is even greater than the testing scenario here.
Dunno whether this is doable or not, but it would be nice to turn unified shading off and configure the GPU as having some meaningful ratio between PS and VS other than 2:1 (say 4:1 to 6:1) to make it more similar to current non-unified GPUs..
Dave Baumann
14-Sep-2006, 15:53
What has been done is to exclusively force VS threads onto one SIMD and PS threads onto the other two, effectively making it non-unified. Further to that, the quads can be turned off in each SIMD array, mimicking lower-end boards - the ratio stays the same (1 VS : 2 PS), but when you look at lower-end discrete boards they are closer to these ratios in the first place.
PeterAce
14-Sep-2006, 16:18
What has been done is to exclusively force VS threads onto one SIMD and PS threads onto the other two, effectively making it non-unified. Further to that, the quads can be turned off in each SIMD array, mimicking lower-end boards - the ratio stays the same (1 VS : 2 PS), but when you look at lower-end discrete boards they are closer to these ratios in the first place.
Interesting in several ways.
I wonder if switching off quads within a SIMD array will be used (rather than lowering the core clock rate) to meet the lower power requirements of the Vista 3D interface.
Sunrise
14-Sep-2006, 16:32
Someone should go to flectronics and sneakily steal one by Nov. The quantity you can buy around the world this year will be less than 1/10 of the PS3 numbers provided in America.
It's Flextronics, not flectronics, and those PCBs you are talking about were already in production several weeks ago, though supply is limited at first, as I hinted in my last post. NV really just wants to be out first, and they have far better cards (nice play on words there, lol) in their hands than ATI. The price will be pretty steep, so you shouldn't expect mass demand anyway.
trumphsiao
14-Sep-2006, 16:44
It's Flextronics, not flectronics, and those PCBs you are talking about were already in production several weeks ago, though supply is limited at first, as I hinted in my last post. NV really just wants to be out first, and they have far better cards (nice play on words there, lol) in their hands than ATI. The price will be pretty steep, so you shouldn't expect mass demand anyway.
What I meant is that Flextronics would have production samples by Nov at the latest. Nvidia just placed the order several days ago.
trumphsiao
14-Sep-2006, 16:50
Dunno whether this is doable or not, but it would be nice to turn unified shading off and configure the GPU as having some meaningful ratio between PS and VS (not 2:1), say 4:1 to 6:1, to make it more similar to current non-unified GPUs.
Dunno. So, overall, does a dedicated design like G80 have any advantage over R600?
Dunno. So, overall, does a dedicated design like G80 have any advantage over R600?
We really don't know; both approaches have pros and cons, even though in the long term the unified approach will probably be the most used one.
Dunno. So, overall, does a dedicated design like G80 have any advantage over R600?
It depends on what the applications really need, and also on how many processing units each chip has.
Dunno. So, overall, does a dedicated design like G80 have any advantage over R600?
In the situation which favors the load balance of the G80, I assume R600 should have a tiny bit of overhead to fight with. But the opposite scenario would let R600 wipe the floor with G80 IF(!) G80 should indeed be 100% "hard-wired" like that.
chavvdarrr
14-Sep-2006, 19:48
In the situation which favors the load balance of the G80, I assume R600 should have a tiny bit of overhead to fight with. But the opposite scenario would let R600 wipe the floor with G80 IF(!) G80 should indeed be 100% "hard-wired" like that.
What about IF(!) G80 has more units?
What about IF(!) G80 has more units?
Well, if it's not unified and it doesn't have more units, then something has gone terribly wrong ;)
NocturnDragon
14-Sep-2006, 22:03
Well, if it's not unified and it doesn't have more units, then something has gone terribly wrong ;)
Maybe it only has half as many but it has über-units! :P
Maybe it only has half as many but it has über-units! :P
It does not matter, the important thing is that it should be able to perform more work (in theory..:) ) if we assume that a unified shader is on average more efficient/takes more area.
I know that I'm really oversimplifying here but you get the picture, right? ;)
NocturnDragon
14-Sep-2006, 22:49
It does not matter, the important thing is that it should be able to perform more work (in theory..:) ) if we assume that a unified shader is on average more efficient/takes more area.
I know that I'm really oversimplifying here but you get the picture, right? ;)
Sure I get the picture.
I was joking remembering the rumors about the R420 before it was launched.
Dave Baumann
14-Sep-2006, 22:53
For area-inefficient designs, it's curious that such schemes have been implemented by PowerVR for handheld devices and by Intel for (cost-conscious) integrated graphics... ;)
Tim Murray
14-Sep-2006, 23:06
For area-inefficient designs, it's curious that such schemes have been implemented by PowerVR for handheld devices and by Intel for (cost-conscious) integrated graphics... ;)
Okay, sweetie, we get the picture. Now how about giving us some of those "unified versus traditional" benchmarks instead of making vague references to them? :razz:
For area-inefficient designs, it's curious that such schemes have been implemented by PowerVR for handheld devices and by Intel for (cost-conscious) integrated graphics... ;)
It's not about inefficient designs, it's about {insert design A here} being less efficient than {insert design B here} for a given {insert your comparison metric here}.
In this case, reading this thread's title is a necessary and sufficient condition for knowing how to fill in the blanks ;)
I think the ultimate is a "unified" design that has a heterogeneous shader architecture :) Basically have a bunch of still quite heavily threaded units that do well at general purpose stuff (branching, random memory access, etc...), but which can also farm out a vector of data to vectors of ALUs. Then remove any restriction on which type of datum gets processed where (fragments/vertices/primitives can be processed either by the general purpose units, or by the vector units, or (for complex shaders) by both types of unit).
DaveBaumann - I'm not sure that really shows anything much relevant to G80/R600. A non-unified design should have a lower worst-case performance at a given area than a unified design - at the performance levels we are talking about this could be the difference between tolerable and unplayable framerates, hence motivating the unified design. Also, I don't think area comparisons at the low end are particularly valid when moving to the high end. For example, 1 unified shader might very well be smaller than 1 VS and 1 PS, yet 8 unified shaders could be larger than 2VS and 8PS.
Furthermore, aren't the Intel and PVR designs both deferred? I would expect this to make a unified design more attractive (you don't have to be as smart when scheduling things, since the vertices being processed are completely independent of the fragments being processed).
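The area argument above (one unified unit can beat a VS/PS pair, yet eight unified units can lose to 2 VS + 8 PS) can be checked with a toy model. All per-unit areas below are hypothetical numbers in arbitrary units, picked only to reproduce the shape of the argument; they are not measurements of any real chip.

```python
# Toy area model; every per-unit area here is hypothetical (arbitrary units).
UNIFIED_AREA = 1.3  # one unified unit: ALUs plus extra control/scheduling logic
VS_AREA = 0.8       # one dedicated vertex shader unit
PS_AREA = 1.0       # one dedicated pixel shader unit

def unified_area(n_units):
    """Total area of n unified shader units."""
    return n_units * UNIFIED_AREA

def dedicated_area(n_vs, n_ps):
    """Total area of separate vertex and pixel shader units."""
    return n_vs * VS_AREA + n_ps * PS_AREA

# Low end: one unified unit replaces a VS/PS pair and comes out smaller.
print("1 unified:", unified_area(1), "| 1 VS + 1 PS:", dedicated_area(1, 1))
# High end: eight unified units versus a PS-heavy dedicated mix of 2 VS + 8 PS.
print("8 unified:", unified_area(8), "| 2 VS + 8 PS:", dedicated_area(2, 8))
```

The crossover comes from the dedicated design needing far fewer vertex units than pixel units at the high end, while every unified unit pays the control-logic overhead.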
Dave Baumann
15-Sep-2006, 00:04
The control logic for unified simply has to cost less than lobbing in a few extra, dedicated ALUs; it's at the low end where this is going to be felt most.
For reference, previous Intel designs were certainly deferred; however, I've not heard of that in relation to G965. They do make note of dynamic load balancing as a result of the unified architecture, though.
LightHeaven
15-Sep-2006, 00:09
I haven't read all the pages because the thread is big, but at Gamefest there's a slide where they state G80 will be a 48-pixel-pipe GPU and that the number of vertex pipes was still undecided... Is this already known information, pure speculation, or did someone inside MS reveal something they shouldn't have?
Dave - I don't suppose you could clue us in as to how the shader interconnect scales? I was thinking that a dedicated architecture will have dedicated buses from VS to rasterizer, scheduler to PS, and PS to ROP, which can be sized independently. The unified approach has to deal with the general case (any unit can send data around)... does this end up costing more on a unified architecture (the interconnect is maybe more complicated, but I guess you could send all the different data types via the same interconnect and load balance for better utilization) or on the dedicated one?
StellaArtois
15-Sep-2006, 07:30
Is this already known information, pure speculation, or did someone inside MS reveal something they shouldn't have?
To summarise: Interesting, but old info.
http://www.beyond3d.com/forum/showpost.php?p=829310&postcount=107
What about IF(!) G80 has more units?
It definitely won't have more VS/GS than R600 has overall "pipes", for example. Assuming an app which only uses VS and no PS (a purely theoretical extreme case), you'd have all R600 pipes doing VS, and I think there's no chance that G80 will have that many available.
But theoretical cases aside, it's the apps/games which will favor either g80 or r600. So the programming will decide the winner this time since the architectures are (ass-umed) that different.
trinibwoy
15-Sep-2006, 14:32
I think it will just come down to whether Nvidia guessed right. If R600 load-balancing results in a pixel/vertex shader allocation relatively close to what Nvidia has hard-wired into G80 then things will be interesting. R600 is obviously going to wipe the floor in any and all theoretical shader bound tests and it has a lot more potential for dominance if it doesn't pay too high a price in overhead for its flexibility.
That's of course assuming that Nvidia was not able to work some magic and squeeze more performance/efficiency out of each unit so that 48 G80 PS ~ 64 R600 PS.
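Written out, the break-even condition in the last sentence is a simple ratio. Here $w$ stands for useful shader throughput per unit (a symbol introduced only for this sketch), and the 48 and 64 unit counts are the rumoured figures from the post, not confirmed specs:

$$
48\,w_{\mathrm{G80}} \approx 64\,w_{\mathrm{R600}}
\;\Longrightarrow\;
\frac{w_{\mathrm{G80}}}{w_{\mathrm{R600}}} \approx \frac{64}{48} = \frac{4}{3}
$$

In other words, each G80 pixel shader unit would need to deliver roughly a third more work than an R600 unit for the two totals to match.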
ninelven
15-Sep-2006, 18:57
I don't really think "guessing right" is the right terminology. At the end of the day, it is up to the game developers and how they achieve the desired results. There really isn't any way to predict how aggressive they will be in exploiting the advantages of a unified architecture.
Dave Baumann
15-Sep-2006, 19:14
Well, one of the points of it is that developers shouldn't need to specifically exploit it.
Of course, it does also allow the developer more freedom in how they distribute the resources such that they could make choices that they wouldn't necessarily on a traditional architecture.
NocturnDragon
15-Sep-2006, 19:41
It's pretty clear that if developers use a PS/VS balance close to what G80 will have, the match will be harder for Ati, and no one outside the NDAs knows yet who might win; but in all other scenarios (without even going to the excessively PS- or VS-limited cases), R600 will have an edge.
Well, one of the points of it is that developers shouldn't need to specifically exploit it.
Of course, it does also allow the developer more freedom in how they distribute the resources such that they could make choices that they wouldn't necessarily on a traditional architecture.
Bouncing Zabaglione Bros.
15-Sep-2006, 20:36
It's pretty clear that if developers use a PS/VS balance close to what G80 will have, the match will be harder for Ati, and no one outside the NDAs knows yet who might win; but in all other scenarios (without even going to the excessively PS- or VS-limited cases), R600 will have an edge.
But is that how developers engineer their games? As far as I can tell, developers have a baseline, and then loads of stuff you can turn on in order to get extra features and IQ if you have the horsepower to do so (ie more effects, higher resolutions, AA, AF etc). We've seen many triple-A games that were simply not able to run at their full capabilities when they were released, because they had the capability to outstrip even what the top end graphics cards could offer.
For a fixed PS/VS architecture like G80, there is a sweet spot that the developer may try to hit for his baseline. However, as soon as you start turning on all those extra features, chances are you will move away from that baseline, leaving some of your fixed architecture idle while waiting on wherever the bottleneck has moved to.
The advantage of the unified architecture, is of course that it can effectively reconfigure itself to whatever balance the games needs (whatever the settings you choose), in order to get the maximum performance out of all the transistors. In fact, the unified architecture could reconfigure itself on the fly depending what the game is doing from scene to scene.
Although there may be overhead for this kind of flexible, unified approach when compared to a fixed architecture operating at its sweet spot, the picture gets very blurry when you consider that a game may easily move the fixed architecture away from its sweet spot as you enable all the extra eye candy.
In this case the goalposts get moved, but the unified approach enables you to "chase" those moving goalposts and dynamically create whatever new sweet-spot is required, whereas the fixed architecture is left to suffer.
There really isn't "the balance" for a given game. Or even a frame. Load distribution varies all the time. Just think about how objects tend to occupy fewer pixels as they are moved away from the viewpoint, and you'll surely see what I mean.
Or think about how definitely not vertex bound a tone mapping pass is going to be, regardless of how many gazillions of polygons are visible.
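The sweet-spot argument in the last few posts can be sketched with a toy throughput model. Assumptions, all hypothetical: vertex and pixel work within a pass overlap perfectly (the real dependency between the stages is ignored), the fixed design has 8 VS and 48 PS units, the unified design has the same 56 units, and unified scheduling overhead is modelled as a flat 90% efficiency.

```python
# Toy throughput model; unit counts, workloads and the efficiency factor are hypothetical.
def fixed_time(vertex_work, pixel_work, vs_units, ps_units):
    """Pass time on a fixed VS/PS split: the busier pool sets the pace."""
    return max(vertex_work / vs_units, pixel_work / ps_units)

def unified_time(vertex_work, pixel_work, units, efficiency=1.0):
    """Pass time when any unit can take either kind of work.

    efficiency < 1.0 stands in for load-balancing/scheduling overhead.
    """
    return (vertex_work + pixel_work) / (units * efficiency)

VS_UNITS, PS_UNITS = 8, 48            # hypothetical fixed split
UNIFIED_UNITS = VS_UNITS + PS_UNITS   # same total unit count
EFFICIENCY = 0.9                      # hypothetical unified overhead

passes = {
    "matches the fixed ratio": (80, 480),
    "geometry heavy": (240, 240),
    "tone mapping (no vertex work)": (0, 480),
}

for name, (vw, pw) in passes.items():
    f = fixed_time(vw, pw, VS_UNITS, PS_UNITS)
    u = unified_time(vw, pw, UNIFIED_UNITS, EFFICIENCY)
    print(f"{name:30s} fixed={f:6.2f} unified={u:6.2f}")
```

Only the pass that happens to match the fixed 1:6 split favours the dedicated design; once the load shifts either way, idle units on the fixed side outweigh the assumed unified overhead.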
Well, usually the max settings are what we target at 30 fps on the fastest card available, or on what next gen is expected to be. We usually target around a 50% increase in performance.
This is why we see many games on different engines come out with around the same polys per frame. So far, other than Far Cry, no games really go above 200k per scene. Far Cry is an oddball, at least outdoors, where pixel shaders weren't taxed much, so they could do that since the amount of vertex calculation was limited. 300-350k polys per frame is the target for this generation's cards (7900 and X1900XTX), so vertex shaders won't be overly taxed on next gen. Actually, I don't think vertex shaders are taxed at all yet, so staying with fixed vertex shader counts really doesn't hurt. The more vertices that are drawn, the harder it will be for the pixel shader units to keep up, since there are a lot more calculations to be done on that front (this was all the talk about FEAR with the 1 to 8 pixel shader ratio).
Edit
Using simple shaders like normal mapping, current-gen cards could do well at about 3 million polys per scene with SM 3.0 and 4 lights per object in one pass. Adding more complex shaders will drop that figure considerably, mainly because the pixel shaders will be taxed greatly.
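For scale, the per-frame polygon figures in the two posts above translate into per-second rates as follows, assuming the 30 fps max-settings target mentioned there. This is plain arithmetic on the quoted numbers; the labels are just descriptive.

```python
FPS_TARGET = 30  # frame-rate target for max settings, per the post above

def triangles_per_second(polys_per_frame, fps=FPS_TARGET):
    """Polygon throughput needed to sustain a given per-frame count."""
    return polys_per_frame * fps

def frame_budget_ms(fps=FPS_TARGET):
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps

for label, polys in [("current ceiling (besides Far Cry)", 200_000),
                     ("this-gen target", 350_000),
                     ("simple normal-mapped scene", 3_000_000)]:
    rate = triangles_per_second(polys)
    print(f"{label}: {polys:,} polys/frame -> {rate:,} polys/s "
          f"in a {frame_budget_ms():.1f} ms frame")
```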
The more vertices that are drawn, the harder it will be for the pixel shader units to keep up, since there are a lot more calculations to be done on that front (this was all the talk about FEAR with the 1 to 8 pixel shader ratio).
Is this right, that more vertices lead to more pixel shader calcs? And what was that ratio, VSs:PSs (surely not tex:math, given the context)?
Is this right, that more vertices leads to more pixel shader calcs? And what was that ratio, VSs:PSs (surely not tex:math, given the context)?
For each new vertex, normals have to be recalculated for the pixel shaders to do their thing, specifically for animated objects and in some instances for static ones as well. So the pixel shaders will be used more with more vertices being drawn.
Edit: it's highly dependent on the shader being drawn; regular normal mapping and simple parallax is about 1 to 2 or 3, and once you get into POM, it will skyrocket.
For each new vertex, normals have to be recalculated for the pixel shaders to do their thing, specifically for animated objects and in some instances for static ones as well. So the pixel shaders will be used more with more vertices being drawn.
Which normals? Aren't they (usually) sent down with the vertex data (position, tex coords, etc).
Do you mean the interpolation for the vertex data from the VS to the PS?
Pixel shaders handle fragments; they don't care about vertices... They will be used more if those vertices generate more fragments.
Please correct me if I am wrong.
Which normals? Aren't they (usually) sent down with the vertex data (position, tex coords, etc).
Do you mean the interpolation for the vertex data from the VS to the PS?
Pixel shaders handle fragments; they don't care about vertices... They will be used more if those vertices generate more fragments.
Please correct me if I am wrong.
Vertex normals. To calculate anything like normal maps: once a mesh moves, the vertex normals change, so these have to be fed into the pixel shader to get the final outcome. So let's look at regular Phong or Blinn lighting with normal mapping. All vertex normals are calculated at load time, and then, where animated objects are concerned, vertex normals are recalculated as needed and sent to the pixel shader to produce the output.
In a Phong-type lighting system without any type of bump mapping, you are correct: there is no change, since we won't be using the vertex normal to calculate anything.
But anyway, the goal of real-time lighting is to do all the vertex calculations once and only once, in one pass; to do this, the calculations are done in a post-processing stage (deferred shading and some other new lighting models coming out), but this fails with animated objects.
Sorry, had to get a quick bite to eat ;). But now, with newer games being much more interactive with their environment, more animated objects are going to be used. In Crysis, for example, all the trees are interactive, which increases the vertex workload and in turn increases the pixel shader workload that much more (the ratio will stay the same, but the pixel shader limit will be hit first since the ratio of pixel shaders will be higher). This is where R600 "should" come out on top, but it all depends on the number of calculations that can be done on each GPU.
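For reference in this exchange, one common way to recompute per-vertex normals after a mesh deforms is to sum the face normals of the triangles around each vertex (area-weighted, via the unnormalised cross product) and renormalise. This is a minimal sketch assuming an indexed triangle list with counter-clockwise winding; it is not necessarily what any particular engine does, and skinned meshes often just transform stored normals instead. Either way it is per-vertex work done before rasterization; the pixel shader only sees the interpolated result.

```python
import math

def recompute_vertex_normals(positions, triangles):
    """Area-weighted per-vertex normals for a (possibly deformed) triangle mesh.

    positions: list of (x, y, z) tuples after animation
    triangles: list of (i0, i1, i2) index triples, counter-clockwise winding
    """
    normals = [[0.0, 0.0, 0.0] for _ in positions]
    for i0, i1, i2 in triangles:
        p0, p1, p2 = positions[i0], positions[i1], positions[i2]
        e1 = [p1[k] - p0[k] for k in range(3)]
        e2 = [p2[k] - p0[k] for k in range(3)]
        # Cross product: its length is twice the triangle area, so bigger faces weigh more.
        face = [e1[1] * e2[2] - e1[2] * e2[1],
                e1[2] * e2[0] - e1[0] * e2[2],
                e1[0] * e2[1] - e1[1] * e2[0]]
        for i in (i0, i1, i2):
            for k in range(3):
                normals[i][k] += face[k]
    result = []
    for n in normals:
        length = math.sqrt(sum(c * c for c in n))
        result.append(tuple(c / length for c in n) if length > 0 else (0.0, 0.0, 0.0))
    return result

# A flat quad in the XY plane, then the same quad with one corner lifted.
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2), (0, 2, 3)]
print(recompute_vertex_normals(quad, tris))
bent = quad[:2] + [(1.0, 1.0, 1.0)] + quad[3:]
print(recompute_vertex_normals(bent, tris))
```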
MistaPi
16-Sep-2006, 10:48
From someone I trust:
- 40% dedicated to D3D10 (a little unclear if this means of the entire die).
- NDA expires at the beginning of November.
- Hard launch.
- Aims for good availability of 3 SKUs for Christmas sales.
LeStoffer
16-Sep-2006, 12:45
From someone I trust:
- 40% dedicated to DX10 (a little unclear if this means of the entire die).
That sounds totally weird: dedicated :???: VS and GS should be unified on G80 and the new added integer instruction set shouldn't demand that much extra die space - especially not if their patent to combine fp and int is used in a single ALU?
What am I missing here?
There's so much more to D3D10/SM4.0 than integer support; a 40% increase would not surprise me at all.
- Aims for good availability of 3 SKUs for Christmas sales.
Shocka! :twisted:
trumphsiao
16-Sep-2006, 15:26
Shocka! :twisted:
The 3DMark06 benchmark increase is very much the same as going from R9700 Pro to X800XT.
I also heard the G80 architecture supports an orthogonalized frame buffer, but what exactly can that do?
I also heard the G80 architecture supports an orthogonalized frame buffer, but what exactly can that do?
It means that it does not matter what format your frame buffer is: all features (related to the frame buffer) are supported. Basically, your fb is orthogonal (i.e. not related at all; the term comes from linear algebra) to everything else... MAYBE ;)
trumphsiao
16-Sep-2006, 15:29
From someone I trust:
- 40% dedicated to D3D10 (a little unclear if this means of the entire die).
- NDA expires at the beginning of November.
- Hard launch.
- Aims for good availability of 3 SKUs for Christmas sales.
I already heard the 80nm SOI G80 will come out simultaneously with Vista/the new Office.
I already heard the 80nm SOI G80 will come out simultaneously with Vista/the new Office.
to Uttar: ZRAMMM! LOL :)
trumphsiao
16-Sep-2006, 15:36
From someone I trust:
- 40% dedicated to D3D10 (a little unclear if this means of the entire die).
- NDA expires at the beginning of November.
- Hard launch.
- Aims for good availability of 3 SKUs for Christmas sales.
1. I wager this time we will see lots of G80GT.
2. Nvidia just placed the order several days ago, which confirms we should see at least a batch of G80s by Nov or Dec.
3. But the first batch of G80s is only 40K for sure. (Hope OEMs like Dell won't swallow too big a portion of the G80s.)
trinibwoy
16-Sep-2006, 16:18
It means that it does not matter what format your frame buffer is: all features (related to the frame buffer) are supported. Basically, your fb is orthogonal (i.e. not related at all; the term comes from linear algebra) to everything else... MAYBE ;)
How big of a deal is this? Besides FP16 AA support what's the current situation with support of different FB formats?
How big of a deal is this? Besides FP16 AA support what's the current situation with support of different FB formats?
It would be nice to have all blending modes + MSAA and/or MRT working on all fb formats, with MSAA and MRT completely orthogonal.
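To make "orthogonal" concrete: frame-buffer features are orthogonal to format when every (format, feature) pair is supported, so no cell of the capability matrix is missing. A minimal sketch; the format and feature names follow the posts above, but both capability tables are hypothetical and do not describe any real chip.

```python
from itertools import product

FORMATS = ["RGBA8", "FP16", "FP32"]
FEATURES = ["blending", "MSAA", "MRT"]

def missing_combinations(support):
    """Return every (format, feature) pair the capability table does not cover."""
    return [(fmt, feat) for fmt, feat in product(FORMATS, FEATURES)
            if feat not in support.get(fmt, set())]

def is_orthogonal(support):
    """True when every feature works with every format."""
    return not missing_combinations(support)

# Hypothetical capability tables, for illustration only.
partial = {"RGBA8": {"blending", "MSAA", "MRT"},
           "FP16": {"blending", "MRT"},
           "FP32": {"MRT"}}
full = {fmt: set(FEATURES) for fmt in FORMATS}

print(is_orthogonal(partial), missing_combinations(partial))
print(is_orthogonal(full))
```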
Aren't they supposed to be putzing around with a new AA algorithm this time? Perhaps that's where the orthogonality comes from.
Aren't they supposed to be putzing around with a new AA algorithm this time? Perhaps that's where the orthogonality comes from.
what do you mean?
That they were supposed to implement a new way of doing AA, and I'm guessing (and I underline that) that they've made it work with all blending modes (and, as a wet dream, with alpha textures AND within surfaces in order to alleviate specular/shader aliasing; but that last part is just something I'd like, even though I know it's beyond unlikely :)).
Chalnoth
16-Sep-2006, 19:06
It means that it does not matter what format your frame buffer is: all features (related to the frame buffer) are supported. Basically, your fb is orthogonal (i.e. not related at all; the term comes from linear algebra) to everything else... MAYBE ;)
I'd be a bit surprised if they offered blending and MSAA for FP32 rendertargets, and texture filtering for FP32 textures.
I'd be a bit surprised if they offered blending and MSAA for FP32 rendertargets, and texture filtering for FP32 textures.
IF (quite a big if) they use their pixel shader ALUs to perform blending (thus removing the blenders from the ROPs), it should be easier (but not easy) to support MSAA and alpha blending on FP32 RTs.
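If blending did move into the shader ALUs, as speculated above, the blend itself is ordinary floating-point arithmetic. A minimal sketch of the standard non-premultiplied "source over" equation on unclamped values; the function name is hypothetical. Python floats are 64-bit, so this shows only the math, not FP32 rounding, and it ignores the hard part for hardware: ordering the read-modify-write of the destination when fragments overlap.

```python
def blend_over(src, dst):
    """Non-premultiplied 'source over' blend of two RGBA tuples.

    color = src.rgb * src.a + dst.rgb * (1 - src.a)
    alpha = src.a + dst.a * (1 - src.a)
    No clamping, so HDR values above 1.0 pass through.
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    inv = 1.0 - sa
    return (sr * sa + dr * inv,
            sg * sa + dg * inv,
            sb * sa + db * inv,
            sa + da * inv)

# Half-transparent HDR red over opaque blue.
print(blend_over((4.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))
```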
DmitryKo
16-Sep-2006, 23:07
ZRAMMM!
Too good to be true.
PS. I sold my Rendition Verite V2100 card when the Rendition V3300 was cancelled, then shortly thereafter a V4400 core was announced - it would feature 4 MB eDRAM and a dedicated socket, and I thought "WOW! That would be my next 3D card"... but then it was cancelled altogether and so I bought a Riva TnT, then changed it for GF2 MX400, GF3 Ti 200, GF4 Ti 4200, GF5900LE, GF6600... and still hoping to see embedded memory in a PC part someday :)
39 pages of comments... G80 must be the most mysterious chip ever :)
39 pages of comments... G80 must be the most mysterious chip ever :)
This comment is not fair! ;)
39 pages of comments... G80 must be the most mysterious chip ever :)
Ooooh, the finger twitched over the neg rep on that one, Mr. Smarty Pants Person. :lol:
If one added up the pages on the four main R520 threads, it would be truly frightening, though the ~6 month delay obviously was a major contributor to that total on the what/why/how front.
vBulletin® v3.8.6, Copyright ©2000-2013, Jelsoft Enterprises Ltd.