G80 rumours

IbaneZ
21-Feb-2006, 01:05
http://www.xbitlabs.com/news/video/display/20060220100915.html

Nvidia’s code-named G80 graphics processing unit (GPU) will incorporate 48 pixel shader processors and an unknown number of vertex shader processors, some unofficial sources said. The chip is still expected to support the DirectX 10 feature set along with Shader Model 4.0, even though it will not take advantage of unified processors that can execute both pixel and vertex shaders.

LVSeminole
21-Feb-2006, 04:46
I know what they mean (I think) when they say hybrid. But I don't get what they mean when they say "dedicated". I am taking "dedicated" to mean about the same thing as unified. Now, maybe this means the vertex and pixel shader processors work independently of each other.

This is all assuming I am right in thinking the NV40/G70 don't operate independently in that sense.

Geo
21-Feb-2006, 05:32
Whee! 48ps and some unknown number of vs (but unlikely to be less than 10). Plus HDR/AA, recently confirmed. Even at 80nm that sounds pretty aggressive.

Farid
21-Feb-2006, 06:21
I had heard most of this rumor, except the 48 PS part, which is new.

stevem
21-Feb-2006, 06:24
It does sound good, & fits well with G71 rumours/timeframe.

Hellbinder
21-Feb-2006, 06:55
So it will be an R580??? :?:

That hardly seems to make any sense :!:

R600 is 64 unified Shaders, with an enhanced scheduling logic based partially off Xenos.

ants
21-Feb-2006, 06:59
I'm guessing that they took the TMUs out of the Pixel pipes and are now sharing them between the VS and PS units. Hybrid design?

The 48 pixel shaders seem a bit low, though. A very high-clocked part (if they're expecting 2x the previous generation)?

ants
21-Feb-2006, 07:01
...
R600 is 64 unified Shaders, with an enhanced scheduling logic based partially off Xenos.

How many texture units?

Basic
21-Feb-2006, 07:45
I am taking "dedicated" to mean about the same thing as unified.
There was probably a negation missing there, but just to make it clear:
No, in this case "dedicated" means the opposite of "unified": VS and PS are different parts of the hardware. They should have the same capabilities from a software POV, though.

Kombatant
21-Feb-2006, 07:53
R600 is 64 unified Shaders, with an enhanced scheduling logic based partially off Xenos.
Care to explain your certainty about that?

Chalnoth
21-Feb-2006, 08:40
I won't comment on Hellbinder's certainty, but that sounds close to correct. But the texture units are a big question: will it be like the Xenos with one set of 16 bilinear sample texture units, and one set of 16 point sample texture units? Twice that seems highly unlikely.

Ailuros
21-Feb-2006, 10:54
Pardon me if I twitch when I read about "48 pixel processors" these days. Is that an ALU, a SIMD channel or something else?

Chalnoth
21-Feb-2006, 11:08
Agreed. Not nearly enough is known about the architecture the rumor describes to say anything about what that means.

JHoxley
21-Feb-2006, 11:17
I'm not 100% clear on where the various chip codenames fall into place...

The G70 is out now, the G71 expected soon? This G80 is therefore the next generation (GeForce 8 series?) product.

By them saying "hybrid", would it be realistic to read that as "We have some D3D9 hardware that we bolted some extra goodies onto so that it fits the D3D10 specification", rather than a chip that was built from the ground up as a D3D10 part first and a legacy/D3D9 part second?

Jack

Chalnoth
21-Feb-2006, 11:22
Well, nVidia's never going to throw out all of the previous R&D and design an utterly new architecture. That'd be a big waste of resources. So no matter what, we're going to see the heritage of the NV4x architecture in the G80.

In the end, though, it all comes down to performance. For example, DX10 still has vertex and pixel shaders, though the instruction set is now identical between the two. This doesn't necessarily require a unified pipeline design, and such a unified design would be a performance optimization.

So this may be all that the rumor is stating: that it isn't unified. This doesn't say anything about the performance of some of the things that DX10 requires, which will be interesting to see.

Ailuros
21-Feb-2006, 11:31
I'd speculate for G80 that there won't be separate units for GS/VS; it could be the origin of the "hybrid" rumour.

G80 according to what many speculate so far sounds like the first step into the USC direction and in that regard I don't expect to see as many common aspects with NV4x actually.

overclocked
21-Feb-2006, 11:45
It's almost a fact (99%) that ATI's SM4 part is unified.
Is anyone besides me hoping, given ATI's statements about efficiency, that nVidia's SM4 part won't take a unified approach, simply because that would make for more interesting comparisons between the two?

Ailuros
21-Feb-2006, 12:03
Most likely it will be the case. There will definitely be long-winded debates about unified vs. non-unified, which is to be expected from a technical POV, but if you ask me, what comes out at the other end is more important.

phenix
21-Feb-2006, 12:14
So it will be G80 vs R600

Does that mean NV50 will be competing against R700, or will there be no NV50 at all?

Fodder
21-Feb-2006, 12:18
NV50 was the successor to NV40, NV47 was spun off into G70, G80 is the successor to G70. You do the math. :wink:

Ailuros
21-Feb-2006, 12:24
NV50 was the successor to NV40, NV47 was spun off into G70, G80 is the successor to G70. You do the math. :wink:

That said Rivatuner might very well report 256-bit NV50....etc. ;)

phenix
21-Feb-2006, 12:35
NV50 was the successor to NV40, NV47 was spun off into G70, G80 is the successor to G70. You do the math. :wink:

Yeah, but there was this crazy theory that the G70 and G80 names were just filling the gap until the real NV50 arrived. Well, whatever.

Fodder
21-Feb-2006, 12:42
G70 certainly, but I think that G80 and NV50 both referred to same real next generation part.

Tim
21-Feb-2006, 13:22
I won't comment on Hellbinder's certainty, but that sounds close to correct. But the texture units are a big question: will it be like the Xenos with one set of 16 bilinear sample texture units, and one set of 16 point sample texture units? Twice that seems highly unlikely.

We don't even know what the shader processors in R600 are like. If they are C1-like (no mini-ALUs), then 64 is far too few; with no minis I would expect at least 96 (or more likely 128). Even with minis, 64 sounds a bit low, as it would only yield a combined vertex + pixel shader improvement per clock of about 20-25% over R580. Of course, ATI could boost the performance of each shader processor, make some efficiency improvements and/or increase clock speed to get a more reasonable performance improvement.

DegustatoR
21-Feb-2006, 13:48
“We will do a unified architecture in hardware when it makes sense. When it’s possible to make the hardware work faster unified, then of course we will. It will be easier to build in the future, but for the meantime, there’s plenty of mileage left in this architecture.”
It makes sense in D3D10 (three different shader types with a unified syntax). So why is everyone assuming that G80 won't be unified?

I still have trouble believing that NV will make some kind of "transitional" D3D10 architecture first -- why? For what reason? What's keeping them from doing a "proper" D3D10 architecture right from the start?

Remember, if we're talking about NV50 when we're talking about G80, then this architecture has been in development for a very long time (Longhorn was originally planned for 2004, wasn't it?). I don't think NV would spend that long on a "transitional" architecture; that just doesn't add up from any point of view...

Ailuros
21-Feb-2006, 13:50
Food for thought: under conditionals one could see USCs being much closer to the CPU concept (in relative terms). Think about that when speculating what would make more sense for per ALU capabilities in a USC.

JHoxley
21-Feb-2006, 13:55
Well, nVidia's never going to throw out all of the previous R&D and design an utterly new architecture.
Yeah, makes sense. It's possibly so subtle as to be unimportant, but I can see different results from a "let's upgrade our D3D9 part to D3D10" approach versus a "let's create a D3D10 part" approach, even if both have the same heritage and underlying theory...

So this may be all that the rumor is stating: that it isn't unified. This doesn't say anything about the performance of some of the things that DX10 requires, which will be interesting to see.
I've been thinking very much the same thing, and was trying to get some of the D3D developers to comment on it, but they couldn't :cry:

There's loads of places I've seen that will make this very challenging. It'll probably end up matching the true Java mentality - "Write once, test everywhere" :lol:

If it's of any interest to you guys, I was poking around D3D10 last night and came across the fixed set of counters as well as IHV-specific counters - meaning that it's now built directly into the API rather than via some obscure queries and/or PIX stuff. Whether it'll actually be useful or not is difficult to say - but it could allow the graphics engines to perform some dynamic load balancing when it detects that certain stages are maxed out...

Jack

Jawed
21-Feb-2006, 14:06
I still have trouble believing that NV will make some kind of "transitional" D3D10 architecture first -- why? For what reason? What's keeping them from doing a "proper" D3D10 architecture right from the start?
NVidia (Kirk) seemed adherent to the view that vertices and fragments will retain enough peculiarities that a shader pipe tailored to each type is the best in the medium term (let alone the peculiarities of geometry shading, whatever they turn out to be?...).

I like ants's idea, earlier, that the shader pipelines will be discrete, but texturing will be shared. NVidia seems all lined-up to deliver fully-decoupled texturing, and if so then it wouldn't be difficult to share that texturing between GS, VS and PS.

The final remaining question is over dynamic branching. At a guess this is one thing that NVidia prolly already has working well in NV40/G70 VS, as each vertex in flight is independent of its compadres. So there's none of the PS's thread-size affliction to kill VS dynamic branching.

And I imagine that's a key component in NVidia's desire to stick with a discrete architecture.

GS could be similar, with a very fine granularity being preferable, like VS.

So PS is the odd one out and we wait to see how NVidia attacks dynamic branching performance there.

Jawed
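A toy model of the batch-size ("thread-size") argument above, purely illustrative: it assumes that a SIMD batch which diverges on an if/else must run both paths, while a coherent batch runs only its own path. The path costs, batch sizes and branch pattern are made-up numbers, not NV40/G70 figures; a batch size of 1 stands in for the independent per-vertex processing described in the post.

```python
def branch_cost(taken, batch_size, cost_if, cost_else):
    """Average cycles per element for an if/else executed in SIMD batches.

    taken: one bool per element (True = element takes the 'if' path).
    A batch whose elements all agree runs only its path; a divergent
    batch runs both paths, so each of its elements pays cost_if + cost_else.
    """
    total = 0
    for start in range(0, len(taken), batch_size):
        batch = taken[start:start + batch_size]
        if all(batch):
            cycles = cost_if
        elif not any(batch):
            cycles = cost_else
        else:
            cycles = cost_if + cost_else  # divergence: both paths executed
        total += cycles * len(batch)
    return total / len(taken)


# Hypothetical pattern: runs of 4 "expensive" elements followed by 12 "cheap" ones.
pattern = ([True] * 4 + [False] * 12) * 64
for size in (1, 4, 16, 64):
    print(size, branch_cost(pattern, size, cost_if=20, cost_else=5))
# 1 8.75
# 4 8.75
# 16 25.0
# 64 25.0
```

Once the batch is larger than the branch's coherence (runs of 16 here), every element pays for both paths, which is the PS-side penalty being contrasted with per-vertex branching.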

Ailuros
21-Feb-2006, 14:46
If PS dynamic branching on G80 turns out lackluster, I'd be very disappointed.

DegustatoR
21-Feb-2006, 14:59
The final remaining question is over dynamic branching. At a guess this is one thing that NVidia prolly already has working well in NV40/G70 VS, as each vertex in flight is independent of its compadres. So there's none of the PS's thread-size affliction to kill VS dynamic branching.
So you think they'll sacrifice general architectural efficiency in favor of having good DB in VS and GS? That's rather doubtful these days, when PS is the bottleneck most of the time (when the CPU isn't the bottleneck, of course).

I mean, why bother with VS and GS when your chip will be PS-limited anyway? It well may be that going with US will provide you with better PS performance and any VS (and GS?) DB drawbacks will be hidden by PS bottleneck.
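A toy throughput model of this argument, under loudly simplified assumptions: work is measured in abstract ALU-cycles, dedicated vertex and pixel pools overlap perfectly, and a unified pool balances perfectly but may lose some efficiency to scheduling (the `efficiency` factor is hypothetical). The 8 VS / 24 PS split is only meant to be roughly G70-like, not a G80 or R600 figure.

```python
def dedicated_time(vertex_work, pixel_work, n_vs, n_ps):
    """Frame time with separate VS and PS pools running concurrently.

    Work is in abstract ALU-cycles; the slower pool sets the frame time.
    """
    return max(vertex_work / n_vs, pixel_work / n_ps)


def unified_time(vertex_work, pixel_work, n_units, efficiency=1.0):
    """Frame time with one shared pool balanced perfectly across both loads.

    efficiency (0 < efficiency <= 1) is a hypothetical derating for any
    scheduling overhead a unified design might carry.
    """
    return (vertex_work + pixel_work) / (n_units * efficiency)


# Hypothetical 8 VS / 24 PS split versus a 32-unit unified pool.
for name, v, p in [("matched", 800, 2400),
                   ("pixel-heavy", 800, 24000),
                   ("vertex-heavy", 8000, 4800)]:
    print(name, dedicated_time(v, p, 8, 24), unified_time(v, p, 32))
```

In the matched case both designs take 100 units; the ideal unified pool wins on the lopsided frames (775 vs. 1000 pixel-heavy, 400 vs. 1000 vertex-heavy), but with an efficiency of 0.8 it needs about 125 on the matched frame. Which effect dominates in real workloads is exactly what the thread is debating.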

Mariner
21-Feb-2006, 15:26
NVidia (Kirk) seemed adherent to the view that vertices and fragments will retain enough peculiarities that a shader pipe tailored to each type is the best in the medium term (let alone the peculiarities of geometry shading, whatever they turn out to be?...).


But did Kirk say this just because he was responding to a question about ATI's unified shader in Xenos? Bearing in mind NVidia's apparent 180-degree turnaround on support for HDR+MSAA, I do wonder how much PR there is in Kirk's interviews (in fact, in any interviews from IHVs).

It wouldn't surprise me if G80 did have a unified shader but I think it will be more interesting if G80 takes a noticeably different approach to R600. Certainly more to discuss on this board! :smile:

Jawed
21-Feb-2006, 15:57
I mean, why bother with VS and GS when your chip will be PS-limited anyway? It well may be that going with US will provide you with better PS performance and any VS (and GS?) DB drawbacks will be hidden by PS bottleneck.
Perhaps the view is that a unified architecture would harm VS and GS dynamic branching performance, for example.

Also, you can bet that 3DMk will continue to be heavily vertex bound and will prolly introduce some heavy geometry-bound tests.

This isn't just an argument about dynamic branching though - I'd say that's just a small part of it. I think there was talk of, for example, interpolators in pixel shader hardware that would be wasted while a unified architecture was not processing pixels. etc.

Jawed

SugarCoat
21-Feb-2006, 17:25
So you think they'll sacrifice general architectural efficiency in favor of having good DB in VS and GS? That's rather doubtful these days, when PS is the bottleneck most of the time (when the CPU isn't the bottleneck, of course).

I mean, why bother with VS and GS when your chip will be PS-limited anyway? It well may be that going with US will provide you with better PS performance and any VS (and GS?) DB drawbacks will be hidden by PS bottleneck.

We have no idea how far Nvidia has gone to make a more efficient PS pipe on this chip; I wouldn't take a G70 and double the PS pipe count, for example, as that would be a severe oversimplification. And don't make the mistake of assuming unified is more efficient. That will only be the case through hardware innovation and software support. It is NOT more efficient in terms of basic design. On the contrary, it may even be noticeably less efficient, especially in its teething years.



But did Kirk say this just because he was responding to a question about ATI's unified shader in Xenos? Bearing in mind NVidia's apparent 180-degree turnaround on support for HDR+MSAA, I do wonder how much PR there is in Kirk's interviews (in fact, in any interviews from IHVs).

It wouldn't surprise me if G80 did have a unified shader but I think it will be more interesting if G80 takes a noticeably different approach to R600. Certainly more to discuss on this board! :smile:



I'll believe Nvidia's open HDR+AA capabilities when I see them; I suggest you do the same and watch out for spin ;).

Mintmaster
21-Feb-2006, 19:04
NVidia (Kirk) seemed adherent to the view that vertices and fragments will retain enough peculiarities that a shader pipe tailored to each type is the best in the medium term (let alone the peculiarities of geometry shading, whatever they turn out to be?...).

I like ants's idea, earlier, that the shader pipelines will be discrete, but texturing will be shared. NVidia seems all lined-up to deliver fully-decoupled texturing, and if so then it wouldn't be difficult to share that texturing between GS, VS and PS.
The only justification for this is that NVidia doesn't care about high performance vertex texturing, and wants to save die space that way.

The problem isn't sharing texture units and caches. Yes, there's a bit of data routing, but that's not a big deal. The problem is that if you want to be able to do vertex texturing fast enough, then you need to absorb the latency, and keep many vertices in flight. Without texturing, you only need enough vertices in flight to keep your pipelined ALU fed (maybe 10 stages?) which is, as arjan correctly pointed out in another thread, about an order of magnitude less than if you want full speed texturing. The register space needed is much less without fast VTF, which is why good dynamic branching in the VS is pretty cheap compared to implementing it in the PS. Indeed, VTF is the biggest impetus to go unified.

My guess is NVidia will do enough to keep VTF speed "acceptable", say the equivalent of 10-15 math instructions. Someone here said the 6800U is capable of 22M vertex fetches per second, which equates to about 70 cycles. Not sure about G70, but I am curious.
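A back-of-the-envelope sketch of why register space scales with vertices in flight, assuming every register is a four-component FP32 vector and using the "about an order of magnitude" gap from the post (~10 vs. ~100 vertices in flight). The per-vertex register counts are hypothetical.

```python
FLOAT4_BYTES = 4 * 4  # one register = four FP32 components


def vertex_state_bytes(vertices_in_flight, live_temps, input_regs):
    """On-chip storage needed to park vertices while texture fetches are pending.

    Each parked vertex keeps its live temporaries and its input attributes,
    all assumed to be float4 registers.
    """
    return vertices_in_flight * (live_temps + input_regs) * FLOAT4_BYTES


# Hypothetical shader: 8 live temporaries + 8 input attributes per vertex.
print(vertex_state_bytes(10, 8, 8))   # ALU latency only -> 2560
print(vertex_state_bytes(100, 8, 8))  # ~10x more to cover texture latency -> 25600
```

The storage grows linearly with the number of vertices in flight, which is why fast VTF costs die space that a short, ALU-only vertex pipe does not need.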

Mintmaster
21-Feb-2006, 19:07
I'll believe Nvidia's open HDR+AA capabilities when I see them; I suggest you do the same and watch out for spin ;).
You really think they'll forego this feature in G80?

Jawed
21-Feb-2006, 19:16
The problem is that if you want to be able to do vertex texturing fast enough, then you need to absorb the latency, and keep many vertices in flight. Without texturing, you only need enough vertices in flight to keep your pipelined ALU fed (maybe 10 stages?) which is, as arjan correctly pointed out in another thread, about an order of magnitude less than if you want full speed texturing. The register space needed is much less without fast VTF, which is why good dynamic branching in the VS is pretty cheap compared to implementing it in the PS. Indeed, VTF is the biggest impetus to go unified.
Yeah, you're right, I wasn't thinking of latency. Gawd :oops:

Seems NVidia is looking at a tight spot without a unified architecture (well, tighter than I was thinking).

Jawed

Chalnoth
21-Feb-2006, 20:04
The only justification for this is that NVidia doesn't care about high performance vertex texturing, and wants to save die space that way.

The problem isn't sharing texture units and caches. Yes, there's a bit of data routing, but that's not a big deal. The problem is that if you want to be able to do vertex texturing fast enough, then you need to absorb the latency, and keep many vertices in flight. Without texturing, you only need enough vertices in flight to keep your pipelined ALU fed (maybe 10 stages?) which is, as arjan correctly pointed out in another thread, about an order of magnitude less than if you want full speed texturing.
Well, they could still have high-performance VTF if the vertex units share the texture units with the pixel units. This way, you'd still only have ~10 vertices in flight per vertex pipeline (maybe less), but sometimes an instruction will request a texture op, so it'll go spend time in the very deep texture pipeline, before eventually making its way back to the short vertex pipelines. I believe this is exactly how ATI is obtaining good dynamic branching performance in the pixel shader.

SugarCoat
21-Feb-2006, 20:23
You really think they'll forego this feature in G80?



Not if they figure out how to do it. ~8 months ago Nvidia had little to no interest in researching how to do HDR+AA on a usable level. Now magically they're going to know how to do such a thing? Did ATI hand over an instruction manual explaining how to do it? The problem is that talk of HDR+AA on Nvidia hardware popped up as soon as ATI revealed that the R500 series were all capable of it, so magically Nvidia's coming cores will include it. I'm pretty sure it was stated that G71 would be capable of this, which I don't see happening. I don't doubt Nvidia's R&D capabilities at all; I just wouldn't rule out the possibility that they may have a tough time figuring out how to do it themselves. I strongly believe that if we see anything, it will be a method noticeably different from ATI's. The question is not whether Nvidia can do it but when; I'm just not as easily convinced that they can drop it in that easily.

Chalnoth
21-Feb-2006, 20:33
Not if they figure out how to do it. ~8 months ago Nvidia had little to no interest in researching how to do HDR+AA on a usable level. Now magically they're going to know how to do such a thing?
Well, if nVidia knows how to do multisampling on an integer framebuffer, then they know how to do it on a floating point framebuffer. There's really no difficulty at all here. It's all about transistor cost (and performance for a specific transistor cost).

Ailuros
21-Feb-2006, 21:11
Not if they figure out how to do it. ~8 months ago Nvidia had little to no interest in researching how to do HDR+AA on a usable level. Now magically they're going to know how to do such a thing? Did ATI hand over an instruction manual explaining how to do it? The problem is that talk of HDR+AA on Nvidia hardware popped up as soon as ATI revealed that the R500 series were all capable of it, so magically Nvidia's coming cores will include it. I'm pretty sure it was stated that G71 would be capable of this, which I don't see happening. I don't doubt Nvidia's R&D capabilities at all; I just wouldn't rule out the possibility that they may have a tough time figuring out how to do it themselves. I strongly believe that if we see anything, it will be a method noticeably different from ATI's. The question is not whether Nvidia can do it but when; I'm just not as easily convinced that they can drop it in that easily.


I severely doubt that; IHVs set priorities and decide how much die space they can dedicate to a given feature set. Since not everything is going to fit into a targeted transistor count at all times, they set priorities. Or are you trying to tell me that ATI isn't taking similar approaches?

caboosemoose
21-Feb-2006, 21:24
I make this comment purely to wind people up, but it's true all the same:

When I met with Kirk in March 2004 during the build up to dropping the NV40 bomb I asked him about design lead times and how difficult it must be to choose the right feature set in such a fast moving market place when laying down the parameters of a chip that wouldn't be on sale for a couple of years minimum. And he intimated that at that time NV50 was essentially done and dusted. Not taped out or anything like that, but certainly complete in terms of high level design. In other words, you would infer from his comments that any decision regarding unified architectures and supporting AA with HDR would have already been taken back then. Of course, plans and products can and do change, but as we head towards DX10 it's kinda interesting to reflect on, I would think.

Chalnoth
21-Feb-2006, 21:39
Sure. And, since at that time they would have known the NV40 architecture quite well, they would also have been aware of where it was weak (dynamic branching, MSAA with FP16 render targets, vertex texturing), and it would have been very natural to shore up those weaknesses with the NV50.

As an example, nVidia has said that the NV4x didn't support MSAA with FP16 render targets because performance just wasn't going to be good enough. This would seem to imply that one of the first things nVidia would work on to improve the NV4x architecture is the performance of an MSAA-with-FP16 implementation, which would naturally lead to all-around better multisampling performance.

Thus, I claim that the primary weaknesses of the NV4x architecture can be seen without paying any attention to what ATI has been doing, and we can therefore expect a very competitive part on all levels from nVidia with the G80.
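To put a rough number on the performance concern: a sketch of uncompressed colour-sample storage, assuming 4 bytes per sample for RGBA8 and 8 bytes for FP16 RGBA, and ignoring Z, colour compression and whatever tricks either IHV actually uses. The 1600x1200 4x case is just an example.

```python
def msaa_color_bytes(width, height, samples, bytes_per_sample):
    """Uncompressed colour-sample storage for a multisampled render target."""
    return width * height * samples * bytes_per_sample


RGBA8_BYTES = 4      # 4 x 8-bit integer channels
RGBA16F_BYTES = 8    # 4 x 16-bit floating-point channels

w, h, s = 1600, 1200, 4  # hypothetical resolution and sample count
print(msaa_color_bytes(w, h, s, RGBA8_BYTES))    # 30720000
print(msaa_color_bytes(w, h, s, RGBA16F_BYTES))  # 61440000
```

Doubling the bytes per sample doubles the storage and, before any compression, the bandwidth spent writing and resolving samples. That is consistent with performance, rather than feasibility, being the stated reason.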

caboosemoose
21-Feb-2006, 21:49
I largely agree, though I think back in early 2004 there must have been some scope for interpretation of the future importance of FP16/HDR rendering (and, not wanting to contradict myself, even today there are barely any HDR games available, and when G80 hits the deck HDR may still be a minority feature).

I'm also not sure how his comments square with knocking up a DX10-compatible chip in general and a unified chip specifically. On these terms, even if G80 isn't unified, you'd have to think that G90 has largely been defined and hence which ever way you look at it NVIDIA already has a unified design in the bag.

SugarCoat
21-Feb-2006, 22:12
I severely doubt that; IHVs set priorities and decide how much die space they can dedicate to a given feature set. Since not everything is going to fit into a targeted transistor count at all times, they set priorities. Or are you trying to tell me that ATI isn't taking similar approaches?


You just don't go from one way of thinking to switching your entire feature set because the other guys did it. Basically, 8 months ago Nvidia saw it as too expensive to implement in hardware. Now you honestly think they've done a complete 180 and have this ability in G71 or G80? Went from "won't happen" to "easy feature we just tacked on" that quickly? It makes zero logical sense. To be honest, I don't see a bright future for raw HDR; I think custom implementations like Valve's are better representations of where we're headed. If you want nice HDR, then it will probably take more work on the part of devs rather than video card engineers.

Cores and the technology/features they possess are decided long before the actual product arrives. I see zero hope of the current Nvidia core architecture doing feasible HDR+AA equal to ATI's, and I don't have much hope for G80 either. Once again, we should remember where the possibility even came from: Nvidia hopefuls on forums and small tech sites, which also said G71/G80 would magically move back to more angle-independent AF. And it didn't start happening until literally the day ATI started marketing these features. As soon as ATI does something remotely custom or unique, Nvidia follows? And so promptly? No.

Chalnoth
21-Feb-2006, 22:24
You just don't go from one way of thinking to switching your entire feature set because the other guys did it. Basically, 8 months ago Nvidia saw it as too expensive to implement in hardware.
They never said it was too expensive to implement into hardware. They said that the performance wasn't there, so they left it out (presumably to save a few transistors). As I posted just a couple above, this makes it utterly obvious that a next-gen architecture would work on the performance of MSAA + FP16, and therefore implement it.

Jawed
21-Feb-2006, 22:46
I truly believe it was miserliness on transistor budgets, plus - more importantly - an eye to the future with "programmable ROPs" becoming a part of the shader pipeline by dint of the shader core being able to read Colour/Z (and corresponding AA samples) and then able to arbitrarily blend etc.

Maybe that future is upon us with G80. G80 could hide this programmability behind the DX or OGL settings that set render states and activate operations such as AA resolve. It seems to me that NVidia has a habit of aggressively introducing such features (e.g. through OGL extensions) where they can get some exposure in the wild for a few years before finally making it into OGL and DX officially.

I think it's more pertinent to ask, what killer feature could NVidia bring, soon, with programmable ROP functionality? Arbitrary tone-mapping without a post-process pass? Per-fragment full-precision early-out Z testing?

---

ROPs are "fairly independent" chunks of the rendering pipeline. e.g. RV530 has double-rate Z with AA off, a feature that no other ATI PC GPU has (Xenos is the same).

Logically, the difference between FX8 and FP16 AA sample storage & AA resolve is nothing more than data format. The arithmetic is conceptually the same.

Jawed
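The point that only the data format differs can be sketched as a format-agnostic box-filter resolve. This is a minimal illustration, not how any particular ROP works: it assumes a plain average of samples (no gamma correction, compression or wider filter kernels), and the helper names are hypothetical.

```python
import struct


def decode_unorm8(x):
    # FX8 sample: integer 0..255 maps to 0.0..1.0
    return x / 255.0


def encode_unorm8(v):
    # Clamp to the representable range, then round back to 0..255
    return round(min(max(v, 0.0), 1.0) * 255.0)


def decode_fp16(bits):
    # FP16 sample stored as a raw 16-bit pattern ('e' = IEEE half precision)
    return struct.unpack('<e', struct.pack('<H', bits))[0]


def encode_fp16(v):
    return struct.unpack('<H', struct.pack('<e', v))[0]


def resolve(samples, decode, encode):
    # Box-filter resolve: the same averaging for every format;
    # only the decode/encode step depends on the storage format.
    return encode(sum(decode(s) for s in samples) / len(samples))


print(resolve([0, 255, 255, 0], decode_unorm8, encode_unorm8))  # 128
hdr = resolve([0x3C00, 0x4400, 0x4400, 0x3C00], decode_fp16, encode_fp16)
print(hex(hdr), decode_fp16(hdr))  # 0x4100 2.5
```

The FP16 path averages samples above 1.0 (here 1.0 and 4.0) without clamping, which is the point of an HDR target; the FX8 path clamps to [0, 1]. The averaging itself is identical.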

Mariner
21-Feb-2006, 22:56
They never said it was too expensive to implement into hardware. They said that the performance wasn't there, so they left it out (presumably to save a few transistors). As I posted just a couple above, this makes it utterly obvious that a next-gen architecture would work on the performance of MSAA + FP16, and therefore implement it.

Unfortunately, this is the kind of thing which makes me wonder whether Kirk's comments about Unified Shaders may also have been FUD and leaves me with no real clue to what to expect for G80. If Kirk could pooh-pooh MSAA+FP16 despite obviously knowing it would be supported a year or so later in G80, might he not do the same about a Unified Architecture? Of course, the only reason I can see that he might do this is if NVidia were unsure whether ATI would have a US out a long time before the end of the G70 line.

Perhaps I'm reading too much into stuff which isn't really there... I may be turning into linthat. :???:

Ailuros
21-Feb-2006, 22:59
I think it's time for a real evolution in terms of texture filtering and antialiasing besides all the rest. I guess I shouldn't place my hopes too high though *sigh* :(

Rys
21-Feb-2006, 23:06
I think it's time for a real evolution in terms of texture filtering and antialiasing besides all the rest. I guess I shouldn't place my hopes too high though *sigh* :(
Someone will give you that 16x16 MSAA EER one day dude, don't give up on the dream!

bdmosky
21-Feb-2006, 23:49
Why haven't we seen use of Table Feline filtering like SA mentioned in this article posted years ago?

ftp://gatekeeper.research.compaq.com/pub/DEC/WRL/research-reports/WRL-TR-99.1.pdf

Ailuros
22-Feb-2006, 01:19
You just don't go from one way of thinking to switching your entire feature set because the other guys did it. Basically, 8 months ago Nvidia saw it as too expensive to implement in hardware.

And one way to interpret it is that they didn't have enough hardware space left to implement a viable solution. Didn't I mention priorities? Where's floating-point filtering or vertex texturing in competing products? Obviously, if you ask me, I'd prefer an MSAA + float HDR combination instead, but design decisions have to be made one way or another.

Now you honestly think they've done a complete 180 and have this ability in G71 or G80?

G80, rather, and it's a whole new generation, not just a refresh of the original NV4x like G70 or G71.

Went from "won't happen" to "easy feature we just tacked on" that quickly? It makes zero logical sense. To be honest, I don't see a bright future for raw HDR; I think custom implementations like Valve's are better representations of where we're headed. If you want nice HDR, then it will probably take more work on the part of devs rather than video card engineers.

Keep an eye on the UE3 engine in the foreseeable future, then. I wouldn't expect anyone to play those types of games adequately with anything less than G8x/R6xx, either.

Cores and the technology/features they possess are decided long before the actual product arrives. I see zero hope of the current Nvidia core architecture doing feasible HDR+AA equal to ATI's, and I don't have much hope for G80 either.

ATI didn't use any magic wand to enable HDR+MSAA, nor is it anywhere close to rocket science. IMG's upcoming SGX, a tiny core for PDAs/mobiles (already presented in FPGA), is also capable of such combinations, and although TBDRs favour AA combined with anything floating point, it didn't take immeasurable resources to implement it either.

We have already a confirmation that NV's next generation will be capable of float HDR + MSAA combinations. And G71 is obviously a refresh of the NV4x line.

Once again, we should remember where the possibility even came from: Nvidia hopefuls on forums and small tech sites, which also said G71/G80 would magically move back to more angle-independent AF.

Although I personally would love the option, from what I've seen no one from NVIDIA has ever stated anything that would suggest anything of the kind. Not even a hint or a promise. It's not impossible, but if less angle dependency needs extra transistors and their transistor budget is too tight for anything beyond what they've already implemented, then I doubt it will happen.

By the way, "angle independent" is a tad exaggerated. I know I'm splitting hairs here, but in reality it's less angle dependent.

And it didn't start happening until literally the day ATI started marketing these features. As soon as ATI does something remotely custom or unique, Nvidia follows? And so promptly? No.

Yes it would. What do you mean by promptly anyway? NV40 was introduced in spring 2004 and it's more than two years from that until G80 (aka a real new generation) arrives.

ManicOne
22-Feb-2006, 01:59
Quick questions; Ailuros is it your thinking that UE3 based titles will require R600/G80 based hardware to perform at maximum? Would 7900GTX/X1900s be up to the task (max settings/high res)? Could this be the fastest outdating of top-end hardware yet seen?

Ailuros
22-Feb-2006, 02:05
Quick questions; Ailuros is it your thinking that UE3 based titles will require R600/G80 based hardware to perform at maximum? Would 7900GTX/X1900s be up to the task (max settings/high res)? Could this be the fastest outdating of top-end hardware yet seen?

It's not that G71/R580 won't be playable in, say, UT2k7. I just believe that solutions past those would be better suited to it. Where do I get my assumptions from? Past history.

Mintmaster
22-Feb-2006, 03:04
Well, they could still have high-performance VTF if the vertex units share the texture units with the pixel units. This way, you'd still only have ~10 vertices in flight per vertex pipeline (maybe less), but sometimes an instruction will request a texture op, so it'll go spend time in the very deep texture pipeline, before eventually making its way back to the short vertex pipelines. I believe this is exactly how ATI is obtaining good dynamic branching performance in the pixel shader.
That doesn't make any sense.

The current data for the vertex processing, i.e. all live registers as well as all vertex input data, must be held in a FIFO or cache. This is where the die space is taken up in pixel units relative to vertex units, and what is meant by "in flight". You put a request in the deep texture pipeline and store the processor state for the current vertex in order to stop your vertex pipe from being stalled. However many cycles it takes between your request and the return of the data, multiply that by the number of vertex shaders to get the number of vertices needed to be in flight for single-cycle VTF (i.e. call texld and use the result right after without speed loss). You can divide that number by your targeted latency.
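The sizing rule above is essentially Little's law: storage in flight = latency × issue rate. A minimal sketch, assuming each vertex shader issues one texture fetch per clock and the fetch latency is a fixed number of cycles (both simplifications, not NVIDIA specifications); the 8-VS, 100-cycle figures are hypothetical:

```python
def vertices_in_flight(fetch_latency_cycles: int, num_vertex_shaders: int) -> int:
    """Vertices whose state (live registers plus inputs) must be parked so that
    every vertex shader can issue a fetch each clock and never wait on a result.

    Assumes one fetch per shader per clock and a fixed latency
    (Little's law: items in flight = issue rate x time in the system).
    """
    return fetch_latency_cycles * num_vertex_shaders


# Hypothetical figures: 8 vertex shaders, 100-cycle texture latency.
print(vertices_in_flight(100, 8))
```

Real latency varies with bandwidth and cache hits, so this is a lower bound on what full-speed VTF would need rather than an exact budget.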

Right now NVidia is doing pretty much what you're suggesting. If there is an independent instruction stream you can execute to fill in the time for the texture fetch then you're fine, but if you need the texture data to continue, then you just have to wait because the vertex pipe isn't long enough to prevent stalling.

Not sure what you're saying about ATI's dynamic branching. They still keep a similar number of pixels in flight as NVidia does, but the pixels are not all executing the same instruction (hence the smaller batch), so the pixel shader has to change between execution states much more frequently and has to keep track of more information too.

DemoCoder
22-Feb-2006, 03:40
Ah, but the number of vertices "in flight" is often orders of magnitude smaller than the number of pixels in flight, and pixels have more state than vertices. Therefore the die space taken should be proportionately less, and they should not have much difficulty increasing the size of their register file, FIFOs, caches, et al. The cost-benefit ratio probably looks good relative to investing transistors in other areas.

Chalnoth
22-Feb-2006, 03:52
That doesn't make any sense.

The current data for the vertex processing, i.e. all live registers as well as all vertex input data, must be held in a FIFO or cache.
Of course. But the same is true for pixel shaders. The only possible difference might be a difference in storage space, but that is easily handled. You'd just have all of this in-flight data stored in the queue position in the texture unit until the texture fetch is completed, after which the thread is sent right back to the vertex shader units, temporary storage information and all.

Mintmaster
22-Feb-2006, 10:42
Ah, but the number of vertices "in flight" is often orders of magnitude smaller than the number of pixels in flight,
Only because you don't have as many vertex processors and don't do texturing. For latency free texturing, you have no fewer vertices in flight than pixels when comparing equal processing power, i.e. # of VS = # of PS and pixel:vertices = 1:1. Since this is not the case (pixels:vertices >> 1), you use a scheme like Xenos where one group of 16 shader units would only have a small duty cycle to work on vertices.

Once you take into account time division of a more powerful unified processor versus fixed vertex processors, you don't save anything here.
and pixels have more state than vertices. Therefore the die space taken should be proportionately less, and they should not have much difficulty increasing the size of their register file, FIFOs, caches, et al. The cost-benefit ratio probably looks good relative to investing transistors in other areas.
I don't buy that pixels have more state.

Vertex data is loaded all at once to take advantage of burst access, and that's about 50 bytes on average, IIRC. As you finish with your input, you create intermediate values and finally output that needs to be stored, the latter of which can easily get over 100 bytes. Generally, pixels won't need to store nearly that much, depending on the tolerance you build in for register usage. I may be missing something, but I think storing registers, a primitive pointer to the post-transform cache, and interpolation factors (weights for each point in the primitive) are all you need per pixel, and maybe Z and screen position since they've already been calculated. The rest is per-batch state data, I think.
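To put rough numbers on the input/output sizes above, here is a hypothetical vertex format (not taken from any particular engine), sized with Python's `struct` module using 32-bit floats and no padding:

```python
import struct

# Hypothetical input layout: position float3, normal float3, texcoord float2, tangent float4.
VERTEX_INPUT = "<3f3f2f4f"
# Hypothetical output layout: clip-space position float4 plus six float4 interpolants.
VERTEX_OUTPUT = "<4f" + "4f" * 6

# '<' selects standard sizes with no alignment padding, so each 'f' is 4 bytes.
input_bytes = struct.calcsize(VERTEX_INPUT)
output_bytes = struct.calcsize(VERTEX_OUTPUT)

print(input_bytes, output_bytes)
```

With these assumed layouts the input comes to 48 bytes and the output to 112 bytes, in line with the "about 50" and "over 100" figures; real formats vary widely.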

If it was so cheap, NVidia would have made VTF much faster already, and ATI wouldn't have skipped this feature. There's a lot of data to keep in flight for vertex processing, and the fact that there's no texturing keeps the vertex shader units compact.

Mintmaster
22-Feb-2006, 10:54
Of course. But the same is true for pixel shaders. The only possible difference might be a difference in storage space, but that is easily handled. You'd just have all of this in-flight data stored in the queue position in the texture unit until the texture fetch is completed, after which the thread is sent right back to the vertex shader units, temporary storage information and all.
I'd be very surprised if all the state data is sent to the texture unit, which is what you seem to be implying.

Chalnoth
22-Feb-2006, 13:34
I'd be very surprised if all the state data is sent to the texture unit, which is what you seem to be implying.
Well, it has to be stored someplace. Why not the texture unit's latency-hiding queue?

Xmas
22-Feb-2006, 13:46
Well, it has to be stored someplace. Why not the texture unit's latency-hiding queue?
Because it is much easier to have the data stay in one place (the temp register file) instead of moving it around all the time.


You'd just have all of this in-flight data stored in the queue position in the texture unit until the texture fetch is completed, after which the thread is sent right back to the vertex shader units, temporary storage information and all.
That doesn't help since as the vertices are coming back from the texture unit the vertex shader units would need enough free space to store all temporary data. And the only way the VS can guarantee to have this free space is not to process other vertices taking up that space in the meantime.

Chalnoth
22-Feb-2006, 18:25
Because it is much easier to have the data stay in one place (the temp register file) instead of moving it around all the time.
Sure. But the important point here is that you don't want to need a full temp register file to store all of the vertices required for latency hiding in the texture units. Thus you make use of the pixel shader's register file.

That doesn't help since as the vertices are coming back from the texture unit the vertex shader units would need enough free space to store all temporary data. And the only way the VS can guarantee to have this free space is not to process other vertices taking up that space in the meantime.
Ah, you're right, I was missing a little something. But you can still solve the problem by sharing with the pixel shader's register file.

Mintmaster
22-Feb-2006, 21:25
Sure. But the important point here is that you don't want to need a full temp register file to store all of the vertices required for latency hiding in the texture units. Thus you make use of the pixel shader's register file.
So now you're using the pixel shader's FIFO for storing the in flight vertex data?

You're not far from a unified shader architecture now, except your solution goes through the trouble of putting all the data in the same place (the hardest part of a USA), but still using different execution units.

Doesn't make much sense to me.

Chalnoth
22-Feb-2006, 21:51
I'm not seeing how it would be that difficult.

Anyway, I suppose it all depends upon how the register file is stored. The way I'm thinking of it, there would be a "local" register file within the ALUs or TEX units, a "global" register file stored somewhere else, and a queue for each unit (just pointers to locations in the global register file). The local register file would only need to store those few registers that are read during the execution of a few instructions, as well as a pointer to the requisite position in the global register file.

Then you have the queue for each unit. The queue in each unit is just a list of pointers to instructions that can be executed right now. Each clock, the unit in question takes the next instruction from the queue, loading the values it needs from the global register file, and placing the instruction in the pipeline (or instructions, in the case of multi-issue), with one limitation: if the small output buffer of the unit is not flushed, then the unit stops execution, not reading from the input queue.

With the above sort of design, I'm really not seeing much of any problem. That is to say, each execution unit still has full control over its own execution. It's just the location where the data is stored that changes.
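A toy Python model of the scheme described above may make it concrete. Everything here is an illustrative assumption: the `ExecutionUnit` class, the `(thread_id, dst, src_a, src_b)` queue entries, and the addition standing in for real ALU work are hypothetical, not a description of any actual NVIDIA hardware. Thread state stays in one shared register file; the unit only holds pointers, a few operands while executing, and a small output buffer that blocks further issue until it is flushed:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecutionUnit:
    """One ALU or TEX unit in the hypothetical scheme.

    The unit owns only a queue of entries pointing into a shared (global)
    register file and a small output buffer; bulk thread state stays put.
    """
    name: str
    output_capacity: int = 2
    queue: deque = field(default_factory=deque)        # entries: (thread_id, dst, src_a, src_b)
    output_buffer: list = field(default_factory=list)  # pending (thread_id, dst, value)
    stalls: int = 0

    def tick(self, global_regs: dict) -> None:
        # If the output buffer has not been flushed, stop reading the input queue.
        if len(self.output_buffer) >= self.output_capacity:
            self.stalls += 1
            return
        if not self.queue:
            return
        thread_id, dst, src_a, src_b = self.queue.popleft()
        regs = global_regs[thread_id]
        # The operands copied here play the role of the small "local" register file.
        a, b = regs[src_a], regs[src_b]
        self.output_buffer.append((thread_id, dst, a + b))  # stand-in for real ALU work

    def flush(self, global_regs: dict) -> None:
        # Write results back to the global register file and free the buffer.
        for thread_id, dst, value in self.output_buffer:
            global_regs[thread_id][dst] = value
        self.output_buffer.clear()


# Three threads, each computing r2 = r0 + r1.
global_regs = {0: [1.0, 2.0, 0.0], 1: [3.0, 4.0, 0.0], 2: [5.0, 6.0, 0.0]}
alu = ExecutionUnit("alu0", output_capacity=2)
for tid in global_regs:
    alu.queue.append((tid, 2, 0, 1))

for _ in range(3):
    alu.tick(global_regs)   # the third tick stalls: the buffer is full and unflushed
alu.flush(global_regs)
alu.tick(global_regs)       # thread 2 can now issue
alu.flush(global_regs)
```

After this runs, each thread's `r2` holds its sum and the unit has recorded exactly one stall cycle, showing that each unit keeps full control over its own issue even though the data lives elsewhere.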

DemoCoder
22-Feb-2006, 22:51
Only because you don't have as many vertex processors and don't do texturing. For latency free texturing

Also because there is no need. No foreseeable workload at present has a 1:1 vertex/pixel ratio. Most games are going to be PS limited, and most VS programs are short in comparison. Having 1:1 vertices in flight would be a waste. Vertices by their very nature are batch data due to their decidedly lower frequency. A single vertex can generate hundreds of pixels.

I'm not sure "latency free" VTF is needed. Only "less latency". Unless your game is vertex limited, I don't see the benefit of striving for latency free. I could be convinced.


Once you take into account time division of a more powerful unified processor versus fixed vertex processors, you don't save anything here.


Unified processing is orthogonal to thread batching. No one's claiming non-unified "saves more" or is "more efficient", only that it is "not needed" to achieve competitive performance.



I think storing registers, a primitive pointer to the post-transform cache, and interpolation factors (weights for each point in the primitive) are all you need per pixel, and maybe Z and screen position since they've already been calculated. The rest is per-batch state data, I think.


You need storage for 32 registers max, a pointer for each input (16 possible), interpolator state, screen position, predicate, Z, loop (maybe), plus some way to identify which outstanding texture requests pertain to which fragment. If you can handle, say, 10,000 pixels in flight, then even with batches of 16 that's 625 per-batch states plus 10,000 × per-pixel state, vs. a few dozen vertices in flight, if even that. Doubling or tripling the amount of state available for vertices costs much less than doubling it for pixels, and vertices don't need latency-free VTF IMHO. My point is, the vastly lower frequency of vertex processing requires a lot less concurrent contextual state.
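The batch arithmetic above can be written out as a small function. The per-batch size (64 bytes) and the per-element size (32 FP32 vec4 registers = 512 bytes, i.e. the worst case rather than a typical load) are hypothetical figures chosen only for illustration, as is the 48-vertex comparison:

```python
def state_bytes(items_in_flight: int, batch_size: int,
                per_batch_bytes: int, per_item_bytes: int) -> tuple[int, int]:
    """Return (number of batches, total bytes of shader context).

    Total = batches * per-batch state + items * per-item state.
    """
    batches = -(-items_in_flight // batch_size)  # ceiling division
    total = batches * per_batch_bytes + items_in_flight * per_item_bytes
    return batches, total


# Hypothetical sizes: 64 bytes per batch, 512 bytes per item (32 x 16-byte registers).
pixels = state_bytes(10_000, 16, 64, 512)
vertices = state_bytes(48, 16, 64, 512)  # "a few dozen" vertices
print(pixels, vertices)
```

Under these assumptions the pixel side needs roughly 200 times the storage of the vertex side, which is the asymmetry the argument relies on.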

I don't see how you can justify the idea that increasing the amount of vertex state is expensive vis-a-vis pixel shaders. Only if one held the view that every triangle was sub-pixel in size would this even begin to be a relevant point.



If it was so cheap, NVidia would have made VTF much faster already, and ATI wouldn't have skipped this feature. There's a lot of data to keep in flight for vertex processing, and the fact that there's no texturing keeps the vertex shader units compact.

False argument. There are many 3D features which have been comparably cheap in the past, but still left unimplemented until later chip revisions. IHVs have many reasons for leaving out features or enhancements which do not always have to do with die space. You should be well aware that in any given project, developers and engineers have a laundry list of enhancements, features, and changes they want to make, and not all of them make it into a product release, even if they are easy or relatively cheap to do, because other priorities exist, like time to market.

I don't buy the argument that IHVs add only what is absolutely the best course of action given die space. There are lots of features that have made it into graphics cards that were frankly hardly ever used and essentially wasted space, and not always because of performance deficits, but market-mismatch. (Npatches anyone?)

Ailuros
23-Feb-2006, 00:52
I don't buy the argument that IHVs add only what is absolutely the best course of action given die space. There are lots of features that have made it into graphics cards that were frankly hardly ever used and essentially wasted space, and not always because of performance deficits, but market-mismatch. (Npatches anyone?)

NV's HOS in NV20 would be another example. Do large IHVs nowadays, though, opt for such risks, or do they rather implement only what is absolutely necessary?

It seems to get even worse with D3D10.

Mintmaster
23-Feb-2006, 00:53
Also because there is no need. No foreseeable workload at present has a 1:1 vertex/pixel ratio.
You're not understanding my point. The per-processor cost is what's important when comparing this solution to a USA, because the latter will only spend a small fraction of its time on vertices when the vertex-to-pixel ratio is much less than 1:1.

Consider a fixed ratio of 7:1 (pixel to vertex) for the sake of not giving a USA any advantage: 42 PS & 6 VS for the traditional architecture, 48 units for the USA. For a 100-cycle texture latency, the former needs a 600-vertex cache for fast VTF. The latter's 4800-entry pixel/vertex cache would hold vertices only 12.5% of the time. In the end, the die cost is the same if you assume pixels and vertices need the same space for state data, so this aspect of the argument is irrelevant.
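The 7:1 comparison can be checked in a few lines, under the same simplifying assumptions (one fetch per unit per clock, a fixed 100-cycle latency, equal state size for pixels and vertices):

```python
def cache_entries(units: int, latency_cycles: int) -> int:
    # Entries needed so each unit can issue one fetch per clock without stalling.
    return units * latency_cycles


PS, VS, LATENCY = 42, 6, 100   # the 7:1 example
unified_units = PS + VS        # 48

split_vertex_cache = cache_entries(VS, LATENCY)          # dedicated VS storage
unified_cache = cache_entries(unified_units, LATENCY)    # shared pixel/vertex storage
vertex_duty_cycle = VS / unified_units                   # share of time spent on vertices

# Average storage the unified design devotes to vertices.
unified_vertex_share = unified_cache * vertex_duty_cycle
print(split_vertex_cache, unified_cache, vertex_duty_cycle, unified_vertex_share)
```

Both designs end up devoting 600 entries' worth of storage to vertices on average, which is why the die cost comes out equal at a fixed 7:1 load.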

The advantage of the USA, of course, is when the load deviates one way or the other from 7:1. The traditional architecture will have both the cache and execution units of either the VS or PS idle. Right now, we don't have fast VTF, thus the register space is small in the VS and we don't care much if it sits idle.

Not sure why you're saying a USA is orthogonal to the issue. We're talking about pixels vs. vertices here, and why it makes sense to separate the processing units versus unifying them. Of course you could do fast VTF the easy way by simply lengthening the FIFO in your vertex pipeline.

I'm not sure "latency free" VTF is needed. Only "less latency". Unless your game is vertex limited, I don't see the benefit of striving for latency free. I could be convinced.
It depends on the technique. For simple displacement mapping, you can request your fetch, then transform your point and normal, and when you have the data just displace along the normal. 6 cycle latency should keep the pipeline from stalling.
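A toy stall model of the displacement-mapping case, assuming a single-issue pipeline that retires one independent instruction per clock between issuing the fetch and first using its result. The 7-instruction transform (4 DP4 for the position, 3 DP3 for the normal) is a hypothetical count, not a measured figure:

```python
def stall_cycles(fetch_latency: int, independent_instructions: int) -> int:
    """Cycles spent waiting for a texture result, assuming one independent
    instruction retires per clock between the fetch and its first use."""
    return max(0, fetch_latency - independent_instructions)


def displace(position, normal, height):
    # Final dependent step: move the vertex along its normal by the fetched height.
    return tuple(p + n * height for p, n in zip(position, normal))


TRANSFORM_WORK = 7  # hypothetical: 4 DP4 (position) + 3 DP3 (normal)

print(stall_cycles(6, TRANSFORM_WORK))    # short latency: fully hidden
print(stall_cycles(100, TRANSFORM_WORK))  # long latency: mostly exposed
print(displace((1.0, 2.0, 3.0), (0.0, 1.0, 0.0), 0.5))
```

The other techniques listed below need the fetched value almost immediately (or chain fetches), so there is little independent work to cover the latency and the second case is the more realistic one.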

Other uses of VTF are not so forgiving:
-You could move low frequency effects from the pixel shader to the vertex shader. One example: Store SH coefficients in a 3D texture to represent incoming light at any point in space, and then do per-vertex PRT. This saves a series of per-pixel 3D texture loads.
-Physics in the vertex shader could need multiple dependent accesses
-Techniques like this: Dynamic Ambient Occlusion and Indirect Lighting (http://download.developer.nvidia.com/developer/SDK/Individual_Samples/DEMOS/OpenGL/src/dynamic_amb_occ/docs/214_gems2_ch14.pdf) (not my favourite technique, but an example nonetheless)
-This caustics algorithm (http://www.beyond3d.com/forum/showthread.php?t=22417), and any other raytracing type things for vertices

There are plenty of other possibilities. Remember that we're just getting our feet wet in VTF right now. A few years ago I made a similar short-sighted mistake: I saw the cost of dynamic branching (i.e. lower processing density) as outweighing the benefits, especially with a stencil buffer there to serve us. Now I'm pretty sure I was wrong.

Of course, you could make the case for R2VB, but that's another debate.

You need storage for 32 registers max, a pointer for each input (16 possible), interpolator state, screen position, predicate, Z, loop (maybe), plus some way to identify which outstanding texture requests pertain to which fragment.
I guarantee you that no current GPU can give you latency free texturing (which is bandwidth efficient, naturally) when there are truly 32 live registers after compiler optimization. NV40/G70 start (gracefully) dropping in speed after using only a couple AFAIK. The facility for 32 registers is just there for flexibility. If you have lots of math or don't need texture results immediately then you don't need as many pixels in flight.

I don't see why you need a pointer for each input. One primitive pointer to the post transform cache is enough AFAICS, so I'd like a little more explanation please.

Screen position, loop, predicate, and request pointer are small potatoes (<10 bytes?). Per sample Z should be done afterwards using the primitive pointer, because there's no need to calculate it beforehand and store it. Top of the pipe Z-reject is per quad, as anything more detailed is pointless.

I guess we can't settle this without having HDL for modern processors, but IMHO pixel state information is at worst comparable to vertex state info. The latter can reach 172 bytes with 10 iterators and position, if I'm not mistaken, and right now there's no cost for using all the iterators.

I don't see how you can justify the idea that increasing the amount of vertex state is expensive vis-a-vis pixel shaders. Only if one held the view that every triangle was sub-pixel in size would this even begin to be a relevant point.
See the first point in this post.

False argument. There are many 3D features which have been comparably cheap in the past, but still left unimplemented until later chip revisions. IHVs have many reasons for leaving out features or enhancements which do not always have to do with die space.
...
I don't buy the argument that IHVs add only what is absolutely the best course of action given die space. There are lots of features that have made it into graphics cards that were frankly hardly ever used and essentially wasted space, and not always because of performance deficits, but market-mismatch. (Npatches anyone?)
Okay, fair enough. That point of mine is not very strong. I still think NVidia could have used this feature to their advantage if it was fast, and there would be more than just one game using the feature.

aaronspink
23-Feb-2006, 01:01
But did Kirk say this just because he was responding to a question about ATI's unified shader in Xenos? Bearing in mind NVidia's apparent 180-degree turnaround as regards support for HDR+MSAA, I do wonder how much PR there is in Kirk's interviews (in fact in any interviews from IHVs).

As a baseline you should assume that anyone from any company who is allowed to talk to the press is doing it solely for PR reasons, has been fully briefed and coached by PR, and is available solely at the PR department's wishes unless continuously proven otherwise (which won't happen, because they'll be fired long before then).

Unless Nvidia is unlike every other company out there (unlikely), someone in Kirk's position is primarily PR and management, with very little involvement in either day-to-day or mid- to long-term architecture.

Aaron Spink
speaking for myself inc.

Mintmaster
23-Feb-2006, 01:20
I'm not seeing how it would be that difficult.

Anyway, I suppose it all depends upon how the register file is stored. The way I'm thinking of it there would be a "local" register file within the ALU's or TEX units, a "global" register file stored somewhere else, and a queue for each unit (just pointers to locations in the global register file). The local register file would only need to store those few registers that are read from during the execution of a few instructions, as well as a pointer to the requisite position in the global register file.
Chalnoth, if you have full speed texturing then you need a queue to absorb the latency entirely. Once you have this queue, there's no need for any more, because all other instructions have less latency. Of course the texture unit must keep track of all the requests, but that's an independent system with relatively small storage requirements.

If you put this queue in the texture unit, then you couldn't hide extra latency (say from bandwidth restrictions or incoherent access) with additional ALU instructions, because your pixel processors can't access the data. Xmas knows what he's talking about, and it makes sense to keep the queue with the pixel processors.

I didn't say you're method is "difficult", I just said it'll need just about all the routing a USA needs, but you're not getting the load sharing benefits. You bring all your vertex data to a place where they can be operated on by the powerful and plentiful pixel shading units, but all they're allowed to do is load texture data.

Chalnoth
23-Feb-2006, 01:48
But you escape the problem of load balancing entirely this way, which is apparently what nVidia is worried about. And besides, there can be some benefits in going "half way" as it allows you to do better research on how to go all the way.

Regardless, it is fairly probable that nVidia won't change vertex texturing much for the next architecture. It's more likely that improvements to the pixel shader will be beneficial.

Mintmaster
23-Feb-2006, 02:17
But you escape the problem of load balancing entirely this way, which is apparently what nVidia is worried about.
Really? That's surprising.

Regardless, it is fairly probable that nVidia won't change vertex texturing much for the next architecture. It's more likely that improvements to the pixel shader will be beneficial.
This I'll agree with. My original comment in this thread was that keeping the VS and PS separate makes sense if you don't care about VTF. I don't think they really should care that much from a practical point of view (though a developer point of view is very different).

JF_Aidan_Pryde
25-Feb-2006, 05:57
As a baseline you should assume that anyone from any company who is allowed to talk to the press is doing it solely for PR reasons, has been fully briefed and coached by PR, and is available solely at the PR department's wishes unless continuously proven otherwise (which won't happen, because they'll be fired long before then).

Unless Nvidia is unlike every other company out there (unlikely), someone in Kirk's position is primarily PR and management, with very little involvement in either day-to-day or mid- to long-term architecture.

Aaron Spink
speaking for myself inc.
Er... I think saying that he's just a PR prop and that he doesn't get involved with architecture is a little extreme. You don't need a PhD from Caltech to do that.

Megadrive1988
12-Mar-2006, 21:46
From reading some of the posts about G80 and G90, the current thinking is something like:

G80 = NV50 (or a revised NV5X): has more decoupling going on, but not a unified shader architecture.

G90 = NV55 (or a revised NV5X refresh) still not a full USA

then, NV60 (call it G100 if you will) a full USA


p.s.

I hope that NV50~G80 has 12 Vertex Shader 4.0 units. Having 10 VS wouldn't be much of a leap.

Increasing by 2 Vertex Shaders each generation increases raw geometry performance less and less (i.e. NV20 (1x VS) to NV25 (2x VS) was a big step, but going from 6 VS to 8 is not).
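The diminishing returns can be shown with a one-line ratio, assuming peak vertex throughput scales linearly with the number of units at equal clocks (real chips also change clocks and per-unit efficiency between generations):

```python
def relative_gain(old_units: int, new_units: int) -> float:
    # Fractional increase in peak vertex throughput, assuming linear scaling with unit count.
    return (new_units - old_units) / old_units


for old, new in [(1, 2), (6, 8), (8, 10), (10, 12)]:
    print(f"{old} -> {new} VS: +{relative_gain(old, new):.0%}")
```

This prints +100%, +33%, +25% and +20% respectively.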

Megadrive1988
12-Mar-2006, 23:27
NV40 was introduced in spring 2004 and it's more than two years from that until G80 (aka a real new generation) arrives.


true, agreed. G71 is still NV4X, whereas G80 is NV5X, thus a real new generation.


G71 is still based on technology that came out in 2004, and thus on an architecture that was designed in the early part of this decade, well before NV30 (GeForce FX) came out.

Geo
18-Mar-2006, 08:39
This year should also see Nvidia produce prototype chips made using the latest manufacturing techniques that can etch circuitry only 65 nanometers wide, about 100 times thinner than a human hair and more than 25 percent smaller than those on current Nvidia products.


http://today.reuters.com/business/newsArticle.aspx?type=technology&storyID=nN15213119
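The "more than 25 percent smaller" figure in the quote is consistent with a linear comparison against 90nm, assuming that is the node meant by "current Nvidia products". A quick check, assuming an ideal optical shrink (real shrinks rarely scale this cleanly):

```python
def shrink(old_nm: float, new_nm: float) -> tuple[float, float]:
    """Return (linear feature-size reduction, area ratio) for an ideal shrink."""
    scale = new_nm / old_nm
    return 1 - scale, scale ** 2


linear_reduction, area_ratio = shrink(90, 65)
print(f"{linear_reduction:.1%} smaller features, {area_ratio:.0%} of the original area")
```

That works out to roughly 28% smaller features and, ideally, about half the die area for the same design.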

"Prototype" by the end of the year, and they've already said G80 will be out before the end of the year. So I'm leaning towards sticking a fork in the possibility (slim anyway, in my estimation) that G80 is 65nm.

Chalnoth
18-Mar-2006, 08:55
A prototype might be produced 6-12 months before production, though.

DegustatoR
18-Mar-2006, 10:41
G80 will probably be on 90 or 80.

Megadrive1988
18-Mar-2006, 12:10
so.....

initial G80 on 90nm or 80nm later this year, then a 'G81' speedbump on 65nm in early 2007 ?

EasyRaider
18-Mar-2006, 15:12
I hope that NV50~G80 has 12 Vertex Shader 4.0 units. Having 10 VS wouldn't be much of a leap.

Increasing by 2 Vertex Shaders each generation increases raw geometry performance less and less (i.e. NV20 (1x VS) to NV25 (2x VS) was a big step, but going from 6 VS to 8 is not).
Carmack stated that being able to cache results between passes (for shadow map rendering) would improve performance more than doubling the amount of vertex units. As such, I think 8 will be enough, at least if NV also reduces vertex texture latency by a lot.

Ailuros
19-Mar-2006, 00:21
Carmack stated that being able to cache results between passes (for shadow map rendering) would improve performance more than doubling the amount of vertex units. As such, I think 8 will be enough, at least if NV also reduces vertex texture latency by a lot.

D3D10 also requires geometry shading.

SugarCoat
19-Mar-2006, 03:30
so.....

initial G80 on 90nm or 80nm later this year, then a 'G81' speedbump on 65nm in early 2007 ?


I don't think we'll see any 65nm graphics cores until mid-'07 at the earliest. ATI and Nvidia both seem content to ride through most of this year on 90nm, let alone the half node. At 65nm things get very complex; the same goes for all shrinks, but the smaller, the harder. TSMC doesn't even have a 65nm fab yet AFAIK. And major chip firm AMD has yet to make the leap as well and will be releasing its new platform chips on 90nm through most of this year (just to point out it's not a walk in the park). People seem to jump the gun too much when it comes to GPU fab size.

Razor1
19-Mar-2006, 03:53
Carmack stated that being able to cache results between passes (for shadow map rendering) would improve performance more than doubling the amount of vertex units. As such, I think 8 will be enough, at least if NV also reduces vertex texture latency by a lot.

Hmm possibly, but raw poly counts are soon going to be pretty much doubling, so the vertex shaders will still be important ;)

suryad
20-Mar-2006, 19:38
I don't think we'll see any 65nm graphics cores until mid-'07 at the earliest. ATI and Nvidia both seem content to ride through most of this year on 90nm, let alone the half node. At 65nm things get very complex; the same goes for all shrinks, but the smaller, the harder. TSMC doesn't even have a 65nm fab yet AFAIK. And major chip firm AMD has yet to make the leap as well and will be releasing its new platform chips on 90nm through most of this year (just to point out it's not a walk in the park). People seem to jump the gun too much when it comes to GPU fab size.

I agree with you.

JoshMST
20-Mar-2006, 22:00
Hell, there won't be a 3rd party available 65 nm line until very late 2007, and most likely 2008. AMD is not even going to have 65 nm until late this year, and the hurdles with that process are bigger than 90 nm was. TSMC, UMC, and the rest won't have a usable 65 nm process for large parts for quite some time (though that may not be true for much smaller ASICs).

Geo
20-Mar-2006, 23:06
Hell, there won't be a 3rd party available 65 nm line until very late 2007, and most likely 2008. AMD is not even going to have 65 nm until late this year, and the hurdles with that process are bigger than 90 nm was. TSMC, UMC, and the rest won't have a usable 65 nm process for large parts for quite some time (though that may not be true for much smaller ASICs).

So, Josh, you don't believe Reuters? Or you think NV will have prototype 65nm chips by end of '06 and then sit on it for a year?

Jawed
20-Mar-2006, 23:10
To be fair, just because you can make a prototype doesn't mean you can pump out millions.

Supposedly TSMC has had prototype 65nm dies coming out for a while now:

http://www.electronicstalk.com/news/tsc/tsc101.html

Jawed

rwolf
21-Mar-2006, 00:54
http://www.us.design-reuse.com/news/news10229.html

SAN JOSE, Calif.--(BUSINESS WIRE) -- April 26, 2005 -- Taiwan Semiconductor Manufacturing Company (NYSE:TSM)(TSE:2330), unveiled its newest semiconductor manufacturing process today at a Technology Symposium attended by over 400 of the industry's leading IC companies. First wafers are expected in December 2005.



TSMC's first 65nm silicon was a fully functional SRAM that featured more than 100 million transistors and was validated in April 2004. Since then, some customers including Altera Corp. and others, have taped out and received functional prototypes of their own designs, including logic and memory, for initial validation and benchmarking. Engineers at multiple companies are designing to the process, and tapeouts of production devices are expected to reach TSMC in the second half of 2005.

http://www.emsnow.com/newsarchives/archivedetails.cfm?ID=10701

The order is TSMC's first for its 65-nm process service, and is expected to help boost the company's revenues next year. The foundry supplier will begin to deliver the chips in the first half of 2006.

With the order from Qualcomm, TSMC currently counts five customers using its 65-nm process, far more than the numbers recorded by United Microelectronics Corp. and Semiconductor Manufacturing International Corp. (SMIC) for 65-nm process orders. Altera, Freescale and Broadcom are among TSMC's 65-nm customers.

Qualcomm unveiled its first 65-nm chips for cellphones on Oct. 18, indicating an intention to take on rival Texas Instruments (TI) in the handset chip field.

EasyRaider
21-Mar-2006, 19:45
D3D10 also requires geometry shading.
So? The GS is separate, it shouldn't steal any resources from vertex shading, should it? Or do you mean that for geometry shading to be useful, far higher vertex counts will be needed?

EasyRaider
21-Mar-2006, 19:54
Hmm possibly, but raw poly counts are soon going to be pretty much doubling, so the vertex shaders will still be important ;)
Since when was VS performance important for anything but 3DMark score? I'm guessing you could double triangle counts in all of today's titles and still be pixel limited in most interesting cases.

Razor1
21-Mar-2006, 20:00
Since when was VS performance important for anything but 3DMark score? I'm guessing you could double triangle counts in all of today's titles and still be pixel limited in most interesting cases.

Sorry, misread; I was thinking of shadow volumes.

Ailuros
21-Mar-2006, 22:02
So? The GS is separate, it shouldn't steal any resources from vertex shading, should it? Or do you mean that for geometry shading to be useful, far higher vertex counts will be needed?

Why are you so sure that GS will be separate from VS on G80? Rather the opposite sounds way more likely to me and yes I'm also speculating.

EasyRaider
21-Mar-2006, 22:16
Why are you so sure that GS will be separate from VS on G80? Rather the opposite sounds way more likely to me and yes I'm also speculating.
I'm not sure, it was an assumption, and thinking about it, a badly founded one.

RobertR1
22-Mar-2006, 01:33
http://money.cnn.com/2006/03/21/news/companies/microsoft.reut/index.htm?cnn=yes

Would be a bit of a waste to have a DX10 card at some point this year if this is true.

Ailuros
22-Mar-2006, 07:26
http://money.cnn.com/2006/03/21/news/companies/microsoft.reut/index.htm?cnn=yes

Would be a bit of a waste to have a DX10 card at some point this year if this is true.

I'd frankly rather have D3D10 GPUs in the second half of this year than another (Lord help us) DX9.0 refresh of a refresh of a refresh of a refresh....

Geo
22-Mar-2006, 12:08
I'd frankly rather have D3D10 GPUs in the second half of this year than another (Lord help us) DX9.0 refresh of a refresh of a refresh of a refresh....

As long as the spec is nailed down, and they have betas to test with, and the D3D10 cards are faster at DX9 than the current crop of DX9 cards. . .then there really isn't a downside to going early it seems to me.

The caveat might be if waiting the extra time gives you a chance to go to a lower process.

The other caveat is of course competition --if G71 or R580 gains a decisive upper hand in the market then that adds pressure on the other company to shake things up sooner rather than later, while the top dog won't be feeling any particular pressure to mess with a winning hand sooner than necessary.

RobertR1
22-Mar-2006, 18:30
As long as the spec is nailed down, and they have betas to test with, and the D3D10 cards are faster at DX9 than the current crop of DX9 cards. . .then there really isn't a downside to going early it seems to me.

True. If you can get much better performance from a DX10 card in DX9 games on Windows XP, I'm all for it, but I'm not in the mood to pay just for ticked checkboxes, especially if these cards debut well before Vista and will likely have a refresh coming up around the time that Vista is actually released and a handful of DX10 games are out.

trinibwoy
22-Mar-2006, 19:08
True. If you can get much better performance from a DX10 card in DX9 games on Windows XP, I'm all for it, but I'm not in the mood to pay just for ticked checkboxes, especially if these cards debut well before Vista and will likely have a refresh coming up around the time that Vista is actually released and a handful of DX10 games are out.

Well, there is definitely going to be another refresh cycle before Vista starts appearing in the mainstream, so what would you have these companies do? They surely wouldn't have been planning any significant DX9 refreshes. Since a lot of the focus seems to be on efficiency in addition to the new DX10 functionality, upcoming designs should hopefully bring decent performance gains in DX9 as well. If we're lucky, maybe even some IQ improvements as icing on the cake.

nutball
23-Mar-2006, 08:31
Wouldn't it be in the interests of both IHVs to stretch out this current cycle as long as possible to reap maximum return from the R&D investment they've put into the current generation GPUs?

It's not like either of them is in a catastrophically bad position -- ATI have the single-card performance crown (just!), NVIDIA are selling all the 79xx's they can build (:razz:) with presumably pretty good margins. And from what I read here ATI have some interesting mid-range parts coming along soon. What's the rush?

I realise it might not make the hardware geeks happy to see a period of "stagnation", (rather than getting a new architecture with yet another bunch of features that they won't be able to use in games for twelve months!), but then ATI and NV aren't really in business to make geeks happy, are they?!

_xxx_
23-Mar-2006, 08:40
What's the rush?

Having a better performing part than the competition + being first-to-market with new (or "new") features = money and shareholders all happy and nice, as well as OEM design wins and image bonus.

Chalnoth
23-Mar-2006, 10:44
Well, ATI's not really in a good position, because their parts are much larger in die area for the same performance, and thus likely they have smaller margins than nVidia right now.

And conversely, nVidia's got their own problems with a lack of MSAA with FP rendertargets and slower dynamic branching, problems that can only be completely solved with a new architecture. So they can't wait around too long, otherwise the software's going to really start showing off these weaknesses.

Dave Baumann
23-Mar-2006, 11:05
Well, ATI's not really in a good position, because their parts are much larger in die area for the same performance, and thus likely they have smaller margins than nVidia right now
This is actually only an "issue" if the revenues generated from the parts fall outside their margin model.

_xxx_
23-Mar-2006, 11:12
And conversely, nVidia's got their own problems with a lack of MSAA with FP rendertargets and slower dynamic branching, problems that can only be completely solved with a new architecture. So they can't wait around too long, otherwise the software's going to really start showing off these weaknesses.

These are only problems to us here at B3D and the like, which is less than 5% of the population.

Until that starts making any significant difference, we'll have (and need) the next gen stuff anyway. They can't wait too long, but more for the reason I gave above IMHO. Image and "leadership on paper" at least.

nAo
23-Mar-2006, 11:30
And conversely, nVidia's got their own problems with a lack of MSAA with FP rendertargets and slower dynamic branching, problems that can only be completely solved with a new architecture.
You really don't need any completely new architecture to 'fix' those problems, especially the first one.

trinibwoy
23-Mar-2006, 12:31
This is actually only an "issue" if the revenues generated from the parts fall outside their margin model.

Maybe they need a more aggressive model :wink:

Dave Baumann
23-Mar-2006, 12:37
As a consumer, why is that good for you?

Geo
23-Mar-2006, 13:39
This is actually only an "issue" if the revenues generated from the parts fall outside their margin model.

Which at the low end, IIRC, is 11 points lower (err, actually, if looked at as a percentage, it's 24% lower) than NV's stated target: 34% vs 45%, I believe.

Tho ATI has also said the upper end of their range is 38% (I think --doing this from memory).

Edited: Because math are hard.
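The correction above is the difference between percentage points and relative percent. A minimal Python sketch of that arithmetic, using the 34%, 38% and 45% figures only as recalled in this thread (not checked against ATI or NVIDIA guidance); the `margin_gap` helper is purely illustrative:

```python
def margin_gap(target_pct: float, other_pct: float) -> tuple[float, float]:
    """Return (gap in percentage points, gap relative to target_pct in %)."""
    points = target_pct - other_pct
    relative = points / target_pct * 100.0
    return points, relative


# Figures as recalled in the thread (from memory, unverified).
NV_TARGET = 45.0
ATI_LOW, ATI_HIGH = 34.0, 38.0

for ati in (ATI_LOW, ATI_HIGH):
    points, relative = margin_gap(NV_TARGET, ati)
    print(f"ATI {ati:.0f}% vs NV {NV_TARGET:.0f}%: "
          f"{points:.0f} points lower, {relative:.1f}% lower relative")
```

The low end comes out at 11 points, or about 24% in relative terms; the recalled 38% upper end would narrow that to 7 points, roughly 16%.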

_xxx_
23-Mar-2006, 14:20
You need a new rumour calculator, geo... ;)

(45-34 = 11)

Geo
23-Mar-2006, 14:34
You need a new rumour calculator, geo... ;)

(45-34 = 11)

Doh! I plead caffeine deficiency! :wink:

Ailuros
23-Mar-2006, 14:39
You really don't need any completely new architecture to 'fix' those problems, especially the first one.

Theoretically IHVs wouldn't need any completely new architecture for a lot of "fixes". IMHO they just set priorities according to the resources (time and transistor budgets) they have for each refresh and the according targets of course.

After reading between the lines of the G71 performance-per-watt reminders, I don't think there was much headroom for minor or major changes after all.

Sunrise
23-Mar-2006, 16:10
Wouldn't it be in the interests of both IHVs to stretch out this current cycle as long as possible to reap maximum return from the R&D investment they've put into the current generation GPUs?

Well, it would always be in each company's best interest to lengthen those cycles as much as they possibly can. The problem is that competition doesn't always let them, or that they are behind in performance per watt in several segments (like ATi is now) while their business model (or margin model) also requires several cores taped out for several segments in order to meet it and be successful (market share, revenues).

That alone wouldn't be a problem for them if they could get cores out in time that bring in enough revenue "to fight" that disadvantage. The problem is that their disadvantage in performance per watt limits their "playing field" so much that they also always need the most advanced manufacturing process first, which in itself is very bad, because they then have to rely on that process to execute their plans.

ATi has been (for several years now) in a position where their business model is kinda "lackluster" compared to NV's, which is also strongly architecture-related, obviously. They always need "loads of different cores" for each segment, while NV likes to play the "fewer cores, more segments" strategy and executes it without any "major" problems (delays don't have to be a problem per se), because they have better performance per watt and per die area.

Thus, ATi can't just "sit there and twiddle their thumbs"; they have to work hard to meet their business model, which doesn't allow them to wait for Vista to arrive. This is obviously a vast simplification and each new generation could change that, but "waiting" is no option for ATi (not at this time).

It's not like either of them is in a catastrophically bad position -- ATI have the single-card performance crown (just!), NVIDIA are selling all the 79xx's they can build (:razz:) with presumably pretty good margins. And from what I read here ATI have some interesting mid-range parts coming along soon. What's the rush?

NV is already selling G73, while ATi still needs months to get RV560/RV570 production-ready. Yes, they have the X1800 GTO, but that's not an option for the future. So while ATi is hard at work, NV earns money, gains market share and can switch its main resources to other projects. It's not catastrophic by any means, but it could obviously be better.

I realise it might not make the hardware geeks happy to see a period of "stagnation", (rather than getting a new architecture with yet another bunch of features that they won't be able to use in games for twelve months!), but then ATI and NV aren't really in business to make geeks happy, are they?!

Exactly. They are in business to make their pockets happy. Everything else comes second, and geeks are such a small percentage of the market that being the performance leader alone doesn't really fulfill this goal.

trinibwoy
23-Mar-2006, 16:13
As a consumer, why is that good for you?

Talk about changing the subject. A company's margin profile never has anything to do with benefits to the consumer. The original post (of this tangent) was referring to ATi's welfare, not the consumer's.

Dave Baumann
23-Mar-2006, 16:32
Talk about changing the subject. A company's margin profile never has anything to do with benefits to the consumer. The original post (of this tangent) was referring to ATi's welfare, not the consumer's.
But as consumers we have no insights to this at the moment.

At present we have no idea as to the actual effects of transitioning to the X1000 series. We will get a little more understanding next week, but that still won't give us any indication as to how things stand given the latest competitive environment, and we can't begin to get some insight until another 3 months after that. By this time a number of other things may have changed to alter those conditions.

Mariner
23-Mar-2006, 17:53
I'd even go as far as saying higher margins aren't necessarily the be all and end all of a product. After all, if your competitor's margins for a part are 50% better than yours but you have 90% of the market sales, who's happier? :wink:

Not that anyone does hold 90% of the market, obviously!

trinibwoy
23-Mar-2006, 18:21
But as consumers we have no insights to this at the moment.

At present we have no idea as to the actual effects of transitioning to the X1000 series. We will get a little more understanding next week, but that still won't give us any indication as to how things stand given the latest competitive environment, and we can't begin to get some insight until another 3 months after that. By this time a number of other things may have changed to alter those conditions.

Well that isn't exactly the angle I was taking. You had mentioned that ATi's margins being lower than Nvidia's wasn't an issue as long as their revenues were consistent with ATi's margin model. My response to that was, maybe their model was too conservative.

Exactly why it's conservative? No idea, outside of what we already know of market share in each segment and product positioning at the moment. I'm going to lean on the optimistic side, and read your statement to mean "They're out of the rough and it's full speed ahead with a more bullish outlook from ATi in the future".

trinibwoy
23-Mar-2006, 18:23
I'd even go as far as saying higher margins aren't necessarily the be all and end all of a product. After all, if your competitor's margins for a part are 50% better than yours but you have 90% of the market sales, who's happier? :wink:

Not that anyone does hold 90% of the market, obviously!

Well in that situation, there isn't really a "competitor", is there :smile:

Mariner
23-Mar-2006, 18:55
Well in that situation, there isn't really a "competitor", is there :smile:

Hey, I never said anything about them being a good competitor! :wink:

Dave Baumann
23-Mar-2006, 20:02
I'm going to lean on the optimistic side, and read your statement to mean "They're out of the rough and it's full speed ahead with a more bullish outlook from ATi in the future".
That's your prerogative, but there was no implication as to outlook there, merely a statement of the fact that we have little insight for now or the future.

kemosabe
23-Mar-2006, 21:09
Well right now a more bullish outlook doesn't seem to be consistent with at least one analyst's latest (http://finance.messages.yahoo.com/bbs?.mm=FN&action=m&board=15969433&tid=atyt&sid=15969433&mid=169453) ATYT (http://finance.messages.yahoo.com/bbs?.mm=FN&action=m&board=15969433&tid=atyt&sid=15969433&mid=169454) report (http://finance.messages.yahoo.com/bbs?.mm=FN&action=m&board=15969433&tid=atyt&sid=15969433&mid=169455). Looks like further market share losses in both desktop and notebook might be forthcoming, and ATYT investors aren't seeing much of a light at the end of the tunnel. That missed product cycle has been disastrous. :sad:

trinibwoy
23-Mar-2006, 21:21
That's your prerogative, but there was no implication as to outlook there, merely a statement of the fact that we have little insight for now or the future.

Okie dokes. But things are gonna swing one way or the other - I just picked the more uplifting one :)

Geo
04-Apr-2006, 14:12
http://www.theinquirer.net/?article=30745

Fudo expecting G80 by "end of summer". Fudo really ought to listen to the CCs.

nAo
04-Apr-2006, 14:13
http://www.theinquirer.net/?article=30745

Fudo expecting G80 by "end of summer". Fudo really ought to listen to the CCs.
IF he says so..it has to be trueee ;)

_xxx_
04-Apr-2006, 15:01
http://www.theinquirer.net/?article=30745

Fudo expecting G80 by "end of summer". Fudo really ought to listen to the CCs.

LOL @ R580+. CJ's April Fools' slides are making the rounds... :lol:

Dave Baumann
06-Apr-2006, 10:24
Wrong thread for this discussion tack.

dst
10-Apr-2006, 13:47
http://www.cooltechzone.com/Special_Reports/Insider_Series/NVIDIA_G80_Delayed_200604092276/

Jawed
10-Apr-2006, 14:04
:lol: that should wind up a few peeps: "G80 is going to be mostly G70/G71 with DirectX 10.0 stapled on."

Doesn't jibe with Jen's B$, but then again, does anything?

Jawed

Geo
10-Apr-2006, 14:17
Another "priced cheaper" hint, I see. Not that that turned out true with G71 anyway.

If Xbit is to be believed, R6x0 has moved back too. What's not as clear is if NV's hint towards November-ish was pre or post this most recent delay.

satein
10-Apr-2006, 14:35
They also posted "ATI R600 Details Revealed"...
ATI R600 Details Revealed (http://www.cooltechzone.com/Special_Reports/Insider_Series/ATI_R600_Details_Revealed_200604102280/)

ATi R600 has 64 pixel/vertex shaders on an 80/65nm process and is expected somewhere around Christmas this year.

pjbliverpool
10-Apr-2006, 15:05
They also posted "ATI R600 Details Revealed"...
ATI R600 Details Revealed (http://www.cooltechzone.com/Special_Reports/Insider_Series/ATI_R600_Details_Revealed_200604102280/)

ATi R600 has 64 pixel/vertex shaders on an 80/65nm process and is expected somewhere around Christmas this year.

Hmm, only 64 unified shaders? Unless they are a big step up from those currently employed in Xenos, colour me unimpressed.

Regarding the G80, this is also disappointing news. It looks like Nvidia is pulling an ATI with the R300, really flogging the architecture to death.

Still, let's say 32 pipes/10 vertex shaders, a 650-700MHz clock speed, fast GDDR4 and DX10 functionality bolted on. It could still be a great card.

trinibwoy
10-Apr-2006, 15:16
:lol: that should wind up a few peeps: "G80 is going to be mostly G70/G71 with DirectX 10.0 stapled on."

Sure winds me up :) Not only would it be exceedingly lame from an enthusiast standpoint, but then you'd have to ask - what exactly has Nvidia been working on since NV40? I'm betting "mostly G70" was translated from "not unified".

Geo
10-Apr-2006, 15:29
Sure winds me up :) Not only would it be exceedingly lame from an enthusiast standpoint, but then you'd have to ask - what exactly has Nvidia been working on since NV40? I'm betting "mostly G70" was translated from "not unified".

Well, as my current sig would suggest, I'm not buying it either.

Tho there seems to be a certain tension in some messages of late from the usual wiseguys: observations that, given market realities, these DX10 parts need to be DX9 monsters performance-wise much more than DX10 monsters.

I suppose there's some truth in that, but there's also some truth in the idea that IHVs hate to waste work. I'd think any major investment would be made with an eye towards scalability and evolution rather than one-off thinking. So it seems to me that whether it is as performant as we might like is an entirely different question from how much work went into it in the first place for 1st gen parts.

Jawed
10-Apr-2006, 15:31
I sorta doubt you can merely "bolt-on" DX10, as such - in other words I expect G80 to be more radical than implied there. Not necessarily anything amazing: features first, performance second as it were.

Jawed

DegustatoR
10-Apr-2006, 15:35
I'm betting "mostly G70" was translated from "not unified".
I'm betting they took it all out of their a** ;)

trumphsiao
10-Apr-2006, 16:01
I'm betting they took it all out of their a** ;)


G80's Pixel Shader ALUs configuration is MIMD:grin:

SugarCoat
10-Apr-2006, 18:34
Sure winds me up :) Not only would it be exceedingly lame from an enthusiast standpoint, but then you'd have to ask - what exactly has Nvidia been working on since NV40? I'm betting "mostly G70" was translated from "not unified".


The term is terrible. I would be utterly shocked, and I mean shocked, if they didn't improve or change every aspect of their current architecture. Both of those read like an Inq article. I don't think they know what the hell they're talking about, and they have simply come to their own conclusions based on what is already known. The Nvidia comments go against common sense and the etiquette Nvidia has established over the last 2 years. They'd look like fools. And the ATI one: they said their ATI "source" was questioning GDDR3 or GDDR4, when it's well known ATI wants GDDR4 due to their investment in the memory controller. The second problem is that they referred to it only as the "R600", which quite possibly is no longer the codename for a core, and may never have been.

Chalnoth
10-Apr-2006, 18:51
I think the statement may also be a misunderstanding of what is meant by, "G7x with DX10 stapled on."

Basically, nVidia isn't going to throw out the work they've done in previous years with regards to the basic architecture. Here I'm talking about things like texture units (cache especially), triangle setup, memory controllers, and now shader units.

What they are going to do is take their current architecture and ask, "How can we make this better?"

So sure, it will be a G70 with DX10 tacked on, but no more so than the GeForce3 was a GeForce2 with DX8 tacked on, or the GeForce FX was a GeForce4 with DX9 tacked on, or the GeForce 6x00 was a GeForce FX with SM3 tacked on. With each of these architectural changes, we can see the heritage of what came before, combined with lots of the new.

In other words, just because you can expect that nVidia will leverage the advantages of their existing architecture in designing this new one doesn't mean you can't expect significant changes.

JoshMST
10-Apr-2006, 18:54
Haha, from what little I know about DX10 functionality, it would still be hard to "staple" that functionality on.

Probably from about 10,000 feet the chip architecture looks the same: it will probably have 16 ROPs, 24 shader pipelines, and 8 to 10 vertex shaders. Of course, once you get down to the nitty gritty you will see that a lot of register work has probably gone on, the ALUs in each shader pipe will definitely be different, and it will probably have some kind of scheduler that "mimics" unified pipelines to the OS while sending the correct commands and data to the separate pixel and vertex shaders.

Chalnoth
10-Apr-2006, 20:11
I don't think you need to mimic a unified architecture at all with DX10. There still are vertex and pixel shaders, after all.

Razor1
10-Apr-2006, 21:38
I sorta doubt you can merely "bolt-on" DX10, as such - in other words I expect G80 to be more radical than implied there. Not necessarily anything amazing: features first, performance second as it were.

Jawed

Well, first thing, their mini ALUs go out the window, so that's going to be the first major change.



Hmm that pun kinda fits :lol:

Chalnoth
10-Apr-2006, 21:55
I doubt the mini ALUs are going anywhere.

Geo
11-Apr-2006, 23:13
I don't think you need to mimic a unified architecture at all with DX10. There still are vertex and pixel shaders, after all.

Chal, if we assume they are on the way to unification anyway, and you have a major investment to make for Vista any way you look at it. . .then wouldn't doing Vista drivers for a non-uni imply some significant. . .err, loss of an opportunity. . .to leverage that (software) work forward into v2 of your Vista parts? In other words, it's one-off work. I'm not saying the IHVs never do one-off work, but they hate it, particularly in significant chunks.

Now, maybe they don't have a choice. Maybe they don't like the risk/reward of changing too much at once. But I'm at G80 pretty much where I was with R520 (and Xenos, for that matter) this time last year. . .whatever they did, they must have been looking to maximize their leverage as best they could. The difference being, that this time the added mountain of Vista drivers is on the pile too. ATI didn't have that (or at least with the same degree of urgency) re R520 development.

p.s. 5k. Ahhhh.

Jawed
11-Apr-2006, 23:47
I get the feeling some features of the first iteration of D3D10 have been cut back - so NVidia's "one-time" drivers won't be so extensive.

Jawed

SugarCoat
12-Apr-2006, 00:58
That would complicate things quite a bit, I would think. What you are suggesting would be an annual DX upgrade with a refresh of features. How would that not play total hell with chip manufacturers, let alone game devs? I thought the idea was to pack in as many features as possible, not cut them back. Needless to say, DX9 SM2.0 has been the primary shader for the majority of titles for the last what, almost 5 years? News of something like that would not be met with any form of understanding. The majority of early adopters will undoubtedly be gamers; I know I'd be pretty upset to learn that I just got Vista and all this new hardware, and MS promises to have the full DX10 update next year, so I can buy another 500-1200 dollar video card setup to get support for the full thing then. If anything, DirectX should not get into the habit of changing too often, as it has been.


I can see it now though. Finally arrived, revolutionary graphics by ATI and Nvidia, fully Vista compliant, a whole new gaming experience, almost full support for DIRECTX 10; but we swear we'll get there some day!

Ailuros
12-Apr-2006, 06:43
G80's Pixel Shader ALUs configuration is MIMD:grin:

Just quoted again since some might have overlooked it.

Here's an extraction of a NV presentation that I saw at the 3DCenter forums (no link provided):

Can output tris from GS to different slice
- Possibly not writing to all slices
- Adds extra VS/GS operations
Regular MRT writes to all MRTs
- Fixed B/W usage
- But lower GS/VS ops

By the way can someone kindly clarify what Microsoft will call it after all, since I constantly see reminders at 3DCenter that it's going to be named D3D10?

Ailuros
12-Apr-2006, 06:53
Chal, if we assume they are on the way to unification anyway, and you have a major investment to make for Vista any way you look at it. . .then wouldn't doing Vista drivers for a non-uni imply some significant. . .err, loss of an opportunity. . .to leverage that (software) work forward into v2 of your Vista parts? In other words, it's one-off work. I'm not saying the IHVs never do one-off work, but they hate it, particularly in significant chunks.

Now, maybe they don't have a choice. Maybe they don't like the risk/reward of changing too much at once. But I'm at G80 pretty much where I was with R520 (and Xenos, for that matter) this time last year. . .whatever they did, they must have been looking to maximize their leverage as best they could. The difference being, that this time the added mountain of Vista drivers is on the pile too. ATI didn't have that (or at least with the same degree of urgency) re R520 development.

p.s. 5k. Ahhhh.

From the educated speculations I've been reading so far, G80 sounds like a somewhat "weird" architecture. You obviously can't call it a USC, since pixel and geometry/vertex shading come from separate units, but with all the other functionality possibly coming from the same units, I wouldn't hesitate (yes, highly speculative on my part) to call it some sort of "hybrid"-whatever at this point.

This all can make sense if all tidbits are true, or utter nonsense. At this point PS ALUs being MIMD makes sense, assuming that GS/VS come from the same unit also (see former post/quote, wherever anything relevant appears GS/VS are mentioned close together) and some essential question marks remain concerning texture samplers and ROPs (or entire lack thereof).

By the way, I'm just a layman, but those tidbits about the GS quoted above do not sound to me like the ultimate GS solution.

Jawed
12-Apr-2006, 09:04
Better get used to the idea Sugarcoat - D3D10+1 is a real concept. Dunno what the timing is.

Jawed

Geo
12-Apr-2006, 12:59
Regular MRT writes to all MRTs
- Fixed B/W usage
- But lower GS/VS ops

"Fixed B/W usage"? Some more words on that, please? What are they pointing at there?

trumphsiao
12-Apr-2006, 13:22
"Fixed B/W usage"? Some more words on that, please? What are they pointing at there?

"require less texture Bandwidth than other DX9 GPUs " ???

Geo
12-Apr-2006, 13:44
"require less texture Bandwidth than other DX9 GPUs " ???

Okay, that's a start. :smile:

This is as a result of "Regular MRT writes to all MRTs"? If so, why? And what does it tell us about what they did to the arch under the covers? New mem controller, for instance? Or something else?

Xmas
12-Apr-2006, 14:30
"Fixed B/W usage"? Some more words on that, please? What are they pointing at there?
To put that into context
http://developer.nvidia.com/object/dx10-instancing-gdc-2006

It's a comparison between texture (render target) arrays and traditional MRTs. I think what they're comparing here is a use case where you need to write to only a subset of the slices/MRTs, based on a dynamic condition. Seems a bit far-fetched to me, but I really can't make much sense out of it.

When the GS creates a triangle, it can add a render target index so that triangle will only be rendered to a certain render target array slice. So if you want to render a triangle of the same size/position to slices 1, 3 and 4, you output 3 triangles, do the setup 3 times, and run a pixel shader that outputs one color value. Framebuffer bandwidth use is proportional to the number of slices you render to.

However, with MRTs you only have a single triangle and you can output data to multiple buffers at once. The limitation, at least in DX9, is that if the shader possibly writes to render target N, it also has to write to all previous targets. And there is no alpha test in D3D10 (you can discard a pixel in the shader instead), so if you want to write to target 1, 3 and 4 but not 2, you would have to use blending and set alpha to 0 in the color for target 2.
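A toy cost model of the trade-off Xmas describes, in Python. The "one triangle per selected slice" and "bandwidth proportional to slices written" parts follow his post; the 4-byte pixel, the write-only accounting, and every function name here are illustrative assumptions, not D3D10 API:

```python
from dataclasses import dataclass


@dataclass
class PassCost:
    triangles: int      # triangles sent to setup/rasterization
    target_writes: int  # render-target surfaces written per covered pixel


def rt_array_cost(slices: set[int]) -> PassCost:
    """Render target array path: the GS emits the triangle once per selected slice."""
    n = len(slices)
    return PassCost(triangles=n, target_writes=n)


def mrt_cost(bound_targets: int) -> PassCost:
    """MRT path: one triangle, but every bound target is written.

    Targets that should stay untouched still get a blended (alpha = 0) write,
    so the write count is fixed by how many targets are bound.
    """
    return PassCost(triangles=1, target_writes=bound_targets)


def framebuffer_bytes(cost: PassCost, pixels: int, bytes_per_pixel: int = 4) -> int:
    """Write-only estimate; ignores the destination reads that blending adds."""
    return cost.target_writes * pixels * bytes_per_pixel


# Xmas's example: write slices/targets 1, 3 and 4 out of four.
array_pass = rt_array_cost({1, 3, 4})
mrt_pass = mrt_cost(4)
for name, cost in (("RT array", array_pass), ("MRT", mrt_pass)):
    print(name, cost.triangles, "triangle(s),",
          framebuffer_bytes(cost, pixels=1000), "bytes written")
```

In this sketch the array path trades extra GS output and triangle setup for writing only the slices it needs, while the MRT path does a single setup but always pays for every bound target, which is one plausible reading of the slide's "Fixed B/W usage" line. Blend reads, compression and caching are not modelled.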

trumphsiao
12-Apr-2006, 15:19
The one thing I can't fathom is how to compare DX9 and DX10 GPU ALUs with each other.

Ailuros
13-Apr-2006, 06:02
Better get used to the idea Sugarcoat - D3D10+1 is a real concept. Dunno what the timing is.

Jawed

I can't get used to the idea because I don't like it so far, meaning I quite agree with most of what Sugarcoat said. I'm just hoping there won't be a +2 also, since it'll get even more ridiculous then.

JHoxley
13-Apr-2006, 15:46
DX9 SM2.0 has been the primary shader for the majority of titles for the last what, almost 5 years?

DX9 didn't drop till November 2002 if my memory serves correctly, so things like SM2 have only really been the primary path for a couple of years at most.

Better get used to the idea Sugarcoat - D3D10+1 is a real concept. Dunno what the timing is.

I can't get used to the idea because I don't like it so far, meaning I quite agree with most of what Sugarcoat said. I'm just hoping there won't be a +2 also, since it'll get even more ridiculous then.

The PDC slides make mention of "10+1" and "10+2". Note that they seem deliberately vague: 10+1 is not necessarily 11 - they seem to have left the door open for 10+1 being a 10.1, 10a or whatever...

Then again, that particular PDC slide is with reference to DXGI - which is a whole different story.

And I don't think it'll hurt the developers (hardware or software) as much as you might think. Microsoft don't develop the API inside an air-tight box with no windows :wink:

Jack

trinibwoy
13-Apr-2006, 18:10
DX9 didn't drop till November 2002 if my memory serves correctly, so things like SM2 have only really been the primary path for a couple of years at most.

Yep, and the first serious DX9 title (Far Cry) was released in early 2004 and DX9 certainly wasn't the "primary" path at that time.

Chalnoth
13-Apr-2006, 18:19
Chal, if we assume they are on the way to unification anyway, and you have a major investment to make for Vista any way you look at it. . .then wouldn't doing Vista drivers for a non-uni imply some significant. . .err, loss of an opportunity. . .to leverage that (software) work forward into v2 of your Vista parts? In other words, it's one-off work. I'm not saying the IHVs never do one-off work, but they hate it, particularly in significant chunks.
Why? The only thing that makes unified shaders more desirable for DX10 is that the instruction set is now identical between pixel and vertex shaders. That's about it. The choice of whether or not to go unified is purely a question of die size required for the performance boost. Is it worth the die size increase?

As a purely made-up, hypothetical thought experiment, consider a particular architecture that would require twice the die area for the same pixel shader throughput if it were unified. In that case, a non-unified architecture with twice the pixel and vertex shaders, bringing its die area up to match the unified one, would much more likely be the higher-performing design.

If, on the other hand, the unified architecture only required 5-10% more die area, it would most likely be a win in performance.

So, the questions one needs to ask are, how much overhead is required? Can we just build an architecture with more pipelines instead that ends up being faster in the end?
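Chalnoth's thought experiment can be put into a toy model. This is a minimal sketch under assumptions that are not from the thread: every shader unit has the same throughput, a unified unit costs (1 + overhead) times the area of a dedicated unit, and a frame's work splits into a fixed pixel share and vertex share. The unit counts and function names are hypothetical.

```python
def dedicated_frame_time(work: float, pixel_share: float,
                         pixel_units: int, vertex_units: int) -> float:
    """Frame time for a split design: the busier side sets the pace."""
    pixel_time = work * pixel_share / pixel_units
    vertex_time = work * (1.0 - pixel_share) / vertex_units
    return max(pixel_time, vertex_time)


def unified_frame_time(work: float, area: float, overhead: float) -> float:
    """Frame time for a unified design filling the same area.

    area is measured in dedicated-unit areas; each unified unit costs
    (1 + overhead) of them, and any unit can run any shader.
    """
    units = area / (1.0 + overhead)
    return work / units


if __name__ == "__main__":
    pixel_units, vertex_units = 24, 8          # hypothetical split design
    area = pixel_units + vertex_units          # same die area for both designs
    for pixel_share in (0.75, 0.9, 0.5):
        split = dedicated_frame_time(100.0, pixel_share, pixel_units, vertex_units)
        for overhead in (0.05, 1.0):
            uni = unified_frame_time(100.0, area, overhead)
            print(f"pixel share {pixel_share:.2f}, unified overhead {overhead:.0%}: "
                  f"split {split:.2f}, unified {uni:.2f} (lower is faster)")
```

With 100% overhead (his "twice the die area" case) the split design wins or ties at all three load mixes tried here; with 5% overhead the unified design loses only when the load matches the 3:1 split the dedicated design was sized for, and wins in the 90% and 50% cases.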

SugarCoat
13-Apr-2006, 19:08
DX9 didn't drop till November 2002 if my memory serves correctly, so things like SM2 have only really been the primary path for a couple of years at most.

The PDC slides make mention of "10+1" and "10+2". Note that they seem deliberately vague: 10+1 is not necessarily 11, so they seem to have left the door open for 10+1 being a 10.1, 10a or whatever...

Then again, that particular PDC slide is with reference to DXGI - which is a whole different story.

And I don't think it'll hurt the developers (hardware or software) as much as you might think. Microsoft don't develop the API inside an air-tight box with no windows :wink:

Jack


I could have sworn they said they didn't want any more subcategories with half-assed support, like DX9x with SM2.0x, because it caused too many problems with inconsistent support throughout games and hardware.


I'm trying to remember when SM2.0-based titles first became abundant, and I can't. But I'll take your word for it. All the same, 2-3 years is quite a while. I just don't want to see DirectX turn into a yearly upgrade program. Hardware is fine: you got the speed increases, but at least you had the common basic functions for a while, and I'm afraid this would strip that security away. It will be interesting to see how this plays out.

Geo
13-Apr-2006, 19:14
I could have sworn they said they didn't want any more subcategories with half-assed support, like DX9x with SM2.0x, because it caused too many problems with inconsistent support throughout games and hardware.


Well, but this is a whole different thing, you see. You can tell because "." and "+" don't look anything at all alike! This new one is clearly sanitation engineers, not garbage men. . .

Razor1
13-Apr-2006, 19:51
I doubt the mini ALUs are going anywhere.

Hmm, what would be the use of the mini ALUs if there is no more half precision? Well, it might still be there, but didn't MS say only full precision will be allowed in DX10?

trinibwoy
13-Apr-2006, 20:06
Hmm, what would be the use of the mini ALUs if there is no more half precision? Well, it might still be there, but didn't MS say only full precision will be allowed in DX10?

Why do you think the mini-ALUs are solely responsible for half-precision support?

Chalnoth
13-Apr-2006, 20:12
Hmm, what would be the use of the mini ALUs if there is no more half precision? Well, it might still be there, but didn't MS say only full precision will be allowed in DX10?
The mini ALUs are full precision. There are only two parts of the NV4x architecture that benefit from partial precision:
1. Decreased register pressure (the units available can be more active).
2. Free FP16 normalization (this is a separate functional unit).

A mini ALU is basically a unit to deal with simple operations, like swizzling, moves, etc. so that you don't eat up available math cycles in performing such operations. I don't see any reason to get rid of these entirely, no matter the pipeline architecture.

Razor1
14-Apr-2006, 13:04
The mini ALUs are full precision. There are only two parts of the NV4x architecture that benefit from partial precision:
1. Decreased register pressure (the units available can be more active).
2. Free FP16 normalization (this is a separate functional unit).

A mini ALU is basically a unit to deal with simple operations, like swizzling, moves, etc. so that you don't eat up available math cycles in performing such operations. I don't see any reason to get rid of these entirely, no matter the pipeline architecture.

Why do you think the mini-ALUs are solely responsible for half-precision support?

That is true, but wouldn't it be more advantageous to make them full ALUs, boosting overall shader performance without increasing die size too much? I can see where it wouldn't be advantageous to make them all full ALUs, though: not all the ALUs will be used for complex tasks, so there is still a need for the mini ones.

_xxx_
14-Apr-2006, 14:19
That is true, but wouldn't it be more advantageous to make them full ALUs, boosting overall shader performance without increasing die size too much? I can see where it wouldn't be advantageous to make them all full ALUs, though: not all the ALUs will be used for complex tasks, so there is still a need for the mini ones.

That would require more work in other areas of the chip as well, so it's hard to tell how much die size that would cost.

BByte
14-Apr-2006, 17:26
I could have sworn they said they didn't want any more subcategories with half-assed support, like DX9x with SM2.0x, because it caused too many problems with inconsistent support throughout games and hardware.

I think the point (or at least a part of it) was to get rid of jumps like 2.0 => 2.x, but not 2.0 => 3.0.

Chalnoth
16-Apr-2006, 05:31
I am looking at the 1900XTX at #6; the 7900GTX and GT are nowhere to be found on Tiger's top sellers. The author of the article claims the GTX is ahead of the 1900s on those charts, which I have just found to be false.
Since that list is of specific board products, it isn't indicative of the relative sales of chips. The "top seller" list that The Tech Report looked at may have been taken late on a day when they had just gotten a bunch of BFG 7900s in.

Mintmaster
16-Apr-2006, 05:54
Basically, nVidia isn't going to throw out the work they've done in previous years with regards to the basic architecture. Here I'm talking about things like texture units (cache especially), triangle setup, memory controllers, and now shader units.

This I agree with, as NVidia have a very transistor-efficient architecture. But the rest...

So sure, it will be a G70 with DX10 tacked on, but no more so than the GeForce3 was a GeForce2 with DX8 tacked on, or the GeForce FX was a GeForce4 with DX9 tacked on, or the GeForce 6x00 was a GeForce FX with SM3 tacked on. With each of these architectural changes, we can see the heritage of what came before, combined with lots of the new.
I don't agree at all.

Geforce3 was an enormous step from Geforce2. First you have the major memory controller change: with the same bandwidth and fillrate it almost doubled the Geforce2's performance. Then you have the complete overhaul of the rasterization. MSAA and double/quadruple-speed Z testing (I don't remember which) can't just be tacked on. You have dependent texture reads as well, which require a very substantial pipeline change. Then, of course, there are the pixel and vertex shaders. AF was another new feature, and who knows what else happened internally. There was very little kept from the previous gen except maybe part of the register-combiner structure of the pixel shader.

I don't see how you can say NV40 was just GeForceFX with SM3.0 tacked on either. The shader pipeline changes were enormous, the performance difference even bigger. I don't even know where to start here for specifics.

For these two cases, I'd say it was much more a case of taking a few modules out of previous designs and inserting them rather than building upon a previous design. Just because you've learned lessons from the prev gen doesn't mean you're restricted to merely tacking onto it. Some cases can be classified as tacking on, like GF3->GF4 or R3xx->R4xx, but definitely not all.

Chalnoth
16-Apr-2006, 06:05
For these two cases, I'd say it was much more a case of taking a few modules out of previous designs and inserting them rather than building upon a previous design. Just because you've learned lessons from the prev gen doesn't mean you're restricted to merely tacking onto it. Some cases can be classified as tacking on, like GF3->GF4 or R3xx->R4xx, but definitely not all.
That's kind of what I was saying. We should expect similar changes with nVidia's first DX10 architecture. For example, it will require significant changes to the memory controller to have good performance with MSAA on FP16 rendertargets, which we know is coming.

Rys
16-Apr-2006, 13:18
Cleaned this thread up a bit (deleted some detritus and moved some of the decent discussion here (http://www.beyond3d.com/forum/showthread.php?t=29968), and there's another thread for it in Boards and Drivers too). Let's keep this one for G80 rumours.

Tahir2
16-Apr-2006, 21:58
...or the GeForce 6x00 was a GeForce FX with SM3 tacked on.

You do not give NVIDIA credit where it is due. The Geforce FX range (5th series) needed a complete overhaul. Geforce 6 series bears very little resemblance to Geforce FX.

Ailuros
16-Apr-2006, 22:23
Flop or not, IHVs usually don't develop a design from ground up. There are always elements/philosophies from former generations present.

MulciberXP
17-Apr-2006, 00:12
You do not give NVIDIA credit where it is due. The Geforce FX range (5th series) needed a complete overhaul. Geforce 6 series bears very little resemblance to Geforce FX.

I think he clarified that 2 posts ago...

satein
18-Apr-2006, 19:49
Just in case anyone is interested, PC Watch Japan has posted an article with NVIDIA's David Kirk about DX10 support for NV hardware.
"It does not hurry the reformation of architecture too much," NVIDIA (http://pc.watch.impress.co.jp/docs/2006/0419/kaigai262.htm) for the Japanese page (headline is from the translation),
translation by Babelfish (http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=ja_en&trurl=http://pc.watch.impress.co.jp/docs/2006/0419/kaigai262.htm)

Going by the translation, NV hardware may support DX10 with independent (non-unified) shaders like those of G71, and there is no need for US if the cost of implementation is too high. Kirk also mentioned that G71's performance per square mm of die is very high, while the Xbox 360 GPU's (Xenos) performance per square mm is lower by comparison! (<== so is he trying to hint that RSX will be more powerful than Xenos?)

Edit: Corrected some typos...

no-X
18-Apr-2006, 20:23
Performance per square mm? It doesn't make sense in this case. G71 contains neither a northbridge nor 10MB of eDRAM...

Voltron
18-Apr-2006, 21:03
I'm thinking NVIDIA probably has the tools to take eDRAM and northbridge variables into account. I would imagine they are acutely aware of Xenos' performance characteristics. Perhaps that is why they have been making very positive statements about G80 so far in advance of release.

Dave Baumann
18-Apr-2006, 22:01
Or a representative is making some pithy marketing comments.

Ailuros
18-Apr-2006, 22:20
PC and console GPUs are usually so fundamentally different that they don't even qualify for a comparison.

If that comparison were between Xenos and RSX, then I don't see any clear advantage in terms of performance/mm^2 for the latter.

compres
18-Apr-2006, 22:23
Or a representative is making some pithy marketing comments.

My thoughts exactly. However, have we any way of disproving him? Performance per mm^2 on which workload anyway?

one
19-Apr-2006, 04:14
Just in case anyone is interested, PC Watch Japan has posted an article with NVIDIA's David Kirk about DX10 support for NV hardware.

I posted all of David Kirk's comments from the article in a thread in Console Talk; please take a look at it.

http://www.beyond3d.com/forum/showthread.php?t=30014

DemoCoder
19-Apr-2006, 06:29
Well, even Mint has been conceding that NVidia does very well in terms of transistor budget efficiency compared to ATI, although Mint attributes it to dynamic branching performance and not US.

Chalnoth
19-Apr-2006, 07:00
Well, good dynamic branching performance and good unified shader performance require very similar things.

Mintmaster
19-Apr-2006, 13:30
Well, even Mint has been conceding that NVidia does very well in terms of transistor budget efficiency compared to ATI, although Mint attributes it to dynamic branching performance and not US.
That was for R520, for which US isn't a candidate anyway.

Xenos is tough to say. 235M transistors seems pretty efficient for what it's capable of, but we don't have direct access to the hardware to see how comparable it is to other architectures. I have no idea how well the Vec4+scalar system works, how efficient the extra 16 point samplers are, etc.

Personally, I don't think it's as much of a technology advantage from NVidia as it is a case of misplaced priorities on ATI's part. What's the rush for DB, especially when you know XB360 can be the driving force for getting that feature into games?

So far, it looks like this won't hurt ATI as much as I thought it would. Instead, NVidia is reaping record margins. They must be making a killing on the 7600GT.

Mintmaster
19-Apr-2006, 13:36
Well, good dynamic branching performance and good unified shader performance require very similar things.
I agree. I think unified shaders only make sense (i.e. net you more perf per sq mm) once you throw in dynamic branching in the PS and VTF in the VS. High geometry perf also gives them an edge, but we don't see much of that on the PC side.

If your design goals don't include fast VTF or DB, it's probably not worth it.

Geo
19-Apr-2006, 13:44
Well, good dynamic branching performance and good unified shader performance require very similar things.

Interesting point. We haven't actually seen R620 yet (I'm just tired of 'R6x0'; ATI reps, feel free to post up and tell me I'm wrong :wink:), so we can't say for sure yet which parts and pieces are coming from R520, which from Xenos, and which are brand new.

I'm sure they'd have been much happier (as would we all this side of Jen-Hsun) if R520 had been on time and maybe increased dev use of dynamic branching this generation. . .but it is also possible that, even knowing they were a little too far ahead of the market on DB, they considered it a reasonable investment now to incrementalize their R6 development, if the R520 implementation is largely what is going into R620.

After all, we've been hearing for years that Vista/DX10 is the giant inflection point.

Chalnoth
19-Apr-2006, 18:44
I had thought that ATI's next-gen PC product is actually going to be called the R600 (with the R420 and R520 being so-named because the R400 and R500 were unified products that were cancelled).

Geo
19-Apr-2006, 19:38
I had thought that ATI's next-gen PC product is actually going to be called the R600 (with the R420 and R520 being so-named because the R400 and R500 were unified products that were cancelled).

So did I, Chal. But there's way too much "R6x0" floating around from the wiseguys. I sense a disturbance in the force. :wink:

Geo
11-Jun-2006, 01:24
http://www.dailytech.com/article.aspx?newsid=2785


During the show we saw some of our first NVIDIA slides with G80 information. It's much too early to speculate about specifics of G80, but several vendors who have been reliable in the past have told me that G80 is a dual-core GPU with 48 pixels-per-cycle per core. Given that the 90nm G71 is based on a design with a mere 24 pixels-per-cycle core, I am having a little bit of trouble swallowing this tall order. The NVIDIA roadmap we've seen stretches into Q4'06, and does not have G80 on the map yet. That's not to say G80 is not a 2006 component, but it's not on the roadmap yet as such.



Well, 48 per core would actually be 96 on a dual-core. . .

48 I can believe, 96 I'm finding a little more difficult this side of a GX2 kind of configuration.

sonyps35
11-Jun-2006, 07:29
Wow, I really have been wondering. IMO ATI should have been smoked this round by a 32-pipe G70. Nvidia just dropped the ball on that.

Basically I'll be surprised if ATI keeps up next round. Nvidia won't be stupid twice. ATI continues to make just baffling design decisions. I don't know what they are thinking, but it frustrates me greatly. This isn't rocket science, guys. Make it go faster. In real games.

I'm just wondering if G80 won't absolutely destroy ATI's next part.

BByte
11-Jun-2006, 07:33
48 I can believe, 96 I'm finding a little more difficult this side of a GX2 kind of configuration.

Well, isn't GX2 exactly how you'd do dual-core for GPUs? Though maybe you could fit it on one PCB.

sonyps35
11-Jun-2006, 07:35
Nvidia is smart.

I bet it's exactly a dual core mainstream solution.

Wow.

SugarCoat
11-Jun-2006, 08:44
Wow, I really have been wondering. IMO ATI should have been smoked this round by a 32-pipe G70. Nvidia just dropped the ball on that.

Basically I'll be surprised if ATI keeps up next round. Nvidia won't be stupid twice. ATI continues to make just baffling design decisions. I don't know what they are thinking, but it frustrates me greatly. This isn't rocket science, guys. Make it go faster. In real games.

I'm just wondering if G80 won't absolutely destroy ATI's next part.


What in the heck are you talking about? What bad or baffling design decisions?

Ailuros
11-Jun-2006, 09:07
Wow, I really have been wondering. IMO ATI should have been smoked this round by a 32-pipe G70. Nvidia just dropped the ball on that.

Both IHVs plan their strategy according to a sales synergy logic. How has NVIDIA's mobile GPU market share been moving lately? That was, according to my understanding, one of the primary reasons why G71 ended up as we know it today.

In any case, where's the guarantee that an 8-quad G7x would have been any faster than today's G71? I have severe doubts it would have exceeded the 550MHz mark. If you sit down and think it over, you'll see that not only did they manage to yield similar performance with a frequency increase, but it came with a far smaller transistor budget, it consumes far less power per single GPU, and it finally seems to allow dual-core GX2s at reasonable prices and power consumption.

Now try the same with a hypothetical 8-quad part.

Basically I'll be surprised if ATI keeps up next round. Nvidia won't be stupid twice. ATI continues to make just baffling design decisions. I don't know what they are thinking, but it frustrates me greatly. This isn't rocket science, guys. Make it go faster. In real games.

I'm just wondering if G80 won't absolutely destroy ATI's next part.

When was either G80 or R6x0 set in stone (in a relative sense) according to your understanding, and what kind of changes could have been made since then?

I'm personally not yet that sure that either of the aforementioned GPUs will make a huge performance difference in today's games compared to current GPUs. Just take a deeper look at the GX2 configuration and unit count and tell me how much headroom there really can be after all.

dizietsma
12-Jun-2006, 07:14
Two 6-quad parts on 80nm "glued" together seem a simple and cheap way to get the performance, plus the architectural improvements along with it. NVIDIA seem very risk-averse nowadays, so I think it is a possibility not too far from the mark.

_xxx_
12-Jun-2006, 09:31
http://www.dailytech.com/article.aspx?newsid=2785



Well, 48 per core would actually be 96 on a dual-core. . .

48 I can believe, 96 I'm finding a little more difficult this side of a GX2 kind of configuration.

Don't we have 48 ALUs in the G71 already?

Also, who says that it won't be a GX2 kind of config? ;)

SugarCoat
12-Jun-2006, 10:05
Don't we have 48 ALUs in the G71 already?

Also, who says that it won't be a GX2 kind of config? ;)


Me. It's a serious downfall of the GX2, so much so that I don't think they would ever pin the hopes of their primary flagship solution on requiring SLI profiles. The problem is that if a game doesn't have a profile, performance may be cut by a very large percentage. For the dozen or so popular titles released each year, they may be on the ball with profile releases and driver updates on launch day. But for everything else, do you expect them to have a new profile out on day one, especially for the lesser-known titles? That negates any benefit the GX2 type of setup has, unless you like waiting for all your performance. Not to mention the annoyance of having to get a new driver every time a game is released just for the profile. And depending on user-created profiles would be pretty dumb for a $600+ card, don't you think? It won't happen, in my opinion. Not unless they can make some huge innovation in the way SLI works (something that removes the need for profiles altogether), which I think we would have heard about if it were coming anytime soon.

_xxx_
12-Jun-2006, 10:16
Well, it could be that they found a solution for the whole profile juggling. No one said that the chips will work exactly the same as the current stuff, right?

Nick
12-Jun-2006, 10:27
It won't happen, in my opinion. Not unless they can make some huge innovation in the way SLI works (something that removes the need for profiles altogether), which I think we would have heard about if it were coming anytime soon.
That doesn't seem totally unlikely. All they need to do is figure out how to go from doubling the number of 'pipelines' to doubling the number of cores, with almost the same performance scaling. With cores close enough (or even on the same package) they could have massive bandwidth and low latency between them to balance rendering tasks/threads. Totally different from SLI, with all memory shared. That doesn't exclude true SLI between different boards of course...

The main advantage would be increased yields, allowing them to reduce the price of a mighty number of transistors.

DegustatoR
12-Jun-2006, 11:35
There is no point in doing dual core GPUs. Dual chip cards maybe but not dual cores.

_xxx_
12-Jun-2006, 11:48
There is no point in doing dual core GPUs. Dual chip cards maybe but not dual cores.

We're talking about two dies packaged together, not dual-core on one die like Athlon X2.

Acert93
12-Jun-2006, 11:55
Dual core would be nice if they could share the same memory space, communicate efficiently (i.e. nearly a 2x performance boost in every area across the board), avoid the profile issue and work right out of the box. But that seems to be asking a lot. If it were possible, though, we could see some changes in the market, e.g.:

Tier 1 - 1 Core
Tier 2 - 2 Cores
Tier 3 - 3 Cores
Tier 4 - 4 Cores

This could leverage production of chips and possibly reduce costs in development and in manufacturing. Of course there would need to be speed binning... anyhow, my idea is probably silly.

Kaotik
12-Jun-2006, 12:47
This could leverage production of chips and possibly reduce costs in development and in manufacturing. Of course there would need to be speed binning... anyhow, my idea is probably silly.
First, I'll assume you mean GPUs, not cores; after all, one GPU can be seen as multicore anyway.

Either way, silly or not, I'm sure Mr. Burns & co would love that :lol:

Nick
12-Jun-2006, 13:05
This could leverage production of chips and possibly reduce costs in development and in manufacturing. Of course there would need to be speed binning... anyhow, my idea is probably silly.
No, it makes a lot of sense. It doesn't only improve yields but also speed binning. It's easier to have two smaller dies that can run at say 600 MHz than have one double sized die that can run at 600 MHz.

That would give them a whole lot of possibilities, with just one chip: variation in the number of cores (dies) and in clock frequency. They could even release cards for every market segment at the same time... And when transitioning to 65 nm (or smaller) they could keep the same design. It's a big R&D investment, but it would pay itself back double if successful.
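Nick's yield argument can be checked with the simple Poisson yield model, in which the chance that a die has no killer defect is exp(-area x defect density). A minimal sketch, assuming defects are independent and evenly spread, that dies are tested before packaging (so only known-good dies are paired up), and ignoring packaging cost, wafer-edge losses and speed binning; the area and defect density below are made-up numbers.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Simple Poisson model: fraction of dies with zero killer defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def good_die_fraction(total_area_cm2: float, dies_per_package: int,
                      defects_per_cm2: float) -> float:
    """Fraction of fabricated dies usable when the logic is split across N dies.

    Assumes each die is tested before packaging (known-good die), so a bad die
    only wastes its own area, never its partner's.
    """
    die_area = total_area_cm2 / dies_per_package
    return poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    total_area, d0 = 3.0, 0.4  # hypothetical: 300 mm^2 of logic, 0.4 defects/cm^2
    for n in (1, 2, 4):
        print(f"{n} die(s) of {total_area / n:.2f} cm^2: "
              f"{good_die_fraction(total_area, n, d0):.3f} of dies usable")
```

In this example, splitting the logic in two raises the usable fraction from exp(-1.2), about 0.30, to exp(-0.6), about 0.55. The model says nothing about the extra package and inter-die interconnect cost, which is what much of the rest of the thread argues about.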

trumphsiao
12-Jun-2006, 13:17
No, it makes a lot of sense. It doesn't only improve yields but also speed binning. It's easier to have two smaller dies that can run at say 600 MHz than have one double sized die that can run at 600 MHz.

That would give them a whole lot of possibilities, with just one chip: variation in the number of cores (dies) and in clock frequency. They could even release cards for every market segment at the same time... And when transitioning to 65 nm (or smaller) they could keep the same design. It's a big R&D investment, but it would pay itself back double if successful.

Why not just tape out two chips, either mainstream or low-end, and scale them over the entire product line for 18 months? I don't believe a high-end category containing three chips is plausible. You also need to resolve a mammoth number of cost issues, such as PCB layout/density and package/pad limits.

_xxx_
12-Jun-2006, 14:12
I'd go with just one chip as a min. unit, think of it as one "pipe". These can be combined into bigger arrays and thus give you the horsepower you need. Kinda 100% modular design on the package level. I know it's a pipe dream, but that would be the neatest solution.

Chalnoth
12-Jun-2006, 19:40
No, it makes a lot of sense. It doesn't only improve yields but also speed binning. It's easier to have two smaller dies that can run at say 600 MHz than have one double sized die that can run at 600 MHz.

That would give them a whole lot of possibilities, with just one chip: variation in the number of cores (dies) and in clock frequency. They could even release cards for every market segment at the same time... And when transitioning to 65 nm (or smaller) they could keep the same design. It's a big R&D investment, but it would pay itself back double if successful.
There's no point in doing this without widening the memory bus, though, and I don't think that's going to happen. Now that both IHVs have fairly mature dual-GPU setups working, there's basically zero reason to develop a dual-core setup.

_xxx_
12-Jun-2006, 20:06
There's no point in doing this without widening the memory bus, though, and I don't think that's going to happen.

Why would they need it? They'll just bond the chips directly in the package, use some fancy logic to share one memory pool and retain the current connections to the world outside the chip. Doable, but I don't know how viable.

Chalnoth
12-Jun-2006, 20:23
Why would they need it? They'll just bond the chips directly in the package, use some fancy logic to share one memory pool and retain the current connections to the world outside the chip. Doable, but I don't know how viable.
Because there's no point to doing it without widening the memory bus. Chips are currently doing fine in yields at a size that is large enough to saturate a 256-bit bus.

I don't buy for an instant that yields are really going to be the limitation on die size in the future anyway. It's mostly going to be power consumption that will be a concern, and going for a split die isn't going to help that one bit.
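Chalnoth's bandwidth objection is easy to put into numbers: peak bandwidth is the bus width in bytes times the effective transfer rate. A minimal sketch; the 1600 MT/s rate and the "1.5x faster memory" case are hypothetical round numbers, not the specs of any particular card or memory type.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mt_s: float) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 1e9 bytes).

    effective_mt_s is the effective data rate in megatransfers per second
    (for DDR-type memory, twice the memory clock in MHz).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


if __name__ == "__main__":
    rate = 1600.0  # hypothetical effective data rate, MT/s
    one_die = peak_bandwidth_gb_s(256, rate)
    print(f"256-bit bus, one die:          {one_die:.1f} GB/s")
    print(f"256-bit bus, shared by 2 dies: {one_die / 2:.1f} GB/s per die")
    print(f"512-bit bus, shared by 2 dies: {peak_bandwidth_gb_s(512, rate) / 2:.1f} GB/s per die")
    print(f"256-bit bus, 1.5x faster memory, 2 dies: "
          f"{peak_bandwidth_gb_s(256, rate * 1.5) / 2:.1f} GB/s per die")
```

Two dies behind an unchanged 256-bit bus each see half the bandwidth, so packaging two dies only pays off on bandwidth-bound work if the bus gets wider or the memory gets faster.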

Nick
12-Jun-2006, 21:49
Because there's no point to doing it without widening the memory bus. Chips are currently doing fine in yields at a size that is large enough to saturate a 256-bit bus.
GDDR4?

Seriously, memory bandwidth is a problem on its own. But the GPU can spend its transistors on bigger caches, better prefetch prediction and such. Also, since shaders are already starting to use more arithmetic instructions than texture sampling instructions, I believe we need bigger chips with high yields and high clock frequencies to do the processing required by next-generation games. Also think about bigger texture filter kernels...

DegustatoR
12-Jun-2006, 22:03
We're talking about two dies packaged together, not dual-core on one die like Athlon X2.
Why would you want to package two dies together? From the performance point of view it's better to go with one die of the same size but with twice the number of pipelines. From the heat and power point of view it's smarter to have two dies with separate power modules and separate cooling systems. As I've said, there's no point in doing a dual-core GPU. It's better to do more parallelism in one core, or, if one core is becoming too hot or too expensive to produce, to go with a multichip/multicard setup. By going multicore in one package you're getting no benefits at all.

Voltron
12-Jun-2006, 22:03
Some G80 details from a JP Morgan investor conference on 5/24: it is ~500 million transistors and will be done when it's done, but they are shooting for Septemberish.

tEd
12-Jun-2006, 23:35
wow 500million :shock:

EasyRaider
12-Jun-2006, 23:43
wow 500million :shock:
Seems like the usual progression to me. Much less would be surprising.

Xmas
12-Jun-2006, 23:50
By going multicore in one package you're getting no benefits at all.
You get better yields than with one big die, and you get much higher inter-die bandwidth than with multi-chip/card configurations, which is the reason the latter are so inefficient.

Geo
13-Jun-2006, 02:41
Some G80 details from a JP Morgan investor conference on 5/24: it is ~500 million transistors and will be done when it's done, but they are shooting for Septemberish.

Do you have any more details on this? Is there a recording of this available? Who presented for NV, and were they the ones to use the 500m number?

Voltron
13-Jun-2006, 04:26
Do you have any more details on this? Is there a recording of this available? Who presented for NV, and were they the ones to use the 500m number?

http://phx.corporate-ir.net/phoenix.zhtml?c=116466&p=irol-EventDetails&EventId=1322060

Not sure if it's still up or if that link works. Michael Hara presented, I think. There was another conference a week later that I could be confusing it with, but I'm pretty sure it was said at this one. As was pointed out, 500 million is not surprising considering history.

3dcgi
13-Jun-2006, 04:50
Why would you want to package two dies together? From the performance point of view it's better to go with one die of the same size but with twice the number of pipelines. From the heat and power point of view it's smarter to have two dies with separate power modules and separate cooling systems. As I've said, there's no point in doing a dual-core GPU. It's better to do more parallelism in one core, or, if one core is becoming too hot or too expensive to produce, to go with a multichip/multicard setup. By going multicore in one package you're getting no benefits at all.
A dual core GPU gives the same benefit as a dual core CPU. It can run two programs simultaneously. Maybe it would be useful for Vista's UI. Although, I like the idea of one big chip better as the performance will scale in all situations.

DegustatoR
13-Jun-2006, 07:21
You get better yields than with one big die, and you get much higher inter-die bandwidth than with multi-chip/card configurations, which is the reason the latter are so inefficient.
You can get better yields right now by disabling quads and other parts of the chip. It's even better than multicore b/c it's more flexible. And you get much much higher bandwidth if you go with one big core instead of two cores in one package.

A dual core GPU gives the same benefit as a dual core CPU. It can run two programs simultaneously. Maybe it would be useful for Vista's UI. Although, I like the idea of one big chip better as the performance will scale in all situations.
You don't need to have two GPU cores to run two programs simultaneously. I'm not even sure that it's better to have two cores for that.

AlexV
13-Jun-2006, 07:58
How big can a core get before it's too big? Although I don't buy the dual-die-in-a-single-package idea (yet, as it may be a tad risky, and I'm not so sure NV will do something risky with their first DX10 part), it may be a good direction to go in the future... I'm not sure I want a single huge chip that needs its own power plant heating the polar ice caps in my rig, thank you :)

The GX2 shows that there is an elegant solution for the short term. The profile aspect is still an impediment, but they can simply go the MAXX way and force AFR in everything (that's a simplistic view of things, as improvements are bound to come from both players in terms of multi-GPU rendering techniques). Although that may hurt performance in a few apps, I assume that most devs are trying to code as multi-GPU-friendly as possible, and that will help alleviate most inconveniences. All IMHO.
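Whether forcing AFR only "may hurt performance in a few apps" depends on inter-frame dependencies. A toy model, not how any real driver schedules work: frames are dealt round-robin to the GPUs, each frame takes a fixed time on one GPU, and a hypothetical dependent_fraction at the end of each frame cannot start until the previous frame (rendered on another GPU) has finished, as when a render target from the last frame is reused. Data-transfer costs between GPUs are ignored.

```python
def afr_gpu_for_frame(frame_index: int, gpus: int) -> int:
    """Alternate frame rendering: frames are dealt to GPUs round-robin."""
    return frame_index % gpus


def afr_fps(gpus: int, frame_ms: float, dependent_fraction: float) -> float:
    """Steady-state frame rate in a toy AFR model.

    frame_ms: time one GPU needs to render a whole frame.
    dependent_fraction: share at the end of each frame that must wait for the
    previous frame to finish; the rest of the frame overlaps freely.
    """
    throughput_limit = frame_ms / gpus                # all GPUs busy, no stalls
    dependency_limit = dependent_fraction * frame_ms  # serialized chain of frames
    return 1000.0 / max(throughput_limit, dependency_limit)


if __name__ == "__main__":
    print("frames 0-5 go to GPUs", [afr_gpu_for_frame(i, 2) for i in range(6)])
    for dep in (0.0, 0.3, 0.7, 1.0):
        print(f"2 GPUs, 20 ms frames, dependent fraction {dep:.1f}: "
              f"{afr_fps(2, 20.0, dep):.1f} fps")
```

In this model a dependency costs nothing until it exceeds 1/gpus of the frame; beyond that, the frame rate falls toward the single-GPU figure, which is the kind of app where blindly forcing AFR hurts.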

_xxx_
13-Jun-2006, 09:24
Because there's no point to doing it without widening the memory bus. Chips are currently doing fine in yields at a size that is large enough to saturate a 256-bit bus.

I don't buy for an instant that yields are really going to be the limitation on die size in the future anyway. It's mostly going to be power consumption that will be a concern, and going for a split die isn't going to help that one bit.

I'm not saying that yields will be the limit, just that using several smaller modular parts would make it all much cheaper. You'll get much higher yields with smaller parts, since yield falls off non-linearly (roughly exponentially) with die area. It won't lower the power consumption or bring any other benefits in that regard, but it could allow for more scalable parts.

EDIT: also, since you'd be able to directly interconnect the chips, you'd definitely have HUGE bandwidth between the chips without any external routing on the board or through some bus.
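A minimal sketch of the non-linear yield argument, assuming the textbook Poisson yield model Y = exp(-A * D0), where A is die area and D0 is killer-defect density (real fabs use more elaborate models). The defect density and die areas are hypothetical numbers picked only for illustration, not figures for any real process or chip.

```python
import math

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical inputs, for illustration only.
D0 = 0.5                    # killer defects per cm^2 (assumed)
SMALL_AREA = 1.5            # cm^2, one half-size die (assumed)
BIG_AREA = 2 * SMALL_AREA   # cm^2, the equivalent monolithic die

y_small = poisson_yield(SMALL_AREA, D0)
y_big = poisson_yield(BIG_AREA, D0)

# The big die yields y_small**2, because both halves must be clean at once.
# If small dies are tested before packaging (known-good-die), one defect only
# scraps half as much silicon, so the usable fraction stays at y_small.
print(f"half-size die yield: {y_small:.3f}")  # 0.472
print(f"full-size die yield: {y_big:.3f}")    # 0.223
```

The sketch leaves out the costs raised later in the thread: packaging, test, and the die area spent on the inter-die interface.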

_xxx_
13-Jun-2006, 09:29
You can get better yields right now by disabling quads and other parts of the chip.

Yes, but you throw away lots of silicon that way.

DegustatoR
13-Jun-2006, 10:00
Yes, but you throw away lots of silicon that way.
If you have 1 working core in a package of 2 you'll throw away even more silicon.

_xxx_
13-Jun-2006, 10:04
If you have 1 working core in a package of 2 you'll throw away even more silicon.

No, you'll put only the working cores into the package. The faulty ones are sorted out prior to that. And since the individual chunks are smaller than a huge chip like R580, you throw away less AND also have higher yields.

By disabling stuff, you may end up using just half of the chip and selling it for half the price as a lower SKU, which is definitely more of a loss. All those X1800GTOs and such are actually losses for ATI, for example.

Arun
13-Jun-2006, 10:12
_xxx_, I disagree. Reconsider what you just said and do the maths. The margins with redundancy and partial disabling remain much higher.
The only advantage of putting 2 chips on a package for GPUs is to reduce one-time R&D costs and tape-out costs etc. - and considering the volume, except for the ultra-high-end, it just doesn't make sense. Sorry.

Uttar
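To make the "do the maths" point concrete, here is a rough sketch of how partial disabling recovers dies, again assuming a simple Poisson defect model. It also assumes, hypothetically, that a die is shared logic plus identical quads, that any defect in the shared logic kills the die, and that a die with at least `min_good` working quads can be sold with the rest fused off. All areas and the defect density are made-up illustration values.

```python
import math
from math import comb

def sellable_fraction(n_quads: int, min_good: int, quad_area: float,
                      shared_area: float, d0: float) -> float:
    """Probability that a die has defect-free shared logic and at least
    `min_good` defect-free quads, with independent Poisson defects."""
    p_shared = math.exp(-shared_area * d0)  # shared logic must be clean
    p_quad = math.exp(-quad_area * d0)      # chance one quad is clean
    p_enough = sum(
        comb(n_quads, k) * p_quad**k * (1 - p_quad) ** (n_quads - k)
        for k in range(min_good, n_quads + 1)
    )
    return p_shared * p_enough

# Hypothetical die: 1.0 cm^2 of shared logic plus 4 quads of 0.5 cm^2 each.
full_only = sellable_fraction(4, 4, 0.5, 1.0, 0.5)     # every quad must work
with_salvage = sellable_fraction(4, 3, 0.5, 1.0, 0.5)  # one quad may be disabled
print(f"sellable as full part only: {full_only:.3f}")
print(f"sellable incl. 3-quad SKU:  {with_salvage:.3f}")
```

This only counts dies, not revenue: the salvaged parts still sell at a lower price, which is the objection _xxx_ raises.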

_xxx_
13-Jun-2006, 10:21
_xxx_, I disagree. Reconsider what you just said and do the maths. The margins with redundancy and partial disabling remain much higher.
The only advantage of putting 2 chips on a package for GPUs is to reduce one-time R&D costs and tape-out costs etc. - and considering the volume, except for the ultra-high-end, it just doesn't make sense. Sorry.

Uttar

Again, this wasn't about two chips but about the hypothetical modular design where you'd use many little cores instead of one big one. And as always, I'm just guessing without real numbers to support it (and where would they come from? No one has done this before anyway).

Also, think of how many whole chips are thrown away because they can't be used even partially.

LeStoffer
13-Jun-2006, 15:02
First tape-out done on G80. Could be ready in September, if all goes well:

http://www.theinquirer.net/?article=32385

nAo
13-Jun-2006, 15:04
All we could confirm at this time is that the chip will be DirectX 10 compliant of course but it won't have the full implementation of Shader Model 4.0. It won't do the unified Shader but it will be Vista ready. Most of the chips out today are Vista ready at least the high end ones.

It would be nice if, after all these years, Fudo had started to understand what he is writing about..

Geo
13-Jun-2006, 15:36
Rest of summary to follow in Industry, but thought I'd throw this one out now. . .

Q: When's your next gen part? A: "Well, you know GPUs are getting so large and so complex the timing of it is almost one of those things that the timing of it is 'it comes out when it comes out'. . .as much as you try to plan it for a certain event it really is on a schedule that says 'when it's done it's done'. . .now the reason why that works, at least for the high-end. . .the first high-end gpu that comes out. . .is because it is going to get purchased by enthusiasts who really don't care if it is Christmas, the middle of summer, or spring. If it has a discernible advantage over the last gpu, they'll buy it. So our schedule right now on the next generation GeForce [Hmm, that's the first time we've heard them confirm it is still 'GeForce'?] is going to be second half, and the objective is to hit it for 'back to school'. But fundamentally we're really just targeting second half. Which means that the current GeForce 7 family, with the exception of the high-end, is really the family you're going to see in the back to school cycle. [So what I got out of that: if next spin goes well, they hit 'back to school' for a high-end part. If not, it'll be later. And no full family simultaneous launch.] But this one is. . .we kind of describe it inside the company as. . . 'this is probably the biggest architectural change in the company's history from one generation to the next' [Hmm!] I'll give you a little bit of insight. . .this device is going to be over half a billion transistors large. It will, without doubt, be the most complex device being built in the semi-conductor business today. So the current schedule right now is to have it in the second half. And we'll do typical, which is a 'hard launch', which means we'll launch it when it's actually available in retail.


http://www.beyond3d.com/forum/showthread.php?p=775737#post775737

Jawed
13-Jun-2006, 16:43
So, what's that, NVidia's now boasting about how it's going to have the BIGGEST GPU evar!

Jawed

nAo
13-Jun-2006, 16:45
So, what's that, NVidia's now boasting about how it's going to have the BIGGEST GPU evar!

most complex != biggest

Geo
13-Jun-2006, 16:49
most complex != biggest

No, but 500M+ on 80nm. . .are you expecting that to come in under 352mm2? Or do you have breaking news on 65nm for us? :wink:

Jawed
13-Jun-2006, 16:49
If he hadn't said those two things in the same breath, you might have a point, Marco.

Jawed

nAo
13-Jun-2006, 16:51
You're off track guys, I was just pointing out that size != complexity, that's it.

trumphsiao
13-Jun-2006, 18:09
No, but 500M+ on 80nm. . .are you expecting that to come in under 352mm2? Or do you have breaking news on 65nm for us? :wink:


I'd wager R600 will be on TSMC's 65nm process.

Geo
13-Jun-2006, 18:21
I can wager R600 on 65nm TSMC process.

Yeah? And you expect available for sale when?

trumphsiao
13-Jun-2006, 18:31
Yeah? And you expect available for sale when?

2006 Q1/Q2 or even later (Q3)

ATI gets a fair price by being charged on a per-die basis, while TSMC gets to practise on its immature process. Compare that with NV, which fully utilises capacity at either CSM or IBM facilities.

Xmas
13-Jun-2006, 19:28
You can get better yields right now by disabling quads and other parts of the chip. It's even better than multicore b/c it's more flexible.
How is it more flexible? Yes, redundancy can help improve yields. But you can disable quads in a multi-die configuration as well.

And you get much much higher bandwidth if you go with one big core instead of two cores in one package.
I don't see how. You can surely have enough inter-die bandwidth to make each die able to access the other's memory.

SugarCoat
13-Jun-2006, 20:59
So Nvidia may actually overtake R600 in transistor count (I was actually guessing R600 would be about 450M or so). A little bit surprising, if I may say so myself. I'd be interested in the cost per die once these cards launch, since one of Kirk's comments was that unified parts cost significantly more due to the complexity of a unified architecture.

Don't forget Nvidia already has a "single card" with 556M transistors for the core(s). :twisted:

Really hope ATI doesn't scramble for clocks again like with R520, though. Personally, I'm not a fan of relying on your available clock speed to determine your overall performance lead.

Also interesting that theinq thinks ATI will respond to the NV50 (yeah, I'm still calling it that until the official word is otherwise!!!) with a return of R580 on GDDR4. Even the idea that they could get close enough to compete with the same core at new clocks surprises me a little bit. I don't see how they could feasibly push it past 750-800MHz before they look like jackasses on power consumption and yields. Though I personally wouldn't care, since I have my own power plant. Just hope they don't take a page from Nvidia's book with the barely-there 7800GTX 512, though I fear that may come true.

Geo
14-Jun-2006, 01:15
Okay, all you swinging speculator types. . . what feature rabbits might NV pull out of their hat for G80? Better AF and HDR+AA are on the "duh!" list (whether the AF is fully up to ATI's current HQ AF standard or not --or exceeds it?-- is a slightly different question).

I'm thinking something a little more "Oooh, where did that come from?" like Transparency AA.

Still, I'd love to see the market move up to single-cycle 4x AA as the minimum for high-end parts, and thus increase the max as well, but I've been saying that for a while now.

Megadrive1988
14-Jun-2006, 02:58
http://www.megaupload.com/?d=ZRUFCLGL

Here's the segment from the Nvidia conference about the next-gen GPU (the G80), which they do not mention by name, but the Nvidia guy did say "over half a billion transistors large". He also says it's the biggest architectural change in the company's history... I remember hearing that about either NV30 or NV40, too.


Sorry about the ~45 second wait before the download starts. The upper right-hand side is where you click to download, for those of you not familiar with megaupload.com.

Okay, so, not counting the NV47 / G70 refresh, which was ~300M transistors, NV40 was 222M transistors, so NV40 to G80/NV50 is a transistor leap of more than 2x (~2.25x) in a ~2.5 year timeframe (mid 2004 to late 2006).
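The ratio above, plus the compound yearly growth it implies, worked out under the post's own assumptions: 222M for NV40, "over half a billion" taken as exactly 500M, and a 2.5-year gap.

```python
NV40_TRANSISTORS = 222e6
G80_TRANSISTORS = 500e6   # "over half a billion", so treat as a lower bound
YEARS = 2.5               # mid 2004 to late 2006, per the post

ratio = G80_TRANSISTORS / NV40_TRANSISTORS
annual_growth = ratio ** (1 / YEARS)  # compound growth factor per year

print(f"overall:  {ratio:.2f}x")          # 2.25x
print(f"per year: {annual_growth:.2f}x")  # 1.38x
```

Since 500M is a floor, both figures are lower bounds.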


Edit: I completely missed Geo's post on page 9 of this thread. :oops:

Oh well, now you have the text and the audio.

3dcgi
14-Jun-2006, 03:04
You don't need to have two GPU cores to run two programs simultaneously. I'm not even sure that it's better to have two cores for that.
I don't know of a GPU that can execute two 3D apps at the same time. I don't mean time slicing, but truly running the apps simultaneously. I'm not sure if this will ever be important, but who knows.

Chalnoth
14-Jun-2006, 03:58
I don't know of a GPU that can execute two 3D apps at the same time. I don't mean time slicing, but truly running the apps simultaneously. I'm not sure if this will ever be important, but who knows.
If you're efficient at the time slicing, there's no reason ever to run two 3D apps simultaneously.

Chalnoth
14-Jun-2006, 04:03
Okay, all you swinging speculator types. . . what feature rabbits might NV pull out of their hat for G80? Better AF and HDR+AA are on the "duh!" list (whether the AF is fully up to ATI's current HQ AF standard or not --or exceeds it?-- is a slightly different question).
Well, the only thing I can think of, though it's on the software side instead of the hardware side, would be automatic driver updates, similar to Windows' automatic update functionality. Can you imagine how many problems that could solve in terms of game tech support?

Bob
14-Jun-2006, 04:07
If you're efficient at the time slicing, there's no reason ever to run two 3D apps simultaneously.
The problem with time slicing is that it pretty much requires you to flush and refill the pipeline to switch contexts (either that, or save and restore huge amounts of state). Although this is fine for CPUs, which have really short pipelines and < 1 KB of state, it's less so on GPUs that have exceedingly long pipelines running hundreds if not thousands of threads simultaneously.

You don't context switch all CPUs at the same time in an 8-way SMP system. Why should the GPU (which has many more than 8 threads running) be any different? In a Unified Architecture, you should just be able to switch one shader array at a time, or even one ALU at a time, not the whole chip. Or perhaps even allocate different shader arrays to different contexts.

Treating a massively multithreaded machine the same as a 1-thread machine seems rather shortsighted (or grossly inefficient), in my opinion.

Chalnoth
14-Jun-2006, 04:14
Except that in GPUs, you have a perfect time at which to switch between processes: the buffer swap. There's really not much reason to switch between two processes many times during the rendering of a frame.

Geo
14-Jun-2006, 04:43
Well, the only thing I can think of, though it's on the software side instead of the hardware side, would be automatic driver updates, similar to Windows' automatic update functionality. Can you imagine how many problems that could solve in terms of game tech support?

Now there's an interesting idea. I wonder, though, if MS's own WHQL gets in the way a bit there, as far as being able to patch in small increments.

Driver versioning support would be handy too, come to think of it.

DegustatoR
14-Jun-2006, 04:44
500M does seem like a bit too much for 80nm. Hmm... Maybe they've decided to go with 65nm? Maybe that's why G80 was pushed back from 1H06 to 2H06+?..

rwolf
14-Jun-2006, 04:49
No, it makes a lot of sense. It doesn't only improve yields but also speed binning. It's easier to have two smaller dies that can run at say 600 MHz than have one double sized die that can run at 600 MHz.

That would give them a whole lot of possibilities, with just one chip: variation in number of cores (dies) and clock frequency. They could even release cards for every market segment at the same time... And when transitioning to 65 nm (or smaller) they can keep the same design. It's a big R&D investment but would pay itself back double if successful.

But you need all sorts of circuitry to sync the dies together and to interface to memory and between chips. Interfaces are bottlenecks and you would waste space on pads for pinouts.

trumphsiao
14-Jun-2006, 04:59
500M does seem like a bit too much for 80nm. Hmm... Maybe they've decided to go with 65nm? Maybe that's why G80 was pushed back from 1H06 to 2H06+?..


A G80 (500M+ transistors) on CSM's 90nm process will be smaller than G70's die.

3dcgi
14-Jun-2006, 05:29
If you're efficient at the time slicing, there's no reason ever to run two 3D apps simultaneously.
Bob addressed technical aspects of this, but I'll address it in another way. Why wouldn't you say the same thing about CPUs? We know there are many reasons to run multiple apps in parallel. At some point someone might want to do this with GPUs. The question will be are the inefficiencies when running a single app and the duplicated logic worth the effort. I agree that right now there is no reason for this, but I never say never as I'm usually wrong when I do.

psurge
14-Jun-2006, 05:32
The problem with time slicing is that it pretty much requires you to flush and refill the pipeline to switch contexts (either that, or save and restore huge amounts of state). Although this is fine for CPUs, which have really short pipelines and < 1 KB of state, it's less so on GPUs that have exceedingly long pipelines running hundreds if not thousands of threads simultaneously.

You don't context switch all CPUs at the same time in an 8-way SMP system. Why should the GPU (which has many more than 8 threads running) be any different? In a Unified Architecture, you should just be able to switch one shader array at a time, or even one ALU at a time, not the whole chip. Or perhaps even allocate different shader arrays to different contexts.

Treating a massively multithreaded machine the same as a 1-thread machine seems rather shortsighted (or grossly inefficient), in my opinion.

Bob - what about overlapping GPU pipeline flush of an old context with GPU pipeline fill from a new one? The front end (command processor I guess?) of GPUs would only accept commands from one "source" at a time, but inside a multi-threaded shader array, individual threads might belong to different contexts (in practice, one of 2).

Reason I ask is that switching even a single shader-array (with say 32KB register file) seems quite expensive (with an uncontended 256 bit bus, up to 2000 cycles just to read/write the contents of the register file alone). Also, if you're switching one shader array at a time, how would you deal with the FIFOs feeding all of the fixed function units (e.g. ROPs, setup/rasterizer)?
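psurge's figure checks out with simple arithmetic, assuming the whole register file moves over the bus with no other traffic: a 256-bit bus carries 32 bytes per cycle, so 32KB takes 1024 cycles each way, or about 2048 cycles to save one context and restore another.

```python
def regfile_switch_cycles(regfile_bytes: int, bus_bits: int) -> tuple[int, int]:
    """Cycles to move a register file over an uncontended bus.

    Returns (one_way, save_plus_restore). Assumes one full-width transfer
    per cycle and ignores latency and any other traffic on the bus.
    """
    bytes_per_cycle = bus_bits // 8
    one_way = -(-regfile_bytes // bytes_per_cycle)  # ceiling division
    return one_way, 2 * one_way

one_way, round_trip = regfile_switch_cycles(32 * 1024, 256)
print(one_way, round_trip)  # 1024 2048
```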

Chalnoth
14-Jun-2006, 05:37
Bob addressed technical aspects of this, but I'll address it in another way. Why wouldn't you say the same thing about CPUs? We know there are many reasons to run multiple apps in parallel. At some point someone might want to do this with GPUs. The question will be are the inefficiencies when running a single app and the duplicated logic worth the effort. I agree that right now there is no reason for this, but I never say never as I'm usually wrong when I do.
Sure, but the difference is that with CPUs you don't have this obvious time at which to switch between applications. Fortunately, you have short pipelines, so it's not a big deal: you just switch every millisecond or so and all is well.

With GPUs, you have this obvious time to switch: when the application requests a buffer swap. So you render one app's frame, then another's, then another's, etc. The only limitation is that the framerates of your various applications are in lock-step. This method has zero loss of efficiency compared with giving one application exclusive control of the video card, except that it does require more video memory.

So there's no real reason to have a second GPU to do the processing of application #2 simultaneously with that for #1, because there is no efficiency to be gained (just more video memory to work with). The only thing to be "gained" is that the framerate won't be in lock-step between applications, but I contend that this is actually worse: if the framerate is set to be the same between apps, then the more demanding app will naturally take up more processing power.
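Chalnoth's scheme can be sketched as a round-robin scheduler that switches only at buffer swaps. The per-frame costs are hypothetical, and `switch_cost_ms` is an assumed knob standing in for the flush/refill penalty Bob describes; leaving it at zero reproduces Chalnoth's no-loss case.

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    frame_cost_ms: float  # GPU time to render one frame (hypothetical)

def round_robin_at_swap(apps, budget_ms, switch_cost_ms=0.0):
    """Render one whole frame per app in turn, switching only at buffer swaps.

    Returns (frames rendered per app, GPU time used per app, time lost to switches).
    """
    if not apps:
        return {}, {}, 0.0
    t = 0.0
    frames = {a.name: 0 for a in apps}
    busy = {a.name: 0.0 for a in apps}
    lost = 0.0
    while True:
        for a in apps:
            if t + a.frame_cost_ms + switch_cost_ms > budget_ms:
                return frames, busy, lost
            t += a.frame_cost_ms       # render this app's frame
            frames[a.name] += 1
            busy[a.name] += a.frame_cost_ms
            t += switch_cost_ms        # context switch after the swap
            lost += switch_cost_ms

apps = [App("game", 10.0), App("viewer", 5.0)]
frames, busy, lost = round_robin_at_swap(apps, budget_ms=1500.0)
print(frames)  # {'game': 100, 'viewer': 100}
print(busy)    # {'game': 1000.0, 'viewer': 500.0}
```

Both apps get the same frame count (lock-step), while the heavier one naturally takes two thirds of the GPU time. Setting a non-zero `switch_cost_ms` shows where Bob's objection bites: every swap then pays the switch penalty, and both apps lose frames.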

Bob
14-Jun-2006, 05:46
Bob - what about overlapping GPU pipeline flush of an old context with GPU pipeline fill from a new one?
Although this works for a simple architecture like NV20, it tends to work less well when you have arbitrarily long loops, or when you have a unified architecture (scheduling VS threads from the next context takes up room used to process PS threads from the previous one). It also means that you need to store the state for 2 (or more) contexts simultaneously, which ends up being a lot of state to drag along. Not impossible, just rather difficult and/or expensive.

_xxx_
14-Jun-2006, 07:38
2006 Q1/Q2 or even later (Q3)

ATI gets a fair price by being charged on a per-die basis, while TSMC gets to practise on its immature process. Compare that with NV, which fully utilises capacity at either CSM or IBM facilities.

I wouldn't expect such a part to appear.

DegustatoR
14-Jun-2006, 07:50
A G80 (500M+ transistors) on CSM's 90nm process will be smaller than G70's die.
I think you're wrong. R580 is about the same size as G70 and it has only 380M.
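DegustatoR's point as a back-of-envelope calculation. It assumes G80 would reach roughly R580's transistor density, using the thread's figures (about 380M for the 90nm R580, 500M+ for G80), and, for the 80nm case, an ideal optical shrink where area scales with the square of the feature size. Real shrinks rarely hit the ideal, and density varies with the logic/SRAM mix, so treat the outputs as rough ratios relative to R580's die area.

```python
def relative_area(transistors: float, ref_transistors: float,
                  node_nm: float, ref_node_nm: float) -> float:
    """Die area relative to a reference chip, assuming equal design density
    and ideal (node_nm / ref_node_nm)**2 area scaling between nodes."""
    return (transistors / ref_transistors) * (node_nm / ref_node_nm) ** 2

R580_TRANSISTORS = 380e6   # figure quoted in the thread, 90nm part
G80_TRANSISTORS = 500e6    # "over half a billion", so a lower bound

at_90nm = relative_area(G80_TRANSISTORS, R580_TRANSISTORS, 90, 90)
at_80nm = relative_area(G80_TRANSISTORS, R580_TRANSISTORS, 80, 90)
print(f"90nm: {at_90nm:.2f}x R580's area")  # 1.32x
print(f"80nm: {at_80nm:.2f}x R580's area")  # 1.04x
```

Even with a perfect 80nm shrink the estimate lands slightly above R580's area, which fits Geo's doubt that 500M+ on 80nm would come in under 352mm².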